Finding balance in the era of tokenmaxxing

by Evan Conaway

June 2, 2026

Posted in: AI Metrics Productivity Adoption Culture

Introduction

Some of the world’s most competitive tech companies are now using internal leaderboards to track (and reward!) the highest consumers of AI tokens. The practice, which emerged in early 2026, has become known as “tokenmaxxing,” and its proponents view it as a necessary material push to shift not only individual developers’ mindsets but also broader organizational culture toward more AI usage. This new wave of incentivizing raw AI usage to spur adoption is partly fueled by a persistent cultural challenge: a significant segment of developers remains skeptical of AI’s output and resists using AI consistently. In fact, DORA’s State of AI-assisted Software Development report found that ~30% of developers trust AI outputs “a little” or “not at all.”

Some tech leaders argue that the ultimate goal of token leaderboards is to force developers to shift their own mental models to ensure a company’s survival in the current fiercely competitive and rapidly changing market.¹ In other words, the message is: if you aren’t burning tokens, you aren’t evolving.

At DORA, we’ve spent over a decade studying what makes software teams successful. When we look at the trend of tokenmaxxing through the lens of our own research, we see a familiar pattern. Whether it’s lines of code or frequency of commits, the history of development is littered with these so-called “vanity metrics” that have promised a shortcut to measuring, understanding, and optimizing productivity.²,³ But measuring software development is more complex than a single metric, especially one like token consumption that prioritizes quantifying activity over examining actual outcomes. Activity metrics alone do not provide enough context to judge performance, and trying to capture productivity with a single metric is a mistake we have learned from in the past.⁴

As we all navigate the noise of the AI transition, we think it is too early for us to be spending all that much energy (and tokens) on hyper-efficiency and over-optimization. We recommend the industry move past the “maxxing” rhetoric and focus on what actually drives improvements in performance and outcomes: balance and team-wide capability.

The (potential) pros and the (very real) cons

The clearest benefit of tokenmaxxing is its ability to trigger a foundational shift in how employees approach their work. Through a sort of gamification of the learning curve, the token leaderboard does provide a relatively low-stakes way to encourage exploration in “AI-hesitant” developers and to overcome the cultural inertia that can happen when developers are not trying out new tools. Adoption does not guarantee impact, but teams cannot achieve impact without baseline adoption.² If a company needs to force a mindset shift quickly, tracking usage is one way to motivate developers to actively (and perhaps “aggressively”) experiment with new tools.⁵

Beyond simply forcing that baseline adoption, tracking token usage across individuals can also serve as a highly effective diagnostic tool for identifying power users of AI. When you identify the developers consuming the most tokens, you may find the individuals who have developed unique, high-leverage workflows or highly effective prompting strategies. By transforming these top consumers into “AI champions”, leadership can tap them to mentor colleagues and distribute their AI literacy across the rest of the team. In this light, tracking token spend becomes a valuable tool for finding pioneers who can guide their peers toward thoughtful experimentation and team empowerment.

However, the most immediate danger of tokenmaxxing is the rise of “productivity theater”.⁶ Because raw token count is purely a measure of input, it is incredibly easy to game, making it a vanity metric that is essentially just for show. Token leaderboards can therefore undermine the productivity they seek to boost. We have seen reports of developers running autonomous agents on nonsense projects simply to calibrate their spend and stay above the company average, or climb their way to the top of the charts.⁷ This is a classic manifestation of Goodhart’s Law: “when a measure becomes a target, it ceases to be a good measure.” More specifically in this scenario, when a token metric becomes a target for performance, it ceases to be a reliable measure of actual work, as tokenmaxxers can win out by gaming the system.

Beyond performative usage, there is a financial and technical cost. We agree with Adnan Masood’s assertion that high token spend yields diminishing returns.⁸ The teams with the highest budgets often achieve only a marginal increase in software delivery throughput at a 10x increase in cost, not to mention the additional consequences of low ROI and infrastructure outages. Beyond the burden on budgets, Masood also notes that this unchecked consumption generates immense technical debt. When developers are incentivized to maximize output, they may become more willing to accept low-quality, AI-generated code without rigorous review.

The impacts are not just technical—they’re cultural. Leaderboards can backfire, fostering a culture of job insecurity.⁷ When employees fear that low token metrics could negatively impact performance reviews or lead to being replaced by AI, it creates significant social pressure, stress, and burnout. Over-rewarding performance on leaderboards can also incentivize knowledge hoarding; instead of collaborating, we have heard reports of developers withholding their most effective AI use cases, strategies, and prompts to out-compete colleagues.⁹ Ultimately, tokenmaxxing rewards the loudest, most expensive activities and can lead to negative social consequences for the organization while also drowning out the quieter, high-precision engineering that actually drives a healthy software delivery pipeline.

While tokenmaxxing and token leaderboards certainly have their potential benefits, especially for spurring AI adoption and experimentation, we believe the (very real) cons largely outweigh the (potential) pros, especially if AI usage metrics are used to motivate inefficient token usage. Incentivizing consumption for its own sake has unfortunate consequences, like wasted resources, accumulation of technical debt, and developer burnout.

Alternatives to tokenmaxxing

At its core, the tokenmaxxing debate is really a debate about measurability. When an industry struggles to quantify the value of a new paradigm, it falls back on the easiest thing to count. But “maxxing” of any kind is fundamentally the wrong mindset for this moment in software development because it inherently eschews balance. Instead of obsessing over maximizing a single, gameable input, we need to move the conversation back toward thinking holistically about how we measure outcomes with AI.

Fortunately, we are already seeing some engineering organizations realize this and pivot away from raw consumption metrics. Salesforce, for instance, has introduced the concept of Agentic Work Units (AWUs), a framework designed to translate raw AI inputs into concrete, completed work outputs. By measuring AWUs, they demonstrate one way of looking at the actual impact delivered, completely divorced from the compute used to get there.¹⁰

Other companies are focusing on healthier strategies for safely encouraging adoption. Shopify recently took a more pragmatic route: they ditched the concept of an AI “leaderboard,” replacing it with a “usage dashboard” to reduce unnecessary competition and promote responsible use of AI toward better outcomes. To protect their systems and their budgets, they implemented “circuit breakers” to catch runaway autonomous agents and flag abnormal daily spend spikes. These approaches encourage exploration while installing guardrails against performative waste.⁷ Shopify did not throw out the idea of making AI usage metrics visible within the company, which suggests there is still value to be found in some kind of leaderboard or dashboard, as long as it is implemented and socialized responsibly. At DORA, we believe company-wide aggregates are meaningless, and individual scores are harmful. So consider: instead of measuring tokens across an entire company or comparing individuals, what might be the impacts of measuring token usage by team, by application, or by use case?
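To make the idea concrete, here is a minimal sketch of what team-level aggregation and a daily spend "circuit breaker" check might look like. It is not a description of Shopify's implementation, which the reporting does not detail. The function names, the (team, tokens) record shape, and the "three times the recent median" threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import median


def tokens_by_team(records):
    """Sum token usage per team from (team, tokens) pairs.

    Individual identities are deliberately absent from the input, so the
    result cannot be turned into a per-person leaderboard.
    """
    totals = defaultdict(int)
    for team, tokens in records:
        totals[team] += tokens
    return dict(totals)


def is_spend_spike(history, today, multiplier=3.0):
    """Return True when today's spend exceeds `multiplier` times the
    median of recent daily spend (hypothetical threshold)."""
    if not history:
        # No baseline yet, so there is nothing to compare against.
        return False
    return today > multiplier * median(history)


usage = [("payments", 100), ("search", 50), ("payments", 25)]
print(tokens_by_team(usage))

recent_daily_spend = [100, 120, 90, 110]
print(is_spend_spike(recent_daily_spend, 500))
print(is_spend_spike(recent_daily_spend, 200))
```

The median is used as the baseline here because a single earlier runaway day distorts it less than it would a mean. A real system would likely need per-team baselines and a human review step before anything is blocked.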

For teams looking to move from just adopting and using AI to really finding success with it, the DORA AI Capabilities Model provides data-backed guidance on the specific technical and cultural practices that substantially amplify the benefits of AI. By putting these capabilities (for example, clarifying and socializing your AI policies, prioritizing user centricity, investing in your internal platform) into practice, companies can set the right organizational conditions for AI-assisted developers to thrive and start delivering meaningful outcomes, rather than just raw adoption and wasteful usage.

Ultimately, the ideal state of AI adoption isn’t a few isolated tokenmaxxers dominating symbolic leaderboards while their peers appear to fall behind. We need to start treating AI adoption and experimentation as a team activity, rather than an individual sport. When we align incentives around measurable teamwide outcomes, we encourage developers to share their most effective prompts and workflows and to collaborate rather than compete. We should be pushing for a more even, balanced distribution of tokens, as well as changes to AI policy and practice at the organizational and team levels.

Conclusion: Find balance with DORA; leave “maxxing” behind

This brings us back to what we do at DORA. You don’t need a new, highly gameable metric to understand if generative AI is making your team better; you just need to look at your existing outcome metrics.

Take the data platform Starburst, for example: they ignore token maximization entirely, opting instead to simply measure their AI success against standard DORA metrics, developer velocity, and incident resolution times.¹¹ By pairing core DORA metrics with specific efficiency and risk indicators (think cost per accepted change and code rework rates), organizations can get a more accurate picture of AI’s impact on their software delivery pipeline.⁸
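As a rough illustration of those two indicators, the sketch below computes them from simple counts. The definitions are assumptions made for this example, since neither this post nor the sources cited above fix a formula. Here, cost per accepted change is AI spend divided by the number of changes accepted in the same period, and rework rate is the share of changes that later had to be revised.

```python
def cost_per_accepted_change(ai_spend, accepted_changes):
    """AI spend for a period divided by changes accepted in that period.

    Returns None when nothing was accepted, because the ratio is undefined.
    """
    if accepted_changes == 0:
        return None
    return ai_spend / accepted_changes


def rework_rate(reworked_changes, total_changes):
    """Fraction of changes that later needed rework (assumed definition)."""
    if total_changes == 0:
        return None
    return reworked_changes / total_changes


# Hypothetical monthly figures for one team.
print(cost_per_accepted_change(1200.0, 48))  # 25.0
print(rework_rate(6, 48))                    # 0.125
```

Read alongside delivery metrics, a falling cost per accepted change with a stable rework rate would suggest AI spend is translating into outcomes. Rising spend with rising rework would point the other way.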

In addition to taking a closer look at measuring the impact and efficiency gains of AI, engineering leadership might benefit from some introspection about the best ways to motivate AI usage. Are we trying to build a culture of forced, raw adoption, or are we trying to build a culture of testing, learning, and innovation?

Be careful what you incentivize; you just might get it.

Gamifying token consumption might give you a temporary spike in visible activity, but it comes at the cost of technical debt and developer burnout. True, sustainable performance comes from empowering teams to collaborate, measuring holistic outcomes, and finding balance in how we build. It’s time to leave “maxxing” behind.

References (all accessed between May 4 and June 1, 2026)

1. Bousquette, Isabelle. “Why Some Companies Say AI ‘Tokenmaxxing’ Is Key to Survival.” The Wall Street Journal, 14 April 2026.
2. Carey, Scott. “#3 LeadDev’s The Shift: Tokenmaxxing is the new lines of code.” Substack, 15 May 2026.
3. Fernholz, Tim. “‘Tokenmaxxing’ is making developers less productive than they think.” TechCrunch, 17 April 2026.
4. Forsgren, Nicole, et al. “The SPACE of Developer Productivity.” ACM Queue, 6 March 2021.
5. Whittemore, Nathaniel. “In Defense of Tokenmaxxing.” The AI Daily Brief; Artificial Intelligence News and Analysis, 13 May 2026.
6. Reis, Joe. “Why Tokenmaxxing is For Fools. A Rant on Fake Productivity.” Substack, 2 May 2026.
7. Orosz, Gergely. “The Pulse: ‘Tokenmaxxing’ as a weird new trend.” The Pragmatic Engineer, 23 April 2026.
8. Masood, Adnan. “Tokenmaxxing: The Productivity Paradox of Generative AI Consumption.” Medium, 20 April 2026.
9. Chakrabarti, Meghna. “Why the tech world is ‘tokenmaxxing’.” On Point podcast, 28 April 2026.
10. Mills, Madison. “Exclusive: Salesforce takes on ‘tokenmaxxing’.” Axios, 15 April 2026.
11. Steinschaden, Jakob. “Tokenmaxxing: Is AI Token Consumption a Productivity Metric or Vanity Trap?” Trending Topic EU, 23 April 2026.

Last updated: June 2, 2026