AI Got Cheap. Now What?

Cheaper inference is rapidly increasing the economic viability of AI agents. Those who recognize this will have a huge advantage over competitors who still treat LLMs like chatbots.

In the last few weeks, Alibaba dropped Qwen 3.5 and claimed it is 60% cheaper than the last version. MiniMax said its newest model can run continuously for roughly a dollar an hour. OpenAI's pricing page now reads like a menu designed to get you to automate more.

To put some numbers on it: GPT-5 mini is $0.25 per million input tokens and $2.00 per million output tokens, with cached inputs dropping to $0.025. Qwen 3.5-397B comes in at $0.60 input and $3.60 output. MiniMax M2.5-Lightning sits at $0.30 and $2.40, and they frame it as about a dollar an hour at high throughput. We're at the point where you can run a serious agent workflow for single-digit dollars a day. That changes what people will actually build, and it will lead to significantly higher API consumption across the board.
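To make the "single-digit dollars a day" claim concrete, here is a rough back-of-the-envelope sketch in Python. The per-million-token prices are the ones quoted above. The daily workload (10 million input tokens and 1 million output tokens) and the lowercase model keys are assumptions chosen for illustration, not measurements, and the sketch ignores cached-input discounts.

```python
# Prices in USD per million tokens, as quoted in this article.
# The dictionary keys are illustrative labels, not official API model IDs.
PRICES = {
    "gpt-5-mini": {"input": 0.25, "output": 2.00},
    "qwen-3.5-397b": {"input": 0.60, "output": 3.60},
    "minimax-m2.5-lightning": {"input": 0.30, "output": 2.40},
}


def daily_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one day's token usage."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent workload: 10M input tokens, 1M output tokens per day.
    for name in PRICES:
        cost = daily_cost(name, 10_000_000, 1_000_000)
        print(f"{name}: ${cost:.2f} per day")
```

Under that assumed workload, all three models land under $10 a day. Swap in your own token counts to see where your workflow falls.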

Three things are driving the drop. Architecture is getting smarter about compute. Qwen 3.5 has 397 billion parameters total but, using mixture-of-experts, activates only 17 billion per token, roughly 4% of the model. Big tech is also flooding the zone with infrastructure. The major US cloud and AI companies are guiding toward something like $600 to $650 billion in combined capex for 2026, with Alphabet alone flagging $175 to $185 billion. That much supply pushes unit costs down fast. And at the chip level, hardware deals are locking in scale. NVIDIA just signed a multiyear deal to supply Meta with millions of AI chips.

This matters because agents aren't priced like chatbots. They burn tokens on planning, reading long documents, calling tools, retrying when things break, and logging everything. The price drop is the difference between running something for a demo and running it 24/7 on every support ticket. Once costs are this low, the differentiator stops being which model you pick. It becomes whether you've actually wired AI into your workflows with real monitoring, permissions, and feedback loops.

There are two risks worth watching. First, security gets harder as agents spread. The OpenClaw saga was a preview. Rapid adoption of autonomous tooling led to bans at some companies because nobody had thought through permissions and data exposure. More agents running means more surface area for things to go wrong, and cheap can mask waste. Second, an NBER working paper found that about 70% of firms say they use AI, but over 80% report zero impact on employment or productivity. Spending $300 on something useless still wastes time, creates rework, and builds bad habits. Cheap is not a strategy.

But cheap opens up new strategies. I've found that cheaper tokens mean I can shamelessly ask an agent to iterate unsupervised for longer stretches of time, usually with better results. To take a real-world example: I'm currently building an n8n workflow that reviews hundreds of articles per week and identifies those that are of particular interest to my client. Data points from those articles are extracted as JSON and uploaded to the client's frontend for users to see. There are many ways this pipeline can go wrong, and the workflow quickly became dense and difficult to debug.

So, I had my coding agent write its own testing scripts for the workflow, refactor the workflow as needed, and run the tests itself programmatically to examine how it responded to changes. The result was that I got several days' worth of debugging and development done in about 30 minutes, but it used several million tokens in total. The $20 of tokens used in this feedback loop might seem expensive for a single run, but the workflow itself is set to save one of the client's analysts 5-10 hours per week.
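Here is a small sketch of that trade-off in Python. The $20 token spend and the 5-10 hours saved per week come from the example above. The $50 hourly rate is a placeholder I made up for illustration, and the sketch ignores ongoing running costs and my own setup time.

```python
def weekly_savings(hours_saved_per_week: float, hourly_rate: float) -> float:
    """USD value of the analyst time freed up each week."""
    return hours_saved_per_week * hourly_rate


def payback_weeks(one_time_cost: float, hours_saved_per_week: float,
                  hourly_rate: float) -> float:
    """Weeks until a one-time token spend is recovered by time savings."""
    return one_time_cost / weekly_savings(hours_saved_per_week, hourly_rate)


if __name__ == "__main__":
    TOKEN_SPEND = 20.0   # USD, from the example above
    HOURLY_RATE = 50.0   # USD per hour, a hypothetical analyst rate

    for hours in (5, 10):  # the low and high end of the estimated savings
        saved = weekly_savings(hours, HOURLY_RATE)
        weeks = payback_weeks(TOKEN_SPEND, hours, HOURLY_RATE)
        print(f"{hours} h/week saved: ${saved:.0f}/week, "
              f"payback in {weeks:.2f} weeks")
```

Even at the low end of the estimate, the assumed rate puts the payback period at a small fraction of a single week.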

That's the math that matters now. Not whether AI is impressive, but whether the cost of letting it run is obviously less than the cost of doing the work yourself. For a growing number of tasks, the answer is yes, and it wasn't six months ago. The companies and consultants that figure this out first won't just save money. They'll operate at a speed that makes everyone else look like they're still doing things by hand.

If you found this newsletter useful, forward it to someone who would too. Click here to learn how Applied.AI can help your business make AI work for you.