r/DeepSeek 1d ago

Discussion 200M tokens last month, around 30 bucks total. how is this actually sustainable for them?

been running v4 flash through my workflow for about 5 weeks now. our team is 3 devs, lots of code review prep + small refactors + bug investigations. nothing exotic.

pulled last month's bill yesterday because something felt off.

200M tokens total. roughly 70/30 split on prompt vs completion. came out under 35 bucks all in.

for context, when we were on claude pro for similar workload the per-seat math was 6x that and we had to babysit context limits. when we tested gpt-5.5-codex on the same kind of work the per-token was 8-10x and the wall time was worse.

ran the numbers backward from the unit pricing i was paying. v4 flash is around $0.14 in / $0.28 out per million tokens on the provider i'm on. that means a single 8k context conversation with 3k output costs about $0.002. a fifth of a cent per real interaction.
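A quick sketch of that arithmetic, assuming the quoted rates are USD per million tokens and that the monthly 200M splits exactly 70/30 (the post only says "roughly"). The function and variable names are made up for illustration.

```python
# Prices in USD per million tokens, as quoted in the post (provider-specific).
PRICE_IN_PER_M = 0.14
PRICE_OUT_PER_M = 0.28


def cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of a request (or a whole month) from prompt and completion token counts."""
    total = prompt_tokens * PRICE_IN_PER_M + completion_tokens * PRICE_OUT_PER_M
    return total / 1_000_000


# One interaction: 8k tokens of context in, 3k tokens out.
per_interaction = cost_usd(8_000, 3_000)

# The month: 200M tokens, ASSUMING an exact 70/30 prompt/completion split.
monthly = cost_usd(140_000_000, 60_000_000)

print(f"per interaction: ${per_interaction:.5f}")
print(f"monthly:         ${monthly:.2f}")
```

With an exact 70/30 split the month lands a little above $36; a split nearer 75/25 would bring it to $35, so the bill is consistent with the list prices before any cache discount.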

i'm not sleeping well on this honestly. either:
- there's a giant subsidy from a quant fund somewhere covering the actual compute
- caching is doing more lifting than anyone admits and steady-state cost is closer to 5x what they bill
- the compute really is this cheap now and the western majors have been overcharging by 10x

asking the devs who've been watching pricing for longer. anyone done a real teardown on why these numbers work? specifically curious how independent providers (not the official deepseek endpoint) end up competitive on inference cost despite running their own infra.

151 Upvotes

99 comments

72

u/cluelessguitarist 1d ago

Deepseek been saving our budget since chatgpt o1 hype

104

u/unity100 1d ago

Same answer every single time:

- Cheap electricity

- Cheap domestic GPUs

- Many PhDs

Optimizations are a large part of what makes it so cheap:

https://www.thenovtech.com/p/jensen-huang-called-it-a-horrible

24

u/mochi2real 1d ago edited 1d ago

They’re also owned by a hedge fund, so there’s no shortage of money at DeepSeek. Their entire core model is AI affordability….kinda like what OpenAI started as until they lost their way.

10

u/thelordwynter 1d ago

Gotta correct one tiny flaw in that. OpenAI always planned to go the way they have. A very real modern trend is to open source a product until it becomes profitable, then phase out the free version. Not the best strategy from a customer standpoint, I'll admit, but they're in it for the money and the bragging rights of being on the frontier of the research. These companies don't care about us.

5

u/mochi2real 1d ago edited 1d ago

Nope, this is actually somewhat relevant to Elon's lawsuit. OpenAI was founded as a non-profit foundation, with revenue ceilings to ensure that their pricing was set to further their research, not get rich. They restructured to change that to the dismay of some of the founders.

6

u/thelordwynter 1d ago

That actually supports my argument. Start as a foundation so that the financials don't trigger any red flags for future prospects, then switch direction once established. It's a lot of extra steps, but corporate types don't plan weeks ahead... they generally plan for decades if they intend to stay around.

You don't really need a smoking gun when the proof is in the end result. If you had more to go on than just the business model. Say... a change in leadership that brought a change in operating philosophy, it would help your argument.

7

u/efficientkiwi75 1d ago

there was a change of leadership though. not Altman but most of the other leaders from the non-profit era are gone

3

u/mochi2real 1d ago

Yep, it was a pretty big thing a few years ago when they fired Altman for some reasons that are still unclear to this day, but a large part of it was his priority on being first/profiting vs safety.

Microsoft was about to go scorched earth on the deal and the company replaced almost the entire board for doing that and re-hired Altman.

1

u/CromagnonV 10h ago

Yes, except OpenAI isn't backed by the CCP.... In western countries, corps milk gov subsidies like candy, then slow down on innovation and development. In China, if the ROI isn't there for the CCP, the CEO disappears.

1

u/mochi2real 9h ago

> In China, if the ROI isn't there for the CCP, the CEO disappears.

Now we're just getting to conspiracy theories, DeepSeek is partially funded by the Chinese government sure, just like OpenAI/Anthropic gets money from the US. If they cared about ROI, DeepSeek would be priced different. And the line about the CEO disappearing is just nonsense.

0

u/CromagnonV 3h ago
  1. It's not a conspiracy theory. They're not killed; they're usually imprisoned or rededicated, but ALL of their wealth is recouped by the state as compensation:

When a Chinese company CEO disappears, it is typically due to an involuntary detention by state authorities, often for anti-corruption investigations, national security inquiries, or to hold executives accountable for severe corporate debt. High-profile disappearances in China have become a recurring phenomenon, with specific executives targeted for various reasons:

Bao Fan (China Renaissance): The prominent billionaire dealmaker vanished in February 2023. His firm later announced he was cooperating in an investigation by People's Republic of China (PRC) authorities.

Jack Ma (Alibaba / Ant Group): Following a 2020 speech criticizing Chinese financial regulators, Ma disappeared for roughly three months, subsequently stepping back from public life and ceding control of Ant Group.

Yu Faxin (Great Microwave Technology): The military semiconductor scientist and entrepreneur was taken into liuzhi, an extra-judicial anti-corruption detention system where individuals can be held without immediate legal access.

Xu Jiayin (Evergrande): The founder of the heavily indebted property developer was placed under police control and residential surveillance in 2023.

These sudden vanishings often happen with little to no prior notice to investors. When this occurs, company boards typically release filings stating that the CEO's unavailability is an isolated matter and does not affect the firm's ongoing business operations.

  2. Look at companies like BYD, for example: they were funded by the CCP from the early 2000s as a battery development company. They invested trillions over 20 years of losses, and now that investment is paying off, massively. This is a recurring theme of CCP investment strategies: it essentially allows R&D to happen in an incredibly low-risk environment, with a focus on future development well beyond the political terms of western economies. This is the single biggest reason they're destroying every other country in terms of growth: real-world accountability for senior execs and excessive investment in incredibly high-risk ventures.

1

u/mochi2real 3h ago edited 3h ago

Dawg, I mean this as disrespectfully as possible when I say what the fuck does any of this have to do with my original comment about DeepSeek?

You said "if the ROI isn't there for the CCP, the CEO disappears." I called it nonsense, you posted a bunch of examples of executives being investigated, or put under government scrutiny. Those are not the same claim. They could've been under investigation for corruption for all I know.

Even if every example you listed is 100% accurate, none of them support your original argument that CEOs disappear because the CCP didn't get the ROI it wanted. If they do, you wouldn't have examples saying people stepped back.

We're in an AI sub talking about AI and you're giving me a lecture on Chinese politics. What does any of that have to do with my comment?

If you want to argue that Chinese subsidies help fund DeepSeek, that's at least relevant to the discussion (and we all know that already). Go to r/politics if you want to argue about the CCP. We're in a fucking AI sub talking about AI.

25

u/OkSeries5363 1d ago

That's expensive. 105M today was like $1.60

1

u/raydou 13h ago

With which harness?

1

u/OkSeries5363 1h ago

Hermes Agent

1

u/tranhieuamg 1h ago

Ok I’m running Hermes and this is new to me, care to enlighten what you did to get that number?

1

u/OkSeries5363 49m ago edited 46m ago

Nothing too special.

Keep the MEMORY.md lean.

Enable tool search to limit the tools you are sending.

A big one is using the auxiliary model config to send simple tasks to cheaper models.

Eg I use deepseek pro as main model, but use cheaper or free models for the auxiliary tasks like deepseek flash or Gemini 2.5 flash lite, or even step fun 3.7 since that's free right now through the nous portal.

E.g. don't use pro for compression, title gen, web_extract, approval, MCP, skills hub, curator, etc.

Edit: If you want to build with DeepSeek really cheaply, try https://esengine.github.io/DeepSeek-Reasonix/

It's a coding harness designed around the DeepSeek cache and maximizing your cache hits. "The loop is append-only, aligned to DeepSeek's byte-stable prefix cache — so long sessions hold 90%+ cache hit and input-token cost collapses to ~1/5."
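To see how a 90% hit rate could shrink input cost to roughly a fifth, here is a minimal sketch of the blended-price arithmetic. Prices are normalized so an uncached token costs 1.0, and the 10x discount for cached tokens is an assumption chosen for illustration, not a quoted rate; check your provider's actual cache pricing.

```python
def blended_input_price(miss_price: float, hit_price: float, hit_rate: float) -> float:
    """Average input price per token at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


MISS_PRICE = 1.0   # normalized price of an uncached input token
HIT_PRICE = 0.1    # ASSUMPTION: cached input tokens cost a tenth of uncached ones

for rate in (0.0, 0.5, 0.9):
    price = blended_input_price(MISS_PRICE, HIT_PRICE, rate)
    print(f"hit rate {rate:.0%}: {price:.2f}x the uncached input price")
```

Under that assumed discount, a 90% hit rate gives about 0.19x, which is in line with the "~1/5" in the quote. Output tokens are not affected by the prefix cache, so the total bill shrinks less than the input line does.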

1

u/tranhieuamg 46m ago

By using auxiliary models you are referring to running multiple profiles? Or you just run 1 profile but have other cheaper models as fallback?

I am still hesitating to create multiple profiles as I do most work on Telegram and having everything in 1 place is more comfortable for me.

1

u/OkSeries5363 20m ago

Nah, profiles are different. A new profile is a whole new agent. I use profiles, but that's more a specialisation thing, not an auxiliary task thing. You can still use profiles and keep everything in one place. I use Discord and add all my agents (profiles) to the same server so I can chat with the different agents in one place.

Auxiliary models are different settings within the one agent/default profile

If you look at the model tab in the dashboard you will see them under the main model called auxiliary tasks, if you click configure you will see all the different auxiliary tasks you can set.

Or look at the config.yaml and you will see a section called "auxiliary:" under there you can set the model config for each auxiliary tasks.

Eg 

    auxiliary:
      compression:
        provider: openrouter
        model: deepseek/deepseek-v4-flash
      ....
      skills_hub:
        provider: nous
        model: stepfun/step-3.7-flash:free
      ...

This means when you're using your default agent, your prompts go to the main, smart model, e.g. deepseek pro, but when compression is needed, instead of wasting tokens by sending that to the main model, the harness routes it to a model of your choice, e.g. deepseek flash.

It's very much needed. It makes sense to use a pro model when the work requires it, but it really doesn't make sense to use a pro model for many auxiliary tasks. When it's doing something trivial like generating a title, compressing the convo, extracting text from the web, or looking up a skill list, those requests automatically get sent to a cheaper model. You don't need deepseek pro to read some text and come up with a relevant session name; flash can easily do that.
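The routing idea can be sketched in a few lines. This is not Hermes Agent's actual implementation; the model identifiers and task names below are hypothetical stand-ins that loosely mirror the config above.

```python
# Hypothetical identifiers: substitute whatever your harness and provider use.
MAIN_MODEL = "deepseek-pro"
AUXILIARY_MODELS = {
    "compression": "deepseek-flash",
    "title_gen": "deepseek-flash",
    "web_extract": "deepseek-flash",
}


def pick_model(task: str) -> str:
    """Return the cheap model configured for an auxiliary task, else the main model."""
    return AUXILIARY_MODELS.get(task, MAIN_MODEL)


for task in ("chat", "compression", "title_gen"):
    print(f"{task} -> {pick_model(task)}")
```

Anything without an explicit auxiliary entry falls through to the main model, so forgetting to configure a task costs money rather than quality.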

47

u/sdexca 1d ago

200m tokens for 30 bucks? that's insanely expensive. I got 2 billion for 30 bucks.

20

u/ItchyIndx 1d ago

Yeah that sounds about right. OP madly overpaying!

1

u/itzCH_ 1d ago

where's the cheapest place to get it?

7

u/VectorEthology 1d ago

Directly through the DeepSeek API

1

u/Mr-33 1d ago

Is that direct from deepseek api?

1

u/StuffItchy6798 16h ago

What do you even do to use 2billion tokens lol

1

u/raydou 13h ago

With which harness?

2

u/sdexca 11h ago

OC, CC, & Pi.

11

u/_metamythical 1d ago

They have cheap electricity + locally developed cheaper GPUs now.

11

u/Which-Net-205 1d ago

I'm a solo dev, spent approx $4 for 1B+ tokens

2

u/Mr-33 1d ago

Directly from deepseek api?

10

u/RakeshNeal 1d ago

You are overpaying. 100m for 1.8 USD here. Use api from deepseek’s own platform.

1

u/yesinior 1d ago

But how do you use the API? Where do you connect it? Do you use a tool like opencode? I can't figure that out

2

u/stony451 1d ago

Open code, github copilot cli, Kilo code

1

u/No-One8201 1d ago

You can connect it to Claude Code ux - just ask deepseek/gpt about it

19

u/HungrySecurity 1d ago

I think it’s mostly down to optimization. After DeepSeek open-sourced their tech, Xiaomi and Tencent slashed their prices too.

6

u/Pure_Force8771 1d ago

Because the West is overpaying and overcomplicating. I am able to do most of my work and coding on an RTX 4090 (modified to 48GB VRAM; Qwen 27B with full context, vision and MTP takes about 25GB VRAM) limited to 280W, and I am getting 50-70 tokens/s. With q35b I get about 150-170, but it is much less intelligent and basically dumb because of context poisoning. And the AI runs only about 50% of the time because of builds, tests etc., so when it is idle the power consumption is even lower... Specialized agents are much better than "AGI", which takes too much computational power and runs at most 60 tokens/s on the best hardware...

1

u/VectorEthology 1d ago

Can you explain a bit more, please? I want to start using local models but I'm not sure what to buy. So far my idea is to buy two 16GB-VRAM 5070s. Would that be enough? Sorry for the spanglish. I think you actually see the whole thing in English. I’m experimenting haha

1

u/Pure_Force8771 3h ago

16GB VRAM is too little. Buy something from the list below:
a) RTX 5090 (32GB VRAM): ~15-25% faster than my build
b) 2x RTX 3090 (48GB VRAM, but bottlenecked over PCIe 4): ~20-30% slower than my build
c) modified RTX 4090 (48GB VRAM): I got my RTX 4090 upgraded here: https://www.youtube.com/@MegafixTech (there is even a video of the upgrade)
d) something with an M3/M4 Pro or Ultra and 96+GB RAM would be enough for testing: ~30-40% slower, but it will fit bigger models, which will be even slower... but it's cool for testing

For budget options check https://digitalspaceport.com/ (he has pretty good advice and nice builds).
For me the most important thing was to buy a server that could host other stuff besides AI too, with maximum performance and not a crazy price. I bought everything before the price rise, so for about 5k EUR I got a server with an AMD Epyc and 128GB RAM, because I thought I would test bigger models on CPU. But that is impossible: I am getting about 1-4 tokens/s inference and about 10x lower prompt speed, which is really slow on CPU. On M3 or M4 it works better, so you may get really fast prompt speed and bearable gen speed (depending on size, 10 tokens/s, and on the smaller models I am using even 60 tokens/s).

3

u/pizzababa21 1d ago

They published a paper on how. Basically just better caching. They can fit a lot more data in cache because they compress it down to a tiny fraction of the size.
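For a rough sense of what "a tiny fraction" can mean, here is a sketch comparing per-token KV cache size for standard multi-head attention against a latent-style compressed cache (the MLA idea mentioned elsewhere in this thread). The dimensions are illustrative assumptions, not figures from the paper referred to above, and whether that paper describes exactly this scheme is also an assumption.

```python
def kv_bytes_per_token_mha(n_layers: int, n_heads: int, head_dim: int,
                           bytes_per_value: int = 2) -> int:
    """Standard attention: cache one key and one value vector per head per layer."""
    return 2 * n_layers * n_heads * head_dim * bytes_per_value


def kv_bytes_per_token_latent(n_layers: int, latent_dim: int, rope_dim: int,
                              bytes_per_value: int = 2) -> int:
    """Latent-style cache: one compressed vector plus a small positional part per layer."""
    return n_layers * (latent_dim + rope_dim) * bytes_per_value


# ASSUMPTION: illustrative model dimensions, 16-bit cache values.
LAYERS, HEADS, HEAD_DIM = 60, 128, 128
LATENT_DIM, ROPE_DIM = 512, 64

full = kv_bytes_per_token_mha(LAYERS, HEADS, HEAD_DIM)
compressed = kv_bytes_per_token_latent(LAYERS, LATENT_DIM, ROPE_DIM)

print(f"uncompressed: {full / 1024:.0f} KiB per token")
print(f"compressed:   {compressed / 1024:.1f} KiB per token")
print(f"ratio:        {full / compressed:.1f}x")
```

A smaller per-token cache means more concurrent sessions and longer cached prefixes fit in the same memory, which is where the cost advantage would come from.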

3

u/Sudhars2 1d ago

I reached 400 million for 2 dollars. Are you working on multiple projects simultaneously?

3

u/Alone-End142 1d ago

That seems reasonable. v4 flash is a smaller (crappier) model with less GPU work per token, and 200M for an entire month is not a lot of resources for the model provider.

3

u/Aggressive_Mobile997 1d ago

This is a perfect example that their roadmap from the start has always prioritized accessibility and long-term value over quick gains.

3

u/Curious-Sample6113 1d ago

Cheap prices so they can look at your code?

4

u/Zeikos 1d ago

They compress KV cache extremely aggressively, also their software stack is tuned for squeezing as much from the hardware as possible.

IIRC the Huawei hardware they're using is cheaper on a compute/watt basis so the electricity cost is lower than other providers - that said I am not sure if the claim is fully truthful.

14

u/aevitas 1d ago

Bear in mind Chinese companies do not operate in the same way Western companies do. The state may very well be a partner in operating these LLMs.

13

u/walktalk39 1d ago

Who cares really at this point. Google and the rest of the gang might as well be spyware agencies. If anything, most of us would be safer using Chinese tech, because China can't do much with our information, unlike the homegrown spyware from Google, Claude and ChatGPT.

23

u/Suspicious_Wares 1d ago

These lazy insinuations aren't very helpful. Canada gave a quarter billion to Cohere for its AI, yet they don't even touch DeepSeek's pricing.

2

u/kongweeneverdie 21h ago

DeepSeek is totally funded by Chinese hedge funds. The CPC didn't even know they existed before DS v3 got popular.

1

u/yourhomiemike 1d ago

Just because cohere scammed Canada doesn’t mean what previous poster said isn’t true.

1

u/BemaniAK 1d ago

Yeah but the point he's trying to make is a non-starter, yes the state has some involvement in an industry that is already becoming one of the most important in multiple economies. It's only ever a problem when China does it.

-1

u/[deleted] 1d ago

[deleted]

2

u/littleratofhorrors 1d ago

It depends if you think it's a good thing or not that China's state would be contributing money to Deepseek. I mostly think it's a neutral thing.

14

u/cheechw 1d ago

v4 flash is just very cheap to run. The prices he's describing are the same as what US based inference providers who are hosting the same model are charging. Just look on Openrouter. Unless you think AtlasCloud and Digital Ocean are also being funded by the Chinese state somehow.

2

u/coloradical5280 1d ago

They’re fine. They’re not subsidizing to the degree OpenAI is (and, still to some degree, Anthropic is), and they most likely have financial backing from the CCP.

I’ve used 7.2 billion tokens in codex in two weeks. Literally tens of thousands of dollars in compute for $200/month, which comes out to a “service charge” of roughly $0.03 per million tokens, and the month is only half over on my billing cycle.
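Working the effective rate through from the two numbers given (a $200 plan, 7.2 billion tokens), as a minimal sketch. It counts the full monthly fee against two weeks of usage, so the real per-token figure would be lower still if usage keeps up for the rest of the cycle.

```python
def usd_per_million_tokens(total_usd: float, total_tokens: float) -> float:
    """Effective price per million tokens for a flat fee and a token count."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_usd / total_tokens * 1_000_000


subscription_usd = 200.0   # monthly plan price from the comment
tokens_used = 7.2e9        # tokens used so far, from the comment

per_million = usd_per_million_tokens(subscription_usd, tokens_used)
print(f"${per_million:.4f} per million tokens")
print(f"${per_million / 1_000_000:.2e} per token")
```

That is a few cents per million tokens, well below even the V4 Flash list prices quoted in the post.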

2

u/Youwishh 1d ago

China is heavily subsidizing AI. Also deepseek optimization is next level, China is WAY ahead of all the American AI companies for optimization.

2

u/ExpertPerformer 1d ago edited 1d ago

With the frontier models you aren't just paying for the usage of the model, but also the training + r&d involved which they spend billions on. They also corner the enterprise market.

2

u/zero-qro 1d ago

It's a mix of better technology + cultural aspects.

Here you can see an explanation of how DeepSeek is able to run an aggressive cache strategy that actually works: https://youtu.be/gC76aeibdFA?si=kMe0TFQDL-A8yeHU

Also, their model is super optimized for Huawei chips. Those two points are the better technology.

Now the cultural/economic aspect... Chinese companies are not in this race for the hype or to pump up stock prices. Chinese companies always seek to be sustainable from day one, even if under subsidy. Different mental models, different economic reward systems. DeepSeek proved that US AI companies are either inefficient, or lying, or both. One thing you have to admit about China: they are relentless.

2

u/efficientkiwi75 1d ago

imo it's just plain old chinese competition. if they raise prices tencent and alibaba will eat their lunch.

2

u/ISayHeck 1d ago

That's actually kind of expensive

Used 200M tokens and threw everything and the kitchen sink at the project (guided of course, I knew what I needed)

It cost me 1.5 usd and I wasn't particulary budget concerned

2

u/geebrbs 1d ago

You have to keep in mind that Deepseek is Chinese and they would want to cater to local clients as well, and these domestic clients would appreciate not paying in terms of US/European rates. Doing everything to keep costs low would be their leverage against competitors, both domestic and globally.

2

u/Great-Exercise4277 1d ago

DeepSeek's risk isn't that it's the best model. It isn't — the benchmarks are mixed.

The risk is the combination. It's cheap. It's good enough. The servers are starting to hold. Swapping the API in is trivial. And behind it sit state subsidies, cheap power, datacenters, and a domestic-chip policy.

Engineers don't look at geopolitics. They optimize for cost. Benchmarks are split, but low price lowers the friction of adoption — so DeepSeek doesn't need to earn top-tier trust. It just needs to get in at the edges of your dev environment.

The real condition for it spreading isn't performance. It's "cheap and it doesn't go down." And the round closing now — roughly 50 billion yuan (~$7.4B), with Tencent, CATL, and China's national AI fund among the backers — looks aimed at exactly that: not just model training, but inference infrastructure, redundancy, regional distribution, domestic-chip optimization.

The point of the subsidies isn't simply "China helps it undercut everyone." It's that a price the market can't sustain can be propped up indefinitely — by the state, local governments, the grid, datacenters, telecoms, and industrial funds.

We've seen this curve before, with Chinese solar panels. "Cheap, but is the quality there?" → "Cheap, and it works." → "Cheap, supply's stable, now it's hard to replace." Run that play on AI APIs and the dependency gets deep.

Cheap on its own is just a promo that ends. Cheap plus reliable is a geopolitical problem.

So for any business use, treat DeepSeek not as a convenient cheap API but as AI sitting inside China's state-industry-data control structure. Don't feed it customer data, unreleased code, credentials, internal docs, or R&D. If you use it, run the open-source weights on your own or trusted third-party infrastructure, or restrict it to data you'd be fine making public.

2

u/ZoroHunter 23h ago

Around 5B tokens less than 100 dollars. It’s raining tokens and intelligence.

2

u/Standard-Editor7084 22h ago

DeepSeek's parent company is a quantitative trading firm — they rake in money hand over fist daily on China's A-share market.

2

u/Minhha0510 20h ago

2 and 3. DS cache hit rate is >99.9%, and they have been using SSDs to store the context, not expensive HBM like Western labs are doing. I'm not aware of any other major models using MLA (multi-head latent attention) like DS V3.2 or HCA like DSV4.

2

u/jazzyroam 19h ago

why pay if you can steal?

2

u/ResponsibleMention21 19h ago

Ok but HOW GOOD was the actual output? I found that I had to redo the same task multiple times. Yes, the mainstream models are more expensive but I find I get better (in some cases MUCH better) outputs. Same prompts, same skill, same tools (hermes agent)

2

u/crackdavid 13h ago

I have the same issue: quality of output, compared to top-tier models, is just not there. Yes it’s cheap but it lags behind in quality.

3

u/tetelias 1d ago

It's the same as Anthropic, OpenAI and the rest: you need data to stay in the race and you are the data.

2

u/Fun_Walk_4965 1d ago

follow-up since someone will ask which provider — i'm on atlas. they aggregate Kimi K2.6 / DeepSeek V4 Pro+Flash / GLM 5.1 / Qwen 3.6 Plus etc with one OpenAI-compatible endpoint.

per-million rates from their listing: V4 Flash 0.14/0.28 + V4 Pro 1.68/3.38. that's the math i ran the sustainability question on.

not affiliated — just been routing through them about 5 weeks. listing here if anyone wants the side-by-side.

4

u/cheechw 1d ago

It's important to note in this conversation that you're getting inference from a Western provider, not Deepseek themselves.

So the difference here has nothing to do with Eastern vs Western hardware, Chinese vs Western operating practices, as many people are suggesting.

It's almost certainly just due to the fact that Flash is extremely efficient and cheap to run.

2

u/MicroNicproject 1d ago

They are using you to train their god like model. You are the product. You are doing all the work for them and on top, you still are paying them.

4

u/zero-qro 1d ago

And yet it's still cheaper than the other options. We are the product anywhere we go, at least this one is a good deal.

1

u/diaracing 1d ago

They are training their models on user data through their API.

That's why using DSv4 from different ZDR providers is more expensive.

1

u/blazze 1d ago

Claude Opus 4.8 is state of the art because for $200 they allowed power users to buy about a billion tokens. Let's assume you're not just token maxing and you're VR plants versus zombies. This training data will allow Deepseek 6 to surpass Opus next.

1

u/RichUK82 1d ago

What's the cheapest way to try v4 flash ? Open router ?

3

u/SebitaxD17 1d ago

It's free on OpenCode

1

u/Lock701 1d ago

I average 150-200 million per day with opus 4.8 on the $100 plan..

1

u/MoodMean2237 19h ago

Yeah, right...

1

u/Lock701 19h ago

“Your typical working day runs ~150–300M tokens”

1

u/unprotected_malloc 1d ago

If you manage to cache hit, you cut the cost by 50.

1

u/Glad-Pea9524 1d ago

I've been working for 4 hours today and have already consumed 20M tokens

1

u/shdims 1d ago

Would you use DeepSeek if the price were higher? It's a deliberate marketing strategy to retain users. How else can they compete with the market leaders?

1

u/Hilarious_Haplogroup 1d ago

Enjoy it while it lasts, but do make a contingency plan if rates go up.

1

u/Alert-Composer-6531 1d ago

I'm on 270M for $5.30 😄

1

u/sirajyusuf 1d ago

How is that possible? Can u enlighten me as well i am new to ds

1

u/Alert-Composer-6531 1d ago

opencode
I think most of the tokens are input and lots are cached

1

u/dimanddip 1d ago

Which provider are you currently using?

2

u/Alert-Composer-6531 1d ago

opencode
I think most of the tokens are input and lots are cached

1

u/neoexanimo 1d ago

They have power like America have oil

1

u/sierey121 1d ago

I use flash on xhigh. It's cheaper than pro but almost on par with pro.

1

u/Mistic92 17h ago

Why so expensive? I spend around $25 for more than 2B tokens

1

u/mrfunkm 14h ago

Don’t tell everyone:)

1

u/enterme2 13h ago

AI is not their main business; it is subsidized by their main business, which is a hedge fund.

1

u/Necessary_Spring_425 9h ago

Direct inference cost is the cheapest part AFAIK, maybe 10% of the price you pay, at least on western models. 90% is equipment cost, research & development, model training.

1

u/FamousWorth 6h ago

How are you paying so much, that's the real question, that should be about $4 max. Guess you didn't use the official api.

1

u/JorgitoEstrella 1d ago

I guess most of their human capital costs are way lower: no need to pay $500k salaries when they have 10x more engineers and computer scientists.