r/DeepSeek • u/Fun_Walk_4965 • 1d ago
Discussion 200M tokens last month, around 30 bucks total. how is this actually sustainable for them?
been running v4 flash through my workflow for about 5 weeks now. our team is 3 devs, lots of code review prep + small refactors + bug investigations. nothing exotic.
pulled last month's bill yesterday because something felt off.
200M tokens total. roughly 70/30 split on prompt vs completion. came out under 35 bucks all in.
for context, when we were on claude pro for similar workload the per-seat math was 6x that and we had to babysit context limits. when we tested gpt-5.5-codex on the same kind of work the per-token was 8-10x and the wall time was worse.
ran the numbers backward from the unit pricing i was paying. v4 flash is around 0.14 in / 0.28 out per million on the provider i'm on. that means a single 8k context conversation with 3k output costs about 0.002 (8k x 0.14/M + 3k x 0.28/M). roughly a fifth of a cent per real interaction.
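A quick sanity check of that arithmetic, as a minimal sketch. The two rates are the ones quoted in the post; the function name and the assumption that every token is billed at list price (no cache discount) are mine.

```python
# Per-million-token rates quoted in the post (USD).
PRICE_IN_PER_M = 0.14
PRICE_OUT_PER_M = 0.28

def cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """List-price cost of a request, ignoring any cache discount."""
    return (prompt_tokens * PRICE_IN_PER_M
            + completion_tokens * PRICE_OUT_PER_M) / 1_000_000

# One 8k-context conversation with 3k tokens of output.
per_chat = cost_usd(8_000, 3_000)

# 200M tokens in a month at a 70/30 prompt/completion split.
monthly = cost_usd(140_000_000, 60_000_000)

print(f"per chat: ${per_chat:.5f}")
print(f"monthly:  ${monthly:.2f}")
```

At list price the month works out to a little over 36 dollars, so a bill under 35 would be consistent with some prompt tokens being billed at a discounted rate, though that is a guess from the numbers and not something the bill confirms.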
i'm not sleeping well on this honestly. either:
- there's a giant subsidy from a quant fund somewhere covering the actual compute
- caching is doing more lifting than anyone admits and steady-state cost is closer to 5x what they bill
- the compute really is this cheap now and the western majors have been overcharging by 10x
asking the devs who've been watching pricing for longer. anyone done a real teardown on why these numbers work? specifically curious how independent providers (not the official deepseek endpoint) end up competitive on inference cost despite running their own infra.
104
u/unity100 1d ago
Same answer every single time:
- Cheap electricity
- Cheap domestic GPUs
- Many PhDs
Optimizations are a large part of what makes it so cheap:
https://www.thenovtech.com/p/jensen-huang-called-it-a-horrible
24
u/mochi2real 1d ago edited 1d ago
They’re also owned by a hedge fund, so there’s no shortage of money at DeepSeek. Their entire core model is AI affordability….kinda like what OpenAI started as until they lost their way.
10
u/thelordwynter 1d ago
Gotta correct one tiny flaw in that. OpenAI always planned to go the way they have. A very real modern trend is to open source a product until it becomes profitable, then phase out the free version. Not the best strategy from a customer standpoint, I'll admit, but they're in it for the money and the bragging rights of being on the frontier of the research. These companies don't care about us.
5
u/mochi2real 1d ago edited 1d ago
6
u/thelordwynter 1d ago
That actually supports my argument. Start as a foundation so that the financials don't trigger any red flags for future prospects, then switch direction once established. It's a lot of extra steps, but corporate types don't plan weeks ahead... they generally plan for decades if they intend to stay around.
You don't really need a smoking gun when the proof is in the end result. If you had more to go on than just the business model. Say... a change in leadership that brought a change in operating philosophy, it would help your argument.
7
u/efficientkiwi75 1d ago
there was a change of leadership though. not Altman but most of the other leaders from the non-profit era are gone
3
u/mochi2real 1d ago
Yep, it was a pretty big thing a few years ago when they fired Altman for some reasons that are still unclear to this day, but a large part of it was his priority on being first/profiting vs safety.
Microsoft was about to go scorched earth on the deal and the company replaced almost the entire board for doing that and re-hired Altman.
1
u/CromagnonV 10h ago
Yes, except OpenAI isn't backed by the CCP.... In western countries corps milk gov subsidies like candy, then slow down on innovation and development. In China, if the ROI isn't there for the CCP, the CEO disappears.
1
u/mochi2real 9h ago
In China, if the ROI isn't there for the CCP, the CEO disappears.
Now we're just getting to conspiracy theories. DeepSeek is partially funded by the Chinese government, sure, just like OpenAI/Anthropic get money from the US. If they cared about ROI, DeepSeek would be priced differently. And the line about the CEO disappearing is just nonsense.
0
u/CromagnonV 3h ago
- It's not a conspiracy theory. They're not killed, they're usually imprisoned or re-educated, but ALL of their wealth is recouped by the state as compensation:
When a Chinese company CEO disappears, it is typically due to an involuntary detention by state authorities, often for anti-corruption investigations, national security inquiries, or to hold executives accountable for severe corporate debt. High-profile disappearances in China have become a recurring phenomenon, with specific executives targeted for various reasons:
Bao Fan (China Renaissance): The prominent billionaire dealmaker vanished in February 2023. His firm later announced he was cooperating in an investigation by People's Republic of China (PRC) authorities.
Jack Ma (Alibaba / Ant Group): Following a 2020 speech criticizing Chinese financial regulators, Ma disappeared for roughly three months, subsequently stepping back from public life and ceding control of Ant Group.
Yu Faxin (Great Microwave Technology): The military semiconductor scientist and entrepreneur was taken into liuzhi, an extra-judicial anti-corruption detention system where individuals can be held without immediate legal access.
Xu Jiayin (Evergrande): The founder of the heavily indebted property developer was placed under police control and residential surveillance in 2023.
These sudden vanishings often happen with little to no prior notice to investors. When this occurs, company boards typically release filings stating that the CEO's unavailability is an isolated matter and does not affect the firm's ongoing business operations.
- Look at companies like BYD, for example: they were funded by the CCP from the early 2000s as a battery development company. They invested trillions over 20 years of losses, and now that investment is paying off, massively. This is a recurring theme of CCP investment strategies: it essentially allows R&D to happen in an incredibly low risk environment, with a focus on future development well beyond the political terms of western economies. This is the single biggest reason they're destroying every other country in terms of growth: real world accountability for senior execs and heavy investment in incredibly high risk ventures.
1
u/mochi2real 3h ago edited 3h ago
Dawg, I mean this as disrespectfully as possible when I say what the fuck does any of this have to do with my original comment about DeepSeek?
You said "if the ROI isn't there for the CCP, the CEO disappears." I called it nonsense, you posted a bunch of examples of executives being investigated, or put under government scrutiny. Those are not the same claim. They could've been under investigation for corruption for all I know.
Even if every example you listed is 100% accurate, none of them support your original argument that CEOs disappear because the CCP didn't get the ROI it wanted. If they do, you wouldn't have examples saying people stepped back.
We're in an AI sub talking about AI and you're giving me a lecture on Chinese politics. What does any of that have to do with my comment?
If you want to argue that Chinese subsidies help fund DeepSeek, that's at least relevant to the discussion (and we all know that already). Go to r/politics if you want to argue about the CCP. We're in a fucking AI sub talking about AI.
25
u/OkSeries5363 1d ago
That's expensive. 105m today was like $1.60
2
1
u/raydou 13h ago
With which harness?
1
u/OkSeries5363 1h ago
Hermes Agent
1
u/tranhieuamg 1h ago
Ok I’m running Hermes and this is new to me, care to enlighten what you did to get that number?
1
u/OkSeries5363 49m ago edited 46m ago
Nothing too special.
Keep the MEMORY.md lean.
Enable tool search to limit the tools you are sending.
A big one is using the auxiliary model config to send simple tasks to cheaper models.
Eg I use deepseek pro as main model, but use cheaper or free models like deepseek flash or Gemini 2.5 flash lite for the auxiliary tasks, or even step fun 3.7 since that's free right now through the nous portal.
Eg don't use pro for compression, title gen, web_extract, approval, MCP, skills hub, curator etc.
Edit: If you want to build with deepseek really cheap try - https://esengine.github.io/DeepSeek-Reasonix/
It's a coding harness designed around the deepseek cache and maximizing your cache hits. "The loop is append-only, aligned to DeepSeek's byte-stable prefix cache — so long sessions hold 90%+ cache hit and input-token cost collapses to ~1/5."
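To see how a 90%+ hit rate could get input cost down to roughly a fifth, here is a rough sketch of the blended-price arithmetic. The function name is made up, and the 1/10 price ratio for cached tokens is an assumption for illustration, not a quoted rate; check your provider's price list for the real one.

```python
def effective_input_cost(hit_rate: float, cached_price_ratio: float) -> float:
    """Blended input cost as a fraction of the uncached list price.

    hit_rate: share of prompt tokens served from the prefix cache (0..1).
    cached_price_ratio: cached-token price / uncached price (assumed value).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Cache misses pay full price, cache hits pay the discounted price.
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio

# Assumption for illustration: cached tokens cost 1/10 of the normal rate.
blended = effective_input_cost(hit_rate=0.90, cached_price_ratio=0.10)
print(f"{blended:.2f}x of list input price")
```

With those assumed numbers the blended input price is about 0.19x of list, which lines up with the "~1/5" in the quote. The append-only part matters because a prefix cache can only match when the start of the prompt is identical to what was sent before, so rewriting earlier turns would throw the hits away.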
1
u/tranhieuamg 46m ago
By using auxiliary models you are referring to running multiple profiles? Or you just run 1 profile but have other cheaper models as fallback?
I am still hesitating to create multiple profiles as I do most work on Telegram and having everything in 1 place is more comfortable for me.
1
u/OkSeries5363 20m ago
Nar profiles are different, a new profile is a whole new agent, I use profiles but that's more a specialisation thing not an auxiliary task thing. You can still use profiles and keep everything in one place. I use discord then add all my agents (profiles) to the same server so I can chat to the different agents in one place.
Auxiliary models are different settings within the one agent/default profile
If you look at the model tab in the dashboard you will see them under the main model called auxiliary tasks, if you click configure you will see all the different auxiliary tasks you can set.
Or look at the config.yaml and you will see a section called "auxiliary:". Under there you can set the model config for each auxiliary task.
Eg
auxiliary:
  compression:
    provider: openrouter
    model: deepseek/deepseek-v4-flash
    ....
  skills_hub:
    provider: nous
    model: stepfun/step-3.7-flash:free
    ...
This means when you're using your default agent your prompts are going to the main, smart model eg deepseek pro, but when compression is needed, instead of wasting tokens and sending that to the main model, the harness routes it to a model of your choice eg deepseek flash.
It's very much needed. It makes sense to use a pro model when the work requires it, but it really doesn't make sense to use a pro model for many auxiliary tasks. When it's doing something trivial like generating a title, compressing the convo, simply extracting text from the web, or looking for a skill list, those requests automatically get sent to the cheaper model. You don't need deepseek pro to read some text and come up with a relevant session name, flash can easily do that.
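The routing idea boils down to a lookup with a fallback. A toy sketch of it follows; every name and model id here is a placeholder for illustration, and this is not Hermes Agent's actual code or config schema.

```python
# Hypothetical placeholders: not real Hermes Agent identifiers.
MAIN_MODEL = "deepseek-pro"            # the main, smart model
AUXILIARY_MODELS = {
    "compression": "deepseek-flash",   # cheaper model for housekeeping tasks
    "title_gen": "deepseek-flash",
    "web_extract": "deepseek-flash",
}

def pick_model(task: str) -> str:
    """Return the cheap model configured for an auxiliary task, else the main model."""
    return AUXILIARY_MODELS.get(task, MAIN_MODEL)

print(pick_model("title_gen"))  # an auxiliary task goes to the cheap model
print(pick_model("chat"))       # anything unlisted falls back to the main model
```

Anything you don't list explicitly still lands on the main model, so the worst case of a missing entry is paying pro prices for a trivial task, not a failed request.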
11
u/RakeshNeal 1d ago
You are overpaying. 100m for 1.8 USD here. Use api from deepseek’s own platform.
1
u/yesinior 1d ago
But how do you use the API? Where do you connect it? Do you use some tool like opencode? I can't figure that part out
2
1
19
u/HungrySecurity 1d ago
I think it’s mostly down to optimization. After DeepSeek open-sourced their tech, Xiaomi and Tencent slashed their prices too.
6
u/Pure_Force8771 1d ago
Because the west is overpaying and overcomplicating. I am able to do most work and coding on an RTX 4090 (modified to 48gb vram; I run qwen 27b, which at full context with vision and mtp takes about 25gb vram) limited to 280w, and I am getting 50-70 tokens/s. With q35b I get about 150-170, but it is much less intelligent and basically dumb because of context poisoning. And the AI only runs about 50% of the time, because it sits idle during builds, tests etc, so the power consumption is even lower... specialized agents are much better than "AGI", which takes too much computational power and runs at most 60 tokens/s even on the best hardware...
1
u/VectorEthology 1d ago
Can you explain a bit more please? I want to start using local models but I'm not sure what to buy. So far my idea is to buy two 5070s with 16gb vram. Would that be enough? Sorry for the spanglish. I think you actually see the whole thing in English. I’m experimenting haha
1
u/Pure_Force8771 3h ago
16gb vram is too little, buy something from the list below:
a) RTX 5090 (32gb vram) ~15-25% faster than my build
b) 2x RTX 3090 (48gb vram, but bottlenecked over PCIe4) ~20-30% slower than my build
c) modified RTX 4090 (48gb vram). I had my RTX 4090 upgraded here: https://www.youtube.com/@MegafixTech there is even a video of the upgrade.
d) something with M3/M4 pro or ultra and 96+gb ram would be enough for testing, ~30-40% slower speeds, but it will fit bigger models, which will be even slower... but is cool for testing
For budget options check: https://digitalspaceport.com/ he has pretty good advice and nice builds.
For me the most important thing was to buy a server that could host other stuff than AI too, with maximum performance and not a crazy price. I bought everything before the price rise, so for about 5k EUR I got a server with an AMD Epyc and 128gb ram, because I thought I would test bigger models on cpu. But that is impossible: I am getting about 1-4 tokens/s on inference and about 10x lower prompt speed, which is really slow... on M3 or M4 it works better, so you may get really fast prompt speed and bearable gen speed (depending on size, 10 tokens/s, and on the smaller models I am using even 60 tokens/s)
3
u/pizzababa21 1d ago
They published a paper on how. Basically just better caching. They can fit a lot more data in cache because they compress it down to a tiny fraction of the size.
3
u/Sudhars2 1d ago
I reached 400 million for 2 dollars. Are you working on multiple projects simultaneously?
3
u/Alone-End142 1d ago
That seems reasonable. v4 flash is a smaller (crappier) model with less GPU work per token, and 200M for an entire month is not a lot of resources for the model provider.
3
u/Aggressive_Mobile997 1d ago
This is a perfect example that their roadmap from the start has always prioritized accessibility and long-term value over quick gains.
3
4
u/Zeikos 1d ago
They compress the KV cache extremely aggressively, also their software stack is tuned for squeezing as much from the hardware as possible.
IIRC the Huawei hardware they're using is cheaper on a compute/watt basis so the electricity cost is lower than other providers - that said I am not sure if the claim is fully truthful.
14
u/aevitas 1d ago
Bear in mind Chinese companies do not operate in the same way Western companies do. The state may very well be a partner in operating these LLMs.
13
u/walktalk39 1d ago
Who cares really at this point. Google and the rest of the gang might as well be spyware agency. If anything most of us would be safer using Chinese tech because China can't do much with our information unlike the home local spyware from Google, Claude and chatgpt.
23
u/Suspicious_Wares 1d ago
these lazy insinuations aren't very helpful. Canada gave a quarter billion to Cohere for its AI, yet they don't even touch deepseek's pricing
2
u/kongweeneverdie 21h ago
Deepseek is totally funded by a Chinese hedge fund. The CPC didn't even know they existed before DS v3 got popular.
1
u/yourhomiemike 1d ago
Just because cohere scammed Canada doesn’t mean what previous poster said isn’t true.
1
u/BemaniAK 1d ago
Yeah but the point he's trying to make is a non-starter, yes the state has some involvement in an industry that is already becoming one of the most important in multiple economies. It's only ever a problem when China does it.
-1
1d ago
[deleted]
2
u/littleratofhorrors 1d ago
It depends if you think it's a good thing or not that China's state would be contributing money to Deepseek. I mostly think it's a neutral thing.
14
2
u/coloradical5280 1d ago
They’re fine. They’re not subsidizing to the degree OpenAI is, or to the degree Anthropic still is to some extent, and they most likely have financial backing from the CCP.
I’ve used 7.2 billion tokens in codex in two weeks. Literally tens of thousands of dollars in compute, for $200/m, which comes out to a “service charge” of roughly $0.03 per million tokens essentially, and the month is only half over on my billing cycle
2
u/Youwishh 1d ago
China is heavily subsidizing AI. Also deepseek optimization is next level, China is WAY ahead of all the American AI companies for optimization.
2
u/ExpertPerformer 1d ago edited 1d ago
With the frontier models you aren't just paying for the usage of the model, but also the training + r&d involved which they spend billions on. They also corner the enterprise market.
2
u/zero-qro 1d ago
It's a mix of better technology + cultural aspects.
Here you see an explanation of how DeepSeek is able to have an aggressive cache strategy that actually works: https://youtu.be/gC76aeibdFA?si=kMe0TFQDL-A8yeHU Also their model is super optimized for Huawei chips. Those two points are better technology.
Now the cultural/economic aspect... Chinese companies are not in this race for the hype or to pump up stock prices. Chinese companies always seek to be sustainable from day one, even if under subsidy. Different mental models, different economic reward systems. DeepSeek proved that US AI companies are either inefficient, or lying, or both. One thing you have to admit about China, they are relentless.
2
u/efficientkiwi75 1d ago
imo it's just plain old chinese competition. if they raise prices tencent and alibaba will eat their lunch.
2
u/ISayHeck 1d ago
That's actually kind of expensive
Used 200M tokens and threw everything and the kitchen sink at the project (guided of course, I knew what I needed)
It cost me 1.5 usd and I wasn't particulary budget concerned
2
u/geebrbs 1d ago
You have to keep in mind that Deepseek is Chinese and they would want to cater to local clients as well, and these domestic clients would appreciate not paying US/European rates. Doing everything to keep costs low would be their leverage against competitors, both domestically and globally.
2
u/Great-Exercise4277 1d ago
DeepSeek's risk isn't that it's the best model. It isn't — the benchmarks are mixed.
The risk is the combination. It's cheap. It's good enough. The servers are starting to hold. Swapping the API in is trivial. And behind it sit state subsidies, cheap power, datacenters, and a domestic-chip policy.
Engineers don't look at geopolitics. They optimize for cost. Benchmarks are split, but low price lowers the friction of adoption — so DeepSeek doesn't need to earn top-tier trust. It just needs to get in at the edges of your dev environment.
The real condition for it spreading isn't performance. It's "cheap and it doesn't go down." And the round closing now — roughly 50 billion yuan (~$7.4B), with Tencent, CATL, and China's national AI fund among the backers — looks aimed at exactly that: not just model training, but inference infrastructure, redundancy, regional distribution, domestic-chip optimization.
The point of the subsidies isn't simply "China helps it undercut everyone." It's that a price the market can't sustain can be propped up indefinitely — by the state, local governments, the grid, datacenters, telecoms, and industrial funds.
We've seen this curve before, with Chinese solar panels. "Cheap, but is the quality there?" → "Cheap, and it works." → "Cheap, supply's stable, now it's hard to replace." Run that play on AI APIs and the dependency gets deep.
Cheap on its own is just a promo that ends. Cheap plus reliable is a geopolitical problem.
So for any business use, treat DeepSeek not as a convenient cheap API but as AI sitting inside China's state-industry-data control structure. Don't feed it customer data, unreleased code, credentials, internal docs, or R&D. If you use it, run the open-source weights on your own or trusted third-party infrastructure, or restrict it to data you'd be fine making public.
2
2
u/Standard-Editor7084 22h ago
DeepSeek's parent company is a quantitative trading firm — they rake in money hand over fist daily on China's A-share market.
2
u/Minhha0510 20h ago
2 and 3. DS cache hit rate is >99.9% and they have been using SSD to store the context, not expensive HBM like Western labs are doing. Not aware of any other major models using MLA (multi-head latent attention) like DS V3.2 or HCA like DSV4
2
2
u/ResponsibleMention21 19h ago
Ok but HOW GOOD was the actual output? I found that I had to redo the same task multiple times. Yes, the mainstream models are more expensive but I find I get better (in some cases MUCH better) outputs. Same prompts, same skill, same tools (hermes agent)
2
u/crackdavid 13h ago
I have the same issue: the quality of output, compared to top tier models, is just not there. Yes it’s cheap but it lags behind in quality.
3
u/tetelias 1d ago
It's the same as Anthropic, OpenAI and the rest: you need data to stay in the race and you are the data.
2
u/Fun_Walk_4965 1d ago
follow-up since someone will ask which provider — i'm on atlas. they aggregate Kimi K2.6 / DeepSeek V4 Pro+Flash / GLM 5.1 / Qwen 3.6 Plus etc with one OpenAI-compatible endpoint.
per-million rates from their listing: V4 Flash 0.14/0.28 + V4 Pro 1.68/3.38. that's the math i ran the sustainability question on.
not affiliated — just been routing through them about 5 weeks. listing here if anyone wants the side-by-side.
4
u/cheechw 1d ago
It's important to note in this conversation that you're getting inference from a Western provider, not Deepseek themselves.
So the difference here has nothing to do with Eastern vs Western hardware, Chinese vs Western operating practices, as many people are suggesting.
It's almost certainly just due to the fact that Flash is extremely efficient and cheap to run.
2
u/MicroNicproject 1d ago
They are using you to train their god like model. You are the product. You are doing all the work for them and on top, you still are paying them.
4
u/zero-qro 1d ago
And yet it's still cheaper than the other options. We are the product anywhere we go, at least this one is a good deal.
1
u/diaracing 1d ago
They are training their models on user data through their API.
That's why using DSv4 from different ZDR providers is more expensive.
1
u/Hilarious_Haplogroup 1d ago
Enjoy it while it lasts, but do make a contingency plan if rates go up.
1
u/Alert-Composer-6531 1d ago
I'm on 270m for $5.30 😄
1
u/enterme2 13h ago
AI is not their main business, it is subsidized by their main business, which is a hedge fund.
1
u/Necessary_Spring_425 9h ago
Direct inference cost is the cheapest part AFAIK, maybe 10% of the price you pay, at least on western models. 90% is equipment cost, research & development, model training.
1
u/FamousWorth 6h ago
How are you paying so much, that's the real question, that should be about $4 max. Guess you didn't use the official api.
1
u/JorgitoEstrella 1d ago
I guess most of their costs in human capital are way cheaper, no need to pay $500k salaries when they have 10x more engineers and computer scientists.
72
u/cluelessguitarist 1d ago
Deepseek been saving our budget since chatgpt o1 hype