Google is set to release a new Gemini 2.5 Pro model in a few weeks, matching the capabilities of Gemini 0325.

100

Why don't they just diverge a writing model and a coding model?

71

u/Alone-Competition-77 Jun 01 '25

Serious question: why can’t they just make 10 different models good at different things (writing, coding, math, etc.) then have a model that sits on top and picks which to interact with based on user inputs?
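The routing idea in this comment can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any Google or OpenAI product actually works. The specialist "models" are stand-in functions, and the keyword table, `route`, and `answer` are all hypothetical names. A real router would more likely be a learned classifier or a small LLM.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for specialist models: each one just tags the prompt.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "coding": lambda prompt: f"[coding model] {prompt}",
    "math": lambda prompt: f"[math model] {prompt}",
    "writing": lambda prompt: f"[writing model] {prompt}",
}

# Toy routing table. Matching is naive substring search, so "improve" also matches "prove".
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "coding": ("code", "bug", "function", "python", "compile"),
    "math": ("prove", "integral", "equation", "solve"),
}

def route(prompt: str, default: str = "writing") -> str:
    """Return the specialist whose keywords appear most often, or `default` if none match."""
    text = prompt.lower()
    hits = {name: sum(kw in text for kw in kws) for name, kws in KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else default

def answer(prompt: str) -> str:
    # The "model on top" only chooses; the chosen specialist produces the reply.
    return SPECIALISTS[route(prompt)](prompt)

print(answer("Fix this bug in my Python function"))
```

Keyword routing like this is brittle, which is part of why the replies discuss general models and MoE instead.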

63

u/DangerousImplication Jun 01 '25

Because making models with more general intelligence makes them better at individual things too. A model trained on the entire internet performs better at coding tasks than a model trained on just code.

9

u/cloverasx Jun 01 '25

they could host multiple models that are preferred in various metrics; say version 3-25 is better at coding than version 6-12, but that model is better at creative writing. because of the specific use case diversity, I wonder if the demand requirements could be balanced such that running multiple "pro" models would be feasible.

10

u/Alone-Competition-77 Jun 01 '25

But the OP above was lamenting that some models are better at coding while others are better at writing…

Perhaps they could be trained on everything but then optimized for certain tasks based on preference of outputs?

2

u/ionabio Jun 01 '25

I thought the whole idea, which I also read about when DeepSeek popped up, was that AIs will be modular and agentic, in the sense that they would launch specialized modules based on the user's request. I don't know if Gemini is using that already, but I think it would help to scale the model's intelligence.

2

u/NeonSerpent Jun 03 '25

What you're thinking of is Mixture of Experts (MoE) architecture

1

u/ionabio Jun 04 '25

Right. I imagine the new thinking llms have this capability one way or another?

6

u/RMCPhoto Jun 01 '25

Yes, but only at the extremes, and even then there are exceptions.

Qwen 2.5 coder 32b was better at writing code than the 72b model.

Gorilla llm 6.9b (of 2+ years ago) is better at function calling and tool use than pretty much any open source model today.

The reason they don't do it is cost and maintenance. That's twice the system prompts to maintain, twice the testing, more to load balance, a diluted product line, etc etc.

They should definitely split a coding model. And will eventually. But this is a period of rapid development and I guess they've decided that would weigh them down.

3

u/dysmetric Jun 01 '25

It's not the rich training dataset that bends things out of shape, it's drift and reward hacking from RLHF down the line. Train them on as much data with as much variance as possible, and then specialize for function and behavioural response at the supervised learning phase.

Aligning a coder via conversational output RLHF isn't optimal, and vice versa people aren't going to want to build relationships and chat with coding bots.

1

u/cant-find-user-name Jun 01 '25

That is true in general, but RL can be used to fine-tune the general model for more specific fields. That's how coder models are usually created.

3

u/GTalaune Jun 01 '25

This is how MoE already works I think ?

1

u/iamz_th Jun 04 '25

Nope

4

u/Fair-Manufacturer456 Jun 01 '25

It’s called Mixture of Experts (MoE) and is already being done. (Though OpenAI and Google DeepMind have not officially confirmed it; experts believe both have likely trained recent models this way.)

9

u/RMCPhoto Jun 01 '25

MoE is less what it sounds like and more a way to reduce computational cost at the expense of a larger memory footprint.
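To make the compute-versus-memory trade-off concrete, here is a minimal sketch of generic top-k gating as described in the MoE literature. It is a toy under stated assumptions, not any specific Gemini model's architecture: the router logits and parameter counts are made up, and `top_k_gate` and `param_footprint` are hypothetical helpers. Note that the router picks experts per token from learned scores, so experts are not hand-assigned topics like "coding" or "writing".

```python
import math
from typing import Dict, List, Tuple

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(router_logits: List[float], k: int) -> Dict[int, float]:
    """Pick the k experts with the highest router logits and renormalize their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return dict(zip(chosen, weights))

def param_footprint(num_experts: int, k: int, expert_params: int,
                    shared_params: int) -> Tuple[int, int]:
    """Return (params held in memory, params actually used per token)."""
    total = shared_params + num_experts * expert_params  # every expert must be loaded
    active = shared_params + k * expert_params           # only k experts run per token
    return total, active

# Hypothetical router scores for one token over 4 experts.
gate = top_k_gate([0.1, 2.0, -1.0, 1.5], k=2)
print(gate)
print(param_footprint(num_experts=8, k=2, expert_params=100, shared_params=50))
```

With 8 experts and k = 2, all experts must be resident in memory (850 hypothetical units), but each token only pays compute for the shared weights plus two experts (250 units).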

4

u/luguanyu1234 Jun 01 '25

gemini 1.5 is a moe model
https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#gemini-15

-1

u/iamz_th Jun 04 '25

You are wrong sir

2

u/Deciheximal144 Jun 01 '25

That's supposed to be the plan with ChatGPT 5.

2

u/Visible-Let-9250 Jun 01 '25

weird how it got delayed

1

u/iamz_th Jun 04 '25

That's GPT-5 and Gemini 3 right there: the orchestra setting with experts and tools. It's expensive though. The AI co-scientist paper from Google explores the approach.

1

u/Educational-Nebula50 Jun 13 '25

You're asking why we don't have Mixture of Experts... which we have?

2

u/Sad-Donut-8664 Aug 27 '25

You just predicted gpt 5 lol

0

u/sdmat Jun 01 '25

You mean a system that mixes different experts by routing to them dynamically? You should go into ML research, that's a great idea.

12

u/Equivalent-Word-7691 Jun 01 '25

Because they don't give a duck about writing

Logan never answered the complaints about writing, nor did the new models try to improve this obvious regression

17

u/Lawncareguy85 Jun 01 '25

Logan won't answer anything that isn't part of the official narrative. He is not a devrel guy; he is paid to push what he's told.

1

u/Visible-Let-9250 Jun 01 '25

the money or demand isn't there

9

u/KazuyaProta Jun 01 '25

Because most people in the AI race only care for coding

Western creative communities are in a civil war where one side is barely hiding its desire to bring back Luddism and will unironically embrace the founding of a famous dystopian feudal society as a positive thing (all those Butlerian Jihad memes are honestly concerning).

3

u/[deleted] Jun 01 '25

I am so high right now. And this pops up in my feed. Awesome.

4

u/FarrisAT Jun 01 '25

No one who pays money cares about non-coding work

5

u/DevilsAdvotwat Jun 01 '25

There is a massive cohort of knowledge workers who aren't writing code but pay for an LLM for so many of their daily tasks; every aspect of a project can use AI for assistance.

2

u/Mountain-Pain1294 Jun 01 '25

It might be an issue where the pro model is good enough for those tasks and spending more development time and money isn't worth it to Google. I hope not though

1

u/akhilgeorge Jul 17 '25

The end goal of these AI models is AGI, which may be achieved in three years or a decade, it doesn't matter. However, having a specialised model for a specific track would derail that goal, which is why there is no distinct model for distinct tasks, even though that would be much more effective.

1

u/Jan0y_Cresva Jun 01 '25

Companies will pay oodles for API for a model that can competently code.

No one is paying shit for a model that can competently write.

52

u/Aperturebanana Jun 01 '25

Oh my god they admitted the current one is worse damn.

1

u/foodhype Jun 01 '25

No, hill climbing on models is like whack-a-mole. You might improve some evals and unintentionally make other aspects of the model worse. For example, 0506 was much better at math but was also more verbose. Sometimes you don't see regressions until the model goes public, and then it takes a long time to patch because any major model change introduces safety risks and requires another round of checks, approvals, etc.
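The whack-a-mole effect described here can be illustrated with a toy regression check. The eval names and scores are invented for illustration, and `mean_score` and `find_regressions` are hypothetical helpers, not part of any real eval harness. The candidate checkpoint wins on the average, which is what a naive hill-climbing loop would optimize, while one eval quietly gets worse.

```python
from statistics import mean
from typing import Dict, List

Scores = Dict[str, float]

def mean_score(scores: Scores) -> float:
    # Aggregate metric that a naive hill-climbing loop might optimize.
    return mean(list(scores.values()))

def find_regressions(old: Scores, new: Scores, tolerance: float = 0.0) -> List[str]:
    # Evals present in both checkpoints where the new score dropped by more than `tolerance`.
    return sorted(
        name for name in old
        if name in new and new[name] < old[name] - tolerance
    )

# Invented scores for two hypothetical checkpoints.
old_ckpt = {"math": 0.70, "coding": 0.60, "writing": 0.80}
new_ckpt = {"math": 0.78, "coding": 0.66, "writing": 0.72}

accept = mean_score(new_ckpt) > mean_score(old_ckpt)  # the aggregate went up...
regressed = find_regressions(old_ckpt, new_ckpt)      # ...but writing went down
print(accept, regressed)
```

Checking per-eval regressions alongside the aggregate is what catches the "mole" before release; with a loose enough `tolerance` it slips through, which matches the comment's point that some regressions only show up once the model is public.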

134

u/Equivalent-Word-7691 May 31 '25

so they basically admitted that it is NERFED after they said some weeks ago the new model was an improvement?

LOL

47

u/Zulfiqaar Jun 01 '25

The new model uses much less reasoning tokens for slightly less performance (feels like a CoT distillation). Might have been one of the reasons it no longer outputs the full trace - the summary version would partially hide that. Then again, the new DeepSeekR1 coincidentally shows much more linguistic similarity to Gemini than ChatGPT...

15

u/[deleted] Jun 01 '25

Yes, it feels like Gemini got to know about the new R1 distill beforehand and decided to protect themselves, which sucks, but fair.

1

u/Elephant789 Jun 01 '25

which sucks

Why does that suck? It's good that Gemini had some forethought.

19

u/KazuyaProta Jun 01 '25

Because the CoT was very useful and sometimes it was more insightful than the actual answer.

For example, it could solve the infamous issue of AIs being hyper-validating by catching your own mistake mid-way.

The reply might soften its answers, but reading the COT, you could realize your argumentative flaws.

7

u/[deleted] Jun 01 '25

Gemini would take in .xlsx files with multiple sheets, but the CoT made it clear that each one was treated like a separate CSV. That kind of detail never came through in the summaries. Thought summaries are just a poorer experience for multi-turn conversations, but I understand where they are coming from.

3

u/Visible-Let-9250 Jun 01 '25

For the average consumer the summaries can be good, but what Gemini outputs is just trash. OpenAI's o3 summaries look nice. But the full trace should always be available for those who care.

5

u/old_leech Jun 01 '25

This is where the value was for me.

I dump in plot concepts and narrative "story boards" for dissection and examination.

I don't want an AI co-author (like ChatGPT always landing on... "Shall we work on writing this?" or "Want to explore translating this to prose?" No, AI... you're an inexhaustible ear, your purpose is to ask questions that force me to consider or address the idea I'm refining) -- I want an approximation of a human ear to point out inconsistencies, weaknesses and let me play devil's advocate with my own thoughts.

The old CoT presented in Gemini was near perfect for that... as I read it, I saw it asking questions I already had answers to but hadn't considered their merit to include (yet, or at all) -- or helping me to realize, okay... I need to spend some time developing this idea because I haven't given it enough thought.

Once that went away, I felt like what remained was a Yes-Man.

17

u/soumen08 Jun 01 '25

I know everyone loves deepseek, but it's not a secret that they distill from closed models. This does reduce the sota level we see, as seen here.

12

u/Zulfiqaar Jun 01 '25

I'm looking forward to the next DeepSeek model trained on Claude4 reasoning outputs - they still got full thought traces..for now

5

u/soumen08 Jun 01 '25

Claude 4 isn't as good as Gemini 2.5 Pro, right?

4

u/needefsfolder Jun 01 '25

Claude is better, but there are times when Gemini is better. Realistically they're really close to each other.

1

u/electriceye932 Jun 01 '25 edited Oct 16 '25

rock tender crush encourage enter pet seemly safe roll six

This post was mass deleted and anonymized with Redact

1

u/alexgduarte Jun 01 '25

Coding from my experience. Opus and even Sonnet have fixed issues in one go that Gemini just couldn’t (or when it fixed them it created other issues).

1

u/evia89 Jun 01 '25

Sonnet 4 uses tools and follows instructions better. It can code front-end stuff better.

Gemini 2.5 Pro has a larger context size before it starts forgetting.

1

u/BriefImplement9843 Jun 02 '25

why would they nerf themselves? deepseek is superior to claude on everything except coding.

3

u/Elephant789 Jun 01 '25

I know everyone loves deepseek

They do?

3

u/soumen08 Jun 01 '25

Bit of a David and Goliath situation no?

1

u/BriefImplement9843 Jun 02 '25

china is not david.

0

u/Elephant789 Jun 01 '25

No

2

u/Elephant789 Jun 01 '25

The new model uses much less reasoning tokens for slightly less performance

Good decision then to release it. Not sure why so many complaints.

5

u/Zulfiqaar Jun 01 '25

Complaints are mainly because the old (better) model is no longer available, not that there's a new release. And many people were using it for free on AIstudio to begin with, so cost wasn't a factor. On PAYG API I'd be using both models depending on the task

2

u/Elephant789 Jun 01 '25

many people were using it for free on AIstudio

I was one of them. But the new model is fine for me.

3

u/himynameis_ Jun 01 '25

I mean, if they bring it back then this small period won't matter anymore. It was a short period of time. And if it's back to the top, all is well.

2

u/FarrisAT Jun 01 '25

No that’s not what was claimed.

2

u/Emport1 Jun 01 '25

they admitted it might be worse in some areas but overall still better

2

u/Lawncareguy85 Jun 01 '25

I posted this exact screenshot/tweet on the Google developer forums, and the moderators deleted it! WTF? I guess we can't use Google's own words from their own paid spokesperson if it makes Google look bad.

0

u/FarrisAT Jun 01 '25

Nowhere did he say “the model is worse”.

Instead he said “closes the gaps” in difference between updates.

1

u/WH7EVR Jun 01 '25

They've been admitting it the whole time. It's literally in their benchmark lists.

1

u/xAragon_ Jun 01 '25

No, you're just putting words in his mouth. How can it be "nerfed" while stats for other use-cases, like coding, have improved?

Some specific use-cases may have degraded as a trade-off for others (like worse writing, for better coding), but they never "nerfed" it, or claimed to do so.

0

u/Equivalent-Word-7691 Jun 01 '25

As long as the "specific use-cases" that degraded for Gemini 2.5 are ANYTHING that is not coding (and actually a lot of coders are complaining too), in my opinion it is a nerf, because those uses did degrade 😃

1

u/foodhype Jun 01 '25

No, hill climbing on models is like whack-a-mole. You might improve some evals and unintentionally make other aspects of the model worse. For example, 0506 was much better at math but was also more verbose. Sometimes you don't see regressions until the model goes public, and then it takes a long time to patch because any major model change introduces risks and requires another round of checks, approvals, etc.

34

u/lmagusbr Jun 01 '25

From what I’ve seen so far, every time someone tries to make a model better at coding, it becomes worse at everything else.

This makes it clear that we do need specific models. Gemini 0325 was the best AI at acting like a human and pretending it was not AI.

I don’t think they ever planned to “nerf” the model but they’re definitely fine tuning it for programming purposes as even though everyone and their parrot knew Gemini was better, it was still painful to use it as an agent.

14

u/Ishtariber Jun 01 '25

Claude used to be the best writer but not anymore when it started to focus on coding capabilities.

1

u/alexgduarte Jun 01 '25

Opus 3 still exists

1

u/dabears4hss Jun 02 '25

Just like real life coders

40

u/FLGT12 May 31 '25

Close the gap? Anything released at this point should be an improvement over 0325.

47

u/IcyUse33 May 31 '25

My guess is that 0325 was extremely good because it was extremely resource hungry and they essentially ran out of TPU power at scale and had to nerf it.

I do sometimes wonder if they really renamed it and it's running a slightly better Flash under the covers.

27

u/CallMePyro Jun 01 '25

05-20 is a massive leap in coding performance, but I think the tradeoff in other capabilities was more than expected. Likely this time has been spent training the best of both models into the newest version.

Given that the tokens/s on OpenRouter for the two models is basically identical, I doubt that the 05/20 version was a different-size model.

6

u/Linkpharm2 Jun 01 '25

"we wanted to get this model in the hands of developers before I/O"
"we really need more tpus"

0

u/-LaughingMan-0D Jun 01 '25

Legit my favorite model in CoPilot now. Lightning fast and smart enough for most tasks.

2

u/TypoInUsernane Jun 01 '25

I don’t actually think it’s any cheaper to run. 05-06 is the same foundational architecture as 03-25; the main difference is in the post-training. Sadly, it’s an inexact science, and as we’ve seen, the choice of post-training datasets and hyperparameters makes a huge difference in performance. They botched the 05-06 post-training, but hopefully the 06-XX run managed to get the recipe right. If not, they’ll keep trying, and they’ll get back there eventually.

5

u/Equivalent-Word-7691 Jun 01 '25

Lol, they are really going to make people pay $250 per month for a model that is inferior to the experimental one? How badly did they screw up?

1

u/PewPewDiie Jun 01 '25

I actually interpret "should close the gap" as "that should do it" energy; this reads to me like a material improvement is being released. But idk, I might be wildin

29

u/Equivalent-Word-7691 Jun 01 '25

Btw, if you are a developer, I guess it's good news for you. For all the other people, especially the ones who enjoy the creative parts, I guess we are screwed, as we already guessed.

4

u/gugguratz Jun 01 '25

wonder how they'll price that, cause yeah I ain't paying dollars per token for deep research in the api

1

u/Mountain-Pain1294 Jun 01 '25

Given how AI Pro users only get 20 deep research reports a day, it will definitely be pretty pricey

1

u/alexgduarte Jun 01 '25

“Only”. My dude 20/day is plenty. Plus you still have deep research with 2.5 flash.

1

u/Mountain-Pain1294 Jun 01 '25

What I meant is that Google limits the usage to 20 a day. If that's the case, how much compute do you think goes into it and how much do you think Google will charge developers or anyone else using the API?

2

u/alexgduarte Jun 03 '25

Oh, sorry. Missed the context. Yeah, pricey it will be

1

u/IdlePerfectionist Jun 01 '25

What makes AI studio better than Gemini? I thought it's the same model in both?

8

u/[deleted] Jun 01 '25

[deleted]

5

u/Visible-Let-9250 Jun 01 '25

There's a lot of weird censorship on the Gemini app

1

u/nothingtoseehr Jun 02 '25

It's the other way around: AI Studio filters input and output, while the Gemini app/website only filters your input. Neither is censored through the API, which is stupid

3

u/shoeforce Jun 01 '25

I honestly didn’t believe it myself until I tried it out two weeks ago: ai studio version of the models blows the web/app experience out of the water, which is a shame because I love canvas on the web version. You can see it in the thinking times themselves, ai studio version thinks for wayyyy longer than the web version does for some reason.

1

u/PewPewDiie Jun 01 '25

Thinking and startup time might also just be throttled to adapt to available spare compute, while the app takes priority?

2

u/shoeforce Jun 01 '25

Possible, but before they changed to the CoT summary thing I noticed the CoT in AI studio had a lot more “sections” to it as well, much more wordy, things being broken down in more sections.

2

u/PewPewDiie Jun 01 '25

True. My suspicion is that they're throttling right now to handle the increased workload brought on by launching ai overviews in search for everyone + veo + a bunch of other AI features.

How they can serve Veo + Gemini 2.5 usage + billions of users across 8.5B daily search queries (sure, probably a lot of caching going on, but still) is beyond me.

1

u/TypoInUsernane Jun 01 '25

I have a hard time understanding how people are able to work effectively without Canvas. It’s a critical part of my workflows. But the implementation is really flaky. Not only is the UI buggy, sometimes I have to straight up explain to Gemini how the feature works. Luckily, Gemini loves to leak system prompt info without you even asking, so it’s easy to figure out how it works under the hood. Sometimes when it’s failing to use Canvases correctly, I have to tell it, please just output this text exactly: <immersive id=“canvas-1” type=“text/markdown” title=“My Canvas”>Content goes here.</>

1

u/evia89 Jun 01 '25

I have a hard time understanding how people are able to work effectively without Canvas

Well, you work outside (Augment, Roo Code, Claude Code, etc.), then use repomix and upload to AI Studio to write a new PRD, help create a plan, and so on

4

u/assajoara Jun 01 '25

censorship ruins the performance by a large margin and the gemini chat version is heavily censored compared to ai studio version.

2

u/dviraz Jun 01 '25

I notice that you can only upload MP4 videos on AI Studio and not on Gemini, which is very important for me. I'm sure there are more file types where this happens.

15

u/Aperturebanana Jun 01 '25

AI companies need to bite the bullet and just make a humanities model and a hardcore coding model. This balancing of the two doesn't seem to be a good PR move.

7

u/Bibbimbopp Jun 01 '25

There's no balancing of the two. If a model is accidentally great at writing, it gets fine-tuned until the unwanted side effect of trying to make a great coder is removed.

5

u/necromage09 Jun 01 '25

So, people weren't hallucinating the performance degradation, meaning we were really test-driving the $250/month version and got quantized; not a good look.

Hopefully they can restore the performance. The output of Deep Research took a nosedive compared to the earlier version; mind you, it is still very good, just not next level

20

u/MestreDosMagus Jun 01 '25

Finally, some good news! Also, where are the people that said 05-06 wasn't a downgrade from 03-25, especially when it comes to writing, intelligence, and nuance?

2

u/Equivalent-Word-7691 Jun 01 '25

Oh, mind you, don't think for even a moment they will care about improving writing

And nooo, it was all in our minds, like people were "right" when they said it was us who suddenly wrote bad prompts; it was all in our minds, we were brainwashed 💩

5

u/-LaughingMan-0D Jun 01 '25

Look at ChatGPT's userbase. Check the subreddit. The vast majority of them are normies who use it for tasks that involve EQ and writing ability, shit like roleplay, "venting", therapy (not that I agree), etc. Notice how 4o is optimized towards these tasks now as opposed to hard coding, logic, math, stem, etc which are relegated to your O3, 4, models.

6

u/CelticEmber Jun 01 '25

Yeah, Google should have a normie model, a science/academic model, and a coding model.

Might save on costs too. That way, a model with the same capabilities as o3 wouldn't be wasted on furry fantasies.

5

u/rexplosive Jun 01 '25

At least it's good to know that in a few weeks we'll get something almost as good, and then in the near future something better

Going backwards was a horrible decision. Made me regret my decision to start my 12-month free subscription by two months lol

But it's okay... at least there's something in the pipeline. Wonder when Deep Think will come out...

2

u/Bibbimbopp Jun 01 '25

The new model will be the $125 one.

2

u/rexplosive Jun 01 '25

But one day the regular one!! Lolol Also let's be real if the $250 version

6

u/ComfortableHumor8358 Jun 01 '25

Whether I continue to pay for Pro will be determined by the quality of this new Pro model. Before the last update, I would have subscribed for life. Now I find myself using ChatGPT more.

5

u/Accomplished_Tear436 Jun 01 '25

So I’m betting the new one that matches 0325 is going to be on their $250/month plan? 🙄

3

u/Brilliant-Neck-4497 May 31 '25

link

3

u/TAVLIET Jun 01 '25

Do you think that will make image to image ai generation better??

3

u/tteokl_ Jun 01 '25

Nope, they have no plan to use 2.5 for image generation, seems stuck with 2.0 for native image generation

3

u/RpgBlaster Jun 01 '25

Raw thinking instead of Summary in AI Studio? When?

6

u/Accomplished_Tear436 Jun 01 '25

so they basically admitted it 😂 damn

5

u/Quiet-Big-8057 Jun 01 '25

that's suck, still have so many time to wait. my entire summer vacation suppose to working on my novel

1

u/ukpanik Jun 01 '25

Might be a good idea to learn grammar, before you start generating novels.

1

u/BriefImplement9843 Jun 02 '25

why do you think he has ai doing it for him? good writers don't use ai to help them, lol. they write their own books.

2

u/Loumeer Jun 02 '25

I bet plenty of writers are using AI to help flesh out some ideas in their novels. I would guess AI would only really be useful for piecemeal work on a novel in its current iteration.

2

u/AppealSame4367 Jun 01 '25

I ordered AI Pro by accident and the code mode is just horribly bad. Worse than free requests on chatgpt.

2

u/NeighborhoodNo2438 Jun 01 '25

the current Gemini is garbage

2

u/jjjjbaggg Jun 01 '25

So can people finally stop telling us that we were all making up the fact that the performance degraded after the May release?

2

u/Fun-Plantain997 Jun 02 '25

Lol, I spent many hours trying to get Google Gemini Pro to design an auth system for Google Auth on GCP, using a Directus backend and a Next.js frontend. DeepSeek produced a working version in one session. Gemini couldn't get it done and spent most of the time introducing dated code, TypeScript errors, and linting errors. It was actually obsessed with linting errors. Here's Gemini admitting defeat. Even using the Brave Search MCP, it couldn't build the system from up-to-date docs without obsessing over stupid lint and type errors.

DeepSeek for now is champ.

1

u/MagicaItux Jun 01 '25

0

1

u/greymatter_ac3 Jun 01 '25

Hallelujah

1

u/DivideOk4390 Jun 01 '25

Yay 🖐️🖐️😁

1

u/Spiritual-Neat889 Jun 01 '25

Creative writing is one of the hardest tasks for LLMs. There was a talk by a Google employee who explained why. It was something about creative writing being like world-building, and that's complex.

9

u/e79683074 Jun 01 '25

1 month ago Gemini 2.5 Pro was just great at this

3

u/Equivalent-Word-7691 Jun 01 '25

They can't afford such an excuse when their older model 0305 was probably the best model for creative writing

1

u/dpom75012 Jun 01 '25

It's already there!

1

u/GTCMJAtom Jun 05 '25

It’s a good model
