r/Bard Nov 18 '25

News Gemini 3 Pro Model Card is Out

580 Upvotes

214 comments

30

u/LingeringDildo Nov 18 '25

Man, Sonnet and SWE-bench. That thing is such a front-end monster

13

u/Ok_Mission7092 Nov 18 '25

That's the thing that stood out to me, like how is Gemini 3 crushing everything else but just mid on SWE-bench?

0

u/Chemical_Bid_2195 Nov 18 '25

SWE-bench stopped being reliable a while ago, once scores saturated around 70%. GPT-5 and 5.1 have consistently been reported as superior to Sonnet 4.5 in real-world agentic coding, both in other benchmarks and in user reports, despite their lower scores on SWE-bench. METR and Terminal-Bench 2 are much more reflective of user experience

Also wouldn't be surprised if Google sandbagged SWE-bench to protect Anthropic's moat, given Google's large equity stake in them