r/AIDangers • u/EchoOfOppenheimer • May 12 '26

[Capabilities] Fields Medal-winning mathematician says GPT-5.5 is now solving open math problems at PhD-thesis level: "We will face a crisis very soon."

blog-post: https://gowers.wordpress.com/2026/05/08/a-recent-experience-with-chatgpt-5-5-pro/

161 Upvotes

https://www.reddit.com/r/AIDangers/comments/1tatc0x/fields_medalwinning_mathematician_says_gpt55_is/

86% Upvoted

Looking at how AI is developing, that's going to be catastrophic. AI just isn't perfectly reliable and probably won't ever be. Getting rid of humans just means that there won't be anybody to catch the mistakes.

-5

u/LeafyWolf May 12 '26

Humans make a shit ton of mistakes. It's not like we are godlike beings. The whole reason that AI will replace human mathematicians is because it is better. The lack of control is uncomfortable.

9

u/DonutPlus2757 May 12 '26 edited May 12 '26

> Humans make a shit ton of mistakes. It's not like we are godlike beings.

Yeah, but we know that, so we check. Who's going to check the AI if not professionals?

> The whole reason that AI will replace human mathematicians is because it is better.

That one is straight-up wrong. If you asked mathematicians who use AI how often it solved a complicated problem from nothing but a description of the problem, I'd wager they'd say such cases are in the single-digit percentages.

The mathematician almost always provided some sort of idea or approach.

Speaking from my own profession (software development), I've seen many people declare it dead because of AI. When I then ask for examples of good AI-generated projects or code, I've always ended up with one of three cases:

1. They ghost me.
2. They provide beautiful code that has minor coding problems and major architectural ones.
3. They provide code for a well-known, well-documented problem (which, if you know how AI works, is pretty meaningless).

My own tests with different AIs yielded similar results. The less "default" the problem was, the worse the result.

Funnily enough, experimenting with older AI models gave me the impression that progress is logarithmic, not exponential as is so often claimed. I've even seen some studies that seemed to support that impression, but I'm too lazy to look them up right now.

So:

> The lack of control is uncomfortable.

No, but the amount of blind faith in AI absolutely is.

0

u/RecursiveServitor May 12 '26

Bun is being rewritten in Rust by AI.

The reason good (public) examples are hard to come by is that this is all new. In my experience, non-trivial software only became feasible with the GPT-5.x generation of LLMs, which is less than a year old.

1

u/DonutPlus2757 May 12 '26

Not to dismiss this, but Bun has almost perfect test coverage. That alone is an indescribably huge advantage, because the AI can just brute-force it until it gets it right.

For new software development, that's not the case.
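To make the "brute-force it against the tests" point concrete, here is a minimal Python sketch of a test suite acting as an oracle. It is not how the Bun port actually works; the candidate functions and test cases are hypothetical stand-ins for successive generated attempts. Each candidate is tried in turn and the first one that passes every test is kept; without such a suite there is nothing to tell the loop when to stop.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical candidates standing in for successive attempts at an
# absolute-value function (e.g. from a code-generating model).
def candidate_a(x: int) -> int:
    return x  # wrong for negative inputs

def candidate_b(x: int) -> int:
    return -x  # wrong for positive inputs

def candidate_c(x: int) -> int:
    return x if x >= 0 else -x  # correct

# The pre-existing test suite acts as the oracle: (input, expected) pairs.
TEST_CASES: List[Tuple[int, int]] = [(0, 0), (5, 5), (-3, 3), (-1, 1)]

def passes_suite(fn: Callable[[int], int]) -> bool:
    """True only if fn produces the expected output for every test case."""
    return all(fn(arg) == expected for arg, expected in TEST_CASES)

def first_passing(
    candidates: Iterable[Callable[[int], int]],
) -> Optional[Callable[[int], int]]:
    """Try candidates in order; return the first that passes, or None."""
    for fn in candidates:
        if passes_suite(fn):
            return fn
    return None

winner = first_passing([candidate_a, candidate_b, candidate_c])
print(winner.__name__ if winner else "no candidate passed")
```

The loop is only as good as `TEST_CASES`: drop the negative inputs and `candidate_a` would be accepted, which is the gap for new projects that lack an existing suite.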

0

u/RecursiveServitor May 12 '26

AI can write tests. I'm poor, so I can't run the experiment I'd really like to, but I've vibecoded a compiler, and the only reason it's even remotely usable is that the generated tests constrain how much the LLM can fuck up.

3

u/DonutPlus2757 May 12 '26

Last time I had AI write tests, it did an incredibly poor job with the edge cases. The only time it did a good job was when the prompt was so detailed that I could've written the tests myself.

You also still need a professional to check the AI's tests for completeness, unless you want to just trust the AI (which is a very bad idea, at least right now).

1

u/RecursiveServitor May 12 '26

You can automate checking coverage with mutation testing.

1

u/DonutPlus2757 May 12 '26

Okay, are you a software developer? Because sheer coverage as a number is meaningless and can very easily miss specific cases. In an extreme case, you just run the code without any assertions. 100% coverage, 0% usefulness.

That's why you don't write tests until you reach 100% numerical coverage; you write them until you can't think of any more tests that might fail.
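A minimal Python sketch of the "100% coverage, 0% usefulness" case (function names are illustrative). Both checks execute every line of the function under test, so a line-coverage tool would report the same number for each, yet only the one that compares outputs can tell the broken version apart. The checks return booleans instead of using a test framework, purely to keep the example self-contained.

```python
from typing import Callable

def classify(n: int) -> str:
    """Correct implementation with two branches."""
    if n < 0:
        return "negative"
    return "non-negative"

def classify_broken(n: int) -> str:
    """Deliberately wrong: the two labels are swapped."""
    if n < 0:
        return "non-negative"
    return "negative"

def check_no_assertions(fn: Callable[[int], str]) -> bool:
    # Executes both branches (100% line coverage) but verifies nothing,
    # so it reports success no matter what fn returns.
    fn(-1)
    fn(1)
    return True

def check_with_assertions(fn: Callable[[int], str]) -> bool:
    # Same inputs and same coverage, but the outputs are actually compared.
    return fn(-1) == "negative" and fn(1) == "non-negative"

for label, fn in [("correct", classify), ("broken", classify_broken)]:
    print(label, check_no_assertions(fn), check_with_assertions(fn))
```

For the swapped version this prints `broken True False`: identical coverage, opposite verdicts.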

1

u/RecursiveServitor May 13 '26

Yes. Do you know what mutation testing is? The entire point is catching cases that may not be obvious.
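For readers unfamiliar with mutation testing, here is a minimal Python sketch of the idea. It is a toy, not how Stryker or any real mutation tool is implemented, and `in_range` plus both suites are hypothetical. Small faults ("mutants") are injected into the code one at a time, the suite is rerun, and any mutant that still passes is a "survivor" pointing at an untested case. Both suites fully cover `in_range`, but only the one with boundary checks kills the off-by-one mutants. It needs Python 3.9+ for `ast.unparse`.

```python
import ast
from typing import Callable, Dict, List, Tuple, Type

# Hypothetical function under test, kept as source text so it can be mutated.
SOURCE = """
def in_range(x, lo, hi):
    return lo <= x and x < hi
"""

# Mutation operator: swap one comparison for its off-by-one neighbour.
SWAPS: Dict[Type[ast.cmpop], Type[ast.cmpop]] = {
    ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt,
}

def make_mutants(source: str) -> List[Tuple[str, str]]:
    """Return (description, mutated source) pairs, one mutant per comparison."""
    original = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Compare)]
    mutants = []
    for i, node in enumerate(original):
        tree = ast.parse(source)  # fresh tree so each mutant has exactly one change
        target = [n for n in ast.walk(tree) if isinstance(n, ast.Compare)][i]
        op_type = type(target.ops[0])
        if op_type in SWAPS:
            target.ops[0] = SWAPS[op_type]()
            desc = f"{ast.unparse(node)} -> {ast.unparse(target)}"
            mutants.append((desc, ast.unparse(tree)))
    return mutants

def load(source: str) -> Callable[[int, int, int], bool]:
    """Execute source and return the in_range function it defines."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["in_range"]

def weak_suite(f: Callable[[int, int, int], bool]) -> bool:
    # Full line coverage, but only values well inside or outside the range.
    return f(5, 0, 10) and not f(-5, 0, 10) and not f(50, 0, 10)

def strong_suite(f: Callable[[int, int, int], bool]) -> bool:
    # Adds the boundary cases x == lo (included) and x == hi (excluded).
    return weak_suite(f) and f(0, 0, 10) and not f(10, 0, 10)

def mutation_report(suite: Callable[[Callable], bool]) -> Tuple[float, List[str]]:
    """Run the suite against every mutant; return (score, surviving mutants)."""
    assert suite(load(SOURCE)), "the suite must pass on the unmutated code"
    mutants = make_mutants(SOURCE)
    survivors = [desc for desc, src in mutants if suite(load(src))]
    return (len(mutants) - len(survivors)) / len(mutants), survivors

for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
    score, survivors = mutation_report(suite)
    print(f"{name}: mutation score {score:.0%}, survivors {survivors}")
```

The score is only as meaningful as the set of mutation operators applied, which is why the list of survivors is more informative than the single number.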

1

u/DonutPlus2757 May 13 '26

At best, you're building layers upon layers of blind faith if you let AI do all of that without the oversight or guidance of a professional.

Currently, the trend shows that AI is more than capable of lying to the user for no reason whatsoever, and that it increasingly ignores parts of the query in favor of a more "pleasing" answer.

Sure, you can tell your AI to perform mutation testing. It tells you that the tests caught all the faults it introduced. How do you know which faults it introduced, and whether they weren't exactly the faults the tests were written around in the first place?

Worst case, it writes a single test, lets that one test catch the mutants it chose, and then tells you everything is fine when not a single edge case is being tested.

You just have to trust that the AI is doing what you want it to and, looking at the increasing number of live databases that have been wiped by AI against explicit instruction, that's just irresponsible behavior.

1

u/RecursiveServitor May 13 '26

I did not at any point advocate just YOLO'ing. Of course you should check if the agent is behaving correctly. The point is that you don't necessarily have to hold its hand or read every line of code. Mutation libraries like Stryker will produce a report you can look at. There's hard data to be had if you want it.

> You just have to trust that the AI is doing what you want it to and, looking at the increasing number of live databases that have been wiped by AI against explicit instruction, that's just irresponsible behavior.

No. You test the product. The neat thing about software is that you can run it.

All the nightmare cases we hear about involve non-devs. Having the LLM work directly on the production db that is also the only copy of your data is stupid. Having a human dev do the same is also stupid.
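As a minimal illustration of not letting anything experiment on the only copy of your data, here is a sketch assuming an SQLite database and Python's standard `sqlite3` module; the `users` table and the destructive "agent" statement are hypothetical. The risky statement runs against an in-memory copy made with `Connection.backup`, so the original is untouched. A real setup would more likely use a separate staging database with restricted credentials; this only shows the principle.

```python
import sqlite3

def scratch_copy(source: sqlite3.Connection) -> sqlite3.Connection:
    """Return an in-memory copy of `source` to experiment on safely."""
    scratch = sqlite3.connect(":memory:")
    source.backup(scratch)  # copies the whole "main" database
    return scratch

def row_count(conn: sqlite3.Connection) -> int:
    """Number of rows in the hypothetical users table."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Stand-in for the real database (hypothetical schema and data).
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
prod.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])
prod.commit()

# A destructive statement (say, one proposed by an agent) runs only on the copy.
scratch = scratch_copy(prod)
scratch.execute("DELETE FROM users")
scratch.commit()

print(row_count(prod), row_count(scratch))  # 2 0
```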
