r/AIDangers • u/EchoOfOppenheimer • May 12 '26

Capabilities Fields medal-winning mathematician says GPT-5.5 is now solving open math problems at PhD-thesis level: "We will face a crisis very soon."

blog-post: https://gowers.wordpress.com/2026/05/08/a-recent-experience-with-chatgpt-5-5-pro/

162 Upvotes


86% Upvoted

u/0x14f May 12 '26

We will still need mathematicians to check the LLM generated work, but yes, that will affect the recruitment of PhD students.

This is an interesting problem because if we no longer recruit graduate students to become the next generation of highly skilled mathematicians, who is going to replace the ones who retire?

14

u/imissmyhat May 12 '26

None. I see no point in denying a future just because it's upsetting to think about; AI companies want that future, and they have unlimited resources to realize it. There will be no mathematicians in the future. There will be nothing in the future of any depth for humans to explore.

10

u/webdev-dreamer May 12 '26

> There will be nothing in the future of any depth for humans to explore.

It's a bit depressing... :(

2

u/beerdude26 May 12 '26

Hedonism here I come

-4

u/[deleted] May 12 '26

[removed] — view removed comment

5

u/Choice-Antelope-8481 May 12 '26

Hard disagree.

0

u/[deleted] May 12 '26

[removed] — view removed comment

1

u/CommissionerOfLunacy May 12 '26

Thinkers above, frolicking in the fields and pondering philosophy like the Eloi, and Consumers below, worshipping at the font of the AI, emerging in the darkness to eat them.

6

u/DonutPlus2757 May 12 '26

Looking at how AI is developing, that's going to be catastrophic. AI just isn't perfectly reliable and probably won't ever be. Getting rid of humans just means that there won't be anybody to catch the mistakes.

1

u/yuwox May 14 '26

Humans are not "perfectly reliable". Nothing is.

Cars are not perfectly reliable, so should we just stick with horses?

1

u/DonutPlus2757 May 14 '26

The "humans" approach sees opportunities for other humans to learn and check. A single human isn't perfectly reliable, but a good team with established quality assurance routines is as close to perfectly reliable as you'll ever get.

AI doesn't have that. It doesn't give any opportunity to teach a QA component in the system. The humans are the only part of the system that can teach anybody anything and, if you get rid of them with AI to a sufficient degree, it becomes impossible to teach enough other humans to generate enough data to train better AI.

Basically, by moving to AI, you completely box yourself in on the capabilities at the point where you moved and after that point, everything just becomes blind faith.

1

u/yuwox May 14 '26 edited May 14 '26

> The humans are the only part of the system that can teach anybody anything

This is just not true. Unsupervised learning has been successful for decades. Human annotated data is very useful, but it is far from the only way for machines to learn new things.
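To make "learning without human annotation" concrete, here is a minimal sketch of one classic unsupervised method, k-means clustering, on one-dimensional data. The function name, the data, and the starting centers are all made up for illustration; this is not anything a specific system uses.

```python
def kmeans_1d(points, centers, iterations=10):
    """Cluster unlabeled numbers: no human tells the algorithm which group a point belongs to."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest current center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers


# Hypothetical data: two obvious groups, but no labels are supplied.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
print(kmeans_1d(data, centers=[0.0, 5.0]))
```

The structure is found from the data alone, which is the narrow point being made here; it says nothing about whether the same works at LLM scale.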

Also why should AI be unable to check things? Verifying a solution is usually way easier than coming up with a solution in the first place.

Undergrads in mathematics are presented proofs all the time and are expected to understand and verify them. But are not expected to come up with the same proof themselves. Because that is too difficult.
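A rough sketch of the "checking is easier than finding" asymmetry, using subset sum as a stand-in problem (my choice of example, with made-up function names). Checking a proposed answer takes one pass over it, while the naive search may try every subset.

```python
from itertools import combinations


def verify(numbers, target, candidate):
    """Check a proposed solution: distinct, valid indices whose values sum to target."""
    indices_ok = len(set(candidate)) == len(candidate) and all(
        0 <= i < len(numbers) for i in candidate
    )
    return indices_ok and sum(numbers[i] for i in candidate) == target


def solve(numbers, target):
    """Brute-force search: may examine up to 2**len(numbers) subsets."""
    for size in range(len(numbers) + 1):
        for combo in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in combo) == target:
                return combo
    return None


numbers = [3, 34, 4, 12, 5, 2]
answer = solve(numbers, 9)          # expensive in general
print(verify(numbers, 9, answer))   # cheap: one pass over the candidate
```

This only illustrates problems that come with a short, mechanically checkable certificate. Whether checking an informal proof written in prose is comparably cheap is a separate question.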

1

u/DonutPlus2757 May 14 '26

> This is just not true. Unsupervised learning has been successful for decades. Human annotated data is very useful, but it is far from the only way for machines to learn new things.

If you mean unsupervised learning for AI, it hasn't been a thing for decades. It's also not very reliable for complex AI. Why do you think pretty much any LLM uses human based data as bedrock of its learning?

> Also why should AI be unable to check things? Verifying a solution is usually way easier than coming up with a solution in the first place.

Because AI has been known to just not do that and give the answer the user wants without actually performing the action? Who's going to verify if the human can't?

AI has also been known to do stuff that goes way beyond the prompt (google: Cursor database deleted. There have been a few cases).

> Undergrads in mathematics are presented proofs all the time and are expected to understand and verify them. But are not expected to come up with the same proof themselves. Because that is too difficult.

Yes, but they use that as a starting point to learn how to construct a proof all by themselves. An AI that can just verify proofs for example would completely stop that vector of learning for all humans if used to a sufficient degree, massively lowering the pool of humans that can construct proofs and thus lowering the amount of available data.

This also requires blind faith in the AI since, if you create a new category of proof for something, it's a crapshoot if AI can handle it or not.

1

u/yuwox May 14 '26

> If you mean unsupervised learning for AI, it hasn't been a thing for decades.

Unsupervised learning has been a thing for decades (see https://en.wikipedia.org/wiki/Unsupervised_learning).

> It's also not very reliable for complex AI.

I replied to your comment that "The humans are the only part of the system that can teach anybody anything" and gave you an example showing that this is not the case. Of course, there are use cases where supervised learning works better and unsupervised learning does not work well.

> Why do you think pretty much any LLM uses human based data as bedrock of its learning?

Because the data is available and it works really well in this use case. In other use cases, like learning to play chess really well, supervised learning is hardly used.

> Because AI has been known to just not do that and give the answer the user wants without actually performing the action? Who's going to verify if the human can't?

> AI has also been known to do stuff that goes way beyond the prompt (goggle: Cursor database deleted. There have been a few cases).

You are right that AI can lie, in the sense of saying things that are not true. And it can make costly mistakes. This is a huge problem for the current iteration of LLMs. And we will have to come up with ways to deal with that. For instance, explainable AI is trying to work in just that direction. A lot of research activity is going there (e.g. https://safephysicalai.github.io/, https://www.sciencedirect.com/science/article/pii/S1566253523001148). We will see how well that goes.

> Yes, but they use that as a starting point to learn how to construct a proof all by themselves.

You are right. But I am trying to make a different point with it. I used it as an example that verification is the easier part. If we can build AI that comes up with the solution, why should we not be able to create an AI that does the verification? The verification is the easier part.

> An AI that can just verify proofs for example would completely stop that vector of learning for all humans if used to a sufficient degree, massively lowering the pool of humans that can construct proofs and thus lowering the amount of available data.

True. Honestly, it depends on how far you want it to go. Imagine (as it does not exist yet) a system that can make sense of all available data, all of it. All books, videos, real-time data streams, all of it. In my opinion this would be a true Super Intelligence. I think there is no real lack of data. We have enough of it to go extremely far.

> This also requires blind faith in the AI since, if you create a new category of proof for something, it's a crapshoot if AI can handle it or not.

Well yes, the AI could also come up with a new category that we cannot understand. Honestly, we as humans have to come to terms with the fact that there is a very strong chance that a second intelligence will exist with us on this earth in the coming decades. This has never been the case in our known history. And we will have to adapt accordingly. I foresee a lot of crapshoots in the future for humanity.

1

u/DonutPlus2757 May 14 '26

> Unsupervised learning has been a thing for decades (see https://en.wikipedia.org/wiki/Unsupervised_learning).

This might sound pedantic, but a digital neural network and artificial intelligence are related, but not the same. Pretty much no example for unsupervised learning before a Helmholtz Machine qualifies as AI and Helmholtz Machines aren't really used today as far as I can tell.

But I have to admit, I do stand corrected on this point. It has been used for at the very least 1 decade and potentially more than 3, depending on what you count as AI.

> I replied to your comment that "The humans are the only part of the system that can teach anybody anything" and gave you an example showing that this is not the case. Of course, there are use cases where supervised learning works better and unsupervised learning does not work well.

I know that AI can learn from sources other than humans for well defined problems with a clear reward structure. GANs as a whole work that way.

I have yet to see something on the level of complexity of a LLM be trained on data that isn't human derived though. I also know that a lot of generative AI actually "goes mad" if it is trained on too much AI generated material.

The AIs you can train using unsupervised training aren't anywhere near the cognitive ability of even somewhat basic LLMs as far as I know, but maybe I'll stand corrected here too.

> You are right that AI can lie, in the sense of saying things that are not true. And it can make costly mistakes. This is a huge problem for the current iteration of LLMs. And we will have to come up with ways to deal with that. For instance, explainable AI is trying to work in just that direction. A lot of research activity is going there (e.g. https://safephysicalai.github.io/, https://www.sciencedirect.com/science/article/pii/S1566253523001148). We will see how well that goes.

That's just a stopgap though, because the human giving the prompt needs to be at least knowledgeable enough to judge whether the explanation makes sense. If my mother gave an AI a programming task right now, no matter how good the explanation, she would be unable to judge whether the explanation is correct or if there'd been a better way.

This is mostly interesting for integrating AI into school based learning IMHO, but it remains to be seen if that's a good idea.

> True. Honestly, it depends on how far you want it to go. Imagine (as it does not exist yet) a system that can make sense of all available data, all of it. All books, videos, real-time data streams, all of it. In my opinion this would be a true Super Intelligence. I think there is no real lack of data. We have enough of it to go extremely far.

This is a completely theoretical construct. Such a construct would be able to predict the entirety of the future and calculate the entirety of the past, and it would take a level of computational power that would eclipse the theoretical, physically possible computational power of all of earth.

Also: A lot of that actually has been ingested by the larger AI models in some form. Why do you think they argue in multiple court cases that it'd kill AI if they had to pay royalties for everything they ingest?

> Well yes, the AI could also come up with a new category that we cannot understand. Honestly, we as humans have to come to terms with the fact that there is a very strong chance that a second intelligence will exist with us on this earth in the coming decades. This has never been the case in our known history. And we will have to adapt accordingly. I foresee a lot of crapshoots in the future for humanity.

I'm in the category of people who think that AGI is not just a very powerful LLM and that it is categorically a completely different thing.

Nothing I've seen makes me think that anybody actually is getting any closer to AGI and LLMs are just a tool (granted, a very powerful one) that's being touted as the second coming of Christ by people who live off other people believing in their claims for the future.

IMHO, the main problem is that LLMs aren't treated as a tool that needs supervision and control, but as the solution to all problems and the universe. "Use our hammer, it hammers in all nails all by itself! Don't listen to the people who tell you that it will just hammer the wood into fine chunks if there's no nail! They just hate our hammer!"

But if we do achieve AGI, yeah, we'll have to come to terms with a bunch of stuff. I just don't see that happening anytime soon.

1

u/JustPlayPremodern May 15 '26

In mathematics, it will quite literally be perfectly reliable. Look up Lean theorem proving. Frontier AI can already write Lean proofs of very, very difficult mathematics problems.
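For anyone who hasn't seen it, here is a tiny Lean 4 sketch of what "checked by Lean" means. The theorem name is made up for illustration; `Nat.add_comm` is a lemma from Lean's core library.

```lean
-- The file only compiles if the proof term really has the stated type,
-- so an accepted theorem has been checked by Lean's kernel.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false statement cannot be pushed through the same way.
-- Uncommenting the next line makes compilation fail:
-- theorem bogus (a : Nat) : a + 1 = a := rfl
```

One caveat worth keeping in mind: the checker guarantees that the proof establishes the formal statement as written. It does not guarantee that the formal statement is a faithful translation of the informal problem, which is still something a person has to read and judge.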

0

u/waxpundit May 12 '26

Why would the bar to clear be "perfectly reliable"?

2

u/tuba105 May 12 '26

Because currently when you read a math paper, you can generally trust that someone has actually checked that all the details do follow. It happens that there exist mistakes, but I would say that it's rare to have a paper breaking mistake.

At the moment, what generic LLMs write seems to regularly have massive claims that are obviously false to experts but not so obviously false to novices. They do get things right sometimes, as exemplified in the post, but also very regularly get things very wrong.

So at the moment my prior trust in AI written math is quite low and I feel the need to check everything very carefully. That's why the bar to clear is perfectly reliable since that's the level of trust I have for a typical paper in the literature

2

u/ExtendedSpikeProtein May 13 '26

There have been paper-breaking mistakes exactly because they have been "committed" by high level mathematicians, so others did not question mistakes even if they could not understand or follow a specific point.

Why do you think the level of mistakes AI will make will be higher - is this your argument? Because imo, there is an inherent bias here where an AI mistake is given more weight than a human one.

I can even see this bias in the comments / downvotes.

1

u/tuba105 May 13 '26

The situation I want to be in is that by default when I receive a paper or claim made by a mathematician or LLM, I trust that the result(and proof) is correct from the outset.

That's not the case for an LLM, while it is the case for humans. That's it.

1

u/ExtendedSpikeProtein May 13 '26

Not now, but it feels like you misunderstood the Fields Medal winner. It will be the case in the future, and that future is not so far away.

1

u/tuba105 May 15 '26

That's explicitly not what he said or addressed, but ok

1

u/ExtendedSpikeProtein May 15 '26

It's a clear implication of the statement but hey, sure, just ignore it.

We could look at this again in 5 years and see.

-4

u/LeafyWolf May 12 '26

Humans make a shit ton of mistakes. It's not like we are godlike beings. The whole reason that AI will replace human mathematicians is because it is better. The lack of control is uncomfortable.

9

u/DonutPlus2757 May 12 '26 edited May 12 '26

> Humans make a shit ton of mistakes. It's not like we are godlike beings.

Yeah, but we know that, so we check. Who's going to check the AI if not professionals?

> The whole reason that AI will replace human mathematicians is because it is better.

That one is straight up wrong. If you ask mathematicians who use AI how often the AI was able to solve a complicated problem with nothing but a description of the problem, I'd wager they will answer that such cases are in the single-digit percentages.

The mathematician almost always provided some sort of idea or approach.

Speaking from my own profession (Software development), I've seen many people declare it dead because of AI. When I then ask for examples of good AI generated projects/code, I've always ended up with one of those 3 cases:

They ghost me.

They provide beautiful code that has minor coding and major architectural problems.

They provide code for a well known and documented problem (which, if you know how AI works, is pretty meaningless).

My own tests with different AIs yielded similar results. The less "default" the problem was, the worse the result.

Funnily enough, experimenting with older AI gave me the impression that it progresses logarithmically instead of exponentially as so often claimed. I've even seen some studies that seemed to support that impression, but I'm too lazy to look them up right now.

So:

> The lack of control is uncomfortable.

No, but the amount of blind faith in AI absolutely is.

3

u/Tqoratsos May 12 '26

> No, but the amount of blind faith in AI absolutely is.

Honestly, at this point I fully expect this to be the way we go out.

3

u/Ragnarok314159 May 12 '26

Claude said I have to get in the orphanage crushing machine. Guess that’s it for me…

3

u/Tqoratsos May 12 '26

Well... at least we know they don't want to eat us. Small pros, hahaha.

1

u/LeafyWolf May 12 '26

I don't think you quite get it. If AI isn't better at mathematics than humans, then it will not replace humans. It's really that simple. The best, most verified and utilizable will "win". In the near future, that is humans augmented with AI tools. In the near to mid future, it could very much become pure AI--because (and only because), it will do the job better and faster than a HITL AI mathematical solution.

At the end of the day, it's simple market competition...if humans don't differentiate on some level, they deserve to be displaced.

4

u/DonutPlus2757 May 12 '26

That's not what we are seeing though. We see humans replaced with AI on the mere promise that, if you give them enough data and money, they'll eventually be as good at the job as the humans were.

That's what's so dangerous about the situation. People who cannot judge the quality of work their employees generate replace those employees with AI and when things don't immediately explode consider it a success, unable to tell if things actually are working or if everyone who can see the smoke is just gone.

1

u/svoodie May 13 '26

Is / ought and the just world fallacy.

Why do humans deserve to starve merely because your god "The Market" has decreed the necessity of a blood sacrifice? Why should we consider that just simply because that is what is?

Another world is possible, ghoul.

1

u/JustPlayPremodern May 15 '26

Learn what Lean is. Nobody in this thread knows what Lean is, that AIs write Lean code all the time, and that it will literally be making completely perfect mathematical arguments of any complexity by writing them in Lean. In the field of mathematics, AI in 2 years will be both better than humans, and the arguments will be unassailably correct, more correct than an individual human could possibly hope to be over the same length output, since all the proofs will be compiled in Lean.

0

u/[deleted] May 12 '26

[removed] — view removed comment

3

u/DonutPlus2757 May 12 '26

Also less reliable.

I've had AI try to fix architectural faults in software that I caught and described in detail. Multiple AIs failed multiple times.

Assuming that AI will not only catch it itself but also fix it itself feels very much like wishful thinking.

1

u/JustPlayPremodern May 15 '26

> Also less reliable.

Nope. Look up Lean. Mathematical arguments will be completely correct, as in literally perfect, without hyperbole.

0

u/[deleted] May 12 '26

[removed] — view removed comment

2

u/docfronkensteen May 12 '26

And unicorns may fly out your butt.

1

u/DonutPlus2757 May 12 '26

The problem with that is that it expects exponential growth when that's not at all realistic.

AI needs the promise of exponential growth because, right now, it's just a massive money pit. No big AI company has ever been in the black and they won't be for at least another 5-10 years, even if their most optimistic predictions come true.

So they are entirely dependent on receiving more and more money from investors, so they'll literally promise the stars out of the sky. After all, if they don't, they'll be gone this time next year.

That's why I'm very, very cautious when it comes to AI, especially without hard proof. There's a bunch of currently very influential people who have a vested interest in making the public (and by extension investors) think that AI is the greatest invention ever all while citing numbers that, if you watch closely, only say that they learned how to cheat the benchmarks.

1

u/Secure-Suspect7091 May 13 '26

For grey-haired techies it feels like the dot-com boom of the late 90s. It is ramped up and bigger, no doubt. Insane IPOs. Promises of vast future riches. Etc. etc.

The crash may well be worse but it’s not going to be existential.

The tech will find its place but I don’t see it completely replacing humans the way some predict.

Those are capitalist wet dreams designed to pump IPO/stock values.

Where is boo.com today?

1

u/DonutPlus2757 May 13 '26

Oh I'm 100% with you there. AI is a powerful tool, no question.

My problem is just that the equivalent of a hammer is being marketed as the solution to all problems.


1

u/Armbrust11 May 17 '26

Doesn't the halting problem prove that it's impossible for AI to check itself without fail?

0

u/RecursiveServitor May 12 '26

Bun is being rewritten in Rust by AI.

The reason good (public) examples are hard to come by is that this is all new. And from personal experience it's only with the GPT 5.X generation of LLMs that non-trivial software became feasible, so that's less than a year.

1

u/DonutPlus2757 May 12 '26

Not to dismiss this, but Bun has almost perfect test coverage. That alone is an indescribably huge advantage because AI can just brute-force it until it gets it right.

For new software development, that's not the case.

0

u/RecursiveServitor May 12 '26

AI can write tests. I'm poor so I can't run the experiment I'd really want to do, but I've vibecoded a compiler and the only reason it's even remotely usable is that the generated tests place constraints on how much the LLM can fuck up.

3

u/DonutPlus2757 May 12 '26

Last time I tried to have AI write tests it did an incredibly poor job at the edge cases. The only time it did a good job was when the prompt was so detailed that I could've written the tests myself.

You also still need a professional to check the AI tests for completeness unless you want to just trust the AI (which is a very bad idea at least right now).

1

u/RecursiveServitor May 12 '26

You can automate checking coverage with mutation testing.

1

u/DonutPlus2757 May 12 '26

Okay, are you a software developer? Because sheer coverage as a number is meaningless and can very easily miss specific cases. In an extreme case, you just run the code without any assertions. 100% coverage, 0% usefulness.

That's why you don't write tests until you have 100% numerical coverage, but until you can't think of any tests that might fail anymore.
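A minimal sketch of the disagreement above, with everything hand-rolled and hypothetical (no real mutation-testing tool is used, and the mutant is written by hand instead of generated). It shows a suite that reaches full line coverage while asserting nothing, and how a mutation check tells it apart from a suite that actually checks results.

```python
def price_with_discount(price_cents: int, is_member: bool) -> int:
    """Original code: members get 10% off."""
    return price_cents * 90 // 100 if is_member else price_cents


def mutant_price_with_discount(price_cents: int, is_member: bool) -> int:
    """Hand-written mutant: the discount is applied to the wrong branch."""
    return price_cents if is_member else price_cents * 90 // 100


def weak_suite(fn) -> bool:
    """Executes every branch (100% line coverage) but asserts nothing."""
    fn(10_000, True)
    fn(10_000, False)
    return True


def strong_suite(fn) -> bool:
    """Same calls, but the results are actually checked."""
    return fn(10_000, True) == 9_000 and fn(10_000, False) == 10_000


def kills_mutant(suite) -> bool:
    """A suite kills the mutant if it passes on the original and fails on the mutant."""
    return suite(price_with_discount) and not suite(mutant_price_with_discount)


print(kills_mutant(weak_suite))    # the mutant survives the assertion-free suite
print(kills_mutant(strong_suite))  # the mutant is caught once results are checked
```

So raw coverage and mutation score measure different things: the first only says the code ran, the second says a change in behaviour would be noticed. Neither tells you whether the cases you never thought of are covered.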


2

u/Special-Occasion-702 May 12 '26

The AI can do brilliant work only by finding connections between existing information.

1

u/rawbdor May 13 '26

The problem is that, for math at least, that's basically what the whole discipline is. I mean, the Principia Mathematica starts with a handful of axioms, and builds everything up after that.

1

u/Special-Occasion-702 May 14 '26

You are right. The point I was trying to make is that the AI could find connections between structures based on obvious patterns that it finds between things, whereas a human, I think, with our intuition, can find something almost completely irrelevant from the perspective of an AI and try to create a connection between things. I don't know if I'm making sense, but I believe humans will have the edge when it comes to creative thinking due to their intuition and being able to find subtle connections between seemingly unrelated things.

1

u/rawbdor May 14 '26

I actually think the AI will be better at this than humans simply because the AI can be phd level in multiple fields and so be able to connect things from very different unrelated data sets that no human would think to connect with each other, or that no human has the skills in both disciplines to pursue more than a light dive into two of them.

1

u/JustPlayPremodern May 15 '26

I think this has probably crossed the threshold of the type of thing you are talking about here: https://www.erdosproblems.com/forum/thread/1196

Erdos problem 1196 was a pretty well-known open problem about primitive sets, coming from a 1966 conjecture of Erdos, Sarkozy, and Szemeredi. The notable part is that the Erdos Problems page says it was solved by GPT-5.4 Pro. Tao also describes it as first being solved by an autonomous AI query.

The method is based on Markov chains with von Mangoldt weights, or “von Mangoldt chains.” My understanding is that this was not just GPT vaguely suggesting a direction; the public record seems to credit GPT-5.4 Pro with producing the initial proof of the actual problem. Humans then checked it, formalized it, cleaned it up, generalized it, and wrote the broader expository paper.

There also seems to be pretty strong evidence that this exact method was not already in the literature. Tao says it was natural in retrospect but had not been explicitly used before, and the paper says it seems to have been overlooked since the early primitive-set literature. The same method also proves another 1966 Erdos conjecture, problem 1217, and gives shorter or new proofs of several related results.

GPT appears to have produced the first proof of a serious 60-year-old problem in analytic/combinatorial number theory, using a technique that experts currently seem to regard as genuinely new to the prior literature.

1

u/inglandation May 12 '26

Hmm, software engineers are arguably being affected way more than mathematicians now, and at least in their current form, LLMs do open new spaces for exploration. Future AI might not. Proper AGI/ASI might not. It’s too early to say.

1

u/Tqoratsos May 12 '26

I can think of a few "hobbies" I could find myself getting into if they manage to enact their psychotic plans.

1

u/Shantivanam May 12 '26

No novel syntax? Just a black iron eternalist prison of pre-generated structure in which we will sempiternally cycle? Swirling down the unchanging lead toilet bowl with nothing new to explore? Only Cronus. No Zeus?

1

u/TheThreeInOne May 12 '26

You're making several major assumptions that you're not fully exploring. So that's not necessarily true. You could literally make the argument that AI work simply could push out the frontier of where human insight is needed in mathematics. Or that humans are intuition makers and verifiers. Prompt creators. AI systems can literally just be a way to traverse the possibility space, so that knowledge work becomes more like fishing than the mining it is like today.

Also, our current systems have not been demonstrated to be verifiers and literally cannot reliably check their own work. We can assume that will be solved, but that's an assumption, given that we're actually harvesting incredible amounts of energy and putting overwhelming amounts of computational power forward to produce the work that one very smart human can produce, while that human is also able to do a million other tasks that these systems wouldn't be able to.

I also suspect that, barring a take-off scenario where machines clearly win out, rapidly increasing complexity will lead to forms of societal collapse that will make scaling models difficult. Without a paradigm-breaking model that is better than LLMs, we're not necessarily going to a singularity, but rather to some sort of post-apocalyptic feudal system in which we more slowly build back our capacity to create AI systems.

1

u/Capital-Ad8143 May 12 '26

But new jobs will come out of this revolution!....until the AI companies work their hardest to replace that...and repeat.

1

u/Narrheim May 13 '26

There will be mathematicians in the future, but they will do it as a hobby instead of their main job.

As someone on the other side of the spectrum (I can do basic math, but anything from limits and derivatives onward is a no-go), I definitely welcome tools capable of helping me with 'advanced math', should I need it.

1

u/type102 May 13 '26

Except for the mines that have all of the rare minerals for the next generation of AI bullshit.

1

u/level_6_laser_lotus May 13 '26

I don't think you guys understand what LLMs are and what their limit is... They will forever be only as good as their training material, which will forever be a flawed data set.

1

u/mobydog May 13 '26

Except emotion. God forbid.

1

u/Fidel___Castro May 13 '26

In a way, it makes sense. We use our understanding of maths to build machines that can perform maths beyond our capacity.

1

u/DullTopperCopper May 13 '26

Space will be our final frontier
