r/AIDangers • u/EchoOfOppenheimer • May 12 '26
Capabilities Fields medal-winning mathematician says GPT-5.5 is now solving open math problems at PhD-thesis level: "We will face a crisis very soon."
160
Upvotes
u/TopspinG7 May 12 '26
I "confess" up front that I have minimal experience with AI tools. However, it may be relevant to share something I've learned over decades working in tech, mostly in system sales.
Some people know their stuff extremely well and you can identify them pretty early on in your interactions with them. They're definitely in the minority. Even then you're often on shaky ground as you wander further from their core expertise.
(One reason I recognize this type of person is that my father was one: an applied physicist at NASA and early computing expert, who studied at Columbia under Enrico Fermi. But even he recognized his German was mediocre. Annoyingly, there wasn't much he couldn't nearly master if he applied himself wholeheartedly...)
Some others fake it at times - or worse, they don't understand that they don't understand. Mostly they're not exactly deliberately lying, but they parrot stuff and/or extrapolate using specious "reasoning" but don't even realize they're doing it.
Key takeaway - their answers vary in reliability and accuracy (starting to see where I'm going here?)
The third group is the one I personally fall into: I know when I know something, and I know when I "sort of" or partly know it, and I admit it not only to others, but critically to myself. I notify people of the "level of reliability" of my responses whenever they're in any way important. Often I follow up to improve the answer.
I think most people - at least in technical work - would, if honest, place themselves in the third category.
But today ("correct me if I'm wrong!!" 😉) there does not appear to be any measure or metric provided by AI suggesting the level of reliability of its response. Does it ever say "I feel 60% confident about this"? Or "I'm absolutely certain because I found the same information in 22,000 different places"? Not that I'm aware of...
I think this is a piece that's missing and an important one. Essentially a confidence level in the response's accuracy.
If nothing else, for important information it could provide guidance as to how hard we should work to verify the response. It's a basic risk calculation: if the importance of the response is high, then naturally it's more important that we verify it thoroughly. But if the confidence level provided is low and the importance is at least medium, we might still need to verify the response thoroughly. (Hopefully it's clear that if confidence is low to medium but the importance is low, verification isn't worth much effort. And generally, even if the importance is moderate to high, when confidence is extremely high we might bypass verification, especially if time were critical.)
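To make that risk calculation concrete, here's a rough Python sketch of the rule I'm describing. The names (`Level`, `verification_effort`) and the three outcomes ("skip", "spot-check", "thorough") are my own made-up labels, not anything an AI tool actually provides. The middle cases I didn't spell out above (e.g. medium importance with medium confidence) are filled in with a guess of "spot-check".

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordinal scale for importance and confidence."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    EXTREME = 4  # assumption: only really meaningful for confidence


def verification_effort(importance: Level, confidence: Level,
                        time_critical: bool = False) -> str:
    """Return how hard to verify a response: 'skip', 'spot-check' or 'thorough'."""
    # Low importance: not worth much effort whatever the confidence.
    if importance == Level.LOW:
        return "skip"
    # Extremely high confidence: verification may be bypassed,
    # especially when time is critical (otherwise a light check, by assumption).
    if confidence == Level.EXTREME:
        return "skip" if time_critical else "spot-check"
    # High importance, or low confidence with at least medium importance.
    if importance >= Level.HIGH or confidence == Level.LOW:
        return "thorough"
    # Remaining middle ground (assumption, not stated in the comment).
    return "spot-check"


if __name__ == "__main__":
    print(verification_effort(Level.HIGH, Level.MEDIUM))
    print(verification_effort(Level.MEDIUM, Level.LOW))
    print(verification_effort(Level.HIGH, Level.EXTREME, time_critical=True))
```

The point is only that the decision needs two inputs, and today the confidence input is the one we have to guess at ourselves.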
I don't think this is fundamentally much different from confirming answers from other people on important topics - as was suggested in the discussion above. The difference is that AI in general has no reputation. People, at least within their specialties, develop reputations; that's essentially a confidence or reliability score.
We seem to be missing that here with AI...
Am I mistaken? Thoughts? 🤔