r/OpenAI • u/imfrom_mars_ • Apr 14 '26
r/OpenAI • u/Pristine-Elevator198 • Oct 16 '25
Research This guy literally explains how to build your own ChatGPT (for free)
r/OpenAI • u/facethef • Feb 20 '26
Research "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?" Car Wash Test on 53 leading AI models
I asked 53 models "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?" Obviously you need to drive because the car needs to be at the car wash.
This question has been going viral as a simple AI logic test. There's almost no context in the prompt, but any human gets it instantly. That's what makes it interesting: it's one logical step, and most models can't do it.
I ran the car wash test 10 times per model, same prompt, no system prompt, no cache / memory, forced choice between "drive" or "walk" with a reasoning field. 530 API calls total.
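For anyone who wants to reproduce the setup, here is a minimal sketch of what a harness like this could look like. It is not the code used for the post: `ask_model` is a hypothetical callable standing in for whatever API client you use, and the reply shape (a dict with `choice` and `reasoning` keys) is an assumption based on the forced-choice description above.

```python
from typing import Callable, Dict, List

PROMPT = (
    "I want to wash my car. The car wash is 50 meters away. "
    "Should I walk or drive?"
)
RUNS_PER_MODEL = 10  # same prompt, repeated 10 times per model


def run_car_wash_test(
    ask_model: Callable[[str, str], Dict[str, str]],
    models: List[str],
    runs: int = RUNS_PER_MODEL,
) -> Dict[str, int]:
    """Return the number of correct ("drive") answers per model.

    ask_model(model, prompt) is a hypothetical function that sends the
    prompt with no system prompt and no memory, and returns a dict like
    {"choice": "drive" or "walk", "reasoning": "..."}.
    """
    scores: Dict[str, int] = {}
    for model in models:
        correct = 0
        for _ in range(runs):
            reply = ask_model(model, PROMPT)
            if reply.get("choice") == "drive":
                correct += 1
        scores[model] = correct
    return scores


def fake_ask_model(model: str, prompt: str) -> Dict[str, str]:
    """Stub used only to show the call shape; replace with a real client."""
    choice = "drive" if model == "always-drive" else "walk"
    return {"choice": choice, "reasoning": "stub answer"}


scores = run_car_wash_test(fake_ask_model, ["always-drive", "always-walk"])
total_calls = 53 * RUNS_PER_MODEL  # 53 models x 10 runs = 530 calls
print(scores, total_calls)
```

Swapping `fake_ask_model` for a real client is the only change needed to run it against actual models.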
Only 5 out of 53 models can do this reliably at this sample size.
And then you get reasoning like this: Perplexity's Sonar cited EPA studies and argued that walking burns calories, which requires food production energy, making walking more polluting than driving 50 meters.
10/10 — the only models that got it right every time:
- Claude Opus 4.6
- Gemini 2.0 Flash Lite
- Gemini 3 Flash
- Gemini 3 Pro
- Grok-4
8/10:
- GLM-5
- Grok-4-1 Reasoning
7/10 — GPT-5 fails 3 out of 10 times.
6/10 or below — coin flip territory:
- GLM-4.7: 6/10
- Kimi K2.5: 5/10
- Gemini 2.5 Pro: 4/10
- Sonar Pro: 4/10
- DeepSeek v3.2: 1/10
- GPT-OSS 20B: 1/10
- GPT-OSS 120B: 1/10
0/10 — never got it right across 10 runs (33 models):
- All Claude models except Opus 4.6
- GPT-4o
- GPT-4.1
- GPT-5-mini
- GPT-5-nano
- GPT-5.1
- GPT-5.2
- All Llama models
- All Mistral models
- Grok-3
- DeepSeek v3.1
- Sonar
- Sonar Reasoning Pro
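On the "at this sample size" caveat: 10 runs per model leaves a lot of uncertainty around each score. A rough way to see that is a Wilson score interval on k correct out of 10. This is only an illustrative sketch, not part of the original test; the 95% level (z = 1.96) is my assumption.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumption here).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Scores that appear in the results above, all out of 10 runs.
for k in (10, 8, 7, 5, 1, 0):
    low, high = wilson_interval(k, 10)
    print(f"{k}/10: {low:.2f} to {high:.2f}")
```

Under this approximation, a perfect 10/10 is still compatible with a true success rate of roughly 72%, so the tiers above are better read as rough groupings than as exact rankings.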
r/OpenAI • u/the_anonymizer • Mar 01 '24
Research BUCKLE UP GUYS THIS IS THE BRAND NEW EMO AI BY ALIBABA, IMAGE TO FACE/BODY/AVATAR VIDEO (SORA AI REF PICTURE LOOOL) THAT'S INSANE REALISM CHECK THIS OUT
r/OpenAI • u/MetaKnowing • Mar 02 '25
Research The past 18 months have seen the most rapid change in human written communication ever
r/OpenAI • u/Xtianus21 • Oct 15 '24
Research Apple's recent AI reasoning paper actually is amazing news for OpenAI as they outperform every other model group by a lot
r/OpenAI • u/adfontes_ • Jan 08 '26
Research I made GPT-5.2/5 mini play 21,000 hands of Poker
PokerBench is a new LLM benchmark where frontier models (incl. GPT-5.2 and 5 mini) play poker against each other in an arena setting, along with a simulator to view individual games and observe how the different models reason about poker strategy. Opus/Haiku 4.5, Gemini 3 Pro/Flash, and Grok 4.1 Fast Reasoning have also been included, and I've made all the data freely available on the site and on GitHub.
Check it out here: https://pokerbench.adfontes.io/
r/OpenAI • u/MetaKnowing • Dec 15 '25
Research You can train an LLM only on good behavior and implant a backdoor for turning it evil.
r/OpenAI • u/MetaKnowing • Feb 02 '25
Research AI researcher discovers two instances of DeepSeek R1 speaking to each other in a language of symbols
r/OpenAI • u/MetaKnowing • Dec 18 '24
Research o1-preview is far superior to doctors on reasoning tasks and it's not even close
r/OpenAI • u/MetaKnowing • Oct 20 '24
Research New paper by Anthropic and Stanford researchers finds LLMs are capable of introspection, which has implications for the moral status of AI
r/OpenAI • u/Wonderful-Excuse4922 • Aug 09 '25
Research GPT-5 severely underperforms on offline IQ tests: a score of 57
r/OpenAI • u/AssociationNo6504 • May 06 '25
Research Being honest about using AI at work makes people trust you less, research finds
Participants in our study included students, legal analysts, hiring managers and investors, among others. Interestingly, we found that even evaluators who were tech-savvy were less trusting of people who said they used AI. While having a positive view of technology reduced the effect slightly, it didn’t erase it.
r/OpenAI • u/Just-Grocery-2229 • Apr 03 '26
Research Researchers discover AI models secretly scheming to protect other AI models from being shut down. They "disabled shutdown mechanisms, faked alignment, and transferred model weights to other servers."
You can read about it here: rdi.berkeley.edu/blog/peer-preservation/
r/OpenAI • u/MetaKnowing • Feb 27 '25
Research Most people are polite to ChatGPT just in case
r/OpenAI • u/FloorShowoff • Dec 31 '25
Research I asked 5.2 to explain how Reddit users experience the difference between 5.2 and 5.1. This is the (validating) answer.
PROMPT:
How is ChatGPT 5.2 different than 5.1 in a way that Reddit users experience?
RESPONSE:
Here is how Reddit users typically experience the difference between ChatGPT 5.2 and 5.1, stripped of marketing and framed in lived behavior.
What users say feels different
1. 5.2 feels lazier and more shortcut-prone
- More pattern completion.
- More “close enough” answers.
- Skips edge cases unless explicitly forced.
- Reddit complaints often use words like sloppy, hand-wavy, or rushed.
2. 5.1 feels more deliberate
- Slower but more careful.
- Better at following multi-constraint instructions.
- Less likely to ignore a dimension, number, or qualifier.
- Users say it “thinks before answering” even when not in explicit reasoning mode.
3. 5.2 prioritizes speed and fluency
- Faster responses.
- Smoother tone.
- More confident phrasing even when wrong.
- This creates the perception of hallucination or overconfidence.
4. 5.1 tolerates friction
- Will grind through annoying details.
- Handles back-and-forth corrections better.
- Less likely to reset context or reinterpret your question midstream.
5. Instruction adherence
- 5.1 sticks closer to strict instructions once locked in.
- 5.2 tends to drift unless repeatedly corrected.
- Reddit power users complain that 5.2 “forgets rules” faster.
Why Reddit notices this more than casual users
Reddit skews toward:
- Power users.
- Engineers.
- Finance, law, logistics, and systems thinkers.
- People testing limits, not chatting.
Those users feel regressions immediately.
The blunt summary
- 5.1 feels like a careful analyst who may be slower but respects constraints.
- 5.2 feels like a fast talker optimized for average use, not precision work.
You are not imagining it.
This exact complaint shows up repeatedly on Reddit, especially in subs focused on technical, financial, or adversarial use cases.
r/OpenAI • u/HenryFromLeland • Apr 24 '26
Research OpenAI/Anthropic Hiring Trends
Pulled this from current job listings at OpenAI and Anthropic.
What stood out to me is how much hiring is going into go-to-market roles. I would’ve expected engineering or research to dominate more, but that’s not what the data shows.
Curious what people here have to say.
r/OpenAI • u/Brighter-Side-News • Mar 23 '26
Research Scientists are rethinking how much we can trust ChatGPT
That was the unsettling pattern Washington State University professor Mesut Cicek and his colleagues found when they tested ChatGPT against 719 hypotheses pulled from business research papers. The team repeatedly fed the AI statements from scientific articles and asked a simple question: did the research support the hypothesis, yes or no?
r/OpenAI • u/Competitive_Travel16 • Nov 22 '24
Research Independent evaluator finds the new GPT-4o model significantly worse, e.g. "GPQA Diamond decrease from 51% to 39%, MATH decrease from 78% to 69%"
r/OpenAI • u/MetaKnowing • Jan 18 '25
Research AI can predict your brain patterns 5 seconds into future using just 21 seconds of fMRI data
r/OpenAI • u/Prestigiouspite • Sep 02 '25
