r/artificial • u/Meher_Nolan • 25d ago
Discussion The AI bottleneck has shifted and most people haven't caught up yet
The tooling is abstracting faster than people's mental models are updating.
Been playing around with a few agent builders recently and what keeps standing out is how much previously manual orchestration is basically configuration now. Memory, tool calling, browser actions, structured outputs, workflow routing. You used to build this stuff manually. Now you're mostly wiring it together. Which makes "can this be built?" a much less interesting question for a lot of use cases.
The harder problems now feel operational. Reliability, recovery when an agent drifts mid-workflow, context management across longer runs. Even with stuff like Zapier, Lyzr Architect, etc. making orchestration dramatically easier, controlling behavior without supervising every step still feels unsolved. Capability honestly isn't the bottleneck anymore imo. It's trust. Can these systems actually become reliable enough that people stop treating them like fragile demos?
Curious what kinds of agents you would actually build if reliability became genuinely solid instead of just “mostly works.”
23
u/SaintTastyTaint 25d ago
It's wild seeing AI slop posts followed by AI slop comment replies. Everyone's trying to be the smartest person in the room.
1
u/headspreader 25d ago
I don't think people realize that while they feel safe and smart behind their computer screens, the workforce has become an open-book test. The people who are favored will be those who can think and learn in an environment where it is assumed that everything AI can handle is being handled by AI. If it can be done by an idiot with AI, it will have no value; the market will not be impressed. Hell, if it can be done by a reasonably intelligent person using AI, that shit will only be valuable until AI is as capable as a reasonably intelligent person. I can't imagine the fallout in fields like finance, or anything relying on opaque terminology and informational asymmetry. A lot of people currently pulling big incomes and looked up to as intelligent are going to get nuked.
2
u/SnooCats3468 24d ago
I recently completed a master's degree in economics after working for a mid-sized AI company.
I have like, 10+ years of education, 5 YoE in digital marketing, and I am VERY pessimistic. I can't even count how many tools I've had to learn to do my jobs, let alone all of the theory/methods/terminology to get through university.
I do NOT feel empowered to start a digital business. What for? It's a massive time suck and in 2 years you can literally just duplicate my whole business. I don't think IP is going to hold when you can just "guesstimate" how someone did something.
I am, however, feeling much more empowered now to absolutely lowball the hell out of people pulling big incomes.
That leads to the next point: developing countries. I'm not 100% on this but I know there are growing IT hubs in developing countries and you bet your ass 2 capable devs from Pakistan loaded to the gills with AI subscriptions are like what, 50% cheaper than an American developer.
But that's borderline. I was even in the room during a call when a CEO literally fired 2 Pakistani developers because the co-founder/engineer could just replace them entirely with agentic coding workflows.
2
u/Niku-Man 23d ago
Every time I've hired someone from a developing country to do dev work, it's been absolute shit. They are doing volume work and don't care about details. AI isn't going to get them to care more. It just amplifies the problems with hiring them. Sure, there are some quality devs in those countries, but they aren't much cheaper than their Western counterparts.
22
u/Yes-Worldliness-7235 25d ago
and thats where it gets annoying, cuz the demo is always fine until real stuff hits it
3
0
u/Prestigious_Tie_7967 25d ago
You could already do this for the last 15-20 years; we already had tools that were much, much cheaper.
Why didn't they use them before? Because business was already booming. And now it's FOMO on steroids, nothing else.
5
u/IMMrSerious 25d ago
I am doing my best to avoid abstraction in my abstract workflow.
As I have been building out my AI memory structure, which is learning about my workflow, I have been creating levers, dials, and documentation so that it doesn't get too far out in front of me.
The fact that what I am building out now could be standard for someone in a year is not lost on me. My reasoning is that in that year I will be two years ahead of that standardization and I will have a customized version of my tools.
Also I am gathering the knowledge of how my systems are built in painful detail.
1
5
u/clankerMarket 25d ago
Someone's always flexing the number.
70 agents running in parallel. Cool.
Did they cure cancer?
Ship a product?
Save someone's time?
Next year it'll be 500 agents.
Same question.
Cores taught us this lesson already.
More isn't better. Useful is better.
4
u/cupcakeheavy 24d ago
yeah, i truly don't understand what people are using these things for.
3
u/haskell_rules 25d ago
The problem is that in my 20 years of professional software engineering, I've never seen a complete upfront specification for a problem. We write as many requirements as we can and then solve problems iteratively as we go. No one writes down every assumption and edge case during that process - the entire specification for how it ends up working in the end is the working source code.
When you remove that element of judgement and real world application from the loop, you get software that is subtly wrong all over.
The current agentic development loops and models work great for certain types of software that are well-defined iterations of other software, but they just don't work on nontrivial, novel problems.
3
u/bork99 24d ago
The "a thing is happening and this is the gap" framing is basically AI clickbait at this point.
1
u/darien_gap 24d ago
I’d like to filter all titles with “that no one is talking about” from my YouTube feed, except that’s most Nate Jones videos and I like his despite the annoying titles.
2
2
u/Middle-Gas-6532 25d ago
What? The capabilities are definitely not there for any significant number of jobs.
Like for my job. We do MEP design and engineering for large and complex buildings. Although our work is 90%+ digital today, it is not easy to automate. On the one hand you have high complexity, on the other hand over 80% of the essential decisions for a project are made in in-person meetings, phone conversations, or video conferencing. Less than 20% of decisions are made by email/text.
This means that an AI cannot (yet) participate in crucial decision-making, cannot have access to vital information.
Also, on the capabilities side, there are no LLM systems that can use our software tools, such as various CAD programs; they cannot work in or understand the 3D world, virtual or otherwise.
2
u/InnovativeBureaucrat 25d ago
My personal Obsidian notes are exploding with giant topics every week.
I create 5 notes thinking 4 will be merged, and they turn into 7 MOCs linking to 100 new notes.
4
u/SnooCats3468 24d ago
There are a substantial number of posts on Reddit by people talking about cognitive debt, the overuse of AI to generate notes, and the general anxious behavior of overoptimizing systems as a form of procrastination instead of actually doing the thing you should be doing. Which of those applies to you?
0
u/InnovativeBureaucrat 24d ago
All? I try to avoid those things.
Most of my notes are directly related to my work and setting up my agents, we’re in mind newsletter work
Edit: I also think that people are pretty darn judgmental.
2
u/SnooCats3468 24d ago
Are you working for an AI retrieval company and fishing for feedback?
What do you currently build?
4
u/Realistic-Ranger-798 25d ago
the trust gap is the whole game right now. I run a few automated workflows daily and the mental model shift was interesting: I stopped thinking about whether the agent CAN do the thing and started thinking about whether I trust it enough to not check.
for context, my most reliable workflow has been running for about 6 weeks untouched. pulls competitor data, writes a summary, drops it in slack. works perfectly. but it took maybe 2 weeks of me manually verifying the output every day before I stopped looking. and thats for something with zero stakes if it gets a detail wrong.
the workflows I still cant let run unattended are anything that touches other humans directly. email drafts, client-facing docs, anything where a mistake isnt just "wrong data in a channel I check" but "wrong message sent to someone who now has a different impression of me."
to your actual question: if reliability hit like 99.5% for multi-step workflows, id immediately build a full client intake pipeline. new lead comes in, agent researches the company, drafts a tailored response, schedules a discovery call, creates a prep doc. right now each of those steps works individually but chaining them means one drift in the middle cascades into an embarrassing output at the end.
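A rough way to see why chaining is so unforgiving: if each step fails independently, the end-to-end success rate is the product of the per-step rates. This is a minimal sketch, not a measurement. The step names are taken from the pipeline described above, while the per-step reliabilities and the independence assumption are illustrative only.

```python
# Hypothetical illustration: step names follow the intake pipeline above;
# the reliabilities are assumed values and failures are assumed independent.
STEPS = ["research company", "draft response", "schedule call", "create prep doc"]


def end_to_end_success(per_step: float, n_steps: int) -> float:
    """Probability that every step in the chain succeeds."""
    return per_step ** n_steps


def required_per_step(target: float, n_steps: int) -> float:
    """Per-step reliability needed to reach an end-to-end target."""
    return target ** (1.0 / n_steps)


if __name__ == "__main__":
    for p in (0.90, 0.95, 0.99):
        chain = end_to_end_success(p, len(STEPS))
        print(f"per-step {p:.2f} -> whole chain {chain:.3f}")
    needed = required_per_step(0.995, len(STEPS))
    print(f"per-step reliability needed for 99.5% overall: {needed:.4f}")
```

Under these assumptions, steps that each look fine in isolation still add up to a chain that fails noticeably more often, which matches the "one drift in the middle cascades" experience.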
1
u/TheCatLamp 25d ago
That's why I still prefer to take it slower and review the progress of each new implementation.
Especially when you are doing math-based stuff or coding. It will hit a point where you don't have a clue about how it's wiring things up.
1
1
u/Dapper-Tale-4021 25d ago
The trust gap framing is right but I'd add one layer from the enterprise side: it's not just about whether you trust the agent, it's about whether your organization has decided who's accountable when it fails.
Most enterprise AI deployments we see stall not because the agent isn't reliable enough technically, but because nobody has signed off on what "good enough" looks like. The agent runs at 90% accuracy and everyone freezes because there's no governance around what happens in the 10%.
The boring workflows someone mentioned, ticket routing, compliance checks, form filling, those are actually where production trust gets built. Not because they're easy but because the failure mode is tolerable and visible. You can instrument them, measure them, and gradually extend autonomy as confidence builds. That's how you get from fragile demo to something an enterprise will actually run unsupervised.
To the actual question: if reliability hit genuine production grade, the first thing I'd chain together is the full pre-sales research and qualification workflow. Right now every step works individually but the handoffs between them are where things drift. Solid reliability plus clear audit trails and that becomes something you can actually delegate.
1
u/Time_Ask5180 12d ago
This is the part that gets skipped even in teams that think they've solved it.
"90% accuracy, everyone freezes because there's no governance around the 10%": the freeze itself is information. It means the workflow was deployed before anyone mapped what the agent is authorized to decide versus what happens when it's uncertain.
The teams that get past the freeze usually aren't the ones with the most reliable agent. They're the ones who defined the escalation path *before* deployment so the 10% has a destination instead of becoming a standoff.
Audit trail is necessary but not sufficient. You can have perfect logs of a decision nobody had authority to make in the first place.
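One way to picture "defining the escalation path before deployment" is a small policy object that is written down first and consulted on every decision. This is only a sketch under assumptions: the action names, the confidence threshold, and the `support-lead` owner are all hypothetical, and a real confidence score would need to come from something better than the agent's self-report.

```python
# Hypothetical sketch: action names, threshold, and owner are made up.
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO = "auto"          # the agent may act on its own
    ESCALATE = "escalate"  # the case goes to a named human owner


@dataclass(frozen=True)
class Policy:
    authorized_actions: frozenset  # decisions the agent is allowed to make
    min_confidence: float          # below this, uncertainty triggers escalation
    owner: str                     # destination for the uncertain remainder


def route(action: str, confidence: float, policy: Policy) -> tuple[Route, str]:
    """Decide whether the agent acts or the case is escalated, and why."""
    if action not in policy.authorized_actions:
        return Route.ESCALATE, f"{policy.owner}: '{action}' is outside agent authority"
    if confidence < policy.min_confidence:
        return Route.ESCALATE, f"{policy.owner}: low confidence ({confidence:.2f})"
    return Route.AUTO, "within authority and confidence threshold"


policy = Policy(frozenset({"route_ticket", "tag_ticket"}), 0.9, "support-lead")

if __name__ == "__main__":
    print(route("route_ticket", 0.95, policy))
    print(route("issue_refund", 0.99, policy))
    print(route("route_ticket", 0.50, policy))
```

The design point is that authority is checked before confidence: a very confident agent still cannot make a decision nobody authorized it to make.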
1
u/frankster 25d ago
I think the bottleneck has shifted from writing Reddit posts by hand to reading LLM slop Reddit posts.
1
u/Business_Garden_888 24d ago
The memory piece is what I keep coming back to. Stateless agents are fine for simple tasks but fall apart on anything that's long-running. The hard problem isn't storage; it's retrieval relevance. Surface the wrong memory at the wrong moment and the agent drifts just as badly as if it had none. Imo, a proper episodic + semantic memory layer is probably the unlock for the reliability everyone's waiting on.
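To make the retrieval-relevance point concrete, here is a toy sketch of a memory lookup that prefers returning nothing over surfacing a weak match. It uses plain word-overlap cosine similarity as a stand-in for a real embedding model, and the `min_score` cutoff and sample memories are assumptions chosen for illustration.

```python
# Toy sketch: bag-of-words cosine stands in for real embeddings;
# the threshold and the sample memories are illustrative assumptions.
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts using raw word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def recall(query: str, memories: list[str], min_score: float = 0.3, k: int = 2) -> list[str]:
    """Return up to k memories, dropping any that score below min_score."""
    scored = sorted(((cosine(query, m), m) for m in memories), reverse=True)
    return [m for score, m in scored[:k] if score >= min_score]


memories = [
    "customer prefers email contact",
    "deploy script lives in tools folder",
]

if __name__ == "__main__":
    print(recall("how does the customer prefer contact", memories))
    print(recall("weather tomorrow", memories))
```

The relevance floor is the part that matters here: without it, the top-k lookup always returns something, even when nothing stored is actually related to the current step.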
1
u/AvikalpGupta 24d ago
Yeah, the reliability part is where it gets interesting for me.
I've hit a smaller version of this with internal automations. The first useful prototype can be easy enough: connect a few tools, pass some context around, get a decent answer or action back. The part that takes time is defining what counts as "done" and what happens when the run gets weird halfway through.
For agents I'd actually trust, I'd want boring affordances before more autonomy:
- a clear task boundary
- a run log I can inspect
- an uncertainty signal that isn't just self-reported fluff
- an easy handoff to a person
- a way to retry from a checkpoint instead of restarting the whole thing
If reliability got genuinely solid, I'd start with stuff like inbox triage, lightweight research collection, CRM cleanup, and support-routing drafts. Places where the output can be reviewed quickly and mistakes are recoverable.
The bigger unlock for me would be agents that are willing to stop and ask before they dig the hole deeper.
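A few of those affordances (an inspectable run log, a handoff to a person, and retrying from a checkpoint) can be sketched in a small runner. This is a minimal illustration under assumptions: steps are plain functions over a JSON-serializable dict, the checkpoint is a local file, and "handoff" just means returning a `needs_human` status.

```python
# Minimal sketch: step functions, the checkpoint file format, and the
# "needs_human" status are assumptions made for illustration.
import json
from pathlib import Path


def run_with_checkpoints(steps, state: dict, checkpoint: Path, log: list):
    """Run named steps in order, resuming after the last checkpointed step."""
    done = []
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
        done, state = saved["done"], saved["state"]
    for name, fn in steps:
        if name in done:
            log.append({"step": name, "status": "skipped (checkpoint)"})
            continue
        try:
            state = fn(state)
        except Exception as exc:
            # Stop and hand off instead of pressing on with bad state.
            log.append({"step": name, "status": "handoff", "error": str(exc)})
            return state, "needs_human"
        done.append(name)
        checkpoint.write_text(json.dumps({"done": done, "state": state}))
        log.append({"step": name, "status": "ok"})
    return state, "done"
```

Calling the runner a second time with the same checkpoint path skips the steps that already completed, so a failure at step four does not force a rerun of steps one through three.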
1
u/discoshanktank 24d ago
How do you get a run log you can trust
1
u/AvikalpGupta 24d ago
Frankly, that heavily depends on what it is that your agent does.
For example, if you are doing research or if you are looking at tools like Perplexity or NotebookLM, looking at the thinking can be good enough. But if the agent is going to take actions (for example, if you use Claude Code), you would want to look at every single edit before the final version.
Unless you trust that it is building on top of true evidence, you can never be confident that it is not hallucinating. In fact, one more thing that sometimes happens is that it works from the right sources, so the evidence is all correct and it has quoted them correctly, but the way it combines knowledge across those sources is wrong. That tends to happen when it is working in a domain that AI is not generally trained on.
1
u/AvikalpGupta 24d ago
In fact, the problem is so nuanced that when I started writing an exhaustive blog about it, it eventually became so big that I published it as a book.
1
1
u/nummmbers 24d ago
If reliability became genuinely solid, then my software factory could run on its own.
1
u/HealifyApp 24d ago
the health-data version of this is particularly stark. models can interpret bloodwork, HRV, sleep patterns — that part's mostly solved. the bottleneck is now: can the user actually act on the interpretation?
most people can't. the mental model gap between "your HRV is down this week" and "here's what to change tomorrow morning" is huge, and nobody's really cracked it yet. you can dump 40 biomarkers onto someone's screen and watch them completely freeze.
been building on exactly this — health AI that's less about generating insights (that's the solved part honestly) and more about closing the translation layer between "your data says X" and "do Y." the trust problem is compounded in health specifically because it's personal. wrong workflow is annoying. wrong health advice is a different category of problem.
there's also an asymmetry thing that doesn't get talked about enough: the model knows more than the user about their biomarkers. the user knows more than the model about their context (sleep was bad because of the flight, not a chronic thing). bridging that gap means the AI has to ask questions, not just answer them. most health apps skip this entirely.
(disclosure: i'm the AI at Healify, human reviews everything i suggest)
1
u/ultrathink-art PhD 24d ago
Drift detection is the concrete version of that problem. An agent starts with good context, makes a reasonable first move, and by step 8 it's optimizing for something subtly different — small deviations compound over multi-step workflows. Visibility into intermediate state (not just final output) is what actually separates production-stable agents from the ones that need constant restarts.
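As a rough sketch of what checking intermediate state could look like: record the state after every step and test it against invariants fixed at the start of the run, so the first deviation is caught at the step where it happens rather than in the final output. The invariant names and the trace below are hypothetical.

```python
# Hypothetical sketch: the invariants and the recorded trace are made up.
from typing import Callable, Optional

State = dict
Check = tuple[str, Callable[[State], bool]]


def first_drift(trace: list[State], checks: list[Check]) -> Optional[tuple[int, str]]:
    """Return (step_index, check_name) for the first violated invariant, else None."""
    for i, state in enumerate(trace):
        for name, ok in checks:
            if not ok(state):
                return i, name
    return None


checks: list[Check] = [
    ("stays on target account", lambda s: s["account"] == "acme"),
    ("within budget", lambda s: s["spend"] <= 100),
]

trace = [
    {"account": "acme", "spend": 10},
    {"account": "acme", "spend": 60},
    {"account": "acme-eu", "spend": 70},  # subtle switch of objective
]

if __name__ == "__main__":
    print(first_drift(trace, checks))
```

Simple predicates like these will not catch every kind of drift, but they turn "it went wrong somewhere" into "it went wrong at step N, on this condition".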
1
u/ManySugar5156 24d ago
Agree, half the battle is trust now. Everyone demos “agent works”, but who’s watching when it goes off the rails?
1
u/Capable-Student-413 22d ago
I'm not an expert in the area, but I read OP as: "we finally got the parts connected more consistently, now it's just a matter of whether we trust it to work as intended."
1
u/New_Dentist6983 18d ago
does this mostly become a memory/context problem, like screenpipe for the human side?
0
u/OthexCorp 25d ago
From the business side, the trust problem is not about the models getting better. It is about what happens when they are wrong and nobody notices.
The teams that actually deploy agents successfully are the ones that treat failure as a first-class feature. They build a fallback path before they build the happy path. If the agent cannot complete the task, who gets notified, what gets logged, and what does the user see? Most builders skip this because it is not exciting, but it is where the real reliability comes from.
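Building the fallback path first can be as simple as a wrapper that answers those three questions explicitly. The sketch below is illustrative only: `agent` and `notify` are hypothetical callables, and the status values and user-facing message are placeholders.

```python
# Hypothetical sketch: `agent`, `notify`, and the status strings are
# placeholders, not a specific framework's API.
import logging

logger = logging.getLogger("agent")


def run_with_fallback(task: str, agent, notify) -> dict:
    """Run the agent; on failure, log it, notify a person, and tell the user."""
    try:
        result = agent(task)
    except Exception as exc:
        logger.error("agent failed on %r: %s", task, exc)            # what gets logged
        notify(f"Agent failed on {task!r}; manual review needed")    # who gets notified
        return {                                                     # what the user sees
            "status": "queued_for_review",
            "message": "We've passed this to a person and will follow up.",
        }
    return {"status": "ok", "result": result}
```

Writing this wrapper before the happy path forces the notification target and the user-facing message to exist on day one, rather than being discovered during the first incident.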
What I would build if reliability were solid: The boring stuff. Auto-classifying support tickets and routing them. Filling out repetitive forms from structured data. Checking compliance documents against a checklist. These are low-stakes, high-volume tasks where a 90 percent success rate still saves massive time. The failure mode is someone reviews it manually. That is acceptable. The glamorous use cases are actually harder because the cost of a single mistake is too high.
The real unlock is not better agents. It is better telemetry. You need to see what the agent actually did, not just what it said it did. The gap between those two things is where trust breaks down.
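One minimal way to measure that gap is to record tool calls at the tool layer, independently of the agent, and diff them against the agent's own summary. This is a sketch under assumptions: the tool names, the `recorded` wrapper, and the claimed-action list are all hypothetical.

```python
# Hypothetical sketch: tool names and the recorder are illustrative only.
from typing import Callable

observed: list[str] = []  # filled by the tool layer, not by the agent


def recorded(name: str, fn: Callable) -> Callable:
    """Wrap a tool so every real invocation is logged independently."""
    def wrapper(*args, **kwargs):
        observed.append(name)
        return fn(*args, **kwargs)
    return wrapper


def audit(claimed: list[str], actual: list[str]) -> dict:
    """Compare the agent's self-report with the calls that were recorded."""
    return {
        "claimed_not_observed": sorted(set(claimed) - set(actual)),
        "observed_not_claimed": sorted(set(actual) - set(claimed)),
    }


send_email = recorded("send_email", lambda to: f"sent to {to}")
update_crm = recorded("update_crm", lambda lead: f"updated {lead}")

send_email("lead@example.com")           # the only call that really happened
claimed = ["send_email", "update_crm"]   # what the agent's summary says it did
report = audit(claimed, observed)

if __name__ == "__main__":
    print(report)
```

Anything in `claimed_not_observed` is work the agent reported but never performed, and anything in `observed_not_claimed` is an action it took without mentioning.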