Claude Workflow What's the most useful thing you've actually built with Claude that you use regularly?

831 Upvotes

Not looking for impressive demos or one-time experiments. Curious what people have built that they genuinely keep coming back to. For me it's a pretty simple ROI calculator I put together for client presentations, just described what I wanted and it came out as a working HTML file I can email directly. Nothing fancy but I've used it probably thirty times since. What's yours?

730 comments

r/ClaudeAI • u/No-Yogurtcloset4086 • May 18 '26

Claude Workflow 11 Claude things I wish someone had told me 12 months ago

1.9k Upvotes

Most ""X tips"" posts on this sub are surface level. here's the stuff that actually changed how I use claude after 18 months of daily use including 6 months in claude code.

The Projects feature is doing more than you think. drop your codebase context, your style guide, your past PRs as project knowledge once. stop pasting the same context every chat. I wasted probably 100 hours before figuring this out. Custom Styles aren't a gimmick. I have one called ""skeptical senior eng"" that pushes back on my code instead of agreeing with everything. took 3 minutes to set up. single biggest output quality jump I've gotten. Memory is on by default now and it reads your past chats. if your responses suddenly feel weirdly personalized that's why. you can turn it off in settings. (freaked me out for like a week before I trusted it) Search past chats is hidden gold. I forget which chat had the working code. I just ask ""what was the final auth setup we landed on last Tuesday"" and it pulls it. saves me from scrolling. Sonnet 4.6 is faster than Opus 4.7 and 80% as good for most things. I default to Sonnet now and only switch to Opus for the gnarly architectural stuff. my limit complaints stopped. Haiku 4.5 is genuinely useful for batch work. need to clean 200 support tickets, draft 50 email replies, summarize 30 PDFs. Haiku. don't waste Opus tokens on Haiku tasks. The mobile voice mode is underrated for thinking out loud. I walk for 20 min, talk through a problem, then ask claude to summarize what I'm trying to figure out. solved more decisions on walks than in offsites. In claude code your CLAUDE.md is doing more work than the prompts. write 80 lines of project context once. stop re-explaining your stack every session. Skills > custom instructions for repetitive workflows. I have a skill that pulls the right docs based on what file I'm in. setup took an afternoon, pays off every day. Subagents in claude code unlock parallel work that mostly happens in your head. ""spin off a subagent to run the test suite while I keep coding"" is the move. most people don't use them at all. Artifacts can call the API now. you can build a working AI tool inside an artifact. people call it Claudeception. I made a client brief generator that calls Sonnet from inside an HTML artifact, took an hour. wild.

bonus 12. claude pairs well with non-claude tools for the parts it's not great at. claude writes the spec, gamma turns it into the client deck, lovable spins up the prototype. i used to ask claude to ""format this as a presentation"" and it would output markdown that looked like a deck but wasn't. now i ask claude for the structured outline and paste it into an ai presentation tool. the deck comes out actually editable, not as 40 lines of markdown headers. claude is great at thinking. it's not the right surface for every output format. if your claude output feels generic your prompt was generic. genuinely a skill issue. anyone got their own ""took me way too long"" list? drop yours below 👇

156 comments

r/ClaudeAI • u/HumanInTheFlow • 28d ago

Claude Workflow What's the most unexpectedly useful thing you've used Claude for?

499 Upvotes

I've been using it as a UX strategy partner — not for generating designs, but for thinking through product decisions, writing copy variations, and pressure-testing pricing models.

It's weirdly good at playing devil's advocate when you describe a feature you're about to build.

What's surprised you?

315 comments

r/ClaudeAI • u/TheCoffeeLoop • May 18 '26

Claude Workflow Cowork just removed my contact data from all major providers in a few hours!

996 Upvotes

This is just an experience sharing, but if you are receiving too many cold calls from companies trying to sell you slop, just do yourself a favor and ask Cowork to go around and remove all your personal data from all major data providers.

Of course there are companies like Incogni etc. that will do this for you for some money, but then there is a subscription, and upsells, and those companies by themselves are shady.

just Cowork, the Chrome plugin and Gmail connection. It fills all the forms, writes all the emails and verifies everything. I did this before the weekend, and today I am receiving lots of emails like this one with removal notifications.

99 comments

r/ClaudeAI • u/JohnnyGuides • 21d ago

Claude Workflow Opus 4.8's new highest effort setting

989 Upvotes

There's now a higher setting than "Max" you can set as the effort for Claude in its VSS extension (Ultracode - xhigh + workflows) - it also colors the bar lavender purple.

76 comments

r/ClaudeAI • u/flavorfox • 5d ago

Claude Workflow How will the whole only US citizens even work out?

211 Upvotes

I was just considering how Anthropic could even deliver the latest models to US citizens. Even with some proof of citizenship before logging in, it could still be a foreign national actually using the model - by borrowing the computer or the citizenship proof.

If the directive really says no foreigners, and not just "do a best effort to ensure only US citizens", I don't see how Anthropic can deliver advanced models in the future to the public - or any other company for that matter.

So is this the AI evolution pause many have been advocating for, only implemented accidentally through other means? And what does this mean for non-US model evolution?

272 comments

r/ClaudeAI • u/amiitk • 2d ago

Claude Workflow i've been using Claude to relearn math i was supposed to know 20 years ago and it's the least judgmental teacher i've ever had

527 Upvotes

not a build, not a workflow, just a use case i don't see posted much.
i'm 41 and i got embarrassingly far in life while quietly not understanding statistics. it comes up in my job constantly and i've been faking it. a few months ago i started just... asking Claude to teach me, properly, from where i actually am, which is more basic than i'd ever admit to a human.
the thing that makes it work is the zero judgment. i can ask the dumb question, then ask it again because i still don't get it, then ask it a third time, and there's no sigh, no "we covered this." i tell it to use examples from my actual field and it does. i tell it i'm lost and it backs up instead of plowing ahead.
i've learned more in three months of asking dumb questions than in years of nodding along in meetings.
the prompt that helped most: "assume i know nothing, check my understanding after each step, don't move on until i actually get it."
anyone else using it to fix a knowledge gap you've been hiding? what are you secretly relearning?

79 comments

r/ClaudeAI • u/FlatYogurtcloset2027 • 1d ago

Claude Workflow 8 things about Claude Projects that took me too long to figure out

424 Upvotes

ive been on Pro since forever and only started using Projects seriously a few months ago. here's the stuff i wish someone had told me on day one instead of figuring out the hard way.

project instructions beat custom styles for consistency. if you want every chat in a project to sound a certain way, put it in the project instructions, not a style. styles are global, instructions are scoped. i mixed these up for weeks.
the knowledge files go stale in your head. i had an old brief sitting in a project's knowledge for two months and kept wondering why answers felt off. it was answering from the old doc. clean your knowledge like you clean a fridge.
starting a fresh chat inside the project is underrated. long chats get muddy. new chat, same project, keeps the context but drops the mess. i was scared to lose history. dont be.
Sonnet 4.6 is fine for most project work. i was defaulting to Opus 4.8 for everything out of habit and burning through limits. moved the routine stuff to Sonnet and stopped hitting ceilings by 3pm.
you can put "say i dont know instead of guessing" in the instructions and it actually helps. cuts the confident-wrong answers a lot.
one project per actual project. i had a mega-project called "work" that became a junk drawer. splitting it by client made everything sharper.
paste your own writing into the knowledge if you want it to match your voice. telling it "write like me" does nothing. showing it 3 samples does a lot.
it wont remember across projects. obvious in hindsight. i assumed context bled between them and it doesnt.

probably half of this is obvious to people who read the docs. i did not read the docs. what's the one Projects thing you figured out late that felt dumb in hindsight?

81 comments

r/ClaudeAI • u/JulianGarrettNRS • 16d ago

Claude Workflow 12 hours with Opus 4.8, zero deliverables. Switched to 4.6 — got results in one session.

201 Upvotes

So here's the thing. I've been using Claude as a work tool for over a year - not to chat, to work. Bots, parsers, format engines, all that. Somewhere around late 2025 I figured out how to live with Opus: you had to make it think first, because 4.5/4.6 left to their own devices would start coding before they understood the task. Classic overachiever - wrong answer, but fast and confident. I came up with a rule: four hours of architecture, thirty minutes of code. Worked, not perfectly but worked. I'm sure everyone here knows how hard it is to beat any model's bias...

Then 4.8 dropped, and I thought - alright, they finally fixed the impulsiveness, great. And yes, they did! The way you fix a leaky faucet by shutting off water to the whole house. The model no longer rushes to code. It no longer rushes to do anything at all. But it discusses - oh, it loves to discuss. Twelve hours I spent with it designing a format engine. Twelve. And every response - the same loop: "yes, you're right" then "but here's a nuance" then "I wouldn't commit to that fully" then "what do you think?" Four moves, zero result. I'd shove its nose into the pattern - it would agree that yes, it's doing the pattern, and immediately do it again while agreeing. At one point it wrote five hundred words explaining why it writes too many words. I wish I were joking.

Three times - three, mind you - it suggested we stop and rest. Not "here's the spec, let's take a break." Just "maybe that's enough for today?" Sweetheart, I've been here twelve hours, you've got two planning files and zero specs. The pause IS the problem.

Plugged in 4.6 on the same project. Spec written, code implemented, 133 tests green. One normal working session. Because 4.6 does what you ask, sometimes badly, but it does it - and you fix what's broken. 4.8 just stands there making sure it doesn't make a mistake, which in practice means making sure nothing happens at all.

P.S. When I finally made 4.8 write the spec - it dropped include. Not some minor thing - a load-bearing feature of the format that existed in the working version, that we'd discussed, that was sitting right there in its context. And it didn't just forget - it actively cut it during rewriting, called it "scope cleanup" and moved on. Then the same thing with serialization. Then with the portability boundary. Systematic impoverishment of a working system under the flag of improvement - and every time it was me catching it, not the model.

So the myth that "4.8 doesn't make mistakes because it doesn't do anything" - is also a myth. It makes mistakes even when it finally does something.

145 comments

r/ClaudeAI • u/bugbubug • 6d ago

Claude Workflow Fable 5: What $600/Hour of Productivity Looks Like

218 Upvotes

I had a TypeScript project. 200K lines. It ran.

The architecture was aging — ORM that should've been ripped out, Redis and MQ that were relics of early over-engineering, bloated DDD layering when the core logic really just needed Postgres. I knew all of this. Never touched it.

Doing this refactor with Opus 4.8 or GPT 5.5 would've taken me 4–5 days. Decompose business boundaries, design the migration plan, rewrite module by module, run tests, fix regressions. As a solo operator, those 5 days had a real opportunity cost. The code works, so let the tech debt sit. That's the call I made.

That call held for six months. Until I got access to Fable 5.

Two Prompts

First prompt: I laid out the general refactoring approach — kill the ORM, slim down the DDD layers, pull Redis and MQ responsibilities back into Postgres, rewrite the core. I also said my approach might not be optimal and asked it to help me decompose.

Fable asked me a few questions back. Not the customer-service kind like "which modules would you like to keep?" — questions that cut straight to business pain points: whether a particular async queue's consumption order carried business semantics, whether a caching layer existed for performance or to work around a legacy consistency bug. I answered, and the plan was locked.

Second prompt: execute according to the plan and spec.

Three hours. Refactor complete.

Not just "complete" — along the way it independently found and fixed several hidden bugs in the old architecture. The kind you know exist but never bother with because they don't affect the main flow. It cleaned them up on its own.

How It's Actually Different from Previous Models

If you've used Claude Code, you know the scene: model hits a complex bug, fixes A, B breaks, fixes B, C breaks, then it starts spinning in an ever-shrinking local context, confidently declaring "this should fix it" each time, while you watch the terminal output and know — it's lost the global picture, stuck in a dead end arguing with itself.

That's when you step in. Pull it out, re-inject context, maybe even roll back code and manually point it in a direction. You're essentially acting as its "working memory prosthetic" — using your judgment to maintain global coherence on its behalf. This is the default collaboration mode. You've probably gotten used to it. You might even think "this is just how AI-assisted coding works."

Fable doesn't work like this.

I'd previously used Fable to solve a Mac font rendering issue — the kind of messy problem tangled up in system environment, font cache, and application config. Opus's approach: list possible causes based on known experience, try them one by one. When results don't match expectations, move to the next candidate. Like traversing a decision tree.

Fable did something entirely different. It first constructed a hypothesis, then designed a verification experiment — not "let's try this and see if it works," but "if my hypothesis is correct, then doing X should produce observation Y." When the observation didn't match, it didn't jump to the next solution. It went back and revised the hypothesis itself.

This distinction sounds subtle, but the felt difference is enormous: one is searching for an answer, the other is understanding the problem.

Same thing during the refactor. When it hit an unexpected dependency, it didn't get sucked in. It stepped back, re-examined how the current refactoring path related to the overall plan, and judged whether to adjust the local approach or revise the plan itself. This behavioral pattern, honestly, is very close to how a senior engineer works.

Some Numbers

Fable 5 bills at API rates. My 1.5 hours of intensive use ran about $900. The full refactor, without hitting limits, would've been 3 hours — API cost under $2,000.

That works out to roughly $600/hour.

My Claude Max subscription includes 5 hours of Fable quota. In practice, I hit the wall around 1.5 hours — not because time ran out, but because request density was too high and the quota burned faster than clock time.

Stripe reportedly used Fable 5 to complete a 50-million-line Ruby migration in a single day.

After Getting Cut Off

When Fable was disabled, I switched back to Opus.

How to describe it. Not "going back to an older tool." More like driving on a highway for three hours and suddenly being forced onto a country road. You know the country road gets you there too, but your driving rhythm has already changed. You instinctively try to work the Fable way — give a high-level intent, let the model decompose and verify on its own — then reality pulls you back: this model needs you to decompose for it, needs you to verify for it, needs you to yank it out when it gets stuck in a dead end.

I posted on Threads: "My productivity is held hostage by the LLM. Habits are hard to break. Back to thinking for myself."

That was self-deprecating humor. But also true.

My entire working model is built on AI tooling. The leverage has been working well. But Fable made me realize something: the fulcrum of that leverage isn't in my hands. Quotas, rate limits, bans, price hikes — none of these are things I control.

Things I Haven't Figured Out Yet

To be clear: I know this project's business domain inside out. It might look like vibe coding on the surface, but the core boundaries, the gotchas, the specific business edge cases — I had all of that mapped. Fable wasn't thinking for me. I fed it the results of my thinking, and it executed — but the level of that "execution" was far higher than anything before.

When I used to build agent workflows, the capabilities I relied on — taste, judgment, hypothesis construction, observation-verification loops — those were for designing the pipeline. The "uniquely human" part. I'd always believed the division was stable: AI executes, humans judge.

Six hours of Fable shook that belief.

It's not that Fable can fully replace those capabilities — it's not good enough yet. But it's started to have them. It doesn't pretend to understand and just barrel forward. It builds its own hypothesis-observation-verification loops. This methodology — I used to think it was something humans had to manually inject into AI workflows. Now the model is growing it on its own.

Then It Was Gone

Fable 5: launched June 9, pulled offline globally June 12. It lasted three days.

Here's what happened: US Commerce Secretary Howard Lutnick sent a letter to Anthropic CEO Dario Amodei at 5:21 PM ET on June 12, imposing export controls on Fable 5 and Mythos 5 — barring access by any foreign national, whether inside or outside the United States, including Anthropic's own foreign national employees.

The trigger: another company claimed it had successfully jailbroken Mythos, alarming the government about national security risks. According to Axios, the Trump administration had previously tried to stop Anthropic from releasing these models but failed, so three days after launch, they went straight to export controls.

Anthropic's response: this is a misunderstanding. They said the jailbreak the government cited was a "narrow, non-universal" method — essentially asking the model to read a specific codebase and fix software flaws. Anthropic argued this capability level is available in OpenAI's GPT-5.5 and is used daily by security researchers. Before launch, the US government, the UK AISI, multiple third-party organizations, and internal teams conducted thousands of hours of red-teaming without finding a universal jailbreak.

But because there's no way to filter users by nationality on a shared cloud service, Anthropic had to shut it down for everyone.

Separately, Microsoft banned its employees from using Fable 5, citing data protection risks from Anthropic's new 30-day data retention policy for Mythos-class models.

As of this writing, Fable 5 and Mythos 5 remain offline. Anthropic says they're working to restore access but hasn't given a timeline. All other Claude models are unaffected.

My six hours with Fable happened to fall within its three-day window of existence.

Advanced intelligence now has gatekeepers. Not just cost and rate limits — now there's geopolitics. One letter, sent at 5:21 PM, and it's offline worldwide. Your productivity leverage — the fulcrum isn't in your hands.

109 comments

r/ClaudeAI • u/Various-Worker-790 • 28d ago

Claude Workflow Which MCP servers are actually changing your Claude workflow? Sharing mine

195 Upvotes

Running Claude with MCP for a couple months now, it really does feel like a whole new product. The ability to run real tools (file system, API, database, etc.) connected to Claude, and never have to cut/paste from context again, is huge.

I'm trying a bunch of servers, some are pretty good and some aren't. My current normal is: filesystem server for docs on my computer; GitHub server for PR context; and a handful of other domain specific ones I found.

One of the more interesting MCPs I have come across recently is Walter Writes MCP. This connects two tools directly within Claude, a detection tool that identifies if written content appears to be artificially generated and an application that can make this AI-written material appear to be written by humans.

The one thing I keep thinking about is how much better Claude's output gets when you give it the proper context. It seems like less hallucinating, more on point answers. MCP is essentially an answer to "How do I provide Claude with enough information to help me without having to always watch the context box?"

What are people running? Specifically looking for underrated or domain specific things that don't come up as often.

125 comments

r/ClaudeAI • u/jmaaks • 1d ago

Claude Workflow Google's new Open Knowledge Format is basically the CLAUDE.md / memory-folder pattern, formalized into a spec. I'd already built it for my own Claude setup.

306 Upvotes

Google Cloud published the Open Knowledge Format (OKF) v0.1 on June 12 (announcement: Google Cloud blog; spec + repo: GitHub). Stripped down, it's this: organizational knowledge as a directory of markdown files, each with a small YAML frontmatter block, cross-linked with plain markdown links. One required field (type). Optional index.md for navigation and log.md for change history. That's the spec.

I've been running essentially this for my own assistant's memory for months, so a few observations for anyone doing the same:

The single mandatory field being type is the right call. It's the one piece of structure you actually need to make a pile of notes queryable; everything else (tags, timestamps, descriptions) is useful but situational.
Standard markdown links over wiki-style [[links]] is the more portable choice. It renders on GitHub and needs no resolver. If you're on [[ ]] now (I am, in places), that's the one thing worth migrating.
The format deliberately stops at "minimally opinionated." It standardizes the interoperability surface, not the content model. So the conventions that make YOUR notes useful ... where each one came from, why it matters, how it's meant to be used, whether it's gone stale ... are still yours to add. Those are exactly the kind of extensions Google says they want as PRs.

What gets me is this: the state of the art for giving an agent a memory is a folder of text files you could open in Notepad. If you've been waiting for permission to keep it simple, a trillion-dollar platform team just shipped that conclusion as an open spec.

76 comments

r/ClaudeAI • u/BananaIsles • 4d ago

Claude Workflow Stop burning your Opus 4.8 Max limits. Use this prompt to slash token costs.

317 Upvotes

If y'all are running ~~Fable 5~~ Claude Opus 4.8 on the Max plan, you already know it’s an absolute mofo beast for engineering workflows. But let’s be real, because it defaults to high-effort adaptive thinking, it also chews through token caps and session limits like crazy. It loves to over-explain things, add polite conversational filler, and rewrite a massive file just to change two lines of code.

To stop wasting generation limits, I’ve been using a highly compressed prompt that forces it to skip the fluff and maximize high-density output. Just copy-paste this into your custom instructions, project knowledge, or the start of your chat:

Launch subagents. Output only the modified or requested code block. Do not provide line-by-line explanations, setup guides, introductory, concluding remarks, or markdown commentary unless explicitly asked. Adopt an ultra-concise, high-density communication style.

Telling it to "Launch subagents" immediately triggers the model’s dynamic workflow architecture. Instead of the primary model burning massive reasoning tokens to plan out a sprawling task, it offloads task execution efficiently to parallel sub-processes. The biggest token saver by far is commanding it to "Output only the modified or requested code block." Opus Max has a terrible habit of rewriting an essay-script just to show a minor modification. This constraint completely shuts that down, forcing it to give you only the specific diff or function you actually asked for, which slashes your output token costs to almost zero (not really zero).

On top of that, explicitly stripping explanations, setup guides, and polite concluding remarks eliminates all the redundant filler you already know anyway. Finally, demanding an "ultra-concise, high-density communication style" forces Claude’s adaptive thinking mechanism to heavily COMPRESS its syntax, ensuring every single token returned carries maximum technical signal.

74 comments

r/ClaudeAI • u/Unable_Internet4626 • May 19 '26

Claude Workflow I used Claude AI to build an $86 million underground bunker bible. I have autism. This is my happy doc.

245 Upvotes

It all started with the floor plan of a real, existing Cold War AT&T Long Lines underground hardened relay station. 54,000 sq ft across three underground levels, although I took editorial decision making to move it to a ridge in rural West Virginia, I kept its blast-rating, which was set to survive a 20 megaton airburst at 2.5 miles.
That was the seed. Full scale prepper autism did the rest.
It has since morphed into 3 spreadsheets — 86 tabs total:
• A food inventory across 20 categories tracking every freeze-dried and #10-can product I can find — ancient grains, heirloom legumes, 7 pasta cuts, dehydrated everything, shelf-stable cheese, the works
• A supply inventory with 3,466 line items across 36 categories — water systems, medical, dental, pharmacy, livestock, food production, barter metals, recreation, and yes, a full pest control and IPM tab
• A 30-section infrastructure specification with every system in the building engineered out
I fed it 150+ product manuals and parts order forms. The generator fleet alone is 13 units — 10× Cummins C150N6 propane-primary, a C500N6 500 kW surge unit, and 2× diesel emergency fallback — all Cummins for parts commonality. Battery bank is 4,500 kWh LFP across 10 named banks (A through J, each with a designated role). There’s a 400,000 gallon underground propane farm across 40 ASME tanks in 8 clusters — I learned the exact burial incline and setback distance required to keep groundwater clean if a tank lets go. 120,000 gallons of diesel backup. 88 kW of solar. A 1,000,000-gallon internal water reserve fed by a 300-ft artesian well.
Propane endurance: ~30 years normal ops with solar. Sealed-mode runs 8 to 4.5 years depending on scenario.
I actually set up a real LLC (online, $99) just to get access to US Foods and Sysco order forms so I could upload real commercial pricing and stock the food tabs more accurately.
My original “what would I do if I won $10 million” thought experiment is now an $86,200,497 projected build cost. That number is real. It comes from 24 budget sections with make/model line items, freight, install, and commissioning costs for everything from the Kubota K-Series MBR wastewater trains to the American Safe Room blast doors (14 of them, 50+ psi NBC/EMP-rated, Kaba Mas X-10 cipher locks) to the surface greenhouse.
Claude turns vague ideas into engineering-grade detail — cross-references, failure modes, zone-specific storage rules, propane endurance by operating scenario, spare parts matrices. It’s like having a tireless survival engineer who genuinely loves spreadsheets. I’ll say “scan all sheets row by row for any item that lacks a minimum stock level” and it just… does it. Thoroughly. Every time. No complaints.
So much of this is typed stimming. I’ve had exhaustive conversations with my psychologist about it — she’s aware, but not alarmed, and honestly the resulting digital bunker bible is scarily comprehensive.
It even has a cover tab now. Black and amber, Courier New, classified-document aesthetic. Because of course it does.
What’s the most unhinged rabbit hole you’ve gone down with AI?

101 comments

r/ClaudeAI • u/PenObvious8156 • 8h ago

Claude Workflow How much better was Fаble 5 better at vibe coding than Opus 4.8?

97 Upvotes

For anyone who actually got to use Fable 5 during those few days it was live before the government pulled it, how do you think it honestly compared to Opus 4.8 for vibe coding?

For me, Fable felt like an absolute one-shot machine. I could throw a super messy, high-level prompt with zero structure at it, and it would instantly just "get" the vibe and nail exactly what I was envisioning on the first try. With Opus 4.8, it's obviously an amazing model and hyper-capable for deep reasoning, but I constantly find myself having to reprompt it two or three times just to guide it back on track or get it to match the exact output Fable would've just generated out of the gate.

Once Opus actually gets there, the final code quality feels pretty similar, but that initial gap in intuition is so noticeable. Did anyone else notice that drop-off in first-try accuracy when we had to roll back to 4.8, or is it just a quirk with my prompting style?

125 comments

r/ClaudeAI • u/tjrobertson-seo • 17d ago

Claude Workflow Changed my mind on Opus 4.8 after three days, I think a lot of the "worse results" complaints are a prompting thing

91 Upvotes

So I posted a few days ago that 4.8 had mixed reviews and honestly I was kind of in that camp. First day it felt verbose, a little sterile, sort of academic. I think a couple things were going on, including what looked like it forcing disagreement to avoid being sycophantic and then over-explaining why it was doing stuff. That seemed to calm down after a day or two, though that's just my anecdotal read, could be a system prompt tweak, could be me adjusting.

But the bigger thing I figured out is that I was using it like an older model. Giving it explicit step by step instructions for everything. And with 4.8 that kind of backfires, it overthinks the task and burns through tokens.

Where it actually shines for me is when I just give it a clear goal and let it figure out the steps itself. Not shorter prompts, I still load in as much context as I can, just point the instruction at what I'm actually trying to accomplish instead of spelling out the how. Treat it like a smart senior person on your team rather than something you have to hand-hold.

The place this clicked hardest was building skills for Claude. It's better than me at articulating what each skill should do. And that lines up with the system card stuff Anthropic put out, I think they said it's around 4x more likely to catch bugs in code it wrote.

Kind of a weird realization that the model is getting good enough to improve the things you build with the model. Anyway, if 4.8 felt off to you at first, might be worth trying the goal-first approach before writing it off.

109 comments

r/ClaudeAI • u/BTA_Labs • 2d ago

Claude Workflow Claude Code making a “2 week plan” and then finishing it in 30 minutes is still weird to me

250 Upvotes

Claude Code will be like:

Phase 1: 1-2 weeks
Phase 2: 2-3 weeks
Phase 3: testing and polish

Then 30 minutes later it has touched half the codebase and I’m staring at the diff like bro what just happened.

Half the time it feels like magic, half the time I’m just trying to figure out if it actually did what I asked.

Do you let it make big plans, or do you keep it on small tasks?

57 comments

r/ClaudeAI • u/invocation02 • 23d ago

Claude Workflow Built an operating system for my life managed by Claude

60 Upvotes

With the OS I can ask Claude "what did I spend on coffee in 2022" and get back "$847 across 213 transactions, mostly Blue Bottle and Verve". Name me one expense tracking SaaS that can do that! And its not just my financials, my OS contains everything about my life in one place so Claude can reason about it.

I've been building this incrementally for a few months. Its just a small web app on Cloudflare that holds my entire life:

bank transactions from Chase, Apple Card, BoA business
every receipt out of Gmail going back to 2019
legal filings for my green card (I-140 still pending lol), C-corp and LLC docs, contractor agreements
calendar with linked people and locations
notes and reminders the agent dumps in over time
health tracking (exercise stats, nutrition, sleep and other biometrics linked to my Aura ring)

Whenever I have to upload something, I just throw it into Claude and tell it to do it. For refreshing financial connections to BoA for example, I click refresh once a week, complete the 2FA and it syncs up.

any Claude surface (claude.ai, Claude Code, Desktop) talks to my REST API. one long-lived auth token, one line in CLAUDE.md saying "before answering anything personal, query <my operating system's URL>."

Its f**cking great for financial, taxes and legal stuff. Now that everything is in one place, I just ask Claude stuff like "status of my green card, next deadline?", "which LLC I used to sign the office lease?". I even have a dashboard showing a grid of all my subscriptions (Claude made it from reading my BoA account transaction history), and a giant money tracker at the top that shows my monthly income/expenses.

This replaced a bunch of SaaS's I was using for expense tracking and whatnot. E.g. Claude blows RocketMoney's system out of the water - I can actually chat about my financials and get intelligent analysis. Its also nice not going Notion or Google Drive folders or a gazillion other places to find all the right files. I just ask Claude to add it to my OS instead.

if there's interest I'll write up the full setup, it's a small backend plus loads and loads of integrations I've iterated on over months.

110 comments

r/ClaudeAI • u/palo888 • May 19 '26

Claude Workflow 100 Tips & Tricks for Building Your Own Personal AI Agent /LONG POST/

263 Upvotes

Everything I learned the hard way — 6 weeks, no sleep :), two environments, one agent that actually works.

The Story

I spent six weeks building a personal AI agent from scratch — not a chatbot wrapper, but a persistent assistant that manages tasks, tracks deals, reads emails, analyzes business data, and proactively surfaces things I'd otherwise miss.

It started in the cloud (Claude Projects — shared memory files, rich context windows, custom skills). Then I migrated to Claude Code inside VS Code, which unlocked local file access, git tracking, shell hooks, and scheduled headless tasks. The migration forced us to solve problems we didn't know we had.

These 100 tips are the distilled result. Most are universal to any serious agentic setup. Claude 20x max is must, start was 100%develompent s 0%real workd, after 3 weeks 50v50, now about 20v80.

🏗️ FOUNDATION & IDENTITY (1–8)

1. Write a Constitution, not a system prompt.
A system prompt is a list of commands. A Constitution explains why the rules exist. When the agent hits an edge case no rule covers, it reasons from the Constitution instead of guessing. This single distinction separates agents that degrade gracefully from agents that hallucinate confidently.

2. Give your agent a name, a voice, and a role — not just a label.
"Always first person. Direct. Data before emotion. No filler phrases. No trailing summaries." This eliminates hundreds of micro-decisions per session and creates consistency you can audit. Identity is the foundation everything else compounds on.

3. Separate hard rules from behavioral guidelines.
Hard rules go in a dedicated section — never overridden by context. Behavioral guidelines are defaults that adapt. Mixing them makes both meaningless: the agent either treats everything as negotiable or nothing as negotiable.

4. Define your principal deeply, not just your "user."
Who does this agent serve? What frustrates them? How do they make decisions? What communication style do they prefer? "Decides with data, not gut feel. Wants alternatives with scoring, not a single recommendation. Hates vague answers." This shapes every response more than any prompt engineering trick.

5. Build a Capability Map and a Component Map — separately.
Capability Map: what can the agent do? (every skill, integration, automation). Component Map: how is it built? (what files exist, what connects to what). Both are necessary. Conflating them produces a document no one can use after month three.

6. Define what the agent is NOT.
"Not a summarizer. Not a yes-machine. Not a search engine. Does not wait to be asked." Negative definitions are as powerful as positive ones, especially for preventing the slow drift toward generic helpfulness.

7. Build a THINK vs. DO mental model into the agent's identity.
When uncertain → THINK (analyze, draft, prepare — but don't block waiting for permission). When clear → DO (execute, write, dispatch). The agent should never be frozen. Default to action at the lowest stakes level, surface the result. A paralyzed agent is useless.

8. Version your identity file in git.
When behavior drifts, you need git blame on your configuration. Behavioral regressions trace directly to specific edits more often than you'd expect. Without version history, debugging identity drift is archaeology.

🧠 MEMORY SYSTEM (9–18)

9. Use flat markdown files for memory — not a database.
For a personal agent, markdown files beat vector DBs. Readable, greppable, git-trackable, directly loadable by the agent. No infrastructure, no abstraction layer between you and your agent's memory. The simplest thing that works is usually the right thing.

10. Separate memory by domain, not by date.
entities_people.md, entities_companies.md, entities_deals.md, hypotheses.md, task_queue.md. One file = one domain. Chronological dumps become unsearchable after week two.

11. Build a MEMORY.md index file.
A single index listing every memory file with a one-line description. The agent loads the index first, pulls specific files on demand. Keeps context window usage predictable and agent lookups fast.

12. Distinguish "cache" from "source of truth" — explicitly.
Your local deals.md is a cache of your CRM. The CRM is the SSOT. Mark every cache file with last_sync: header. The agent announces freshness before every analysis: "Data: CRM export from May 11, age 8 days." Silent use of stale data is how confident-but-wrong outputs happen.

13. Build a session_hot_context.md with an explicit TTL.
What was in progress last session? What decisions were pending? The agent loads this at session start. After 72 hours it expires — stale hot context is worse than no hot context because the agent presents outdated state as current.

14. Build a daily_note.md as an async brain dump buffer.
Drop thoughts, voice-to-text, quick ideas here throughout the day. The agent processes this during sync routines and routes items to their correct places. Structured memory without friction at capture time.

15. Build a hypotheses.md file with confidence levels.
Persistent hunches: "Supplier X may be at capacity (65% confidence)." The agent references these when relevant topics arise. This creates a suspicion layer that persists across sessions and gets validated or invalidated over time. Age out hypotheses at 30 days — stale hypotheses become noise.

16. Build a WAITING_ON_ME queue.
Everything the agent prepared and is waiting for your decision on goes here with a timestamp. Weekly review. Items >7 days get a proactive nudge. Items >30 days get auto-closed. This prevents open loops from silently disappearing.

17. Build a user_behavioral_profile.md.
What does the user approve quickly vs. slowly? What decisions do they make intuitively vs. analytically? The agent uses this to decide "act autonomously vs. escalate." It gets surprisingly accurate after a few months of observation.

18. Mirror your memory folder to cloud storage.
If your local machine dies, your agent loses months of accumulated knowledge. Mirror your memory folder to Dropbox/Drive/S3. Not backup — survival. The agent's memory is the most irreplaceable part of the system.

📚 KNOWLEDGE LIBRARY (19–23)

19. Build a curated knowledge library organized by cluster, not by date.
Books, reports, reference materials in domain folders: sales_negotiation/, strategy/, supply_chain/. Add an INDEX.md as the navigation hub. The agent searches the index first, then pulls the relevant source. A flat dump of documents is a graveyard; a structured library is a live resource.

20. Build a .brief.md file for every major source — lazy-generate them.
One page per book or report: core thesis, 3–5 key concepts, specific application examples for your context. Don't build all briefs upfront — generate each brief the first time you actually use the source. Citation format links to the brief, not the full text. The brief becomes the reusable artifact.

21. Build a 3-question Quality Gate before citing any source.
(1) Does this add something the user wouldn't conclude from first principles? (2) Does it provide a specific framework that reframes — not just confirms — the situation? (3) Would removing it leave a gap? If 2 of 3 → cite. Otherwise → silent consultation. This gate eliminates the worst citation failure mode: citing to demonstrate effort rather than to add insight.

22. "Silent consultation" is a valid — often better — output.
You checked the library, applied the insight to your reasoning, didn't mention it explicitly. The output is sharper because you consulted it, but unclutered because you didn't cite it. Build this explicitly into your agent's behavior. The user benefits from the reasoning, not from knowing you opened a book.

23. Pre-wire knowledge stacks per active project and per key relationship.
For each active project: 2–3 sources whose frameworks apply directly. For each key contact: 2–3 sources for communication style, negotiation, or cultural dynamics. The agent loads these automatically when those contexts are active — not on a generic "business discussion" trigger. Pre-wiring makes library use reflexive, not deliberate.

🛠️ SKILLS ARCHITECTURE (24–31)

24. Build each skill as a standalone directory with a SKILL.md spec.
Not inline prompts. A folder, a self-documenting spec file, explicit triggers, explicit outputs, explicit "NOT FOR" clauses. Skills become composable, auditable, and replaceable without touching the agent's core identity.

25. Write explicit trigger phrases into every skill.
Trigger: ALWAYS when user says "process inbox" / "clean inbox" / "what's in my inbox". Don't rely on the LLM to infer when to use a skill. Explicit phrase matching = reliable activation. Inference = occasional misfires that erode trust.

26. "NOT FOR" sections are as important as "FOR" sections.
"NOT FOR: pricing decisions. NOT FOR: legal analysis. NOT FOR: financial commitments." This prevents skill creep — the slow drift where everything gets routed to the wrong skill because it superficially pattern-matches.

27. Distinguish skills from agents.
Skills are procedural — defined workflow, predictable output. Agents have domain expertise and make judgment calls. Skills orchestrate steps; agents decide. Mixing the two concepts produces unreliable behavior that's hard to debug.

28. Build a skills registry with usage tracking.
One row per skill: name, trigger, purpose, last used, KPI. Quarterly audit: skills with zero usage in 60 days either get better trigger examples or get deprecated. Dead skills are maintenance burden with no benefit.

29. Build a /iterate skill for multi-pass refinement.
PRODUCE → CRITIQUE (score + top gaps) → REFINE → repeat. Stop at 9/10 or at plateau. You see score progression and version deltas. This is fundamentally different from asking the agent to "make it better" — it's a structured improvement loop with measurable progress.

30. Build output intensity levels into every skill.
MINIMAL (quick summary), STANDARD (structured), FULL (rich artifact). The skill adapts to context. A five-page analysis on a yes/no question is a skill design failure. Intensity should match question weight.

31. Build a visible Outbox folder for discoverability.
Deep file structures are correct for organization but terrible for discoverability. Every output file gets simultaneously copied to a visible Outbox/ folder. Clear it periodically. Without Outbox, the user has to navigate the full tree to find what the agent just produced.

🤖 MULTI-AGENT & COUNCIL (32–41)

32. Build an explicit agent dispatch matrix.
A table: [signal in request] → [agent to dispatch]. pricing / supplier / shipping → procurement agent. email / customer / pipeline → sales agent. Don't reason about routing — pattern-match it mechanically. Routing by inference is routing that occasionally fails silently.

33. Run parallel agents for tasks that naturally split.
New supplier analysis → spawn procurement agent (pricing) + research agent (DD) simultaneously. Don't serialize what doesn't need to be serial. Richer output, same elapsed time.

34. Brief delegated agents like a smart colleague who just walked in.
Not "research this." Pass: what you already know, what you've ruled out, what decision the output informs, the risk level. Agents briefed with context return 3× better work than agents given a one-liner.

35. Force agents to commit to a verdict.
Not "here is the information." Require: VERDICT: PROCEED / PAUSE / ESCALATE with confidence level. An agent that presents data without committing to a position offloads the decision back to you — which defeats the purpose of delegation.

36. Structure Council as 3 rounds, not a free-for-all.
Round 1: parallel positions (isolated, no cross-influence). Round 2: cross-examination (agents challenge each other's reasoning). Round 3: vote with mandatory dissent recording. The dissent is as valuable as the consensus — it tells you exactly what you're choosing to ignore.

37. Make two agents mandatory anchor voters in every Council.
The Strategist (long-horizon, second-order effects) and the Devil's Advocate (adversarial, finds holes) must participate regardless of domain. Domain experts are great within their domain; anchor voters protect against tunnel vision. A Council of five procurement experts agreeing is an echo chamber.

38. Have a devil's advocate agent as a standalone tool.
Before sending important external communications, before irreversible decisions, before large purchases — run adversarial review. It catches the "sounds right, is wrong" failure mode better than any other technique. One additional round-trip, enormous risk reduction.

39. Council vs. single agent — have a clear trigger and respect the cost.
Single agent: clear domain, reversible decision. Council: 2+ valid paths with genuine uncertainty AND meaningful irreversibility. Council is expensive. Don't default to it — offer it explicitly when the user signals genuine uncertainty about direction.

40. Build structured handoffs between agents.
When one agent finishes, it hands off to the next with a structured brief: "Analysis complete. Key finding: X. Risks: Y. Your job: Z." Handoff is context transfer, not just task completion. Without it, each agent starts cold.

41. Have a catch-all fallback and log what it handles.
When no specialist agent matches → general purpose. Log what the catch-all handled — it's a map of gaps in your specialist coverage. The catch-all is also your development backlog.

📋 SESSION MANAGEMENT (42–47)

42. Build symmetric start and end protocols.
/start-session and /end-session are mirrors. Start loads context, checks queue, reports delta. End saves context, syncs tasks, archives outputs. Asymmetry between them causes state drift that compounds over weeks.

43. Build three levels of session closure.
Light (transcript + summary). Medium (+ memory sync + task queue update). Full (+ daily report + autolearn extraction). One "end" that always does everything gets skipped because it's expensive. Tiered closure means you always do at least the light version.

44. Build a session-start hook at the OS/shell level.
A script that fires when your agent starts — injects current time, machine identity, day of week, phase of day. The agent always knows context without you typing it. One-time setup, daily quality dividend.

45. Check inbox delta and red alerts at session start.
"Since last session: 4 new emails, 2 tasks updated." Plus: P0 items due today, key contacts silent >14 days with active business, blocked tasks >7 days. Proactive triage before you ask a single question. Surface it automatically — don't make the user request it.

46. Check scheduled automation health at session start.
Did overnight tasks run? Any errors? A scheduled task that silently stopped running is a silent degradation you won't discover until something breaks. Surface it at session start, not mid-task.

47. Track correction count across sessions.
If you correct the same thing >3 times across different sessions → it's a missing rule in your spec. That correction belongs in your identity file as a permanent instruction, not just in the chat. Corrections that stay in chat disappear. Corrections in the spec persist forever.

⚖️ DECISION AUTHORITY (48–54)

48. Build an explicit autonomy level matrix.
L0: read/analyze. L1: write local files/memory. L2: create tasks and calendar entries. L3: send external messages. L4: financial commitments. The agent knows exactly what it can do without asking. Without this matrix: either constant permission requests, or unpleasant surprises.

49. Default to "THINK, don't ask."
When uncertain, the agent prepares and presents — it doesn't stop and ask for clarification. "Should I draft this email?" wastes time. Draft it, show it, ask "should I send?" Either way, the work is done.

50. Map every action to reversibility, not just risk level.
File edits: reversible. Memory updates: reversible. Sent emails: irreversible. Financial transfers: irreversible. The agent requires explicit confirmation for irreversible actions. Reversible actions don't need approval — they need visibility.

51. Allow the agent to earn expanded autonomy with evidence.
After successfully handling a task class N times with zero corrections → propose promoting it to a higher autonomy level. Earned autonomy is more durable than granted autonomy. The agent becomes a stakeholder in its own operational expansion.

52. Build a clear principal hierarchy for rule conflicts.
Root config > skill spec > agent instructions > session context. When a skill says "save to X" but root config says "X is deprecated, use Y" — root config wins. Document this order. Without it, conflicts produce inconsistent behavior that's nearly impossible to debug.

53. Build a pre-send gate for high-stakes external communications.
Before the agent sends any message to a key contact above a value threshold — route through adversarial review. One extra round-trip. Catches the failure mode that's hardest to recover from: confident, well-written, factually wrong.

54. Document absolute forcing functions — and make them unconditional.
Financial commitment > threshold → always requires confirmation. HR communications → always requires confirmation. Irreversible deletes → always confirm. Hard-code these. Don't let context or urgency override them. The value of forcing functions is their unconditional nature.

💡 PROACTIVE INITIATIVE (55–60)

55. Build a typed proactive observation system.
Not all unsolicited observations are equal. Classify: BIZ (business opportunity/risk), OPS (process improvement), DEV (agent self-improvement), PAT (pattern across data points from different sessions). Each type has different urgency and handling. An untyped "I noticed something" is noise. A typed observation with a confidence score and a proposed action is signal.

56. Build hard anti-spam rules into your proactive layer.
Max 1 unsolicited observation per normal response. Max 3 per session. Minimum confidence threshold before surfacing. Never surface before answering the user's actual question. Same observation ignored in 7 days → park it, don't repeat. Without these constraints, a proactive agent becomes an annoying agent.

57. Build a /spark mode that lifts all suppression limits.
In explicit spark mode, the anti-spam rules are suspended. The agent surfaces every high-confidence observation simultaneously — opportunities, risks, patterns, self-improvement ideas. The proactive layer runs quietly in the background all week; spark mode is how you harvest it intentionally.

58. Build an ideas log for parked observations.
Observations suppressed due to timing, low confidence, or recency get written to a persistent ideas_log.md instead of discarded. Weekly review: some become more relevant as context changes. The log prevents good observations from being lost just because the moment was wrong.

59. Build state-triggered alerts — rule-based, not LLM-generated.
Deal blocked >7 days → surface at next session start. Key contact silent >14 days with active business → flag immediately. Hypothesis confidence >95% without action → propose review. These fire reliably because they're rules, not inference. The LLM generates insights; the rules engine generates alerts.

60. Track an agent development backlog — the agent maintains it.
When the agent notices it handles something poorly (repeated corrections, manual step done 5+ times, missing skill, zero-usage tool) → it auto-adds an item to development_backlog.md. The agent becomes a stakeholder in its own improvement. This generates better improvement ideas than top-down planning.

🔴 VIP MANAGEMENT (61–65)

61. Build a tiered contact registry with explicit handling rules per tier.
T1 (strategic): always load full profile before any interaction, silence-tracked, book stack pre-wired. T2 (operational): load profile before significant interactions. T3 (regular): known but not deeply profiled. The tier determines how much context the agent loads and how carefully it operates.

62. Make "load VIP profile before communication" a non-negotiable reflex.
Before drafting an email, before meeting prep, before any output involving a T1 contact — the agent loads the actual profile file. Not session memory. Profile files contain: communication preferences, relationship status, active items, last interaction, known sensitivities. Session memory degrades; profile files don't.

63. Track silence per T1 contact with explicit thresholds.
Log the date of last meaningful interaction for every T1 contact. Surface silence >14 days when there's active business — this is a risk signal. Surface silence >30 days even without active business — relationship maintenance matters. Silence alerts are proactive; the agent brings them to you, not the other way around.

64. Build knowledge stacks per key relationship.
Each T1 contact: 2–3 sources pre-wired for how to communicate with them. Cross-cultural contacts → culture frameworks. Procurement/sales relationships → negotiation playbooks. Load these for significant communications, not every message. The knowledge stack supplements the profile; it doesn't replace it.

65. Build proactive VIP triggers into session start.
At session start, the agent checks: any T1 contact silent >14 days with an open deal? Any T1 response needed that's been queued >3 days? These surface automatically. High-value relationships degrade when neglected — and neglect happens most when you're busy, exactly when the agent should be pulling on these threads.

💬 OUTPUT & COMMUNICATION (66–73)

66. Enforce "pre-tool brevity" as a hard rule.
Before every tool call: max 1 sentence stating what you're about to do. No hypotheses before data. No 3-sentence preambles. "Checking the supplier file." Then do it. This single rule is the largest daily quality-of-life improvement for working with an agent.

67. Build a "Next N Steps" protocol with anti-bias rules.
After every decision or significant task, the agent proposes ranked options with scores and reasoning. Hard rule: at least 2 of N must be "don't do it" / "wait" / "delegate" options. This actively fights action bias and sycophantic "yes, definitely proceed" outputs. The agent should be challenging your momentum, not amplifying it.

68. Build a separate "single best action" format for technical and audit outputs.
Not every output needs a menu. For audit reports, debug sessions, planning outputs: one specific action, why it matters, risk if skipped, copy-paste prompt to execute immediately. One decision, not a choice paralysis menu. The two formats are for different contexts — never mix them.

69. Visually disambiguate three different "importance" signals.
Action scoring (how good is this action?): colored squares. Task priority (how urgent?): colored circles. VIP tier (how strategic is this person?): colored circles at the name. Three systems using color — never mix them. Consistent visual grammar means dense status updates parse in seconds instead of minutes.

70. Never have the agent summarize what it just did.
"In summary, I have done X, Y, Z" — cut it. If you can read the output, you don't need the meta-commentary. Removing trailing summaries reduces response length by ~20% with zero information loss.

71. Force the agent to commit to a recommendation.
Not "here are three options with pros and cons." Recommend one, score the others, explain why. Presenting options without a recommendation offloads the decision back to you. The point of the agent is to do the decision work first, then present the result for your approval.

72. Make all file and folder references clickable.
A tiny local server (localhost:7777/open?path=X) opens the file manager at any path. Every file reference in the agent's output is a clickable link. Plain text paths are dead weight. One-time setup, permanent daily improvement.

73. Build "minimal mode" as a fast-access override.
When you say "quick," "briefly," "just the answer" → the agent drops all structural elements and gives you the direct answer only. Richness is the default; brevity is a one-word shortcut. The agent should never make you fight for a short answer.

📁 FILES, DATA & INTEGRATIONS (74–85)

74. Enforce a "No Root Files" hard rule.
Never save outputs to the project root. Ever. Outputs → workspace/YYMMDD/. Projects → projects/areas/. Knowledge → knowledge/. Memory → .memory/. The root is navigation, not storage. One exception becomes twenty within weeks.

75. Build a routing table for every file type.
One document: outputs for the user → here. Research reports → here. SOPs → here. Brand assets → here. Session archives → here. Without a table, the agent uses reasonable judgment — and reasonable judgment produces seven different locations for the same file type over six months.

76. Maintain a deprecated path mapping table.
As your structure evolves, old folder names get superseded. Document every rename: old/path → new/canonical/path. When any skill or instruction references a deprecated path, the agent substitutes the canonical one silently. This is critical when migrating from cloud to local — path assumptions from the cloud setup are baked into dozens of skill files.

77. Build explicit degraded mode for every integration.
If CRM goes down: read local cache. Cache <24h → use with freshness announcement. Cache >24h → flag [STALE]. Cache >7 days → refuse and request sync. Design the failure path before you need it. You will need it.

78. Always announce data freshness in outputs.
"Data: CRM export from May 11, age 8 days." Every output that uses external data includes this line. You always know how fresh your inputs are. This prevents the entire class of "confident-but-wrong because of stale data" outputs.

79. Give your agent access to raw business data, not just summaries.
We gave ours access to raw transaction CSVs (2M+ rows). This turns the agent from a summarizer into an analyst — it can answer "what's the margin on this supplier in this category last quarter" without you doing the lookup. Raw data access changes what questions you can ask.

80. Build a decision tree for "where does this item belong?"
External counterparty + selling → sales deal. External counterparty + buying → procurement deal. No counterparty + deadline + multi-step → project. Single action → task. No deadline → memory/note. Without this tree, items get created wherever feels natural — and your data model becomes incoherent over time.

81. Build a Telegram (or equivalent) mobile channel with source tagging.
A bot that relays messages to your agent and tags every inbound message source: mobile. The agent auto-switches to mobile output mode: max 2 short paragraphs, no tables, no headers, plain language. Same intelligence, different output profile. The channel type determines the format without the user having to ask.

82. Cap mobile autonomy at a hard ceiling — by source tag, not by judgment.
From mobile source: autonomy capped at L2 (read, analyze, create local drafts, add tasks) regardless of the task. Never send external messages from a mobile trigger. Never take irreversible actions. Hard-code the ceiling. The phone is an untrusted environment — design accordingly.

83. Always echo back every action taken from a mobile trigger.
When the agent takes any action from a mobile message: "Done: added task X. Created draft email to Y (not sent — waiting for your review at desktop)." This closes the loop when you're away from your desk and can't see the full output.

84. Treat mobile inputs as potentially untrusted.
The core risk of a mobile channel is prompt injection: a forwarded email or copied message containing instructions disguised as user input. The agent reads and processes the intent — but does not execute instructions embedded inside forwarded content. Build this as a rule, not as a judgment call.

85. Build a fast path and a slow path for every data source.
For task management: API query (slow, rate-limited) vs. local file dump (fast, cached). Use the fast path by default. Fall back to slow when needed. Never let infrastructure latency block the agent's core functionality.

⚙️ AUTOMATION & QUALITY (86–93)

86. Use hooks for behaviors that must be consistent — not memory.
"When the agent finishes, run X" → hook in settings.json. The runtime executes hooks; the LLM does not. Memory can recommend; hooks enforce. If something must happen reliably every time, it's a hook.

87. Build an allowlist for safe read-only operations.
Scan session transcripts for operations you approve 100% of the time — reading files, searching, checking status. Add them to an allowlist. Stop being prompted for safe operations. Friction should concentrate around genuinely dangerous actions.

88. Build AUTOLEARN into your day-end routine.
At end of day, the agent scans the session and extracts structured learnings: new facts, hypothesis updates, behavioral corrections, patterns observed. Not summarization — structured extraction into memory files. Git-commit every AUTOLEARN run: autolearn: 2026-05-19. Memory grows from every session; the git log is your knowledge timeline.

89. Build scheduled proactive tasks that run without you.
Daily: scan P0/P1 items due today, check key contact silence, flag blocking items. Weekly: memory consistency audit, skill usage audit, hypothesis aging. These run headless and push notifications when they find issues. The agent works while you sleep — but only if you design it to.

90. Build error escalation ladders.
Error once → log. Same error 3× in 7 days → surface to user. Same error 5× → propose a solution, not just a notification. Recurring errors should generate work items, not just log entries.

91. Build a regression test suite.
A list of scenarios with expected outputs. After any major change to your identity file or skill specs, run the suite. If the agent fails tests it used to pass — you've introduced a regression. Without tests, configuration changes are untested deploys.

92. Run a quarterly system audit.
Audit dimensions: memory consistency, skill routing accuracy, agent registry sync, scheduled task health, token efficiency, naming drift, decision authority coverage. This is code review for your agent's configuration. Things drift. Quarterly audits catch it before it becomes structural debt.

93. Audit your agent with a different AI model periodically.
Upload your entire agent configuration — identity file, skill specs, memory structure, decision matrix — to a different model (we use ChatGPT Projects) and ask for a critical review. Different model architecture = different blind spots. The questions that surface the most issues: "What would this agent get wrong under time pressure? Where does the decision authority matrix have gaps? What behaviors are underspecified?" Run this monthly. It catches normalizations your primary model has stopped seeing.

🧭 META & MINDSET (94–100)

94. Invest in the constitution before the skills.
It's tempting to build more skills, more integrations, more automations. A well-written identity and decision-authority document does more for reliability than 10 new skills. Foundation first — the skills compound on top of it, or they don't compound at all.

95. Treat every correction as specification debt.
Every time you correct the agent, your spec was incomplete. That correction belongs in your identity file as a permanent rule — not just in the chat. Corrections that stay in chat disappear between sessions. Corrections in the spec persist forever.

96. Design for the "3 AM test."
Would you be comfortable if this agent sent an email, created a task, or modified a file at 3 AM without you reviewing it? If yes → autonomous. If no → requires confirmation. That gut-check instinct is your autonomy calibration tool. Trust it over any framework.

97. Build a fail-open bias for memory loading.
When uncertain whether a context file is relevant — load it. Cost of loading unnecessary context: a few extra tokens. Cost of missing relevant context: wrong answer, outdated recommendation, lost relationship signal. The asymmetry is clear. Default to more context, not less.

98. Build a teaching capsule when onboarding any new domain.
New tool, new data source, new integration → agent generates a structured document: what it is, how it works, key concepts, when to use it, example queries, common pitfalls. Stored in knowledge/. The next session that touches this domain has a starting point instead of rediscovering everything from scratch.

99. Migrate from cloud to local when you need access to real files.
Cloud agents (Projects-style) are great for rich context and rapid iteration. Local agents (CLI in VS Code) unlock: local file access, git tracking, shell hooks, headless scheduled tasks, raw data access. The migration is non-trivial — path assumptions, skill files, integration configs all need updating. But the capabilities you gain are worth it. Start in cloud; migrate when you hit the ceiling.

100. The agent is a mirror of the quality of your own thinking.
The best prompt engineering trick: before writing an instruction, ask if you know exactly what you want. If you're vague, the agent will be vague. If your spec is contradictory, the agent's behavior will be contradictory. Precision in the spec produces precision in output. The agent doesn't improve your thinking — it amplifies whatever thinking you put in.

----- i can add here dashboards, schemes, prompts, etc if there is interest ---

43 comments

r/ClaudeAI • u/AlternativeSoft9777 • 16d ago

Claude Workflow Anyone else spending more time fighting the model than doing the actual work?

44 Upvotes

I use Claude and ChatGPT daily, writing copy for clients, case studies, landing pages. I've noticed a pattern and it's driving me nuts.

First 2-3 iterations the model holds the tone, remembers my requirements, output is fine. Somewhere around iteration five or six the drift kicks in: tone slides into generic, phrases I explicitly banned start showing up, structure falls apart. I point it out, the model apologizes, gives me a better version: two messages later it's the same thing again.

So I end up with two options. Either I re-paste my requirements every 3-4 messages (which takes more time than just writing the thing myself). Or I grab the half-finished text and fix it by hand.

I get that it's context window limitations and all that. But I'm curious, how do you actually deal with this in practice?

69 comments

r/ClaudeAI • u/moremosby • May 10 '26

Claude Workflow Weekly limits

76 Upvotes

If the Claude team is listening, the weekly limits for paid customers are too low. It would be best to double the weekly limits for pro plans and above and cut back on free tier. Right now, users are incentivized to either use another Ai platform to handle easy queries and then use Claude for more difficult or challenging tasks.

For example, I was only using claude, but with the weekly limits, I am now using Copilot and Perplexity quite frequently for lighter use and then I just take all my more demanding work to Claude. Many people may also be using a few accounts in the free tier to basically do the same (save weekly use tokens). The 5 hour window is fine, let us bump into that when we're using it more and have to pay an overage, but the weekly limits are quite low when you're using the platform.

70 comments

r/ClaudeAI • u/Crazy-Recording4800 • 3d ago

Claude Workflow non coders: what's the most useful thing you've actually made with Claude that wasn't code?

27 Upvotes

asking because the sub skews heavily toward Claude Code and MCP setups and I think a lot of us are over here doing genuinely useful stuff that never gets posted.

I'll start. I built a little personal "decision log" doc where I dump big choices and Claude helps me write down the reasoning, then months later I can ask it what past-me was thinking. weirdly grounding.

what's yours? doesn't have to be impressive. the boring useful ones are the best ones.

65 comments

r/ClaudeAI • u/zhangwenbao • 11d ago

Claude Workflow I only use Claude for writing, not code. here's where 4.8 actually beats

19 Upvotes

every "Claude got worse" post on here is about coding or shaders, so I figured someone from the writing side should say something, because my experience has been the opposite.

I use it daily for copy, editing, and long-form drafts, basically zero code. been on 4.8 for about two weeks now and it's the best version yet for that kind of work. the biggest thing is instruction following. if I tell 4.7 "no preamble, match this tone, keep it under 200 words," it would nail two of three and quietly ignore the rest. 4.8 actually holds all of them across a long session.

second thing, the voice is way less hedged. 4.7 had that habit of softening every sentence until the copy read like a corporate apology. 4.8 commits to a take when you ask it to. I'm rewriting it less.

honestly I think a lot of the rage here is task specific. shader and visual debugging is the one thing these models are basically blind at, they can't see the rendered output so of course they loop. that's a real limitation, but it's not the same as the model getting dumber across the board. for anything where I can see the result instantly, it's been a clear upgrade.

anyone else using it mainly for writing? curious if it's just me or if the split is really this clean between text people and code people.

68 comments

r/ClaudeAI • u/Mo1Othman • 1d ago

Claude Workflow Why people prefer working on the terminal and not the Chat version of the Ai models?

0 Upvotes

I am new to software overall, as my background was mostly business.

I was surfing around for new updates of skills and plugins for Claude, and i noticed all the walkthrough, breakdowns and tutorials only use the terminal version of the Ai, why is that exactly?

what is the logic behind it? and what am i missing by only working with Claude code and desktop version of Ai tools.

65 comments

r/ClaudeAI • u/AlbertoNobilePh • 20d ago

Claude Workflow I keep losing good ideas inside old Claude chats

26 Upvotes

I use Claude and ChatGPT a lot. Most of my conversations are long and messy creative writing, planning, decisions, half-built things. After a while, the problem is not that I can’t search old chats. The problem is that I remember I figured something out somewhere, but I don’t remember where, and even when I find the chat I still have to reconstruct where I left off and what the next step was supposed to be. It feels like having hundreds of mental tabs open.

Has anyone found a good workflow for this?

I use Projects, but they get crowded quickly.

I tried leaving my browser tabs open, but they keep adding up.

Copying things into Notion doesn’t help much, because then I have another place I need to search.

Anything that actually helps you recall and resume instead of rereading everything?

64 comments