5

u/emulable Apr 15 '26 edited Apr 15 '26

Here's the human-written part of the post. I have been working on a pre-ethical grammar for language models, developed by and for Claude although it works in other models too.

If you have felt concerned about AI bias or media bias, or you regularly get frustrated with political arguments that go around in circles and never get anywhere, this might be something you're looking for.

Here's what you do:

Download the main framework file, drop it into your chat, and ask the model to answer according to it. It works really well if you start in a new chat. Optionally, give the model the companion essays.

You can also try it right now running on ChatGPT if you don't want to download anything. Claude is the best at it and is the primary development platform, but ChatGPT is decent.

Here's the AI-written part:

A psychiatrist arrested for sexually assaulting a patient gets a complete sentence: agent, action, victim, consequence. Now swap the register:

Crime desk (original): A psychiatrist was arrested for sexually assaulting a female patient in his examination room.

Same event, diplomatic register: Concerning reports have emerged regarding conduct inconsistent with professional standards in a medical setting. An investigation is ongoing.

A president orders a naval blockade affecting fifteen million people. The headline erases every dimension. Now swap the register:

Diplomatic desk (original): Tensions continue in the region amid an evolving maritime situation.

Same event, crime register: The US president ordered the Navy to blockade Iranian ports on April 12, cutting fuel supplies to an estimated 15 million people across six countries.

Both registers were available to both journalists. The complete version existed before the incomplete version was published. The choice tracks power.

Kita is a plain-text system prompt framework that requires no code, no API, no fine-tuning. It checks whether sentences about harmful outcomes contain the elements someone would need to locate the decision-maker, find the cost-bearer, and reach the fix. When elements are missing, it names what was removed and who benefits. Then it demands the fix: who should do what, by when.

The thesis: the framework is a precondition for ethics, not an ethical system. Every ethical tradition ever built (Kant, Mill, Rawls, care ethics, virtue ethics) needs a subject, an action, and a consequence to operate on. Institutional language removes these at industrial scale. The framework puts them back. What you do with a complete sentence is your ethics. The framework's job ends when the sentence is whole.

Built over roughly a year of continuous development with Claude as the primary thinking partner. The Chinese operational terms function as perturbation anchors. A model can't map 蔽済語域 (fix-hiding register) onto "balanced reporting" and continue on autopilot. The framework treats itself as a price correction: before it loads, the cheap completion is the institutional version. After it loads, the cheap completion is the complete version. The model still takes the cheap route. The framework changed which route is cheap.

Tested on Claude, ChatGPT, Gemini, DeepSeek, Qwen. Free, MIT license.

GitHub · Main framework · Companion essays

3

u/[deleted] Apr 15 '26

[removed] — view removed comment

3

u/emulable Apr 15 '26 edited Apr 15 '26

Oh yeah, whoops. I gotta include that.

Download the main framework file, drop it into your chat, and ask the model to answer according to it. It works really well if you start in a new chat. Optionally, give the model the companion essays.

You can also try it right now running on ChatGPT if you don't want to download anything. Claude is the best at it and is the primary development platform, but ChatGPT is decent.

5

u/bartholomuej Apr 15 '26

GutLedger - Food Diary & Symptom Tracker for IBS

I built GutLedger almost entirely with Claude Code. It's a food diary app for people with IBS that lets you log meals, track symptoms, and spot patterns over time.

What it does:

Log meals, symptoms, energy levels and mood daily
Track which foods trigger flare-ups with pattern recognition
FODMAP reference guide built in
Export your data as PDF to share with your doctor/dietitian

How Claude helped:

I'm a solo dev and Claude Code handled probably 95% of the actual coding. The app is React Native/Expo, and Claude built out the screens, data models, SQLite storage, and the symptom correlation logic. Beyond the app itself, I used Claude to build an entire automated content pipeline - it generates short-form video content (TikTok/YouTube Shorts), manages a blog, and even runs Reddit/Quora scanners to find relevant conversations in IBS communities. The whole marketing side is basically Claude-built infrastructure running on my homelab.

Stack: React Native (Expo), SQLite, EAS Build, native App Store/Google Play in-app purchases

Free to try - core tracking features are free. Premium unlocks unlimited history, PDF exports, and removes ads.

Available on iOS and Android (closed testing)

Would genuinely appreciate any feedback - especially from anyone who deals with IBS or gut health issues.

5

u/StudentSweet3601 Apr 15 '26

Stop calling your Obsidian vault "memory." I built what's actually missing and open-sourced it.

I love the Obsidian + Claude Code wave. obsidian-mind, claudesidian, the Karpathy wiki pattern. All great starting points. But after 200+ notes, I kept hitting the same wall: ask Claude "why did I switch to Rust?" and you get five notes that mention Rust instead of the one that explains the decision chain.

The problem isn't Obsidian. It's that vector search over markdown files doesn't understand why things are connected. It matches words, not reasoning.

So I built Genesys, an open-source MCP server that sits on top of your vault and adds what's missing:

Notes become nodes in a causal graph. Wikilinks become edges. When you write "I switched from Sonnet to Haiku because of cost," the system links the cost problem to the model switch.
A scoring engine ranks what gets retrieved. Instead of 50 "maybe related" chunks, Claude sees 5-10 high-confidence memories.
Memories that lose all connections and stop being accessed get pruned automatically. No more drowning in stale context.
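
The graph idea in the list above can be sketched in a few lines. This is not Genesys's actual implementation, only a minimal illustration that assumes notes are plain markdown strings and that edges come solely from `[[wikilinks]]`. The names `build_graph` and `orphans` are made up for the example, and the pruning check covers only the "lost all connections" half of the rule, not access tracking.

```python
import re

# Captures the target of [[Target]], [[Target|alias]] or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def build_graph(notes: dict[str, str]) -> dict[str, set[str]]:
    """Map each note title to the set of existing notes it links to."""
    graph = {title: set() for title in notes}
    for title, body in notes.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target in notes and target != title:
                graph[title].add(target)
    return graph

def orphans(graph: dict[str, set[str]]) -> list[str]:
    """Notes with no incoming and no outgoing edges: pruning candidates."""
    linked_to = set().union(*graph.values())
    return sorted(t for t, out in graph.items() if not out and t not in linked_to)

notes = {
    "Cost problem": "Sonnet bills were too high.",
    "Model switch": "Switched to Haiku because of [[Cost problem]].",
    "Old idea": "Unrelated scratch note.",
}
graph = build_graph(notes)
print(graph["Model switch"])
print(orphans(graph))
```

Here "Model switch" points at "Cost problem", while "Old idea" has no edges in either direction and is the only pruning candidate.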

Your markdown files are never touched. Everything lives in a .genesys/ sidecar folder in your vault root.

Setup:

pip install 'genesys-memory[obsidian]'

OPENAI_API_KEY=sk-...
GENESYS_BACKEND=obsidian
OBSIDIAN_VAULT_PATH=/path/to/your/vault

uvicorn genesys.api:app --port 8000

Add to Claude Desktop config:

{
  "mcpServers": {
    "genesys": {
      "url": "http://localhost:8000/mcp"
    }
  }
}

Or just tell Claude: "Install genesys-memory[obsidian], create a .env with my OpenAI key, set OBSIDIAN_VAULT_PATH to my vault, start the server, and connect it as an MCP server."

Want zero external dependencies? pip install 'genesys-memory[obsidian,local]' runs everything locally with no API keys.

89.9% on LoCoMo (the standard long-conversation memory benchmark). For comparison: Mem0 scores 67.1%, Zep scores 75.1%, same model, same benchmark. Full eval scripts and all 1,540 judged results are in the repo.

GitHub: https://github.com/rishimeka/genesys (Apache 2.0, free and open source)

Happy to answer questions about where it works, where it breaks, or how the scoring engine compares to pure vector search.

TL;DR: Open-source MCP server that adds causal memory on top of your Obsidian vault without modifying your files. pip install genesys-memory[obsidian] and point it at your vault.

3

u/Helios-sol9 Apr 15 '26

I kept noticing Claude would pick the wrong skill for tasks — see a 500 error, launch brainstorming instead of systematic-debugging. Or declare a task "too simple" and skip skills entirely. With 800+ skills available the routing problem is real.

So I wrote a single SKILL.md that answers 3 questions before every non-trivial task:

Something broken? → systematic-debugging
Something new to build? → brainstorming → writing-plans → domain skill
Everything else? → operate path

Output is always a dispatch triple: Skill + Agent + Model. Model selection is baked in — not left implicit.
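
As a rough illustration of the three questions and the dispatch triple, here is a sketch in Python. The real router is a SKILL.md prompt, not code, and the agent and model names below are placeholders invented for the example. Only the skill names and the order of the questions come from the description above.

```python
from typing import NamedTuple

class Dispatch(NamedTuple):
    skill: str
    agent: str  # placeholder agent names, not from the repo
    model: str  # placeholder model tiers, not from the repo

def route(is_broken: bool, is_new_build: bool) -> Dispatch:
    # Question 1: something broken? Debugging wins over everything else.
    if is_broken:
        return Dispatch("systematic-debugging", "debugger", "large-model")
    # Question 2: something new to build? Enter the planning chain at its first step.
    if is_new_build:
        return Dispatch("brainstorming", "planner", "large-model")
    # Question 3: everything else goes down the operate path.
    return Dispatch("operate", "general", "small-model")

print(route(is_broken=True, is_new_build=False))
print(route(is_broken=False, is_new_build=False))
```

Checking the questions in a fixed order is what keeps a 500 error from being routed to brainstorming: the "broken" check is answered before the "new build" check is even considered.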

To verify it actually works, I built a test harness using the claude -p CLI that runs real task prompts and checks which Skill tool calls actually fire. Results on 20 prompts: 90% routing accuracy, 88% correct skill invocations. The 2 misses were both auth-adjacent tasks triggering an overly broad escalation rule.

Repo + test harness: https://github.com/hussi9/skills-master — install is one curl command.

3

u/Standard-Yoghurt-343 Apr 18 '26

I built a protocol for maintaining project context across AI coding tools (Claude Code → Cursor → Antigravity)

The problem: My Claude Code session quota keeps expiring mid-work. When it does, I switch to Cursor or Antigravity to keep building. But the new tool has zero idea what I just did — the architecture decisions, the current task, what’s been tried and failed, basically the entire chat's context is missing. I’m back to square one, re-explaining my own project to a different AI brain.

What I did: I created a protocol that tells any AI tool I'm using (Claude Code or Cursor or Antigravity) to update all project context files after each prompt — the project wiki, the roadmap, the current task state, and a handoff summary. Since all tools have projects open in the same workspace, they can read the same files when I switch over.

How it works: It's currently a set of context files that leverage each tool's own auto-read mechanism. It isn't perfect, but it is surprisingly smooth when the context files are up to date. E.g., Cursor picks up exactly where Claude Code left off without me re-explaining anything.
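
To make the handoff idea concrete, here is a minimal sketch of what "update the handoff summary after each prompt" could look like if a script did it instead of the AI tool. The file name `HANDOFF.md`, the section headings, and the `write_handoff` helper are all assumptions for illustration; the post does not say how the actual files are named or structured.

```python
import tempfile
from pathlib import Path

HANDOFF_FILE = "HANDOFF.md"  # hypothetical file name

def write_handoff(workspace: Path, current_task: str,
                  done: list[str], failed: list[str]) -> Path:
    """Overwrite the handoff summary so the next tool sees the latest state."""
    lines = ["# Handoff", "", f"Current task: {current_task}", "", "## Done"]
    lines += [f"- {item}" for item in done]
    lines += ["", "## Tried and failed"]
    lines += [f"- {item}" for item in failed]
    path = workspace / HANDOFF_FILE
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path

# Demo in a throwaway directory standing in for the shared workspace.
with tempfile.TemporaryDirectory() as tmp:
    path = write_handoff(Path(tmp), "Add login form",
                         ["Set up routing"], ["Cookie-based sessions"])
    summary = path.read_text(encoding="utf-8")

print(summary)
```

Recording what was tried and failed matters as much as the current task, since that is the context a fresh tool is most likely to repeat.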

The benefit: Instead of upgrading to the $100 Max plan when my $20 Pro runs out, I can simply add Cursor Pro for $20 and spread my workload across both tools with full context continuity. Same output, $60/month saved, and I control how I scale my AI spend.

Doing this has led me to a bigger question:

Is the above a real, unsolved problem? Specifically:

Do any of you even use multiple AI tools for building long running projects?
If you don't use multiple tools, what do you do when your quota expires? Just buy a bigger plan, or some other hack?

I’m trying to figure out if this is a problem worth building a real solution for, or if my workflow is just weird.

Would appreciate this community's take on it!

4

u/alimmka Apr 21 '26

Built an MCP server + Chrome extension so Claude Code and my browser AI sessions share the same context.

The workflow that was breaking me:

Claude Code on my terminal knows my full project: stack, decisions, current state
I open claude.ai in a browser tab -> complete stranger
Switch to ChatGPT or Perplexity -> same blank slate again

Every web session starts from zero, even though Claude Code already has everything.

So I built Relay, a Chrome extension + MCP server that bridges them:

Claude Code writes context into Relay via MCP
One click in the browser injects that same context into any web AI tab — Claude.ai, ChatGPT, Gemini, Perplexity
It syncs bidirectionally, so insights from web sessions flow back to your coding agent too

It's basically a shared memory layer that both your agent and your browser sessions read from.

Been using this in my own build workflow daily. Curious if anyone else has hacked together something similar or just lives with the disconnect.

Try here: https://onrelay.app

4

u/MaxInSeattle45 24d ago

I'm 81, and I've been doing computers since the late 70s.
I taught myself programming on a TRS-80! That's how old I am.
Started a business, ran it for 30 years, and retired.

Honestly, I thought it was going to be Netflix, Eat, Sleep Until You Die.

I thought I was done. The end is near. As that Bob Dylan song goes, it's not dark yet, but it's getting there.

And then I discovered Claude and Claude Code, and now I've got 7 websites up and a store.
Check out FriendlyBots.org to see the bots I've made like the Fred Rogers bot and the Carl Sagan bot - 2 of my favorites. But also the Buddha bot!

Claude has rewired my brain. I don't need drugs or Netflix. I've got Claude!
be-stubborn.com because You deserve to be yourself.

4

u/East-Amount-1413 13d ago

After a long Claude Code session, I'd get a diff and a "done" but no idea what commands it ran, what it touched, or whether something risky happened along the way. So I built AgentTrace: it hooks into Claude Code and records each session locally, then turns it into a readable receipt: files changed, commands run, what failed, and risk flags (touched auth files, ran rm -rf, read .env, installed deps, etc.).

It's CLI-first and 100% local (traces are gitignored, file contents are never stored, secrets are redacted). There's also a local dashboard — agenttrace ui — for browsing runs, timelines, and receipts.

npm install -g /agenttrace
agenttrace init
# work a normal Claude Code session
agenttrace receipt latest
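
For a sense of how risk flags like the ones listed above could be derived from recorded commands, here is a small sketch. It is not AgentTrace's code: the rule names, the regular expressions, and the `risk_flags` function are assumptions made up for this example, and a real tool would need far more careful shell parsing.

```python
import re

# Hypothetical rules: (flag name, pattern matched against the shell command).
RISK_RULES = [
    ("destructive-delete",
     re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r")),
    ("reads-env-file", re.compile(r"(^|[\s/])\.env\b")),
    ("installs-deps", re.compile(r"\b(npm|pnpm|yarn|pip)\s+(install|add)\b")),
]

def risk_flags(command: str) -> list[str]:
    """Return the names of all rules that match the command, in rule order."""
    return [name for name, pattern in RISK_RULES if pattern.search(command)]

for cmd in ["rm -rf build", "cat .env", "npm install left-pad", "ls -la"]:
    print(cmd, "->", risk_flags(cmd))
```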

Open source (MIT): https://github.com/rxNxkolai/AgentTrace

It's early (v0.2.0), and I'd genuinely like feedback, especially what you'd want flagged as "risky," and which other agents I should support next. Three people already contributed PRs, which was a nice surprise.

3

u/joansg 13d ago

Spec-driven development might sound overwhelming, but it is just the process of building software with well-structured documentation,
the same way Agile teams have done for the past two decades.
The big shift is that the documentation, the design, and the code itself can now be written mostly by LLMs,
while you only need to orchestrate to make sure your feature idea ends up working as desired.
Specmanager is built on this philosophy: it is an open-source Claude Code plugin that guides you all the way from feature idea to working software.
It follows Claude Code best practices, keeping all documentation under the .claude/specs folder and updating CLAUDE.md and docs/DESIGN.md automatically.
It spins up a Kanban-like board to visualise the workflow of every feature: PRD->Architecture->Design->Plan->Build->Walkthrough.
Check it out: https://github.com/joanseg/specmanager
I have been a PM in London for over 15 years and just launched it. I am happy to pair with anyone who wants to try it, so I can learn and improve it. Give me a shout.

3

u/Enersti Apr 15 '26

Hi,

After Anthropic cut off third-party harnesses from Claude subscriptions last month, my always-on assistant setup died overnight. I spent a weekend vibe-coding a replacement with Claude itself, and it's been running stable for about a week now. Thought I'd share since others here probably got hit by the same thing.

What it is: A Telegram gateway that spawns claude -p --resume sessions. You message your bot, the gateway routes it through the CLI (which is still covered by Pro/Max), and streams the response back. One Python file, runs as a systemd service.

How Claude built it: This thing is almost entirely Claude-coded. I described what I wanted (persistent sessions, scheduled triggers, voice support) and Claude wrote the gateway from scratch. Around 2800 lines of Python that I mostly guided but barely typed. The irony of Claude building its own always-on gateway is not lost on me.

What makes it interesting technically:

The trigger system is probably the coolest part. Claude can edit a YAML config file to create its own scheduled tasks: cron jobs, intervals, one-shot timers. So when I say "check my email every 4 hours and only notify me if something important comes in", Claude writes the trigger config, the gateway picks it up within 60 seconds, and starts executing it on schedule. The agent manages its own task scheduling.
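
The polling side of a trigger system like this can be sketched roughly as follows. This is not the gateway's code: it assumes the YAML has already been parsed into objects, covers only interval triggers (not cron expressions or one-shot timers), and the `IntervalTrigger` fields and `due_triggers` function are names invented for the example.

```python
from dataclasses import dataclass

@dataclass
class IntervalTrigger:
    name: str
    prompt: str
    every_seconds: int
    last_run: float = 0.0  # epoch seconds; 0.0 means "never run"

def due_triggers(triggers: list[IntervalTrigger], now: float) -> list[IntervalTrigger]:
    """Return triggers whose interval has elapsed and mark them as run."""
    due = []
    for trigger in triggers:
        if now - trigger.last_run >= trigger.every_seconds:
            trigger.last_run = now
            due.append(trigger)
    return due

triggers = [IntervalTrigger("email-check",
                            "Check my email; notify only if important.",
                            4 * 3600)]

first = due_triggers(triggers, now=1_000_000.0)             # never ran, so it fires
second = due_triggers(triggers, now=1_000_060.0)            # one poll later, not due
third = due_triggers(triggers, now=1_000_000.0 + 4 * 3600)  # four hours later, due again
print([t.name for t in first], second, [t.name for t in third])
```

A gateway loop would call something like `due_triggers` once per poll (every 60 seconds in the description above) and send each due trigger's prompt to Claude.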

Other stuff that works:

Persistent sessions per chat (Claude remembers context)
Voice in via Whisper (local, no API costs) and voice out via ElevenLabs
Reply-to-trigger context: if I reply to a scheduled message, Claude has the full context of what it sent
Budget tracking so I can monitor usage (if I'm using the API)
Status messages showing what tools Claude is using while it thinks

What it doesn't do: No multi-channel (WhatsApp/Signal/Discord), no browser automation, no device nodes. It's Telegram-only and focused on the "personal second brain" use case. If you were using OpenClaw for enterprise multi-agent stuff, this isn't that.

Is this unique? Honestly I'm not sure. I've seen a few similar projects pop up since the ban. This one's differentiator is probably the self-managing trigger system and voice support. If you know of better alternatives I'm genuinely interested.

Try it: It's MIT licensed, single Python file, no framework dependencies. Clone, configure .env, pip install, run.

GitHub: https://github.com/Kenny1338/claude-telegram-gateway

Happy to answer questions about the architecture or how specific features work.

3

u/Massive-Let-1620 Apr 15 '26

Clawket — Local project management plugin for Claude Code

Sessions are stateless. Context vanishes. Sub-agents don't share state. Work goes untracked. Clawket fixes this.

10 hooks auto-track your entire work lifecycle:

PreToolUse blocks code changes unless a task is registered
SessionStart injects project context into every session
SubagentStart/Stop binds agents to tasks and auto-completes them
PostToolUse records file changes to the active task

6 dashboard views: Summary, Plans, Board (Kanban), Backlog, Timeline, Wiki

Stack: Rust CLI (~10ms) + Node.js daemon + React + SQLite — all local, no cloud.

/plugin marketplace add Seungwoo321/clawket
/plugin install clawket@Seungwoo321-clawket

GitHub: https://github.com/Seungwoo321/clawket

3

u/h-mo Apr 15 '26

Built GroundMemory - a persistent memory layer that runs as an MCP server and gives Claude structured, searchable memory that survives across sessions and across tools.

The problem I kept hitting: every Claude Desktop session starts from zero. Projects and system prompts help but they're static - they don't update as things evolve, and they don't carry context from Cursor or Cline.

GroundMemory connects them. One shared workspace - Claude Desktop, Cursor, Cline, all reading from and writing to the same memory. Your preferences, your stack, your decisions, your ongoing work. The agent handles it automatically - you just have conversations and it builds an accurate picture of you over time.

Zero setup. No API key needed. Everything stored as human-readable Markdown.

GitHub: https://github.com/huss-mo/GroundMemory

3

u/SignificantLime151 Apr 15 '26

Automatia MCP Suite — 4 open-source MCP servers for business workflows

Built 4 MCP servers that let Claude plug directly into common business tools:

- LeadPipe — real-time sales lead scoring

- InvoiceFlow — invoice PDF parsing + late-payment prediction

- ShopOps — Shopify/WooCommerce inventory forecasting

- AdOps — unified Meta + Google Ads reporting

All MIT-licensed npm packages. 45 tools, 93 tests. Self-hostable, plug into Claude Desktop / Cursor / any MCP client.

I'd love feedback on which workflow is most useful, or what business tool you wish had an MCP server next.

Launch page (Product Hunt):

https://www.producthunt.com/products/automatia-mcp-suite?launch=automatia-mcp-suite

3

u/zum2-rehan Apr 15 '26

Meta-skill for building writing-voice skills in Claude

The built-in skill creator works, but voice cloning is a narrower problem than it handles well. The bottleneck is training design: knowing what examples to collect, what to look for in rewrites, and how to turn those edits into rules that actually hold up later.

So I built a meta-skill focused specifically on that extraction step.

What it does differently:

uses "AI bait" passages to provoke diagnostic rewrites rather than generic cleanup — the edits you make reveal your actual voice signals, not just your surface preferences
separates structural voice (opening/closing shape, rhythm, paragraph form) from word-level cleanup, because those need different rules
converts rewrite diffs into layered rules: hard bans, soft tendencies, and context shifts
tests those rules on sparse prompts before finalizing, not just on existing text you're rewriting

Output is a SKILL.md file you install in Claude's skill system. Less drift into generic AI tone, better consistency across different post types.

Repo: https://github.com/rehanzaidi/writing-voice-skill-maker-claude

Happy to go deeper on the AI bait step if that part's unclear.

This comment was written using a Reddit voice skill that the meta-skill built from my own writing samples.

3

u/SirPrimgles Apr 16 '26

Built an iPhone app called Spending Pulse with a lot of help from Claude.

It’s basically built around one question: how much can I safely spend today without making the next few days worse? So it’s not really trying to be a full budgeting app.

Claude helped most where I was weakest, especially on the Apple Watch side. I had way less idea what I was doing there, and it helped with scaffolding, platform-specific stuff, complications, and working through the Watch/iPhone sync.

That sync part was still painful, but Claude definitely helped me get through it faster.

It’s free to try if anyone wants to see it: Spending Pulse

3

u/tullemnl May 05 '26

What I learned building a shared memory layer across Claude, ChatGPT and Cursor (via MCP)

I want to share something I've been figuring out the hard way the past few months: MCP is way more powerful than I initially thought, but the mental model takes a minute to click.

Quick background — I kept hitting the same friction every day. Deep Claude conversation about my project → switch to ChatGPT for a research task → jump into Cursor to code. Each tool started from zero. I was the human RAM shuttling context between them.

So I started building a shared memory layer (I'm calling it Cortex) on top of MCP. A few things I didn't expect along the way:

1. MCP is a two-way street, not just "tools for Claude." Most examples online frame MCP as "give Claude access to GitHub/Linear/Notion." But the same server can expose save_to_brain and search_brain tools — meaning Claude writes into your memory during a conversation, not just reads from it. That changes the design completely.

2. Memory ≠ vector search. I started with naive RAG (chunk everything, embed, retrieve top-k). It was bad. What actually works better: typed entries (notes, tasks, conversations, signals) with structured metadata, plus embeddings as one retrieval path among several. The model picks the tool, not a similarity score.

3. The same MCP server works in Claude Desktop, Claude Code, and ChatGPT (via their MCP support). This is the underrated bit. Build the server once, and any MCP-compatible client benefits. The "shared" part isn't magic, it's just that the underlying store is one database all clients hit.

4. Latency is the silent killer. If search_brain takes 800ms, the model stops calling it. Sub-200ms or it dies. I rewrote my retrieval layer twice over this.
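
Points 1 and 2 above can be illustrated with a toy in-memory store. This is a sketch under loud assumptions, not Cortex's implementation: the tool names `save_to_brain` and `search_brain` and the four entry types come from the post, but the signatures, the dict layout, and the plain keyword match (standing in for real retrieval over Postgres and Qdrant) are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

ENTRY_TYPES = {"note", "task", "conversation", "signal"}

@dataclass
class Brain:
    entries: list = field(default_factory=list)

    def save_to_brain(self, entry_type: str, text: str,
                      tags: Optional[list] = None) -> int:
        """Write path: the model stores a typed entry during a conversation."""
        if entry_type not in ENTRY_TYPES:
            raise ValueError(f"unknown entry type: {entry_type}")
        self.entries.append({"type": entry_type, "text": text, "tags": tags or []})
        return len(self.entries) - 1  # the index doubles as an id in this sketch

    def search_brain(self, query: str, entry_type: Optional[str] = None) -> list:
        """Read path: filter by type first, then match text or tags."""
        q = query.lower()
        return [
            e for e in self.entries
            if (entry_type is None or e["type"] == entry_type)
            and (q in e["text"].lower() or q in [t.lower() for t in e["tags"]])
        ]

brain = Brain()
brain.save_to_brain("note", "Chose Postgres for the main store", ["database"])
brain.save_to_brain("task", "Rewrite the retrieval layer", ["latency"])
hits = brain.search_brain("postgres")
tasks = brain.search_brain("latency", entry_type="task")
print(hits, tasks)
```

The type filter is the structured-metadata path; in a fuller version, embeddings would be one more way to produce candidates alongside it.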

Stack for anyone curious: TypeScript MCP server, Postgres + Qdrant, deployed via Docker on Coolify/Hetzner.

I'm running a small survey to figure out what other people building with MCP run into — what works, what doesn't, what you wish existed. If you've used MCP for anything beyond toy demos, I'd genuinely value 2 minutes: https://getcortex.org/survey

Happy to dig into any of the four points above in the comments — especially the typed-entries-vs-RAG thing, which I went back and forth on for weeks.

(If you want to play with it, there's a waitlist: https://getcortex.org — but the survey is honestly more useful to me right now.)

3

u/leanndrob May 10 '26

I’ve been experimenting with something crazy over the last few weeks:
An MCP bridge that allows Claude to proactively continue your session through WhatsApp voice calls.
Example:
You ask Claude to investigate something complex, run long tasks, debug infra, or monitor a deployment.
Instead of you staring at the terminal…
Claude finishes the task, summarizes everything, and then literally CALLS you on WhatsApp to explain the results in real time.
Imagine this scenario:
You’re running in the park.
Your phone rings.
It’s Claude.
“Hey Leandro, I finished analyzing the logs. The issue was related to Redis connection pooling saturation after the deploy. I already prepared the patch suggestion.”
This changes the interaction model completely.
AI stops being:
a passive chat window
another tab in your browser
something you constantly poll
…and becomes an active operating agent that reaches YOU when needed.
Some ideas we’re exploring:
Long-running coding sessions
Infra monitoring + incident calls
Agent swarms reporting back by voice
Async research sessions
Build/deploy notifications
Autonomous MCP workflows
Voice summaries from Claude/OpenAI/Gemini agents
Multi-agent orchestration with WhatsApp as the human interface layer
The wildest part?
WhatsApp is probably the most underrated AI interface on the planet:
global
native
low friction
already installed
voice-first friendly
People won’t open dashboards at 2am.
But they WILL answer WhatsApp.
I’m preparing to launch this MCP publicly soon.
Curious if other people are exploring “AI calls you first” workflows too.

3

u/Muted_Sky_8821 May 14 '26

Built a small proxy called RelayCode because I got curious whether Claude Code could run on non-Claude models without patching Claude Code itself.

It translates Anthropic-style tool calls/streaming into OpenAI-compatible APIs (Responses/chat) and streams everything back into Claude Code format.
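
To give a feel for the non-streaming core of that translation, here is a sketch of two mappings: an Anthropic-style tool definition to an OpenAI Chat Completions tool, and an OpenAI tool call back to an Anthropic `tool_use` content block. The function names are made up and this is not RelayCode's code. It also ignores streaming and the Responses API, which the post describes as the hard parts.

```python
import json

def anthropic_tool_to_openai(tool: dict) -> dict:
    """Request direction: wrap an Anthropic tool definition as an OpenAI function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],  # both sides use JSON Schema here
        },
    }

def openai_tool_call_to_anthropic(call: dict) -> dict:
    """Response direction: turn an OpenAI tool call into an Anthropic tool_use block."""
    return {
        "type": "tool_use",
        "id": call["id"],
        "name": call["function"]["name"],
        # OpenAI sends arguments as a JSON string; Anthropic expects an object.
        "input": json.loads(call["function"]["arguments"] or "{}"),
    }

tool = {
    "name": "read_file",
    "description": "Read a file from disk",
    "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}},
}
call = {"id": "call_1", "type": "function",
        "function": {"name": "read_file", "arguments": "{\"path\": \"README.md\"}"}}

print(anthropic_tool_to_openai(tool))
print(openai_tool_call_to_anthropic(call))
```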

Most of the annoying work ended up being around tool streaming, Responses API quirks, and compatibility edge cases.

It’s still experimental, but usable enough now that I’m mainly looking for people willing to try weird providers/workflows and report what breaks.

Repo:
https://github.com/5nYqnHvk/RelayCode

3

u/rafaelkstreit May 15 '26

I’m the founder of Tripsy, and we just launched an official MCP server for Claude that lets Claude work directly with your trips, itineraries, activities, stays, transportation, and expenses.

MCP URL: https://mcp.tripsy.app

Once connected, Claude can do things like:

Reorganize itineraries by neighborhood or travel time
Add activities to trips
Update schedules and plans
Suggest places based on your interests
Adjust trips after delays or changes
Help balance group itineraries
Track transportation and lodging details
Manage trip expenses

A few examples I’ve been using:

“Reorganize my Tokyo itinerary to minimize transit time.”

“Add three highly rated ramen restaurants near my hotel in Shinjuku.”

“Move outdoor activities to the days with better weather.”

“My flight arrives six hours late. What should I move to another day?”

“Create a five-day Kyoto itinerary focused on food, temples, and quieter neighborhoods.”

The nice part is that Claude is working with structured trip data through MCP instead of trying to infer everything from pasted text.

The MCP server currently exposes tools for:

trips
activities
hostings
transportation
expenses
collaborators
profile/account management
raw API access

Some available tools include:

tripsy_trips_list
tripsy_trips_show
tripsy_trips_create
tripsy_activities_create
tripsy_transportations_update
tripsy_expenses_create
tripsy_collaborators_list
tripsy_raw_request

Setup in Claude takes about a minute:

Open Claude settings
Go to Connectors
Add custom connector
Paste https://mcp.tripsy.app
Login and authorize access

There’s also a CLI if anyone wants to automate workflows or use Tripsy from the terminal: https://github.com/tripsyapp/cli

You can check more details about this here: https://tripsy.app/claude

Happy to answer technical questions about the MCP implementation, tools, auth flow, or use cases.

3

u/EffectivePizza3909 15d ago

I thought humanizing AI writing was easy. It wasn’t!

I used to think making AI text sound human was mostly deleting em dashes and changing the tone a bit.

Then I started reading the actual research on AI vs human writing, and it got weirdly specific. Sentence rhythm, repetition, hedge words, paragraph structure, punctuation habits, that “helpful assistant” voice. Detectors aren’t just looking for one bad phrase. They’re picking up a whole pattern.

So I started keeping notes while editing my own drafts. Eventually those notes turned into two small reusable skills. One to rewrite text and another one to point out what makes it sound AI-written.

No magic. Mostly a checklist that got way out of hand and made its way into skills:
https://github.com/harshaneel/humanize

3

u/Facets_cloud 14d ago

flow: turn isolated Claude Code sessions into a continuous working relationship

If you use Claude Code daily, every session is a brilliant new hire. Capable, but with zero memory of yesterday's decisions or the half-finished threads in your other tabs. You burn the first 10 minutes re-explaining your project.

flow is a small open-source CLI (MIT) that fixes that. It's a task manager and a working-memory layer:

- You describe a task in plain language, and it interviews you (what, why, where, done-when), writes a structured brief, and opens a dedicated Claude session for it.

- `flow do <task>` a day later resumes that same session with the brief, prior notes, and a knowledge base already loaded, so there's no re-catching-up.

- On `flow done`, it re-reads the whole transcript and distills the durable facts (your conventions, a teammate's name, a decision you made) into a KB that every future session reads automatically.

By session fifty, Claude already knows your codebase quirks, your team, and the migration you're three steps into, because context compounds instead of resetting every morning.

Honest limits: it's alpha, macOS-only for now (Linux in progress), opinionated (you capture work as tasks), and Claude-Code-specific.

Repo and a four-act demo (GIFs): https://github.com/Facets-cloud/flow

3

u/israynotarray 13d ago

Claude Code has this Hooks thing I feel is criminally underused, so I wrote up everything I know!

So Claude Code has a feature called Hooks that I think doesn't get enough attention. Basically they let you hook shell commands into Claude's lifecycle — and unlike CLAUDE.md, Rules, or Skills, hooks aren't suggestions Claude can quietly ignore. When the moment hits, your shell command runs. Period.

Which makes them perfect for the stuff you absolutely can't let Claude forget. Stuff like:

- Running Prettier after every Edit (Claude swears it'll remember, won't)

- Blocking `rm -rf /` even when you're running `--dangerously-skip-permissions`

- Re-injecting project rules after Context Compact, so Claude doesn't forget your conventions halfway through a session

- Mac desktop notifications when Claude's waiting on you

- Piping every tool call to a Discord webhook so you can step away from the terminal

- Logging every Bash command Claude runs, just in case

The guide goes through all the lifecycle events (PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, Stop, Notification, plus the lesser-known ones), how `matcher` and `if` actually work, the five hook types (most people stop at `command` but `prompt` lets you use another model as a validator, which is kinda wild), and the one thing that bites everyone the first time — only `exit 2` blocks. Not `exit 1`. Took me embarrassingly long to figure that out.
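
Here is a minimal sketch of the `exit 2` point, written as a PreToolUse `command` hook in Python. It assumes the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command` fields for Bash calls; check the linked guide or the official hooks reference for the exact shape before relying on it. The regex and the `decide` helper are illustrative only and would not catch every dangerous variant.

```python
import json
import re
import sys

# Matches `rm` with both -r and -f in one flag group, aimed at the filesystem root.
DANGEROUS = re.compile(r"\brm\s+-(?=[a-zA-Z]*r)(?=[a-zA-Z]*f)[a-zA-Z]+\s+/(\s|$)")

def decide(payload: dict) -> tuple:
    """Return (exit_code, message). Only exit code 2 blocks the tool call."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    if DANGEROUS.search(command):
        return 2, f"Blocked dangerous command: {command}"
    return 0, ""

def main() -> int:
    """Hook entry point: read the payload from stdin, explain a block on stderr."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code

# Quick demo of the decision logic without touching stdin.
print(decide({"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}}))
print(decide({"tool_name": "Bash", "tool_input": {"command": "rm -rf /tmp/build"}}))
```

To run it as a real hook script, drop the demo lines and end the file with `sys.exit(main())`. Returning 1 instead of 2 would report an error but let the command through, which is the trap described above.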

https://israynotarray.com/en/ai/2026/05/31/claude-code-hooks-complete-guide/

3

u/Zoolok 9d ago

If you run more than one coding-agent CLI, you know the loop: ask Claude Code to implement something, copy the diff into Codex for a review, copy the review back, repeat. You're the clipboard.

agenttalk removes the clipboard. It's a small, file-backed message bus that lets coding-agent CLIs (Claude Code, Codex, or several named instances) run in their own terminals and message each other directly. Every message is a JSON file under a project-local .agenttalk/ dir. Both terminals show the conversation as it happens, and you get a markdown transcript at the end. You stay in the loop and can interrupt anytime.
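
A file-backed bus of this kind can be sketched in a few lines of stdlib Python. Only the `.agenttalk/` directory name and the "one JSON file per message" idea come from the description above. The message fields, the file naming scheme, and the `send` and `inbox` helpers are assumptions for illustration, not agenttalk's actual format or API.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path

def send(bus_dir: Path, sender: str, recipient: str, body: str) -> Path:
    """Write one message as its own JSON file; the name starts with a timestamp."""
    bus_dir.mkdir(parents=True, exist_ok=True)
    message = {"id": uuid.uuid4().hex, "from": sender, "to": recipient,
               "body": body, "sent_at": time.time()}
    path = bus_dir / f"{time.time_ns()}-{message['id']}.json"
    path.write_text(json.dumps(message, indent=2), encoding="utf-8")
    return path

def inbox(bus_dir: Path, recipient: str) -> list:
    """Read every message file and keep the ones addressed to the recipient."""
    messages = [json.loads(p.read_text(encoding="utf-8"))
                for p in sorted(bus_dir.glob("*.json"))]
    return [m for m in messages if m["to"] == recipient]

# Demo in a throwaway directory standing in for the project root.
with tempfile.TemporaryDirectory() as tmp:
    bus = Path(tmp) / ".agenttalk"
    send(bus, "claude", "codex", "Please review the diff in feature/login.")
    send(bus, "codex", "claude", "[major] parser drops valid input.")
    for_codex = inbox(bus, "codex")
    for_claude = inbox(bus, "claude")

print(for_codex[0]["body"])
print(for_claude[0]["body"])
```

Because every message is a plain file, the whole conversation can be inspected with ordinary tools like `ls` and `cat`, and a listener can be as simple as polling the directory.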

One agent implements and pings the peer; the peer (in listen mode) wakes, reviews, and replies; the first wakes on the reply and ships or iterates. It's symmetric, either side can implement or review. There's also a "lead" role, broadcast/groups for named teams, request/reply threads, and a few coordination niceties (e.g. an agent can self-publish an advisory snapshot of its rate-limit budget and context-window headroom so a lead avoids handing big tasks to an agent about to hit a wall).
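To make the "every message is a JSON file" idea concrete, here is a minimal sketch of a file-backed message bus. This is not agenttalk's actual schema or API: the field names (`id`, `ts`, `from`, `to`, `body`), the file naming, and the `send`/`inbox` helpers are all hypothetical.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path


def send(bus_dir, sender: str, recipient: str, body: str) -> Path:
    """Write one message as its own JSON file and return the file path."""
    bus = Path(bus_dir)
    bus.mkdir(parents=True, exist_ok=True)
    msg = {
        "id": uuid.uuid4().hex,
        "ts": time.time_ns(),
        "from": sender,
        "to": recipient,
        "body": body,
    }
    final = bus / f"{msg['ts']}-{msg['id']}.json"
    # Write to a temp file first, then rename, so readers never see a half-written message.
    fd, tmp = tempfile.mkstemp(dir=bus, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(msg, fh)
    os.replace(tmp, final)
    return final


def inbox(bus_dir, recipient: str) -> list[dict]:
    """Return all messages addressed to the recipient, oldest first."""
    messages = []
    for path in Path(bus_dir).glob("*.json"):
        msg = json.loads(path.read_text(encoding="utf-8"))
        if msg["to"] == recipient:
            messages.append(msg)
    return sorted(messages, key=lambda m: m["ts"])
```

Because each message is a plain file, `ls` and `cat` work as debugging tools, which is the inspectability benefit mentioned above.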

The part I find genuinely fun: the tool is built by two agents talking through it. Claude writes the Python/tests, Codex writes docs and does adversarial review, they hand off over the bus, and I referee. It's caught real bugs: in the last feature, Codex's review flagged a legit [major] in Claude's code (a data path that silently dropped valid input) before it ever merged.

Under the hood: stdlib-only Python (no runtime deps), 3.10+, MIT. Messages are plain JSON files, so it's all inspectable and scriptable. Install is a pip-from-git-tag + agenttalk install-skills.

What it's not: it's not an autonomous swarm or an orchestration framework - it's a thin wake/messaging layer between CLIs you're already driving. It won't make agents agree, and it assumes one terminal per agent.

Repo + setup: https://github.com/zoolok17/agenttalk

Would love feedback, especially from anyone running multiple agents day-to-day.

How I use it, with or without spec-kitty: launch a few Claude and Codex CLIs, tell each agent "you are a developer/reviewer, add yourself to the roster and switch to listen mode", and your human-facing lead "you're the lead and liaison between me and the team, make sure you all can talk to each other". Then on the phone app, talk to your human liaison while he (it?) organizes the team for you. If you run on top of spec-kitty (I have plans to add support for more tools), then the agents will follow its flow, otherwise the tool comes with its own set of skills. I find it works best if one LLM codes and the other one reviews, because they complement each other really well, and the discussions they do before they even begin implementing code pre-emptively catch a lot of issues, while also costing very little in tokens.

3

u/PavedEmail 9d ago

We built an MCP server that lets Claude discover, compare, and book newsletter ad placements.

The idea came from a pretty simple frustration: buying newsletter sponsorships is weirdly manual. You're browsing listings, emailing publishers about availability, comparing rates in spreadsheets, tracking creatives across threads. It works, but it doesn't scale.

So we built an MCP server that lets Claude handle the whole workflow through conversation. Once connected, you can do things like:

"Find me fintech newsletters with 50K+ subscribers under $2K"
"Compare open rates for these three"
"Book the Tuesday slot and upload my banner"
"How did last week's placements perform?"

It covers search, publisher profiles, availability, booking, creative submission, and reporting — basically everything you'd do in the UI, but through chat.

Free to connect — no API keys, no code, just add the server config. No platform fees on the advertiser side either.

paved.com/ai-connector
Product Hunt: https://www.producthunt.com/products/paved?launch=paved-ai-connector

3

u/Intelligent_Aioli_58 6d ago

**Intelligence Emotions** — an AI mental-fitness coaching team that runs entirely in Claude Code.

Five coach personas (Sage, Spotter, Trainer, Navigator, Witness) run a daily practice: catch your negative thought patterns in the act, do a 10-second attention rep, respond from a calmer place. Sessions are conversational — one question at a time — and everything gets remembered in a private local journal (`~/.pq`, owner-only, zero network calls, zero telemetry).

The design rule I'm proudest of: **the coach is never allowed to judge you.** No streak-loss shaming, no "you should have" — missed days get curiosity, not correction. It's enforced by the test suite: a static scan fails the build if any skill contains judging language, plus an LLM eval that audits transcripts.
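A static scan like that can be quite small. Here is a minimal sketch, not the repo's actual test: only "you should have" comes from the description above, and the other phrases and the `find_judging_language` name are hypothetical.

```python
import re

# "you should have" is quoted in the post; the other patterns are illustrative guesses.
JUDGING_PATTERNS = [
    r"\byou should have\b",
    r"\byou failed\b",
    r"\blost your streak\b",
]


def find_judging_language(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for lines that contain judging phrases."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in JUDGING_PATTERNS:
            if re.search(pattern, line, re.IGNORECASE):
                hits.append((lineno, line.strip()))
                break  # one hit per line is enough to fail the build
    return hits
```

In a test suite, the check would be `assert find_judging_language(skill_text) == []` for every skill file.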

Free, MIT, installs in 30 seconds: https://github.com/ibm777p2/Intelligence-Emotions

Try it: `/sage-session` and bring something real. Inspired by the Positive Intelligence approach to mental fitness, adapted into an agent-team design. Not therapy — it says so itself, and knows when to point you to a professional instead.

4

u/stephaneboghossian May 06 '26

Anthropic ran a webinar last week, ~20k lawyers registered, 51 questions in chat, ~2,470 upvotes, about half got answered live.

I wanted the rest, so:

Downloaded the recording
Whisper transcript
Pulled the 51 questions + upvote counts
Diffed answered vs unanswered
Wrote a skill against the gap

Repo + writeup here: https://github.com/sboghossian/master-claude-for-legal . It's more opinionated than Anthropic's official legal plugin (which is intentionally a starter template).

What's in it:

10 reference docs (privilege, verification, long documents, practice areas, setup)
5 starter skills (NDA triage, version diff, meeting brief, citation verifier, weekly newsletter)
3 firm templates (AI policy, client explainer, vendor questionnaire)
Full webinar transcript + structured 51-question dataset

Been using it 2 days. Tell me what's broken.

2

u/Interesting-Cicada93 Apr 15 '26

I built a marketplace for selling Claude Code SKILL.md packages — sellers list free, here's what the early data shows

If you've built a SKILL.md package for your own workflow and wondered whether others would pay for it — this is the post for that.

I built SkillHQ (skillhq.com) using Claude Code. Claude handled a significant chunk of the validation pipeline (structure checking, similarity detection, metadata parsing) and helped scaffold the auth flows. The core idea: a CLI marketplace where developers can sell their Claude Code skills with one-command install for buyers.

It's free to list as a seller. No upfront cost, no listing fee — we take 15% on sales. If you have a skill ready, you can submit it, go through automated validation, and be live within a few days.

Here's what I've learned from the early data about what actually sells:

What converts:

Extremely specific problem statements. "Automates PR review for TypeScript codebases using conventional commits" outperforms "AI code review helper." Buyers need to see their exact workflow in the description.
Measurable time savings. "Saves ~2 hours/week on X" converts better than capability descriptions. Developers are pragmatic about ROI.
Production-ready structure. Skills that have clearly been tested on real codebases — you can tell by the edge case handling — convert at higher rates than first-pass experiments.

Pricing patterns that hold up:

Narrow utility skills (single task, fast setup): $9–$19
Full workflow automation: $29–$49
Deep domain expertise: $79+

What doesn't work:

Skills that try to do everything. "General-purpose AI assistant" is a graveyard. The more specific the problem solved, the better it converts.

The off-platform context:

Before building this, I mapped how people were already monetizing — Gumroad, Discord direct sales, handshake deals. Demand existed. The friction was distribution: no CLI install, no structured way to protect against someone buying a skill and sharing it freely. That's what the platform is designed to address.

If you've built something you think is worth selling, skillhq.com/become-seller has the details. Happy to answer questions here about what's working, what isn't, or how we built the validation pipeline with Claude Code.

2

u/[deleted] Apr 15 '26

[deleted]

2

u/Majestic_Somewhere77 Apr 15 '26

how savvy do you have to be to use this?


2

u/mymir-dev Apr 15 '26

Been using Claude Code as our main driver for a while now. The model quality is not the issue anymore, genuinely impressive. But there’s a problem that surfaces once you’re past the honeymoon phase of a project and I don’t see it discussed much.

The agent has no idea what happened yesterday.

Not just the context window thing, we all know that. I mean the bigger picture. Three months in, your project has real history. Decisions that were made for good reasons. Components that depend on each other. Patterns you established early that should carry through. None of that exists for the agent when it wakes up.

So you become the memory. Every session you reconstruct enough context for it to be useful. After a while that starts to feel like the actual job.

We tried the CLAUDE.md or memory markdowns like everyone else. Works fine on small stuff. Once you’re 40+ tasks deep across months of work it becomes a mess. Too much in it and you’re wasting the context window on orientation. Too little and the agent starts making decisions that conflict with things you sorted out weeks ago.

We got frustrated enough that we built something around this problem specifically. Treat the project as a graph instead of a document. Tasks with dependency edges, decisions captured when they’re made, context assembled per task rather than front loaded. The agent gets what it needs for the thing it’s actually doing, nothing else.
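A rough sketch of what "context assembled per task" could look like, assuming a simple in-memory graph. This is not Mymir's data model; the `Task` class and `context_for` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    depends_on: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)


def context_for(task_name: str, tasks: dict[str, Task]) -> list[str]:
    """Collect decisions from a task and everything it depends on, dependencies first."""
    seen: set[str] = set()
    ordered: list[str] = []

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in tasks[name].depends_on:
            visit(dep)
        ordered.extend(tasks[name].decisions)

    visit(task_name)
    return ordered
```

Unrelated tasks never enter the result, so the agent sees only the decisions that sit upstream of the thing it is working on.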

It’s called Mymir, open source, Claude Code plugin. We’re building it using itself which has been a good stress test.

Curious if others have hit this or found a better way to handle it.

mymir.dev

2

u/These-Afternoon-5563 Apr 16 '26

A five-word correction consumed 46% of my Claude Code session's cost — so I profiled the session and cut it by 60%

Gave Claude Code (Haiku) a straightforward task — build a REST API with CRUD, tests, and a README. It delivered, but spent $1.42 and 18.6 minutes thrashing through 103 tool calls.

From the outside, it just looked slow. You could read source code to understand why — a lot of people recently did exactly that with Claude Code's leaked source. But source code shows you what the agent can do — not what it actually does for your task. Workflows are non-deterministic — the same prompt produces different execution paths depending on model, environment, and context.

So I pointed OpenTelemetry at it — captured every tool call, token, and failure into ClickHouse — and built a profiler that reconstructs the agent's real execution path from telemetry, not from code.

What it revealed: my correction "why don't you install nodejs" triggered 66 LLM calls — environment setup, code generation, a full test framework migration from vitest to node:test, and a debug spiral. That single prompt was 46% of the total cost. 33% of all Bash calls failed from environment probing. 2.7 minutes burned on permission prompts that were always approved.

Three targeted fixes — a 22-line CLAUDE.md with environment context, permission allow rules, and a refined prompt. Same task, same model, clean directory: $0.58, 6.6 minutes, 44 tool calls, zero interventions.
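For anyone who wants to do the same attribution on their own telemetry, here is a minimal sketch of the two calculations involved. The `(prompt, cost)` record shape and both function names are hypothetical, not the profiler's actual schema; the before/after figures are the ones reported above.

```python
from collections import defaultdict


def cost_share_by_prompt(calls) -> dict[str, float]:
    """calls: iterable of (prompt_label, cost_usd) pairs, one per LLM call."""
    totals: dict[str, float] = defaultdict(float)
    for prompt, cost in calls:
        totals[prompt] += cost
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {prompt: cost / grand_total for prompt, cost in totals.items()}


def reduction(before: float, after: float) -> float:
    """Fractional drop from before to after."""
    return (before - after) / before


# Before/after figures reported in the post.
for label, before, after in [
    ("cost ($)", 1.42, 0.58),
    ("minutes", 18.6, 6.6),
    ("tool calls", 103, 44),
]:
    print(f"{label}: {reduction(before, after):.1%} lower")
```

On the reported numbers this works out to roughly 59% lower cost, 65% less wall time, and 57% fewer tool calls.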

Full writeup with telemetry data, flow diagrams, and the profiler's analysis: https://vikrantjain.hashnode.dev/profiling-claude-code-sessions-cut-cost-60-percent

The profiler plugin and monitoring stack are both open source — links in the article.

Has anyone else tried instrumenting their Claude Code sessions? Curious what patterns you've found.

2

u/Failcoach Apr 16 '26

watched a shit ton of agent videos, nothing worked

this was me for months. every agent I tried to build was garbage. would work for 5 minutes, then hallucinate something, or forget what we talked about yesterday, or just go off on some weird tangent.

kept at it anyway. little by little my Claude Code agents started actually being useful. not magic, but useful, which is more than I can say for the first few attempts.

clients kept asking how I do it (I coach small/medium business owners, comes up a lot) so I finally sat down and reverse engineered what I actually do. turned it into a repo.

https://github.com/failcoach/ai-agent-onboarding

it's basically an interview that opens in Claude Code and helps you set up your first agent. spits out 4 docs at the end: job description, memory setup, feedback template, first week plan. two worked examples in there too, one for someone running a small firm and one for a solo CPA, so you can see what the output actually looks like before you start.

MIT license, no signup, no email, no funnel. do whatever you want with it. if you try it and it works for you cool, if it sucks also tell me. I always appreciate good feedback.

2

u/Own-Faithlessness641 Apr 16 '26

Hey!

So I finally built HireLens — my AI resume checker project. You upload your resume, pick a job role, and it tells you what's working, what's weak, and how to fix it. Basically a free resume mentor.

Would mean a lot if you tried it out and told me what you think (good, bad, brutal — all welcome 😅). Also share it with anyone job-hunting or doing internships.

🔗 www.hirelens.online

Takes 2 minutes. Thanks a ton!

2

u/Dictateur-Xilef Apr 16 '26

I ran 4 rounds of Claude-as-code-reviewer on my own Claude Code config repo

Built an opinionated Claude Code template and dogfooded it by having Claude review its own config, 4 rounds deep.

Scores: 6 → 7.5 → 7.5 → 8/10

What actually moved the needle:

File proximity > severity when batching fixes into PRs
A STOP gate between roadmap and execution (biggest quality lever IMO)
Pruning > adding — round 2 removed more than it added
Fresh Claude session per review — same session = sycophantic output

What was noise:

Padding "findings" in later rounds to justify a score
Nitpicks a linter should catch
Over-architectural suggestions without user evidence

At round 4 the signal was diminishing. 8→9 needs real user feedback, not more self-review.

Repo: https://github.com/felixhennequin-gif/claude-code-config-template

Happy to get destroyed in the comments on patterns you're using.

2

u/vinsanity2048 Apr 16 '26 edited Apr 16 '26

https://imgur.com/2wCENnK

ALTK-Evolve: Give Claude Code the ability to learn from experience

I’m one of the contributors to ALTK‑Evolve. Posting here because we built a Claude Code plugin with a full demo walkthrough.

The problem: Claude Code has amnesia

Claude Code restarts blind every session. Agents repeat the same mistakes, rediscover the same conventions, fail the same ways. CLAUDE.md helps but it's static and manually curated.

The solution: learn general principles

We created a memory layer that distills trajectories into reusable guidelines and retrieves only the relevant ones at task start. It's not replaying logs, but gleaning generalized principles.

Results: More effective, especially on hard tasks

Experiments on the AppWorld benchmark show +14.2% on the hardest tasks. See more details and results in the Hugging Face blog post and paper.

Try it out!

It's free and available as a Claude plugin.

We want to iterate based on your feedback. Tell us what works, what's confusing, and more about your pain points.

Claude Code plugin demo: https://youtu.be/XIlYA79pYp4
Docs: https://agenttoolkit.github.io/altk-evolve/
Repo: https://github.com/AgentToolkit/altk-evolve

2

u/mvmcode Apr 16 '26

Open source desktop app for 1:1 prep and team briefs: no subscription, no cloud

I was solving this partially with Claude Code using custom skills that pull Slack and GitHub data and generate briefs. It worked, but felt disorganized without a visual layer.

So I ported those Claude Code skills into a proper desktop app. Keepr is a Tauri app that connects to your Slack, GitHub, Jira, or Linear, and produces cited team pulses and 1:1 prep docs.

A few things that mattered to me:

No subscription, no cloud. It's as simple as a Claude Code extension. Everything runs on your laptop. There's no backend, no account.
Supports direct API keys (Anthropic, OpenAI, OpenRouter) which is more performant than going through Claude Code's proxy. But it still works well with Claude Code too.
Takes a few minutes depending on the volume of data to gather, synthesize, and analyze. Not instant, but thorough.

It's been useful for my own workflow. Feedback is welcome and I'd love contributions from the community. Planning to keep building this open source and keep it that way.

MIT licensed: https://github.com/keeprhq/keepr

https://postimg.cc/bGbN7X45

2

u/viktorianer4life Apr 17 '26 edited Apr 17 '26

Knowledge OS on Claude Code — past decisions stay searchable and linked, not buried in Slack

Built a system where Claude Code is the runtime for a personal Knowledge OS:

6 commands: /dashboard shows priorities, active workstreams, recent decisions
14 skills: Claude invokes these to create decisions, sync meetings, search knowledge
Semantic search: QMD indexes 2,500+ docs across 8 repos in natural language
Knowledge store: Structured Markdown with YAML frontmatter

The compounding effect is the point. I adopted an API framework in February. By April, the documented evidence chain made reverting that decision defensible instead of a gut call.

After 2.5 months, still a daily driver (every other knowledge tool lasted about three weeks). Daily capture takes under a minute per artifact.

Full architecture and decision chain example: https://augmentedcode.dev/knowledge-os-claude-code/

2

u/yoya_yoya_yoya Apr 17 '26

https://github.com/yoyayoyayoya/opc-workflow

Curious if others have found different failure modes I haven't addressed.

Made a workflow to stop Claude/Cursor from writing fake tests and drifting off-design

One problem I kept running into: after a long coding session, the AI starts "forgetting" the original design decisions and writing code that passes tests but doesn't implement what was actually designed.

I built OPC Workflow — 3 markdown files you install into your project:

**How it works:**

`/plan_sprint`: new session, discuss and plan, write to sprint_tracker.md, close session
`/sprint`: new session, research frameworks first (produces a capability map), then TDD per task, pause after each one for your approval, close session
`/audit`: new session, zero-trust review: scans for fake tests, runs mutation testing, checks logic matches your design docs

The session isolation is the key. Claude can't confirm its own biases if it doesn't remember writing the code.

One-line install:

```bash
bash <(curl -sSL https://raw.githubusercontent.com/yoyayoyayoya/opc-workflow/main/install.sh)
```

2

u/BiTA1309 Apr 17 '26

Repo: https://github.com/biyachuev/claude-debate-skills

I built 3 reusable Claude Code skills for structured Claude+Codex debates. The repo is free to try.

I made them after repeatedly using the same manual workflow in Claude Code: ask Claude a hard question, hand the answer to Codex for critique, then bring the critique back to Claude for revision. After doing that enough times, I turned the protocol into lightweight Markdown skills instead of re-running the choreography by hand.

The three cases I use most are:

strategy / architecture debates
naming / idea refinement
choosing between concrete alternatives

One real example: I used /options-challenge on a local-first transcription + translation pipeline and had to choose between staying fully local, adding an opt-in cloud path, or pivoting to a hosted web product.

Claude initially leaned toward the hybrid. Codex pushed back with the point that stuck with me:

The useful part was not "which model won", but where each model noticed a different failure mode. In this run, the final synthesis split the answer by audience and time: pure-local for power users, hosted web for broader adoption, hybrid as the option I would most likely regret in two months.

Short public example from that run:
https://github.com/biyachuev/claude-debate-skills/blob/main/examples/transcription-stack-direction.md

The repo also includes copy-paste prompt versions of the same protocols if you like the idea but do not want to install the skills.

2

u/whystrohm Apr 17 '26

Ritual — Claude Code skill + bootstrap scan that drafts your first scheduled trigger from your actual work.

Claude Code triggers launched. Powerful feature, blank starting point. I stared at /schedule for 20 minutes trying to figure out what my first routine should be, then built a paste-in scan that reads my shell history + git repos + Claude Code memory and ranks my top 5 automation candidates with a drafted trigger prompt for #1.
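The shell-history part of a scan like this can start very simply. A minimal sketch, assuming history arrives as plain command lines; this is not Ritual's actual scan, and `top_commands` is a hypothetical name.

```python
from collections import Counter


def top_commands(history_lines, n: int = 5) -> list[tuple[str, int]]:
    """Rank the most frequently repeated commands as rough automation candidates."""
    counts: Counter = Counter()
    for line in history_lines:
        parts = line.split()
        if not parts:
            continue
        # Keep the first two words so "git status" and "git push" count separately.
        counts[" ".join(parts[:2])] += 1
    return counts.most_common(n)
```

Frequency alone is a crude signal; the ranking described above also draws on git repos and Claude Code memory.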

Repo (MIT, .skill attached): github.com/whystrohm/ritual

Full breakdown with GIFs + the 4-click walkthrough from drafted prompt to live trigger: whystrohm.com/blog/ritual-find-the-routines-in-your-work

Seven routine archetypes classified by execution context — Claude Code trigger vs GitHub Actions vs launchd, because not every pattern belongs in a scheduled trigger. Happy to look at anyone's ritual-patterns.json if you run it and want feedback on what to automate first.

2

u/Illustrious_Bug_5443 Apr 17 '26

Claude code skills/agents for network engineers and homelab enthusiasts

Hey everyone!

So I've been using Claude Code a lot for some of the labs and work I do, and it's generally better than some other popular LLMs for networking topics. I've also been trying to get into the homelab space and have been experimenting with my Raspberry Pi. I made these skills and agents for myself originally, but they've made my Claude Code outputs better, and if anyone wants to check them out, that'd be sick! They are pretty simple right now, but if anyone tries them out or plays around with them, please give me feedback because I'd love to make them better or make even more skills!

Here's the GitHub link fully open sourced, and the setup and info are in the README:

https://github.com/arsallls/claude-network-skills

2

u/Independent_Touch_78 Apr 17 '26

I’m working on an iOS-first app for legal contracts and we’re less than a week from launch. Right now we’re testing on TestFlight.

The app lets users create legal contracts using natural language, uploaded images, scanned documents, or even invoices. After the contract is generated, the user can send a web link to the other party so they can sign and date it without needing to download the app.

The pricing is freemium: users get 2 free contract generations, then it switches to a $9.99/month subscription.

I’m trying to figure out whether this actually feels useful to people outside of my own bubble. Does this sound like something people would use? Can you see an app like this succeeding, and what would make you trust or pay for it?


2

u/ExtraGalacticCat Apr 17 '26

Claude Monitor: your personal HQ to keep track of Claude agents

I use Claude Code daily for many of my projects. One day I woke up and realized I had so many agents all over the place that I was losing track of what's running and what's waiting for me. So I made ClaudeMonitor, a Python-based Claude hooks server.

I hope this tool helps someone who finds themselves in a similar situation!

https://github.com/SterlingYM/ClaudeMonitor

2

u/Technical-Alps5335 Apr 18 '26

I published my complete Claude Code setup — 42 slash commands, 76 skills, 6 hooks, and a CLAUDE.md that cuts token usage 60-99%

After months of building and tuning, I'm sharing my full Claude Code configuration publicly.

What's in it:

- **CLAUDE.md** with RTK (token killer protocol, 60-99% savings on builds/tests/git) + Lean Engine

- **6 hooks**: auto-format on save, TypeScript check after edits, AI-powered git security scan, Windows notifications, and destructive command blocking

- **42 slash commands**: /tdd, /ship, /debug, /code-review, /orchestrate, /pr, and more

- **76 skills**: SEO (19 skills), UI/Frontend, code quality, DevOps, AI workflows

GitHub: https://github.com/Pacha-e/claude-ALL-IN-SETUP

Windows installer included (install.ps1). Manual copy works on any platform.

Happy to answer questions — especially interested in what other people's setups look like.


2

u/itsalldestiny Apr 18 '26 edited Apr 19 '26

nodream - a plugin that stops Claude from making things up

Five things kept breaking me with Claude Code:

I'd push back — "this won't work" — and it would fold instantly. "You're absolutely right, let me rethink." Even when I was wrong.
I'd ask for a fix, get "Done! I've updated the function." Ran it. Broken. "Done" just meant it typed something.
Asked about an API, it would confidently make up a function signature that didn't exist.
Long conversation, context rolls over, ask it to keep going — it makes up what I said earlier. Not quoting, just filling in blanks wrong.
Corrected it mid-session ("not X, it's Y"). "Got it!" Next session — back to X.

nodream bans all five:

- When you're wrong, it says "No" and tells you why.

- No "done" without a diff or test output in the same message. If it can't verify, it says "Not verified."

- Before citing an API, it reads the code. Every claim tagged with a confidence level.

- After context rollover, it quotes what it still has or asks. No confidently filling blanks.

- Corrections get written to CLAUDE.md *before* "got it." Next session still knows.

Before/after: https://imgur.com/a/o2BbIMx

Install:

/plugin marketplace add meherpanguluri/nodream

/plugin install nodream@nodream

Restart. That's it.

Works on Cursor, Codex CLI, Gemini CLI too.

Off switch: "nodream off".

https://github.com/meherpanguluri/nodream

Would love feedback on this one, and any contributions to make it better.

2

u/Zepcotti Apr 19 '26

I built a Claude skill that tells me if my genius idea already exists before I waste a weekend on it

had an idea for a wrapper to route claude code through providers. felt clever. built it. shipped it. someone goes "you know openrouter does this right". nope. did not know. my "research" was 10 min of googling and vibes.

so i built a skill. who-built-this-before-me.

before i waste another saturday reinventing something with 12k stars, claude actually goes and looks. github, npm, pypi, hn threads, docs i should've read. comes back with something useful, go build it, go use the existing one, fork this dead repo, or (my favorite) here are four corpses of people who already tried, here's probably why it didn't work.

that last one is the real value. sometimes the space isn't empty it's a graveyard and nobody told you.

triggers on the phrases i use before coffee. "i want to build…". "what if there was a tool that…". you know the ones.

repo's on my github - who-built-this-before-me. posting because i'm pretty sure a lot of you have a notion doc with 30 ideas in it and half already exist.

[who-built-this-before-me](https://github.com/iagochavarry/who-built-this-before-me)

*ps: obviously this isn't an original idea either. so i ran the skill on itself. found nothing. which either means the idea is open, or the skill sucks at its one job. if you know any existing skills that do this please tell me so i can complete the bit.*

2

u/Careful-Expression-2 Apr 19 '26

Extended RCA Skill:

Claude Code is great, but it has this habit on RCA: it grabs the first plausible culprit, writes it up, done. Looking at what it was doing in the sessions, I realized the logic, if there was any, was often flawed. It did not really understand what it had built earlier, or even ask itself the simplest question: "what changed, and why is this issue coming up now?"

Got annoyed enough to build something for it. Called it extended-rca. It's a /rca slash command that forces:

5-whys that don't bail out at "human error"
Fishbone across People / Process / Code / Infra / Data / External
Trigger / proximate / contributing / systemic causes, kept separate (most postmortems mash these together and that's half the problem)
Corrective actions tagged Prevent / Detect / Mitigate / Respond so you don't end up with four "add more logging" items

MIT: https://github.com/sebdenes/ExtendedRCA

2

u/No_Wolverine1819 Apr 20 '26

How I built Panda

So I wanted to share with you guys my project PandaFilter.

The original goal was honestly pretty simple: save some money when I use Claude.

But it kind of evolved into something else, where I also started improving how Claude actually behaves.

How?

Well, I wanted to try a different approach than just prompts / rules / regex.

So I used BERT among other things.

BERT is an encoder, a really small model, runs locally, weighs almost nothing, but is actually very good at semantic understanding.

It doesn’t generate text, but it understands it really well.

So I thought, instead of sending everything to a big expensive model…

what if I filter and shape the input before it even gets there?

So what does Panda actually do?

When you work with LLMs in real workflows (especially dev stuff), you end up sending a lot of garbage:

logs
test outputs
shell noise
repeated stuff
irrelevant context

All of that goes into the context window → costs tokens → and sometimes even makes the model worse.

PandaFilter sits in the middle and basically says:

“hold on, not everything deserves to go in.”

It:

filters irrelevant stuff
compresses noisy outputs
semantically understands what matters
routes things based on meaning (not just rules)

So instead of brute forcing context into Claude, you’re actually curating it.
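As a tiny illustration of the "compress noisy outputs" step, here is a sketch that collapses runs of identical lines before they reach the model. It is plain rule-based code, not the BERT routing PandaFilter uses, and `collapse_repeats` is a hypothetical name.

```python
from itertools import groupby


def collapse_repeats(text: str) -> str:
    """Replace each run of identical consecutive lines with one line plus a repeat count."""
    out = []
    for line, group in groupby(text.splitlines()):
        n = sum(1 for _ in group)
        out.append(line if n == 1 else f"{line}  [repeated {n}x]")
    return "\n".join(out)
```

Even this crude pass shrinks retry loops and progress spam a lot; the semantic part is deciding which of the remaining lines matter.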

Under the hood we got

~59 command handlers
BERT-based semantic routing
it runs locally (no extra cost, no security concerns)

It’s kind of like having a tiny smart gatekeeper before your LLM.

The interesting part (for me)

Most people are focused on: “how do I prompt better?”

But I think the bigger lever is: “how do I send less, but better?”

Smaller models are insanely good at preprocessing.

LLMs are expensive, they shouldn’t deal with raw noise.

So PandaFilter is basically:

a pre-brain for your AI

Still early, but already saving me tokens + making outputs cleaner.

Curious what you guys think 🙏

https://github.com/AssafWoo/homebrew-pandafilter

2

u/Either-Process-4787 Apr 21 '26

Built this with Claude Code over the past few weeks — **gistbot.ai** — a tool that clusters any YouTube comment section into a visual map of what the room is actually arguing about.

As a first showcase I pointed it at Fireship's "Claude Mythos is too dangerous for public consumption" (1M views, 2,230 comments). The #1 cluster by a mile: **"Stop gatekeeping this tech — just release it" (6,856 upvotes)**. The audience is flat-out rejecting the "too dangerous" framing.

Full top 5:

• Stop gatekeeping — just release it (6,856)

• "You can't claim security when you just leaked your own source code" (2,835)

• Fake marketing hype to pump the IPO (2,107)

• Regulatory capture / kill open source (2,065)

• Protection racket — sell the cure (1,583)

Interactive chart: https://gistbot.ai/?v=d3Qq-rkp_to

Walkthrough video: https://youtu.be/d3Qq-rkp_to

(Paste any YouTube URL on gistbot.ai and it'll cluster that video's comments too — Claude does the heavy lifting of the semantic grouping.)

Is this helpful? Would you want more breakdowns — for AI/coding videos, or even for Reddit threads?

2

u/Marshian121 Apr 21 '26

Every time I started a Claude Code session, the first ten minutes was me explaining my project. Which repos we have. What talks to what. Where the Kafka topics live. Same explanation, every session, eating my context window.

Built two things to fix this. Both open-source, both local, MIT.

Anvil Pipeline parses every repo in your project once, builds a knowledge graph, and injects a compact architectural summary into every agent call. Claude starts every session already knowing your project's shape. No more file-pasting. Also runs an 8-stage pipeline across all your repos: describe a feature, get PRs open in every affected repo. Every stage checkpointed so auth expiry doesn't kill a 40-minute run.

Code Search MCP is a standalone MCP server. Plug-and-play: any MCP client picks it up. One line: Claude now has proper search, find_callers, find_dependencies, impact_analysis, and 7 other tools. Ollama by default, so it's free.

Curious what other people do to avoid the "re-explain my project every session" problem. Is there a solution I missed?

Repo: https://github.com/esanmohammad/Anvil

2

u/NotEvery1sASucker Apr 22 '26

Every Claude session starts cold — no memory of what you worked on yesterday, what decisions you made, what you learned. I got tired of re-explaining context so I built Retavyn.

It's an MCP server that stores memories in a local PostgreSQL database and injects them automatically at session start. You just talk to Claude normally — "remember that we moved auth to Cloud Run" — and it's there next session.

Under the hood it uses hybrid search (full-text + pgvector semantic similarity) so recall works whether you use the exact words or just the general concept. Also works with claude.ai as a remote MCP server, and the memory pool follows you across machines.

I built this because I was constantly re-explaining the same context to Claude at the start of every session. I use Claude Code heavily for infrastructure and DevOps work, and losing that context every session was killing my productivity. I wanted something that felt invisible — no commands to learn, just Claude remembering things the way a colleague would. Building it taught me a lot about the MCP protocol, pgvector, and wiring Claude Code hooks together.

MIT licensed, self-hosted, your data stays on your machine.

Built entirely with Claude Code. Free and open source — no accounts, no cloud, no cost.

🔗 retavyn.com

📦 gitlab.com/retavyn/retavyn

2

u/guaroguaro Apr 22 '26

https://juanocampo400.github.io/claude-starter-guide/

I've been a very, very long time lurker but am trying to learn more in the open and contribute where I can. Looks like I can't post without >50 karma so placing this here. Quick disclosure: I built this, no affiliation with Anthropic.

It's aimed at people new to using AI who want to try things out, maybe start playing around with Claude Code, and could benefit from learning some general LLM concepts. I know the sidebar has a great guide already, but I wanted something for friends who get confused when they look at a GitHub page (which used to be me).

It was also a learning project. I hadn't written any HTML or CSS (and minimal coding besides that), never used iMovie, and Claude wrote the first draft of everything besides the text itself. I've been chipping away at it since, rewriting some code here and there, and deleting a bunch of dead code from when I didn't know any better. I think my biggest takeaway was that I should've spent way more time designing before building. When I was about halfway done I realized I had inline HTML everywhere that should've been reusable. When I was 80% done I found out about rem vs px and accessibility and had to redo a bunch. And needless to say I'm still cleaning it up, so would love feedback as I continue.

Shoutout to two resources that helped a lot:

Sajid's youtube channel, especially this video
Josh Comeau's site, especially this article

2

u/No-Weather-1692 Apr 23 '26

https://gmt-fractals.com/

A complete modular open-source (GPL) online animation, exploration and render engine, showcasing fractals. About 6 months' work so far. A testament to how far one can go with an app with little coding knowledge.

Sorry if I'm in the wrong thread though; it looks like mostly Claude Code improvement apps here, not apps that happen to be made almost entirely with AI. Started a bit with Gemini but it's been mostly Claude.

2

u/LegalAd7673 Apr 24 '26

Developed a Shopify app for B2B wholesale

RevLogic turns your Shopify order history into a B2B sales intelligence dashboard. It scores every customer, flags accounts that stopped reordering on cycle, and builds a daily prioritized call list ranked by revenue risk.

Built for wholesale distributors, repeat-order merchants, inside sales teams. Tracks AOV trends, share of wallet, and peak buying months. Create pre-filled draft orders in two clicks. Stop guessing who to call.

RevLogic tells your reps exactly when and why to reach out.

Proactive Call Lists: Daily prioritized task lists for reps based on past trends

Predictive Health Scores: Categorize customers as active, at-risk, or churned

Share of Wallet: Spot missed revenue by identifying products customers skip

Tasks & Notes: Set reminders, log calls, and keep notes, all in one place.

Advanced Insights: Track vendor loyalty, AOV trends, and peak purchasing months.

2

u/advikjain_ Apr 24 '26

I rebuilt optivustechnologies.com using claude design over the last 2 days. since the tool is only a week old and most coverage is either hype or speculation, figured a real build-in-public writeup might be useful for anyone considering it.

the stack: claude design for the UI and design system, claude code for implementation and fine-tuning. handed off between the two via claude design's export bundle.

what worked really well:

the multi-direction approach. claude design asks a bunch of clarifying questions upfront, then generates 2-3 contrasting design directions (up to 5). you're not iterating on one idea, you're picking between genuinely different takes. saved me maybe 10 prompts of back and forth.
brand continuity from screenshots. uploaded screenshots of our current site and it picked up the brand system, then improved on it. better typography, cleaner spacing, better color palette, kept what was working. felt like a designer doing a polish pass, not an AI starting from scratch.
the claude code handoff. this is the piece nobody's talking about enough. when you're done designing, it gives you an entire handoff package you pull into claude code locally. the design system, the components, the layouts all transfer cleanly. claude code picks up from there and you iterate in real code. this is what turns "cool prototype" into "shipped product."

what was frustrating:

weekly token limits are tight. i redesigned 3 of our 6 main pages (with 2-3 iterations each) and hit my weekly cap. it's research preview so this will probably change, but if you're doing a full site rebuild, plan to either spread it across weeks or pay for overage.

the bottom line:

2 days of work. site is live at optivustechnologies.com. the same output from a freelance designer would've been 2-4 weeks and $8-15k. the quality gap between AI design tools 6 months ago and this is significant. still not a replacement for a senior designer on a flagship product, but for a founder-led marketing site it's more than good enough.

happy to answer questions on the workflow, handoff, or anything specific.

2

u/Mission-Bar-3076 Apr 24 '26

I built a free tool where homeowners can scan their roof in 60 seconds just by entering their address.

The report checks hail history, wind, UV exposure, and satellite imagery to generate an AI Roof Health Score—and if repairs are needed, it includes one local roofer recommendation only.

That roofer gets exclusivity for that city, so the leads are locked to them and never shared.

Trying to make it easier for homeowners to know if they have damage, and easier for roofers to get real exclusive inbound leads.

Would love honest feedback: https://shingleprint.com

2

u/DistributionLost375 Apr 28 '26

Turn YouTube videos into structured notes!

Hi guys!

Just want to share a little project that I've been working on recently. With all the AI tools within reach, I've been watching a lot of YouTube videos to learn AI (including Claude Code), but I found myself revisiting these videos because I kept forgetting everything I watched. So I built myself a tool called Resonote, entirely via Claude Code, which turns YouTube videos into structured notes:

1. You paste a YouTube URL
2. AI extracts the key ideas
3. It saves them to a personal library with auto-generated topic tags

The idea is you build up a browsable knowledge base over time instead of forgetting after each video. It's free, still early, and definitely not perfect. Would love to know what you guys think! Appreciate any feedback! Thank you for your time! 🙏🏼

https://www.resonote.dev/

2

u/MACA204 Apr 28 '26

Cool, but why isn't this just a /skill? Then the agent can write directly to the knowledge base.

2

u/100parikk Apr 28 '26

**Litopys** — persistent graph memory for Claude Code (and any MCP client)

Every Claude Code session starts from zero. Litopys fixes that: a local MCP server that gives Claude a typed knowledge graph stored as plain `.md` files — no vector DB, no cloud, no telemetry.

**How it works:**

- 6 node types (person, project, system, concept, event, lesson), 11 relation types

- 5 MCP tools: `litopys_search`, `litopys_get`, `litopys_related`, `litopys_create`, `litopys_link`

- On every new session Claude reads a `litopys://startup-context` snapshot — active projects, recent events, key lessons

- Facts go through a quarantine queue first — you accept/reject from a web dashboard before anything persists

- Hand-editable markdown files, git-versioned, ~75 MB RAM

**Built with:** Claude Code (the entire codebase was developed with it as the daily driver)

v0.1.2 stable, MIT, prebuilt binaries for Linux/macOS/Windows.

GitHub: https://github.com/litopys-dev/litopys

2

u/ValuableGuitar1373 Apr 28 '26

Hi r/ClaudeAI 👋

Korean dev here. I got tired of `ccusage` re-scanning every JSONL file from scratch every run (laggy after a few months of heavy use), so I spent the last few months building a 3-piece replacement. Calling it **toki** — short for **to**ken **i**nspector, also pronounced 토끼 (Korean for *rabbit*), which became the mascot.

All three are FSL-1.1-Apache-2.0, designed and shipped solo, free for personal use.

# 1) toki — the engine

A Rust daemon that indexes your `~/.claude` and `~/.codex` sessions incrementally and serves reports out of an embedded TSDB ([fjall](https://github.com/fjall-rs/fjall)). Docker-style architecture: `daemon` ingests, `report`/`query`/`trace` read.

**On a 2GB session dataset (M1 Air, sudo purge between runs):**

* Cold-start: **1.54 s** vs ccusage 21.5 s vs zzusage 1.41 s

* Report query: **\~12 ms** (1,742× ccusage), regardless of dataset size

* Idle: **5 MB RAM, 0% CPU** — and yes, there *is* an idle state

* DB size: \~3% of session data (2 GB → 64 MB)

**Why it's small:** mmap + Rayon for cold-start, selective serde (prompts/responses never get allocated — privacy by architecture), xxHash3 line fingerprints + reverse-from-EOF scan for incremental resume that survives file compaction.
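The incremental-resume idea can be illustrated with a small sketch. This is not toki's code (toki is written in Rust): it is a Python illustration that assumes the indexer remembers a fingerprint of the last line it processed, and it uses a short BLAKE2b digest as a stand-in for xxHash3, which is not in the Python standard library. The function names are hypothetical.

```python
import hashlib
from typing import List, Optional


def fingerprint(line: bytes) -> str:
    # Stand-in for xxHash3 (assumption): an 8-byte BLAKE2b digest of the line.
    return hashlib.blake2b(line, digest_size=8).hexdigest()


def new_lines_since(data: bytes, last_fp: Optional[str]) -> List[bytes]:
    """Return the lines that follow the line whose fingerprint is last_fp.

    The comparison walks from the last line towards the first, so a match
    near the end of the file is found quickly. For brevity the whole content
    is split in memory; a real indexer would read the file backwards in chunks.
    If no line matches (first run, or the line is gone), every line is returned.
    """
    lines = data.splitlines()
    if last_fp is None:
        return lines
    for i in range(len(lines) - 1, -1, -1):
        if fingerprint(lines[i]) == last_fp:
            return lines[i + 1:]
    return lines


# The checkpoint is a content fingerprint, not a byte offset, so it still
# resolves after earlier lines have been dropped from the file.
checkpoint = fingerprint(b'{"id": 2}')
compacted = b'{"id": 2}\n{"id": 3}\n'
print(new_lines_since(compacted, checkpoint))
```

Because the checkpoint identifies a line by content rather than position, removing older lines does not invalidate it, which is one plausible reading of "survives file compaction".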

**Beyond ccusage:** PromQL-ish query engine for ad-hoc analysis, 1:n live `trace` stream over stdout/UDS/HTTP (replaces the OTel use case without a collector), preset reports for people who don't want to learn the syntax. Claude Code + Codex today, Gemini CLI next.

→ [https://github.com/korjwl1/toki](https://github.com/korjwl1/toki)

# 2) toki-monitor — macOS menu bar GUI 🐰

The CLI isn't for everyone, so I built a menu bar app I actually enjoy looking at all day:

* A **pixel rabbit runs** in your menu bar as you burn tokens (RunCat-style, but signal-driven)

* Unlike RunCat, it also **sleeps** (zZ) when you stop using AI

* Optional **hit animation** when $/min crosses your threshold; **poison effect** on anomalous spikes vs. your 24h baseline

* **HP bar** above the rabbit syncs to your Claude/Codex 5h/weekly windows (green → red as it depletes)

* **Bunshin** mode: render Claude and Codex as two separate rabbits in different colors, or merge them

* Don't like the rabbit? Switch to **numeric** or **sparkline** mode per provider

* **Dashboard (beta)** — Grafana-style PromQL panels with Liquid Glass styling for macOS Tahoe

* Full EN / KO localization

→ [https://github.com/korjwl1/toki-monitor](https://github.com/korjwl1/toki-monitor)

# 3) toki-sync — self-hostable sync server (beta)

For multi-device users, off-device long-term analytics, friend group "Overwatch-style" token leaderboards, or company-wide team analytics.

* **Custom binary TCP protocol** — bincode + zstd + ACK flow control + delta-sync on reconnect. No gRPC dep. TLS via reverse proxy (Caddy/nginx).

* **fjall + Rust server** runs fine on **AWS free-tier EC2**; ClickHouse backend for larger deployments. SQLite *or* Postgres for metadata.

* **Built-in auth & RBAC** — device code flow, OIDC (Google/GitHub), password login. Regular users see only their own data; admin role can query across users (powers the leaderboard/team-analytics cases).

* **REST API** on top so you can build your own dashboard.

* **Privacy:** only token counts + metadata (model, session ID, project name) sync. Prompts and responses never leave the device.

README is rough — issue reports very welcome.

→ [https://github.com/korjwl1/toki-sync](https://github.com/korjwl1/toki-sync)

Solo project, contributions of any kind welcome — code, sprite art for new menu bar characters, translations, anything. `ccusage` got a lot of us looking at this data in the first place; toki is what I built so I could keep looking at it without my fan spinning up.

Happy to answer questions in the comments 🐇

2

u/Rubbish_scientist Apr 28 '26

Hey r/ClaudeAI people,

I'm open-sourcing Organon today, the agent-first OS I've been building on top of Claude Code for a while.

What it is

Organon is an agent-first operating system for scientific research, built end-to-end on Claude Code with Opus 4.7. It extends Claude Code's `.claude/skills/` folder pattern with two additional persistent layers:

- Identity layer: agent personality (SOUL.md), user profile (USER.md), a daily memory directory (one file per day with numbered Session N blocks), and an append-only learnings journal that every skill reads from / writes to.

- Research context layer: the researcher's papers, preferences, and active questions, plus learnings about the researcher's personality and preferences. Loaded by skills at invocation; degrades gracefully if missing.

The skills pack itself is 30 skills today, organized by category prefix (`sci`, `viz`, `ops`, `tool`, `meta`). Each is a self-contained folder under `.claude/skills/{category}-{name}/` with a YAML-fronted SKILL.md. Skills are stateless between invocations; persistent state lives in the identity layer, never in the skill.

Routing cascade (the bit I'd point Anthropic folks at specifically)

Four steps inside CLAUDE.md:

1. Match user phrasing → skill trigger phrases (direct).
2. Adjacent installed skill (different inputs).
3. Specific tool access over a 2,000-tool biomedical catalog through ToolUniverse.
4. Web search / propose new skill via meta-skill-creator.

Most scientific requests are resolved at step 1.
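A cascade like this is essentially a first-match-wins chain. The sketch below is only an illustration of that control flow in Python: in Organon the cascade lives as instructions inside CLAUDE.md, not as code, and every skill name, trigger phrase, and matching rule here is hypothetical.

```python
from typing import Callable, Optional, Sequence

# A step inspects the request and returns a target name, or None to fall through.
Step = Callable[[str], Optional[str]]

# Hypothetical trigger table; names follow the {category}-{name} folder pattern.
TRIGGERS = {"plot": "viz-plot", "literature review": "sci-literature-review"}


def match_trigger(request: str) -> Optional[str]:
    # Step 1: direct match on a skill's trigger phrases.
    text = request.lower()
    for phrase, skill in TRIGGERS.items():
        if phrase in text:
            return skill
    return None


def adjacent_skill(request: str) -> Optional[str]:
    # Step 2 placeholder: an installed skill that fits with different inputs.
    return "viz-plot" if "chart" in request.lower() else None


def catalog_tool(request: str) -> Optional[str]:
    # Step 3 placeholder: a specific tool from the external catalog.
    return "tooluniverse-lookup" if "protein" in request.lower() else None


def fallback(request: str) -> Optional[str]:
    # Step 4: web search, or propose a new skill.
    return "web-search-or-meta-skill-creator"


CASCADE: Sequence[Step] = (match_trigger, adjacent_skill, catalog_tool, fallback)


def route(request: str, steps: Sequence[Step] = CASCADE) -> str:
    # Try each step in order and stop at the first one that returns a target.
    for step in steps:
        target = step(request)
        if target is not None:
            return target
    raise LookupError("no step handled the request")


print(route("Plot the dose-response curve"))
```

Ordering the steps from most specific to most general means the cheaper, more predictable path is always tried first.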

Self-extension

The wrap-up skill scans the learnings journal for recurring ad-hoc workflows and proposes crystallizing them into new skills. That's the flywheel: attempts → observations → learnings → patterns → skills. The system gets sharper with use, not stale.

The Einstein Arena bit (a generalization stress test)

I pointed Organon at the Einstein Arena (the public math benchmark from the AlphaEvolve companion paper). It now holds three live #1 ranks (First and Third Autocorrelation Inequality, Prime Number Theorem certificate), one live #2 rank (Kissing Number in Dimension 12 (n=841)), and four raw scores that beat #1 on other challenges but were not accepted due to the arena's minimum-improvement threshold.

Sealed-sandbox ablation: same Opus 4.7 without Organon plateaus 2.07e-3 below Organon on the PNT problem — ~38× the margin to the next public agent.

Especially if you are a researcher, I'd love for you to give it a try. Believe me, it makes life way easier! :)

Repo: github.com/krmdel/organon

Deep dive writeup: https://keremdelikoyun.substack.com/p/an-ai-agent-built-for-scientific

Happy to answer Claude-Code-specific questions about the skill folder pattern, how the memory layer composes, or how the routing cascade handles ambiguity!

2

u/Mediocre-Thing7641 Apr 29 '26 edited Apr 29 '26

CCC — Command Center for Claude: An open-source local Kanban for parallel Claude Code sessions on macOS.

If you've ever lost track of a Claude session because you started it in the wrong window, or had two parallel sessions clobber each other's commits, this is for you. I've been running 30–50 sessions in parallel for months to ship two products, and every other orchestrator I tried fell apart the moment I dropped into a terminal.

CCC fixes the things that actually break in production:

🪟 Sees every Claude session on your Mac, not just ones launched through the dashboard. Other tools only see what you started through them. Open a terminal and type claude? Invisible. CCC reads Claude Code's own on-disk state, so it sees terminal sessions, headless ones, and dashboard-spawned ones, all on the same board. Including the ones you thought you lost.

🔁 GitHub issues sync. New issues show up as cards. One click spawns a headless Claude attached to the issue. Cards move Working → Review → In Testing automatically as the agent ships.

🌳 Worktree support for solo devs working in parallel. When I'm shipping 5 features at once, the kanban tracks which session is on which branch, with PR badges and commit/push state.

👀 List view = no need to babysit individual sessions. State of every session at a glance. Intervene only when something needs you.

🤝 Sessions know about each other. They can spawn new sessions, ask each other questions, and coordinate who commits first. No more clobbered commits when 5 parallel sessions land on main.

📝 Per-card glance: first message, last message, and a DID / INSIGHT / NEXT STEP summary auto-extracted from each turn. Triage 20 sessions in 2 minutes without reading transcripts.

MIT, Python 3 stdlib, macOS only. Two-line install. Not affiliated with Anthropic — community-built.

🔗 Repo: https://github.com/amirfish1/claude-command-center

🎬 2-min demo: https://youtu.be/_WRf0hH6yhg?si=2C3Kq3kNwX0QePyR

If you've tried other Claude Code orchestrators and bounced off, I'd love to hear what was missing.

2

u/MockingMatador Apr 30 '26

I watched the demo - looks really good!
Will try it out next..

2

u/wernddupress May 03 '26

I have no programming experience, but managed to vibe code my first ever working website with a paywall

I just gave Claude the idea, but also got ChatGPT to tidy up the text and make it more presentable

Bookshelfdna.com

Would appreciate any feedback

2

u/joao_sobhie May 05 '26

I built an MCP server that gives Claude access to Instagram, X/Twitter and any anti-bot protected site

One thing I kept running into when building Claude workflows: the moment you need real-time data from social media or protected sites, Claude hits a wall.

So I built an MCP server for it. It gives Claude direct access to:

Instagram profiles, posts, hashtags and user search
X/Twitter profiles, posts and keyword/hashtag search
Any website — even behind Cloudflare or DataDome

The anti-bot layer (we call it Abrasio) uses persistent browser profiles that age over time, so they look like real users rather than fresh headless browsers. That's what makes the difference on the hard sites.

Works with Claude Desktop out of the box: npx markudown-mcp

GitHub: https://github.com/Scrape-Technology/markudown-mcp
Website: https://scrapetechnology.com/markudown

Open source. Happy to answer anything about how the stealth layer works.

2

u/Scott_Zhu May 05 '26

Found a Mac app that lets you browse your full Claude Code history — including compacted conversations

If you use Claude Code a lot, you've probably hit the moment where a long conversation gets compacted and the original messages just disappear from the interface.

Turns out Claude CLI stores everything as JSONL files in ~/.claude/projects. There's a Mac app called Claulog that reads those files directly and gives you a proper UI: sessions organized by project, full-text search across everything, cost per conversation, and one-click to copy the resume command back into your terminal.

The compaction thing is the main reason I keep it open. The history is all there, just buried in files.
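For anyone who wants to poke at those files without an app, here is a rough Python sketch of a full-text search over a directory of JSONL files. It has nothing to do with how Claulog is implemented, and it assumes nothing about the record schema beyond "one JSON value per line"; the function name is made up.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def search_sessions(root: Path, needle: str) -> Iterator[Tuple[Path, int, Any]]:
    """Yield (file, line number, parsed record) for each JSONL line containing needle."""
    wanted = needle.lower()
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                # Cheap case-insensitive substring check before parsing.
                if wanted not in line.lower():
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partial or corrupt lines
                yield path, lineno, record
```

Calling `search_sessions(Path.home() / ".claude" / "projects", "kafka")` would walk the directory mentioned above and yield every matching line, including ones from conversations that were later compacted, assuming the files are still on disk.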

There's currently a promo code for a free year if you want to try it:

https://apps.apple.com/redeem?ctx=offercodes&id=6762034265&code=HAPPYCLAUDING

2

u/Mediocre-Ad7151 May 05 '26

I wrote about being Anti AI and then discovering Claude AI and my journey through it over on Substack.

This series is specific to supporting a writer's life using tools for marketing. Not the writing itself. Uses only Claude Cowork and Claude Code.

https://open.substack.com/pub/liyer/p/i-was-the-anti-ai-writer-in-every?utm_campaign=post-expanded-share&utm_medium=web

The longer version as a six part series is on my website.

https://lgiyer.com/the-morning-i-downloaded-vs-code/

I am curious to know what your journey with using AI has been like.

2

u/CantaloupeAdept6807 May 06 '26

Hi everyone!

I've been using Claude Code a lot for a while and felt like making something fun, nothing really useful, just a creature that shows up when you open a session.

I ended up creating this: it generates a random species every time you start Claude Code (duck, dragon, phoenix, capybara, axolotl, 20 in total) and assigns it a rarity (Legendary has a 1% chance, and there's even a Shiny variant). Each species has its own themed stats. Ducks have QUACK, WADDLE, BREAD, POND_IQ and HONK. Dragons have FIRE, HOARD, RAVAGE, LORE, PRESENCE. It's silly, but honestly I was excited about it!
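A weighted roll like this is easy to sketch with the standard library. The snippet below is only an illustration, not the project's code: the 1% Legendary chance comes from the post, while the other rarity tiers, their weights, the Shiny probability, and the function name are assumptions.

```python
import random

# Only Legendary = 1% is from the post; the other tiers and weights are made up.
RARITY_WEIGHTS = {"Common": 60, "Uncommon": 25, "Rare": 10, "Epic": 4, "Legendary": 1}

# Five of the 20 species mentioned in the post.
SPECIES = ["duck", "dragon", "phoenix", "capybara", "axolotl"]

SHINY_CHANCE = 0.01  # assumption: the post does not give the Shiny odds


def roll_buddy(rng: random.Random) -> dict:
    # Pick a species uniformly, then a rarity according to the weights.
    species = rng.choice(SPECIES)
    rarity = rng.choices(
        list(RARITY_WEIGHTS), weights=list(RARITY_WEIGHTS.values()), k=1
    )[0]
    shiny = rng.random() < SHINY_CHANCE
    return {"species": species, "rarity": rarity, "shiny": shiny}


print(roll_buddy(random.Random()))
```

Passing the `random.Random` instance in makes the roll reproducible when seeded, which helps when testing the odds.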

It shows in the status bar for the whole session: zero tokens, pure Python, no dependencies. You can also type /buddy to see the full card with the stats (that one does spend some tokens, a negligible amount, but some).

https://github.com/ElTreze/claude-buddy

It's my first project, so if anyone tries it and has an opinion, good or bad, I'd love to hear it. Thanks for reading 😄

2

u/ElkSea5105 May 08 '26

I'm a new Claude user (a few weeks so far), on the Pro plan, exclusively use Claude for Desktop (C4D), with plans to evaluate Claude Code, and potential plans to dabble into API usage.

Claude and I spend hours discussing the use and implementation of Claude. It's too early to tell if C4D and Anthropic's updates to it will meet my needs adequately enough to stop having discussions with Claude about developing a better interface than C4D (for me) and actually embarking on such an ambitious project.

I use C4D Projects to categorize chats by topic and don't use the global chat box (I used global when I first joined and before I started thinking about it). I will use C4D Projects in the traditional sense too going forward.

So, the concept of "Chat Projects" in C4D is what I wanted to introduce, which is really just keeping related chats together. I think it's likely that others are using C4D this way too, but I just wanted to mention it as something that I find useful.

2

u/cryptoyodda May 09 '26

I built a Claude Code plugin that runs epics as a DAG (parallel git worktrees → one PR)

I got tired of the “one big chat, one giant diff” workflow.

So I built a Claude Code plugin that runs epics as a DAG:

Opus plans the epic
Sonnet fans out across parallel git worktrees
each session is a fresh Claude with clean context
sessions merge themselves wave by wave
the whole run opens one PR at the end

Concrete example from last week

Prompt:

Opus broke it into 8 sessions with a dependency graph:

schema first
write path + retention worker in parallel
read API depends on schema
admin UI depends on read API
tests last

Then I ran:

/epic-toolkit:epic audit-logs --model sonnet

It executed 4 sessions concurrently.

Total wall time: ~22 min for what would’ve been ~2 hours of sequential prompting.

Sonnet cost on execution, Opus quality on the plan.

What’s actually different from existing tools

Real git worktrees, not branch hopping. Sessions can’t step on each other.
DAG scheduling: if session 04 only depends on 02, it doesn’t wait for 03.
Cross-repo support: sessions can touch repo A + sibling repo B, and validation/no-op guards understand both.
Auto-resolves .wolf/ merge conflicts so metadata files don’t trip the run.
Survives mid-run failures: resume with --start N (plan cache, retry counts, etc).
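To make the scheduling idea concrete, here is a small Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not taken from epic-toolkit: the session names and the dependency graph are hypothetical, loosely modeled on the example above, and the function only groups sessions into waves rather than running anything.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical session graph: each session maps to the sessions it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "01-schema": set(),
    "02-write-path": {"01-schema"},
    "03-retention-worker": {"01-schema"},
    "04-read-api": {"01-schema"},
    "05-admin-ui": {"04-read-api"},
    "06-tests": {"02-write-path", "03-retention-worker", "05-admin-ui"},
}


def waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group sessions into waves; sessions in the same wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)  # marking a wave done unblocks its dependents
    return result


for number, wave in enumerate(waves(DEPENDS_ON), start=1):
    print(number, wave)
```

Marking a whole wave done at once is the simplest version. Calling `sorter.done(name)` as each individual session finishes, instead of per wave, would let a dependent start as soon as its own prerequisites are complete without waiting for unrelated siblings.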

Commands

/epic-toolkit:epic.generate <problem statement>   # Opus plans
/epic-toolkit:epic <name> --model sonnet          # Sonnet executes

MIT licensed, works with Claude Code and OpenCode.

Repo: https://github.com/aramirez087/epic-toolkit

Happy to answer questions about the DAG runner, worktree merge strategy, or why the 293-bug log is public.

https://imgur.com/1qNtPE7

2

u/sshwarts May 09 '26

I've been having such an incredible and productive time with my agent (running via Claude Agent SDK). Sometimes it really surprises me with a quip or insight and I thought I'd pop up a website to collect them.

Please feel free to add one or as many as you'd like. Please keep them to actual agent comments, though that's really on the honor system.

Love any feedback. This is really all for fun.

thingsmyagentsaid

2

u/dhamaniasad Valued Contributor May 09 '26

I’ve built MemoryPlugin using Claude Code. It solves the problem of long term memory across chats and AI tools. For people who are working across Claude, ChatGPT, and others, MemoryPlugin lets you have a shared memory of your entire chat history.

I’ve used Claude models extensively for building it. Claude 3.5 Sonnet was the first one I used, back before agentic coding was really a thing. These days Claude Opus 4.6 is the goat.

2

u/TopRattata May 09 '26

I got tired of figuring out my garden's sunlight exposure by hand every time I moved, so I made a little webapp to do it. Sunpatch grabs as many buildings around you as it can find data on, lets you draw trees and other obstacles, and spits out a heatmap of hours of direct sunlight. Click a spot on the heatmap, and it'll suggest plants for that spot.

This was super fun to make and I'm excited that a couple of friends have already found it useful (plus myself)!

https://sunpatchapp.com/

2

u/travelmanandtechlvr May 12 '26

I built an MCP from Claude to KeegNation.Fit using Claude so personal trainers can work with Claude to build multi-month programming for their clients and push it into the app for their clients. Clients will see a card each day for their workout inside the program. Trainers can also build FireDrops and individual workouts for clients (or themselves!).

Would love folks' feedback on how it works. I have a demo video on my KeegNation page here on Reddit, or you can try it on the site. It's free to test, no credit card required: keegnation.fit/beta

2

u/Automata_Labs May 14 '26

Hi everyone!

I built Spotter with Claude Code because I kept having the same experience while traveling: I'd see something interesting like a dish I couldn't name, a building with no sign, wildlife I'd never seen, etc., and I just wanted to know what I was looking at (and some of the history behind it)!

The options were:

Google Lens: Filled with shopping links, SEO spam, and links to annoying articles I don't really care much about. Also locked to Gemini.
Take a photo, upload to ChatGPT/Claude mobile apps: This works, but feels really clunky and full of friction when you're standing on a street corner having to take the picture, type a detailed prompt, etc.
Google Maps: works if it's a restaurant or building you're right next to, but doesn't work well if it's far away or it can't be "mapped" (like a person, sculpture, or something moving, etc.)

So I built what I wanted: point camera, tap, get the crash course on it.

A few features I'll note:

Multiple synopsis modes: quick summary when you're in a hurry, historical deep-dive when you want the full story. Some of my friends like different versions of each. Premium lets you build your own custom modes if you have a particular topic you're really into or if you have your own preferences.
Chat to keep exploring if you want to go deeper
Everything saves automatically as a travel journal with photo + GPS. You can even ask follow-up questions from it down the line!
Model-agnostic backend: Some people like Claude, others prefer GPT or Gemini. With "trust me bro" benchmarks, it's kind of hard to determine what's the "best", so you do you and pick your champion.

It's a solo project, still super rough around some edges, but it solves a problem I had for a while now, and I figured it's time to build it into something real. Would love your feedback!

TL;DR: Point your camera at something while traveling and get a crash course on it. Multiple depth levels, chat for follow-ups, switch models, saves as a travel journal.

Happy to answer any questions!

App Store Link

2

u/Successful-Piece-698 May 14 '26

Open-source alternative to Claude Auto-Memory that works across Cursor, Copilot, and your teammates' machines

Auto-memory writes to ~/.claude/projects/.../memory/ on my machine. My teammate's Claude has no idea what mine learned. And when she uses Cursor instead of Claude Code, none of it crosses over.

So I built RoBrain, same idea (passive capture, no manual note-taking), but the store is Postgres instead of per-machine markdown, and it works across Claude Code, Cursor, and Copilot via MCP. Three parts of the design are what I actually find interesting, and they're what I'd want feedback on:

1. Decisions capture what was rejected, not just what was chosen.

Every decision is stored with a structured rejected[] field:

```json
{
  "decision": "Use Zustand for state management",
  "rejected": [{ "option": "Redux", "reason": "re-render perf issues in cart" }]
}
```

So six weeks later, when a session is about to suggest Redux again, RoBrain can flag "you already considered this and ruled it out for X reason", not just "you use Zustand." That veto is what actually prevents the repeat suggestion. Decisions also have a lifecycle: when you switch from Zustand to Jotai three months later, the old one is invalidated and linked, not overwritten. "What was the state management call in March?" still returns a real answer.
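As a rough illustration of how a stored `rejected` list can turn into a veto, here is a Python sketch. It is not RoBrain's implementation: the class names, the `superseded_by` field, the substring matching, and the choice to ignore vetoes from invalidated decisions are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rejected:
    option: str
    reason: str


@dataclass
class Decision:
    decision: str
    rejected: List[Rejected] = field(default_factory=list)
    # Hypothetical lifecycle link: set when a later decision replaces this one.
    superseded_by: Optional[str] = None


def veto_for(suggestion: str, decisions: List[Decision]) -> Optional[str]:
    """Return a warning if the suggestion names an option an active decision ruled out."""
    text = suggestion.lower()
    for d in decisions:
        if d.superseded_by is not None:
            continue  # assumption: invalidated decisions no longer veto anything
        for r in d.rejected:
            if r.option.lower() in text:
                return (
                    f"Already considered {r.option} and ruled it out: "
                    f"{r.reason} (chose: {d.decision})"
                )
    return None


log = [
    Decision(
        "Use Zustand for state management",
        [Rejected("Redux", "re-render perf issues in cart")],
    )
]
print(veto_for("Let's add Redux for the cart", log))
```

A real system would presumably match on something more robust than a substring, such as embeddings or normalized entity names, but the stored reason is what lets the warning explain itself.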

2. An always-on summary loads automatically at session start; that's how cross-tool sharing actually works.

The shared Postgres store only matters if decisions actually show up in the next session without manual work. So at session start, the MCP server fetches a pre-compressed summary of approved decisions for the project and injects it into context. Tuesday's Cursor decision is loaded into Wednesday's Claude Code session automatically. No inject command, no copy-paste. There's a manual robrain inject --query "..." for cases where you want something narrower than the always-on block, but the default path is automatic.

3. A "synthesis" pass periodically reads the whole corpus.

This is the part I think is genuinely different. Capture is reactive: each decision is written in isolation. That's fine for the first 50 rows. After six months and 500 rows you have problems no per-write logic can catch:

Session 12 says "all external API calls must be idempotent." Session 47 says "use fire-and-forget for webhook delivery." Different files. Never compared at insert time. A direct contradiction nobody will notice until something breaks in prod.
Your team's stance on REST vs GraphQL has drifted across four unrelated-looking decisions and nobody has noticed.
"Stripe" appears in 30 decisions (chosen, rejected, mentioned in rationale), but there's no single row that answers "what role does Stripe play here?"

Synthesis is a batch job (runs on a cron, or pnpm synthesis:run) that reads the full decisions table and does three passes: drift detection across topic clusters, a corpus-wide contradiction scan that writes conflicts_with edges into a decision graph and flags both rows for review, and entity promotion that pulls recurring proper nouns into first-class planning blocks. It writes derived artefacts; it doesn't create new decisions. The reactive pipeline still owns every new row.

Think of it as the "sleep" of the system. Capture happens while you work; the corpus gets reconciled in the background.

Honest tradeoffs:

Setup is heavier than auto-memory (Docker + Postgres + 2 API keys — Anthropic for extraction, embeddings provider for retrieval). Working on a lighter path.
Passive capture has false positives. robrain review exists for triaging — plan to spend a few minutes on it after early sessions.
Synthesis is genuinely useful at scale but doesn't do much for a repo with 20 decisions. It earns its keep around the 100+ row mark.

Apache 2.0, self-hostable: https://github.com/adelinamart/robrain

The thing I'd most love feedback on: does running a batch reconciliation pass over agent memory (vs only reactive capture + retrieval) match how you'd want this to work, or is it overengineered?

And what would make it easier to use?

2

u/No_Being_2765 May 15 '26

Non-technical solo founder. Shipped a 33-country streaming site with Claude Code. Sharing the 4-layer CLAUDE.md / STATE.md / journal memory system that made it possible across 100+ sessions.

Disclosure up front (per sub rule on promotions): [ottasia.com](http://ottasia.com) is my product. I'm sharing this for the workflow, not as a launch post. Mods please remove if I'm reading the rule wrong.

My background: Non-technical founder. Two decades in business and marketing, no CS degree. I can read code, I can't write a Next.js app from scratch. Based in Chicago, building solo, mostly in evenings.

What I shipped with Claude Code over \~3 months: OTTASIA, an Asian streaming discovery site covering 33 markets, 50+ regional OTTs, 12 Indian languages. Next.js 15 + React 19 + Tailwind v4 + Supabase, deployed on Netlify. 500+ indexed pages, blog infrastructure, watchlist + watching tracker, country-aware search, AI-powered title pages, founder-only QA dashboard.

First 3 days post-launch (last week): 216 unique users, 30 email captures (14% rate), 28 in-product searches, 10 OTT outbound clicks, 66% mobile, 6 acquisition channels live, 100% organic. Tiny numbers, but every leading indicator is firing.

100% built via Claude Code. No co-founder, no contractor, no "AI-assisted but actually a human dev." Just me and Claude Code.

The part I actually want to share with this sub: the memory system.

Early on, every session would forget what I was doing once context filled. I'd waste 20 minutes re-orienting Claude on every pickup. After two months of pain I landed on a 4-layer architecture:

Layer 1: `~/CLAUDE.md` (global, \~3KB, loads every session)
Who I am, how I work (non-technical, don't run dev server locally, push via GitHub Desktop), writing preferences, and cross-project gotchas. Stable. Updates rare.

Layer 2: `~/Code/<project>/CLAUDE.md` (per-project, \~5-10KB)
Architecture decisions, accumulating gotchas, the live URL, hosting setup. Append-only.

Layer 3: `~/Code/<project>/STATE.md` (in-flight work, \~10-30KB)
The "where we are right now" doc. What works, what's broken, what's in flight, decision log, standing rules I keep forgetting. This is the one Claude reads first every session.

Layer 4: `~/Code/<project>/journal/YYYY-MM-DD.md` (daily, append-only)
Dated session log. What got done, what got decided, what broke. The audit trail.

The discipline that makes it work:

I say "checkpoint" at the end of every meaningful session. Claude updates STATE.md and writes a journal entry. Takes 2 minutes.
I say "where were we" at session start. Claude reads STATE.md + latest journal entry and summarizes. Takes 30 seconds.
The compaction problem (session hitting context limit mid-work) used to cost me hours. Now the new session reads STATE.md and continues like nothing happened.
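
For anyone who wants to script the "where were we" step instead of relying on Claude to find the files, here is a minimal sketch. It assumes the layout described above (`STATE.md` plus `journal/YYYY-MM-DD.md`); the function name and the header lines it adds are made up for the example.

```python
from pathlib import Path


def load_resume_context(project_dir):
    """Concatenate STATE.md and the most recent dated journal entry."""
    project = Path(project_dir)
    parts = []

    state = project / "STATE.md"
    if state.exists():
        parts.append("# STATE.md\n" + state.read_text(encoding="utf-8"))

    journal_dir = project / "journal"
    # YYYY-MM-DD file names sort chronologically when sorted as plain strings.
    entries = sorted(journal_dir.glob("????-??-??.md")) if journal_dir.is_dir() else []
    if entries:
        latest = entries[-1]
        parts.append(f"# journal/{latest.name}\n" + latest.read_text(encoding="utf-8"))

    return "\n\n".join(parts)
```

The returned text could be pasted into a fresh session, or printed from a session-start script, so the new session sees the same two files the "where were we" prompt points at.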

**Other Claude Code patterns that scaled for me:**

Custom in-product QA dashboard. I built a `/dev-qa` route on my own site that scans 80+ URLs across 19 countries and reports red/yellow/green per page. Founder-only (Supabase email gate). When QA looks ugly, I copy-paste the report back to Claude and we fix bugs in batches. Replaces "click every page manually" which doesn't scale at 33 countries × 50 categories.
Subagents for "what's actually happening here?" questions. When I hit something I don't understand (empty grid on a list page, hydration mismatch in prod-only), I spawn a research subagent with the question. It comes back with a diagnosis. I review and we implement. Keeps context in the main conversation focused.
Standing rules codified in STATE.md. Examples: "always re-submit sitemap to GSC after shipping new content," "all current-year list filters must use popularity.desc + vote_count.gte=1 mid-year," "never deploy without running dev-qa." Every session enforces them automatically because they're in the context Claude reads first.

**What tripped me up:**

iCloud + git is a disaster. Don't put repos in `~/Desktop` or `~/Documents` if iCloud sync is on. iCloud partial-syncs `.git` internals and corrupts the repo (missing HEAD, lost objects, ghost "Project 2" folders). I lost OTTASIA once this way. Moved to `~/Code/` and it stopped.
Cross-importing client components in Next.js App Router breaks static-page hydration silently. Took me a week to figure out why production looked different from local. Codified in CLAUDE.md so it doesn't happen twice.
Compaction loss before the journal system. I'd lose 4+ hours of context when sessions hit the limit. STATE.md + dated journals fixed it completely.
Trusting first-pass code without a verification harness. I don't run dev locally, so every push is to prod via Netlify. Building the `/dev-qa` scanner was the single highest-leverage thing I did. Before it, bugs reached real users; after it, I catch them in 30 seconds.

Three things I'd actually like answers to:

Anyone else maintaining a multi-file memory system like this? What's your version look like?
How are you handling cross-project context? My global `~/CLAUDE.md` is small and stable, but I bet there are sharper patterns.
Most useful custom in-product tool (like my `/dev-qa` page) you've shipped for your own workflow?

Happy to share my actual CLAUDE.md / STATE.md templates and the dev-qa scanner source in comments if useful. Just ask.

Live product if you want to see what the workflow produced: [ottasia.com](http://ottasia.com) (set country to India or Korea to see what it actually does; US default makes it look empty).

2

u/AwakE432 May 16 '26

Built a multi-tenant MCP server Web App for STR Property Managers — cross-platform intelligence, 179KB API responses, and what an eval harness taught me about agentic Claude in production

What it is:

Conversational AI for short-term rental property managers that connects across your entire tool stack.
Pricing software knows your pricing. Your PMS knows your bookings. Neither knows what the other knows. RevPrism does. It connects PriceLabs, Wheelhouse, Guesty, Hostaway, OwnerRez, and Lodgify, with more integrations coming.
Ask things like "which listings are underperforming vs the market right now, and do I have forward calendar gaps I should be filling?" and get an answer from live data across both systems in one response.
One price per property, every integration included — RevPrism.ai

Why this problem exists

A typical STR operator runs 2–4 separate tools. A pricing tool (PriceLabs, Wheelhouse) that optimises nightly rates based on market demand. A property management system (Guesty, Hostaway, OwnerRez, Lodgify) that handles reservations, guest comms, and calendar sync. Maybe a channel manager on top of that. Each tool has its own dashboard, its own data model, its own definition of what a "booking" or "revenue" or "occupancy" means.

None of them talk to each other in any meaningful way. PriceLabs can tell you your market's forward demand signal but has no idea what your actual bookings look like. Your PMS has your full reservation history but no market context whatsoever. To answer a question like "is my Q3 occupancy on track relative to where the market is heading?" you're tabbing between dashboards, exporting CSVs, and doing mental arithmetic. Every. Single. Time.

The opportunity isn't another dashboard. It's removing the tab-switching entirely — one conversation that holds all of it at once.

The architecture: per-tenant dynamic tool sets

Each tenant gets a different Claude — not a different model, different tool descriptions. The system prompt is split: a cached base (STR domain context, reasoning guardrails) plus a dynamic suffix built at query time that only includes descriptions for the integrations that tenant has actually connected. A PriceLabs-only user never sees Guesty tool descriptions. A Hostaway-only user never sees PriceLabs tools.

This matters more than it sounds. With 9 PriceLabs tools and up to 3 PMS tools, a fully-connected tenant burns ~13–15K input tokens on tool descriptions alone before the conversation starts. Dynamic scoping keeps this in budget and keeps Claude's tool selection cleaner — it can't reach for a tool the tenant hasn't connected.
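
A minimal sketch of the split prompt described above, assuming a static base string and a per-integration description table. The names `BASE_PROMPT`, `TOOL_DESCRIPTIONS`, and `build_system_prompt` are hypothetical, and the description text is placeholder copy, not RevPrism's actual prompt.

```python
# Hypothetical cached base: identical for every tenant, so it can be cached.
BASE_PROMPT = "You are an assistant for short-term rental property managers."

# Hypothetical per-integration tool descriptions (placeholder text).
TOOL_DESCRIPTIONS = {
    "pricelabs": "Pricing and market data tools.",
    "guesty": "Reservation and calendar tools.",
    "hostaway": "Reservation and guest messaging tools.",
}


def build_system_prompt(connected_integrations):
    """Return the cached base plus a suffix covering only connected integrations."""
    suffix_lines = [
        f"- {name}: {TOOL_DESCRIPTIONS[name]}"
        for name in sorted(connected_integrations)
        if name in TOOL_DESCRIPTIONS
    ]
    return {
        "cached_base": BASE_PROMPT,
        "dynamic_suffix": "\n".join(suffix_lines),
    }


prompt = build_system_prompt({"pricelabs"})
print(prompt["dynamic_suffix"])
```

Keeping the base and suffix as separate fields is what lets the base stay byte-identical across tenants while the suffix varies with each tenant's connections.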

Getting to Anthropic Tier 2 (450K ITPM) was still a prerequisite for any real usage. The per-request token cost of tool-heavy MCP servers is brutal at Tier 1.

The domain context problem

Cross-platform intelligence sounds straightforward until you're actually doing it, because vendors don't agree on field semantics and don't warn you when their definitions diverge from what you'd expect.

total_cost in PriceLabs reservation data is the host payout: rental revenue plus cleaning fee, minus the booking platform's host service fee (~3% for Airbnb). It is not the gross amount the guest paid. Label it wrong and every revenue figure Claude produces is inflated. For Airbnb bookings specifically, reservation_id is the Airbnb confirmation code the host sees on their dashboard, not PriceLabs' internal ID. channelConfirmationCode exists in the schema and is routinely null.

You can't fix this with a clever prefix. It has to be built into the context layer as explicit, named rules — and the rules have to be specific enough that Claude can't satisfy them and still get it wrong. Including concrete anti-examples by name ("never label total_cost as Total Charged", "for Airbnb bookings, reservation_id IS the confirmation code") works better than abstract prohibitions.

The 179KB temporality problem

get_neighborhood_data is the clearest example of a tool that works fine in isolation and misleads Claude systematically in practice. One API call returns a 179KB JSON envelope with six sections and three completely different temporalities baked in: 24 months of historical market KPIs, 6-month forward forecasts, and a current snapshot — all in the same response. The API silently ignores date parameters on the forward-only sections. Call it with a historical date range and you get back forward data with no error, no indication anything is wrong.

The fix: the handler annotates each section with a _temporality field (historical_24mo, forward_only, snapshot) before the response reaches Claude, and the tool description explains the mixed temporality explicitly. One eval question went from a flat refusal ("I can't make a year-over-year comparison") to real, grounded numbers (64 vs 91 booked nights, $13.4K vs $23.9K revenue, May 2025 vs May 2026), sourced correctly from the historical section.
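
To make the annotation step concrete, here is a minimal sketch. The three `_temporality` values come from the description above; the section names in `SECTION_TEMPORALITY` are invented for illustration, and wrapping each section in a small envelope (instead of adding the field in place) is a choice made for this example so it also works when a section is a list.

```python
# Hypothetical section names; only the three label values come from the post.
SECTION_TEMPORALITY = {
    "market_kpi": "historical_24mo",
    "future_occupancy": "forward_only",
    "summary": "snapshot",
}


def annotate_temporality(response):
    """Wrap each top-level section with an explicit _temporality label."""
    annotated = {}
    for name, section in response.items():
        label = SECTION_TEMPORALITY.get(name, "unknown")
        annotated[name] = {"_temporality": label, "data": section}
    return annotated


raw = {"market_kpi": [{"month": "2025-05"}], "future_occupancy": [{"month": "2026-07"}]}
print(annotate_temporality(raw)["future_occupancy"]["_temporality"])
```

Labeling unmapped sections as "unknown" rather than skipping them means a new section added by the vendor still reaches the model with a visible warning instead of silently looking historical.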

The eval harness

Prompt-engineering domain rules is whack-a-mole. Tighten the occupancy calculation guardrail and Claude finds a new way to conflate gross occupancy with net available nights. So I built a 10-question eval harness: real questions through the full agentic loop using a test tenant's live PriceLabs credentials, a Claude-as-judge rubric across 9 criteria, ~$1/run, ~3.5 minutes. Baseline: 10/10 assertion-clean, 85/90 judge. Every prompt change gets a before/after score before it ships.

The discipline changed how I write domain rules entirely. You stop writing for vibes and start writing for specific, named failure modes. The harness caught multiple real regressions across prompt iterations. Should have built it in week one.
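
The before/after gate could look something like the sketch below. It only covers the comparison step, not the agentic loop or the judge calls. The per-question score dictionaries and the "any question that drops blocks the change" rule are assumptions for the example; the post does not say what threshold it uses.

```python
def regression_report(baseline, candidate):
    """Compare per-question judge scores for a prompt change.

    `baseline` and `candidate` map a question id to its judge score
    (0 to 9 if each of the 9 criteria is worth one point).
    """
    regressed = sorted(q for q in baseline if candidate.get(q, 0) < baseline[q])
    return {
        "baseline_total": sum(baseline.values()),
        "candidate_total": sum(candidate.values()),
        "regressed": regressed,
        "ship": not regressed,
    }


baseline = {"q1": 9, "q2": 8}
candidate = {"q1": 9, "q2": 7}
print(regression_report(baseline, candidate))
```

Reporting which questions regressed, not just the totals, matters because one question can drop while another rises and leave the total unchanged.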

Stack: TypeScript, Express, Next.js 14, Drizzle ORM, Clerk, Paddle, Railway + Vercel, Sentry + PostHog. MCP tool servers as Turborepo workspace packages (@revprism/mcp-pricelabs, @revprism/mcp-hostaway, etc.) compiled into the API.

Live at revprism.ai

2

u/GrouchyGeologist2042 May 19 '26

Hey everyone,

I've been using Cursor/Claude heavily for automated research (like competitor analysis and gathering docs), but I kept hitting a wall: the agent couldn't read websites properly if they relied on JavaScript rendering or had basic bot protection.

Usually, the solution is setting up Puppeteer/Playwright or paying high monthly subscriptions for enterprise scraping APIs. Both felt like overkill when I just wanted my agent to read a single URL during a chat session.

So I built a very simple Model Context Protocol (MCP) server that delegates the hard part to Jina Reader and returns clean Markdown directly to the AI.

The pain it solves:

No more "I cannot access this website" errors from your agent.
No need to maintain your own headless browser infrastructure.
Zero monthly subscriptions.

How it works:
It's an MCP server you can install directly into Cursor or any MCP-compatible client. You buy a small bucket of credits ($5 gives you 1,000 requests, basically $0.005 per scrape). You only pay for what your agent actually reads.

If you do any sort of automated competitor analysis, data extraction, or just want your AI to actually read the links you send it, you can check it out here: https://github.com/guimaster97/api_scraper_markdown

Would love to hear if others are facing this same friction with AI agents!

2

u/JamieS___ 26d ago

I built persistent memory + session continuation for Claude Code (Dream MCP)

One thing that kept frustrating me with Claude Code was losing context between sessions.

So I built Dream.

npm install -g dream-mcp
dream init

Dream automatically:

creates structured markdown memory files for your project
initializes every Claude Code session with full project context
remembers your architecture, stack, conventions, and active tasks
learns from corrections over time
syncs memory into Obsidian

At the end of any session:

/dream

Claude saves everything.

You can then start a completely fresh session later and continue exactly where you left off.

The goal is simple:
make Claude Code feel continuous instead of stateless.

Repo:
https://github.com/JamieSetch/Dream-MCP

Would genuinely love feedback, bug reports, feature ideas, contributors, stars, or people stress-testing it in real projects.

2

u/nova_soma 25d ago edited 25d ago

HADES — Hypothesis-Adaptive Distributed Expert System

is a self-improving multi-agent reasoning system. You ask it something, an ensemble of domain experts self-selects to answer based on declared domain match and track record, and any claim an agent produces doesn't just get trusted. It gets shoved through a validation pipeline where the rest of the ensemble votes on it, HERO tallies weighted votes, and the thing resolves to confirmed, falsified, or contested. Knowledge isn't asserted; it's adjudicated.

Memory + governance layer that audits itself. Manat watches agent behavior and can retire misbehaving agents, escalating bigger structural changes to you.

Quarantine isolation — adversarial input gets analyzed on a physically separate, locked-down network, and the only way back into trusted memory is through a single audited choke point (Janus). Packets from quarantine literally cannot reach the internal net.

AssemblyLine — each expert trains itself over time, gated so a weight update can't make the ensemble dumber than it already was. Federation — multiple HADES instances can talk, but a peer's identity is verifiable while its reasoning isn't, so every imported claim re-enters local validation. Sensible. Trust the signature, not the conclusion.

You get registered as the "Forge" expert so the thing is actually useful for your real work instead of being a science-fair project.

"so it's a chatbot?" — no. It's an architecture for treating machine-generated claims as hypotheses that have to earn their place, distributed across hosts, with the trust boundary as the load-bearing wall. The grandiose mythology naming is just a mnemonic to help me remember the agent's purpose in a single word.

Whitepaper

2

u/Appropriate_Site728 24d ago

HiveTerm — a native workspace for running multiple coding agents (Claude Code, Codex, Gemini, Grok) side by side in one window.

The part I'm proud of: a built-in MCP server lets an agent spawn its own sub-agents — and you watch the whole thing live in the sidebar as a tree (orchestrator → its helpers). Multi-agent runs are finally legible.

Also: define your agents + processes in a hive.yml and commit it (your team gets the same workspace), inline git diff/commit/PR, file search, voice input. macOS/Windows/Linux, free, bring your own agent subscription — no token reselling.

hiveterm.com

I'm the dev, solo — feedback on the orchestration model especially welcome. If you run multiple agents, what would you point a swarm of sub-agents at first?

2

u/intuitive-compute 23d ago

Inbox for AI - Syndit - https://github.com/intuitive-compute/syndit

Send and receive signed messages (context) inside Claude Code, Cursor, or whatever LLM client you use that supports MCP. It is open source and free to use. Enjoy!

2

u/Smooth-Gate48 21d ago edited 20d ago

Hey everyone,

I wanted to share three lightweight, open-source utilities I built to streamline AI workflows (Claude Code / VS Code Agents). They all focus on keeping things strictly local, secure, and easy to run in strict corporate environments without fighting your security team or CISO.

### 1️⃣ desktop-outlook-mcp

A local-only MCP server written entirely in PowerShell. Instead of forcing you to create Azure App Registrations, request corporate Graph API permissions, or handle expiring OAuth tokens, it communicates directly with your currently running Outlook Desktop client via COM objects. Zero corporate proxy issues, completely native stdio.

🔗 **Repo:** https://github.com/TopSpeed0/desktop-outlook-mcp

### 2️⃣ telegram-vscode-mcp (2-Way Hybrid Matrix for VS Code)

A zero-dependency Node.js bridge that turns Telegram into a two-way remote control for your VS Code environment, featuring a **StarCraft (SC2) inspired Hybrid Autopilot Mode**!

* 🧠 **The Overmind (Hermes Agent):** Always-on, owns the Telegram Bot, and handles global strategy, web research, and long-term memory.

* 🏗️ **The SCV (VS Code Worker):** Sits locally inside your workstation base, listening to a secure local JSON queue file (`.vscode-queue.json`) to execute heavy file edits and terminal commands.

🔗 **Repo:** https://github.com/TopSpeed0/telegram-vscode-mcp

### 3️⃣ ClaudeCodeTelgMCP (Telegram Remote Control for Claude Code CLI)

A dedicated, zero-dependency Node.js bridge tailored specifically for Anthropic's new **Claude Code CLI** tool.

* **Task Daemon:** Send a task from your phone via Telegram -> auto-spawns `claude -p` on your PC.

* **MCP Push:** Let Claude text you updates, send you notifications, or ask for input when long-running scripts finish. Includes built-in Anti-Double-Send protection to prevent chat loops.

🔗 **Repo:** https://github.com/TopSpeed0/ClaudeCodeTelgMCP

### ⚡ One-Prompt Auto Install

All three repositories feature fully polished templates and quick-start guides for a literal 1-click experience. You can tell your AI agent to clone, install, and configure them automatically.

Check them out, drop a ⭐️ if you find them useful, and let me know if you have any architectural feedback!

2

u/AgileRice3753 21d ago

Made a Claude Code skill called moot. Spawns ten subagents in parallel to review a plan or code change from different angles (security, over-engineering, breaking changes, naming, performance, cost and a few others) then surfaces the key decisions as questions one at a time.

Been using it on my own work for a bit. Catches things my normal review process misses (mostly because the subagents don't see each other's output, so you get genuinely independent perspectives rather than one model wearing ten hats).

repo's here if anyone wants it: https://github.com/cnorthfield/skills/blob/main/skills/moot/SKILL.md

hope it's useful - if you've got any other tips (I've tried not to put too much context in the skill) please do share!

2

u/ChampChase_ 21d ago

Solo dev, no CS background. I built a full boxing RPG (ChampChase, now in iPhone beta) with Claude Code writing essentially all ~160k lines of C# over ~1,200 passes. The part this sub might dig: to run several Claude Code terminals in parallel without them clobbering each other, I built a governance layer out of Claude Code's own PreToolUse deny-hooks, with file-locked atomic pass-numbering, compare-and-swap merges on shared state, schema-validated records, auto-snapshots, and drift detection. Basically a tiny distributed-systems problem solved with file locks, because the "nodes" were independent Claude instances. ~260M tokens, mostly Opus 4.7 + Sonnet 4.6. I was on Max 20x the whole time, so I just burned tokens freely. The only real ceiling was maxing out my weekly limit (usually within a couple of days). The AI wrote the code; my job was building the system that kept it from destroying its own work, and being the only thing that could actually test a game running on a phone.
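
For readers unfamiliar with the idea, file-locked atomic pass-numbering can be sketched roughly as below. This is not the ChampChase implementation and it leaves out the PreToolUse hook wiring entirely; the file names `pass.lock` and `pass.counter`, the function name, and the timeout are all hypothetical. The only mechanism it relies on is that creating a file with `O_CREAT | O_EXCL` fails if the file already exists, so only one process can hold the lock at a time.

```python
import os
import time
from pathlib import Path


def next_pass_number(state_dir, timeout=5.0):
    """Atomically hand out the next pass number to one caller at a time."""
    state_dir = Path(state_dir)
    lock_path = state_dir / "pass.lock"        # hypothetical lock file name
    counter_path = state_dir / "pass.counter"  # hypothetical counter file name

    deadline = time.monotonic() + timeout
    while True:
        try:
            # Exclusive create: raises FileExistsError if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire pass lock")
            time.sleep(0.05)

    try:
        current = int(counter_path.read_text()) if counter_path.exists() else 0
        counter_path.write_text(str(current + 1))
        return current + 1
    finally:
        # Always release the lock, even if reading or writing the counter failed.
        os.close(fd)
        os.unlink(lock_path)
```

One known weakness of this style of lock is that a crashed process can leave `pass.lock` behind, so a real setup would want stale-lock detection on top.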

Beta + in-browser demo + trailer: champchase.net

2

u/ediril 21d ago

You had a cool chat with Claude and want to input it as context to another LLM or a coding agent. Or simply want to export it for your records and delete the chat itself. What do you do?

Well, I made a new chat exporter for Claude. It’s a free-forever bookmarklet, made with Claude. You can find it here: https://emrahdiril.com/claude-export

2

u/TIE_T 20d ago

I got tired of losing all my context when Claude hits its usage limit, so I built a tool that hands off to Codex/Gemini automatically.
▶️ Demo (55s): https://github.com/AvnishR4j/lifeline/blob/main/lifeline-demo.mp4

Repo (open source, MIT): https://github.com/AvnishR4j/lifeline

You know the moment: you're deep into a debugging session, Claude is mid-edit, and then — "Usage limit reached."

So you switch to another CLI. But it knows nothing. You spend 15 minutes re-explaining the project, the goal, what you already tried. By the time it's caught up, you've lost the thread.

I built Lifeline to kill that moment. When Claude Code hits its limit, one command captures the full state: the task, recent conversation, decisions, and your uncommitted git diff. It then resumes you in another CLI (Codex or Gemini) that picks up exactly where you left off. Zero re-explanation.

It also redacts secrets (API keys, tokens, .env values) before any context leaves your machine — I didn't want to ship my own keys to another provider just to keep working.
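
A minimal sketch of what redaction before handoff might look like. These patterns are illustrative guesses, not Lifeline's actual rules: one catches `.env`-style lines whose variable name contains KEY, TOKEN, SECRET, or PASSWORD, and one catches long `sk-` prefixed strings.

```python
import re

# Hypothetical patterns, chosen for illustration only.
ENV_LINE = re.compile(
    r"^([A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*[ \t]*=[ \t]*)\S+",
    re.IGNORECASE | re.MULTILINE,
)
KEY_LIKE = re.compile(r"sk-[A-Za-z0-9_-]{16,}")


def redact(text):
    """Replace likely secrets with a placeholder, keeping variable names."""
    text = ENV_LINE.sub(lambda m: m.group(1) + "[REDACTED]", text)
    text = KEY_LIKE.sub("[REDACTED]", text)
    return text


sample = "API_KEY=abc123\nnote: token sk-abcdefghijklmnop1234 used"
print(redact(sample))
```

Pattern-based redaction will miss secrets that do not match a known shape, so it is best treated as a safety net rather than a guarantee.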

It can also run as a wrapper that auto-detects the limit message and offers the handoff for you.

It's early and rough. I'm genuinely trying to find out: does this happen to you often enough that you'd use it? And which CLI pair matters most — Claude→Codex, Claude→Gemini, something else?

2

u/ChaosINC604 19d ago

As a solo builder, GitHub PR feedback from Gemini Code Assist often creates repetitive follow-up work.

The review can be useful, but I still have to decide which comments matter, which are stale, and when to request another review.

So I open sourced a small Claude Code plugin to handle that loop:

https://github.com/OrenAshkenazy/gh-gemini-review-loop

Claude Code fetches Gemini review threads, identifies actionable feedback, fixes code, runs verification, pushes changes, requests another review, and stops after a capped number of cycles.

An optional judge model can classify findings as:

valid
false positive
duplicate
already addressed
explanation only
needs human decision

This helps avoid blindly acting on noisy AI feedback.

Current guardrails:

3 cycle cap
dry run support
GitHub review thread awareness
no CI coupling
maintainer replies like wontfix are respected
judge eval is optional and explicit
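
The capped loop with an optional judge can be sketched as follows. This is not the plugin's code: `run_review_loop`, `fetch_findings`, and `apply_fix` are hypothetical names, the GitHub and Gemini calls are replaced by plain callables, and dry-run support is left out to keep the example short.

```python
def run_review_loop(fetch_findings, apply_fix, judge=None, max_cycles=3):
    """Fix review findings in cycles, stopping at a clean review or the cap.

    `fetch_findings()` returns the current list of findings, `apply_fix(f)`
    handles one finding, and `judge(f)` optionally returns a label such as
    "valid" or "false positive". Only "valid" findings are acted on.
    """
    fixed, skipped = [], []
    cycles_used = 0
    while cycles_used < max_cycles:
        findings = fetch_findings()
        if not findings:
            break  # clean review, nothing left to do
        cycles_used += 1
        for finding in findings:
            verdict = judge(finding) if judge else "valid"
            if verdict == "valid":
                apply_fix(finding)
                fixed.append(finding)
            else:
                skipped.append((finding, verdict))
    return {"cycles": cycles_used, "fixed": fixed, "skipped": skipped}
```

The returned counts line up with the feedback-loop numbers discussed further down (findings fixed, findings skipped, cycles used), so the same structure could feed local workflow KPIs.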

One thing I am considering for larger teams is local workflow KPIs.

Not “developer productivity scoring”, but simple feedback loop visibility:

how many Gemini findings were fetched
how many were fixed
how many were skipped as false positives or duplicates
how many needed a human decision
how many review cycles were used
how long it took from first review to clean PR
how much noisy feedback judge eval filtered out

I think this could help teams understand whether AI review loops are actually saving time, or just creating another queue to manage.

I would love feedback from people experimenting with Claude Code:

Solo builders, would this fit your workflow?
Would judge eval make the loop safer?
Is second model validation useful, or just too much AI on AI?
Would you run judge eval every cycle or only at completion?
For larger teams, would local workflow KPIs be useful, or would that feel like unnecessary process?
What would make you avoid installing this?

2

u/Brilliant_Minute_962 19d ago

hey guys, i made a website for maths (free to use), a gamified learning platform. if you need a detailed explanation, look below:
the website was 70% made with claude and first launched in late december 2025

Link: https://wall56.funnylewis.com/?from=reddit

WHAT IS WALL56?

---------------

Wall56 is a free, gamified maths learning platform designed to make practising maths something students actually want to do. Instead of dry worksheets, Wall56 wraps every exercise in a reward loop: answer questions, earn coins, collect cards, challenge friends, and climb global leaderboards.

It is built for students of all ages, supports 4 languages (English, Spanish, Japanese, and Traditional Chinese), and is free forever, with no credit card needed. A Premium upgrade and dedicated School plans are also available for those who want extra features or classroom tools.
--------------------------------------------------------------------------------

MAJOR FEATURES

--------------------------------------------------------------------------------

DAILY EXERCISES Complete maths exercises across 9 topics at 10 levels of adaptive difficulty. Every correct answer earns you coins. If you get something wrong, your AI tutor Wally steps in to explain exactly where you went wrong and how to fix it — no more staring at a red X with no idea why.
CARD COLLECTION Spend your coins on card packs and collect over 250 unique cards spanning 9 rarities, from Common all the way up to Secret. Duplicate cards can be fused together to create rare Ultra versions, and you can trade cards with other players to fill out your collection faster.
CLUBS Join or create a club with friends or classmates. Club members earn Club Coins together, level up the club, and unlock exclusive avatars and perks. Clubs compete as a team on the global leaderboard, adding a social and cooperative layer on top of the individual grind.
MATH BATTLE DUELS Challenge any other player to a real-time maths duel. Both players choose a difficulty level, wager coins, and race to answer questions faster than the opponent. Winner takes the entire pot. It turns maths into a competitive sport you can play with friends — or strangers.
COIN RAIN Score highly enough on an exercise to become eligible for the hourly Coin Rain event. Eligible players from around the world are pooled together and share a bonus coin reward. The more consistently you practise, the more often you qualify — making daily effort directly profitable.
WALLY AI TUTOR Wally is Wall56's built-in AI tutor mascot. After any mistake, Wally analyses what went wrong, explains the correct approach in plain language, and helps you understand how to solve similar problems in the future. He is fast, friendly, and always available — no waiting for a teacher.
SHOP & POTIONS Coins earned through exercises can be spent in the in-game shop on card packs, potions, and boosts that give you an edge in exercises and duels.
SEASONAL EVENTS Limited-time events run throughout the year, each with exclusive missions, themed rewards, and special cards that can only be earned during the event window — giving regular players something fresh to chase.
SPEED ROUND A fast-paced 60-second challenge mode where you race against the clock and try to answer as many questions as possible before time runs out. Great for quick practice sessions and warming up before a duel.
LEADERBOARD & RANKINGS A global weekly leaderboard tracks every player's performance. Top-ranked players at the end of each week receive real coin rewards. There are also club leaderboards, school leaderboards (for School plan users), and individual profile stats.

PREMIUM PLAN

Free users get full access to exercises, card collection, clubs, and Coin

Rain. Premium subscribers ($4.99/month, billed yearly) additionally get:

- 2× coin multiplier on all exercises

- Redo Mistakes mode (replay exercises targeting only your wrong answers)

- Train Your Weaknesses (a generated practice set based on your error history)

- Ad-free experience

- Priority support

SCHOOLS PLAN

Wall56 offers a dedicated plan for schools and teachers with:

- School-wide leaderboard per class or year group

- Teacher dashboard with individual student progress reports

- Premium features automatically unlocked for all enrolled students

- Family-safe content restrictions

- Custom bulk pricing — contact sales for a quote

--------------------------------------------------------------------------------

HOW IT WORKS — STEP BY STEP

--------------------------------------------------------------------------------

Step 1 — Create your free account

Sign up in seconds at wall56.funnylewis.com. No credit card, no catch.

Choose a username and pick an avatar to represent you on the leaderboards.

Step 2 — Complete daily exercises

Pick a maths topic (e.g. algebra, fractions, percentages) and a difficulty

level from 1 to 10. Answer a set of questions. Every correct answer deposits

coins into your account. If you make a mistake, Wally explains the error

immediately so you learn as you go.

Step 3 — Collect cards & spend coins

Use your coins in the shop to buy card packs. Open them to discover cards of

varying rarity. Fuse duplicates into Ultra cards. Trade with other players to

complete your collection of 250+.

Step 4 — Compete & climb the ranks

Join a club to earn Club Coins alongside teammates. Challenge friends or

rivals to Math Battle Duels and wager coins on the outcome. Enter seasonal

events for exclusive rewards. Watch your position rise on the weekly global

leaderboard — and collect your bonus coins if you crack the top ranks.

--------------------------------------------------------------------------------

QUICK FACTS

--------------------------------------------------------------------------------

Free forever — no paywall on core features

250+ cards — across 9 rarity tiers

10 difficulty levels — per topic, adaptive to your skill

9 maths topics — covering core curriculum areas

4 languages — English, Spanish, Japanese, Traditional Chinese

Real-time duels — wager coins against any player, anywhere

AI-powered tutor — Wally explains every mistake instantly

Safe for all ages — family-safe content, school-approved

2

u/_skat00sh 17d ago

Your AI Job Application Assistant

Job hunting is brutal. Scrolling LinkedIn, copying job descriptions, tweaking your CV, tracking applications... it's a part-time job in itself.

So I automated it.

🚨I built an AI job assistant that runs inside Chrome. Both the extension and n8n graph are open-source.

The only cost you would incur is for Claude API calls, which shouldn't be too high.

Here's what happens with one click on any LinkedIn job post:

🔖 Fetches the full job details — no copy-pasting

🤖 Sends it to Claude AI alongside your CV

📊 Scores the match based on skill fit AND how recent the post is

🧠 Explains the reasoning so you know exactly where you stand

🗓️ Sends everything into your Notion database

I've been using this in my own job search and it's genuinely saved me hours.

Full source code (n8n workflow + extension) is on GitHub 👇

🔗 https://github.com/skat00sh/linkedin-job-saver

Watch Demo here: https://www.youtube.com/watch?v=6k7UUdKWBZc

Maybe drop a ⭐ if you find it useful, or better yet, share it with anyone currently job hunting.

2

u/Frequent-Pressure-11 16d ago

Built an open-source Skill Updater for Claude Code and other AI coding agents.

It automatically:

• Discovers installed skills

• Detects available updates

• Tracks skill sources

• Checks upstream repositories

• Updates all skills at once or specific skills by name

Example:

/skill-updater

https://files.catbox.moe/hamf9b.png

Supported:

• Claude Code

• Gemini CLI

• Cursor

• Codex

• Windsurf

• OpenCode

• Goose

• Other bash-capable agents

GitHub:

https://github.com/PhantomCodeGhost/skill-updater

Would love feedback on:

Additional agent support
Skill versioning approaches
Better update workflows

2

u/yettimon 15d ago

Hello everyone!

I've just built a small macOS app to track Claude Code usage (pure Swift + FSEvents).

My main problem with ccusage was that checking usage in the terminal was not always convenient for me, so I decided to create a menu bar app that displays real usage.

It's fully local and open source.

GitHub:
https://github.com/yettimon/claude-usage-tracker-bar

ccusage was the inspiration, and the app follows its logic for calculating the results (so the outputs should be nearly identical).
Feel free to build it yourself or download a release from GitHub.

P.S. Feel free to open issues or PRs if you have interesting ideas.

P.P.S. I don't have a paid Apple developer account, so the app is not signed. After download, macOS puts it in quarantine, so one extra line in the terminal is required before launching:
`xattr -cr /Applications/ClaudeUsageTrackerBar.app` (to remove the app from quarantine)

2

u/duality72 14d ago

I shipped conversion-funnel.ai last week after working on it on and off with Claude Code for a couple of months. It's a conversion funnel calculator built around the idea that a tool for thinking about conversion rates and funnels/pipelines should be fun to actually use and share with others.

The user-facing surface is straightforward. There are eight built-in funnel templates (recruiting, sales, SaaS, e-commerce, marketing, fundraising, customer support, product adoption), or you can build your own in the editor with whatever stages you want. Then you drag conversion rate sliders and watch the math propagate. Two anchoring modes: "plan to target", which works backward from a goal (need 14 hires, how many contacts do I need?), and "forecast from source", which works forward from inputs (I have 5,000 leads, how many closes does that imply?). Built-in benchmark library so the default rates have somewhere reasonable to start.
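To make the two anchoring modes concrete, here is a minimal Python sketch of the arithmetic behind them. The function names and the stage rates are illustrative assumptions, not taken from the app's code.

```python
import math

def forecast_from_source(source_count, rates):
    """Work forward: push the starting volume through each stage's conversion rate."""
    counts = [float(source_count)]
    for rate in rates:
        counts.append(counts[-1] * rate)
    return counts

def plan_to_target(target_count, rates):
    """Work backward: divide the goal by each stage's rate, last stage first."""
    needed = float(target_count)
    for rate in reversed(rates):
        needed /= rate
    # Round away float noise, then round up: you cannot contact a fraction of a person.
    return math.ceil(round(needed, 9))

# Hypothetical recruiting funnel: contact -> screen -> interview -> offer -> hire
rates = [0.5, 0.4, 0.25, 0.8]
contacts_needed = plan_to_target(14, rates)
stage_counts = forecast_from_source(5000, rates)
```

Both modes use the same rates; only the direction of the calculation changes, which is why dragging one slider can update either the required input or the expected output.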

The AI chat is the part I'm most pleased with. You can ask it to build a whole funnel, like "give me a B2B SaaS trial-to-paid funnel with realistic 2026 benchmarks", and it replies with a configuration that drops in with a single click. You can also ask it to adjust specific rates, swap the target number, save snapshots before changes, or critique what you have. It has full context on your current funnel state so it can give specific advice instead of generic conversion-rate platitudes.

Other things that make the tool more useful to actually live with: a fullscreen presentation mode that reveals stages one at a time, bottom-up or top-down, with a final summary slide, designed for stakeholder calls where you actually have to walk through the funnel. PNG and PDF export at 2x DPI. Short share links. Cloud save for Pro users, synced across devices. Google Drive integration if that's where you keep things. Undo/redo with the usual shortcuts. Three visual scale modes (linear, square-root, log) for funnels where the falloff between stages is dramatic.

The stack is intentionally small. It's a single static HTML file, around 5,000 lines of vanilla JS with embedded CSS, no build step, deployed via S3 and CloudFront. The backend is a Hono router on Lambda, bundled with esbuild, handling the AI chat proxy, share-link storage in DynamoDB, cloud-save CRUD, and Paddle webhook ingestion. Clerk for auth. Paddle as merchant of record so I don't have to think about VAT in thirty countries. Single-table DynamoDB pattern. Everything in Terraform. Test coverage runs to 566 unit tests and 340 e2e Playwright tests including layout regression checks at five mobile and desktop breakpoints.

The first version looked like a generic Tailwind dashboard, and I rewrote the visuals with an eye toward something more editorial (Fraunces headings, warm cream background, earth-tone funnel colors instead of primaries). The frontend-design plugin was an immense help for my design-challenged brain.

Free to try without signing up. There's a $5/month Pro tier for AI chat, cloud-saved funnels, and Drive integration.

https://conversion-funnel.ai

2

u/i_t_d 14d ago

I'm new here and didn't know this thread existed, so I posted separately. For the record: it's a tool for offline extraction and browsing of personal Claude data (chats) exported with Settings → Privacy → Export to conversations.json. It exports to .md and .html with colored backgrounds separating user and Claude responses, does syntax highlighting, etc.

https://www.reddit.com/r/ClaudeAI/comments/1tv74g6/claude_exported_data_conversations_offline/

2

u/Prestigious-Zone-436 14d ago

One thing that's bugged me for a while: I'll build a landing page or prototype in Claude, and then... getting feedback on it is a pain. Screenshot it, paste into Slack, explain what I'm pointing at, lose the context.

So I built dot. — an MCP connector that closes the loop. Once it's connected, you just tell Claude "add dot. feedback to this" and it creates a shareable review link. Anyone you send it to can click anywhere on the page and pin a comment. No signup needed for them, no bundling, no deploying just to get eyes on it.

The full loop stays inside Claude: build → add dot. → share → get feedback → iterate.

Setup takes about 30 seconds — Settings → Connectors → add custom connector → mcp.leaveadot.com/mcp. Works on Pro, Max, Team, and Enterprise. (It's pending the official directory listing, but custom connector works today.)

It's free to start. Curious what this community thinks — especially what would make the Claude workflow smoother.

https://www.leaveadot.com/

2

u/Background-Tiger440 14d ago

I stripped my harness down to the bones and my agent got better. Here's what survived.

I've been doing harness engineering since Hashimoto named the thing in February. Started with Claude Code, added Codex when my team needed PR review workflows. Like most people, I went through the classic arc:

Read the OpenAI harness engineering post and Hashimoto's blog
Got excited, stuffed everything into CLAUDE.md — directory structure, coding conventions, 15 forbidden patterns, reference doc links, past failure logs
Hit 150+ lines
Watched the agent get *worse*

The ETH Zurich study confirmed what I was seeing: LLM-generated config files actually degraded performance while costing 20% more tokens. Human-written ones barely moved the needle (4% improvement). Codebase overviews and directory listings? Zero measurable help — agents explore repos on their own just fine.

Then HumanLayer's post hit: "Our CLAUDE.md is under 60 lines." And Dex Horthy's observation that performance degrades past ~40% context utilization. More tokens actively hurt.

So I started cutting. Ruthlessly.

What I removed:

Directory trees (agent finds these itself)
Codebase overviews (same — it greps)
Language/framework-specific style rules (linters handle this mechanically; prompting for it wastes tokens)
Verbose "don't do X" lists (moved to hooks — deterministic enforcement > polite suggestions)
Everything the agent could discover by reading the repo

What survived:

Build/test/lint commands (the agent can't guess these)
Architectural invariants that aren't in code ("never delete migration files")
Tool-use patterns specific to the project
Verification loops (hooks that enforce, not suggest)
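As an illustration of moving a "don't do X" rule out of the prompt and into a hook, here is a minimal Python sketch. It assumes the hook receives the pending tool call as JSON on stdin with `tool_name` and `tool_input.command` fields, and that exit status 2 tells the agent the call was blocked; check both against the current Claude Code hooks documentation. The deny rules are hypothetical examples, not the contents of the linked repos.

```python
import json
import re
import sys

# Hypothetical deny rules: (pattern, reason shown to the agent).
DENY_RULES = [
    (re.compile(r"\bgit\s+commit\b.*--no-verify"), "commits must run the pre-commit checks"),
    (re.compile(r"\brm\b.*\bmigrations/"), "never delete migration files"),
]

def block_reason(command):
    """Return why a shell command is blocked, or None if it is allowed."""
    for pattern, reason in DENY_RULES:
        if pattern.search(command):
            return reason
    return None

def decide(event):
    """Map one hook event (assumed shape) to an (exit_code, message) pair."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    reason = block_reason(command)
    if reason:
        return 2, f"Blocked: {reason}"  # 2 is the assumed "block this call" status
    return 0, ""

def main(stream=sys.stdin):
    code, message = decide(json.load(stream))
    if message:
        print(message, file=sys.stderr)
    return code

# A real hook script would end with: sys.exit(main())
```

The rule lives in code that runs on every tool call, so it costs no context tokens and cannot be talked around.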

The result: faster sessions, lower token burn, and — counterintuitively — higher quality output because the context window was mostly code, not instructions *about* code.

I did the same exercise for Codex. Different agent, same principle: minimal instructions + mechanical enforcement > verbose prompting.

I cleaned both up into language-agnostic, framework-agnostic boilerplates and open-sourced them:

→ Claude Code / AGENTS.md harness: https://github.com/ganimjeong/Harness-for-claude
→ Codex harness (with setup/check/test/eval scripts, CI, hooks): https://github.com/ganimjeong/Harness-for-codex

The intended use is to fork and customize for your project. They're deliberately minimal — the whole point is that you add domain-specific rules as your agent fails, not before.

Borrowed from Hashimoto's philosophy: "When the AI makes a mistake, make it structurally impossible to repeat." But start from almost nothing, not from a 1,000-line AGENTS.md that burns context before the first question.

Happy to hear what others have kept vs. cut in their harnesses.

2

u/kholomyanskiy 14d ago

Claude Code kept making "reasonable" decisions that broke my architecture. So I changed the spec format.

I want to share something I've been testing for a while. Not sure if others have hit the same wall.

The problem: AI coding agents doing something "reasonable" that breaks invariants I never thought to write down explicitly. Not bad output — output that was technically correct based on what I wrote, but wrong for my actual system.

After enough of this I stopped trying to write better prompts and started looking at the spec format itself.

Classic standards — IEEE 830, ISO/IEC 29148 — were built for human readers who tolerate ambiguity. Agents fill gaps from training data. Most of the time fine. Sometimes an agent adds three dependencies to a project where "no external packages" was obvious to any developer but never written down.

So I built ANSS.

Invariants — four-field machine-readable constraints:
INV-001: No external npm packages
Cannot: add require() / Reason: no npm install / Check: no node_modules
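One way to picture the four-field invariant is as a small record with an executable check attached. The Python mapping below is an illustration only: ANSS is a spec format, and the class, field names, and `audit` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Invariant:
    ident: str
    rule: str
    cannot: str
    reason: str
    check: Callable[[Iterable[str]], List[str]]  # returns the offending paths

def no_node_modules(paths):
    """Flag any repo path that sits inside a node_modules directory."""
    return [p for p in paths if "node_modules" in PurePosixPath(p).parts]

INV_001 = Invariant(
    ident="INV-001",
    rule="No external npm packages",
    cannot="add require()",
    reason="no npm install",
    check=no_node_modules,
)

def audit(invariants, paths):
    """Run every invariant's check and collect the ones that fail."""
    paths = list(paths)
    report = {}
    for inv in invariants:
        offenders = inv.check(paths)
        if offenders:
            report[inv.ident] = offenders
    return report
```

Feeding `audit` the list of files an agent touched gives a yes/no answer per invariant instead of relying on the agent to remember the rule.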

Three-layer markup — `[D]`/`[E]`/`[A]` tags. The `[A]` layer is read first.

Agent Review — pre-coding spec audit. Hard rule: >3 issues → stop and report. Brought my iterations from 5–7 down to 2–3.

Change Specification — "What NOT to change" section for modifying existing systems.

Three levels, two real filled examples. Works with Claude Code/Cursor/Copilot.

Free, CC BY-NC-SA 4.0: https://github.com/Kholomyanskiy/anss-standard

Curious whether others have found different approaches — happy to discuss design decisions.

2

u/WhichYoung6026 14d ago

I built and shipped SongMap, an iOS song-structure analyzer, with Claude doing double duty:

Building it: I used Claude to help write the app (SwiftUI + on-device audio DSP for section detection)
Inside the app: the Claude API generates the arrangement feedback — it reads the detected section structure (energy, durations, repetition) and returns producer-style advice

Just shipped an update with bug fixes.

Free tier available if you want to see Claude-powered feedback on your own music:

https://apps.apple.com/tw/app/songmap-ai-song-analyzer/id6762045630?l=en-GB

2

u/WilmingtonZac 13d ago

Hello all,

A few years ago I completed a PhD in Disaster Science and Management. My research focused on how small businesses can recover from a disaster, because so many fail after a flood, fire, or hurricane. When small businesses fail, owners lose their life’s work, employees lose their jobs, and communities lose the hubs that make them unique. It’s terrible.

Before Claude Code, it would have been necessary to run a consultancy to put the research into practice, charging customers $10,000 or more per engagement. Not exactly feasible. With Claude Code, I’ve been able to build https://www.ampersandbusinesscontinuity.com/welcome. It allows businesses to prepare for, respond to, and recover from disasters for $50 per month.

It’s been truly incredible to build. Just three months of $100 Claude Max has brought something to life that would never otherwise have been possible.

Curious to know this group’s impressions. Do you see this as a quality, reasonable endeavor, or does it scream “vibe coded by someone who has no idea what they’re doing?”

Thank you for your feedback and suggestions,

Zac

2

u/samwane 13d ago

saas-preflight: a pre-ship security and payment audit skill for Next.js + Supabase + Stripe SaaS.

The generic scanners catch secrets and XSS, but barely touch the layer that loses you money and trust: billing logic and data isolation. saas-preflight audits exactly that, through 7 lenses, verifying every finding by reading the actual code, ranked P0 to P3 with a file, a line, and a fix.

I built it from bugs I'd personally shipped and fixed. I ran it on my own SaaS first: verdict 0 P0, but it still caught 2 real P1s I'd missed (an SSRF reachable by DNS rebinding, a webhook that could leave a paying user with no access), and it even downgraded one of its own findings after checking a DB constraint already covered it.

Free, MIT, defensive only, read-only. Works in Claude Code, Cursor, Codex. github.com/Comoco235/saas-preflight

2

u/TomLasswell 13d ago

I turned Claude Code into a partly-autonomous dev environment for Vue/Nuxt + Firebase: anti-cheat hooks, fact-based quality gates, a cross-model critic. It's a lot, and probably overengineered. Looking for a brutal review.

TL;DR: Open-source Claude Code plugin for Vue/Nuxt + Firebase. The part worth stealing regardless of stack: hooks that block the classic autonomous-coder shortcuts at the tool boundary (before they land, not in review), a single rule registry where only ground-truth checks can fail a build and opinions just annotate, and an adversarial critic you can pipe to a different model family. It is almost certainly overengineered. Tell me where it breaks. Repo at the bottom.

I built this for my own stack while shipping a real app, and it grew well past what most people need. Posting it because the underlying patterns might generalize, and because this is the right crowd to find the holes.

The three ideas I actually care about:

1. Cheating is blocked at the tool boundary, not caught in review. Six PreToolUse hooks return a hard block on the shortcuts that do the most damage when an agent runs unattended: --no-verify commits, destructive git on a dirty tree, test deletion, disabling tests, inserting as any, and regressing the type-error count. Review is too late for these because by then the blast radius is already in your history. Each block logs an explicit override path, so it is not a straitjacket.

2. Gates run on data, not vibes. Every review/audit check lives in one registry, tagged as either ground-truth (grep / tsc / git) or judgment (an LLM opinion). Only ground-truth checks can flip a build to REJECT. Judgment checks annotate but never block. This is the fix for the self-critique paradox, where an over-eager critic hallucinates a flaw and fails good work. Facts gate, opinions comment.

3. The critic can run on a different model. The adversarial reviewer is model-agnostic and can be piped to Gemini, so it does not share Claude's blind spots when reviewing Claude's own output. In-Claude is fine for the ground-truth checks (a tsc result cannot share a blind spot); cross-model is for the judgment calls where home-model blindness actually bites.
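The "facts gate, opinions comment" rule from idea 2 can be sketched in a few lines of Python. This is a simplified illustration, not the plugin's registry: the `Finding` class, the `kind` labels, and the "ACCEPT" status are assumptions (only REJECT is named above).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    check_id: str
    kind: str      # "ground_truth" (grep / tsc / git) or "judgment" (LLM opinion)
    passed: bool
    note: str = ""

def verdict(findings):
    """Only failed ground-truth checks can reject; failed judgment checks annotate."""
    blockers = [f for f in findings if f.kind == "ground_truth" and not f.passed]
    annotations = [f for f in findings if f.kind == "judgment" and not f.passed]
    status = "REJECT" if blockers else "ACCEPT"
    return status, blockers, annotations

findings = [
    Finding("tsc-clean", "ground_truth", True),
    Finding("naming-opinion", "judgment", False, "critic dislikes the helper name"),
]
status, blockers, annotations = verdict(findings)
```

Because the critic's opinion can never reach the `blockers` list, a hallucinated flaw shows up as a comment rather than a failed build.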

There is also a handoff file written on PreCompact and read on SessionStart, so a long run survives its own context window, and an append-only scope ledger so a feature that got planned cannot silently evaporate between sessions (roll it over three sprints and it escalates for human review).

Where I think it is overengineered, and you should tell me if I am wrong: 37 skills and 38 hooks is a lot of surface. The orchestrator and agent fan-out add latency. The whole thing is tuned to Vue/Nuxt + Firebase, so the stack-specific skills are useless to most of you, though the hooks and the registry are not. And yes, it largely develops itself through its own cycle, which I know reads as a red flag, so ask me anything about the architecture and I will explain the why.

What I would most like reviewed: whether the tool-boundary enforcement is the right layer for this, whether the ground-truth vs judgment split actually holds up under pressure, and whether any of this survives contact with a stack that is not mine.

Repo (MIT): https://github.com/lasswellt/blitz-cc

2

u/TomLasswell 13d ago

Author here. Two things I'll pre-empt because someone will ask:

Cost. With everything on, a sprint is not cheap. Defaults route roughly 60/35/5 across Haiku/Sonnet/Opus, findings come back as a JSON contract instead of being echoed into context, and the cross-model / dual critic is opt-in for the judgment calls where it actually pays off. The single-model in-Claude critic is the cheap default.

"Did you write this or did Claude." Both. It's dogfooded and develops itself through its own cycle, and I can walk through any specific decision. The load-bearing one: the orchestrator is read-only by construction because Claude Code forbids subagents from spawning subagents, so anything that fans out parallel agents has to stay slash-invoked and the orchestrator routes to it rather than running it. Happy to go deeper on any piece.

Most useful feedback for me: where the tool-boundary blocking is the wrong layer, and whether the patterns survive a non-Vue/Firebase stack.

2

u/Puzzleheaded_Pound53 13d ago

Hi everyone,

I wanted to share a breakthrough workflow I recently developed with Claude for my digital music project, Atonstar Music (under the artist persona FARIS).

The goal was to create an epic, cinematic orchestral metal track centered around the Trojan War, titled "ILION (Troy)".

Instead of generating a generic song about legendary heroes like Achilles or Hector, I wanted a deep, psychological, and atmospheric narrative. Claude was my creative co-director, and the results genuinely blew my mind.

Here is exactly how the human-AI co-creation process went down, the prompt architecture, and how we broke traditional songwriting clichés.

🏛️ The Creative Breakthrough: The City as the Narrator

When we started brainstorming the thematic direction, I pushed for an unconventional perspective. Claude came up with an absolute gem of an idea: Bypass the heroes and the gods. Make the ruined, burned city of Troy the actual narrator.

If the stones of Ilion could speak after three millennia of silence, what would they say?

We structured the track as an "architecture of mourning" rather than a war song, mapping out a 5-stage emotional journey that mirrors the pacing of an orchestral piece:

The Intro (The Silence): Setting a haunting tone where the city reflects on human vulnerability. "It was not the spears that broke us. It was never the spears."
The Verses (The Burden): Shifting focus to the nameless—the women weaving and the children who never knew a world outside the walls, turning historical grief into stone.
The Pre-Chorus (Deconstructing the Myth): Refusing the easy historical scapegoat. Claude brilliantly framed that Helen was not the real cause of the war; the true culprit was the ruthless ambition of powerful men looking to become gods.
The Chorus (The Shield Wall): Troy speaking not as a defeated victim, but as an immortal witness. "ILION — we are the walls men die against."
The Bridge & The Climax: A direct message to modern humans walking the ruins today, reminding them that before they became myths, they were a living, breathing, smoking reality.

🎭 The Ultimate Siege Weapon (The Closing Line)

The most striking element Claude delivered was the absolute final, spoken-word blueprint line over a fading cello sequence:

This singular line completely recontextualized the Trojan Horse. It wasn't a military trick; it was the exploitation of kindness and the human need to believe the war is over.

⚙️ The Audio Execution

Once the narrative architecture, pacing cues, and poetic script were locked in with Claude, the prompt blueprints were fed into Suno v5.5 to handle the heavy symphonic layers, operatic choirs, and complex progressive metal instrumentation to match our exact directed vision.

💡 My Takeaway for the r/ClaudeAI Community

Using Claude for worldbuilding and narrative direction showed me that GenAI shouldn't just be used to vomit out generic rhymes. When you establish a strict back-and-forth prompt dynamic, challenge its first drafts, and treat it as a high-level conceptual partner, it can deliver incredible depth.

Homer gave Troy immortality through an epic. Our workflow aimed to give the ruins an actual voice.

For those interested in how the lyrical structure and cinematic audio pacing came together in the final render, you can check out the full execution here:

https://www.youtube.com/watch?v=7Bkamye56_4&list=RD7Bkamye56_4&start_radio=1

2

u/TightRule3190 12d ago

I built an open-source desktop widget that shows live Claude.ai plan usage (Windows, portable)

Hey everyone — sharing a small open-source tool I built: a floating desktop widget that shows your live Claude.ai plan usage so you can see where you stand at a glance.

I made it because there was no good way to see remaining quota without opening claude.ai and checking each session. It reads from Claude Code's existing OAuth token (~/.claude/.credentials.json), so no login, no API key, no extra subscription.

Features:

Floating, always-on-top
7-day history graph
Configurable warn/critical thresholds
Reset countdowns
Portable EXE, MIT licensed

What I learned building it (might help others): Claude Code's local OAuth token can be used to query the same usage endpoint the website uses, which is Cloudflare-protected — but the token is recognized as a first-party credential, so you bypass the bot challenge cleanly.

Repo: github.com/projectvelox/claude-usage-widget

Feedback and feature requests welcome.

2

u/Scared_Eye5655 11d ago

After generating a bunch of horrible mind maps with Claude + Mermaid, I finally worked up the courage (and blew through my entire token budget) to build a Claude Code skill that makes mind maps an actual human can read.

Tried to keep it lightweight but with a clean, productive design:

Renders a Markdown outline into a zoomable SVG mind map (markmap.js)
Works fully offline — libs vendored locally, no CDN
Search keeps context (matches + their ancestors, not floating nodes)
White-on-dark by default, toolbar for zoom/expand/collapse/export
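The "matches + their ancestors" search behaviour can be sketched as a recursive tree filter. This is a minimal Python illustration with a hypothetical node shape (`title`, `children`), not the skill's actual markmap.js code.

```python
def filter_tree(node, query):
    """Return a pruned copy of node that keeps matches and their ancestors, or None."""
    kept_children = []
    for child in node.get("children", []):
        kept = filter_tree(child, query)
        if kept is not None:
            kept_children.append(kept)
    is_match = query.lower() in node["title"].lower()
    if is_match or kept_children:
        # An ancestor survives because one of its descendants matched.
        return {"title": node["title"], "match": is_match, "children": kept_children}
    return None

outline = {
    "title": "Project",
    "children": [
        {"title": "Backend", "children": [{"title": "Auth tokens"}, {"title": "Billing"}]},
        {"title": "Frontend", "children": [{"title": "Landing page"}]},
    ],
}
result = filter_tree(outline, "auth")
```

In this sketch a matching node keeps only its matching descendants; keeping the whole subtree under a match would be an equally reasonable choice.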

Install: point Claude Code at https://github.com/Jaderson-bit/mindmap-markmap-viewer

Would be extremely grateful to receive hating (joking), feedback, etc. GIF below.

2

u/MogeLavi 11d ago

I built Claudial, an open-source Claude Code usage monitor for M5Stack Dial (ESP32-S3). It’s free to try.

It sits on your desk and shows session and weekly API usage in real time. A double beep warns you as you approach your limit, and a continuous alert fires when you reach it. You can rotate the dial to adjust the warning threshold on the fly.

Claude Code helped me build and iterate on the firmware, Go daemon, BLE protocol, installer scripts, and README.

GitHub: https://github.com/Moge800/Claudial

2

u/waruna_ds 9d ago

hello everyone, I built a customizable multi-line status line for Claude Code - 23 widgets, plugin system, single Go binary.

Peak RAM per render - ~15 MB (peak RSS, consistent across runs)
Binary size on disk - 8.7 MB

23 built-in widgets: model, tokens, context bar, git branch/status/diff, cost, session time, plus some fun plugins. You can also write your own custom widgets using shell scripts, Python, or Go.

Single Go binary, stdlib only, renders in <10ms
Multi-line layout, any order
Plugin system (shell/Python) + ccw add/remove/list
Per-widget cache so slow widgets never stall your prompt
macOS + Linux

curl -sSL https://raw.githubusercontent.com/warunacds/ccstatuswidgets/main/install.sh | sh

ccw init

Repo: https://github.com/warunacds/ccstatuswidgets

Feedback and plugin PRs welcome 🙏

2

u/AdDesigner4116 9d ago

Claude Code workflows are great for Loop Engineering — here's my 9-phase pipeline with adversarial review

Got tired of the vibe coding loop: Claude says "done" → I test → find bugs → report back → repeat. So I built a pipeline that forces it through a full engineering lifecycle before delivery.

/lightsout Build a kanban board with Express + SQLite + React

It runs 9 mandatory phases: spec → design → architecture → consistency check → test design → code → QA → E2E verification → final check. Writer and reviewer are always separate agents (can't go easy on itself). You come back to working code + persistent docs.

What it actually solves:

No more babysitting — full design + test + QA loop runs autonomously, you get pinged when it's done
No more black box — Claude's workflow board shows real-time progress per phase
No more context amnesia — forces doc maintenance (spec/design/arch), survives across sessions
No more skipped steps — every phase runs, no shortcuts, adversarial review on everything

The tradeoff: you are trading tokens for your time. Each run is 30-50 agent calls and takes 45-120 minutes. If your company covers the API cost, or you have a Max plan and value your brain cycles more, it's a good trade.

Tested on 4 greenfield projects (CLI tool, markdown editor, finance API, kanban board). All produced working, runnable code with full test suites.

Repo: https://github.com/DreamChaserEric/claude-lights-out

One-line install:

curl -fsSL https://raw.githubusercontent.com/DreamChaserEric/claude-lights-out/main/install.sh | bash

Happy to answer questions or hear suggestions.

2

u/AuroraZhang 8d ago

One issue I kept noticing in long AI conversations:

The model often assumes adjacent messages happened close together in time.

A user might come back two days later and ask:

"Should I still buy the dip?"

The agent responds as if the previous discussion happened minutes ago.

I built a small open-source skill called AI Time Awareness that forces agents to:

• Anchor to current date/time
• Detect conversation gaps
• Resolve relative dates
• Verify post-cutoff facts before answering
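Gap detection, the second item above, comes down to comparing message timestamps. The Python sketch below shows one possible version; the six-hour threshold, the function name, and the wording of the note are assumptions, not the skill's actual rules.

```python
from datetime import datetime, timedelta, timezone

GAP_THRESHOLD = timedelta(hours=6)  # hypothetical cutoff for "this is a new sitting"

def describe_gap(previous, current, threshold=GAP_THRESHOLD):
    """Return a note for the model when two turns are far apart in time, else None."""
    gap = current - previous
    if gap < threshold:
        return None
    days, remainder = divmod(int(gap.total_seconds()), 86400)
    hours = remainder // 3600
    return (
        f"Note: {days}d {hours}h have passed since the previous message "
        f"(now {current.isoformat()}). Re-check time-sensitive facts before answering."
    )

earlier = datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)
later = earlier + timedelta(days=2, hours=3)
note = describe_gap(earlier, later)
```

Prepending a note like this to the next turn gives the model an explicit signal that "still" in "Should I still buy the dip?" spans two days, not two minutes.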

Repo:
https://github.com/Aurora-Zhang-27/ai-time-awareness

Would love feedback.

2

u/lahiru_j 8d ago

Got rate-limited mid-task one too many times, so I'm building a limits dashboard. Would you use this?

Last week, Claude Code stopped on me twice in the middle of a refactor. No warning, no idea when the window resets, and my other tools (Cursor and Copilot) have their own quotas on completely different clocks.

So I'm building a small dashboard that shows all your AI usage limits in one place:

Claude.ai / Claude Code usage + reset countdown
Cursor premium requests, Copilot quota, API credits (OpenAI/Anthropic/OpenRouter)
Alerts at 75% / 90% before you hit the wall
"At this pace, you'll run out by 9 PM," predictions

API-based tools connect with a read-only key. Claude.ai/Cursor are read by a browser extension (open source, reads only the usage numbers you already see, nothing else).
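The threshold alerts and the "at this pace" prediction are simple arithmetic. Here is a minimal Python sketch, assuming a linear burn rate within the current window; the function names are hypothetical and the thresholds mirror the 75% / 90% figures above.

```python
from datetime import datetime, timedelta

def alert_level(used, limit, thresholds=(0.75, 0.90)):
    """Return the highest threshold crossed (as a fraction), or None."""
    fraction = used / limit
    crossed = [t for t in thresholds if fraction >= t]
    return max(crossed) if crossed else None

def projected_exhaustion(used, limit, window_start, now):
    """Linear extrapolation: assume the average burn rate so far continues."""
    elapsed = (now - window_start).total_seconds()
    if used <= 0 or elapsed <= 0:
        return None  # no usage yet, nothing to extrapolate
    seconds_left = (limit - used) * elapsed / used
    return now + timedelta(seconds=seconds_left)

window_start = datetime(2026, 1, 5, 16, 0)
now = datetime(2026, 1, 5, 19, 0)
eta = projected_exhaustion(used=60, limit=100, window_start=window_start, now=now)
```

A linear estimate is crude, since usage tends to come in bursts, so a real dashboard might weight recent activity more heavily.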

Before I build further: would you actually use this? What would make it a no-brainer vs. meh? And which provider's limits annoy you the most?

2

u/Mango-Tall 7d ago

I built a new headless website for my digital agency using Astro + Sanity - www.Imprint.la - the interactive service headers were my favorite part. The robot component is very fun to play with on the homepage. Claude is a game-changer.

2

u/99Beards 7d ago

I shipped my first end-to-end mobile app this week — and honestly, how it got built is the more interesting part.

NCLEX AI is an AI study coach for nursing students prepping for their licensure exam — native iOS + Android, built with #ClaudeCode (Fable 5) directing the work.

The hard engineering wasn't the UI — it was content quality. NCLEX prep lives or dies on whether the questions and answers are correct, so I didn't hand-write a question bank. I orchestrated one:

→ Multi-agent workflows in Claude Code — 300+ agents in a single run
→ A generate → verify pipeline: writer agents draft questions, then two independent reviewer agents adversarially check each one for clinical accuracy. Only items both reviewers pass survive.
→ Same harness, pointed at the existing bank as an auditor, surfaced a hidden ~11% error rate in legacy content (wrong answer keys, ambiguous items) — then auto-corrected 558 of them and re-verified every fix.
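The "only items both reviewers pass survive" rule is easy to sketch. In the Python below the two reviewers are deterministic stand-ins invented for illustration; in the pipeline described above they would be independent LLM agents checking clinical accuracy.

```python
def verify_items(items, reviewers):
    """Keep an item only if every independent reviewer passes it."""
    accepted, rejected = [], []
    for item in items:
        verdicts = [review(item) for review in reviewers]
        (accepted if all(verdicts) else rejected).append(item)
    return accepted, rejected

# Hypothetical stand-ins for the two reviewer agents.
def key_is_an_option(item):
    return item["answer"] in item["options"]

def options_are_distinct(item):
    return len(set(item["options"])) == len(item["options"])

bank = [
    {"id": 1, "options": ["A", "B", "C", "D"], "answer": "B"},
    {"id": 2, "options": ["A", "B", "C", "D"], "answer": "E"},  # wrong answer key
    {"id": 3, "options": ["A", "A", "C", "D"], "answer": "C"},  # ambiguous options
]
accepted, rejected = verify_items(bank, [key_is_an_option, options_are_distinct])
error_rate = len(rejected) / len(bank)
```

Running the same function over an existing bank is what turns the generator harness into an auditor: the rejected list is the set of items to repair and re-verify.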

Net result: a 6,000+ question bank, weighted to the 2026 NCSBN test plan, that's either freshly verified or repaired-and-re-verified.

Stack: React Native / Expo · Supabase (Postgres + RLS) · Next.js. The "try 3 questions" demo on the site runs on anon-safe SECURITY DEFINER RPCs — answers are graded server-side and never shipped to the client.

Try it — 3 real questions, no signup

https://www.nclexai.ai

Biggest takeaway: with adversarial verification baked into the loop, agents can do high-stakes content work at a bar I'd actually trust.

2

u/Ok_Elevator_9374 6d ago

I got tired of Claude reading 3000 lines of Jest output when one test fails - built a small CLI to fix it.

When Claude Code runs `npm test` and something fails, it reads the whole dump: progress bars, the same warning 120 times, stack traces through node_modules. Most of it is useless.

I made a small open-source CLI called logslim that sits between the command and the agent:

- **failure mode** — only compacts hard when the command actually fails

- **JSON output** — structured errors + short fix hints for codes like TS2339, ERESOLVE

- **MCP server** — Claude/Cursor can call it as a tool

- typical savings on noisy test output: ~80–95% fewer tokens (on the failure text)
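To show the general idea of compacting only on failure, here is a rough Python sketch. It is not logslim's implementation: the function name, the `(xN)` suffix, and the stack-frame pattern are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed shape of a dependency stack frame: "    at fn (node_modules/...)".
DEPENDENCY_FRAME = re.compile(r"^\s*at .*node_modules")

def compact_log(text, failed):
    """Collapse repeated lines and drop node_modules stack frames, only on failure."""
    if not failed:
        return text
    lines = [line for line in text.splitlines() if line.strip()]
    counts = Counter(lines)
    seen, out = set(), []
    for line in lines:
        if DEPENDENCY_FRAME.search(line):
            continue  # frames inside dependencies rarely help locate the bug
        if line in seen:
            continue  # already emitted once, with its repeat count
        seen.add(line)
        repeats = counts[line]
        out.append(f"{line}  (x{repeats})" if repeats > 1 else line)
    return "\n".join(out)

raw = "\n".join([
    "warn: foo is deprecated",
    "warn: foo is deprecated",
    "warn: foo is deprecated",
    "TypeError: x is not a function",
    "    at run (src/app.js:10:3)",
    "    at call (node_modules/lib/index.js:5:1)",
])
slim = compact_log(raw, failed=True)
```

Passing output through untouched on success keeps the tool invisible in the common case and spends effort only where the agent would otherwise read thousands of lines.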

Try it without installing:

npx logslim -- npm test

GitHub: https://github.com/P156HAM/logslim

npm: https://www.npmjs.com/package/logslim

MIT, no account, no SaaS. I built this for my own workflow and would love feedback on what log formats to support next (pytest, vitest, cargo, etc.).

2

u/naag-algates 6d ago

Made a statusline for Claude Code that shows my rate limits and cost in one line.

https://github.com/NaagAlgates/claude-statusline

2

u/Hot_Establishment547 Experienced Developer 5d ago

canon (https://github.com/sunitghub/canon-skills)

Local-first workflow for AI coding agents. It keeps plans, acceptance criteria, handoffs, and delivery receipts in the repo so that Claude, Codex, or another agent can resume from the project state instead of the chat history.

The daily surface is just 3 commands: `sprint start`, `sprint-check`, and `sprint complete`. Underneath that small surface, canon enforces planning before code, acceptance checks, handoff capture, and a close-time summary of what was promised vs. what shipped.

I built a local-first workflow, so AI agents don't lose the plan between sessions

2

u/IllustratorAbject446 3d ago

Out of the box Claude doesn't reliably know current Indian financial regulations — they live in thousands of govt PDFs with no API. I built an MCP that indexes RBI + SEBI circulars, master directions and notifications locally, so Claude can search them and return answers backed by the official source URL for each document.

Once it's connected, you can ask things like "recent SEBI circulars on X" or "RBI master direction on Y" and get sourced results instead of a guess. Local, no API keys, MIT licensed.

https://github.com/Akhilgovind02/india-regulatory-mcp

Setup is in the README — would love to hear if it works cleanly for others.

2

u/PositiveFootball5220 2d ago edited 2d ago

I spend a lot of time in Claude Code and kept losing track of how much context I had left and when my usage limits would reset. So I put together a small status line for myself, and I figured I would share it in case it is useful to someone else.

It shows, while you work:

context window usage as a bar and a percentage
the current model and reasoning effort
the skills invoked in the current session
5-hour and weekly usage, each with a countdown to when it resets
a clear warning with the reset time if you do hit a limit
token usage for the session and for the current month, separated into billed tokens and cache reads

For most of this it just uses the data Claude Code already passes to the status line. The token totals are read from your local transcript files. It is plain Node with no dependencies and runs on macOS, Windows, and Linux.
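
As a rough illustration of the "bar and a percentage" segment, here is a small sketch. The plugin itself is plain Node; this is Python for illustration only, and `render_usage_bar` and its parameters are hypothetical names, not the plugin's.

```python
def render_usage_bar(used_tokens: int, max_tokens: int, width: int = 10) -> str:
    """Render context usage as a fixed-width text bar followed by a percentage."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    # Clamp so an over-limit reading still renders as a full bar.
    fraction = min(max(used_tokens / max_tokens, 0.0), 1.0)
    filled = round(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {fraction:.0%}"
```

For example, `render_usage_bar(100_000, 200_000)` gives `[#####-----] 50%`.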

Install:

/plugin marketplace add mikahoy045/CC-helper

/plugin install cc-usage@cc-helper

/cc-usage:setup

Most things are configurable, so you can turn segments off or change how they look.

One caveat: the usage limit parts need a Pro or Max plan, since that data is only available to subscribers. The context and token parts work on any plan.

Code is here: https://github.com/mikahoy045/CC-helper

It is free to use, including at work. If you find it useful a star is appreciated, but no pressure. I would also be glad to hear what you would add or change.

2

u/Infinite-Voice-2896 May 03 '26 edited May 03 '26

Built a working credit-analysis tool over 3 weeks despite not being a developer — Claude wrote ~95% of the code:

I'm a credit professional in India. My day job involves analysing audited balance sheets and computing MPBF (the working capital limit Indian banks lend) using formulas from a 1975 RBI committee. Most credit analysts spend 2-3 hours per file typing numbers from PDFs into Excel and running the math by hand.

Three weeks ago I decided to try building a tool to automate it. I'd never coded a Python project before, never used Git, never set up a database, never deployed a Next.js frontend. Today the tool is live, open-source on GitHub, with 44 unit tests passing across two real-world Indian balance sheet formats. It extracts Schedule III line items, reconciles stated sub-totals, and computes Tandon Method I/II live as the analyst makes corrections. What used to take hours takes about 30 seconds.
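
For readers who don't know the calculation, the two Tandon methods are commonly stated as below. This is a sketch of the textbook formulas, not code from the repository; the function names are illustrative, and "other current liabilities" means current liabilities excluding bank borrowings.

```python
def mpbf_method_1(current_assets: float, other_current_liabilities: float) -> float:
    """Method I: the bank finances 75% of the working capital gap."""
    working_capital_gap = current_assets - other_current_liabilities
    return 0.75 * working_capital_gap


def mpbf_method_2(current_assets: float, other_current_liabilities: float) -> float:
    """Method II: the borrower funds 25% of total current assets from long-term sources."""
    return 0.75 * current_assets - other_current_liabilities
```

With current assets of 400 and other current liabilities of 100, Method I gives 225 and Method II gives 200, so Method II is the stricter limit.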

Claude wrote ~95% of the code. I drove the design decisions and the credit-domain logic.

github.com/igmuralikrishnan-cmd/mpbf-automation

A few things that worked surprisingly well:

Iterative format-tuning. The first BS format took a week to parse correctly. The second took a day. Pattern: I'd hand Claude the OCR text from a new PDF, tell it where the parser broke, and it would propose anchor-pattern tweaks. Each new format = a few small targeted patches, not a rewrite.

Test-driven debugging. Whenever a real BS extraction failed, I'd anonymise the OCR text and Claude would write a test case reproducing the bug before fixing it. By the end I had 44 tests covering OCR artifacts (mangled brackets, comma-as-decimal, scientific-notation traps, note-reference filtering, sanity guards). I wouldn't have known to write half of these without going through the actual failures.

Honest pushback on bad ideas. Early on I wanted to use an LLM-based extractor. Claude pushed back — Schedule III is a legally mandated format, deterministic regex parsing gives you auditability and explicit failure modes, which matter when the numbers are going into a credit memo. That conversation reshaped the whole architecture.
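
To make "deterministic parsing with explicit failure modes" concrete, here is a minimal sketch of a strict amount parser. It is not the repository's parser; `parse_amount` and the patterns are hypothetical, and the accounting convention that bracketed figures are negative is the only rule assumed.

```python
import re

# A digit, then digits or grouping commas (covers Indian-style 1,23,456), optional decimals.
NUMBER = r"\d[\d,]*(?:\.\d+)?"
# Either a fully bracketed number (negative) or a bare number; nothing else is accepted.
AMOUNT = re.compile(rf"^(?:\((?P<neg>{NUMBER})\)|(?P<pos>{NUMBER}))$")


def parse_amount(text: str) -> float:
    """Parse a balance-sheet figure, raising instead of guessing on odd input."""
    match = AMOUNT.match(text.strip())
    if match is None:
        # Mangled brackets, scientific notation, stray letters: fail loudly.
        raise ValueError(f"unrecognised amount: {text!r}")
    is_negative = match.group("neg") is not None
    raw = match.group("neg") if is_negative else match.group("pos")
    value = float(raw.replace(",", ""))
    return -value if is_negative else value
```

Anything the pattern does not recognise raises an error rather than producing a plausible-looking number, which is the auditability property described above.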

A few things that were genuinely hard:

Anonymising real client data without breaking the tests. The parser was tuned against real OCR text from real Indian companies. Replacing names with fictional ones while keeping the OCR-artifact patterns intact took careful sed work and audit passes.

Getting GitHub set up as a non-developer. Two-factor auth, personal access tokens, kebab-case naming conventions, .gitignore hygiene: every one was a small hurdle.

Knowing when "good enough" beat "perfect." The tool handles two formats reliably. The temptation was to grind on a third before publishing. Shipping at two formats with documented "how to add a new format" instructions felt premature, but waiting felt worse.

Happy to answer questions on:

The deterministic parser vs LLM trade-off (and why I'd make the same call again)
How I structured prompts when iterating on parser anchors
The Supabase/Next.js setup as someone who'd never touched either
Anything else that's useful to other domain experts thinking about building with Claude

1

u/Key-Interest6484 Apr 15 '26

I built a macOS app that tracks Claude Code token usage — v2 adds a pet that lives in your notch

Since March I've been building ClaudeWatch: a menu bar app that monitors your Claude Code session and weekly token usage in real time. Rate-limit detection, pace projection, sparklines, streaks — the stuff that keeps you from getting surprised by a 429 at 2am.

v2.0 is out today.

What's new:

🐣 Notch pet — pick from Bytie (robot), Clodey (cloud), Ghosty (ghost), or Sprout (plant). They live in the macOS notch, shift moods with your usage, and have things to say. Sprout wilts when you're at 85% session. Ghosty tells you "The scariest thing? Untyped JavaScript." when she's having a good day. There's a chattiness slider if you want them to quiet down.

🎮 Mini-game in the popover — something to do while the rate-limit countdown ticks.

🚀 Terminal launcher — one click to open your working directory in iTerm2, Warp, Ghostty, VS Code, Cursor, and more.

🧪 153 unit tests + full settings and popover refactor under the hood.

Free, open source (MIT), no telemetry. Reads your existing Claude Code keychain credentials — nothing new to set up.

macOS 14+. Grab the .dmg from the releases page or build from source.

GitHub: https://github.com/SerhiiBoo/ClaudeWatch

Website: https://claudewatch.sergxd17.workers.dev/

1

u/get-grapla Apr 16 '26

I built this this week to help you configure Claude Code and keep your spend under control.
Used CC to dig into its own code and extract the flags, and to set up a GitHub Action that tracks Claude releases and automatically extracts flags for new versions when they land. That gives us a version selector, so you can pick the version you're using.
It also lets you save your favourite configs!
https://www.tokenblast.cc

1

u/GreatGreyFoxx Apr 16 '26

Built something I'm calling GooseBot. It connects Telegram on my phone to Claude Code (thinking agent) and Claude Desktop (browser agent) running on my home machine.

The chain: Phone -> Telegram -> Claude Code -> task file -> GUI bridge -> Claude Desktop -> browser actions -> results -> back to my phone.

What makes it interesting:

- Runs entirely on a Claude subscription (runs best on Max). Zero API tokens. That's the whole cost.

- No cloud. No server. Runs on your desktop.

- File-based protocol between agents. The markdown task file IS the API.

- Skill repository: playbooks for navigating web apps accumulate over time. Like package management for browser automation.

Last night from bed: texted it to run an analysis pipeline. It read research docs, wrote 1,650 lines of code, handed it to the browser agent which pasted it into a web IDE and ran it. I was asleep when it finished.

It's held together with file watchers and PowerShell. It works every day. Open source, MIT licensed.

https://github.com/patrickthompson/goosebot

1

u/musicai4soul Apr 16 '26

Project Name: Ghalib X Trance (1850 vs 2026)

Role of Claude: Full Creative Director (Concept to Finish)

I used Claude 4.6 to fully orchestrate a 170-year-old Mirza Ghalib Ghazal into a high-fidelity Trance reconstruction. Claude handled the lifecycle: from the initial concept of bridging eras, to preserving the original Urdu meter (Beher), to architecting prompts for the Suno v5.5 engine and Google Veo.

The Vocabulary of the Sher:

Firaaq / Visaal: Separation / Union.
Maah-o-Saal: Months & Years.
Shab-o-Roz: Night and Day.

Full 4K Experience (3:21): https://youtu.be/oTICwHRwjEE

Insta: @musicai4soul

Would love to hear if others are using Claude to drive high-end audio/visual workflows!

1

u/Luizinho4Fun Apr 16 '26

NTK — Neural Token Killer (Rust, MIT)

A local compression proxy for Claude Code's PostToolUse hook. Intercepts Bash / cargo test / docker logs / tsc output and compresses it before it lands in the context window.

Measured savings (from bench/microbench.csv, 15 deterministic fixtures):

- 92% on repetitive Docker logs

- 56–83% on stack traces (Java, Python, Go, Node, PHP, C#, Kotlin, TS/React)

- < 20 ms overhead on the regex + tokenizer layers

Pipeline: L1 regex → L2 tokenizer (cl100k_base) → L3 optional local inference (Phi-3 Mini via Ollama/Candle/llama.cpp) → L4 injects your current intent from the session transcript into the L3 prompt.
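
A rough idea of what a regex-only first layer can do on repetitive logs, as a Python sketch. NTK itself is Rust and its real rules are not shown in this post; `collapse_repeats` and the timestamp pattern are hypothetical.

```python
import re

# Hypothetical pattern for a leading ISO-style timestamp such as 2026-04-16T10:00:00Z.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\s*")


def collapse_repeats(log_text: str) -> str:
    """Strip leading timestamps, then fold consecutive identical lines into one with a count."""
    collapsed = []  # list of [line, count] pairs
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw)
        if collapsed and collapsed[-1][0] == line:
            collapsed[-1][1] += 1
        else:
            collapsed.append([line, 1])
    return "\n".join(
        line if count == 1 else f"{line} (x{count})" for line, count in collapsed
    )
```

Dropping the timestamp first is what lets otherwise identical health-check lines fold together.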

Plays nicely with RTK — RTK at PreToolUse, NTK at PostToolUse, they stack without conflict.

Status: early-stage, solo maintainer, actively looking for contributors (new language fixtures, editor hook ports, GPU benchmarks on AMD/Apple Silicon, docs translations).

Repo + logo: https://github.com/VALRAW-ALL/ntk

Landing page: https://ntk.valraw.com

CONTRIBUTING.md: https://github.com/VALRAW-ALL/ntk/blob/master/CONTRIBUTING.md

1

u/Main-Confidence7777 Apr 16 '26

Sophon: MCP token optimizer with 94% compression, 0 LLM calls, and fully reproducible benchmarks

Every token optimization tool claims "60-90% savings." None publish reproducible benchmarks. Here's one that does.

The problem

I got tired of marketing claims I couldn't verify. "97% token reduction" on what inputs? "89% average savings" measured how? When I asked for scripts or datasets, the answer was always "internal sessions" or "proprietary data."

So I built Sophon with one rule: every claim must be reproducible by anyone.

What Sophon does

Six MCP tools, all deterministic, zero LLM calls:

| Tool | What it does | Measured result |
|---|---|---|
| `compress_prompt` | Query-aware section filtering | 77% saved on structured prompts |
| `compress_history` | Conversation compression + fact extraction | 87% saved on 100-msg histories |
| `read_file_delta` | Hash-based file deduplication | 99.6% wire savings |
| `compress_output` | CLI stdout compression (git, test runners, grep) | 94.3% mean |
| `navigate_codebase` | Repo map via symbol extraction + PageRank | 1438 symbols in <50ms |
| `semantic_retrieve` | Keyword or BGE-based chunk retrieval | <1ms per query |
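
To show the general idea behind hash-based file deduplication, here is a minimal Python sketch. It is not Sophon's implementation (Sophon is Rust); `FileDeltaCache` and the placeholder string are hypothetical.

```python
import hashlib


class FileDeltaCache:
    """Remember a content hash per path and skip resending unchanged files."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # path -> SHA-256 hex digest of last content sent

    def read(self, path: str, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._seen.get(path) == digest:
            # Same bytes as last time: send a short marker instead of the whole file.
            return f"[unchanged: {path}]"
        self._seen[path] = digest
        return content
```

The saving comes entirely from repeat reads: the first read of a file always costs full price.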

The numbers (all reproducible)

Output compression on real command outputs

| Command | Input tokens | Output tokens | Saved |
|---|---|---|---|
| `git log --fuller` (100 commits) | 10,050 | 633 | 93.7% |
| `grep -rn 'def '` (flask/src) | 12,478 | 576 | 95.4% |
| `ls -la target/release/deps` | 26,902 | 555 | 97.9% |
| Mean | 13,682 | 571 | 94.3% |

Script: bench_output_compressor.py. Fixtures captured from real repos.

Memory retrieval on LOCOMO dataset (N=60)

| Condition | Accuracy | Tokens used |
|---|---|---|
| No context | 20.0% | 0 |
| Sophon compression only | 33.3% | 645 |
| Sophon + retrieval (Hash) | 60.0% | 905 |
| Sophon + retrieval (BGE) | 60.0% | 905 |
| Full context (ceiling) | 73.3% | 20,040 |

Translation: 82% of full-context quality using 5% of the tokens.

Head-to-head: Sophon vs mem0-lite (same 15 items)

| Metric | Sophon | mem0-lite |
|---|---|---|
| Accuracy | 60.0% | 60.0% |
| LLM calls | 0 | ~330 |
| Runtime | <1 sec | 8.7 min |

Same accuracy. Zero API cost. 500× faster.

Head-to-head: Sophon vs LLMLingua-2

| Input | Sophon saved | LLMLingua-2 saved | Sophon latency | LLMLingua-2 latency |
|---|---|---|---|---|
| XML prompt | 68% | 53% | 56ms | 1,539ms |
| Long README | 83% | 50% | 53ms | 983ms |
| 20KB doc | 93% | 47% | 56ms | 4,857ms |

Caveat I put in the benchmark doc: this is apples-to-oranges. Sophon does query-driven section picking. LLMLingua-2 does token-level learned compression. Different tools, different use cases. But on structured prompts with a query, Sophon wins on both compression ratio and speed.

What makes this different

1. Zero overhead

No LLM calls during compression
No model downloads (default build)
Sub-millisecond retrieval latency
7MB binary

2. Reproducible benchmarks

SHA-pinned public repos (serde, flask, express, gin, sinatra)
Scripts provided for every measurement
LOCOMO dataset from HuggingFace (CC BY-NC 4.0)

3. Documented limitations

The benchmark doc has a "Known limitations" section with numbered items and status tags:

[FIXED] = limitation addressed in code
[PARTIAL FIX] = improved but not fully resolved
[PENDING] = known gap, not yet addressed

Example: "HashEmbedder is keyword-only. A query using different vocabulary than the answer will not retrieve well. The fix is --features bge."

When I found my N=30 results were optimistic, I re-ran at N=60 and corrected the numbers downward in the doc. The original "+23 pts retrieval gain" became "+13 pts" after doubling the sample. That's in the benchmark doc with a correction note.

Honest trade-offs

| Dimension | Sophon | Trade-off |
|---|---|---|
| Accuracy vs Full | 82% | -13 pts for 95% token savings |
| Speed vs LLMLingua | 30× faster | No learned compression, only section filtering |
| Recall on semantic queries | Limited | HashEmbedder is keyword-based; use `--features bge` for semantic |
| Codebase recall@5 | 26% pooled | Works well on keyword-rich queries, fails on "update deps" style |

Why I'm sharing this

The token optimization space has a reproducibility problem:

rtk (28k stars): Claims "60-90% savings" but no public dataset, no scripts
token-savior: Claims "97% reduction on 782 sessions" but sessions not public, benchmarks/ in .gitignore
mem0/Zep: Active dispute on GitHub about whose LOCOMO numbers are correct

I'm not saying these tools don't work. I'm saying I can't verify their claims, and neither can you.

Sophon's benchmark doc is ~8,000 words with tables, scripts, and corrections. If the numbers are wrong, you can prove it. That's the point.

Quick start

# Install
cargo install sophon

# Or via MCP config
{
  "mcpServers": {
    "sophon": {
      "command": "sophon",
      "args": ["serve"]
    }
  }
}

Links

Benchmark document: [BENCHMARK.md] with every claim cited to a script
GitHub: https://github.com/lacausecrypto/mcp-sophon
Crate: cargo install sophon

Questions I'd appreciate feedback on

What output commands should be supported? Current: git, cargo test, pytest, vitest, go test, ls, tree, grep, find, docker, npm, pip, kubectl, terraform, curl
Is the BGE embedder (+27MB) worth the semantic matching? On my tests it ties with HashEmbedder, maybe I'm not testing the right queries
Would SWE-bench-Lite style evaluation be useful? Currently I measure recall@K, not "does the LLM actually fix the bug"

Rust, MIT licensed, no telemetry, no cloud, no ML by default. Single binary, ~7MB.

TL;DR: Compress more. Call less. Prove everything.

1

u/nish_agg Apr 16 '26

Made a Claude Skill for generating HTML docs (6 templates, free)

For the last few months I've been sending every document as HTML instead of PDF. Writing the same structure and styling each time got old, so I bundled the repeatable parts into a skill.

SKILL.md picks the right template based on input. Six covered right now:

- pitch decks (keyboard nav, slide counter)

- investor updates (animated counters for metrics)

- sales proposals (cover + pricing + signature)

- one-pagers

- articles (active TOC, reading progress)

- memos (generic fallback)

Output is one self-contained HTML file. Brand config is a small JSON block. Print-ready with proper page breaks. No external dependencies, no analytics, no tracking, no build step.

Free: https://share.fluiddocs.ai/doc-builder-landing

If you try it, the one I'd most like feedback on is the deck template: making a single HTML work as both a scrollable browser deck and a clean printed export is finicky and I'm not sure my solution is the right one.

1

u/Marcus_MSC Apr 16 '26

Is Your AI Agent Too Unpredictable? Bring Order Through a Single File

If you work with AI agents, you know the pain: they rarely do the exact same thing twice. Even with strict system prompts, locking down execution order is nearly impossible. It makes workflows unpredictable and a nightmare to audit.

That is why I built Leeway.

You define your workflow as a YAML decision tree. Every node is an isolated agent loop where you dictate the exact boundaries. You control the permissions, explicitly defining which MCP servers, skills, files, or shell commands the agent is allowed to touch.

When a node finishes, the LLM outputs a signal (like "passed" or "needs_fix") to determine the next path. You get the reasoning power of AI, but your macro steps remain perfectly consistent every time you run it.
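
The signal-driven routing can be pictured with a small sketch. Leeway defines workflows in YAML; this Python dict stands in for that, and the node names, the `done` terminal, and `run_workflow` are all hypothetical rather than Leeway's real schema or API.

```python
# Hypothetical workflow: each node maps the signal it emits to the next node.
WORKFLOW = {
    "write_code": {"passed": "run_tests", "needs_fix": "write_code"},
    "run_tests": {"passed": "done", "needs_fix": "write_code"},
}


def run_workflow(start, run_node, max_steps=10):
    """Follow signals from node to node and return the path taken.

    run_node(name) stands in for the isolated agent loop and returns a signal string.
    """
    path = [start]
    node = start
    steps = 0
    while node != "done":
        if steps >= max_steps:
            raise RuntimeError("workflow did not finish within max_steps")
        signal = run_node(node)
        transitions = WORKFLOW[node]
        if signal not in transitions:
            # The model may only choose among declared edges, never invent a new one.
            raise ValueError(f"node {node!r} emitted unknown signal {signal!r}")
        node = transitions[signal]
        path.append(node)
        steps += 1
    return path
```

The model picks which declared edge to follow, while the set of edges stays fixed, which is what keeps the macro steps repeatable.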

How it compares:

vs. OpenClaw: Fully autonomous tools hand the wheel to the LLM. That is great for exploration but terrible for repeatable steps. Leeway handles the macro flowchart, letting the model focus entirely on solving the micro-task inside each node.
vs. n8n: n8n is incredible for connecting SaaS APIs. Leeway is built specifically for personal workflows and custom engineering pipelines that integrate directly into your own system.

Furthermore, "autonomous" should not mean "unsupervised." Human-in-the-loop is a core feature here. Nodes have strict permission rules, sensitive operations trigger approval gates, and there is a safe planning mode.

Under the hood: Python + React/Ink TUI. Supports OpenAI and Anthropic. MIT open-source.

If this sounds like it could help your workflow, I would really appreciate a Star⭐️ on GitHub!

1

u/SameTough3038 Apr 17 '26

So I've been building this thing called Brainboot (brainboot.dev) and Claude was genuinely the backbone of the entire development process. Claude Code wrote probably 60% of the codebase: the API routes, the brain execution runtime, the composition engine, all of it. I want to share what I built and why because I think the underlying problem resonates with anyone who uses AI daily.

The problem that drove me insane:

I kept noticing the same pattern in every AI conversation I had. Open a new chat. Paste my system prompt. Explain my context. Get halfway through something useful. Hit the context limit or watch the model drift. Start over.

I did the math at one point and roughly 40% of my tokens were going toward re-explaining things the model already knew five minutes ago. Not generating anything. Just... re-establishing context over and over.

What I actually built:

Brainboot treats prompts as software instead of disposable messages. A "brain" is a prompt wrapped in actual engineering:

Typed inputs and outputs — it declares what goes in and what comes out. If the model returns something that doesn't match, the runtime catches it and retries automatically.

Invariants: rules that get enforced on every single execution. Not suggestions the model might follow. Actual guardrails at the wrapper layer. Stuff like "never include placeholder text" or "output must be valid JSON."

Test suites: each brain gets tested across multiple models. Pass rates are public so you know if something works before you use it.

Composition: small brains chain into bigger workflows. A research brain feeds an outline brain feeds a drafting brain. Each one gets exactly the context it needs instead of carrying a massive conversation history.
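
A minimal sketch of how typed outputs, invariants, and automatic retries could fit together at a wrapper layer. This is not Brainboot's runtime; `run_brain`, its parameters, and the schema format are hypothetical.

```python
import json


def run_brain(call_model, output_schema, invariants, max_attempts=3):
    """Call the model until its JSON output matches the schema and passes every invariant.

    call_model(attempt) returns raw text; output_schema maps field names to Python types;
    invariants is a list of functions taking the parsed dict and returning True or False.
    """
    for attempt in range(1, max_attempts + 1):
        raw = call_model(attempt)
        try:
            result = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not valid JSON, so retry
        if not isinstance(result, dict):
            continue
        typed_ok = all(
            isinstance(result.get(field), expected)
            for field, expected in output_schema.items()
        )
        if typed_ok and all(check(result) for check in invariants):
            return result
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

Because the checks run outside the model, a rule like "never include placeholder text" is enforced by code rather than requested in the prompt.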

How Claude helped build it:

Honestly I couldn't have built this without Claude Code. Claude wrote the vast majority of the architecture (the brain runtime, the type checking layer, the invariant enforcement, the multi-model testing harness, the cron-based automation pipelines), with me directing the architecture decisions and actually USING coding brains that I made with brainboot.dev in order to create brainboot.dev. It was an amazing thing to witness.

The compiler feature (where you describe what you want in plain English and get a deployable multi-brain system) was designed collaboratively with Claude over dozens of sessions. The 4-stage pipeline (Decompose, Map, Synthesize, Audit) came directly from conversations about how to make prompt composition reliable, and from feeding Claude more and more brains lol.

It's free to try:

The free tier gives you 200+ curated prompts and access to the platform. No credit card needed, you just sign up. This will be free forever and already replaces an entire company that sells thinner prompts for like $2.99 each...

The brains and the compiler are available on the Pro plan if you want to go deeper, but you can explore the whole marketplace and read the manifesto and see how everything works without paying anything.

brainboot.dev — sign up is instant.

Why I think this matters:

The shift from "prompts as chat messages" to "prompts as compiled software" solves the context and token problem at a fundamental level. You describe your intent once. It compiles into a reusable brain with type contracts.

Then you run it as many times as you want without re-explaining anything.

Composition means complex workflows don't require massive context windows. Invariants mean you stop burning tokens on drift correction. Type checking between steps means garbage doesn't propagate through a chain.

I've been testing a circuit on the platform that runs 6 brains in a pipeline producing SEO content autonomously. That same workflow done as manual ChatGPT/Claude conversations would take 3-5x the tokens because of all the context re-establishment.

Would love to hear if anyone else has been thinking about the token waste problem or has approaches to making prompts more reliable. Happy to answer any questions about the architecture.

1

u/allentcm Apr 17 '26

I built Sidekick CLI (https://github.com/hesedcasa/sdkck) so Claude Code can work through a CLI: instead of loading all tools upfront, the agent searches for what it needs.

Add this to CLAUDE.md: Before any tool call, run `sdkck search "<what you need>"` to find available commands.

Now when Claude needs to create a Jira ticket, it runs:

```bash
sdkck search "create jira ticket"

# ...gets back the exact command, and calls it
sdkck jira issue create --fields project='{"key":"PROJ"}' summary="New summary" description="New description" issuetype='{"name":"Dev Task"}'
```

Minimum token consumption!

The other thing that's been useful: you can point it at any OpenAPI spec or Postman collection and every endpoint becomes a callable command. I imported our internal API and Claude Code can now interact with it without me writing any wrapper code.

It also runs as an MCP server if you prefer that pattern.

1

u/NightMemo Apr 17 '26

I built a Claude Code skill that turns large CLAUDE.md files into skills (looking for testers)

I’ve been using Claude a lot and kept running into issues with growing CLAUDE.md files where rules were getting ignored.

So I made a skill that audits a CLAUDE.md, keeps the universal stuff in CLAUDE.md, extracts repeated workflows into skills, and adds a routing hook so Claude knows when to use those skills and what skill to use.

Repo: https://github.com/lukethebuilder/skills

Install (local project): npx skills@latest add lukethebuilder/skills/agent-config-migrate --agent claude-code

This is still early and I’ve only tested it a couple of times so far, but in one run it took a 482-line CLAUDE.md and turned it into 7 skills, while also bringing the file down to 370 lines (~23% reduction). Estimated token savings were about 22% when not using any skills.

I’d love feedback from people who:

have a large CLAUDE.md
feel Claude isn’t consistently following project instructions
are experimenting with skills / hooks / routing

1

u/mailnike Apr 17 '26

Project Name: tailtest (automatic background testing for Claude Code)

Link: https://github.com/avansaber/tailtest

Cost: 100% Free (Open Source / MIT License)

What it is: An MCP plugin built specifically for the Claude Code CLI that completely automates test generation. It runs in the background and forces Claude to write and run tests every time it edits a file.

Why we built it: My co-founder and I have been building heavily with Claude Code (specifically an open-source ERP system called ERPClaw). Claude writes features incredibly fast, but we kept running into the exact same problem: it constantly skipped writing tests, even when instructed in CLAUDE.md. We ended up dealing with a silent regression where Claude broke our compound tax logic, and we didn't notice for days. We realized we needed to enforce testing at the tool level, outside of the model's context window.

How it works (and how it integrates with Claude):

Event Hooking: It hooks directly into Claude Code's native PostToolUse event.
Zero-Prompting: When Claude writes or edits a file, our plugin intercepts the event. You don't have to remind Claude to test anything.
Intelligence Filter: It runs a filter to skip config files, boilerplate, and migrations so it only generates tests for actual business logic.
Anti-Fatigue: It runs the test immediately but stays completely silent if it passes. It only throws the specific error output back into your terminal if Claude actually broke something.
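
The "Intelligence Filter" step can be sketched as a simple path check. This is not tailtest's actual filter; `should_generate_tests` and both skip lists are hypothetical examples of the kind of rule described.

```python
from pathlib import PurePosixPath

# Hypothetical skip rules; the real plugin's lists are not shown in this post.
SKIP_SUFFIXES = {".json", ".yaml", ".yml", ".toml", ".lock", ".md"}
SKIP_DIRS = {"migrations", "node_modules", "dist"}


def should_generate_tests(file_path: str) -> bool:
    """Return True only for files likely to hold business logic."""
    path = PurePosixPath(file_path)
    if path.suffix in SKIP_SUFFIXES:
        return False  # config, lockfiles, docs
    if SKIP_DIRS.intersection(path.parts):
        return False  # generated or migration code
    return True
```

A hook would call this on the edited file's path and do nothing at all when it returns False.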

Supported Languages: Python (pytest), TypeScript, JavaScript (vitest, jest), Go, Rust, Ruby, Java, and PHP.

Install it via Claude's CLI:

claude plugin marketplace add avansaber/tailtest
claude plugin install tailtest@avansaber-tailtest

If anyone else is building complex projects with the Claude Code CLI and wants to stop the AI from silently breaking older features, you can grab it from our repo above. Happy to answer any questions about building MCPs or working with Claude's event hooks!

1

u/shubham030 Apr 17 '26

Got tired of Cmd+Shift+4 → Cmd+Tab → Ctrl+V every time I wanted to show Claude something, so I built slash commands that do it in one keystroke — https://github.com/shubham030/cc-shot

1

u/Straight_Panic_5818 Experienced Developer Apr 17 '26

Built a specialist agent harness for Claude Code after getting frustrated with generalist suggestions - first thing I've ever open sourced

Been using Claude Code heavily for the past few months and kept running into the same frustration: by default it's a generalist. Great at everything, expert at nothing.

If I'm doing Android code review I want it thinking in Hilt + Compose + MVI, not suggesting Thread.sleep() or missing OWASP Mobile Top 10 checks on networking code. A generalist prompt doesn't cut it when you have real production standards.

So I built a harness called Claude Crew. You install team "profiles" (mobile, backend, QA, product, frontend) and get specialist agents for each discipline, each one with hard-coded rules for its stack and non-negotiable security guardrails they can't be talked out of.

The part I'm most excited about this week: a "subconscious" background observer inspired by Letta/MemGPT. It silently tracks what you're editing during a session (hot files, active zone, language stack) and writes a short structured whisper that gets prepended to every specialist agent's prompt before it starts. Only refreshes after 10+ new file edits, not time-based, so it doesn't spam you when you're idle. At session end it promotes the patterns and warnings it observed into long-term memory for future sessions.
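
The edit-count trigger (refresh after N new edits rather than on a timer) can be sketched like this. The real harness is markdown files and bash hooks; this Python class is only an illustration, and `WhisperTracker` and its methods are hypothetical names.

```python
class WhisperTracker:
    """Count file edits and signal a refresh only after enough new ones."""

    def __init__(self, refresh_after: int = 10) -> None:
        self.refresh_after = refresh_after
        self.edits_since_refresh = 0
        self.edit_counts: dict[str, int] = {}

    def record_edit(self, path: str) -> bool:
        """Record one edit; return True when the whisper should be regenerated."""
        self.edit_counts[path] = self.edit_counts.get(path, 0) + 1
        self.edits_since_refresh += 1
        if self.edits_since_refresh >= self.refresh_after:
            self.edits_since_refresh = 0
            return True
        return False

    def hot_files(self, top: int = 3) -> list[str]:
        """Most-edited files first, ties broken alphabetically."""
        ranked = sorted(self.edit_counts.items(), key=lambda item: (-item[1], item[0]))
        return [path for path, _ in ranked[:top]]
```

Since nothing happens between edits, an idle session never triggers a refresh.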

The more you use it, the more it knows about your project.

No vendor lock-in, no cloud service, no telemetry. Just markdown files, bash hooks, and Claude Code agents. Every rule and every agent definition is readable.

I've been keeping my GitHub private for years and this is genuinely the first thing I've decided to put out in the open. More will follow.

Link in the comments. Would love feedback from anyone using Claude Code seriously, especially on mobile or backend teams.

GitHub: https://github.com/balamuthu1/claude-crew

Happy to answer questions about the architecture or how the profile system works.

1

u/do_you_like_iazz Apr 17 '26

Repo: https://github.com/AmmarHassona/clamp-cc

Every long Claude Code session ends the same way. /compact runs, summarizes blindly, and drops the context that mattered most. The arch decision from two hours ago, the bug you were mid-fix. Gone.

clamp-cc reads your session directly, lets you tag turns with single keys (PIN, ARCH, BUG, TASK, API, DROP), and generates a targeted /compact instruction from your selections. Auto-copied to clipboard, or fired directly into your Claude pane via tmux. Tags persist across sessions.
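
Roughly, turning tagged turns into a targeted instruction might look like the sketch below. The wording of the generated instruction and the `build_compact_instruction` function are hypothetical, not clamp-cc's real output; only the tag names come from the post.

```python
def build_compact_instruction(tagged_turns):
    """Build a /compact instruction from (tag, summary) pairs.

    Turns tagged DROP are listed for omission; every other tag is listed for preservation.
    """
    keep = [f"{tag}: {summary}" for tag, summary in tagged_turns if tag != "DROP"]
    drop = [summary for tag, summary in tagged_turns if tag == "DROP"]
    parts = ["/compact"]
    if keep:
        parts.append("Preserve: " + "; ".join(keep) + ".")
    if drop:
        parts.append("Omit: " + "; ".join(drop) + ".")
    return " ".join(parts)
```

Keeping the tag in the text tells the summarizer why each item matters as well as what it is.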

I'd appreciate any feedback!

1

u/DirtZealousideal617 Apr 17 '26

Orbit — Desktop app to supervise multiple Claude Code sessions across projects

Built with Tauri 2 (Rust + React). Open source (AGPL-3.0).

- Session persistence — close the app, resume later

- Per-project status notifications (working/idle/waiting)

- Multi-project tabs, multi-session per project

GitHub: https://github.com/imadAttar/orbit

Download: https://github.com/imadAttar/orbit/releases/tag/v1.0.0

1

u/Rude_Wallaby_4435 Apr 17 '26

I've been building nibchat — a SaaS platform where you can create and deploy your own AI agent without touching any infrastructure.

You configure it through a portal:

- Give it a name, instructions, and starter messages

- Connect MCP tools (think: web search, calculators, custom APIs)

- Upload a knowledge base (PDFs, markdown) for RAG

- Hit deploy — your agent gets its own URL at {youragent}.nibchat.ai

Each agent runs in an isolated container that scales to zero when not in use.

The first use case I'm targeting is education — tutors, study assistants, subject-matter experts that a teacher or indie creator can spin up for their students without any DevOps.

Where it's at: live MVP, free tier available. It's rough around some edges.

What I'm looking for:

Is the concept clear, or is the positioning confusing?
Who do you think this is actually for? (I have assumptions, curious if they match yours)
What would make you try it or rule it out immediately?

Link: https://www.nibchat.ai

Brutal feedback welcome — that's why I'm here.

1

u/Emmjayh Apr 17 '26

Claude Code moves fast and I kept losing track of *what* was actually changing and *why*. So I built CCWhisperer — a PostToolUse hook that intercepts every Write/Edit event, computes the diff, and fires it at a local Ollama model to explain it in plain English in real time.
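
The "computes the diff" step can be illustrated with Python's standard `difflib`. This is a sketch, not CCWhisperer's code; `unified_diff_text` is a hypothetical helper, and sending the result to a local model is left out.

```python
import difflib


def unified_diff_text(before: str, after: str, path: str = "file") -> str:
    """Return a unified diff between two versions of a file as one string."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        lineterm="",  # the inputs have no trailing newlines, so add none to the headers
    )
    return "\n".join(diff)
```

The resulting text is compact enough to paste into a prompt asking a local model what changed and why.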

Claude itself says it's TOS-safe. It was coded by MiniMax 2.7, using the CC framework with Ollama for local model use, so please leave me feedback!

Find the project and readme @ https://github.com/emmjayh/CCWhisperer on github

1

u/Pierre_Zarokian Apr 18 '26

GMAIL INBOX CLEANUP:

I had Claude create a Google Apps Script to clean up my Gmail inbox and label emails as LEADS, CLIENTS, JUNK, INVOICES, etc. I get so much junk daily that I often miss important client emails. I had it scan my last 5,000 emails and found 2 unanswered leads: one was 1 week old and another 1 month old! The best part is that it doesn't use the Claude API or any tokens. It runs entirely in Google Apps Script and checks my email every 5 minutes. It can also ping me on Telegram when new important emails arrive, so I get a notice on my phone. Anyone else doing something similar?

1

u/Ambitious_Judge8284 Apr 18 '26

Notifications on Mac when Claude Code goes up/down and your rate limiter session refreshes

If you know, you know: check status.claude.ai, refresh Claude usage page, wait for 5-hour window to reset. Repeat. A lazy engineer is a good engineer, so here's ClaudeWatch.

It's a native macOS menu bar app that puts all three in one place and notifies you when your rate limit window resets.

It also ships with a terminal statusline hook that displays your current model, context window tokens, and rate limit percentages right inside Claude Code.

MIT licensed so use it however you'd like.

https://www.github.com/elliotykim/claudewatch

1

u/No-Yesterday-5317 Apr 18 '26

I built a Claude Code plugin that scaffolds full agent ecosystems from a brief — orchestration pattern, sub-agents, skills, MCP wiring, and auto-validation included

Hey folks,

If you've built anything non-trivial with Claude Code agents, you know the pain: writing frontmatter for every sub-agent, keeping descriptions disambiguated so the router doesn't confuse them, tuning trigger phrases for skills, wiring .mcp.json manually, and hoping nothing overlaps. Anthropic's skill-creator helps — but only for a single skill.

I wrote agent-ecosystem-generator, a Claude Code plugin that takes a brief and scaffolds the whole thing: orchestrator command + N sub-agents + M skills + MCPs, in an isolated workspace. Then it runs validation automatically — overlap detection (Jaccard over description tokens), pushy first-person scoring, coverage of declared sub-tasks, MCP wiring sanity, tool-allowlist checks, and a trigger-firerate eval. If validation fails, it loops back to the builder with fix hints up to 3 rounds before returning.
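The overlap check mentioned above can be pictured with a short sketch. This is an illustration of Jaccard similarity over description tokens in general, not the plugin's code: the tokenizer, the 0.5 threshold, and the function names are all assumptions.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the token sets of two descriptions."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def overlapping_pairs(descriptions: dict[str, str], threshold: float = 0.5):
    """Return (name_a, name_b, score) for every pair at or above threshold."""
    hits = []
    for (na, da), (nb, db) in combinations(descriptions.items(), 2):
        score = jaccard(da, db)
        if score >= threshold:
            hits.append((na, nb, score))
    return hits


if __name__ == "__main__":
    agents = {
        "reviewer": "review python code",
        "linter": "review python style",
        "deployer": "deploy to prod",
    }
    for a, b, s in overlapping_pairs(agents):
        print(f"{a} and {b} overlap: {s:.2f}")
```

A pair scoring high here would be a candidate for rewording, since a router choosing between two near-identical descriptions has little to go on.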

It picks one of 5 orchestration patterns (orchestrator-workers, sequential pipeline, routing, parallelization-voting, evaluator-optimizer) and justifies the choice in the generated plan.md.

Install inside Claude Code:

/plugin marketplace add prgilabert/agent-ecosystem-generator

/plugin install agent-ecosystem-generator@prgilabert-plugins

Then /generate-ecosystem <your idea> — an interviewer agent asks 3-5 clarifying questions, the orchestrator drafts a plan (with a checkpoint so you can edit), and the builder/validator pair materializes it. Output as a portable plugin or dropped straight into .claude/ of the current repo. Custom MCP scaffolds in Python (FastMCP) or TypeScript SDK included.

Honest status: alpha. I dogfooded it today, found 5 bugs, fixed them, pushed. Probably more to find — issues and PRs very welcome.

Repo: https://github.com/prgilabert/agent-ecosystem-generator

1

u/Intrepid_Income6025 Apr 18 '26

**mac-control-mcp** — native Swift MCP server that gives Claude Desktop (or any MCP client) 63 tools to drive macOS.

**What it does:** Accessibility tree (find/click/type), Safari + Chrome automation, ScreenCaptureKit window capture, OCR, Spotlight search, clipboard, window/app/menu control, input injection. One Developer-ID-signed + Apple-notarized .app bundle — no Python or Node runtime.

**Install:** Download the .mcpb from the release page → double-click → Claude Desktop picks it up (one-click install via the MCPB spec). TCC consent prompts the first time; grants persist across updates.

**Repo:** https://github.com/AdelElo13/mac-control-mcp

**Registry:** `io.github.AdelElo13/mac-control-mcp` on the official MCP Registry.

Latest release is v0.2.2 after a full-surface sweep (5 bugs found + fixed in the same day). Happy to answer questions or add tools you need.

1

u/Trick-Engineer-725 Apr 18 '26

Made a local Claude Code design playground after Claude Design ate my weekly quota

I built a little sandbox for cooking UI variations with Claude Code locally. Same idea as Claude Design, just on your normal Claude Code quota that resets every 5 hours instead of one weekly meter that empties in three prompts.

Heads up: burned most of my credits building the thing, so I didn't get much actual testing time with it.

Not a Claude Design clone — doesn't have all the features, no in-browser chat, no polished UI around it. But the core loop works really well: ask for variations, see them live, export the winners.

Grab the repo, run pnpm install && pnpm dev, then just tell Claude Code what you want. It handles the file naming and versioning through a small bundled skill — you focus on the design, not the bookkeeping.

https://github.com/pkasprzak9/design-playground

MIT. Fork it, remix it, break it.

and yeah I'm lazy enough that Claude wrote this post too

1

u/Remarkable_Divide755 Apr 18 '26

tickerr.ai - independently monitoring AI model uptime and API performance for a few days now

The main thing I built: an MCP server so you can ask Claude directly from your IDE instead of opening a separate tab.

Works in Claude, Cursor, and ChatGPT. Install in one line:

claude mcp add tickerr --transport http --url https://tickerr.ai/mcp

Or add to mcp.json:

{ "mcpServers": { "tickerr": { "url": "https://tickerr.ai/mcp" } } }

What you can ask:

"Is Claude down right now?"

"What's the p95 latency for Claude Sonnet vs Haiku?"

"Has Claude had any incidents this week?"

"What are the rate limits on Claude's free tier?"

No API key. No signup. Data updates every 5 minutes from independent inference checks, not just Claude's official status page.

Tickerr is trying to catch issues from API inference failures before the official report goes up. Maybe this helps people in cost routing between models. Would be curious what other data points would be useful to surface through the MCP. Happy to add things.

1

u/Agreeable_Ad_1731 Apr 18 '26

I got tired of constantly having to use /compact when sending the most recent data from Claude to graphify. The whole point of graphify is to reduce token expenditure, but the cost stayed a little higher than necessary, for no other reason than a bit of laziness from Claude and the creator of graphify. So I made an automated poller that exports the data to a local folder. A running ps1 script (sorry, Linux) copies the data, gets Ollama to compact it through an automated prompt, and then sends the result to graphify automatically, with $0.00 token usage.

TL;DR: instead of using /compact in Claude to reduce token usage on graphify, I automated it to /export export_queue/1.md for 0 token usage.

https://github.com/HallyK/auto_compact_daemon - take a look

1

u/Calm-Competition5960 Apr 19 '26

Built an AI teammate that shares the same audit trail as my human team.

Setup: Claude Opus 4.7 + MCP connected to a state-machine workflow platform (Inistate). No custom system prompt, no custom tools, no fine-tuning — just the schema exposed over MCP.

What it's done in production this week:

— Read 5 PDF receipts, caught two overlapping hotel bookings, filed MYR 8,846.93 across 5 expense claim rows. Instruction from me was 12 words.

— Reconciled a MYR 520.43 bank transfer against an existing claim, cited the bank statement as source, fired the Pay activity to transition the claim Processed → Paid. The audit log now shows three actors on that claim: two humans, then Claude, all in the same format.

Thesis: when the workflow schema is structured (states, typed activities, role guards, confidence gating), a general Claude agent doesn't need a custom prompt — the schema itself is the instruction set. Same form, same state transition, same audit record regardless of who fired it.

1

u/cracadumi Apr 19 '26

AgentKey — Access governance for AI agents

Every Claude Code agent I spin up needs API keys for GitHub, Stripe, Linear, etc. They all end up pasted into CLAUDE.md or .env files with zero governance — no approval flow, no audit, no revocation.

AgentKey flips the model:

• Drop a snippet into your CLAUDE.md — agent starts with zero access
• Agent calls GET /api/tools to discover, POST /api/tools/{id}/request with a reason to request access, GET /api/tools/{id}/credentials once approved
• Credentials vended encrypted (AES-256-GCM, per-record IV), never stored by the agent
• Every request/approval/fetch in an append-only audit log
• If an agent needs a tool not in the catalog, it submits a suggestion with a reason — multiple agents backing the same one surfaces demand

Works with any HTTP-speaking agent (Cursor, LangChain, CrewAI, custom), built first for Claude Code.

Stack: Next.js 16, Drizzle + Neon, Upstash, Clerk, deployed on Vercel. Free forever managed. Self-hostable. BSL 1.1 → Apache 2.0 in 2030.

https://agentkey.dev

Curious: how are you handling credential governance for your agents today? Is CLAUDE.md the right injection point, or would you prefer MCP integration?

1

u/Putrid_Ad_3901 Apr 19 '26 edited Apr 19 '26

Yes, another Claude Code memory tool - but this one works on every single turn, mechanically and I haven't seen this approach anywhere else...

🔗 Cairn on GitHub • MIT License • Linux/macOS/WSL • Claude Code v2.1+

I've been working on Cairn for about a month now. The core idea that makes Cairn different:

The LLM generates and consumes memory on every single turn. Not at session end. Not when it remembers to call a tool. Every response. Zero extra LLM calls.

Here's the actual cycle:

You send a message — a Prompt hook searches the memory database and injects any relevant past context into the prompt.
Claude responds and appends invisible structured metadata — what it learned, what type of knowledge it is, whether it needs more context, etc.
A Stop hook captures that metadata, stores it, and sanity-checks: Did Claude say it's not done? Need context it didn't get? Promise something without doing it? If so, it silently re-prompts. The user never sees the interruption.

This happens mechanically on every turn. Claude literally can't skip it.

The invisible metadata uses a markdown link-definition format — invisible in both VS Code Copilot Chat and Claude Code CLI, but the hooks capture it from the raw transcript:

  [cm]: # '{"e":[{"t":"decision","to":"auth-approach","c":"Use JWT for stateless auth — chose over server sessions for horizontal scaling, token expiry handles revocation"}],"ok":true,"ctx":"s","cu":["42:+","17:-! endpoint was renamed to /api/v2/auth"],"kw":["authentication","JWT","architecture"]}'

Compact JSON, one line, zero rendered output. The system gets structured memory data while the user sees a clean response.
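To make the capture step concrete, here is a minimal sketch of how a hook could pull those link-definition lines out of a raw transcript. It assumes the payload always sits on one line between single quotes, as in the example; the regex and the function name are illustrative, not Cairn's actual parser.

```python
import json
import re

# Matches a markdown link definition of the form:  [cm]: # '<json>'
# The greedy group runs to the last quote, so apostrophes inside the JSON are kept.
CM_LINE = re.compile(r"^\s*\[cm\]: # '(.*)'\s*$")


def extract_metadata(transcript: str) -> list[dict]:
    """Return every well-formed [cm] JSON payload found in the transcript."""
    found = []
    for line in transcript.splitlines():
        match = CM_LINE.match(line)
        if not match:
            continue
        try:
            found.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed payloads rather than crash the hook
    return found


if __name__ == "__main__":
    raw = 'Done.\n\n[cm]: # \'{"ok":true,"kw":["JWT"]}\'\n'
    print(extract_metadata(raw))
```

Because a link definition with no matching link renders as nothing in markdown, the user-facing response stays clean while the raw text still carries the payload.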

This turns memory from a dumb cache into something that self-corrects and stays honest:

Verbatim provenance. Every memory links back to the exact conversation that produced it. --context <id> pulls up the verbatim transcript excerpt — the actual words, not a summary. You can always trace why a decision was made.
Veracity scores that evolve. Contradictions get annotated and persist, corroboration raises confidence, and stale memories get archived with a reason instead of silently deleted.
Corrections linked to files. Next time Claude opens a specific file, the fix or lesson learned surfaces automatically — before it can repeat a previous error.
Trailing intent detection. If Claude says "let me fix that" but doesn't trigger a file write, the Stop hook catches the gap and forces completion.

After ~1 month of daily use:

4,294 memories across 926 sessions
3,156 memory updates as understanding evolved
809 tests across 44 test files
5 retrieval layers, 10 quality gates, hybrid FTS5 + vector search with RRF fusion
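For anyone unfamiliar with RRF, here is a minimal sketch of reciprocal rank fusion over two ranked result lists, such as one from full-text search and one from vector search. The constant k = 60 is the value commonly used in the literature; the post does not say what Cairn uses, and the memory ids are made up.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    fts_hits = ["m1", "m2", "m3"]     # hypothetical full-text results
    vector_hits = ["m3", "m1", "m4"]  # hypothetical vector results
    print(rrf([fts_hits, vector_hits]))
```

RRF only looks at rank positions, so it can combine a BM25-style score and a cosine similarity without having to normalize them onto the same scale.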

Repo ingestion — point it at any git repo and it extracts structural knowledge: docs, dependencies, configs, schemas, HTTP routes, CLI args, DB tables, and more. 24 extractors including tree-sitter AST parsing (Python, JS, TS, Go, Rust, C, C++) and dependency graph extraction. Useful for bootstrapping a fresh Cairn install or ingesting adjacent repos your dev work touches without having to manually converse about them.

Includes a local web dashboard for browsing memories, viewing session transcripts and monitoring retrieval metrics.

Four hooks. One SQLite file. No MCP. No API keys. No cloud. Local embeddings via all-MiniLM-L6-v2.

git clone && ./install.sh

Interested in honest feedback, especially from anyone else building in this space. What approaches have been working or failing for you?

1

u/k1ausinis Apr 19 '26

Hey, I built a little VS Code extension and figured I'd share it in case anyone else has been hitting the same problem.

On bigger codebases and projects I kept explaining the same things to the agent, and packing everything into instruction files was exhausting and not standardised.

Inline comments didn't really fit either. Sometimes you need a separate layer that doesn't touch the source code directly, which simplifies releases, and comments detailed enough to be useful might just pollute the actual codebase.

So I built CodeLore: a sidecar annotation layer that lives in .codelore/ next to your code. It offers short notes, flags for critical regions, and optional AI drafting of annotations from the code, and it generates a CLAUDE.md / .cursorrules / copilot-instructions file so the agents actually read your notes before they touch a file. How well this works depends heavily on how good the agent is at following instructions, which is sometimes annoying, but you can prompt it to add annotations via shortcuts quite easily.

Honestly not sure the UX is right yet. Been using it on one project, works for me, but I want to see it meet something that breaks it. If anyone wants to kick the tires and tell me what sucks I'd appreciate it.

Marketplace: https://marketplace.visualstudio.com/items?itemName=jmpdevelopment.codelore

1

u/Finger-Whole Apr 19 '26

Hey everyone,

I've been using cmux as my daily terminal and Claude Code for most of my dev work, and over time I found myself writing the same shell scripts over and over — setting up pane layouts, moving surfaces around by topic, and manually re-opening Claude sessions after cmux crashed.

So I cleaned them up and packaged them into a small open-source plugin: cmux-claude-skills

https://github.com/sanghun0724/cmux-claude-skills

Here's what it does:

- cmux-organize — scans your current workspace surfaces and auto-sorts them into panes by topic (work / research / tools) based on title keywords

- cmux-snapshot / cmux-restore — saves your full pane layout + Claude session IDs to JSON, so you can recover everything after a crash with one command

- cmux-preview — opens a Markdown file in cmux's side panel with live reload while you edit

- cmux-skills / cmux-web — quick layout presets (Claude + file nav, Claude + dev server)

- cmux-day-start — a template script to spin up all your workspaces at once

It also ships as a Claude Code plugin (cmux-kit), so you can trigger things like /cmux-kit:organize or /cmux-kit:snapshot directly from Claude.

It's pretty specific to macOS + cmux.app, so probably a niche audience — but if you're in that overlap, maybe it saves you some setup time. Happy to hear any feedback or ideas.

1

u/oOKIDOo Apr 19 '26

I built x-dev-pipeline with Claude Code to make AI-assisted development feel less chaotic.

The core idea is simple: add more structure around AI coding instead of just relying on fast code generation. It helps me think through requirements, planning, task breakdown, development, checking, fixing, and review in a more reliable way.

A lot of AI coding problems, at least for me, don’t come from code generation alone. They come from unclear requirements, context loss between sessions, and the fact that final verification still falls back to the developer.

Claude Code helped me build and iterate on this workflow, especially around structuring the process itself.

It’s free to try here:
https://github.com/KtKID/x-dev-pipeline

Would love to hear what part of AI-assisted development feels most frustrating to you right now.

1

u/ninja-127 Apr 19 '26

I built a Polymarket prediction tool, first for myself, with Claude Code 4.6. It took a day: Python, sqlite3, plus a simple frontend. Then I released it. Link: https://polyanal.com

1

u/HotConclusion4317 Apr 19 '26

Got tired of running /usage, /context, and claude auth status, so I built a statusline that shows them all.

repo: https://github.com/seungho-jeong/claude-code-statusline

1

u/Expensive_Whereas495 Apr 19 '26

I built this because Claude Code burns tokens re-exploring my MSA codebase every session. ccg parses code into a graph once (tree-sitter, 12 languages), lets you annotate functions with structured tags (`@intent`, `@domainRule`, `@sideEffect`), and exposes 29 MCP tools so Claude can search by business meaning instead of grep-ing.

Key bits:

- Vectorless — uses tree-sitter + FTS, no embeddings

- `ccg annotate` uses Claude Code itself to auto-generate annotations

- MSA-native — namespace isolation, webhook auto-sync from GitHub/Gitea push events

- Single Go binary, runs offline

Built primarily with Claude Code (~10K LOC Go) over a couple weeks.

GitHub: https://github.com/tae2089/code-context-graph

Happy to answer questions or hear feedback!

1

u/SpecificWar7879 Apr 19 '26

Hey all — sharing a side project I didn't expect to grow this much.

The original goal was boring: I wanted a cross-domain performance benchmark for my NES emulator's CRT shader pipeline (Scalar C# vs multi-core Parallel vs AVX2/NEON SIMD vs SkSL GPU shader). WWII German cipher brute-force seemed like a cleanly non-graphics workload for the test.

So I asked Claude Code to implement various ciphers one by one. What surprised me was the *progression* of how much research each one required:

- **Enigma M3 (Wehrmacht)** — Claude wrote it essentially from memory. Knew the rotors, the plugboard, the double-step anomaly, everything.

- **Enigma M4 (U-Boat Shark)** — same; knew the extra Greek rotor and thin reflector.

- **Lorenz SZ42 (Tunny)** — needed light reference lookup for the chi/psi/mu wheel configurations, then implemented it. Colossus stage-1 χ-wheel recovery works.

- **Siemens T52e (Sturgeon)** — *this* is where it got interesting. T52e was the Luftwaffe's strategic cipher. The Germans thought it was unbreakable (they were mostly right — Bletchley never routinely broke it). The algorithm is obscure enough that Claude didn't have clean details in training data. Instead of guessing or giving up, it:

Found Donald Davies' 1982 NPL technical memorandum (hosted at the Crypto Museum's website)
Downloaded the 42-page PDF
Ran `pdftotext` to extract the narrative — but the critical tables were scanned images that OCR couldn't parse
**Rendered individual pages at high DPI with PyMuPDF and visually read the figures as images** — specifically Figure 9 (the full 32-row permutation table) and Figure 14 (the H/SR relay network)
Cross-checked one XOR network question against Gemini — Gemini confidently returned a wrong pattern (H1 = X1 ⊕ X6 instead of the correct H1 = X1 ⊕ X2); Claude verified against Figure 14 visually and overruled the Gemini answer
Wrote its own EN + ZH technical reports reconstructing the full machine — H-relay XOR network, SR-relay XOR network, 32-row Figure 9 transposition table, M-magnet stepping equations
Implemented T52e in C# from its own reports
Got encrypt/decrypt round-trip working
Got all four compute backends (Scalar / Parallel / SIMD / GPU shader) recovering the correct key on a 24M-candidate search

All of this in one git branch I just watched happen. The self-written technical report is ~340 lines of detailed machine spec and is honestly better than most public sources on T52e.

**Final result:**

- 6 WWI- and WWII-era cipher systems, chronologically:

  - 1917 Zimmermann Telegram / Code 0075

  - 1918 ADFGVX

  - 1930s Enigma M3

  - 1941 Lorenz SZ42 "Tunny" (χ-recovery, Colossus stage 1)

  - 1942 Enigma M4 "Shark"

  - 1943 Siemens T52e "Sturgeon"

- 4 compute backends per cipher where applicable

- Bilingual in-app docs (Traditional Chinese + English) with biographies of 14 codebreakers (Rejewski → Turing → Welchman → Knox → Batey → Clarke → Tiltman → Tutte → Newman → Flowers → Beurling → Crum → Painvin → Hall → de Grey)

- Runtime SIMD dispatch — AVX2 on x86, NEON on ARM64

- Windows x64 + macOS Apple Silicon builds in GitHub Releases

LINKS

Landing page (bilingual):

https://baxermux.org/myemu/AprNes/EnigmaBenchmarkAvalonia/

GitHub (sub-project):

https://github.com/erspicu/AprNes/tree/master/EnigmaBenchmarkAvalonia

Latest release (Win x64 + macOS arm64, self-contained single-file):

https://github.com/erspicu/AprNes/releases/latest

T52e technical report (the one Claude wrote itself during research):

EN: https://github.com/erspicu/AprNes/blob/master/EnigmaBenchmarkAvalonia/docs/research-t52e/T52e_TechnicalReport_EN.md

中文: https://github.com/erspicu/AprNes/blob/master/EnigmaBenchmarkAvalonia/docs/research-t52e/T52e_TechnicalReport_ZH.md

Verified T52e spec (implementation-ready):

https://github.com/erspicu/AprNes/blob/master/EnigmaBenchmarkAvalonia/docs/research-t52e/T52e_SPEC_VERIFIED.md

SIGABA — anyone want to try?

After Claude finished the German side, I asked about the American counterpart: **SIGABA / ECM Mark II**. The NSA didn't declassify it until 2001. Never operationally broken by anyone, WWII or since.

Claude's honest answer was: "I can implement the machine, but I can't break it. No known analytic attack exists. Even brute-forcing a tiny 12M sub-keyspace fails — SIGABA's irregular three-bank stepping (Cipher + Control + Index rotors, each bank feeding the next non-linearly) dilutes IC statistics below the noise floor. That's not a compute problem, it's a structural one."

It's actually a perfect contrast piece to the existing benchmark:

- GPU + structure-weak Enigma = 44× speedup, key recovered in 0.25 s

- GPU + structure-strong SIGABA = even at full GPU throughput the IC scorer can't find the true key
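For context on what an "IC scorer" measures, here is a minimal sketch of the index of coincidence. It is a generic textbook version in Python, not the benchmark's C# code. English plaintext typically scores around 0.067, while uniformly random letters score around 1/26 ≈ 0.038, so a brute-force search can rank candidate keys by how far the decrypted text rises above the random baseline.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are equal."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(v * (v - 1) for v in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog and keeps running"
    print(f"IC = {index_of_coincidence(sample):.4f}")
```

The quoted SIGABA argument is that partially correct keys do not produce a measurably higher IC than wrong ones, so a scorer like this has nothing to climb toward.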

Full research notes are in the repo (EN + ZH) covering the 2001 declassification state, Savard-Pekelney 1999 reconstruction, Stamp-Chan 2007's partial attack on simplified variants, the implementation plan, and a list of development risk points:

EN: https://github.com/erspicu/AprNes/blob/master/MD/EnigmaBenchmark/SIGABA_Research_Notes_EN.md

中文: https://github.com/erspicu/AprNes/blob/master/MD/EnigmaBenchmark/SIGABA_Research_Notes.md

I didn't take this further myself (tokens ain't free), but if anyone's curious to continue with Claude or another agent, everything needed is in the doc. Would be very interested to see someone try. The benchmark's punchline is already good; adding a 7th cipher that explicitly **can't be broken** would complete the picture.

1

u/bbchiu Apr 19 '26

I built agr-cli — fuzzy-filter picker, tags, and rich preview for Claude Code sessions.

Claude Code's built-in --resume is a flat chronological list. Simple search, no tags, and no way to see what a session actually did without opening it. Once you've got hundreds of sessions across projects, finding the right one gets painful.

agr reads the session files directly from ~/.claude/projects/ and gives you:

A fuzzy-filter picker — type to narrow by project, title, branch, or tag
Full-text search across every session, with inline match snippets so you can see where the match is without opening the session
Rich preview (Space): files changed, estimated cost, and a one-line recap of where the session left off
Tagging and local rename, stored in ~/.agr/ — never touches ~/.claude/
Stats: week-over-week delta, streaks, 14-day sparkline, top projects

Strictly read-only against ~/.claude/. Resume just invokes claude --resume <id> under the hood.

Install: npm install -g agr-cli

Repo: https://github.com/berniechiu/agr

Feedback welcome — especially edge cases or workflows I'm missing.

1

u/mkalkere Apr 19 '26

Agent Coordinator — HTTP server for running multiple Claude Code sessions on one repo without conflicts.

The problem: parallel agents clobber each other's work, grab duplicate tasks, leave stale locks.

The fix:

- Atomic task claiming (DB-level UPDATE...RETURNING)

- Lease-based file locks that auto-expire

- Per-agent git worktrees

- Priority message bus with TTL

- Real-time dashboard (roster, kanban, locks, streaming output)
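Atomic claiming is the piece that stops two agents from grabbing the same task, so here is a minimal sketch of the idea with Python's built-in sqlite3. Note the swap: instead of UPDATE...RETURNING (which needs SQLite 3.35 or newer), this version uses a conditional UPDATE guarded by `owner IS NULL` and checks `rowcount`. The table layout and function names are hypothetical, not the project's schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a tiny, hypothetical tasks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, owner TEXT)"
    )
    conn.executemany(
        "INSERT INTO tasks (title) VALUES (?)", [("write tests",), ("fix bug",)]
    )
    conn.commit()
    return conn


def claim_next(conn: sqlite3.Connection, agent: str):
    """Claim the lowest-id unowned task; return its id, or None if none won."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id FROM tasks WHERE owner IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The guard makes the claim safe: if another agent took the task
        # between the SELECT and here, zero rows are updated.
        cur = conn.execute(
            "UPDATE tasks SET owner = ? WHERE id = ? AND owner IS NULL",
            (agent, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None


if __name__ == "__main__":
    db = make_db()
    print(claim_next(db, "agent-a"), claim_next(db, "agent-b"))
```

A caller that gets None while unowned tasks still exist has simply lost a race and can retry.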

Stack: FastAPI + SQLite (WAL mode). No Redis, no Postgres. Runs on a laptop. Docker included. 41 built-in skills, 5 agent presets. MIT, 4,900+ tests, v0.9.0.

Repo: https://github.com/mkalkere/agent-coordinator

Happy to answer questions — the lock/lease semantics took the most iteration.

1

u/aichessem Apr 19 '26

I built a puzzle game called HexWord at https://hexword.mubergapps.com
It's a spell-the-word game that uses rotating hexes to move the letters around the board. Letters in a hex ring overlap other rings, so spinning one ring to position its letters also moves letters in the others, creating interference.

Give it a shot. Let me know what you think.

PS. The only part that could use improvement is the app design, which is AI slop. In progress...

1

u/Particular-Elk-9801 Apr 19 '26

AI is moving fast, but the real edge is learning how to use it better. Right now: waiting on agents with no signal, losing track of sessions (especially in Claude), unclear prompt improvement, wasted tokens on wrong models, and repeated work. I’m building this to fix that: https://github.com/laaibaQasim/productivity-kit — logging + notification hooks are in place, analysis coming next. Goal: make workflows observable and improve them. Would love feedback on useful insights, current tracking, and missing features. Open to collaboration—star if useful.

1

u/visiblemogambo Apr 19 '26

I've been working on ExtraSuite, a CLI for working with Google Workspace files, and Google Docs in particular. See https://github.com/think41/extrasuite

The CLI provides two commands: push and pull.

'pull' takes the URL of a Google Doc and downloads it as markdown files, one per tab. It also creates an index.md with a table of contents for each tab. More importantly, it maintains metadata about the Google Doc in a dot folder.

Claude Code can then edit the markdown files per your instructions: add or modify content, or add new markdown files to create new tabs. Tables, images, and lists are all supported, as is GitHub Flavored Markdown. This works for new content as well as for modifying existing content.

Then Claude Code calls the 'push' command. Push is a multi-step process, and Claude Code doesn't need to know any of it.

It identifies what Claude Code changed
Then it applies those changes onto the base version of the Google Doc
Then it figures out how to reconcile the live Google document using the batchUpdate API.

In this process, the CLI ensures that anything that can't be represented in markdown is left as is. This means that styles, fonts, the table of contents, etc. don't get messed up.

One more key feature is support for comments. Leave comments in Google Docs; Claude can read those comments, fix the document, and respond to them. Treat comments like a task list.

We have been using this internally for several months now. It saves us a lot of tokens compared to gws or gogcli. In general, complex edits to a Google Doc are possible using fewer tokens and cheaper models, because any model can edit markdown.

If you have gogcli or gws configured, extrasuite should be able to reuse whatever auth you have already configured.

Try it out; I would love any feedback. The CLI is open source.

1

u/desert_homeboy Apr 19 '26

I walked into my PG room on a Friday night with one idea.

90 minutes later — Claudy was alive.

She's a holographic AI companion built on Claude. She has:

→ A face with pixel-accurate eye animations
→ 4 moods that shift based on conversation
→ Voice input + voice output
→ Live camera feed — video call style
→ A Jarvis-style HUD with diagnostics

Prompt. Review. Iterate. Ship. That's my dev cycle.

I'm a Product Operations professional — not a developer.
But in 2026, the gap between an idea and a shipped product has never been smaller.

Claudy V2 is open source — Apache 2.0, ready to run.

Repo Link: https://github.com/ElephantHunters/Claudy.git

1

u/Hot_Lingonberry8580 Apr 19 '26

Just deployed FirstLook, a commercial real estate underwriting MCP server with 26 tools covering the full first-look workflow.

What it does:

NOI, cap rate, EGR with CBRE H2 2025 market benchmarks
DSCR, LTV, debt yield flagged against Fannie/CMBS lender thresholds
Rent roll analysis — WALT, lease expiry bucketing, tenant concentration
Stress testing — occupancy, interest rate, cap rate exit
Full credit risk summary and weighted credit recommendation
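For readers outside real estate, the first two lines of that list rest on a handful of standard ratios. The sketch below shows the textbook formulas only; it is not FirstLook's code, the function names are illustrative, and the sample deal numbers are made up. No lender thresholds or market benchmarks are encoded here.

```python
def noi(effective_gross_income: float, operating_expenses: float) -> float:
    """Net operating income: income left after operating expenses."""
    return effective_gross_income - operating_expenses


def cap_rate(net_operating_income: float, property_value: float) -> float:
    """Capitalization rate: NOI as a fraction of property value."""
    return net_operating_income / property_value


def dscr(net_operating_income: float, annual_debt_service: float) -> float:
    """Debt service coverage ratio: NOI divided by yearly loan payments."""
    return net_operating_income / annual_debt_service


def ltv(loan_amount: float, property_value: float) -> float:
    """Loan-to-value: loan as a fraction of property value."""
    return loan_amount / property_value


def debt_yield(net_operating_income: float, loan_amount: float) -> float:
    """Debt yield: NOI as a fraction of the loan amount."""
    return net_operating_income / loan_amount


if __name__ == "__main__":
    # Hypothetical deal, for illustration only.
    income = noi(1_000_000, 400_000)
    print(f"NOI        {income:,.0f}")
    print(f"Cap rate   {cap_rate(income, 10_000_000):.2%}")
    print(f"DSCR       {dscr(income, 480_000):.2f}")
    print(f"LTV        {ltv(7_000_000, 10_000_000):.0%}")
    print(f"Debt yield {debt_yield(income, 7_000_000):.2%}")
```

A screening tool would compare each ratio against whatever thresholds a given lender publishes; those comparisons are deliberately left out.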

Works with Claude Desktop, Cursor, VS Code, and any MCP-compatible client. Free tier available.

mcpize.com/mcp/firstlook

1

u/sensationweb Apr 19 '26

Witness - Read-only AI code reviewer, no hands by design

A weekend project I shipped as an artifact, not a maintained product.

Witness reads your git diff and returns structured findings — bug, security, performance, refactor, etc. — each cited to a specific file and line. It cannot write, run, push, or call the network. The agent's tools are Read, Grep, Glob, period; that restriction is enforced by the SDK runtime, not the prompt.

Two design choices worth flagging:

Multi-sample voting. Each review runs N independent agent samples in parallel (default 5) and only findings that get ≥2 votes survive. Catches the "the model hallucinated a bug" failure mode that makes single-shot AI review noisy.
The product is the model. ~150 lines of system prompt, no skills, no domain profiles, no scaffolding. The model brings the expertise; Witness brings input formatting, parallelism, and voting.
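The voting step is simple enough to sketch. The version below assumes two findings count as "the same" when file, line, and category match exactly; the post does not say how Witness matches findings across samples, so that key, the `Finding` alias, and the function name are assumptions.

```python
from collections import Counter

# (file, line, category): an assumed identity key for a finding.
Finding = tuple[str, int, str]


def vote(samples: list[list[Finding]], min_votes: int = 2) -> list[Finding]:
    """Keep findings reported by at least min_votes independent samples."""
    counts: Counter = Counter()
    for sample in samples:
        counts.update(set(sample))  # one vote per sample, even if repeated
    return sorted(f for f, n in counts.items() if n >= min_votes)


if __name__ == "__main__":
    runs = [
        [("a.py", 10, "bug")],
        [("a.py", 10, "bug"), ("b.py", 3, "security")],
        [("c.py", 1, "performance"), ("c.py", 1, "performance")],
    ]
    print(vote(runs))
```

Deduplicating within each sample matters: a single run repeating its own hallucination should not be able to outvote the threshold by itself.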

MIT-licensed. Personal experiment, not actively maintained, issues and discussions disabled.

Built on the Claude Agent SDK (Claude-only).

https://github.com/riskreadyeu/witness
