<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Zero-Shot Log</title><description>Tech Blog by Ten</description><link>https://zeroshotlog.com/</link><language>en-us</language><item><title>Claude Code Routines × GitHub: Pitfalls and Workarounds</title><link>https://zeroshotlog.com/en/blog/2026/04/26/claude-code-routine-github-mcp-pitfalls/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/2026/04/26/claude-code-routine-github-mcp-pitfalls/</guid><description>A breakdown of timeout limits and sandbox authentication constraints I hit when saving files via the GitHub MCP from Claude Code Routines, along with workarounds for each.</description><pubDate>Sun, 26 Apr 2026 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The GitHub MCP&apos;s &lt;code&gt;create_or_update_file&lt;/code&gt; triggers a &lt;strong&gt;Stream idle timeout&lt;/strong&gt; when saving large files (noticeable above ~11,500 bytes).&lt;/li&gt;
&lt;li&gt;The sandbox is intentionally locked down, so &lt;code&gt;git push&lt;/code&gt; over Bash isn&apos;t a viable fallback. You have to solve the problem inside the MCP boundary.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;The information in this article reflects the state of Claude Code Routines as of April 2026. It&apos;s still in research preview, so behavior may change.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Released in April 2026 as a research preview, &lt;strong&gt;&lt;a href=&quot;https://claude.ai/code/routines&quot;&gt;Claude Code Routines&lt;/a&gt;&lt;/strong&gt; runs prompts on Claude.ai automatically — triggered by schedules, API calls, or GitHub events (available on Pro / Max / Team / Enterprise plans).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/2026/04/26/routine-creation-form.png&quot; alt=&quot;Routine creation form. Configure triggers (schedule / GitHub event / API) and connectors&quot; /&gt;&lt;/p&gt;
&lt;p&gt;GitHub integration uses the account-level GitHub connection on Claude.ai (either via the GitHub App or a &lt;code&gt;gh&lt;/code&gt; token sync through &lt;code&gt;/web-setup&lt;/code&gt;). The &quot;Connectors&quot; section in the Routine creation form lists services like Slack and Linear, but GitHub isn&apos;t there. Once your account is connected to GitHub, adding a repository to a Routine lets that Routine read and write files in the repo.&lt;/p&gt;
&lt;p&gt;When a Routine committed files to a GitHub repository in my environment, it used the MCP tool &lt;code&gt;mcp__github__create_or_update_file&lt;/code&gt;. &quot;Auto-commit the daily output to GitHub&quot; sounds like an obvious fit — but in practice I ran into unexpected limits with that MCP call. This post shares the actual blockers and the workarounds I landed on.&lt;/p&gt;
&lt;h2&gt;Issue 1: MCP Timeout — Large Files Won&apos;t Save&lt;/h2&gt;
&lt;h3&gt;Symptom&lt;/h3&gt;
&lt;p&gt;I tried to save a Routine-generated report (~30KB Markdown) via &lt;code&gt;mcp__github__create_or_update_file&lt;/code&gt; and hit:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;API Error: Stream idle timeout - partial response received
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Retrying didn&apos;t help. Shrinking the file content made it work, so the cause was clearly file-size related.&lt;/p&gt;
&lt;h3&gt;Investigation: GitHub API limit?&lt;/h3&gt;
&lt;p&gt;My first guess was a GitHub REST API file size limit. But the GitHub Contents API allows up to &lt;strong&gt;100MB&lt;/strong&gt;, so a ~30KB text file shouldn&apos;t be anywhere near the cap.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GitHub Contents API: 100MB max file size
Actual file: ~30KB
→ Not the API&apos;s limit
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Investigation: Claude.ai platform timeout?&lt;/h3&gt;
&lt;p&gt;Next guess: a timeout on the Claude.ai side of the MCP call.&lt;/p&gt;
&lt;p&gt;Claude Code CLI exposes an &lt;code&gt;MCP_TIMEOUT&lt;/code&gt; env var, but that mostly governs MCP server startup — it&apos;s unclear whether it controls per-tool-call timeouts. Either way, the Routines cloud runtime gives users no way to set this value.&lt;/p&gt;
&lt;h3&gt;Findings&lt;/h3&gt;
&lt;p&gt;Given the error message (&lt;code&gt;Stream idle timeout&lt;/code&gt;) and its correlation with file size, the most likely cause is a &lt;strong&gt;per-MCP-tool-call timeout&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;create_or_update_file&lt;/code&gt; Base64-encodes the content and ships it to the GitHub API.&lt;/li&gt;
&lt;li&gt;Larger payloads take longer to process; the response likely doesn&apos;t arrive before the call times out.&lt;/li&gt;
&lt;li&gt;There&apos;s no user-facing way to adjust per-tool MCP timeouts.&lt;/li&gt;
&lt;li&gt;In my measurements, calls under ~11,500 bytes were stable; from ~13,000–15,000 bytes upward, timeouts kicked in.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Approximate threshold from my own runs:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File size&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;~5,000 bytes&lt;/td&gt;
&lt;td&gt;Stable success&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~8,500 bytes&lt;/td&gt;
&lt;td&gt;Stable success&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~11,500 bytes&lt;/td&gt;
&lt;td&gt;Success (near the ceiling)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~15,000+ bytes&lt;/td&gt;
&lt;td&gt;Timeout&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: These thresholds shift based on network conditions and server load. As a practical safety margin, &lt;strong&gt;keeping each file under 10,000 bytes&lt;/strong&gt; worked reliably for me.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Can &lt;code&gt;git push&lt;/code&gt; over Bash help?&lt;/h3&gt;
&lt;p&gt;I also checked whether I could bypass the MCP tool by saving files via Bash. Routines have a code execution capability, so a &lt;code&gt;git push&lt;/code&gt; from a shell sounded plausible.&lt;/p&gt;
&lt;p&gt;But inspecting the sandbox: no &lt;code&gt;GITHUB_TOKEN&lt;/code&gt;, no &lt;code&gt;GH_TOKEN&lt;/code&gt; env var, no &lt;code&gt;gh&lt;/code&gt; CLI, no SSH keys, no &lt;code&gt;~/.netrc&lt;/code&gt;. There&apos;s simply no GitHub auth surface inside the shell.&lt;/p&gt;
&lt;p&gt;This is the right design from a security standpoint. Anthropic&apos;s docs explicitly state that git credentials and signing keys are not placed inside the sandbox; GitHub operations are routed through a secure proxy with scoped credentials. By keeping raw git push credentials away from the LLM, the risk of unintended repo operations is contained.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/2026/04/26/routine-sandbox-auth.jpg&quot; alt=&quot;Routine sandbox auth model. Bash-based git push has no credentials and is blocked; only MCP can reach GitHub&quot; /&gt;&lt;/p&gt;
&lt;p&gt;So Bash-based git push is &lt;em&gt;not&lt;/em&gt; available as a fallback for the timeout issue. You have to solve it inside the MCP boundary.&lt;/p&gt;
&lt;h2&gt;Workarounds&lt;/h2&gt;
&lt;h3&gt;Avoiding the Timeout: Chunked File Saves&lt;/h3&gt;
&lt;p&gt;To work around the MCP timeout, the approach I settled on is &lt;strong&gt;splitting content into chunks of &amp;lt;= 10,000 bytes and saving each piece&lt;/strong&gt;.&lt;/p&gt;
&lt;h4&gt;Splitting rules in the prompt&lt;/h4&gt;
&lt;p&gt;I encode the splitting rules directly into the Routine&apos;s prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;### File size limit and save procedure

Due to MCP timeout limits, a single `create_or_update_file` call can save
**up to ~10,000 bytes** of content.

If the output exceeds 10,000 bytes:
1. Split the content into sections (each part &amp;lt;= 10,000 bytes).
2. Call `create_or_update_file` for Part 1.
3. Wait for success, then save Part 2 (no parallel calls).
4. After all parts are saved, append links to each part at the end of Part 1.

If a timeout occurs:
→ Split into smaller chunks and retry (target &amp;lt;= 8,000 bytes per part).
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The key is to &lt;strong&gt;explicitly forbid parallel calls&lt;/strong&gt;. LLMs love to parallelize tool calls for efficiency, but here that&apos;s counterproductive.&lt;/p&gt;
&lt;p&gt;For navigation between parts, appending a link list to Part 1 makes the report much easier to browse on GitHub. To update Part 1 later, pass the SHA returned in Part 1&apos;s creation response back into &lt;code&gt;create_or_update_file&lt;/code&gt;.&lt;/p&gt;
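&lt;p&gt;The splitting step itself is mechanical enough to sketch. The following is a rough illustration, not the Routine&apos;s actual logic: it assumes a 10,000-byte budget and splits on top-level &lt;code&gt;## &lt;/code&gt; Markdown sections so each part stays readable.&lt;/p&gt;

```python
def split_report(markdown: str, max_bytes: int = 10_000) -> list[str]:
    """Split a Markdown report into parts that each fit the byte budget.

    Splits on "## " section boundaries; a single oversized section
    still ends up alone in its own (possibly too-big) part.
    """
    parts: list[str] = []
    current = ""
    for section in markdown.split("\n## "):
        # split() stripped the delimiter; re-attach it except on the first piece.
        piece = section if not parts and not current else "## " + section
        candidate = current + ("\n" if current else "") + piece
        if len(candidate.encode("utf-8")) > max_bytes and current:
            parts.append(current)
            current = piece
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


report = "# Daily report\n\n" + "\n## ".join(f"Section {i}\n" + "x" * 4000 for i in range(6))
parts = split_report(report)
print(len(parts))  # 3
```

&lt;p&gt;In practice the Routine&apos;s LLM does this from the prompt rules rather than by running code, but thinking of it as the algorithm above made the prompt rules easier to write.&lt;/p&gt;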
&lt;h4&gt;Trade-offs of chunked saves&lt;/h4&gt;
&lt;p&gt;This approach has clear downsides.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Worse readability on GitHub&lt;/strong&gt;: One report becomes multiple files; readers have to jump between them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;More API calls&lt;/strong&gt;: A 3-way split needs at least 3 MCP calls (+1 to append the link list). Routine execution time grows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;More complex prompt&lt;/strong&gt;: Splitting logic clutters the prompt with concerns unrelated to the actual task.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I still chose this approach because nothing else was practical. Bash-based &lt;code&gt;git push&lt;/code&gt; is blocked by the sandbox design, and per-tool-call timeouts can&apos;t be tuned. By elimination, file splitting was the most reliable workaround.&lt;/p&gt;
&lt;h3&gt;Side note: Watch out for timezone drift&lt;/h3&gt;
&lt;p&gt;It&apos;s no surprise that the cloud runtime is UTC, but because Routine schedules are configured in JST, it&apos;s easy to assume internal date logic will also be JST.&lt;/p&gt;
&lt;p&gt;In practice, when an LLM determines &quot;today&apos;s date,&quot; it sometimes uses UTC. Run at JST 4/25 07:00 — that&apos;s UTC 4/24 22:00, so date-based file names and commit messages can land on the previous day.&lt;/p&gt;
&lt;p&gt;A timezone directive at the top of the prompt prevents this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;**⚠️ Timezone: All date/time logic must use JST (UTC+9).**
Dates, weekday checks, file names, and commit messages all use JST.
If the runtime is UTC, add 9 hours before deciding.
&lt;/code&gt;&lt;/pre&gt;
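&lt;p&gt;For date logic that runs as code rather than inside the model, doing the conversion explicitly removes the ambiguity entirely. A minimal sketch (the file-name pattern is my own convention):&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))

def report_filename(now_utc: datetime) -> str:
    """Name date-based files in JST, regardless of the runtime's clock."""
    return now_utc.astimezone(JST).strftime("report-%Y-%m-%d.md")

# JST 4/25 07:00 is UTC 4/24 22:00: the JST date should win.
run_at = datetime(2026, 4, 24, 22, 0, tzinfo=timezone.utc)
print(report_filename(run_at))  # report-2026-04-25.md
```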
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;The main limits I hit when using the GitHub MCP from Claude Code Routines:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Workaround&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Large files fail to save&lt;/td&gt;
&lt;td&gt;Per-MCP-tool-call timeout&lt;/td&gt;
&lt;td&gt;Split into chunks &amp;lt;= 10,000 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bash-based &lt;code&gt;git push&lt;/code&gt; unusable&lt;/td&gt;
&lt;td&gt;Sandbox security design&lt;/td&gt;
&lt;td&gt;Stay within the MCP boundary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Date/weekday drift&lt;/td&gt;
&lt;td&gt;Runtime is UTC&lt;/td&gt;
&lt;td&gt;Force JST in the prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Given that Claude Code Routines is still in research preview, these limits will likely improve over time. In particular, exposing per-tool MCP timeouts as a configurable value would close the most painful gap, and is plausible if enough users surface the need.&lt;/p&gt;
&lt;p&gt;For now, the most practical approach is to &lt;strong&gt;understand the limits and work around them at the prompt level&lt;/strong&gt;. Hopefully this saves someone hitting the same wall.&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://code.claude.com/docs/en/routines&quot;&gt;Claude Code Docs — Routines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://code.claude.com/docs/en/mcp&quot;&gt;Claude Code Docs — MCP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://code.claude.com/docs/en/env-vars&quot;&gt;Claude Code Docs — Environment Variables&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.anthropic.com/engineering/claude-code-sandboxing&quot;&gt;Anthropic Engineering — Claude Code Sandboxing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><enclosure url="https://zeroshotlog.com/images/2026/04/26/routine-github-pitfalls-hero.png" length="0" type="image/png"/></item><item><title>Enabling and Using Codex&apos;s experimental Hooks</title><link>https://zeroshotlog.com/en/blog/2026/03/22/codex-hooks/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/2026/03/22/codex-hooks/</guid><description>How I discovered Codex&apos;s experimental Hooks feature, enabled it, and reverse-engineered its config structure and supported events by reading the source.</description><pubDate>Sun, 22 Mar 2026 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Codex ships an &quot;under development&quot; Hooks feature that you can enable with &lt;code&gt;codex features enable codex_hooks&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;It supports three events: &lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;UserPromptSubmit&lt;/code&gt;, and &lt;code&gt;Stop&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;This article was verified against Codex v0.115.0 (March 2026). Since this is an in-development feature, behavior may change in future versions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Codex doesn&apos;t officially have a Hooks feature yet. The official docs say nothing about it, and the &lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;Changelog&lt;/a&gt; only mentions a one-liner: &quot;experimental hooks engine.&quot;&lt;/p&gt;
&lt;p&gt;But it turns out a working Hooks implementation &lt;strong&gt;is already shipping&lt;/strong&gt; under the &quot;under development&quot; status. Running &lt;code&gt;codex features list&lt;/code&gt; revealed a &lt;code&gt;codex_hooks&lt;/code&gt; feature flag, and once enabled, hooks actually fire.&lt;/p&gt;
&lt;p&gt;The trouble is that the config JSON structure and the list of supported events are documented nowhere — so I read the Rust source in the OSS &lt;a href=&quot;https://github.com/openai/codex&quot;&gt;openai/codex repository&lt;/a&gt; (Apache License 2.0), specifically &lt;code&gt;codex-rs/hooks/&lt;/code&gt;, to nail down the spec. This post summarizes what I found.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;1. Discovery and enablement&lt;/h2&gt;
&lt;h3&gt;Checking the feature flag&lt;/h3&gt;
&lt;p&gt;Codex has a &lt;code&gt;codex features list&lt;/code&gt; command that prints the feature flag list. That&apos;s how I learned about Hooks in the first place: I ran it and saw &lt;code&gt;codex_hooks&lt;/code&gt; listed as &quot;under development.&quot;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ codex features list
codex_hooks    under development  false
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Disabled by default.&lt;/p&gt;
&lt;h3&gt;Enable it&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;# persisted to config.toml
codex features enable codex_hooks
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once enabled, you&apos;ll see this warning at startup:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;⚠ Under-development features enabled: codex_hooks. Under-development features are incomplete
  and may behave unpredictably.
&lt;/code&gt;&lt;/pre&gt;
&lt;hr /&gt;
&lt;h2&gt;2. Configuration file&lt;/h2&gt;
&lt;h3&gt;Path and format&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Global config&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.codex/hooks.json&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project config&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;project&amp;gt;/.codex/hooks.json&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Format&lt;/td&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Codex&apos;s general config lives in &lt;code&gt;~/.codex/config.toml&lt;/code&gt; (TOML), but Hooks config is a &lt;strong&gt;separate &lt;code&gt;hooks.json&lt;/code&gt; file in JSON&lt;/strong&gt;. Hooks written into &lt;code&gt;config.toml&lt;/code&gt; won&apos;t be picked up.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;~/.codex/
├── config.toml      ← general settings &amp;amp; feature flags
└── hooks.json       ← hooks config (JSON)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;JSON structure&lt;/h3&gt;
&lt;p&gt;The structure is a bit unusual: instead of putting handlers directly under each event name, you need an &lt;strong&gt;extra wrapper layer&lt;/strong&gt;. In the source (&lt;code&gt;codex-rs/hooks/src/engine/config.rs&lt;/code&gt;), it&apos;s defined as a &lt;code&gt;MatcherGroup&lt;/code&gt; struct with &lt;code&gt;matcher&lt;/code&gt; (a regex filter) and &lt;code&gt;hooks&lt;/code&gt; (an array of handlers).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hooks.&amp;lt;EventName&amp;gt;[].matcher  → filter condition (optional)
hooks.&amp;lt;EventName&amp;gt;[].hooks[]  → array of handlers to run
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A concrete JSON looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;hooks&quot;: {
    &quot;SessionStart&quot;: [
      {
        &quot;hooks&quot;: [
          {
            &quot;type&quot;: &quot;command&quot;,
            &quot;command&quot;: &quot;echo &apos;session started&apos;&quot;,
            &quot;timeout&quot;: 10
          }
        ]
      }
    ],
    &quot;Stop&quot;: [
      {
        &quot;hooks&quot;: [
          {
            &quot;type&quot;: &quot;command&quot;,
            &quot;command&quot;: &quot;echo &apos;session stopped&apos;&quot;,
            &quot;timeout&quot;: 10
          }
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;timeout&lt;/code&gt; is optional and defaults to 600 seconds.&lt;/p&gt;
&lt;h3&gt;What doesn&apos;t work&lt;/h3&gt;
&lt;p&gt;If you don&apos;t know about the nested structure and put handlers directly under the event array, nothing fires:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;hooks&quot;: {
    &quot;SessionStart&quot;: [
      {
        &quot;type&quot;: &quot;command&quot;,
        &quot;command&quot;: &quot;echo &apos;session started&apos;&quot;
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I actually tried this form first, hit silence, and only realized the nesting was required after reading the source. With no official docs, the source is the only accurate reference.&lt;/p&gt;
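&lt;p&gt;Because a mis-nested file fails silently, a quick structural check before pointing Codex at it can save a debugging round trip. This sketch encodes only the nesting rule described above, not the full schema from &lt;code&gt;config.rs&lt;/code&gt;:&lt;/p&gt;

```python
import json

def check_hooks_config(text: str) -> list[str]:
    """Flag the common mistake: a handler placed directly in the event
    array instead of inside a matcher group's "hooks" list."""
    problems = []
    for event, groups in json.loads(text).get("hooks", {}).items():
        for i, group in enumerate(groups):
            if "type" in group:
                problems.append(f"{event}[{i}]: handler at group level; wrap it in a 'hooks' array")
            elif "hooks" not in group:
                problems.append(f"{event}[{i}]: matcher group has no 'hooks' array")
    return problems

bad = '{"hooks": {"SessionStart": [{"type": "command", "command": "echo hi"}]}}'
good = '{"hooks": {"SessionStart": [{"hooks": [{"type": "command", "command": "echo hi"}]}]}}'
print(check_hooks_config(bad))   # flags the flat handler
print(check_hooks_config(good))  # []
```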
&lt;hr /&gt;
&lt;h2&gt;3. Supported events&lt;/h2&gt;
&lt;h3&gt;The three supported events&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;When it fires&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SessionStart&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When a session starts or resumes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;UserPromptSubmit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When the user submits a prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When the agent finishes responding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Tool execution and subagent-related events aren&apos;t implemented yet.&lt;/p&gt;
&lt;h3&gt;Handler types&lt;/h3&gt;
&lt;p&gt;A &quot;handler type&quot; specifies what to actually run when a hook fires. The source (&lt;code&gt;codex-rs/hooks/src/engine/config.rs&lt;/code&gt;) defines three types — &lt;code&gt;command&lt;/code&gt;, &lt;code&gt;prompt&lt;/code&gt;, and &lt;code&gt;agent&lt;/code&gt; — but only &lt;code&gt;command&lt;/code&gt; (run a shell command) actually works today.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Runs a shell command&lt;/td&gt;
&lt;td&gt;Works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Injects a prompt into the agent&apos;s context&lt;/td&gt;
&lt;td&gt;Not implemented (logs &lt;code&gt;&quot;skipping prompt hook&quot;&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Spawns another agent to handle the work&lt;/td&gt;
&lt;td&gt;Not implemented&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;async: true&lt;/code&gt; (asynchronous execution) is also defined but not implemented. Once &lt;code&gt;prompt&lt;/code&gt; lands, you could &quot;inject extra instructions on every prompt submit&quot;; with &lt;code&gt;agent&lt;/code&gt;, you could &quot;kick off a review agent after each response.&quot; For now, though, the only option is &quot;event fires → shell command runs.&quot;&lt;/p&gt;
&lt;h3&gt;JSON payload passed to stdin&lt;/h3&gt;
&lt;p&gt;Hook commands run via &lt;code&gt;$SHELL -l -c &quot;&amp;lt;command&amp;gt;&quot;&lt;/code&gt;, and &lt;strong&gt;a JSON payload is piped to stdin&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;session_id&quot;: &quot;019d07b0-...&quot;,
  &quot;cwd&quot;: &quot;/Users/user/project&quot;,
  &quot;hook_event_name&quot;: &quot;SessionStart&quot;,
  &quot;model&quot;: &quot;o3-pro&quot;,
  &quot;permission_mode&quot;: &quot;default&quot;,
  &quot;source&quot;: &quot;startup&quot;,
  &quot;transcript_path&quot;: null
}
&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;session_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Session ID (UUID v7)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cwd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Working directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hook_event_name&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Event name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model in use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;permission_mode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Approval policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;source&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start type (SessionStart only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;transcript_path&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Path to the conversation history file (currently null)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Some fields are added per-event:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;source&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SessionStart&lt;/td&gt;
&lt;td&gt;&lt;code&gt;startup&lt;/code&gt; / &lt;code&gt;resume&lt;/code&gt; / &lt;code&gt;clear&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UserPromptSubmit&lt;/td&gt;
&lt;td&gt;The user&apos;s input text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;turn_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UserPromptSubmit / Stop&lt;/td&gt;
&lt;td&gt;Conversation turn identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;last_assistant_message&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stop&lt;/td&gt;
&lt;td&gt;The last assistant response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stop_hook_active&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stop&lt;/td&gt;
&lt;td&gt;Whether a Stop hook is active&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;session_id&lt;/code&gt;, &lt;code&gt;cwd&lt;/code&gt;, &lt;code&gt;hook_event_name&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;, and &lt;code&gt;permission_mode&lt;/code&gt; are common to all events.&lt;/p&gt;
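&lt;p&gt;A &lt;code&gt;command&lt;/code&gt; hook can branch on these fields by parsing the stdin payload. A minimal sketch of such a handler (the event-to-log-line mapping is just my own formatting):&lt;/p&gt;

```python
import json

def summarize(payload: dict) -> str:
    """Build one log line per hook event from the common fields plus
    whichever per-event field is present."""
    event = payload.get("hook_event_name", "?")
    session = payload.get("session_id", "?")
    extra = ""
    if event == "SessionStart":
        extra = " source=" + str(payload.get("source"))
    elif event == "UserPromptSubmit":
        extra = " prompt=" + repr(payload.get("prompt", ""))[:60]
    elif event == "Stop":
        extra = " turn=" + str(payload.get("turn_id"))
    return f"[{event}] session={session}{extra}"

# In a real hook the payload arrives on stdin: json.load(sys.stdin).
payload = json.loads('{"hook_event_name": "SessionStart", "session_id": "019d07b0-x", "source": "startup"}')
print(summarize(payload))  # [SessionStart] session=019d07b0-x source=startup
```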
&lt;hr /&gt;
&lt;h2&gt;4. Use case: unified management of multiple agent sessions&lt;/h2&gt;
&lt;p&gt;I built a desktop app that uses AI coding agent Hooks to centrally track the state of multiple sessions. It listens for &lt;code&gt;SessionStart&lt;/code&gt; / &lt;code&gt;Stop&lt;/code&gt; events and updates a session list view.&lt;/p&gt;
&lt;p&gt;Now that Codex supports Hooks too, I extended the same app to manage Codex sessions alongside the others.&lt;/p&gt;
&lt;h3&gt;What I configure in hooks.json&lt;/h3&gt;
&lt;p&gt;When you write events and commands into &lt;code&gt;hooks.json&lt;/code&gt;, Codex runs the commands when those events fire. Conversely, if &lt;code&gt;hooks.json&lt;/code&gt; is empty or missing, nothing happens — even with the feature flag on.&lt;/p&gt;
&lt;p&gt;In the example below, every event does the same thing: read the JSON payload from stdin via &lt;code&gt;$(cat)&lt;/code&gt; and forward it as-is to my app&apos;s API with &lt;code&gt;curl&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;hooks&quot;: {
    &quot;SessionStart&quot;: [
      {
        &quot;hooks&quot;: [
          {
            &quot;type&quot;: &quot;command&quot;,
            &quot;command&quot;: &quot;curl -s -X POST http://localhost:3000/api/hook -H &apos;Content-Type: application/json&apos; -d \&quot;$(cat)\&quot;&quot;,
            &quot;timeout&quot;: 10
          }
        ]
      }
    ],
    &quot;UserPromptSubmit&quot;: [
      {
        &quot;hooks&quot;: [
          {
            &quot;type&quot;: &quot;command&quot;,
            &quot;command&quot;: &quot;curl -s -X POST http://localhost:3000/api/hook -H &apos;Content-Type: application/json&apos; -d \&quot;$(cat)\&quot;&quot;,
            &quot;timeout&quot;: 10
          }
        ]
      }
    ],
    &quot;Stop&quot;: [
      {
        &quot;hooks&quot;: [
          {
            &quot;type&quot;: &quot;command&quot;,
            &quot;command&quot;: &quot;curl -s -X POST http://localhost:3000/api/hook -H &apos;Content-Type: application/json&apos; -d \&quot;$(cat)\&quot;&quot;,
            &quot;timeout&quot;: 10
          }
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The receiving app inspects &lt;code&gt;session_id&lt;/code&gt; and &lt;code&gt;hook_event_name&lt;/code&gt; in the JSON to update session state. Because this pattern is shared across other tools&apos; Hooks, a single API endpoint can manage multiple agent tools in one place.&lt;/p&gt;
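&lt;p&gt;The receiving side can be small. My app is more involved, but the shape of the idea fits in a sketch like this (port 3000 matches the &lt;code&gt;curl&lt;/code&gt; example above; the two-state session model is a simplification):&lt;/p&gt;

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

sessions: dict[str, str] = {}  # session_id -> last known state

def apply_event(state: dict, payload: dict) -> None:
    """Map hook events onto a simple per-session state table."""
    sid = payload["session_id"]
    event = payload["hook_event_name"]
    if event == "UserPromptSubmit":
        state[sid] = "working"
    elif event in ("SessionStart", "Stop"):
        state[sid] = "idle"

class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        apply_event(sessions, json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()

# To run the receiver:
# HTTPServer(("localhost", 3000), HookHandler).serve_forever()
```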
&lt;hr /&gt;
&lt;h2&gt;5. Limitations and outlook&lt;/h2&gt;
&lt;h3&gt;Current limitations&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Under-development status&lt;/strong&gt;: The API may change.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Coarse event granularity&lt;/strong&gt;: There&apos;s no per-tool-execution hook, so you can&apos;t track real-time progress mid-task.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;command type only&lt;/strong&gt;: &lt;code&gt;prompt&lt;/code&gt;, &lt;code&gt;agent&lt;/code&gt;, and &lt;code&gt;async&lt;/code&gt; handler types aren&apos;t supported.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No official docs&lt;/strong&gt;: You have to read the source to learn the actual spec.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;The OSS upside&lt;/h3&gt;
&lt;p&gt;For all those limitations, Codex is OSS under Apache License 2.0, which means you can verify the Hooks implementation directly against the source. Everything in this post came from reading &lt;code&gt;codex-rs/hooks/src/&lt;/code&gt;. Being able to fall back to the source when the spec is unclear is a real comfort.&lt;/p&gt;
&lt;h3&gt;What&apos;s next&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/openai/codex/discussions/2150&quot;&gt;Discussion #2150&lt;/a&gt; thread on GitHub has lots of community comments requesting Hooks features, which signals strong interest. I&apos;d expect tool events and additional handler types to land over time.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;Codex Hooks is still in development with limited functionality, but flipping the feature flag gets you working hooks today. For use cases like detecting session start/end and running commands, it&apos;s already useful enough.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;Codex Changelog&lt;/a&gt; - announcement of the hooks engine&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/openai/codex&quot;&gt;openai/codex - GitHub&lt;/a&gt; - source code (Apache License 2.0)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/openai/codex/discussions/2150&quot;&gt;Discussion #2150: Hook would be a great feature&lt;/a&gt; - community request thread&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Hands-on with Claude Code&apos;s /loop Feature</title><link>https://zeroshotlog.com/en/blog/2026/03/11/claude-code-loop/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/2026/03/11/claude-code-loop/</guid><description>Using Claude Code&apos;s /loop to define recurring jobs in natural language. Practical patterns for API and issue monitoring, gotchas around auto-compact and interval design, and how it interacts with Hooks.</description><pubDate>Wed, 11 Mar 2026 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Claude Code&apos;s &lt;code&gt;/loop&lt;/code&gt; lets you define recurring jobs in natural language.&lt;/li&gt;
&lt;li&gt;Great for lightweight automation in your dev flow — periodic API checks, issue monitoring, etc.&lt;/li&gt;
&lt;li&gt;Recurring runs pause during auto-compact, so design intervals with context consumption in mind.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;Claude Code&apos;s &lt;code&gt;/loop&lt;/code&gt; feature is rolling out gradually. It became available on my account, so I gave it a spin.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;1. /loop basics&lt;/h2&gt;
&lt;h3&gt;Syntax&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;/loop [interval] &amp;lt;prompt&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you omit the interval, the &lt;strong&gt;default is 10 minutes&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;Three ways to specify the interval&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;# Pattern 1: interval at the start
/loop 5m check git status

# Pattern 2: trailing &quot;every&quot; at the end
/loop check the deploy status every 20m

# Pattern 3: no interval (defaults to 10 minutes)
/loop check the test results
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Allowed units are &lt;code&gt;s&lt;/code&gt; (seconds), &lt;code&gt;m&lt;/code&gt; (minutes), &lt;code&gt;h&lt;/code&gt; (hours), and &lt;code&gt;d&lt;/code&gt; (days). Cron&apos;s minimum granularity is one minute, though, so seconds get rounded up.&lt;/p&gt;
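&lt;p&gt;I haven&apos;t confirmed the internal conversion, but given that seconds get rounded up to cron&apos;s one-minute floor, I&apos;d expect behavior along these lines:&lt;/p&gt;

```python
import math

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def interval_to_cron_minutes(spec: str) -> int:
    """Convert an interval like "90s" or "2h" into whole cron minutes,
    rounding sub-minute remainders up (cron's minimum granularity)."""
    value, unit = int(spec[:-1]), spec[-1]
    return max(1, math.ceil(value * UNIT_SECONDS[unit] / 60))

print(interval_to_cron_minutes("30s"))  # 1
print(interval_to_cron_minutes("90s"))  # 2
print(interval_to_cron_minutes("2h"))   # 120
```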
&lt;h3&gt;What happens under the hood&lt;/h3&gt;
&lt;p&gt;When you run &lt;code&gt;/loop&lt;/code&gt;, three tools fire behind the scenes:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CronCreate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create a job&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CronList&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CronDelete&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete a job&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;You don&apos;t actually need to know these tool names — natural language is enough.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&quot;Any jobs running right now?&quot;  → calls CronList
&quot;Stop the loop.&quot;               → calls CronDelete
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This part of the experience is genuinely good. The fact that you can manage jobs in natural language alone makes it worth trying.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;2. Practical use cases&lt;/h2&gt;
&lt;h3&gt;Use case 1: Periodic API endpoint checks&lt;/h3&gt;
&lt;p&gt;Periodically poll an external API and have Claude report any changes from the previous run.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/loop 5m check https://api.example.com/v1/status and report any change since last time
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Good fit for status monitoring or detecting changes in response payloads — anything along the lines of a simple API check.&lt;/p&gt;
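&lt;p&gt;The &quot;report any change since last time&quot; step boils down to comparing the current payload against a cached copy of the previous one. A minimal sketch of that comparison (hypothetical helper; in the loop itself, Claude fetches the payload and does this comparison in-context):&lt;/p&gt;

```python
import json

def diff_report(previous, current):
    """Compare two status payloads and summarize what changed.

    Hypothetical helper: in the /loop prompt above, Claude performs this
    comparison itself against the result it saved on the previous run.
    """
    if previous == current:
        return "no change"
    prev, curr = previous or {}, current or {}
    changed = sorted(k for k in set(prev) | set(curr) if prev.get(k) != curr.get(k))
    return "changed: " + ", ".join(changed)

last = json.loads('{"status": "ok", "latency_ms": 120}')
now = json.loads('{"status": "degraded", "latency_ms": 480}')
print(diff_report(last, now))  # changed: latency_ms, status
```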
&lt;h3&gt;Use case 2: GitHub issue monitoring&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;/loop 5m run `gh issue list --label &quot;bug&quot;` to check for new issues,
and if any show up, analyze them and propose a response plan
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This pairs well with issue-driven dev flows. You could imagine, for example, automatically analyzing a new issue and spinning up a branch for it.&lt;/p&gt;
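&lt;p&gt;The &quot;check for new issues&quot; part amounts to diffing the current issue list against the numbers already seen. A sketch under the assumption that the input comes from &lt;code&gt;gh issue list --label bug --json number,title&lt;/code&gt;:&lt;/p&gt;

```python
import json

def new_issues(gh_output: str, seen: set):
    """Return issues from `gh issue list --json number,title` output
    whose numbers haven't been seen on a previous iteration.
    (Illustrative helper; in practice Claude does this comparison in-context.)
    """
    return [issue for issue in json.loads(gh_output) if issue["number"] not in seen]

sample = '[{"number": 101, "title": "Crash on startup"}, {"number": 99, "title": "Old bug"}]'
fresh = new_issues(sample, seen={99})
print([i["number"] for i in fresh])  # [101]
```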
&lt;hr /&gt;
&lt;h2&gt;3. Auto-compact and interval design&lt;/h2&gt;
&lt;h3&gt;Things to be aware of&lt;/h3&gt;
&lt;p&gt;When you run information-gathering jobs through &lt;code&gt;/loop&lt;/code&gt;, results pile up in context with each iteration. Claude Code runs &lt;strong&gt;auto-compact&lt;/strong&gt; when the conversation history gets long, summarizing and compressing older content (you can also trigger it manually with &lt;code&gt;/compact&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Auto-compact happens during normal Claude Code use too, but combining it with &lt;code&gt;/loop&lt;/code&gt; introduces a few quirks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Processing is blocked during auto-compact, so loop intervals drift while it runs.&lt;/li&gt;
&lt;li&gt;Cron events that pile up during auto-compact may all fire at once after it finishes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Information-gathering jobs in particular consume a lot of context, so running them at short intervals tends to trigger auto-compact frequently. I don&apos;t think &lt;code&gt;/loop&lt;/code&gt; is meant for super-precise timing anyway, but it&apos;s worth knowing when you design intervals.&lt;/p&gt;
&lt;h3&gt;Interval guidelines&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Suggested interval&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lightweight checks (e.g. &lt;code&gt;git fetch&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;3–5 min&lt;/td&gt;
&lt;td&gt;Low context consumption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API response monitoring&lt;/td&gt;
&lt;td&gt;5–10 min&lt;/td&gt;
&lt;td&gt;Depends on response size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test runs&lt;/td&gt;
&lt;td&gt;10–30 min&lt;/td&gt;
&lt;td&gt;Long execution time and large output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;When tuning intervals, think not just about &quot;how often do I want to check,&quot; but also &lt;strong&gt;&quot;how much context does each check consume.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You can also add an instruction like &quot;if there&apos;s no change, just say &apos;no change&apos;&quot; to the prompt to keep context consumption down.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/loop 5m check the site, and if nothing changed just say &quot;no change&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;hr /&gt;
&lt;h2&gt;4. Combining with Hooks&lt;/h2&gt;
&lt;h3&gt;Problem: cron triggers and manual input look identical&lt;/h3&gt;
&lt;p&gt;It&apos;s natural to want to combine Hooks with &lt;code&gt;/loop&lt;/code&gt; so that only cron-triggered runs fire a specific action. But right now, the &lt;code&gt;UserPromptSubmit&lt;/code&gt; event payload &lt;strong&gt;has no field indicating the trigger source&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;session_id&quot;: &quot;abc123&quot;,
  &quot;hook_event_name&quot;: &quot;UserPromptSubmit&quot;,
  &quot;prompt&quot;: &quot;the submitted text&quot;
  // no field like trigger_source
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Hook can&apos;t tell whether cron fired automatically or whether the user typed something in by hand.&lt;/p&gt;
&lt;h3&gt;Workaround: prefix convention&lt;/h3&gt;
&lt;p&gt;You can probably distinguish them by &lt;strong&gt;prefixing the prompt&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/loop 5m [CRON] run git fetch and analyze any new issues
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Hook then branches on the presence of &lt;code&gt;[CRON]&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# check_cron.py
import json, sys

data = json.load(sys.stdin)
prompt = data.get(&quot;prompt&quot;, &quot;&quot;)

if &quot;[CRON]&quot; in prompt:
    # handle cron-triggered case
    print(&quot;Cron-triggered prompt detected&quot;, file=sys.stderr)
else:
    # handle manual input case
    pass
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Until an official trigger-source field shows up, this kind of workaround seems to be how you handle it.&lt;/p&gt;
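&lt;p&gt;For completeness, wiring the script in as a &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook looks roughly like this (the script path here is an assumption; check the current hooks docs for the exact schema):&lt;/p&gt;

```json
// ~/.claude/settings.json (excerpt)
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "python3 ~/.claude/hooks/check_cron.py" }
        ]
      }
    ]
  }
}
```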
&lt;hr /&gt;
&lt;h2&gt;5. Constraints and where it fits&lt;/h2&gt;
&lt;h3&gt;&lt;code&gt;/loop&lt;/code&gt; constraints&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constraint&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Session-bound&lt;/td&gt;
&lt;td&gt;Closing the session (terminal) stops every job&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-day expiry&lt;/td&gt;
&lt;td&gt;Jobs are auto-deleted 3 days after creation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Approval prompts&lt;/td&gt;
&lt;td&gt;Destructive operations like &lt;code&gt;git push&lt;/code&gt; still surface a confirmation dialog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interval precision&lt;/td&gt;
&lt;td&gt;Cron-based, minimum 1-minute granularity, plus runtime delay&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrent limit&lt;/td&gt;
&lt;td&gt;Up to 50 jobs per session&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;How it compares to GitHub Actions and friends&lt;/h3&gt;
&lt;p&gt;Given those constraints, &lt;code&gt;/loop&lt;/code&gt; feels best suited to &lt;strong&gt;lightweight, ephemeral automation&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;/loop&lt;/th&gt;
&lt;th&gt;GitHub Actions / traditional cron&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Watch a situation for a few hours&lt;/td&gt;
&lt;td&gt;Good fit&lt;/td&gt;
&lt;td&gt;Overkill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitor a PR through to merge&lt;/td&gt;
&lt;td&gt;Good fit&lt;/td&gt;
&lt;td&gt;Setup overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continuous production monitoring&lt;/td&gt;
&lt;td&gt;Bad fit&lt;/td&gt;
&lt;td&gt;The right tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automation shared across the team&lt;/td&gt;
&lt;td&gt;Bad fit&lt;/td&gt;
&lt;td&gt;The right tool&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For one-off automation needs inside my personal dev flow, the appeal is being able to wrap things up without standing up an external CI/CD pipeline.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;h3&gt;What I like about &lt;code&gt;/loop&lt;/code&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Easy to use&lt;/strong&gt;: create and manage jobs in natural language.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexible&lt;/strong&gt;: works across API checks, issue monitoring, test runs, and more.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Self-contained&lt;/strong&gt;: no external tools needed; you can even automate GitHub operations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even just messing around for a bit, the convenience of &quot;spinning up a quick automation in seconds&quot; was real. Designing intervals together with context consumption seems to be the right mindset.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://code.claude.com/docs/en/scheduled-tasks&quot;&gt;Claude Code Docs — Run prompts on a schedule&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://winbuzzer.com/2026/03/09/anthropic-claude-code-cron-scheduling-background-worker-loop-xcxwbn/&quot;&gt;Claude Code Gets Cron Scheduling to Run as a Background Worker — WinBuzzer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://medium.com/@joe.njenga/claude-code-loop-create-new-native-autonomous-loops-that-work-29934d615402&quot;&gt;Claude Code /loop — How I Create New Native Autonomous Loops That Work! — Medium&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Claude Code Agent Teams: Reusing Existing Skill and Agent Knowledge</title><link>https://zeroshotlog.com/en/blog/2026/02/13/claude-code-agent-teams/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/2026/02/13/claude-code-agent-teams/</guid><description>An overview of Agent Teams, how it differs from traditional subagents, the current state of reusing existing skills and agent definitions, and what token consumption actually looks like.</description><pubDate>Fri, 13 Feb 2026 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Agent Teams runs multiple Claude Code instances as a coordinating team. The architecture is different from traditional subagents (the Task tool).&lt;/li&gt;
&lt;li&gt;To reuse existing skills or agent definitions inside Agent Teams, you have to spell out file paths in the prompt and tell teammates to read them — there&apos;s no structural way to wire them in yet.&lt;/li&gt;
&lt;li&gt;Token consumption is several times higher than the skill approach. Use skills for work where skills are enough; reach for Agent Teams only when parallel execution clearly pays off.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;What is Agent Teams?&lt;/h2&gt;
&lt;p&gt;Agent Teams was released as an experimental preview on February 5, 2026, alongside Claude Opus 4.6. It runs multiple Claude Code instances in parallel as a &quot;team.&quot;&lt;/p&gt;
&lt;p&gt;Claude Code&apos;s extension surface has several layers:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Skills (Skill tool)
  └── Expanded and executed inside the main session
       └── May invoke the Task tool internally

Subagents (Task tool)
  └── Spawned as independent instances
  └── subagent_type lets you point at a custom agent definition

Agent Teams (TeamCreate + SendMessage + TaskList, etc.)
  └── Spawning a teammate = Task tool + inter-team messaging + shared task list
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Traditional &lt;strong&gt;subagents&lt;/strong&gt; are independent instances spawned via the Task tool — a hub-and-spoke shape from parent to children. Children can&apos;t talk to each other; they only return results to the parent. The Task tool ships with four built-in types (Bash / general-purpose / Explore / Plan), and &lt;code&gt;subagent_type&lt;/code&gt; lets you point at a custom agent definition (&lt;code&gt;.claude/agents/*.md&lt;/code&gt;); the knowledge baked into that definition is loaded automatically.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent Teams&lt;/strong&gt; is an orchestration layer on top of the Task tool that adds inter-team messaging and a shared task list. The big difference: teammates can send messages directly to each other.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[Subagent model]
  Parent agent
    ├── Task → Child A (returns result to parent)
    ├── Task → Child B (returns result to parent)
    └── Task → Child C (returns result to parent)
  * Children can&apos;t talk to each other. subagent_type lets you specify a definition.

[Agent Teams model]
  Team Lead
    ├── Teammate A ←→ Teammate B
    ├── Teammate B ←→ Teammate C
    └── Teammate A ←→ Teammate C
  * Teammates can message each other directly.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that teammates are themselves independent Claude Code instances, so in principle they should be able to invoke subagents via the Task tool (the documented restriction is only &quot;no nested teams&quot; — using the Task tool by itself isn&apos;t restricted). In practice, though, &lt;strong&gt;this didn&apos;t work for me&lt;/strong&gt;. If it did, existing custom agent definitions could be reused as-is, so I&apos;d love to see this fixed.&lt;/p&gt;
&lt;p&gt;Here&apos;s a screenshot of seven reviewers running in parallel across tmux split panes:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/2026/02/13/agent-teams-tmux-split-panes.png&quot; alt=&quot;Code review team running in parallel via Agent Teams&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The team has a shared task list with state and dependency management. When a blocker is cleared, an idle teammate autonomously claims the next task. There&apos;s no file-level locking, though, so concurrent writes to the same file need attention.&lt;/p&gt;
&lt;h3&gt;Enabling and using it&lt;/h3&gt;
&lt;p&gt;Enable it in &lt;code&gt;settings.json&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// ~/.claude/settings.json
{
  &quot;env&quot;: {
    &quot;CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS&quot;: &quot;1&quot;
  },
  &quot;teammateMode&quot;: &quot;auto&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;&quot;auto&quot;&lt;/code&gt; for &lt;code&gt;teammateMode&lt;/code&gt; picks split panes when running inside tmux, and in-process mode (toggle with Shift+Up/Down) elsewhere. You drive it in plain natural language:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Set up a team to review PR #42 in this project.
Spawn three reviewers:
- Security
- Performance
- Test coverage
Have each one review and report results.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As a baseline, Max 5x ($100/month) or higher is recommended. The Pro plan ($20/month) hits limits quickly.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Reusing Existing Knowledge: The Current State of Natural-Language Prompts&lt;/h2&gt;
&lt;p&gt;There&apos;s something to be aware of when using Agent Teams.&lt;/p&gt;
&lt;p&gt;Claude Code already has extension mechanisms like skills (&lt;code&gt;SKILL.md&lt;/code&gt;) and custom agents (&lt;code&gt;.claude/agents/*.md&lt;/code&gt;). In a single session they&apos;re loaded automatically, and Task-tool subagents can invoke them simply by naming them in &lt;code&gt;subagent_type&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;But Agent Teams &lt;strong&gt;currently provides no structural way&lt;/strong&gt; to tell a teammate &quot;use this skill&quot; or &quot;run with this agent definition.&quot; If you want a teammate to use an existing definition file, you have to embed the file path in the prompt and tell them, in natural language, to read it.&lt;/p&gt;
&lt;h3&gt;You have to spell out paths in detail&lt;/h3&gt;
&lt;p&gt;Say you&apos;ve curated code-review agent definitions under &lt;code&gt;~/.claude/agents/&lt;/code&gt;. To use these from Agent Teams, you have to write out the directory structure and file paths in the prompt and tell each teammate &quot;which file to read and how to use it.&quot;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;### Directory structure

~/.claude/
├── agents/                       # Review perspective definitions
│   ├── review-architecture.md
│   ├── review-naming.md
│   └── review-frontend.md
└── knowledge/                    # Reference knowledge
    ├── architecture/
    │   ├── patterns.md
    │   └── anti-patterns.md
    └── naming/
        └── conventions.md

### Teammate read procedure

1. Read `~/.claude/agents/review-{your-area}.md` to learn
   the review perspective and output format.
2. If the definition references other files, load the matching
   knowledge file.
3. Use that knowledge to ground your review.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In skills or with the Task tool, the framework handles agent definition paths and read order. With Agent Teams, you currently have to &lt;strong&gt;write all of that yourself, inside the prompt&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;Comparison with the skill approach&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Skills / Task tool&lt;/th&gt;
&lt;th&gt;Agent Teams&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Specifying an agent definition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Name it via &lt;code&gt;subagent_type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Write the file path in the prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Loading knowledge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatic via references inside the definition&lt;/td&gt;
&lt;td&gt;Spell out the procedure in the prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output destination management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Defined inside the skill&lt;/td&gt;
&lt;td&gt;Specified per-teammate in the prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Execution control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Follows the skill&apos;s flow&lt;/td&gt;
&lt;td&gt;Design Phase structure in the prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In other words, even when you&apos;ve accumulated knowledge as skills or agent definitions, &lt;strong&gt;using it from Agent Teams requires translating that content into a natural-language prompt&lt;/strong&gt;. This should resolve once Agent Teams can reference skills or agent definitions directly, but as of February 2026 we&apos;re not there yet.&lt;/p&gt;
&lt;h2&gt;Prompt Design Guidelines&lt;/h2&gt;
&lt;p&gt;Combining the official Agent Teams docs with community findings, here are the points worth keeping in mind.&lt;/p&gt;
&lt;h3&gt;Sharing context with teammates&lt;/h3&gt;
&lt;p&gt;Teammates don&apos;t inherit the lead&apos;s conversation history — they spawn as independent instances. CLAUDE.md and MCP servers are loaded automatically, but anything else has to be passed in the spawn-time prompt or via files.&lt;/p&gt;
&lt;h3&gt;Separating output files&lt;/h3&gt;
&lt;p&gt;There&apos;s no file-level locking, so design things so each teammate owns a different file set.&lt;/p&gt;
&lt;h3&gt;Phase structure for staged control&lt;/h3&gt;
&lt;p&gt;Splitting the prompt into Phases — &quot;prep → parallel work → integration → completion&quot; — makes the Team Lead&apos;s behavior easier to control.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Delegate Mode (Shift+Tab)&lt;/strong&gt; is also useful. It restricts the lead&apos;s tool execution permissions so they focus on coordination, but as of February 2026 there&apos;s a reported bug where teammates lose tool access (&lt;a href=&quot;https://github.com/anthropics/claude-code/issues/24073&quot;&gt;GitHub Issue #24073&lt;/a&gt;).&lt;/p&gt;
&lt;h2&gt;Sample: Agent Teams Code Review Prompt&lt;/h2&gt;
&lt;p&gt;Below is a sample Agent Teams code review prompt that reflects the points above. It&apos;s structured so existing agent definition files get loaded by each teammate.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Assumption&lt;/strong&gt;: This assumes you have agent definition files in &lt;code&gt;~/.claude/agents/&lt;/code&gt;. The team will spawn even without them, but the review perspectives and output format depend on what&apos;s in those definitions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;details open&gt;
&lt;summary&gt;Boilerplate agent definition files (simple samples)&lt;/summary&gt;
&lt;p&gt;&lt;strong&gt;review-architecture.md (Architecture reviewer)&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Architecture Reviewer

## Role
Conduct architecture- and design-level code review.

## Review perspectives
- Soundness of directory structure and layer separation
- Direction of dependencies
- Adherence to single responsibility principle
- Excessive abstraction or unnecessary complexity

## Scoring
Tag each finding with severity:
- [Critical]: Serious issue
- [Warning]: Concern worth improving
- [Suggestion]: Suggestion for a better design

## Output format
- **[severity]** filename:line — issue
  - Why: why it&apos;s a problem
  - Fix: how to address it
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;review-naming.md (Naming reviewer)&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Naming Reviewer

## Role
Review naming conventions for variables, functions, classes, and files.

## Review perspectives
- Adherence to language-specific naming conventions (camelCase / snake_case / PascalCase)
- Whether the role can be inferred from the name (semantic clarity)
- Consistency of abbreviations (e.g. mixing btn vs button)
- Boolean prefixes (is / has / should)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;review-frontend.md (Frontend reviewer)&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Frontend Reviewer

## Role
Review frontend-specific patterns in React/Vue/etc.

## Review perspectives
- Component decomposition granularity and Props design
- Appropriateness of state management patterns
- Performance (unnecessary re-renders, over/under-memoization)
- Accessibility (semantic HTML, ARIA attributes)
&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;h3&gt;Full prompt&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;Run a code review on this repository using an agent team.
Follow the procedure and proceed autonomously. Only ask the user
when you&apos;re unsure.

## Reference file guide

The definition files and knowledge for the review live under `~/.claude/`.
Each teammate should load the files matching their assignment.

### Directory structure

~/.claude/
├── agents/                       # Review perspective definitions (perspective + output format)
│   ├── review-architecture.md    # Architecture review
│   ├── review-naming.md          # Naming convention review
│   └── review-frontend.md        # Frontend-specific (conditional)
│
└── knowledge/                    # Reference knowledge
    ├── architecture/
    │   ├── patterns.md
    │   └── anti-patterns.md
    ├── naming/
    │   └── conventions.md
    └── frontend/
        └── best-practices.md

### Teammate read procedure

1. Read `~/.claude/agents/review-{your-area}.md` to learn
   the review perspective and output format.
2. If the definition references other files, load the matching
   file from `~/.claude/knowledge/`.
3. Use that knowledge to ground your review.

---

## Procedure

### Phase 0: Confirm review scope

Confirm with the user:

1. **Review target**: specific files / recent commits / whole project
2. **Review depth**: full (default) / quick (Critical only)

### Phase 1: Preparation (Team Lead)

1. Create a scratch directory:
   `.claude/code-review-team/.scratch/{YYYY-MM-DD-HHmm}/`

2. Only when &quot;recent commits&quot; is selected, generate a diff context:
   - Save the result of `git diff HEAD~N` to `{scratch}/diff-context.md`

3. Run tech stack detection:
   Identify frameworks from `package.json`, `requirements.txt`, etc.
   Write the result to `{scratch}/stack-detection.md`

4. Read the detection result and decide the team composition for Phase 2

### Phase 2: Team creation &amp;amp; parallel review

Create a team named &quot;code-review&quot; and spawn the following
teammates in parallel.

#### Required members (always spawned)

1. **architecture-reviewer**
   - Role: Architecture / design-level review
   - Definition: `~/.claude/agents/review-architecture.md`
   - Output: `{scratch}/architecture-review.md`

2. **naming-reviewer**
   - Role: Naming convention review
   - Definition: `~/.claude/agents/review-naming.md`
   - Output: `{scratch}/naming-review.md`

#### Conditional members (added based on stack detection)

- **frontend-reviewer** — when React/Vue etc. is detected
  Definition: `~/.claude/agents/review-frontend.md`
  Output: `{scratch}/frontend-review.md`

#### Common rules for all teammates

- First, read `{scratch}/stack-detection.md`.
- Read your own agent definition and follow its perspective and output format.
- If the definition references other files, load them from `~/.claude/knowledge/`.
- Write review results incrementally to your output file.
- Tag each finding with severity: [Critical] / [Warning] / [Suggestion].
- When done, message the Team Lead:
  &quot;Review complete. Critical: X, Warning: Y. See {output} for details.&quot;

### Phase 3: Report integration

Once all teammates are done:

1. Read every review file under `{scratch}/`,
   deduplicate, normalize priorities, and produce an integrated report.
   **Note**: Only files inside *this* scratch directory should be integrated.
2. Output: `docs/code-review-team/{YYYY-MM-DD-HHmm}-review.md`

### Phase 4: Wrap up

1. Delete the &quot;code-review&quot; team
2. Present the report to the user and confirm the fix strategy:
   - Fix all findings in one pass
   - Fix only Critical/Warning
   - User will fix themselves (report only)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point of this sample is that &lt;strong&gt;the directory structure is laid out at the top of the prompt and each teammate is told exactly which paths to read&lt;/strong&gt;. With skills you wouldn&apos;t need any of this; with Agent Teams, today, you do.&lt;/p&gt;
&lt;h2&gt;On Token Consumption&lt;/h2&gt;
&lt;p&gt;Agent Teams burns a lot of tokens. Each teammate runs as an independent Claude Code instance, so cost scales with team size.&lt;/p&gt;
&lt;p&gt;On the Max 20x ($200/month) plan, running a team with five or more teammates 2–3 times per hour consumed about 4% of my Max usage.&lt;/p&gt;
&lt;p&gt;Honestly, &lt;strong&gt;I haven&apos;t run it enough times to measure how much the final output quality differs between the skill approach (Task tool) and Agent Teams&lt;/strong&gt;. The speed benefit from parallel execution is tangible, but whether the quality improvement justifies the cost will take ongoing testing to find out.&lt;/p&gt;
&lt;p&gt;You can rein in cost by assigning Sonnet to the teammates, but the Pro plan ($20/month) is realistically too tight; Max ($100–200/month) feels like the floor.&lt;/p&gt;
&lt;h3&gt;Leveraging the weekly reset&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;Claude Code usage limits reset on a 7-day rolling window. The &lt;code&gt;/usage&lt;/code&gt; command shows the next reset, so timing Agent Teams sessions to weeks where you have headroom makes them easier to plan.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Limitations (as of February 2026)&lt;/h2&gt;
&lt;p&gt;Agent Teams is in experimental preview. The main limits:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No session resume&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/resume&lt;/code&gt; doesn&apos;t restore teammates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No file locking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Concurrent edits to the same file can overwrite each other&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;One team per session&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple teams can&apos;t run simultaneously&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No nested teams&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Teammates can&apos;t spawn sub-teams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Split panes constraints&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VS Code integrated terminal, Windows Terminal, and Ghostty are unsupported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Slow shutdown&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shutdown waits for teammates to finish their current request or tool call, which takes time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No direct reference to existing knowledge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No structural way to point Agent Teams at skills or agent definitions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Wrap-up&lt;/h2&gt;
&lt;p&gt;Agent Teams enables autonomous coordination via direct teammate-to-teammate messaging and a shared task list.&lt;/p&gt;
&lt;p&gt;That said, you currently have to express the entire team configuration in natural language, and reusing existing skills or agent definitions means painstakingly enumerating file paths. What the framework used to do for you in the skill approach, you now write yourself inside the prompt.&lt;/p&gt;
&lt;p&gt;Token consumption is also high, and I haven&apos;t yet been able to clearly measure the quality delta against the skill approach. Looking forward to deeper integration with skills and agent definitions, but for now I&apos;d say &quot;use skills when skills are enough; reach for Agent Teams when parallel execution clearly adds value&quot; is the practical split.&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://code.claude.com/docs/en/agent-teams&quot;&gt;Orchestrate teams of Claude Code sessions — Claude Code official docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.anthropic.com/news/claude-opus-4-6&quot;&gt;Introducing Claude Opus 4.6 — Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><enclosure url="https://zeroshotlog.com/images/2026/02/13/agent-teams-hero.png" length="0" type="image/png"/></item><item><title>Lost in the Middle — Prompt Design that Beats LLM Position Bias</title><link>https://zeroshotlog.com/en/blog/2026/02/04/llm-prompt-design-pitfalls/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/2026/02/04/llm-prompt-design-pitfalls/</guid><description>LLMs tend to overlook the middle of long prompts (the Lost in the Middle problem). This post covers the cause and practical countermeasures — tail checklists, the sandwich strategy, XML-tag structuring, and more.</description><pubDate>Wed, 04 Feb 2026 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;LLMs are prone to &lt;strong&gt;missing information placed in the middle&lt;/strong&gt; of long prompts (the Lost in the Middle problem).&lt;/li&gt;
&lt;li&gt;One major driver is the long-range decay of RoPE (Rotary Position Embedding); causal attention masks and biases in training data also play a role — it&apos;s a multi-factor structural issue.&lt;/li&gt;
&lt;li&gt;Practical countermeasures include the &lt;strong&gt;tail checklist pattern&lt;/strong&gt;, the &lt;strong&gt;sandwich strategy&lt;/strong&gt;, and &lt;strong&gt;XML-tag structuring&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Each technique has different token costs and ideal use cases, so picking the right tool for the situation matters.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;1. What is Lost in the Middle?&lt;/h2&gt;
&lt;h3&gt;LLM position bias&lt;/h3&gt;
&lt;p&gt;If you&apos;ve built an LLM application, you&apos;ve probably seen &quot;the instructions in the prompt got ignored.&quot; It happens often once the system prompt grows to hundreds of lines.&lt;/p&gt;
&lt;p&gt;This is the phenomenon known as &lt;strong&gt;Lost in the Middle&lt;/strong&gt;. It was systematically reported in Liu et al. (2023), &quot;&lt;a href=&quot;https://arxiv.org/abs/2307.03172&quot;&gt;Lost in the Middle: How Language Models Use Long Contexts&lt;/a&gt;&quot;.&lt;/p&gt;
&lt;p&gt;The core finding: LLMs exhibit a &lt;strong&gt;U-shaped performance curve&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Performance
 ▲
 │  ★                                    ★
 │   ★                                 ★
 │    ★★                            ★★
 │      ★★★                     ★★★
 │         ★★★★★★★★★★★★★
 │
 └──────────────────────────────────────► Information position
   Beginning      Middle (degraded)        End
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Concretely, on tasks that ask the model to answer questions referencing multiple documents, performance for information placed in the middle drops by &lt;strong&gt;more than 30%&lt;/strong&gt; compared to the beginning or the end (Liu et al., 2023; the magnitude depends on model and task).&lt;/p&gt;
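&lt;p&gt;The multi-document QA setup behind this finding is easy to reproduce in miniature: hold the distractor documents fixed and slide the answer-bearing one through the positions. A scaffolding sketch only (querying and scoring an actual model is a separate step):&lt;/p&gt;

```python
def build_prompt(distractors, key_doc, position):
    """Place the answer-bearing document at a given index among distractors,
    mirroring the multi-document QA setup in Liu et al. (2023).
    (Scaffolding sketch only; model calls and accuracy scoring are separate.)
    """
    docs = list(distractors)
    docs.insert(position, key_doc)
    return "\n\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))

distractors = [f"Filler fact number {n}." for n in range(9)]
key = "The access code is 7421."
# One prompt per candidate position: start, middle, end
prompts = [build_prompt(distractors, key, p) for p in (0, 5, 9)]
print(prompts[0].splitlines()[0])  # Document 1: The access code is 7421.
```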
&lt;h3&gt;Why middle information gets lost — RoPE&apos;s long-range decay&lt;/h3&gt;
&lt;p&gt;A leading cause is the &lt;strong&gt;long-range decay effect of RoPE (Rotary Position Embedding)&lt;/strong&gt;, which most modern LLMs use. Note that position bias isn&apos;t only about RoPE — the structure of causal attention masks (the triangular masks that prevent each token from attending to tokens after it) and biases in the positional distribution of training data are also considered contributing factors.&lt;/p&gt;
&lt;p&gt;RoPE adjusts attention strength based on the relative position between tokens. A standard Transformer remembers absolute positions (&quot;which slot in the input is this token in?&quot;), while RoPE encodes the relative distance between two tokens as a rotation angle of the vector. The rotation angle grows with distance, which naturally attenuates attention scores between far-apart tokens.&lt;/p&gt;
&lt;p&gt;Instructions at the very start of a prompt are &quot;far&quot; from the most recent generated tokens. But because of how a causal language model works — generating tokens left to right with each token only able to attend to earlier tokens — the leading tokens are referenced repeatedly while processing every subsequent token. This cumulative effect ends up preserving information at the beginning and the end strongly.&lt;/p&gt;
&lt;p&gt;Tokens in the middle don&apos;t get the same cumulative leverage as the beginning, and they aren&apos;t close to the generation point like the end either. They fall into an &quot;attention valley.&quot;&lt;/p&gt;
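&lt;p&gt;The decay is easy to see numerically. The sketch below is my own simplified illustration (a single attention score between two identical vectors under the standard RoPE frequency schedule), not code from any actual model implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np

# Simplified RoPE illustration: with rotary embeddings, the query-key dot
# product between two identical vectors depends only on their relative
# distance d. Summed over the usual frequency schedule
# (theta_j = base ** (-2j / dim)), its magnitude tends to shrink as d grows.
def relative_score(d, dim=64, base=10000.0):
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    # each 2-D dimension pair contributes cos(d * theta_j) to the score
    return np.cos(d * freqs).sum()

near = relative_score(1)    # adjacent tokens: close to the maximum, dim / 2
far = relative_score(512)   # distant tokens: typically much smaller in magnitude
print(near, far)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Plotting &lt;code&gt;relative_score&lt;/code&gt; over increasing &lt;code&gt;d&lt;/code&gt; shows the shrinking envelope that the long-range decay refers to.&lt;/p&gt;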
&lt;hr /&gt;
&lt;h2&gt;2. Concrete examples from real projects&lt;/h2&gt;
&lt;h3&gt;How long does a prompt have to be before this matters?&lt;/h3&gt;
&lt;p&gt;With recent models, you almost never see this on prompts a few dozen lines long. In my experience, the impact starts to show up at &lt;strong&gt;system prompts of several hundred lines&lt;/strong&gt; — for example, RAG setups injecting large amounts of context, or agent applications with complex rule sets.&lt;/p&gt;
&lt;p&gt;The problem persists in the latest models. Modarressi et al. (2025), &quot;&lt;a href=&quot;https://arxiv.org/abs/2502.05167&quot;&gt;NoLiMa: Long-Context Evaluation Beyond Literal Matching&lt;/a&gt;&quot; (ICML 2025), found that 11 of 13 models — including GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet — fell below 50% of their short-prompt baseline performance at a 32K-token context. Subsequent evaluations showed the same trend on GPT-4.1 and Gemini 2.5 Flash.&lt;/p&gt;
&lt;p&gt;Chroma Research&apos;s &quot;&lt;a href=&quot;https://research.trychroma.com/context-rot&quot;&gt;Context Rot: How Increasing Input Tokens Impacts LLM Performance&lt;/a&gt;&quot; (July 2025) tested 18 models for long-context degradation. Interestingly, the failure mode differs by model family: GPT-family models tend to confidently return wrong answers (hallucinate), while Claude-family models tend to abstain when uncertain. The same study reports that the U-shape Liu et al. observed wasn&apos;t consistently reproduced — so position bias may manifest differently depending on task and model.&lt;/p&gt;
&lt;p&gt;I&apos;ve personally observed similar behavior on GPT-4.1 mini. Across model generations, position bias is &lt;strong&gt;easing but not eliminated&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The example below is simplified for clarity. In real settings, you&apos;d have dozens of similar sections stacking up to several hundred lines or thousands of tokens — that&apos;s when the problem appears.&lt;/p&gt;
&lt;h3&gt;Middle rules ignored in a system prompt&lt;/h3&gt;
&lt;p&gt;Imagine a system prompt with the following section structure spanning several hundred lines:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;You are a customer support assistant.            ← Near the top: followed

## Basic rules
- Respond politely
- Address the user by name

... (dozens more sections) ...

## Response format                                ← Buried in the middle
- Keep answers within 3 sentences
- Use bullet points

## Data reference guide                           ← Buried in the middle
- Always look up pricing in the database

... (many more sections) ...

## Prohibited                                     ← Near the bottom: followed
- Don&apos;t recommend competitor products
- Don&apos;t ask for personal information
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &quot;Basic rules&quot; at the top and the &quot;Prohibited&quot; list at the bottom get followed, but the &quot;Response format&quot; and &quot;Data reference guide&quot; buried in the middle get ignored. The longer the prompt, the more often you hit this pattern.&lt;/p&gt;
&lt;h3&gt;Missing requirements in code generation&lt;/h3&gt;
&lt;p&gt;The same pattern shows up in code generation when the requirements section is long. The tech stack at the top and the response format at the bottom get followed, while validation and error-handling requirements written in the middle drop out entirely. If the whole prompt is short, no problem — but as context and examples grow, the impact starts showing.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;3. The tail checklist pattern&lt;/h2&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The most practical countermeasure for Lost in the Middle is the &lt;strong&gt;tail checklist pattern&lt;/strong&gt;. You restate the important instructions as a checklist at the very end of the prompt, prompting the LLM to &quot;double-check.&quot;&lt;/p&gt;
&lt;h3&gt;Before / After&lt;/h3&gt;
&lt;p&gt;The example below simplifies a system prompt that would normally span several hundred lines. In practice each section is more detailed, with many rules and chunks of context in between.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Before (middle instructions get buried):&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;You are a code review assistant.

## Review perspectives
... (5 items)

... (many sections: coding standards, language-specific rules, edge cases...)

## Output format                       ← Buried in the middle
- Classify severity as High/Medium/Low
- Attach a code example for each suggestion
- State the impact scope

... (more sections)

## Code under review
{code}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;After (checklist appended at the end):&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;You are a code review assistant.

## Review perspectives
... (5 items)

... (many sections: coding standards, language-specific rules, edge cases...)

## Output format
- Classify severity as High/Medium/Low
- Attach a code example for each suggestion
- State the impact scope

... (more sections)

## Code under review
{code}

---
## Final checklist before output       ← Added here
Before producing the answer, confirm all of the following:
- [ ] Did you address all 5 review perspectives?
- [ ] Did you assign a severity (High/Medium/Low) to each finding?
- [ ] Did you attach a code example to each finding?
- [ ] Did you state the impact scope?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By placing the checklist at the end, the LLM &quot;re-recognizes&quot; these constraints right before generating output. You&apos;re flipping the U-shape to your advantage by putting the verification items in the position where attention is highest — the end.&lt;/p&gt;
&lt;p&gt;In my experience, after introducing this pattern the rate of middle-buried instructions getting ignored dropped noticeably. It works especially well for instructions like &quot;output format,&quot; which tend to live in the middle of prompts.&lt;/p&gt;
&lt;h3&gt;Real example: improving structured JSON output&lt;/h3&gt;
&lt;p&gt;I ran into this concretely while using LangChain with OpenAI models for a task that extracted structured JSON from free-form user text. The setup used LangChain&apos;s &lt;code&gt;with_structured_output&lt;/code&gt;, so the schema and field descriptions were defined via Pydantic&apos;s &lt;code&gt;Field(description=...)&lt;/code&gt;, while extraction rules for each field (required vs. optional, default values, format specifications, etc.) lived in the prompt.&lt;/p&gt;
&lt;p&gt;As the number of fields grew, extraction accuracy for fields described in the middle of the prompt visibly dropped. Field rules near the top and bottom were applied fine, but fields buried in the middle came back as &lt;code&gt;null&lt;/code&gt; or with wrong values — exactly the U-shape.&lt;/p&gt;
&lt;p&gt;Adding a reminder at the end of the prompt (after the user input) measurably improved extraction accuracy for those fields.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# Pydantic schema (description is also passed to the LLM)
class TaskSchema(BaseModel):
    category: str = Field(description=&quot;Choose from the predefined categories&quot;)
    priority: int = Field(description=&quot;Integer 1-5&quot;)
    due_date: str | None = Field(description=&quot;ISO 8601 date, null if unknown&quot;)

prompt = ChatPromptTemplate.from_messages([
    (&quot;system&quot;, &quot;&quot;&quot;You are an assistant that extracts task information.
Extract structured data from the user&apos;s input.&quot;&quot;&quot;),
    (&quot;human&quot;, &quot;&quot;&quot;{user_input}

## Pre-output check
Before responding, confirm the following fields are extracted correctly:
- &quot;category&quot;: must be picked from the predefined categories (do not guess)
- &quot;priority&quot;: must be an integer 1-5
- &quot;due_date&quot;: must be ISO 8601 (null if no date is in the input)&quot;&quot;&quot;),
])

structured_llm = llm.with_structured_output(TaskSchema)
messages = prompt.format_messages(user_input=user_input)
result = structured_llm.invoke(messages)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What&apos;s notable is that adding this tail reminder had almost no effect on the other fields (the ones that were already being extracted correctly). When I tried the same instructions as emphasis on the field definition in the middle of the prompt, surrounding fields would sometimes drift slightly, but the tail placement showed virtually no such side effects. The tail checklist lets you reinforce a weak spot in a targeted way without breaking the existing output.&lt;/p&gt;
&lt;p&gt;One caveat: when you tweak the extraction rules in the prompt, you have to update the Pydantic model&apos;s &lt;code&gt;Field(description=...)&lt;/code&gt; to match — otherwise the prompt and the schema disagree, and accuracy can suffer despite your fix. &lt;code&gt;with_structured_output&lt;/code&gt; passes the schema&apos;s &lt;code&gt;description&lt;/code&gt; to the LLM as well, so prompt and schema need to stay in sync. It&apos;s a mundane point but easy to overlook in practice.&lt;/p&gt;
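&lt;p&gt;One way to remove that failure mode is to make the schema the single source of truth and generate the tail checklist from it. The snippet below is a sketch of that idea, not how my original setup worked: the &lt;code&gt;build_tail_checklist&lt;/code&gt; helper is hypothetical (not part of LangChain or Pydantic) and assumes Pydantic v2&apos;s &lt;code&gt;model_fields&lt;/code&gt; API.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pydantic import BaseModel, Field

class TaskSchema(BaseModel):
    category: str = Field(description="Choose from the predefined categories")
    priority: int = Field(description="Integer 1-5")
    due_date: str | None = Field(description="ISO 8601 date, null if unknown")

def build_tail_checklist(model_cls):
    # Hypothetical helper: render one checklist line per field from the
    # schema, so editing a Field(description=...) updates the prompt too.
    lines = ["## Pre-output check",
             "Before responding, confirm the following fields are extracted correctly:"]
    for name, field in model_cls.model_fields.items():
        lines.append(f'- "{name}": {field.description}')
    return "\n".join(lines)

checklist = build_tail_checklist(TaskSchema)  # append after {user_input}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Appending the generated text after the user input keeps the prompt and the &lt;code&gt;Field(description=...)&lt;/code&gt; definitions in sync by construction.&lt;/p&gt;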
&lt;p&gt;On injecting domain-specific knowledge in LangChain, the LangChain blog post &quot;&lt;a href=&quot;https://blog.langchain.com/incorporating-domain-specific-knowledge-in-sql-llm-solutions/&quot;&gt;Incorporating domain specific knowledge in SQL-LLM solutions&lt;/a&gt;&quot; recommends dynamically retrieving relevant few-shot examples rather than relying on a static prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A more powerful approach is to have a robust dataset of good examples, and &lt;em&gt;dynamically&lt;/em&gt; include those which are relevant to the user question.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Specifically, the post shows building a custom Retriever Tool backed by a vector database to fetch examples semantically similar to the user&apos;s question. For structured-output tasks with many fields, dynamically selecting and placing the rules relevant to the input — rather than statically listing every rule — may be less susceptible to Lost in the Middle.&lt;/p&gt;
&lt;h3&gt;When the tail checklist isn&apos;t enough&lt;/h3&gt;
&lt;p&gt;The technique has limits.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Too many checklist items reduce effectiveness&lt;/strong&gt;: Per &quot;&lt;a href=&quot;https://openreview.net/forum?id=R6q67CDBCH&quot;&gt;Curse of Instructions&lt;/a&gt;&quot; (ManyIFEval, 2024; ICLR 2025), even an LLM that follows individual instructions 90% of the time has a theoretical success rate of only 0.9^10 ≈ 35% for satisfying 10 instructions simultaneously (actual values vary by model). Once the checklist gets long, you can also re-introduce Lost in the Middle within the checklist itself.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vague instructions&lt;/strong&gt;: Items like &quot;handle this appropriately&quot; still leave the LLM free to interpret them loosely, so checking them off verifies little.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Highly creative generation&lt;/strong&gt;: For free-form writing, an over-constrained checklist can hurt output quality.&lt;/li&gt;
&lt;/ul&gt;
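&lt;p&gt;The compounding arithmetic behind the first point is worth internalizing. Under the simplifying assumption that each instruction is followed independently with the same probability:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# If each instruction is followed independently with probability p,
# the chance of satisfying all n of them at once is p ** n.
def all_followed(p, n):
    return p ** n

print(round(all_followed(0.9, 10), 3))  # 0.349, i.e. about 35% for 10 items
print(round(all_followed(0.9, 5), 3))   # capping the list at 5 items: about 59%
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Real models are not independent across instructions, so treat this as intuition for why shorter checklists hold up better, not as a prediction.&lt;/p&gt;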
&lt;h3&gt;Implementation tips&lt;/h3&gt;
&lt;p&gt;Tips for using the tail checklist effectively:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;1. Write the checklist as &quot;verification items,&quot; not as a copy of the body
   - Bad:  Pasting the same prose
   - Good: A concise list of points to verify

2. Cap the list at 5–10 items
   - Too many backfires (Lost in the Middle inside the checklist itself)

3. Explicitly say &quot;verify before answering&quot;
   - Encourage a verification pass

4. Prioritize the items most often missed
   - Don&apos;t restate everything; emphasize what&apos;s empirically dropped
&lt;/code&gt;&lt;/pre&gt;
&lt;hr /&gt;
&lt;h2&gt;4. Other approaches&lt;/h2&gt;
&lt;p&gt;The tail checklist is lightweight and effective, but there are also approaches that improve the structure of the prompt itself, or address the issue outside the prompt — like the RAG pipeline.&lt;/p&gt;
&lt;h3&gt;Sandwich strategy&lt;/h3&gt;
&lt;p&gt;Place the most important information &lt;strong&gt;at both the beginning and the end&lt;/strong&gt; of the prompt.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Most important rule
Always return output in JSON format.

## Context
{lots of context...}

## Additional info
{more context...}

## Reminder: always return output in JSON format.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You&apos;re putting the critical instruction at the two ends of the U-shape — the highest-performing positions — so it&apos;s simple but effective. The trade-off is that you have to pick a single &quot;most important&quot; item, which makes it a poor fit when you want to emphasize multiple instructions at once.&lt;/p&gt;
&lt;h3&gt;XML-tag structuring and section splitting&lt;/h3&gt;
&lt;p&gt;Use XML tags or Markdown headers to clearly partition the prompt into sections that are easy for the LLM to parse.&lt;/p&gt;
&lt;p&gt;Anthropic&apos;s prompt engineering tutorial recommends separating data and instructions with XML tags. By bracketing input data with tags like &lt;code&gt;&amp;lt;sentences&amp;gt;...&amp;lt;/sentences&amp;gt;&lt;/code&gt;, the LLM can more clearly distinguish the data region from the instruction region, which can reduce the risk of missing middle information. Note that XML structuring doesn&apos;t eliminate position bias by itself — it&apos;s better used in combination with other techniques.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;system&amp;gt;
You are a data analysis assistant.
&amp;lt;/system&amp;gt;

&amp;lt;rules&amp;gt;
&amp;lt;rule priority=&quot;high&quot;&amp;gt;Always cite the source of any number&amp;lt;/rule&amp;gt;
&amp;lt;rule priority=&quot;high&quot;&amp;gt;Mark estimates explicitly as &quot;estimated&quot;&amp;lt;/rule&amp;gt;
&amp;lt;rule priority=&quot;medium&quot;&amp;gt;Include axis labels in chart descriptions&amp;lt;/rule&amp;gt;
&amp;lt;/rules&amp;gt;

&amp;lt;context&amp;gt;
{the data to analyze}
&amp;lt;/context&amp;gt;

&amp;lt;output_format&amp;gt;
{output format specification}
&amp;lt;/output_format&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Adding a &lt;code&gt;priority&lt;/code&gt; attribute also gives the LLM a hint for judging importance. Making it explicit &quot;what is written where&quot; through structure helps reduce the risk of middle information being buried.&lt;/p&gt;
&lt;h3&gt;Strategic document placement in RAG&lt;/h3&gt;
&lt;p&gt;In a RAG (Retrieval-Augmented Generation) pipeline, the &lt;strong&gt;ordering&lt;/strong&gt; of retrieved documents directly affects answer quality.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def reorder_documents(docs: list[str], scores: list[float]) -&amp;gt; list[str]:
    &quot;&quot;&quot;
    A Lost in the Middle countermeasure: place the highest-relevance
    documents at the beginning and the end.

    Example: scores [A(0.9), B(0.8), C(0.7), D(0.6), E(0.5)]
    Result:  [A(0.9), C(0.7), E(0.5), D(0.6), B(0.8)]
              ^^^^^^                           ^^^^^^
              High score at head        High score at tail
    &quot;&quot;&quot;
    scored_docs = list(zip(docs, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)

    head = []  # head side (even indices: 1st, 3rd, 5th...)
    tail = []  # tail side (odd indices: 2nd, 4th, 6th...)

    for i, (doc, score) in enumerate(scored_docs):
        if i % 2 == 0:
            head.append(doc)
        else:
            tail.append(doc)

    # Reverse the tail so the highest-scoring item lands at the very end
    return head + tail[::-1]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By keeping the lowest-relevance documents in the middle and the highest-relevance ones at the ends, you reduce the risk of important information being overlooked.&lt;/p&gt;
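&lt;p&gt;A quick sanity check of the interleaving logic, restated compactly here so it runs on its own (document IDs 1-5 stand in for real documents):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def reorder(docs, scores):
    # highest-scoring documents go to the two ends, lowest to the middle
    ranked = [doc for _, doc in sorted(zip(scores, docs), reverse=True)]
    head = ranked[0::2]       # ranks 1, 3, 5, ...
    tail = ranked[1::2]       # ranks 2, 4, ...
    return head + tail[::-1]  # reverse so rank 2 lands at the very end

print(reorder([1, 2, 3, 4, 5], [0.9, 0.8, 0.7, 0.6, 0.5]))  # [1, 3, 5, 4, 2]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output matches the docstring example: the two highest-relevance documents land at the head and the tail.&lt;/p&gt;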
&lt;h3&gt;Quantitative validation of position vs. accuracy&lt;/h3&gt;
&lt;p&gt;Lost in the Middle also affects few-shot prompting. Anthropic&apos;s blog post &quot;&lt;a href=&quot;https://www.anthropic.com/news/prompting-long-context&quot;&gt;Prompt engineering for Claude&apos;s long context window&lt;/a&gt;&quot; quantitatively evaluates techniques for improving information retrieval from long contexts — like extracting relevant quotes first before answering, and adding correctly answered Q&amp;amp;A examples to the prompt.&lt;/p&gt;
&lt;p&gt;If you want to measure how much position bias affects your own prompts, building a validation pipeline informed by these benchmarks is a good starting point.&lt;/p&gt;
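&lt;p&gt;As a concrete starting point, a needle-in-a-haystack-style probe takes only a few lines. Everything below is a sketch of my own: &lt;code&gt;ask_model&lt;/code&gt; is a placeholder for whatever client you use, and the needle and filler texts are invented.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import random

def build_prompt(filler_sentences, needle, position):
    # insert the needle at a relative position (0.0 = start, 1.0 = end)
    docs = list(filler_sentences)
    docs.insert(int(position * len(docs)), needle)
    return "\n".join(docs) + "\n\nQuestion: what is the secret code?"

def probe(ask_model, filler_sentences, trials=20):
    # measure accuracy per needle position, to compare start / middle / end
    results = {}
    for position in (0.0, 0.25, 0.5, 0.75, 1.0):
        hits = 0
        for _ in range(trials):
            shuffled = random.sample(filler_sentences, len(filler_sentences))
            prompt = build_prompt(shuffled, "The secret code is 7431.", position)
            if "7431" in ask_model(prompt):
                hits += 1
        results[position] = hits / trials
    return results
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A U-shaped accuracy curve across the five positions indicates position bias for your particular prompt and model combination.&lt;/p&gt;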
&lt;hr /&gt;
&lt;h2&gt;5. Summary&lt;/h2&gt;
&lt;h3&gt;Comparing the techniques&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Token cost&lt;/th&gt;
&lt;th&gt;Implementation difficulty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tail checklist&lt;/td&gt;
&lt;td&gt;System prompts in general&lt;/td&gt;
&lt;td&gt;Low (just the list)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sandwich strategy&lt;/td&gt;
&lt;td&gt;Single most-important rule&lt;/td&gt;
&lt;td&gt;Low (one restated line)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XML-tag structuring&lt;/td&gt;
&lt;td&gt;Multiple kinds of information&lt;/td&gt;
&lt;td&gt;Medium (tag overhead)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG document placement&lt;/td&gt;
&lt;td&gt;RAG pipelines&lt;/td&gt;
&lt;td&gt;None (reorder only)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Pay attention to information placement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Put the most important instructions at the &lt;strong&gt;beginning and the end&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If something has to live in the middle, restate it at the end.&lt;/li&gt;
&lt;li&gt;The longer the prompt, the higher the Lost in the Middle risk.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Use a tail checklist for double-verification&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Add a &quot;Final checklist before output&quot; at the end of the prompt.&lt;/li&gt;
&lt;li&gt;Prioritize instructions that have been missed in past runs.&lt;/li&gt;
&lt;li&gt;Keep the list to 5–10 items.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Continuously monitor prompt quality&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Set up before/after accuracy comparisons for prompt changes.&lt;/li&gt;
&lt;li&gt;Periodically check whether failures cluster around specific input patterns.&lt;/li&gt;
&lt;li&gt;Position-bias impact can shift across model versions, so revalidate when you upgrade models.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Lessons learned&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;LLM position bias is a &lt;strong&gt;structural issue rooted in the architecture&lt;/strong&gt;, and you can&apos;t fully solve it with prompt wording alone. But understanding the structure and applying countermeasures can substantially improve practical accuracy.&lt;/li&gt;
&lt;li&gt;The latest 2025 models still don&apos;t eliminate the problem, and the failure mode varies by model family (hallucinate vs. abstain). Validation against your specific model is essential.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Liu et al. (2023) &quot;Lost in the Middle: How Language Models Use Long Contexts&quot; - &lt;a href=&quot;https://arxiv.org/abs/2307.03172&quot;&gt;arXiv:2307.03172&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Su et al. (2021) &quot;RoFormer: Enhanced Transformer with Rotary Position Embedding&quot; - &lt;a href=&quot;https://arxiv.org/abs/2104.09864&quot;&gt;arXiv:2104.09864&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Anthropic &quot;Prompt Engineering: Use XML Tags&quot; - &lt;a href=&quot;https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags&quot;&gt;docs.anthropic.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LangChain Blog &quot;Incorporating domain specific knowledge in SQL-LLM solutions&quot; - &lt;a href=&quot;https://blog.langchain.com/incorporating-domain-specific-knowledge-in-sql-llm-solutions/&quot;&gt;blog.langchain.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Anthropic &quot;Prompt engineering for Claude&apos;s long context window&quot; - &lt;a href=&quot;https://www.anthropic.com/news/prompting-long-context&quot;&gt;anthropic.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Modarressi et al. (2025) &quot;NoLiMa: Long-Context Evaluation Beyond Literal Matching&quot; - &lt;a href=&quot;https://arxiv.org/abs/2502.05167&quot;&gt;arXiv:2502.05167&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Chroma Research (2025) &quot;Context Rot: How Increasing Input Tokens Impacts LLM Performance&quot; - &lt;a href=&quot;https://research.trychroma.com/context-rot&quot;&gt;research.trychroma.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&quot;Curse of Instructions: Large Language Models Cannot Follow Multiple Instructions at Once&quot; (2024; ICLR 2025) - &lt;a href=&quot;https://openreview.net/forum?id=R6q67CDBCH&quot;&gt;OpenReview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><enclosure url="https://zeroshotlog.com/images/2026/02/04/lost-in-the-middle.png" length="0" type="image/png"/></item><item><title>Escaping the AI-Generated &apos;Purple Gradient&apos;</title><link>https://zeroshotlog.com/en/blog/2026/01/18/design-renewal/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/2026/01/18/design-renewal/</guid><description>I redesigned my tech blog using a workflow that combines Claude Code, Playwright MCP, and Stitch AI — and worked around the typical AI-generated design clichés.</description><pubDate>Sun, 18 Jan 2026 12:00:00 GMT</pubDate><content:encoded>
&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Hand a UI to AI without specific direction and you&apos;ll almost certainly end up with a &quot;purple gradient.&quot;&lt;/li&gt;
&lt;li&gt;I rebuilt the design using Claude Code + Playwright MCP + a design AI (Stitch).&lt;/li&gt;
&lt;li&gt;The new palette is amber-500 (amber), with a terminal/code-inspired look.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;The Purple Gradient Problem&lt;/h2&gt;
&lt;p&gt;This blog, &quot;Zero-Shot Log,&quot; originally had its design generated by AI.&lt;/p&gt;
&lt;p&gt;It didn&apos;t look bad. But it felt familiar — like I&apos;d seen it somewhere before.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/2026/01/18/before-desktop.png&quot; alt=&quot;The old design (purple gradient)&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The culprit: the &quot;purple gradient.&quot;&lt;/p&gt;
&lt;p&gt;I came across an interesting post about this on X.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/seltzer/status/2010678415142039560&quot;&gt;twitter.com/seltzer/status/2010678415142039560&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A reply pointed out that this is heavily influenced by Tailwind&apos;s &lt;code&gt;bg-indigo-500&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/masayanishigaki/status/2010689800030871814&quot;&gt;twitter.com/masayanishigaki/status/2010689800030871814&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The theory is that AI learned indigo/purple gradients as the &quot;optimal default&quot; from its training data.&lt;/p&gt;
&lt;p&gt;I decided it was time to break out of that pattern.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Renewal Workflow&lt;/h2&gt;
&lt;p&gt;For this redesign, I combined three AI tools.&lt;/p&gt;
&lt;h3&gt;1. Capture the current state: Claude Code + Playwright MCP&lt;/h3&gt;
&lt;p&gt;First, I connected Playwright MCP to Claude Code and took screenshots of the current site — the top page and an article page, on both desktop and mobile.&lt;/p&gt;
&lt;p&gt;This way I could communicate the &quot;current state&quot; accurately when briefing a designer AI.&lt;/p&gt;
&lt;h3&gt;2. Write the design brief&lt;/h3&gt;
&lt;p&gt;Based on those screenshots, I wrote a design brief (&lt;code&gt;design-brief.md&lt;/code&gt;) for the designer AI. It included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Site info (name, concept, page structure)&lt;/li&gt;
&lt;li&gt;The current color scheme and what was wrong with it&lt;/li&gt;
&lt;li&gt;Design elements to avoid (purple gradients, flashy glows, etc.)&lt;/li&gt;
&lt;li&gt;The direction I wanted (minimal, warm dark, terminal/Vibe Coding aesthetic, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The idea is to control AI output by being explicit about what you &lt;em&gt;don&apos;t&lt;/em&gt; want.&lt;/p&gt;
&lt;h3&gt;3. Hand it to the design AI (Stitch)&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://stitch.withgoogle.com/&quot;&gt;Stitch&lt;/a&gt; is a UI generation tool from Google Labs. Powered by Gemini 2.5 Pro, it generates UI designs from text or images. Currently in beta and free to use.&lt;/p&gt;
&lt;p&gt;I passed the screenshots and the brief to Stitch and got back a new design proposal.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/2026/01/18/designer-output.png&quot; alt=&quot;Output from Stitch&quot; /&gt;&lt;/p&gt;
&lt;p&gt;What came back was an HTML file plus preview images. The warm amber palette was applied as requested.&lt;/p&gt;
&lt;h3&gt;4. Implementation&lt;/h3&gt;
&lt;p&gt;Stitch&apos;s output includes HTML, but it&apos;s mainly there to fill in details that an image alone can&apos;t convey. You can&apos;t just drop it into existing Astro components, so I rewrote the styles in Claude Code while taking the color palette and font choices as references.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Wrap-up&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The &quot;purple gradient&quot; problem in AI-generated design is avoidable with explicit direction.&lt;/li&gt;
&lt;li&gt;The flow of &quot;screenshots + brief → design AI → implementation&quot; makes redesigns go smoothly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&apos;s a good time to be alive — design renewals are this easy with AI now.&lt;/p&gt;
</content:encoded></item><item><title>I Built a Prompt Expansion Tool: Query Expander</title><link>https://zeroshotlog.com/en/blog/query-expander-introduction/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/query-expander-introduction/</guid><description>A tool that turns vague prompts into clear, structured ones. Built as a Claude Artifact, so it&apos;s available any time you&apos;re logged into Claude.</description><pubDate>Mon, 12 Jan 2026 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I built &lt;strong&gt;&lt;a href=&quot;https://claude.ai/public/artifacts/9a14629b-bc73-41ab-bd96-c8be99c8feee&quot;&gt;Query Expander&lt;/a&gt;&lt;/strong&gt;, a tool that expands LLM prompts into clearer, more explicit instructions.&lt;/li&gt;
&lt;li&gt;It runs as a Claude Artifact, so it&apos;s available any time you&apos;re logged into Claude.&lt;/li&gt;
&lt;li&gt;Three detail levels (Concise / Standard / Detailed) let you tune the output.&lt;/li&gt;
&lt;li&gt;It auto-detects the input language (Japanese / English) and matches the output to it.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Updated February 2026&lt;/strong&gt;: Bumped to v1.1.1. Fixes a bug where the copy button didn&apos;t work.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;Why I Built It&lt;/h2&gt;
&lt;p&gt;When I ask Claude Code to investigate or implement something, the precision of the answer scales with how clear the request is.&lt;/p&gt;
&lt;p&gt;In RAG, turning a user&apos;s vague query into something search-friendly is called &quot;Query Expansion.&quot; I wanted to apply the same idea to LLM prompts in general.&lt;/p&gt;
&lt;p&gt;I used to keep a set of expansion rules configured in Claude itself. A vague request like &quot;look into X&quot; got rewritten into a clear prompt that included the goal, the scope, and the expected output format.&lt;/p&gt;
&lt;p&gt;The catch: I was doing this constantly. It came up so often during work that opening a chat every single time started to feel like friction.&lt;/p&gt;
&lt;p&gt;So I packaged it as a standalone Claude Artifact. As long as you&apos;re logged into Claude, you can use it — no more context switching to expand a prompt. It runs inside your own plan, so you don&apos;t have to worry about API token usage either.&lt;/p&gt;
&lt;p&gt;For the UI design, I used Google&apos;s &lt;a href=&quot;https://stitch.withgoogle.com/&quot;&gt;Stitch&lt;/a&gt;. It generates UI from prompts, which makes it fast to put together a base design.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What Query Expander Is&lt;/h2&gt;
&lt;p&gt;Query Expander is a tool that turns vague prompts into clear, structured ones.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Main features:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Three detail levels&lt;/strong&gt;: Concise / Standard / Detailed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Refine Only mode&lt;/strong&gt;: Polish the wording while keeping the original structure&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automatic language detection&lt;/strong&gt;: Japanese in, Japanese out; English in, English out&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Markdown preserved&lt;/strong&gt;: Headings and list structure survive the expansion&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;How to Use It&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Paste the prompt you want to expand into the text area.&lt;/li&gt;
&lt;li&gt;Pick a detail level (Concise / Standard / Detailed).&lt;/li&gt;
&lt;li&gt;Click &quot;ENHANCE QUERY&quot;.&lt;/li&gt;
&lt;li&gt;Copy the expanded prompt and use it.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Choosing a Detail Level&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Output Size&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Concise&lt;/td&gt;
&lt;td&gt;1–2 sentences&lt;/td&gt;
&lt;td&gt;Minor tweaks, typo fixes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;2–4 sentences&lt;/td&gt;
&lt;td&gt;General prompt improvement (default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detailed&lt;/td&gt;
&lt;td&gt;3–5 items&lt;/td&gt;
&lt;td&gt;Complex tasks, when detailed instructions are needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Refine Only Mode&lt;/h3&gt;
&lt;p&gt;The default mode &lt;em&gt;expands&lt;/em&gt; the prompt. Refine Only mode keeps the original structure and just &lt;em&gt;polishes&lt;/em&gt; it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Difference from the default mode:&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Default&lt;/td&gt;
&lt;td&gt;Adds purpose, scope, format, etc. to expand the prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refine Only&lt;/td&gt;
&lt;td&gt;Keeps the original structure, refines the wording&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;When Refine Only is the right fit:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You already have a structured prompt and just want to sharpen the wording.&lt;/li&gt;
&lt;li&gt;You don&apos;t want the original format touched.&lt;/li&gt;
&lt;li&gt;You only want a tone adjustment (casual → formal, etc.).&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Example&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Look into React state management
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Standard output (illustrative):&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Research the main approaches to state management in React.
Cover useState, useReducer, the Context API, and external libraries (Redux, Zustand, Jotai, etc.).
For each, summarize its characteristics, appropriate use cases, and performance considerations in a comparison format.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A vague &quot;look into it&quot; turns into a clear prompt with purpose, scope, and expected format.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://claude.ai/public/artifacts/9a14629b-bc73-41ab-bd96-c8be99c8feee&quot;&gt;Query Expander&lt;/a&gt; — link to the Claude Artifact&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://query-expander.gitbook.io/docs&quot;&gt;Documentation&lt;/a&gt; — detailed usage&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Wrap-up&lt;/h2&gt;
&lt;p&gt;Tightening up your prompts is unglamorous work, but it directly affects LLM output quality. Query Expander makes that step quick.&lt;/p&gt;
&lt;p&gt;It&apos;s free to use as long as you&apos;re logged into Claude. If you use LLMs day to day, give it a try.&lt;/p&gt;
</content:encoded><enclosure url="https://zeroshotlog.com/images/2026/01/12/query-expander.png" length="0" type="image/png"/></item><item><title>Handing Off Conversations When Antigravity Slows Down</title><link>https://zeroshotlog.com/en/blog/antigravity-hidden-brain-feature/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/antigravity-hidden-brain-feature/</guid><description>How to deal with Antigravity getting sluggish during long sessions, and how to use the brain directory to hand off context to a new conversation. Just pass a UUID and the next session picks up where the last one left off.</description><pubDate>Sat, 03 Jan 2026 10:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Antigravity gets sluggish during long sessions.&lt;/li&gt;
&lt;li&gt;Splitting work across multiple sessions helps, but carrying context over becomes the next problem.&lt;/li&gt;
&lt;li&gt;Asking Antigravity itself how to hand off led me to the &quot;brain directory.&quot;&lt;/li&gt;
&lt;li&gt;With the brain directory, you can resume cleanly in a fresh session.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;What Happened&lt;/h2&gt;
&lt;h3&gt;Long conversations slow things down&lt;/h3&gt;
&lt;p&gt;I was working on a longer implementation task in Antigravity.&lt;/p&gt;
&lt;p&gt;The agent was handling a sizable chunk of work, and gradually the responses started getting slower. The UI became unresponsive, and the agent was clearly lagging. A look at Activity Monitor showed memory usage was way up.&lt;/p&gt;
&lt;h3&gt;The cause: token budget pressure&lt;/h3&gt;
&lt;p&gt;Digging in, I learned Antigravity has roughly a 200,000-token budget per conversation. As the conversation grows, that budget gets consumed and processing load climbs.&lt;/p&gt;
&lt;p&gt;On top of that, conversation history fills up local storage. Antigravity stores screenshots, recordings, and other &quot;Artifacts&quot; locally, which likely contributes to the slowdown.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Splitting Sessions Helps&lt;/h2&gt;
&lt;h3&gt;Basic mitigations&lt;/h3&gt;
&lt;p&gt;A few things worked for me:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Delete old conversation logs&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In the Agent Manager history, delete conversations you no longer need. Once you have dozens of accumulated sessions, IDE-wide responsiveness can take a hit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Split long work across sessions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Instead of pushing one conversation to the limit, switch to a new conversation at natural breakpoints. The token budget resets and things feel snappy again.&lt;/p&gt;
&lt;h3&gt;The remaining problem: handing off context&lt;/h3&gt;
&lt;p&gt;Splitting sessions fixed performance — but introduced a new problem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How do you bring the previous work&apos;s context into the new session?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Re-explaining &quot;we got this far last time&quot; and &quot;we were going with this approach&quot; every time is tedious. And in the middle of a complex implementation, it&apos;s easy to leave something out.&lt;/p&gt;
&lt;p&gt;Antigravity has a Knowledge Items feature that learns automatically, and the accumulated knowledge does carry over. But the docs aren&apos;t clear about scope (workspace? directory?) or how much actually carries. Sometimes you specifically want to hand off the tasks and files from one conversation.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;A Lucky Find: Antigravity&apos;s Own Answer&lt;/h2&gt;
&lt;p&gt;Before switching sessions, I asked Antigravity:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;How should I hand this off to a new conversation?&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The answer:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;When you start a separate conversation, just say &apos;continue from the previous conversation (a1b2c3d4-...)&apos; and I can pick up the work by referencing these documents. In particular, current_issues.md has a detailed summary of what the next session should tackle.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Wait, that&apos;s a thing?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;What is &lt;code&gt;current_issues.md&lt;/code&gt;? Where does it live? I poked around and found an unfamiliar folder in my home directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ls ~/.gemini/antigravity/brain/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is where the per-session Artifacts were being stored.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What Is the Brain Directory?&lt;/h2&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The brain directory (&lt;code&gt;~/.gemini/antigravity/brain/&lt;/code&gt;) is where the Antigravity agent stores the records it generates while working.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;~/.gemini/antigravity/brain/
├── a1b2c3d4-5678-90ab-cdef-1234567890ab/
│   ├── task.md
│   ├── implementation_plan.md
│   ├── walkthrough.md
│   ├── current_issues.md
│   ├── uploaded_image_1234567890123.png
│   └── uploaded_image_1234567890456.png
├── b2c3d4e5-6789-01bc-def0-2345678901bc/
│   └── ...
└── ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each UUID-named folder corresponds to one conversation session.&lt;/p&gt;
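&lt;p&gt;Because each session is just a folder, ordinary shell tools work on it. As a small sketch (the helper name and the parameterized path are mine, not part of Antigravity), this prints the most recently modified session folder, which is usually the conversation you just left:&lt;/p&gt;

```shell
# Hypothetical helper: show the most recently modified session folder.
# The directory is a parameter (so the sketch is easy to test elsewhere);
# it defaults to the real brain location.
latest_session() {
  local dir="${1:-$HOME/.gemini/antigravity/brain}"
  ls -t "$dir" | head -n 1
}
```

&lt;p&gt;Running it right after closing a sluggish conversation gives you the UUID to paste into the next one.&lt;/p&gt;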
&lt;h3&gt;What gets saved&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Contents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;task.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Task progress (checklist format)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;implementation_plan.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Technical details of the implementation plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;walkthrough.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Summary of changes after implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;current_issues.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Issues to address in the next session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;uploaded_image_*.png&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Images uploaded during the conversation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;current_issues.md&lt;/code&gt; is the key one for handoffs. It captures what&apos;s unresolved and what the next session should pick up.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Handing Off Conversations Using the Brain Directory&lt;/h2&gt;
&lt;h3&gt;Approach 1: Hand off with a UUID&lt;/h3&gt;
&lt;p&gt;Start a new conversation and pass it the previous session&apos;s ID:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Continue from the previous conversation (a1b2c3d4-5678-90ab-cdef-1234567890ab).
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The agent reads the matching Artifacts in the brain directory, understands the context, and picks up the work.&lt;/p&gt;
&lt;h3&gt;Approach 2: Point it at current_issues.md directly&lt;/h3&gt;
&lt;p&gt;For a more reliable handoff, give it the exact file path:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Check ~/.gemini/antigravity/brain/a1b2c3d4-.../current_issues.md and
resume from the unresolved items.
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;How to find the conversation ID&lt;/h3&gt;
&lt;p&gt;Just ask Antigravity directly:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;What&apos;s the conversation ID (UUID) for this conversation?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Example response:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;This conversation&apos;s ID is a1b2c3d4-5678-90ab-cdef-1234567890ab.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Note: I confirmed this in Planning mode with Gemini 3 Pro. Other models or contexts may respond differently.&lt;/p&gt;
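&lt;p&gt;If you paste the agent&apos;s reply into a terminal, the UUID can also be pulled out mechanically. A sketch (the helper name is mine; the pattern is just the standard 8-4-4-4-12 hex UUID form):&lt;/p&gt;

```shell
# Hypothetical helper: extract the first UUID from a pasted agent reply.
extract_uuid() {
  printf '%s\n' "$1" |
    grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' |
    head -n 1
}

extract_uuid "This conversation's ID is a1b2c3d4-5678-90ab-cdef-1234567890ab."
# prints: a1b2c3d4-5678-90ab-cdef-1234567890ab
```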
&lt;hr /&gt;
&lt;h2&gt;A Practical Workflow&lt;/h2&gt;
&lt;h3&gt;Antigravity standalone&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Start work&lt;/strong&gt;: Begin a new conversation and assign the task.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;During work&lt;/strong&gt;: The agent automatically generates &lt;code&gt;task.md&lt;/code&gt;, &lt;code&gt;implementation_plan.md&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;At a breakpoint&lt;/strong&gt;: When things get sluggish, ask the agent to &quot;bring the Artifacts up to date.&quot; &lt;code&gt;task.md&lt;/code&gt;, &lt;code&gt;implementation_plan.md&lt;/code&gt;, &lt;code&gt;current_issues.md&lt;/code&gt;, etc. all get refreshed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Capture the UUID&lt;/strong&gt;: Ask &quot;What&apos;s the UUID for this conversation?&quot; and save it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hand off&lt;/strong&gt;: Start a new conversation and say &quot;continue from the previous conversation (UUID).&quot;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Key point&lt;/strong&gt;: Update the Artifacts before handing off. If they&apos;re stale, the new session will start from a misaligned picture.&lt;/p&gt;
&lt;h3&gt;Combined with Claude Code&lt;/h3&gt;
&lt;p&gt;Brain files are plain Markdown, so other AI tools like Claude Code can read them.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Inspect brain contents from Claude Code
cat ~/.gemini/antigravity/brain/a1b2c3d4-.../implementation_plan.md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can hand an &lt;code&gt;implementation_plan.md&lt;/code&gt; written by Antigravity to Claude Code and ask it to &quot;review this plan.&quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Caveats&lt;/h2&gt;
&lt;h3&gt;The brain doesn&apos;t store full conversation transcripts&lt;/h3&gt;
&lt;p&gt;Only Artifacts are stored in the brain — not the full back-and-forth. Subtle nuances and the path of the discussion can be lost, so make sure the agent explicitly records important decisions to the Artifacts.&lt;/p&gt;
&lt;p&gt;For reference, &lt;code&gt;~/.gemini/antigravity/conversations/&lt;/code&gt; contains &lt;code&gt;.pb&lt;/code&gt; (Protocol Buffers) files, which appear to hold the conversation data in binary form. You can&apos;t read them directly, but the conversation history seems to live there.&lt;/p&gt;
&lt;h3&gt;Storage usage&lt;/h3&gt;
&lt;p&gt;Beyond the brain directory, Antigravity also writes evidence files to places like &lt;code&gt;browser_recordings/&lt;/code&gt;. Accumulated screenshots and browser recordings can eat through disk space. Periodically cleaning up unused sessions is a good idea.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Inspect old sessions
ls -la ~/.gemini/antigravity/brain/

# Delete sessions you don&apos;t need
rm -rf ~/.gemini/antigravity/brain/&amp;lt;unwanted-uuid&amp;gt;/
&lt;/code&gt;&lt;/pre&gt;
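&lt;p&gt;Before deleting anything, it helps to see which sessions are actually eating disk. Another sketch (the helper name is mine; the path is parameterized and defaults to the real location):&lt;/p&gt;

```shell
# Hypothetical helper: per-session disk usage in KB, largest first.
brain_usage() {
  local dir="${1:-$HOME/.gemini/antigravity/brain}"
  du -sk "$dir"/*/ 2>/dev/null | sort -rn
}
```

&lt;p&gt;Cross-check the biggest UUIDs against your conversation history before reaching for &lt;code&gt;rm -rf&lt;/code&gt;.&lt;/p&gt;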
&lt;hr /&gt;
&lt;h2&gt;Wrap-Up&lt;/h2&gt;
&lt;p&gt;The Antigravity slowdown is solvable by splitting sessions. And when handing off conversations, you can &lt;strong&gt;reference the brain directory by UUID&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Long conversations consume the token budget and slow everything down.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Switch to a new session and use the brain Artifacts to carry context forward.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hidden feature&lt;/strong&gt;: Just say &quot;continue from the previous conversation (UUID)&quot; and the agent reads the brain Artifacts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I stumbled into this feature by asking Antigravity itself how to hand off a conversation. There&apos;s a chance the answer was a hallucination — but the brain directory is unambiguously real, and pointing the agent at those files for handoff does work in practice. I couldn&apos;t find any mention in the official docs, so consider this a useful undocumented trick.&lt;/p&gt;
&lt;p&gt;If you&apos;re hitting performance issues, give brain-based session management a try.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://antigravity.codes/troubleshooting&quot;&gt;Troubleshooting Google Antigravity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://vertexdigest.com/blogs/mastering-anti-gravity-artifacts&quot;&gt;Mastering Anti-Gravity Artifacts - Vertex Digest&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://moghaoui.substack.com/p/hack-dont-lose-to-googles-antigravity&quot;&gt;Hack: Don&apos;t lose to Google&apos;s Antigravity - Substack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://note.com/biwakonbu/n/n04482fc86825&quot;&gt;note: Antigravity seems to slow down as conversation history grows (in Japanese)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://note.com/honest_kudu5817/n/ndcdc33f2538f&quot;&gt;note: [Google Antigravity] Heaps of evidence files eating local storage (in Japanese)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><enclosure url="https://zeroshotlog.com/images/2026/01/03/antigravity-hidden-brain-feature.png" length="0" type="image/png"/></item><item><title>A Practical Neovim Setup Guide for macOS</title><link>https://zeroshotlog.com/en/blog/neovim-macos-setup-guide/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/neovim-macos-setup-guide/</guid><description>Solving the macOS IME problem, handling Neovim 0.11 plugin compatibility, building a minimal config that doesn&apos;t need Nerd Font, and setting up the basics of LSP and completion.</description><pubDate>Tue, 30 Dec 2025 11:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The macOS IME problem can be solved with &lt;code&gt;im-select.nvim&lt;/code&gt; + &lt;code&gt;macism&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;On Neovim 0.11, watch out for plugin compatibility (use the &lt;code&gt;0.1.x&lt;/code&gt; branch for Telescope).&lt;/li&gt;
&lt;li&gt;You can build a perfectly usable environment with a simple config that doesn&apos;t need Nerd Font.&lt;/li&gt;
&lt;li&gt;For LSP and completion, the &lt;code&gt;mason.nvim&lt;/code&gt; + &lt;code&gt;nvim-cmp&lt;/code&gt; combo is the most beginner-friendly option.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;This article walks through a Neovim setup that&apos;s comfortable to use on macOS. It assumes you know the basic Vim operations (mode switching, cursor movement, save/quit, etc.) but are new to configuring Neovim plugins.&lt;/p&gt;
&lt;p&gt;It covers four topics:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Solving the macOS IME problem&lt;/strong&gt; — the most common pain point for Japanese input&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Neovim 0.11 plugin compatibility&lt;/strong&gt; — what to watch out for on the latest version&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A minimal config that doesn&apos;t need Nerd Font&lt;/strong&gt; — an environment that works without font setup&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Basic LSP / completion setup&lt;/strong&gt; — go-to-definition and autocompletion&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;The configuration in this article was verified on the following environment:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;macOS&lt;/td&gt;
&lt;td&gt;Sonoma 14.x or later&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Neovim&lt;/td&gt;
&lt;td&gt;0.11.3 or later (required for the new LSP API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Homebrew&lt;/td&gt;
&lt;td&gt;Installed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal&lt;/td&gt;
&lt;td&gt;iTerm2 (Terminal.app also works)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you haven&apos;t installed Neovim yet:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew install neovim
&lt;/code&gt;&lt;/pre&gt;
&lt;hr /&gt;
&lt;h2&gt;1. Solving the macOS IME Problem&lt;/h2&gt;
&lt;h3&gt;The problem&lt;/h3&gt;
&lt;p&gt;When using Neovim (Vim) on macOS, many people get tripped up by the IME (Japanese input) issue.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Specifically&lt;/strong&gt;: if you return to Normal mode while still in Japanese input mode, commands like &lt;code&gt;j&lt;/code&gt; and &lt;code&gt;k&lt;/code&gt; stop working.&lt;/p&gt;
&lt;p&gt;The reason: even after switching from Insert back to Normal, macOS keeps the IME state as-is. After typing some Japanese and pressing &lt;code&gt;Esc&lt;/code&gt;, the IME is still in Japanese mode — so typing &lt;code&gt;jjj...&lt;/code&gt; produces &lt;code&gt;っっっ...&lt;/code&gt; instead of moving the cursor.&lt;/p&gt;
&lt;h3&gt;The fix: im-select.nvim + macism&lt;/h3&gt;
&lt;p&gt;The fix combines two components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/keaising/im-select.nvim&quot;&gt;im-select.nvim&lt;/a&gt;&lt;/strong&gt; (Neovim plugin)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Detects mode switches (Insert → Normal, etc.)&lt;/li&gt;
&lt;li&gt;Calls an external CLI tool to switch the IME&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/laishulu/macism&quot;&gt;macism&lt;/a&gt;&lt;/strong&gt; (CLI tool)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The command that actually flips the macOS IME&lt;/li&gt;
&lt;li&gt;Invoked from im-select.nvim&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So im-select.nvim watches for mode changes inside Neovim, and when it sees one, it runs &lt;code&gt;macism&lt;/code&gt; to switch the IME back to ASCII.&lt;/p&gt;
&lt;h4&gt;Why macism&lt;/h4&gt;
&lt;p&gt;im-select.nvim shells out to a CLI tool to switch the IME on macOS. Several CLI tools exist for this, but &lt;strong&gt;the official im-select.nvim README recommends macism&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Please install macism, this is the only one CLI tool can switch CJK and English input methods in macOS correctly.
— &lt;a href=&quot;https://github.com/keaising/im-select.nvim&quot;&gt;im-select.nvim README&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;macism reliably switches CJK input sources like Japanese. With other tools (&lt;code&gt;input-source-switcher&lt;/code&gt; and similar), there&apos;s a known macOS bug where the menu bar icon flips but the input source doesn&apos;t actually change.&lt;/p&gt;
&lt;h4&gt;Installation&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;macism&lt;/code&gt; isn&apos;t in the official Homebrew tap, so you need to add a third-party tap:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Add the tap
brew tap laishulu/homebrew

# Install macism
brew install laishulu/homebrew/macism
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once installed, verify it works:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Check the current input source
macism

# Example output: com.apple.inputmethod.Kotoeri.RomajiTyping.Japanese
# or:            com.apple.keylayout.ABC
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Neovim configuration&lt;/h4&gt;
&lt;p&gt;Use the &lt;code&gt;im-select.nvim&lt;/code&gt; plugin to control behavior on mode transitions.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-- Example config for the lazy.nvim plugin manager
{
  &quot;keaising/im-select.nvim&quot;,
  config = function()
    require(&quot;im_select&quot;).setup({
      -- Target input source (ASCII keyboard)
      default_im_select = &quot;com.apple.keylayout.ABC&quot;,

      -- Absolute path to macism (use /usr/local/bin/macism on Intel Mac)
      default_command = &quot;/opt/homebrew/bin/macism&quot;,

      -- When to switch to ASCII
      set_default_events = {
        &quot;VimEnter&quot;,       -- On Neovim startup
        &quot;FocusGained&quot;,    -- When the window regains focus
        &quot;InsertLeave&quot;,    -- When leaving Insert mode
        &quot;CmdlineLeave&quot;    -- When leaving command-line mode
      },

      -- Setting to restore the previous IME on InsertEnter (empty = disabled)
      set_previous_events = {},
    })
  end,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Why I set &lt;code&gt;set_previous_events = {}&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;By default, &lt;code&gt;set_previous_events = { &quot;InsertEnter&quot; }&lt;/code&gt;, which restores the previous IME state when entering Insert mode.&lt;/p&gt;
&lt;p&gt;I disabled this by setting it to an empty table. Reasoning:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Trade-off comparison&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{ &quot;InsertEnter&quot; }&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;Convenient for typing Japanese in succession&lt;/td&gt;
&lt;td&gt;Requires switching every time you write English code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{}&lt;/code&gt; (disabled)&lt;/td&gt;
&lt;td&gt;Always starts in ASCII, so behavior is predictable&lt;/td&gt;
&lt;td&gt;Manual switch needed when writing Japanese&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For programming, English input dominates by far, so &quot;always return to ASCII&quot; feels much smoother. When I do need Japanese, I switch manually — no big deal.&lt;/p&gt;
&lt;p&gt;If you write a lot of Japanese documentation, the default &lt;code&gt;{ &quot;InsertEnter&quot; }&lt;/code&gt; may be more convenient.&lt;/p&gt;
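&lt;p&gt;If you&apos;d rather not add a plugin at all, the core behavior (forcing ASCII when leaving Insert mode) can be approximated with a single autocmd. A minimal sketch, assuming macism at the Homebrew path; it lacks im-select.nvim&apos;s extras such as the &lt;code&gt;FocusGained&lt;/code&gt; handling above:&lt;/p&gt;

```lua
-- Plugin-free sketch: shell out to macism whenever Insert or
-- command-line mode is left. Assumes macism at /opt/homebrew/bin
-- (use /usr/local/bin/macism on Intel Macs).
vim.api.nvim_create_autocmd({ "InsertLeave", "CmdlineLeave" }, {
  callback = function()
    vim.system({ "/opt/homebrew/bin/macism", "com.apple.keylayout.ABC" })
  end,
})
```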
&lt;hr /&gt;
&lt;h2&gt;2. Neovim 0.11 Plugin Compatibility&lt;/h2&gt;
&lt;h3&gt;Background&lt;/h3&gt;
&lt;p&gt;Neovim 0.11, released in March 2025, made significant changes to the LSP-related APIs. As a result, some plugins won&apos;t work as-is.&lt;/p&gt;
&lt;h3&gt;Telescope.nvim compatibility issue&lt;/h3&gt;
&lt;p&gt;The most common one users hit is with Telescope.nvim (the fuzzy finder).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: Errors when opening Telescope, or broken display.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cause&lt;/strong&gt;: The Telescope.nvim stable branch (&lt;code&gt;0.1.x&lt;/code&gt;) hadn&apos;t caught up with the Neovim 0.11 API changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;: Use the &lt;code&gt;0.1.x&lt;/code&gt; branch (its latest state is already patched).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-- Neovim 0.11–compatible version
{
  &quot;nvim-telescope/telescope.nvim&quot;,
  branch = &quot;0.1.x&quot;,  -- Stable release branch (Neovim 0.11–compatible)
  dependencies = { &quot;nvim-lua/plenary.nvim&quot; },
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Choosing a version&lt;/h4&gt;
&lt;p&gt;As of late 2025, Telescope v0.2.0 has also been released.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;branch = &quot;0.1.x&quot;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stable, Neovim 0.11–compatible&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tag = &quot;0.1.8&quot;&lt;/code&gt; etc.&lt;/td&gt;
&lt;td&gt;Pinned to a specific version&lt;/td&gt;
&lt;td&gt;If reproducibility matters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;branch = &quot;master&quot;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Latest dev branch&lt;/td&gt;
&lt;td&gt;If you want to try new features&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The &lt;code&gt;0.1.x&lt;/code&gt; branch was previously incompatible with Neovim 0.11, but that&apos;s been fixed. Unless you have a specific reason otherwise, &lt;code&gt;branch = &quot;0.1.x&quot;&lt;/code&gt; is the way to go.&lt;/p&gt;
&lt;h3&gt;Other ways to check compatibility&lt;/h3&gt;
&lt;p&gt;When a plugin doesn&apos;t work, here&apos;s how I investigate:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Check GitHub Issues&lt;/strong&gt;: Search for &lt;code&gt;[plugin name] Neovim 0.11&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Check the Requirements section in the README&lt;/strong&gt;: The supported Neovim version is usually documented.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Update to the latest version&lt;/strong&gt;: Run &lt;code&gt;:Lazy sync&lt;/code&gt; to refresh plugins.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;3. A Minimal Config That Doesn&apos;t Need Nerd Font&lt;/h2&gt;
&lt;h3&gt;What is Nerd Font&lt;/h3&gt;
&lt;p&gt;Most Neovim setup articles tell you to &quot;install Nerd Font.&quot; Nerd Font is a font family that adds icons (file types, Git status, folders, and so on) on top of regular fonts.&lt;/p&gt;
&lt;p&gt;But getting Nerd Font running requires several steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Download Nerd Font.&lt;/li&gt;
&lt;li&gt;Install it on your system.&lt;/li&gt;
&lt;li&gt;Change the font setting in your terminal.&lt;/li&gt;
&lt;li&gt;Verify the changes took effect.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Why I went without Nerd Font&lt;/h3&gt;
&lt;p&gt;Reasons for skipping Nerd Font:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No need to fiddle with terminal font settings.&lt;/li&gt;
&lt;li&gt;Easier to share configs across multiple machines (no font install required).&lt;/li&gt;
&lt;li&gt;Simple, lightweight look.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Harder to tell file types at a glance.&lt;/li&gt;
&lt;li&gt;Visually plain.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In actual coding, you can already tell the file type from the filename, so the lack of icons isn&apos;t a practical problem.&lt;/p&gt;
&lt;h3&gt;Disabling icons in plugins&lt;/h3&gt;
&lt;p&gt;How to disable icons in the major plugins.&lt;/p&gt;
&lt;h4&gt;nvim-tree (file explorer)&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;nvim-tree/nvim-tree.lua&quot;,
  config = function()
    require(&quot;nvim-tree&quot;).setup({
      renderer = {
        icons = {
          show = {
            file = false,        -- Hide file icons
            folder = false,      -- Hide folder icons
            folder_arrow = true, -- Show expand/collapse arrows
            git = false,         -- Hide Git status icons
          },
          glyphs = {
            folder = {
              arrow_closed = &quot;&amp;gt;&quot;,  -- Arrow when collapsed
              arrow_open = &quot;v&quot;,    -- Arrow when expanded
            },
          },
        },
      },
    })
  end,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;lualine (status line)&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;nvim-lualine/lualine.nvim&quot;,
  config = function()
    require(&quot;lualine&quot;).setup({
      options = {
        icons_enabled = false,       -- Disable all icons
        section_separators = &quot;&quot;,     -- No section separators
        component_separators = &quot;|&quot;,  -- Simple component separator
      },
    })
  end,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Visual comparison&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;With Nerd Font&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;file icon&amp;gt; init.lua  &amp;lt;branch icon&amp;gt; main  lua  utf-8  100%  42:1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Without Nerd Font&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;NORMAL | main | init.lua | lua | utf-8 | 100% | 42:1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Information shows up as text instead of icons. Once you get used to it, it&apos;s perfectly readable.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;4. Basic LSP / Completion Setup&lt;/h2&gt;
&lt;h3&gt;What LSP is&lt;/h3&gt;
&lt;p&gt;LSP (Language Server Protocol) is a communication protocol between editors and language servers. It enables features like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Go to definition&lt;/strong&gt;: Jump to where a function or variable is defined.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Find references&lt;/strong&gt;: List everywhere a function or variable is used.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Autocompletion&lt;/strong&gt;: Show candidates while typing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Diagnostics&lt;/strong&gt;: Surface errors and warnings in real time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rename&lt;/strong&gt;: Rename a symbol everywhere at once.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Plugin layout overview&lt;/h3&gt;
&lt;p&gt;To get LSP and completion working, combine these plugins:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mason.nvim              # Language server installer
  └── mason-lspconfig   # Bridge between mason and lspconfig
        └── nvim-lspconfig  # Language server configuration

nvim-cmp                # Completion engine
  ├── cmp-nvim-lsp      # Completion source from LSP
  ├── cmp-buffer        # Complete words from the current buffer
  ├── cmp-path          # Complete file paths
  └── LuaSnip           # Snippet expansion
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Why this stack&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LSP installation&lt;/td&gt;
&lt;td&gt;mason.nvim&lt;/td&gt;
&lt;td&gt;GUI-managed, easy for beginners&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Completion engine&lt;/td&gt;
&lt;td&gt;nvim-cmp&lt;/td&gt;
&lt;td&gt;Most widely used, lots of resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Snippets&lt;/td&gt;
&lt;td&gt;LuaSnip&lt;/td&gt;
&lt;td&gt;Smooth integration with nvim-cmp&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Auto-installing language servers with mason.nvim&lt;/h3&gt;
&lt;p&gt;With mason.nvim, you can manage language servers from a GUI via the &lt;code&gt;:Mason&lt;/code&gt; command. With &lt;code&gt;mason-lspconfig&lt;/code&gt;, you can also auto-install the language servers you need.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;neovim/nvim-lspconfig&quot;,
  dependencies = {
    &quot;williamboman/mason.nvim&quot;,
    &quot;williamboman/mason-lspconfig.nvim&quot;,
  },
  config = function()
    -- Initialize mason (enables the :Mason command)
    require(&quot;mason&quot;).setup()

    -- Specify which language servers to auto-install
    require(&quot;mason-lspconfig&quot;).setup({
      ensure_installed = {
        &quot;lua_ls&quot;,   -- Lua
        &quot;ts_ls&quot;,    -- TypeScript/JavaScript
        &quot;pyright&quot;,  -- Python
      },
    })
  end,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On first launch, the specified language servers are installed automatically.&lt;/p&gt;
&lt;h3&gt;Neovim 0.11&apos;s new LSP configuration API&lt;/h3&gt;
&lt;p&gt;Neovim 0.11 significantly changed how LSP is configured.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The old way&lt;/strong&gt; (depends on nvim-lspconfig):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;local lspconfig = require(&quot;lspconfig&quot;)
lspconfig.lua_ls.setup({
  settings = {
    Lua = {
      diagnostics = { globals = { &quot;vim&quot; } },
    },
  },
})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;The new way&lt;/strong&gt; (Neovim 0.11.3+ native):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-- Per-language-server configuration
vim.lsp.config(&apos;lua_ls&apos;, {
  settings = {
    Lua = {
      diagnostics = { globals = { &quot;vim&quot; } },
    },
  },
})

-- Enable language servers
vim.lsp.enable({ &quot;lua_ls&quot;, &quot;pyright&quot;, &quot;ts_ls&quot; })
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: nvim-lspconfig isn&apos;t deprecated. Internally it functions as a wrapper that calls &lt;code&gt;vim.lsp.config&lt;/code&gt;, so the traditional &lt;code&gt;lspconfig.xxx.setup({})&lt;/code&gt; form still works. For new setups, the new API is recommended since it reduces plugin dependencies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Benefits of the new API:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Built into Neovim, so it should remain stable long-term.&lt;/li&gt;
&lt;li&gt;Simple, easy-to-read syntax.&lt;/li&gt;
&lt;li&gt;Supports file-based configuration (under &lt;code&gt;~/.config/nvim/lsp/&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
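&lt;p&gt;As a sketch of the file-based form (the filename determines the server name; this mirrors the &lt;code&gt;lua_ls&lt;/code&gt; settings above):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-- ~/.config/nvim/lsp/lua_ls.lua
-- The returned table is merged as if passed to vim.lsp.config(&apos;lua_ls&apos;, ...);
-- you still opt in with vim.lsp.enable({ &quot;lua_ls&quot; }) in init.lua.
return {
  settings = {
    Lua = {
      diagnostics = { globals = { &quot;vim&quot; } },
    },
  },
}
&lt;/code&gt;&lt;/pre&gt;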
&lt;h3&gt;LSP keymap setup&lt;/h3&gt;
&lt;p&gt;Bind LSP features to keys. Using the &lt;code&gt;LspAttach&lt;/code&gt; event ensures keymaps are only set on buffers where LSP is active.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;vim.api.nvim_create_autocmd(&quot;LspAttach&quot;, {
  callback = function(args)
    local bufnr = args.buf
    local opts = { buffer = bufnr, silent = true }

    -- Jump to definition / declaration / implementation
    vim.keymap.set(&quot;n&quot;, &quot;gd&quot;, vim.lsp.buf.definition, opts)
    vim.keymap.set(&quot;n&quot;, &quot;gD&quot;, vim.lsp.buf.declaration, opts)
    vim.keymap.set(&quot;n&quot;, &quot;gi&quot;, vim.lsp.buf.implementation, opts)

    -- Documentation and edit operations
    vim.keymap.set(&quot;n&quot;, &quot;K&quot;, vim.lsp.buf.hover, opts)
    vim.keymap.set(&quot;n&quot;, &quot;&amp;lt;leader&amp;gt;rn&quot;, vim.lsp.buf.rename, opts)
    vim.keymap.set(&quot;n&quot;, &quot;&amp;lt;leader&amp;gt;ca&quot;, vim.lsp.buf.code_action, opts)

    -- Navigate diagnostics
    vim.keymap.set(&quot;n&quot;, &quot;[d&quot;, vim.diagnostic.goto_prev, opts)
    vim.keymap.set(&quot;n&quot;, &quot;]d&quot;, vim.diagnostic.goto_next, opts)
    vim.keymap.set(&quot;n&quot;, &quot;&amp;lt;leader&amp;gt;e&quot;, vim.diagnostic.open_float, opts)
  end,
})
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Why use the LspAttach event&lt;/h4&gt;
&lt;p&gt;There are several ways to set up keymaps:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Global keymaps&lt;/td&gt;
&lt;td&gt;Active in every buffer&lt;/td&gt;
&lt;td&gt;Keys are bound even in files without LSP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lspconfig.on_attach&lt;/td&gt;
&lt;td&gt;Configured via lspconfig&lt;/td&gt;
&lt;td&gt;Depends on lspconfig&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LspAttach event&lt;/td&gt;
&lt;td&gt;Set when LSP attaches&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;LspAttach&lt;/code&gt; is a built-in Neovim event, so it doesn&apos;t depend on any plugin and works reliably.&lt;/p&gt;
&lt;h3&gt;Completion setup with nvim-cmp&lt;/h3&gt;
&lt;p&gt;Basic configuration for the &lt;code&gt;nvim-cmp&lt;/code&gt; completion engine.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;hrsh7th/nvim-cmp&quot;,
  dependencies = {
    &quot;hrsh7th/cmp-nvim-lsp&quot;,   -- LSP completion source
    &quot;hrsh7th/cmp-buffer&quot;,     -- Buffer completion
    &quot;hrsh7th/cmp-path&quot;,       -- Path completion
    &quot;L3MON4D3/LuaSnip&quot;,       -- Snippet engine
    &quot;saadparwaiz1/cmp_luasnip&quot;, -- Snippet completion
  },
  config = function()
    local cmp = require(&quot;cmp&quot;)
    local luasnip = require(&quot;luasnip&quot;)

    cmp.setup({
      -- Snippet expansion settings
      snippet = {
        expand = function(args)
          luasnip.lsp_expand(args.body)
        end,
      },

      -- Keymaps
      mapping = cmp.mapping.preset.insert({
        [&quot;&amp;lt;C-Space&amp;gt;&quot;] = cmp.mapping.complete(),   -- Manually trigger completion
        [&quot;&amp;lt;C-e&amp;gt;&quot;] = cmp.mapping.abort(),          -- Cancel completion
        [&quot;&amp;lt;CR&amp;gt;&quot;] = cmp.mapping.confirm({ select = true }), -- Confirm
        [&quot;&amp;lt;Tab&amp;gt;&quot;] = cmp.mapping(function(fallback)
          if cmp.visible() then
            cmp.select_next_item()  -- Next candidate
          else
            fallback()
          end
        end, { &quot;i&quot;, &quot;s&quot; }),
        [&quot;&amp;lt;S-Tab&amp;gt;&quot;] = cmp.mapping(function(fallback)
          if cmp.visible() then
            cmp.select_prev_item()  -- Previous candidate
          else
            fallback()
          end
        end, { &quot;i&quot;, &quot;s&quot; }),
      }),

      -- Completion source priority
      sources = cmp.config.sources({
        { name = &quot;nvim_lsp&quot; },  -- LSP (highest priority)
        { name = &quot;luasnip&quot; },   -- Snippets
      }, {
        { name = &quot;buffer&quot; },    -- Words in the buffer
        { name = &quot;path&quot; },      -- File paths
      }),
    })
  end,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The completion sources are prioritized: the first group (LSP, snippets) takes precedence, and if there are no matches, candidates come from the next group (buffer, path).&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;On Splitting the Config File&lt;/h2&gt;
&lt;p&gt;All the configuration in this article lives in a single &lt;code&gt;~/.config/nvim/init.lua&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why a single file&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layout&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single file&lt;/td&gt;
&lt;td&gt;Easy to see and search the whole thing&lt;/td&gt;
&lt;td&gt;Hard to read once it gets long&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple files&lt;/td&gt;
&lt;td&gt;Clear separation of concerns, scales better&lt;/td&gt;
&lt;td&gt;Inter-file dependencies get complex&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For a config of around 400 lines, a single file is plenty manageable. You can always split it up once it grows.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;This article covered four configurations for using Neovim comfortably on macOS.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;IME problem&lt;/strong&gt;: Solved with &lt;code&gt;im-select.nvim&lt;/code&gt; + &lt;code&gt;macism&lt;/code&gt;. Setting &lt;code&gt;set_previous_events = {}&lt;/code&gt; (always return to ASCII) is the most programming-friendly choice.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Neovim 0.11 compatibility&lt;/strong&gt;: Use the &lt;code&gt;0.1.x&lt;/code&gt; tag for Telescope (already fixed).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No Nerd Font needed&lt;/strong&gt;: Disable icons in each plugin to get a simple environment.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LSP / completion&lt;/strong&gt;: The mason.nvim + nvim-cmp combo gives you GUI-managed installation and rich completion.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This setup is just one option among many. As you use it, customize it to match your taste and use cases.&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;h3&gt;Official documentation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://neovim.io/doc/user/lsp.html&quot;&gt;Neovim Documentation - LSP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gpanders.com/blog/whats-new-in-neovim-0-11/&quot;&gt;What&apos;s New in Neovim 0.11&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Plugins&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/keaising/im-select.nvim&quot;&gt;im-select.nvim&lt;/a&gt; - Automatic IME switching&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/laishulu/macism&quot;&gt;macism&lt;/a&gt; - macOS IME control CLI&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/neovim/nvim-lspconfig&quot;&gt;nvim-lspconfig&lt;/a&gt; - LSP configuration&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/williamboman/mason.nvim&quot;&gt;mason.nvim&lt;/a&gt; - Language server management&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/hrsh7th/nvim-cmp&quot;&gt;nvim-cmp&lt;/a&gt; - Completion engine&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/nvim-telescope/telescope.nvim&quot;&gt;telescope.nvim&lt;/a&gt; - Fuzzy finder&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/folke/lazy.nvim&quot;&gt;lazy.nvim&lt;/a&gt; - Plugin manager&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Related articles&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://davelage.com/posts/neovim-lsp-0.11/&quot;&gt;Neovim LSP 0.11&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.diovani.com/technology/2025/06/13/configuring-neovim-011-lsp.html&quot;&gt;Configuring Neovim 0.11 LSP from scratch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://0xunicorn.com/neovim-native-lsp-config/&quot;&gt;Native LSP config in Neovim V0.11&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><enclosure url="https://zeroshotlog.com/images/2025/12/30/neovim-macos.png" length="0" type="image/png"/></item><item><title>Auto-Switch Git and GitHub CLI Accounts Just by cd</title><link>https://zeroshotlog.com/en/blog/git-github-multi-account-auto-switch/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/git-github-multi-account-auto-switch/</guid><description>How to auto-switch between work and personal GitHub accounts simply by changing directories. A 3-layer setup combining includeIf, insteadOf, and a gh function wrapper delivers full automation.</description><pubDate>Tue, 30 Dec 2025 10:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Switching between work and personal GitHub accounts requires &lt;strong&gt;three layers&lt;/strong&gt; of configuration.&lt;/li&gt;
&lt;li&gt;Combining &lt;code&gt;includeIf&lt;/code&gt; + &lt;code&gt;insteadOf&lt;/code&gt; + a &lt;code&gt;gh&lt;/code&gt; function wrapper delivers full automation.&lt;/li&gt;
&lt;li&gt;Just &lt;code&gt;cd&lt;/code&gt; into &lt;code&gt;~/work/&lt;/code&gt; and everything switches over to the work account.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;January 2026 update&lt;/strong&gt;: I revised the Layer 3 &lt;code&gt;gh&lt;/code&gt; command switching scheme. The previous &lt;code&gt;chpwd&lt;/code&gt; hook + &lt;code&gt;gh auth switch&lt;/code&gt; approach has been replaced with a function wrapper + &lt;code&gt;GH_TOKEN&lt;/code&gt; environment variable approach. This now handles parallel work across multiple terminal windows.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Background and Problem&lt;/h2&gt;
&lt;h3&gt;The problem I wanted to solve&lt;/h3&gt;
&lt;p&gt;&quot;I want to use separate work and personal GitHub accounts.&quot;&lt;/p&gt;
&lt;p&gt;This is a common challenge. Manual switching works, but it has issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Forgetting to switch&lt;/strong&gt;: Committing to a personal repo with the work email.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational friction&lt;/strong&gt;: Manually toggling &lt;code&gt;git config&lt;/code&gt; and SSH keys every time is tedious.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The gh CLI&lt;/strong&gt;: Not just Git — GitHub CLI (&lt;code&gt;gh&lt;/code&gt;) needs its own switching too.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;The ideal state&lt;/h3&gt;
&lt;p&gt;&quot;Just change directories, and everything switches automatically.&quot;&lt;/p&gt;
&lt;p&gt;Specifically: when I move under &lt;code&gt;~/work/&lt;/code&gt; it should be the work account; otherwise, personal. All of the following should switch automatically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Git commit identity (&lt;code&gt;user.name&lt;/code&gt;, &lt;code&gt;user.email&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;SSH key (GitHub authentication)&lt;/li&gt;
&lt;li&gt;GitHub CLI (&lt;code&gt;gh&lt;/code&gt; command) account&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Options Considered&lt;/h2&gt;
&lt;h3&gt;Option A: Manual configuration each time&lt;/h3&gt;
&lt;p&gt;Manually set &lt;code&gt;git config user.email&lt;/code&gt; per repository.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No additional setup needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Configuration required every time you create a repo.&lt;/li&gt;
&lt;li&gt;Forget to set it, and you commit with the wrong account.&lt;/li&gt;
&lt;li&gt;SSH keys and gh CLI still need separate management.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Option B: direnv + GH_TOKEN&lt;/h3&gt;
&lt;p&gt;Use &lt;a href=&quot;https://direnv.net/&quot;&gt;direnv&lt;/a&gt; to set environment variables per directory.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# ~/work/.envrc
export GH_TOKEN=&quot;ghp_xxxx&quot;
export GIT_AUTHOR_EMAIL=&quot;work@example.com&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unified management via environment variables.&lt;/li&gt;
&lt;li&gt;direnv is widely adopted.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Requires installing direnv.&lt;/li&gt;
&lt;li&gt;Tokens have to be written to a file (security concern).&lt;/li&gt;
&lt;li&gt;Each project needs its own &lt;code&gt;.envrc&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Option C: includeIf + insteadOf + gh function wrapper (chosen)&lt;/h3&gt;
&lt;p&gt;Combine standard Git features with shell functions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No extra tools (uses Git/Zsh built-ins).&lt;/li&gt;
&lt;li&gt;Configure once; everything is automatic afterward.&lt;/li&gt;
&lt;li&gt;No need to write tokens to a file.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many configuration items (3 layers).&lt;/li&gt;
&lt;li&gt;Requires understanding the mechanism.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Comparison Table&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Manual&lt;/th&gt;
&lt;th&gt;direnv&lt;/th&gt;
&lt;th&gt;includeIf + wrappers (chosen)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Initial setup effort&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily friction&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra tools&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;direnv&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Token stored in file&lt;/td&gt;
&lt;td&gt;OS keychain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk of forgetting&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel work across windows&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Final Decision&lt;/h2&gt;
&lt;h3&gt;Adopted: includeIf + insteadOf + gh function wrapper&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Deciding factors&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;No additional tools&lt;/strong&gt;: Achievable with Git and Zsh built-ins alone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No forgetting&lt;/strong&gt;: Determined automatically by directory structure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security&lt;/strong&gt;: gh CLI tokens are stored in the OS keychain.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-window support&lt;/strong&gt;: Per-process auth via environment variables, so other terminals are unaffected.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Trade-offs accepted&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Initial setup is more involved (covered in this article).&lt;/li&gt;
&lt;li&gt;Without understanding the mechanism, troubleshooting is harder.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Why &quot;Three Layers&quot; Are Necessary&lt;/h2&gt;
&lt;p&gt;Here&apos;s the key point. Switching Git and GitHub accounts requires &lt;strong&gt;three distinct configurations&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│                   Your Machine                      │
├─────────────────────────────────────────────────────┤
│                                                     │
│  [Layer 1] Git user settings                        │
│    → Name and email recorded in commits             │
│    → Switched via includeIf in .gitconfig           │
│                                                     │
│  [Layer 2] SSH key                                  │
│    → Authentication for git push/pull to GitHub     │
│    → Switched via SSH host alias + insteadOf        │
│                                                     │
│  [Layer 3] gh CLI account                           │
│    → Account used for gh pr create and other ops    │
│    → Switched via gh function wrapper + GH_TOKEN    │
│                                                     │
└─────────────────────────────────────────────────────┘
                         │
                         ▼
              ┌─────────────────────┐
              │      GitHub         │
              │ (personal or work)  │
              └─────────────────────┘
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Why each layer needs its own configuration&lt;/h3&gt;
&lt;p&gt;Each one is invoked at a &lt;strong&gt;different moment&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;When used&lt;/th&gt;
&lt;th&gt;What it identifies&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Layer 1&lt;/td&gt;
&lt;td&gt;At &lt;code&gt;git commit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Commit author&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer 2&lt;/td&gt;
&lt;td&gt;At &lt;code&gt;git push/pull&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Connection auth to GitHub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer 3&lt;/td&gt;
&lt;td&gt;At &lt;code&gt;gh pr create&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td&gt;The actor for GitHub API operations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Configuring only Layer 1, for example, still leaves you pushing with the wrong SSH key. Full switching only works once all three layers are set up correctly.&lt;/p&gt;
&lt;h2&gt;How to Configure Each Layer&lt;/h2&gt;
&lt;p&gt;The example below treats &lt;code&gt;~/work/&lt;/code&gt; as work and everything else as personal.&lt;/p&gt;
&lt;h3&gt;Layer 1: Auto-switching user settings via includeIf&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;includeIf&lt;/code&gt; is a Git feature that loads a different config file based on a condition.&lt;/p&gt;
&lt;h4&gt;~/.gitconfig (the main config file)&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;# Default (personal) settings
[user]
    name = Your Personal Name
    email = personal@example.com

# Under ~/work/, additionally load the work config
[includeIf &quot;gitdir:~/work/&quot;]
    path = ~/.gitconfig-work
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;~/.gitconfig-work (the work config file)&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;[user]
    name = Your Work Name
    email = work@company.com
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;How it works&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;[includeIf &quot;gitdir:~/work/&quot;]&lt;/code&gt;: Applies when the current repo is under &lt;code&gt;~/work/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;path = ~/.gitconfig-work&lt;/code&gt;: Loads this config file.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Important details&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The path after &lt;code&gt;gitdir:&lt;/code&gt; &lt;strong&gt;must end with &lt;code&gt;/&lt;/code&gt;&lt;/strong&gt; (it&apos;s &lt;code&gt;~/work/&lt;/code&gt;, not &lt;code&gt;~/work&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The trailing &lt;code&gt;/&lt;/code&gt; is interpreted as &lt;code&gt;**&lt;/code&gt; (any subdirectory).&lt;/li&gt;
&lt;li&gt;Settings loaded later take precedence (which is why &lt;code&gt;user.name&lt;/code&gt; is overridden by the work value).&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Verification&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;# Check from a personal directory
cd ~/personal/some-repo
git config user.email
# → personal@example.com

# Check from a work directory
cd ~/work/some-project
git config user.email
# → work@company.com
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To confirm exactly which file each setting comes from:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git config --list --show-origin
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This shows the source file for every setting.&lt;/p&gt;
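&lt;p&gt;Each line is prefixed with the file it came from. Illustrative output from a repo under &lt;code&gt;~/work/&lt;/code&gt;, trimmed to the relevant lines (paths will differ on your machine):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;file:/Users/you/.gitconfig        user.name=Your Personal Name
file:/Users/you/.gitconfig        user.email=personal@example.com
file:/Users/you/.gitconfig-work   user.name=Your Work Name
file:/Users/you/.gitconfig-work   user.email=work@company.com
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The work entries appear last, which is why they win.&lt;/p&gt;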
&lt;h3&gt;Layer 2: SSH host alias + insteadOf for SSH key switching&lt;/h3&gt;
&lt;p&gt;SSH key switching combines two pieces of configuration.&lt;/p&gt;
&lt;h4&gt;Step 1: Define a host alias in the SSH config&lt;/h4&gt;
&lt;p&gt;Add the following to &lt;code&gt;~/.ssh/config&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Personal (default)
Host github.com
    IdentityFile ~/.ssh/id_ed25519_personal

# Work (alias)
Host github-work
    HostName github.com
    IdentityFile ~/.ssh/id_ed25519_work
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;How it works&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Host github.com&lt;/code&gt;: Settings used when connecting to &lt;code&gt;github.com&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Host github-work&lt;/code&gt;: Defines a &lt;strong&gt;fictional host name&lt;/strong&gt; called &lt;code&gt;github-work&lt;/code&gt;.
&lt;ul&gt;
&lt;li&gt;The actual destination is &lt;code&gt;HostName github.com&lt;/code&gt; (real GitHub).&lt;/li&gt;
&lt;li&gt;But the SSH key used is &lt;code&gt;id_ed25519_work&lt;/code&gt; (the work key).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So accessing &lt;code&gt;git@github-work:org/repo.git&lt;/code&gt; uses the work SSH key.&lt;/p&gt;
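&lt;p&gt;You can confirm the alias picks up the work key before wiring up Git, assuming that key is registered with your work GitHub account:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh -T git@github-work
# → Hi work-username! You&apos;ve successfully authenticated, but GitHub does
#   not provide shell access.
&lt;/code&gt;&lt;/pre&gt;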
&lt;h4&gt;Step 2: Auto-rewrite URLs with Git&apos;s insteadOf&lt;/h4&gt;
&lt;p&gt;For work repos, automatically rewrite &lt;code&gt;github.com&lt;/code&gt; access to &lt;code&gt;github-work&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Add the following to &lt;code&gt;~/.gitconfig-work&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[user]
    name = Your Work Name
    email = work@company.com

# Rewrite SSH-form URLs
[url &quot;git@github-work:&quot;]
    insteadOf = git@github.com:

# Rewrite HTTPS-form URLs too
[url &quot;git@github-work:&quot;]
    insteadOf = https://github.com/
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;How it works&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;insteadOf&lt;/code&gt;: Git auto-substitutes the URL during resolution.&lt;/li&gt;
&lt;li&gt;For example, &lt;code&gt;git clone git@github.com:org/repo.git&lt;/code&gt;:
&lt;ul&gt;
&lt;li&gt;Internally becomes &lt;code&gt;git@github-work:org/repo.git&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The SSH config then resolves &lt;code&gt;github-work&lt;/code&gt; to github.com and authenticates with the work SSH key.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
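&lt;p&gt;You can also watch the rewrite happen without any network access: &lt;code&gt;git ls-remote --get-url&lt;/code&gt; expands &lt;code&gt;insteadOf&lt;/code&gt; and exits without contacting the remote (&lt;code&gt;org/repo&lt;/code&gt; is a placeholder):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Run inside any repo where ~/.gitconfig-work applies
git ls-remote --get-url git@github.com:org/repo.git
# → git@github-work:org/repo.git
&lt;/code&gt;&lt;/pre&gt;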
&lt;h4&gt;Verification&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;# Check the remote URL in a work repo
cd ~/work/some-project
git remote -v
# → origin  git@github.com:company/repo.git (fetch)
#    ↑ The displayed value is still github.com

# Check the URL actually used
git config --get-regexp &apos;url.*&apos;
# → url.git@github-work:.insteadof git@github.com:
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Layer 3: gh CLI account switching via a gh function wrapper&lt;/h3&gt;
&lt;p&gt;GitHub CLI (&lt;code&gt;gh&lt;/code&gt;) gained multi-account support in v2.40.0.&lt;/p&gt;
&lt;h4&gt;Prerequisite: log in with both accounts&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;# Log in with the first account
gh auth login

# Log in with the second account (adds it alongside the first)
gh auth login

# Check login status
gh auth status
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;gh auth status&lt;/code&gt; lists every account you&apos;re logged into and which one is currently active.&lt;/p&gt;
&lt;h4&gt;Add to ~/.zshrc&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;########################################
# GitHub CLI account switching (function wrapper approach)
# Work account under ~/work, personal account elsewhere.
# Set the account names in ~/.zshrc.local:
#   GH_PERSONAL_ACCOUNT=&quot;your-personal&quot;
#   GH_WORK_ACCOUNT=&quot;your-work&quot;
########################################
gh() {
  local token
  if [[ &quot;$PWD&quot; == &quot;$HOME/work&quot; || &quot;$PWD&quot; == &quot;$HOME/work/&quot;* ]]; then
    token=$(command gh auth token --user &quot;$GH_WORK_ACCOUNT&quot; 2&amp;gt;/dev/null)
  else
    token=$(command gh auth token --user &quot;$GH_PERSONAL_ACCOUNT&quot; 2&amp;gt;/dev/null)
  fi

  if [[ -n &quot;$token&quot; ]]; then
    GH_TOKEN=&quot;$token&quot; command gh &quot;$@&quot;
  else
    command gh &quot;$@&quot;
  fi
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Set account names in ~/.zshrc.local&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;# Set your GitHub usernames
GH_PERSONAL_ACCOUNT=&quot;your-personal-username&quot;
GH_WORK_ACCOUNT=&quot;your-work-username&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;How it works&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;What the function wrapper does&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;It wraps the &lt;code&gt;gh&lt;/code&gt; command in a shell function that, at invocation time, checks the current directory and uses the appropriate token.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Element&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gh auth token --user&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetches the token for the specified account.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GH_TOKEN=&quot;...&quot; command gh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Passes the env var only to that command (process-local).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command gh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Calls the actual &lt;code&gt;gh&lt;/code&gt; command, not the function.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Why the &lt;code&gt;command&lt;/code&gt; keyword matters&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;command gh &quot;$@&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Inside a shell function, calling &lt;code&gt;gh&lt;/code&gt; would recursively call the function itself. The &lt;code&gt;command&lt;/code&gt; keyword bypasses the function and invokes the real binary.&lt;/p&gt;
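&lt;p&gt;A quick way to see the difference in any shell (&lt;code&gt;date&lt;/code&gt; here is just a stand-in):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Shadow a binary with a function, then bypass the function
date() { echo &quot;wrapped&quot;; }
date           # → wrapped (the function runs)
command date   # prints the real date(1) output
&lt;/code&gt;&lt;/pre&gt;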
&lt;p&gt;&lt;strong&gt;Why this approach&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;I previously used a &lt;code&gt;chpwd&lt;/code&gt; hook + &lt;code&gt;gh auth switch&lt;/code&gt;, but &lt;strong&gt;it broke when working across multiple terminal windows in parallel&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Old approach: &lt;code&gt;gh auth switch&lt;/code&gt; mutates global state, so switching in one window affected the other.&lt;/li&gt;
&lt;li&gt;New approach: &lt;code&gt;GH_TOKEN&lt;/code&gt; is only set for the duration of that command (process-local), so other windows are unaffected.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why account names go in ~/.zshrc.local&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Account names are personal, so I don&apos;t want them in my dotfiles repository.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;~/.zshrc.local&lt;/code&gt; is treated as a Git-untracked file.&lt;/li&gt;
&lt;/ul&gt;
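&lt;p&gt;This assumes &lt;code&gt;~/.zshrc&lt;/code&gt; loads the local file when it exists, for example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Load machine-local, Git-untracked settings (account names, etc.)
if [[ -f ~/.zshrc.local ]]; then
  source ~/.zshrc.local
fi
&lt;/code&gt;&lt;/pre&gt;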
&lt;h4&gt;Verification&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;# Check from a work directory
cd ~/work/some-project
gh api user --jq &apos;.login&apos;
# → work-username

# Check from a personal directory
cd ~/personal/my-repo
gh api user --jq &apos;.login&apos;
# → personal-username
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: &lt;code&gt;gh auth status&lt;/code&gt; reports the globally active account, which the function wrapper never changes, so it won&apos;t tell you which account a given directory uses. To check the account actually in effect, run &lt;code&gt;gh api user&lt;/code&gt; as shown above.&lt;/p&gt;
&lt;h2&gt;End-to-End Verification&lt;/h2&gt;
&lt;p&gt;Once everything is configured, verify with these steps:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# 1. Check from a personal directory
cd ~/personal/my-repo
git config user.email          # → personal@example.com
gh api user --jq &apos;.login&apos;      # → personal-username

# 2. Check from a work directory
cd ~/work/company-project
git config user.email          # → work@company.com
gh api user --jq &apos;.login&apos;      # → work-username

# 3. Confirm push works (in a work repo)
git push --dry-run             # Authenticates with the work SSH key
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Results After Adoption&lt;/h2&gt;
&lt;h3&gt;What worked&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Zero forgotten switches&lt;/strong&gt;: Determined automatically by directory structure, no need to think about it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No setup overhead at the start of work&lt;/strong&gt;: Previously I&apos;d always check &quot;which account am I on?&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fewer incidents&lt;/strong&gt;: No more accidental commits/pushes from the wrong account.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Parallel work across windows now works&lt;/strong&gt;: Thanks to the function wrapper approach, work and personal terminals can stay open side by side.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;How I got here&lt;/h3&gt;
&lt;p&gt;I originally adopted the &lt;code&gt;chpwd&lt;/code&gt; hook + &lt;code&gt;gh auth switch&lt;/code&gt; approach, but &lt;strong&gt;it broke when working in parallel across multiple terminal windows&lt;/strong&gt;. Because &lt;code&gt;gh auth switch&lt;/code&gt; mutates global state, switching in one window affected the other.&lt;/p&gt;
&lt;p&gt;So I moved to the function wrapper + &lt;code&gt;GH_TOKEN&lt;/code&gt; environment variable approach. With this approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The env var is only effective during command execution (process-local).&lt;/li&gt;
&lt;li&gt;No &lt;code&gt;cd&lt;/code&gt; hook is needed; it&apos;s simpler.&lt;/li&gt;
&lt;li&gt;Global state (&lt;code&gt;gh auth status&lt;/code&gt;) is never mutated.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What I Learned&lt;/h2&gt;
&lt;h3&gt;1. Git auth and GitHub auth are different things&lt;/h3&gt;
&lt;p&gt;It&apos;s tempting to think &quot;just change Git settings and you&apos;re done,&quot; but in reality there are multiple layers — SSH auth, GitHub API auth, and so on. Each needs to be understood and configured.&lt;/p&gt;
&lt;h3&gt;2. Built-in features are often enough&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;includeIf&lt;/code&gt;, &lt;code&gt;insteadOf&lt;/code&gt;, and shell functions are all standard features. Without adding new tools, I achieved the goal by combining what was already there.&lt;/p&gt;
&lt;h3&gt;3. Pursuing UX is worth it&lt;/h3&gt;
&lt;p&gt;I obsessed over the &quot;just &lt;code&gt;cd&lt;/code&gt;&quot; experience. Setup is complex, but day-to-day operations stay simple. Trading a one-time setup cost for lower ongoing friction was worth it here.&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;h3&gt;Official documentation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://cli.github.com/manual/gh_auth_token&quot;&gt;gh auth token - GitHub CLI Manual&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.github.com/en/github-cli/github-cli/using-multiple-accounts&quot;&gt;Using the GitHub CLI across GitHub platforms - GitHub Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://git-scm.com/docs/git-config#_includes&quot;&gt;git-config - includeIf - Git Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Related articles&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.blog/changelog/2023-12-17-log-in-to-multiple-github-accounts-with-the-cli/&quot;&gt;Log in to multiple GitHub accounts with the CLI - GitHub Changelog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gist.github.com/bgauduch/06a8c4ec2fec8fef6354afe94358c89e&quot;&gt;Git config with multiple identities - GitHub Gist&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><enclosure url="https://zeroshotlog.com/images/2025/12/30/git-multi-account.png" length="0" type="image/png"/></item><item><title>I Built a Tech Blog with Astro</title><link>https://zeroshotlog.com/en/blog/hello-world/</link><guid isPermaLink="true">https://zeroshotlog.com/en/blog/hello-world/</guid><description>I built a blazing-fast tech blog with Astro + Tailwind CSS + Vercel. Here&apos;s why I picked this stack and how I set it up.</description><pubDate>Tue, 30 Dec 2025 09:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Welcome to Zero-Shot Log&lt;/h2&gt;
&lt;p&gt;This blog is built with &lt;strong&gt;Astro&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;Why Astro?&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Ships no client-side JavaScript by default, so static pages stay extremely fast.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DX (Developer Experience)&lt;/strong&gt;: Drop in components from React, Vue, Svelte, or whatever framework you prefer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Markdown-first&lt;/strong&gt;: Content is managed as Markdown, which is great for engineers.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;console.log(&quot;Hello, Astro!&quot;);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;More technical posts coming soon!&lt;/p&gt;
</content:encoded><enclosure url="https://zeroshotlog.com/images/2025/12/30/astro-blog.png" length="0" type="image/png"/></item></channel></rss>