[ Blog_Posts ]

Claude Code + Codex Plugin: Two AI Brains, One Terminal

How to use OpenAI's Codex plugin inside Claude Code — turning Claude Opus and GPT-5.4 into a dual-brain coding system. Setup, commands, rescue workflows, and when each brain wins.

2026-04-07 Harrison Guo 5 min read claude-code codex ai-agent openai plugin dual-ai

Claude Code Deep Dive Part 4: Why It Uses Markdown Files Instead of Vector DBs

Claude Code's memory system looks simple on purpose. This piece breaks down the tradeoffs behind Markdown memories, Sonnet side-queries, and the decision to avoid vector databases.

2026-04-05 Harrison Guo 9 min read claude-code memory ai-agent openclaw hermes-agent architecture

Claude Code Deep Dive Part 3: The 5-Level Compression Pipeline Behind 1M Tokens

Claude Code doesn't just stuff conversations into a 1M-token window. It uses a 5-level compression pipeline, cache-aware edits, and a final autocompact fallback to keep sessions alive.

2026-04-04 Harrison Guo 9 min read claude-code context-engineering ai-agent compression source-leak

Claude Code Deep Dive Part 2: The 1,421-Line While Loop That Runs Everything

Inside query.ts — the 1,729-line async generator that is Claude Code's beating heart. 10 steps per iteration, 9 continue points, 4-stage compression, and streaming tool execution. With line numbers.

2026-04-03 Harrison Guo 9 min read claude-code ai-agent architecture source-leak query-loop

Observability and Billing for AI API Calls: A T-Shaped Architecture

AI API calls are unlike ordinary RPC: per-request cost varies 100×, tokens and models are first-class, streaming muddies timing, caching changes the pricing. A T-shaped instrumentation architecture — shared stem, specialized arms — that handles tracing, billing, and cost analytics without any of them contaminating the others.

2026-04-01 Harrison Guo 13 min read ai-infrastructure llm observability billing cost-attribution architecture openai anthropic backend-engineering system-design ai-operations

Claude Code's Memory Is Simpler Than You Think — And That's a Problem

I read the leaked source code. Claude Code's memory system is just Markdown files + an LLM picker. No vector search, no embeddings, no RAG. Here's why that matters.

2026-04-01 Harrison Guo 6 min read claude-code memory ai-agent openclaw source-leak

Claude Code Source Leaked: 5 Hidden Features Found in 510K Lines of Code

Claude Code v2.1.88 accidentally exposed its entire source. We found hidden pets, undercover mode, permanent memory, and more.

2026-03-31 Harrison Guo 7 min read claude-code source-leak anthropic ai-agent security

Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works

Strong, eventual, causal, read-your-writes, linearizable — consistency models are taught as a taxonomy. Production uses them as a menu. Ten scenarios, the right consistency choice for each, and the engineering that makes the choice work.

2026-03-28 Harrison Guo 13 min read distributed-systems consistency cap-theorem pacelc eventual-consistency linearizability saga crdt database system-design backend-engineering

The AI Stack Explained: LLM Talks, Program Walks

A first-principles breakdown of the entire AI stack — from LLM to Agent in one mental model. An LLM can only output text. Everything else is the program.

2026-03-28 Harrison Guo 8 min read ai llm function-calling mcp agent rag ai-architecture first-principles

gRPC Interceptors in Production: Design Patterns That Survive Real Load

gRPC interceptors are where cross-cutting concerns live — auth, tracing, retry, metrics, rate limiting. Most examples online show toy single-interceptor demos. Production systems need to stack, order, and compose them correctly. A practical guide.

2026-03-24 Harrison Guo 9 min read golang go grpc interceptors middleware distributed-systems observability backend-engineering api-design microservices

