2 Books · 36 Chapters · ~182K Words

The AI Engineer's Library

Production handbooks for engineers shipping AI.

Two books that cover the full operational stack for LLM features in production — from tracing and evals to multi-step agents and recovery. Written for backend and platform engineers.

Observability for LLM Applications · Agents in Production

Paperback · Available now on Amazon

Book 1

Observability for LLM Applications

Tracing, Evals, and Shipping AI You Can Trust

The production handbook for backend and platform engineers shipping LLM features. Covers OpenTelemetry GenAI semantic conventions, the 2026 tool landscape (Langfuse, LangSmith, Arize Phoenix, Braintrust), evals as a first-class observability signal, cost tracking, drift detection, and incident response. 18 chapters, ~88K words.

What you'll learn

OpenTelemetry GenAI semantic conventions
Langfuse, LangSmith, Arize Phoenix, Braintrust
Evals as a first-class observability signal
Cost tracking and token-level accounting
Drift detection and model regression
Incident response for AI systems
Prompt versioning and lineage
Structured logging for LLM pipelines

Available now in Paperback and Hardcover. Kindle eBook coming soon.

Book 2

Agents in Production

Building, Tracing, and Shipping Multi-Step AI You Can Trust

How to build, trace, evaluate, guard, deploy, and recover LLM agents in production. Covers LangGraph, OpenAI Agents SDK, Anthropic Claude Agent SDK, CrewAI, Microsoft agent-framework, Meta's Agents Rule of Two, and the full agent operations stack. 18 chapters, ~94K words.

What you'll learn

LangGraph, OpenAI Agents SDK, Claude Agent SDK
CrewAI and Microsoft agent-framework
Meta's Agents Rule of Two
Multi-step agent tracing and debugging
Guardrails and safety layers
Agent evaluation frameworks
Deployment patterns and rollback
Recovery and self-healing agents

Available now in Paperback and Hardcover. Kindle eBook coming soon.

The series

Two books. One operational stack.

Book 1 builds the observability foundation — tracing, evals, cost tracking, drift detection. Book 2 applies it to agents — multi-step workflows, guardrails, deployment, recovery. Read Book 1 first for the foundation, then Book 2 for agents specifically.

Observability

18 chapters · ~88K words

The foundation. Instrument, trace, evaluate, and monitor LLM features with OpenTelemetry and the modern tool landscape. Know when your AI is drifting before your users do.

Agents

18 chapters · ~94K words

The application. Build, trace, guard, deploy, and recover multi-step LLM agents with LangGraph, OpenAI Agents SDK, Claude Agent SDK, CrewAI, and Microsoft agent-framework. From single-tool calls to autonomous workflows.

Written for

Backend engineers adding LLM features to existing services
Platform engineers building internal AI infrastructure
Tech leads responsible for AI reliability and cost
SREs who need to monitor and debug LLM-powered systems
Engineers evaluating agent frameworks for production use

Ship AI you can actually trust.

About the author

Gabriel Anhaia

Senior Software Engineer based in Berlin. 10+ years of experience building scalable backend systems for fintechs and high-growth companies.

Also the author of the Actually Learn series, Hexagonal Architecture in Go, The Complete Guide to Go Programming, and creator of Hermes IDE.

Ship AI with confidence

Stop guessing.
Observe. Build. Ship.

Two books covering the full production stack for LLM features and AI agents. Available now on Amazon.

Paperback · Available now