2 Books · 36 Chapters · ~182K Words

The AI Engineer's Library

Production handbooks for engineers shipping AI.

Two books that cover the full operational stack for LLM features in production — from tracing and evals to multi-step agents and recovery. Written for backend and platform engineers.

Book 1: Observability · Book 2: Agents

Paperback · Available now on Amazon

Book 1

Observability for LLM Applications

Tracing, Evals, and Shipping AI You Can Trust

The production handbook for backend and platform engineers shipping LLM features. Covers OpenTelemetry GenAI semantic conventions, the 2026 tool landscape (Langfuse, LangSmith, Arize Phoenix, Braintrust), evals as a first-class observability signal, cost tracking, drift detection, and incident response. 18 chapters, ~88K words.

What you'll learn

OpenTelemetry GenAI semantic conventions

Langfuse, LangSmith, Arize Phoenix, Braintrust

Evals as a first-class observability signal

Cost tracking and token-level accounting

Drift detection and model regression

Incident response for AI systems

Prompt versioning and lineage

Structured logging for LLM pipelines

Paperback on Amazon

Available now in Paperback and Hardcover. Kindle eBook coming soon.

Book 2

Agents in Production

Building, Tracing, and Shipping Multi-Step AI You Can Trust

How to build, trace, evaluate, guard, deploy, and recover LLM agents in production. Covers LangGraph, OpenAI Agents SDK, Anthropic Claude Agent SDK, CrewAI, Microsoft agent-framework, Meta's Agents Rule of Two, and the full agent operations stack. 18 chapters, ~94K words.

What you'll learn

LangGraph, OpenAI Agents SDK, Claude Agent SDK

CrewAI and Microsoft agent-framework

Meta's Agents Rule of Two

Multi-step agent tracing and debugging

Guardrails and safety layers

Agent evaluation frameworks

Deployment patterns and rollback

Recovery and self-healing agents

Paperback on Amazon

Available now in Paperback and Hardcover. Kindle eBook coming soon.

The series

Two books. One operational stack.

Book 1 builds the observability foundation — tracing, evals, cost tracking, drift detection. Book 2 applies it to agents — multi-step workflows, guardrails, deployment, recovery. Read Book 1 first for the foundation, then Book 2 for agents specifically.

Observability

18 chapters · ~88K words

The foundation. Instrument, trace, evaluate, and monitor LLM features with OpenTelemetry and the modern tool landscape. Know when your AI is drifting before your users do.

Agents

18 chapters · ~94K words

The application. Build, trace, guard, deploy, and recover multi-step LLM agents with LangGraph, OpenAI Agents SDK, Claude Agent SDK, CrewAI, and Microsoft agent-framework. From single-tool calls to autonomous workflows.

Written for

→ Backend engineers adding LLM features to existing services

→ Platform engineers building internal AI infrastructure

→ Tech leads responsible for AI reliability and cost

→ SREs who need to monitor and debug LLM-powered systems

→ Engineers evaluating agent frameworks for production use

Ship AI you can actually trust.

About the author

Gabriel Anhaia

Senior Software Engineer based in Berlin. 10+ years of experience building scalable backend systems for fintechs and high-growth companies.

Also the author of the Actually Learn series, Hexagonal Architecture in Go, The Complete Guide to Go Programming, and creator of Hermes IDE.

xgabriel.com

Ship AI with confidence

Stop guessing.
Observe. Build. Ship.

Two books covering the full production stack for LLM features and AI agents. Available now on Amazon.