Agent Solutions Reference
Every tool, framework, and standard from the Blueprint Quest. Explained.
Building an AI agent means making eight architectural decisions. Each one shapes what your agent can do, how it fails, and whether it survives production. This directory covers every model, framework, memory system, tool integration, orchestration pattern, security control, governance standard, and deployment platform that appears in the Blueprint Quest. You'll find descriptions, key stats, and links to authoritative documentation. Use it as a companion while you play the Quest, or as a standalone reference when you're deep in a build. No fluff. Just the options you need to evaluate.
Foundation Models
The model you choose sets the ceiling for your agent's reasoning, cost, and deployment flexibility. Frontier models maximize capability. Open-source models maximize control. Routing strategies let you optimize both.
Anthropic's most capable model. Excels at complex reasoning, extended tool use, and agentic workflows that require sustained focus across many steps. Best-in-class for code generation and analysis tasks.
Balanced performance and cost. Fast enough for real-time workflows, smart enough for most agent tasks. The workhorse model for production agents that need to stay within budget without sacrificing quality.
OpenAI's flagship multimodal model. Handles text, vision, and audio natively. Strong tool-calling support with structured outputs and parallel function execution. A reliable default for many agent architectures.
OpenAI's cost-optimized model. Surprisingly capable for its price point. Ideal for high-volume agent tasks where you need thousands of calls per hour without burning through your budget.
Google's top-tier reasoning model with native multimodal support. Its massive context window makes it a strong choice for agents that need to process large documents, codebases, or multi-turn conversations.
Speed-optimized variant with the same context window as Pro. Perfect for latency-sensitive agents that still need strong reasoning. Google's answer to the "fast and smart" category.
Meta's flagship open-weight model. Near-frontier reasoning with full data sovereignty. You can self-host it, fine-tune it, and audit every weight. The go-to for teams that need control over their entire stack.
Mistral AI's largest model at 123B parameters. Strong multilingual capabilities and function calling support. A solid European alternative for teams with EU data residency requirements.
Mixture-of-Experts architecture. Activates only a subset of parameters per token, giving you strong performance at a fraction of the compute cost. Ideal for cost-effective multi-agent deployments where you need many concurrent model calls.
Microsoft's small language model. Punches well above its weight class for its size. Designed for edge and on-device deployment where you need an agent running locally without GPU infrastructure.
Not a single model. It's an architecture pattern. A router dispatches each task to the optimal model based on complexity, cost, and latency requirements. Simple queries go to fast, cheap models. Hard problems go to frontier models. You optimize both cost and capability dynamically.
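The routing idea fits in a few lines. A minimal sketch, with hypothetical model names, budget numbers, and a keyword heuristic standing in for a real complexity classifier:

```python
# Hypothetical model router. Model names, costs, and the keyword
# heuristic are illustrative stand-ins, not any real provider's API.
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    max_cost_per_1k: float  # illustrative budget ceiling per 1K tokens

ROUTES = {
    "simple": Route("small-fast-model", 0.15),
    "hard": Route("frontier-model", 15.00),
}

def classify(task: str) -> str:
    """Crude complexity heuristic; production routers use a classifier model."""
    hard_markers = ("prove", "refactor", "multi-step", "analyze")
    return "hard" if any(m in task.lower() for m in hard_markers) else "simple"

def route(task: str) -> Route:
    """Dispatch each task to the cheapest model that can handle it."""
    return ROUTES[classify(task)]
```

Swap the heuristic for a small classifier model and the dispatch table for your real model catalog, and the shape stays the same.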
Agent Frameworks
Your framework determines how agents reason, remember, and execute. Some give you full graph-based control. Others handle the infrastructure so you can focus on business logic. Choose based on how much abstraction you want.
Graph-based stateful workflows from the LangChain team. You define agent logic as nodes and edges with explicit state transitions. The most production-hardened open-source option with LangSmith observability built in.
The foundational LLM orchestration library. Chains, prompts, retrievers, and tools in a composable package. LangGraph builds on top of it for complex agents, but LangChain alone handles simpler single-agent patterns well.
Role-based multi-agent teams. You define agents with roles, goals, and backstories, then assign them tasks. Intuitive mental model for teams new to multi-agent systems. Great for research and content workflows.
Microsoft's conversational multi-agent framework. Agents collaborate through message-passing in group chats. Now in maintenance mode as Microsoft shifts focus to the Agent Framework and Semantic Kernel. Still used in existing deployments.
Microsoft's enterprise AI orchestration SDK. Plugin-based architecture with deep Azure integration. The natural choice for .NET shops and teams already in the Microsoft ecosystem. Supports the new Microsoft Agent Framework.
Lightweight agent framework from OpenAI. Built around handoffs between specialized agents, with built-in guardrails and tracing. Supports voice agents natively. Tight integration with the OpenAI API.
Anthropic's native agent toolkit. Built for Claude's tool_use architecture with first-class MCP support and computer use capabilities. Low learning curve if you're already building with Claude. Ideal for single-agent patterns.
Fully managed agents on AWS. Built-in guardrails, knowledge bases, and managed sessions. The fastest path from zero to production for customer-facing agents. You trade flexibility for operational simplicity.
Google's Agent Development Kit. Evaluation-first design with built-in testing primitives. Optimized for Gemini models and Google Cloud infrastructure. Strong integration with Vertex AI for production deployment.
Pipeline-as-code from deepset. Modular components snap together for RAG, agents, and document processing. Lightweight and clean. A strong choice if you want explicit, readable pipelines without the overhead of a full agent framework.
Memory Systems
Memory determines whether your agent learns, forgets, or hallucinates. Stateless is safe and simple. Vector databases ground responses in facts. Hybrid architectures give agents both short-term context and long-term knowledge.
Every request is independent. No conversation history, no persistence. The simplest architecture and the easiest to audit. Works for one-shot tasks like classification or single-turn code generation.
A fixed-size buffer of recent messages. Oldest messages drop off as new ones arrive. Simple to implement and predictable in cost. The default starting point for conversational agents that don't need long-term recall.
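The buffer is simple enough to sketch directly. A minimal version, assuming a message-dict format like most chat APIs use:

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the N most recent messages; the oldest drop off automatically."""

    def __init__(self, max_messages: int = 10):
        # deque with maxlen evicts the oldest entry on overflow
        self.buffer = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.buffer.append({"role": role, "content": content})

    def context(self) -> list:
        """The window to send with the next model call."""
        return list(self.buffer)
```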
Managed vector database built for RAG workloads. Handles embedding storage, similarity search, and metadata filtering at scale. You don't manage infrastructure. The most popular choice for production RAG pipelines.
Open-source embedding database that runs locally or in the cloud. Dead simple API. Great for prototyping RAG agents quickly. Scales from a developer laptop to a production cluster without changing your code.
Vector database with built-in vectorization modules. Supports hybrid search combining vector similarity with keyword matching. Strong multimodal support for agents that work with images, not just text.
Vector similarity search as a PostgreSQL extension. If your app already runs on Postgres, you don't need a separate vector database. Add embeddings to your existing tables and query them with SQL. Simple and effective.
In-memory data store with vector search capabilities. Blazing fast for session-level agent memory and short-term context caching. Combine it with a persistent vector DB for a two-tier memory architecture.
Graph database for structured knowledge. Stores entity relationships that agents can traverse for reasoning. Perfect when your agent needs to understand connections between concepts, people, or systems. Not just similarity search. Real structured reasoning.
Full conversation history with LLM-based summarization when you approach the context limit. The model compresses older turns into summaries while keeping recent messages intact. Balances context preservation with token efficiency.
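One way to sketch the compression loop, with a whitespace token proxy and a stub in place of the LLM summarization call (both are assumptions for illustration):

```python
def naive_token_count(text: str) -> int:
    # Rough proxy; real systems use the model's own tokenizer.
    return len(text.split())

def stub_summarize(messages: list) -> str:
    # Placeholder for an LLM call that compresses older turns.
    return f"[summary of {len(messages)} earlier messages]"

class SummarizingMemory:
    """Keep recent turns verbatim; compress older turns near the limit."""

    def __init__(self, token_limit: int = 50, keep_recent: int = 2,
                 summarize=stub_summarize):
        self.messages: list = []
        self.token_limit = token_limit
        self.keep_recent = keep_recent
        self.summarize = summarize

    def add(self, message: str) -> None:
        self.messages.append(message)
        total = sum(naive_token_count(m) for m in self.messages)
        if total > self.token_limit and len(self.messages) > self.keep_recent:
            old = self.messages[:-self.keep_recent]
            recent = self.messages[-self.keep_recent:]
            # Older turns collapse into one summary; recent turns stay intact.
            self.messages = [self.summarize(old)] + recent
```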
Working memory for the current session plus persistent storage for cross-session context. User preferences, learned patterns, and past interactions survive between conversations. The architecture that makes agents feel like they actually know you.
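The two-tier split can be sketched with a plain dict standing in for the persistent store (in production that would be a database or vector store):

```python
class HybridMemory:
    """Two-tier memory: session-scoped working memory plus a
    persistent store that survives across sessions."""

    def __init__(self, persistent_store: dict):
        self.working: list = []              # cleared every session
        self.persistent = persistent_store   # outlives the session

    def remember_turn(self, message: str) -> None:
        self.working.append(message)

    def learn_preference(self, user_id: str, key: str, value: str) -> None:
        self.persistent.setdefault(user_id, {})[key] = value

    def start_new_session(self) -> None:
        self.working.clear()  # long-term knowledge is untouched

    def preferences(self, user_id: str) -> dict:
        return self.persistent.get(user_id, {})
```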
Tool Integration
Tools are what make agents useful. Without them, you just have a chatbot. The choice here is about how your agent discovers, authenticates with, and calls external services. MCP is becoming the universal standard. Everything else has tradeoffs.
Model Context Protocol over the local stdio (stdin/stdout) transport. The MCP server runs on the same machine as your agent. Tools are self-describing. No network latency. Ideal for local development and single-machine deployments.
Model Context Protocol over network transport. MCP servers run remotely and agents connect via HTTP with Server-Sent Events for streaming. Enables cross-environment tool sharing. The production-grade MCP transport for distributed systems.
Define tools as JSON schemas in the API request. The model decides which function to call and generates the arguments. You execute the function and return results. Straightforward. No extra infrastructure needed. The simplest way to give an agent tools.
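The loop is: you declare a schema, the model emits a tool name plus JSON arguments, you execute and return the result. A sketch of the execution side, with a fake weather function and a schema in the general shape providers accept (exact field names vary by API):

```python
import json

# Tool schema in the general JSON Schema shape providers accept;
# exact wrapper fields differ between APIs.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

TOOLS = {"get_weather": get_weather}

def execute_tool_call(call_json: str) -> str:
    """The model chose a tool and generated arguments; we run it."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])
```

The result string goes back to the model as a tool message, and the loop repeats until the model answers in plain text.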
Anthropic's native tool calling interface. Tools defined as JSON schemas with descriptions. Claude selects tools, generates inputs, and you handle execution. Supports parallel tool calls and streaming tool results for responsive UIs.
Auto-generate tool wrappers from existing OpenAPI/Swagger specs. If your APIs already have good specs, you skip writing tool definitions entirely. The agent gets typed, documented tools derived directly from your API contracts.
Framework-native tool ecosystem. Hundreds of pre-built integrations for search engines, databases, APIs, file systems, and more. Consistent interface across all tools. The biggest tool ecosystem in the agent framework space.
Microsoft's plugin architecture for AI tools. Semantic functions (prompts) and native functions (code) share the same plugin interface. Deep integration with Microsoft 365, Azure services, and the broader .NET ecosystem.
Managed tool integration platform with 500+ pre-built connectors. Handles OAuth, authentication, and rate limiting for you. Works across agent frameworks. The fastest way to connect your agent to SaaS tools like Slack, GitHub, Jira, and Salesforce.
Orchestration Patterns
Orchestration is how your agents coordinate. A single agent handles everything alone. A hierarchy delegates to specialists. A swarm lets agents self-organize. More agents means more power, but also more failure modes.
One agent owns the entire task from start to finish. Simplest architecture, easiest to debug, and clearest accountability. Start here unless you have a proven reason for multi-agent. Most problems don't need multiple agents.
A linear chain of specialized agents. Agent A finishes, hands off to Agent B, then Agent C. Predictable flow with clear step boundaries. Great for data processing, content generation, and any workflow that follows a natural sequence.
Multiple agents work simultaneously on different subtasks. Results are aggregated when all complete. Total latency equals the slowest branch, not the sum. Ideal for research gathering, multi-source analysis, and any task that decomposes into independent pieces.
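The fan-out/aggregate shape can be sketched with a thread pool, with a stub standing in for each branch's agent call:

```python
from concurrent.futures import ThreadPoolExecutor

def research_source(source: str) -> str:
    # Stand-in for one branch's agent or API call.
    return f"findings from {source}"

def fan_out(sources: list) -> list:
    """Run every branch concurrently, then aggregate in order.
    Wall-clock latency tracks the slowest branch, not the sum."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        return list(pool.map(research_source, sources))
```

`pool.map` preserves input order, so the aggregation step sees results in a predictable sequence even though branches finish at different times.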
A manager agent decomposes tasks and delegates to worker agents. The manager synthesizes results and decides next steps. Clear accountability chain. The go-to pattern for complex business workflows that need structured delegation and oversight.
Agents collaborate in a shared conversation space. Each agent has a different expertise or perspective. The group converges on a solution through dialogue. Powerful for brainstorming and consensus-building. Hard to predict and harder to debug.
Explicit state graph with conditional transitions. You define every possible state, every edge, and every condition. The agent moves through the graph deterministically based on outputs. LangGraph's native pattern. Maximum control, maximum auditability.
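A toy version of the pattern, independent of any framework: nodes are handler functions, edges are functions that pick the next node from the current state. All names here are illustrative.

```python
# Nodes mutate a shared state dict; edges choose the next node.
def classify(state):
    state["route"] = "simple" if len(state["query"]) < 20 else "complex"
    return state

def answer_simple(state):
    state["answer"] = "quick answer"
    return state

def answer_complex(state):
    state["answer"] = "detailed answer"
    return state

NODES = {"classify": classify, "simple": answer_simple, "complex": answer_complex}
EDGES = {
    "classify": lambda s: s["route"],  # conditional transition
    "simple": lambda s: None,          # terminal node
    "complex": lambda s: None,
}

def run_graph(query: str) -> dict:
    state, node = {"query": query}, "classify"
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node](state)
    return state
```

Because every state, edge, and condition is declared up front, every run is replayable: log the state at each node and you have a full audit trail.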
Agents respond to events and triggers asynchronously. No polling. No waiting. An event fires, the right agent activates, processes, and returns to idle. Pairs naturally with serverless deployment for zero-cost idle periods.
Peer-to-peer agents with no central controller. Agents hand off tasks to each other based on local context. Extremely scalable. Also extremely unpredictable. Cascading failures and rogue agents are real risks. Only for teams with production multi-agent experience.
Security Controls
Agents that call tools, access data, and make decisions are attack surfaces. Security isn't optional once you move past prototyping. The question is how many layers you need. The answer depends on what your agent can do and who it serves.
Sanitize and validate all inputs before they reach the model. Block obvious injection patterns, enforce length limits, and reject malformed requests. The first line of defense. Necessary but not sufficient on its own.
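A minimal validation gate, with toy injection patterns (real injection defense needs far more than regex; these markers are illustrative):

```python
import re

MAX_INPUT_LENGTH = 2000
# Illustrative patterns only; regex alone does not stop prompt injection.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+", re.IGNORECASE),
]

def validate_input(text: str):
    """Return (ok, reason). Run before anything reaches the model."""
    if not text.strip():
        return False, "empty input"
    if len(text) > MAX_INPUT_LENGTH:
        return False, "input too long"
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            return False, "possible prompt injection"
    return True, "ok"
```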
Classify and filter agent outputs for harmful content, PII leakage, and policy violations before they reach the user. Catches cascading hallucinations and misaligned behaviors. The minimum security for any shared agent.
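A toy redaction pass for the PII side of output filtering (production systems use dedicated classifiers, not two regexes):

```python
import re

# Toy PII patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_output(text: str) -> str:
    """Scrub obvious PII before the agent's output reaches the user."""
    text = EMAIL.sub("[REDACTED EMAIL]", text)
    text = SSN.sub("[REDACTED SSN]", text)
    return text
```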
Managed guardrails service on AWS. Configure content filters, denied topics, word filters, and PII redaction without writing code. Applies to both inputs and outputs. The easiest path to basic content safety for Bedrock-hosted agents.
Programmable guardrails toolkit from NVIDIA. Define rails in Colang, a domain-specific language for conversational safety. Supports topical rails, moderation rails, and custom fact-checking flows. Framework-agnostic.
Require human approval before the agent executes high-risk actions. Send money, delete data, contact customers? A human reviews first. Adds latency. Also prevents catastrophic autonomous failures. Critical for customer-facing and financial agents.
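The gate itself is a small piece of code; the approval channel (Slack message, ticket, dashboard) is where the real work lives. A sketch with the approver injected as a callback so it stays testable:

```python
# Hypothetical action names; the approver is any callable that
# returns True/False (in production: a paged human, not a lambda).
HIGH_RISK_ACTIONS = {"send_money", "delete_data", "contact_customer"}

def execute(action: str, payload: dict, approver=None) -> str:
    """Run low-risk actions immediately; gate high-risk ones on approval."""
    if action in HIGH_RISK_ACTIONS:
        if approver is None or not approver(action, payload):
            return "blocked: awaiting human approval"
    return f"executed {action}"
```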
Run agent code in isolated containers with restricted network access and filesystem permissions. Prevents prompt-to-RCE attacks where an agent generates and executes malicious code. Non-negotiable for code generation agents.
Treat each agent as a Non-Human Identity with its own scoped credentials. Strict RBAC, time-limited access tokens, and minimal permissions per task. No shared service accounts. No persistent admin access. The identity layer that regulated environments require.
Cloud Security Alliance's 7-layer threat model for agentic AI. Maps 39 threats across foundation model, data, agent core, tool integration, deployment, ecosystem, and governance layers. The most comprehensive agent threat taxonomy available today.
Adversarial testing specifically designed for AI agents. Probe for prompt injection, tool misuse, privilege escalation, and behavioral drift. CSA's Red Teaming Guide provides structured methodology. Not a one-time exercise. Run it continuously.
Governance Standards
Governance turns ad hoc AI projects into auditable, certifiable, and legally defensible systems. NIST provides the risk framework. ISO 42001 makes it certifiable. The EU AI Act makes it mandatory. Most production agents need at least two of these three.
The voluntary risk management framework from NIST. Four core functions: Govern, Map, Measure, Manage. It won't get you certified, but it gives you structured risk thinking. The foundation that ISO and EU AI Act build on. Start here if you're new to AI governance.
NIST's GenAI Risk Profile. Extends the AI RMF specifically for generative AI risks including hallucination, data privacy, environmental impact, and harmful content generation. Maps 12 risk categories to RMF actions. Essential for any team deploying generative agents.
The certifiable AI management system standard. Plan-Do-Check-Act cycle with auditable controls. Your Statement of Applicability defines which controls apply to your agent. Getting certified proves to customers, regulators, and partners that you govern AI systematically.
The world's first comprehensive AI regulation. Risk-based classification from minimal to unacceptable. High-risk agents (healthcare, finance, HR) must meet Articles 9-15 requirements including risk management, data governance, and human oversight. Non-negotiable for EU market access.
Behavioral Bill of Materials. A structured document describing what your agent can do, which tools it can access, what decisions it can make, and what guardrails constrain it. Think of it as an SBOM but for agent behavior. Required output of any serious governance process.
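A hypothetical shape for such a document; whatever standard you adopt will define its own required fields, so treat these keys as placeholders:

```python
import json

# Hypothetical BBOM fields for a fictional support agent.
bbom = {
    "agent": "support-triage-agent",
    "capabilities": ["classify tickets", "draft replies"],
    "tools": [{"name": "crm_lookup", "scope": "read-only"}],
    "decisions": {"autonomous": ["routing"], "human_approved": ["refunds"]},
    "guardrails": ["input validation", "output PII filter", "HITL for refunds"],
}

bbom_json = json.dumps(bbom, indent=2)
```

The value is less in the format than in the discipline: if a capability or tool isn't listed, the agent shouldn't have it.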
Service Organization Control 2. Not AI-specific, but increasingly relevant for AI agent services. Covers security, availability, processing integrity, confidentiality, and privacy. Many enterprise customers require SOC 2 Type II before they'll trust your agent with their data.
Testing, Evaluation, Verification, and Validation. The ongoing practice of proving your agent does what it claims and nothing more. Not a one-time check. Continuous evaluation with automated monitoring for behavioral drift, performance degradation, and safety violations.
Deployment & Observability
Getting an agent to production is half the battle. Keeping it running, monitored, and debuggable is the other half. Choose your deployment model based on control requirements. Then layer observability tools on top so you can actually see what your agent is doing.
Fully managed agent hosting on AWS. Agents, knowledge bases, guardrails, and monitoring in one service. Auto-scales. Integrates with the full AWS ecosystem. The path of least resistance for teams already on AWS who want production agents fast.
Google Cloud's ML platform with integrated agent hosting via ADK. Evaluation-first tooling, Model Garden for model selection, and tight integration with BigQuery for data-heavy agents. The natural home for Gemini-based agents.
Microsoft's unified AI platform. Managed agents with 1,400+ pre-built connectors. Deep integration with Microsoft 365, Dynamics, and the enterprise Microsoft stack. If your organization runs on Microsoft, this is where your agents live.
Self-hosted containerized deployment. Full control over your infrastructure, networking, and scaling. You manage everything, but you own everything. The right choice for teams with strong DevOps maturity and data sovereignty requirements.
Event-triggered agent execution on AWS Lambda, Google Cloud Functions, or Azure Functions. Zero cost when idle. Auto-scales to demand. Watch out for cold start latency and execution time limits. Best for event-driven agents with bursty traffic patterns.
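The handler shape is the same across providers. A Lambda-style sketch with the `(event, context)` signature AWS's Python runtime expects and the agent call stubbed out:

```python
def run_agent(task: str) -> str:
    # Stand-in for the real agent invocation.
    return f"handled: {task}"

def handler(event, context=None):
    """Lambda-style entry point: an event fires, the agent runs, done.
    No process survives between invocations, so memory must live elsewhere."""
    task = event.get("task", "")
    if not task:
        return {"statusCode": 400, "body": "missing task"}
    return {"statusCode": 200, "body": run_agent(task)}
```

The statelessness is the catch: anything the agent needs to remember between events has to go in an external store, which is why serverless pairs naturally with the persistent memory tiers above.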
LangChain's observability platform. Full trace visibility for every agent run, including LLM calls, tool invocations, and retrieval steps. Built-in evaluation datasets and regression testing. The default choice for LangGraph/LangChain teams.
Open-source LLM observability platform. Self-hostable for data sovereignty. Framework-agnostic tracing, prompt management, and evaluation. A strong alternative to LangSmith when you need full control over your observability data.
ML and LLM observability with a focus on production monitoring. Tracks embeddings, retrieval quality, and model drift over time. Strong visualization for understanding how agent performance changes across releases and traffic patterns.
Evaluation-focused platform for LLM applications. Structured experiments with side-by-side comparison, scoring functions, and dataset management. Helps you answer the question: "Did this change make my agent better or worse?"
Purpose-built observability for AI agents. Session replay, cost tracking, and agent-specific analytics. See exactly what your agent did, which tools it called, how much it cost, and where it went wrong. One line of code to integrate.