Agent Solutions Reference
Every tool, framework, and standard from the Blueprint Quest. Explained.
Building an AI agent means making eight architectural decisions. Each one shapes what your agent can do, how it fails, and whether it survives production. This directory covers every model, framework, memory system, tool integration, orchestration pattern, security control, governance standard, and deployment platform that appears in the Blueprint Quest. You'll find descriptions, key stats, and links to authoritative documentation. Use it as a companion while you play the Quest, or as a standalone reference when you're deep in a build. No fluff. Just the options you need to evaluate.
Foundation Models
The model you choose sets the ceiling for your agent's reasoning, cost, and deployment flexibility. Frontier models maximize capability. Open-source models maximize control. Routing strategies let you optimize both.
Anthropic's most capable model. Excels at complex reasoning, extended tool use, and agentic workflows that require sustained focus across many steps. Best-in-class for code generation and analysis tasks.
Balanced performance and cost. Fast enough for real-time workflows, smart enough for most agent tasks. The workhorse model for production agents that need to stay within budget without sacrificing quality.
OpenAI's flagship multimodal model. Handles text, vision, and audio natively. Strong tool-calling support with structured outputs and parallel function execution. A reliable default for many agent architectures.
OpenAI's cost-optimized model. Surprisingly capable for its price point. Ideal for high-volume agent tasks where you need thousands of calls per hour without burning through your budget.
Google's top-tier reasoning model with native multimodal support. Its massive context window makes it a strong choice for agents that need to process large documents, codebases, or multi-turn conversations.
Speed-optimized variant with the same context window as Pro. Perfect for latency-sensitive agents that still need strong reasoning. Google's answer to the "fast and smart" category.
Meta's flagship open-weight model. Near-frontier reasoning with full data sovereignty. You can self-host it, fine-tune it, and audit every weight. The go-to for teams that need control over their entire stack.
Mistral AI's largest model at 123B parameters. Strong multilingual capabilities and function calling support. A solid European alternative for teams with EU data residency requirements.
Mixture-of-Experts architecture. Activates only a subset of parameters per token, giving you strong performance at a fraction of the compute cost. Ideal for cost-effective multi-agent deployments where you need many concurrent model calls.
Microsoft's small language model. Punches well above its weight class for its size. Designed for edge and on-device deployment where you need an agent running locally without GPU infrastructure.
Not a single model. It's an architecture pattern. A router dispatches each task to the optimal model based on complexity, cost, and latency requirements. Simple queries go to fast, cheap models. Hard problems go to frontier models. You optimize both cost and capability dynamically.
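The routing idea fits in a few lines. A minimal sketch, with hypothetical model names, budget numbers, and a keyword heuristic standing in for a real complexity classifier:

```python
# Hypothetical model router. Model names, costs, and the keyword
# heuristic are illustrative stand-ins, not any real provider's API.
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    max_cost_per_1k: float  # illustrative budget ceiling per 1K tokens

ROUTES = {
    "simple": Route("small-fast-model", 0.15),
    "hard": Route("frontier-model", 15.00),
}

def classify(task: str) -> str:
    """Crude complexity heuristic; production routers use a classifier model."""
    hard_markers = ("prove", "refactor", "multi-step", "analyze")
    return "hard" if any(m in task.lower() for m in hard_markers) else "simple"

def route(task: str) -> Route:
    """Dispatch each task to the cheapest model that can handle it."""
    return ROUTES[classify(task)]
```

Swap the heuristic for a small classifier model and the dispatch table for your real model catalog, and the shape stays the same.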
Agent Frameworks
Your framework determines how agents reason, remember, and execute. Some give you full graph-based control. Others handle the infrastructure so you can focus on business logic. Choose based on how much abstraction you want.
Graph-based stateful workflows from the LangChain team. You define agent logic as nodes and edges with explicit state transitions. The most production-hardened open-source option with LangSmith observability built in.
The foundational LLM orchestration library. Chains, prompts, retrievers, and tools in a composable package. LangGraph builds on top of it for complex agents, but LangChain alone handles simpler single-agent patterns well.
Role-based multi-agent teams. You define agents with roles, goals, and backstories, then assign them tasks. Intuitive mental model for teams new to multi-agent systems. Great for research and content workflows.
Microsoft's conversational multi-agent framework. Agents collaborate through message-passing in group chats. Now in maintenance mode as Microsoft shifts focus to the Agent Framework and Semantic Kernel. Still used in existing deployments.
Microsoft's enterprise AI orchestration SDK. Plugin-based architecture with deep Azure integration. The natural choice for .NET shops and teams already in the Microsoft ecosystem. Supports the new Microsoft Agent Framework.
Lightweight agent framework from OpenAI. Built around handoffs between specialized agents, with built-in guardrails and tracing. Supports voice agents natively. Tight integration with the OpenAI API.
Anthropic's native agent toolkit. Built for Claude's tool_use architecture with first-class MCP support and computer use capabilities. Low learning curve if you're already building with Claude. Ideal for single-agent patterns.
Fully managed agents on AWS. Built-in guardrails, knowledge bases, and managed sessions. The fastest path from zero to production for customer-facing agents. You trade flexibility for operational simplicity.
Google's Agent Development Kit. Evaluation-first design with built-in testing primitives. Optimized for Gemini models and Google Cloud infrastructure. Strong integration with Vertex AI for production deployment.
Pipeline-as-code from deepset. Modular components snap together for RAG, agents, and document processing. Lightweight and clean. A strong choice if you want explicit, readable pipelines without the overhead of a full agent framework.
Memory Systems
Memory determines whether your agent learns, forgets, or hallucinates. Stateless is safe and simple. Vector databases ground responses in facts. Hybrid architectures give agents both short-term context and long-term knowledge.
Every request is independent. No conversation history, no persistence. The simplest architecture and the easiest to audit. Works for one-shot tasks like classification or single-turn code generation.
A fixed-size buffer of recent messages. Oldest messages drop off as new ones arrive. Simple to implement and predictable in cost. The default starting point for conversational agents that don't need long-term recall.
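The buffer is simple enough to sketch directly. A minimal version, assuming a message-dict format like most chat APIs use:

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the N most recent messages; the oldest drop off automatically."""

    def __init__(self, max_messages: int = 10):
        # deque with maxlen evicts the oldest entry on overflow
        self.buffer = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.buffer.append({"role": role, "content": content})

    def context(self) -> list:
        """The window to send with the next model call."""
        return list(self.buffer)
```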
Managed vector database built for RAG workloads. Handles embedding storage, similarity search, and metadata filtering at scale. You don't manage infrastructure. The most popular choice for production RAG pipelines.
Open-source embedding database that runs locally or in the cloud. Dead simple API. Great for prototyping RAG agents quickly. Scales from a developer laptop to a production cluster without changing your code.
Vector database with built-in vectorization modules. Supports hybrid search combining vector similarity with keyword matching. Strong multimodal support for agents that work with images, not just text.
Vector similarity search as a PostgreSQL extension. If your app already runs on Postgres, you don't need a separate vector database. Add embeddings to your existing tables and query them with SQL. Simple and effective.
In-memory data store with vector search capabilities. Blazing fast for session-level agent memory and short-term context caching. Combine it with a persistent vector DB for a two-tier memory architecture.
Graph database for structured knowledge. Stores entity relationships that agents can traverse for reasoning. Perfect when your agent needs to understand connections between concepts, people, or systems. Not just similarity search. Real structured reasoning.
Full conversation history with LLM-based summarization when you approach the context limit. The model compresses older turns into summaries while keeping recent messages intact. Balances context preservation with token efficiency.
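One way to sketch the compression loop, with a whitespace token proxy and a stub in place of the LLM summarization call (both are assumptions for illustration):

```python
def naive_token_count(text: str) -> int:
    # Rough proxy; real systems use the model's own tokenizer.
    return len(text.split())

def stub_summarize(messages: list) -> str:
    # Placeholder for an LLM call that compresses older turns.
    return f"[summary of {len(messages)} earlier messages]"

class SummarizingMemory:
    """Keep recent turns verbatim; compress older turns near the limit."""

    def __init__(self, token_limit: int = 50, keep_recent: int = 2,
                 summarize=stub_summarize):
        self.messages: list = []
        self.token_limit = token_limit
        self.keep_recent = keep_recent
        self.summarize = summarize

    def add(self, message: str) -> None:
        self.messages.append(message)
        total = sum(naive_token_count(m) for m in self.messages)
        if total > self.token_limit and len(self.messages) > self.keep_recent:
            old = self.messages[:-self.keep_recent]
            recent = self.messages[-self.keep_recent:]
            # Older turns collapse into one summary; recent turns stay intact.
            self.messages = [self.summarize(old)] + recent
```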
Working memory for the current session plus persistent storage for cross-session context. User preferences, learned patterns, and past interactions survive between conversations. The architecture that makes agents feel like they actually know you.
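The two-tier split can be sketched with a plain dict standing in for the persistent store (in production that would be a database or vector store):

```python
class HybridMemory:
    """Two-tier memory: session-scoped working memory plus a
    persistent store that survives across sessions."""

    def __init__(self, persistent_store: dict):
        self.working: list = []              # cleared every session
        self.persistent = persistent_store   # outlives the session

    def remember_turn(self, message: str) -> None:
        self.working.append(message)

    def learn_preference(self, user_id: str, key: str, value: str) -> None:
        self.persistent.setdefault(user_id, {})[key] = value

    def start_new_session(self) -> None:
        self.working.clear()  # long-term knowledge is untouched

    def preferences(self, user_id: str) -> dict:
        return self.persistent.get(user_id, {})
```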
Tool Integration
Tools are what make agents useful. Without them, you just have a chatbot. The choice here is about how your agent discovers, authenticates with, and calls external services. MCP is becoming the universal standard. Everything else has tradeoffs.
Model Context Protocol over the local stdio (stdin/stdout) transport. The MCP server runs on the same machine as your agent. Tools are self-describing. No network latency. Ideal for local development and single-machine deployments.
Model Context Protocol over network transport. MCP servers run remotely and agents connect via HTTP with Server-Sent Events for streaming. Enables cross-environment tool sharing. The production-grade MCP transport for distributed systems.
Define tools as JSON schemas in the API request. The model decides which function to call and generates the arguments. You execute the function and return results. Straightforward. No extra infrastructure needed. The simplest way to give an agent tools.
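The loop is: you declare a schema, the model emits a tool name plus JSON arguments, you execute and return the result. A sketch of the execution side, with a fake weather function and a schema in the general shape providers accept (exact field names vary by API):

```python
import json

# Tool schema in the general JSON Schema shape providers accept;
# exact wrapper fields differ between APIs.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

TOOLS = {"get_weather": get_weather}

def execute_tool_call(call_json: str) -> str:
    """The model chose a tool and generated arguments; we run it."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])
```

The result string goes back to the model as a tool message, and the loop repeats until the model answers in plain text.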
Anthropic's native tool calling interface. Tools defined as JSON schemas with descriptions. Claude selects tools, generates inputs, and you handle execution. Supports parallel tool calls and streaming tool results for responsive UIs.
Auto-generate tool wrappers from existing OpenAPI/Swagger specs. If your APIs already have good specs, you skip writing tool definitions entirely. The agent gets typed, documented tools derived directly from your API contracts.
Framework-native tool ecosystem. Hundreds of pre-built integrations for search engines, databases, APIs, file systems, and more. Consistent interface across all tools. The biggest tool ecosystem in the agent framework space.
Microsoft's plugin architecture for AI tools. Semantic functions (prompts) and native functions (code) share the same plugin interface. Deep integration with Microsoft 365, Azure services, and the broader .NET ecosystem.
Managed tool integration platform with 500+ pre-built connectors. Handles OAuth, authentication, and rate limiting for you. Works across agent frameworks. The fastest way to connect your agent to SaaS tools like Slack, GitHub, Jira, and Salesforce.
Orchestration Patterns
Orchestration is how your agents coordinate. A single agent handles everything alone. A hierarchy delegates to specialists. A swarm lets agents self-organize. More agents means more power, but also more failure modes.
One agent owns the entire task from start to finish. Simplest architecture, easiest to debug, and clearest accountability. Start here unless you have a proven reason for multi-agent. Most problems don't need multiple agents.
A linear chain of specialized agents. Agent A finishes, hands off to Agent B, then Agent C. Predictable flow with clear step boundaries. Great for data processing, content generation, and any workflow that follows a natural sequence.
Multiple agents work simultaneously on different subtasks. Results are aggregated when all complete. Total latency equals the slowest branch, not the sum. Ideal for research gathering, multi-source analysis, and any task that decomposes into independent pieces.
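The fan-out/aggregate shape can be sketched with a thread pool, with a stub standing in for each branch's agent call:

```python
from concurrent.futures import ThreadPoolExecutor

def research_source(source: str) -> str:
    # Stand-in for one branch's agent or API call.
    return f"findings from {source}"

def fan_out(sources: list) -> list:
    """Run every branch concurrently, then aggregate in order.
    Wall-clock latency tracks the slowest branch, not the sum."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        return list(pool.map(research_source, sources))
```

`pool.map` preserves input order, so the aggregation step sees results in a predictable sequence even though branches finish at different times.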
A manager agent decomposes tasks and delegates to worker agents. The manager synthesizes results and decides next steps. Clear accountability chain. The go-to pattern for complex business workflows that need structured delegation and oversight.
Agents collaborate in a shared conversation space. Each agent has a different expertise or perspective. The group converges on a solution through dialogue. Powerful for brainstorming and consensus-building. Hard to predict and harder to debug.
Explicit state graph with conditional transitions. You define every possible state, every edge, and every condition. The agent moves through the graph deterministically based on outputs. LangGraph's native pattern. Maximum control, maximum auditability.
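A toy version of the pattern, independent of any framework: nodes are handler functions, edges are functions that pick the next node from the current state. All names here are illustrative.

```python
# Nodes mutate a shared state dict; edges choose the next node.
def classify(state):
    state["route"] = "simple" if len(state["query"]) < 20 else "complex"
    return state

def answer_simple(state):
    state["answer"] = "quick answer"
    return state

def answer_complex(state):
    state["answer"] = "detailed answer"
    return state

NODES = {"classify": classify, "simple": answer_simple, "complex": answer_complex}
EDGES = {
    "classify": lambda s: s["route"],  # conditional transition
    "simple": lambda s: None,          # terminal node
    "complex": lambda s: None,
}

def run_graph(query: str) -> dict:
    state, node = {"query": query}, "classify"
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node](state)
    return state
```

Because every state, edge, and condition is declared up front, every run is replayable: log the state at each node and you have a full audit trail.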
Agents respond to events and triggers asynchronously. No polling. No waiting. An event fires, the right agent activates, processes, and returns to idle. Pairs naturally with serverless deployment for zero-cost idle periods.
Peer-to-peer agents with no central controller. Agents hand off tasks to each other based on local context. Extremely scalable. Also extremely unpredictable. Cascading failures and rogue agents are real risks. Only for teams with production multi-agent experience.
Security Controls
Agents that call tools, access data, and make decisions are attack surfaces. Security isn't optional once you move past prototyping. The question is how many layers you need. The answer depends on what your agent can do and who it serves.
Sanitize and validate all inputs before they reach the model. Block obvious injection patterns, enforce length limits, and reject malformed requests. The first line of defense. Necessary but not sufficient on its own.
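A minimal validation gate, with toy injection patterns (real injection defense needs far more than regex; these markers are illustrative):

```python
import re

MAX_INPUT_LENGTH = 2000
# Illustrative patterns only; regex alone does not stop prompt injection.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+", re.IGNORECASE),
]

def validate_input(text: str):
    """Return (ok, reason). Run before anything reaches the model."""
    if not text.strip():
        return False, "empty input"
    if len(text) > MAX_INPUT_LENGTH:
        return False, "input too long"
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            return False, "possible prompt injection"
    return True, "ok"
```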
Classify and filter agent outputs for harmful content, PII leakage, and policy violations before they reach the user. Catches cascading hallucinations and misaligned behaviors. The minimum security for any shared agent.
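A toy redaction pass for the PII side of output filtering (production systems use dedicated classifiers, not two regexes):

```python
import re

# Toy PII patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_output(text: str) -> str:
    """Scrub obvious PII before the agent's output reaches the user."""
    text = EMAIL.sub("[REDACTED EMAIL]", text)
    text = SSN.sub("[REDACTED SSN]", text)
    return text
```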
Managed guardrails service on AWS. Configure content filters, denied topics, word filters, and PII redaction without writing code. Applies to both inputs and outputs. The easiest path to basic content safety for Bedrock-hosted agents.
Programmable guardrails toolkit from NVIDIA. Define rails in Colang, a domain-specific language for conversational safety. Supports topical rails, moderation rails, and custom fact-checking flows. Framework-agnostic.
Require human approval before the agent executes high-risk actions. Send money, delete data, contact customers? A human reviews first. Adds latency. Also prevents catastrophic autonomous failures. Critical for customer-facing and financial agents.
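The gate itself is a small piece of code; the approval channel (Slack message, ticket, dashboard) is where the real work lives. A sketch with the approver injected as a callback so it stays testable:

```python
# Hypothetical action names; the approver is any callable that
# returns True/False (in production: a paged human, not a lambda).
HIGH_RISK_ACTIONS = {"send_money", "delete_data", "contact_customer"}

def execute(action: str, payload: dict, approver=None) -> str:
    """Run low-risk actions immediately; gate high-risk ones on approval."""
    if action in HIGH_RISK_ACTIONS:
        if approver is None or not approver(action, payload):
            return "blocked: awaiting human approval"
    return f"executed {action}"
```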
Run agent code in isolated containers with restricted network access and filesystem permissions. Prevents prompt-to-RCE attacks where an agent generates and executes malicious code. Non-negotiable for code generation agents.
Treat each agent as a Non-Human Identity with its own scoped credentials. Strict RBAC, time-limited access tokens, and minimal permissions per task. No shared service accounts. No persistent admin access. The identity layer that regulated environments require.
Cloud Security Alliance's 7-layer threat model for agentic AI. Maps 39 threats across foundation model, data, agent core, tool integration, deployment, ecosystem, and governance layers. The most comprehensive agent threat taxonomy available today.
Adversarial testing specifically designed for AI agents. Probe for prompt injection, tool misuse, privilege escalation, and behavioral drift. CSA's Red Teaming Guide provides structured methodology. Not a one-time exercise. Run it continuously.
Governance Standards
Governance turns ad hoc AI projects into auditable, certifiable, and legally defensible systems. NIST provides the risk framework. ISO 42001 makes it certifiable. The EU AI Act makes it mandatory. Most production agents need at least two of these three.
The voluntary risk management framework from NIST. Four core functions: Govern, Map, Measure, Manage. It won't get you certified, but it gives you structured risk thinking. The foundation that ISO and EU AI Act build on. Start here if you're new to AI governance.
NIST's GenAI Risk Profile. Extends the AI RMF specifically for generative AI risks including hallucination, data privacy, environmental impact, and harmful content generation. Maps 12 risk categories to RMF actions. Essential for any team deploying generative agents.
The certifiable AI management system standard. Plan-Do-Check-Act cycle with auditable controls. Your Statement of Applicability defines which controls apply to your agent. Getting certified proves to customers, regulators, and partners that you govern AI systematically.
The world's first comprehensive AI regulation. Risk-based classification from minimal to unacceptable. High-risk agents (healthcare, finance, HR) must meet Articles 9-15 requirements including risk management, data governance, and human oversight. Non-negotiable for EU market access.
Behavioral Bill of Materials. A structured document describing what your agent can do, which tools it can access, what decisions it can make, and what guardrails constrain it. Think of it as an SBOM but for agent behavior. Required output of any serious governance process.
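A hypothetical shape for such a document; whatever standard you adopt will define its own required fields, so treat these keys as placeholders:

```python
import json

# Hypothetical BBOM fields for a fictional support agent.
bbom = {
    "agent": "support-triage-agent",
    "capabilities": ["classify tickets", "draft replies"],
    "tools": [{"name": "crm_lookup", "scope": "read-only"}],
    "decisions": {"autonomous": ["routing"], "human_approved": ["refunds"]},
    "guardrails": ["input validation", "output PII filter", "HITL for refunds"],
}

bbom_json = json.dumps(bbom, indent=2)
```

The value is less in the format than in the discipline: if a capability or tool isn't listed, the agent shouldn't have it.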
Service Organization Control 2. Not AI-specific, but increasingly relevant for AI agent services. Covers security, availability, processing integrity, confidentiality, and privacy. Many enterprise customers require SOC 2 Type II before they'll trust your agent with their data.
Testing, Evaluation, Verification, and Validation. The ongoing practice of proving your agent does what it claims and nothing more. Not a one-time check. Continuous evaluation with automated monitoring for behavioral drift, performance degradation, and safety violations.
Deployment & Observability
Getting an agent to production is half the battle. Keeping it running, monitored, and debuggable is the other half. Choose your deployment model based on control requirements. Then layer observability tools on top so you can actually see what your agent is doing.
Fully managed agent hosting on AWS. Agents, knowledge bases, guardrails, and monitoring in one service. Auto-scales. Integrates with the full AWS ecosystem. The path of least resistance for teams already on AWS who want production agents fast.
Google Cloud's ML platform with integrated agent hosting via ADK. Evaluation-first tooling, Model Garden for model selection, and tight integration with BigQuery for data-heavy agents. The natural home for Gemini-based agents.
Microsoft's unified AI platform. Managed agents with 1,400+ pre-built connectors. Deep integration with Microsoft 365, Dynamics, and the enterprise Microsoft stack. If your organization runs on Microsoft, this is where your agents live.
Self-hosted containerized deployment. Full control over your infrastructure, networking, and scaling. You manage everything, but you own everything. The right choice for teams with strong DevOps maturity and data sovereignty requirements.
Event-triggered agent execution on AWS Lambda, Google Cloud Functions, or Azure Functions. Zero cost when idle. Auto-scales to demand. Watch out for cold start latency and execution time limits. Best for event-driven agents with bursty traffic patterns.
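The handler shape is the same across providers. A Lambda-style sketch with the `(event, context)` signature AWS's Python runtime expects and the agent call stubbed out:

```python
def run_agent(task: str) -> str:
    # Stand-in for the real agent invocation.
    return f"handled: {task}"

def handler(event, context=None):
    """Lambda-style entry point: an event fires, the agent runs, done.
    No process survives between invocations, so memory must live elsewhere."""
    task = event.get("task", "")
    if not task:
        return {"statusCode": 400, "body": "missing task"}
    return {"statusCode": 200, "body": run_agent(task)}
```

The statelessness is the catch: anything the agent needs to remember between events has to go in an external store, which is why serverless pairs naturally with the persistent memory tiers above.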
LangChain's observability platform. Full trace visibility for every agent run, including LLM calls, tool invocations, and retrieval steps. Built-in evaluation datasets and regression testing. The default choice for LangGraph/LangChain teams.
Open-source LLM observability platform. Self-hostable for data sovereignty. Framework-agnostic tracing, prompt management, and evaluation. A strong alternative to LangSmith when you need full control over your observability data.
ML and LLM observability with a focus on production monitoring. Tracks embeddings, retrieval quality, and model drift over time. Strong visualization for understanding how agent performance changes across releases and traffic patterns.
Evaluation-focused platform for LLM applications. Structured experiments with side-by-side comparison, scoring functions, and dataset management. Helps you answer the question: "Did this change make my agent better or worse?"
Purpose-built observability for AI agents. Session replay, cost tracking, and agent-specific analytics. See exactly what your agent did, which tools it called, how much it cost, and where it went wrong. One line of code to integrate.