CrewAI: Multi-Agent Orchestration Framework for Python
The definitive guide to CrewAI -- the role-based multi-agent orchestration framework for Python. From core concepts (Agent, Task, Crew, Process) to tools integration, memory systems, Flows API, LLM backends, advanced patterns, framework comparison, and production deployment with CrewAI+.
By Jose Nobile | Published 2026-04-23 | 14 min read
1. What Is CrewAI?
CrewAI is an open-source Python framework for orchestrating autonomous AI agents that collaborate as a crew. Created by João Moura and released in late 2023, it models multi-agent systems as teams of role-playing specialists -- each agent has a role, a goal, and a backstory that shapes its behavior. Agents work together on tasks organized into sequential or hierarchical processes, producing structured outputs that flow through the pipeline.
The core design philosophy is role-based collaboration over rigid programming. Instead of writing explicit control flow for every decision, you define agents with natural-language roles and goals, assign them tasks with expected outputs, and let the framework orchestrate their collaboration. This makes CrewAI particularly effective for knowledge work automation: research, content creation, analysis, planning, and any workflow where multiple specialists contribute to a shared outcome.
As of April 2026, CrewAI has over 25,000 GitHub stars and is one of the most widely adopted multi-agent frameworks in the Python ecosystem. The framework supports any LLM provider (OpenAI, Anthropic Claude, Google Gemini, Ollama, Groq, Azure OpenAI, and more), includes a built-in memory system, integrates with MCP servers and LangChain tools, and offers CrewAI+ -- a managed platform for production deployment with monitoring, testing, and enterprise features.
When to Use Multi-Agent
Use CrewAI when your workflow requires multiple distinct perspectives or specializations -- research + writing, analysis + review, planning + execution. Multi-agent shines when tasks benefit from division of labor, when different steps need different LLM configurations, or when you want agents to critique and refine each other's work.
When a Single Agent Suffices
If your workflow is a single question-answer loop or a linear tool-calling chain, a single agent (via the OpenAI Agents SDK or Claude Agent SDK) is simpler and faster. CrewAI adds value when the complexity of coordination between specialists justifies the orchestration overhead.
CrewAI's Sweet Spot
CrewAI excels at structured multi-step workflows where each step has a clear owner: content pipelines (research, draft, edit, publish), data analysis (collect, process, analyze, report), customer operations (triage, resolve, follow-up), and project planning (scope, estimate, schedule, assign).
2. Core Concepts
Agent
An autonomous unit with a role, goal, and backstory. Each agent wraps an LLM and can use tools, delegate tasks to other agents, and maintain memory. Agents are the workers of your crew -- think of them as specialized team members with distinct expertise and responsibilities.
Task
A unit of work assigned to an agent. Each task has a description, expected output format, and optionally a context (list of other tasks whose outputs feed into this one). Tasks are the building blocks of your workflow -- they define what needs to be done and what the result should look like.
Crew
The orchestrator that brings agents and tasks together. A Crew defines which agents participate, what tasks they execute, which process type governs execution order, and shared configuration like memory, verbosity, and LLM settings. Calling crew.kickoff() starts the orchestration.
Process
The execution strategy for the crew. Sequential runs tasks one by one in order. Hierarchical uses a manager agent to delegate tasks dynamically. A consensual process, in which agents would decide collaboratively, is planned but not yet implemented, so Process.sequential and Process.hierarchical are the two options available today. The process type determines how collaboration flows.
Tool
A capability that agents can use to interact with the outside world -- search the web, read files, query databases, call APIs, execute code. CrewAI supports built-in tools, custom function tools, LangChain tools, and MCP server tools. Tools bridge the gap between LLM reasoning and real-world actions.
from crewai import Agent, Task, Crew, Process
# Define agents
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find and synthesize the latest information on {topic}",
    backstory="You are an expert research analyst with 15 years of "
              "experience. You excel at finding patterns in data and "
              "presenting clear, actionable insights.",
    verbose=True
)
writer = Agent(
    role="Technical Writer",
    goal="Write a compelling technical article based on research findings",
    backstory="You are a skilled technical writer who transforms complex "
              "research into clear, engaging content for developers."
)
# Define tasks
research_task = Task(
    description="Research the latest developments in {topic}. "
                "Focus on key trends, major players, and technical details.",
    expected_output="A comprehensive research brief with key findings, "
                    "statistics, and sources.",
    agent=researcher
)
writing_task = Task(
    description="Write a technical article based on the research findings.",
    expected_output="A polished 1500-word technical article with clear "
                    "sections, code examples, and actionable takeaways.",
    agent=writer,
    context=[research_task]  # receives output from research_task
)
# Create and run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True
)
result = crew.kickoff(inputs={"topic": "multi-agent AI systems"})
3. Agent Design
Agent design is the most important part of building effective crews. Each agent is defined by three natural-language fields that shape its behavior: role (what the agent is), goal (what it is trying to achieve), and backstory (context that gives the agent personality and expertise). These fields are injected into the system prompt, so crafting them well is critical for agent performance.
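To make the prompt injection concrete, the plain-Python sketch below shows one way the three identity fields could be rendered into a system prompt. The template wording is an assumption modeled loosely on CrewAI's role-playing prompt, not a copy of the library's internals, and AgentIdentity and build_system_prompt are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AgentIdentity:
    # The three natural-language fields that define an agent's persona
    role: str
    goal: str
    backstory: str


def build_system_prompt(agent: AgentIdentity) -> str:
    """Render role, backstory, and goal into one system prompt (assumed template)."""
    return (
        f"You are {agent.role}. {agent.backstory}\n"
        f"Your personal goal is: {agent.goal}"
    )


analyst = AgentIdentity(
    role="Senior Data Analyst specializing in financial markets",
    goal="Produce accurate financial analysis with actionable recommendations",
    backstory="You have 10 years of experience in equity research.",
)
print(build_system_prompt(analyst))
```

Because every word of these fields reaches the model, vague roles or goals produce equally vague behavior.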
Beyond the core identity fields, agents accept configuration for LLM assignment (llm parameter), tool access (tools list), delegation behavior (allow_delegation), maximum iteration limits (max_iter), maximum execution time (max_execution_time), and memory settings. You can assign different LLMs to different agents -- use a powerful model for complex reasoning tasks and a fast, cheap model for simple data extraction.
Role Definition
The role is a job title that sets the agent's expertise domain. Be specific: "Senior Data Analyst specializing in financial markets" is better than "Analyst". The role appears in the system prompt and influences how the LLM approaches problems. Good roles create clear boundaries between agents.
Goal Setting
The goal tells the agent what it is trying to achieve. Goals should be outcome-oriented, not process-oriented: "Produce accurate financial analysis with actionable recommendations" rather than "Analyze data". Goals support {variable} interpolation, so they adapt dynamically to each crew run.
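The {variable} placeholders behave much like Python format fields, filled from the inputs dict passed to crew.kickoff(). This stdlib-only sketch (interpolate is a hypothetical helper, not CrewAI's implementation) also shows the value of failing loudly when an input is missing instead of sending a literal "{topic}" to the model.

```python
import string


def interpolate(template: str, inputs: dict) -> str:
    """Fill {placeholders} in template from inputs; raise if any are missing."""
    # string.Formatter().parse yields (literal, field_name, spec, conversion);
    # field_name is None for trailing literal text, so it is filtered out.
    names = {field for _, field, _, _ in string.Formatter().parse(template) if field}
    missing = names - set(inputs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return template.format(**inputs)


goal = "Find and synthesize the latest information on {topic}"
print(interpolate(goal, {"topic": "multi-agent AI systems"}))
```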
Backstory Crafting
The backstory provides context and personality. It grounds the agent in a realistic professional identity, which improves output quality. Include relevant experience, working style, and domain knowledge. Backstories of 2-4 sentences work well -- enough to establish expertise without overwhelming the context window.
LLM Assignment
Each agent can use a different LLM via the llm parameter. Pass a model string like "openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", or "ollama/llama3.2". This lets you optimize cost and quality: expensive models for reasoning-heavy agents, fast models for data extraction or formatting agents.
Delegation Control
When allow_delegation=True, an agent can ask other agents in the crew to help with sub-tasks. This enables emergent collaboration, such as a writer asking a researcher for more data. Recent CrewAI releases default to allow_delegation=False, which keeps agents strictly independent and reduces token usage and execution time. Enable delegation only on agents that benefit from it.
Execution Limits
Use max_iter (default 20) to cap the number of reasoning iterations per task, preventing infinite loops. Use max_execution_time (in seconds) to set a hard time limit. Use max_rpm to rate-limit API calls per minute. These guardrails prevent runaway costs and ensure predictable execution.
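These guardrails are easiest to picture as a bounded loop. In this sketch, run_with_limits is a hypothetical helper and step(i) stands in for one reasoning iteration that returns a final answer or None. One difference from the real framework is worth noting: when max_iter is reached, CrewAI asks the agent for its best final answer instead of failing, whereas this sketch simply raises.

```python
import time
from typing import Callable, Optional


def run_with_limits(
    step: Callable[[int], Optional[str]],
    max_iter: int = 20,
    max_execution_time: Optional[float] = None,
) -> str:
    """Call step(i) until it returns an answer, or stop at an iteration/time limit."""
    start = time.monotonic()
    for i in range(max_iter):
        # Hard wall-clock limit, checked before each new iteration
        if max_execution_time is not None and time.monotonic() - start > max_execution_time:
            raise TimeoutError(f"stopped after {i} iterations: time limit reached")
        answer = step(i)
        if answer is not None:
            return answer
    # Iteration cap reached without a final answer
    raise RuntimeError(f"no final answer within max_iter={max_iter}")


print(run_with_limits(lambda i: "done" if i == 2 else None))
```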
from crewai import Agent, LLM
# financial_data_tool and sec_filing_tool are placeholders for tools you define
# yourself (for example with the @tool decorator shown in section 6).
# Agent with specific LLM and configuration
analyst = Agent(
    role="Senior Financial Analyst",
    goal="Produce accurate financial analysis with actionable "
         "investment recommendations for {company}",
    backstory="You are a CFA charterholder with 20 years of experience "
              "in equity research. You specialize in technology sector "
              "analysis and are known for your rigorous, data-driven "
              "approach to valuation.",
    llm=LLM(model="anthropic/claude-sonnet-4-20250514", temperature=0.1),
    tools=[financial_data_tool, sec_filing_tool],
    allow_delegation=False,
    max_iter=15,
    max_execution_time=300,  # 5 minutes
    verbose=True
)
# YAML-based agent definition (config/agents.yaml)
# researcher:
# role: "Senior Research Analyst"
# goal: "Find comprehensive information on {topic}"
# backstory: "Expert researcher with deep domain knowledge."
# llm: openai/gpt-4o
# max_iter: 10
4. Task Definition
Tasks are the units of work in CrewAI. Each task has a description (what to do), an expected_output (what the result should look like), and an agent (who does it). The description and expected_output support {variable} interpolation, so you can parameterize tasks at runtime via crew.kickoff(inputs={...}).
The context parameter creates data dependencies between tasks. When a task lists other tasks in its context, it receives their outputs as additional input. This is how information flows through a crew: the researcher's output feeds the writer, and the writer's output feeds the editor. Context describes a directed acyclic graph (DAG) of dependencies, but it does not reorder execution. In a sequential crew, tasks still run in the order they are listed, so every task referenced in context should appear earlier in the tasks list.
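A small sketch makes the ordering rule concrete. SketchTask and run_sequential are hypothetical stand-ins, not CrewAI classes. Tasks run in list order, each one receives the outputs of its context tasks, and a reference to a task that has not run yet is reported as an error.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SketchTask:
    name: str
    run: Callable[[List[str]], str]  # receives the outputs of its context tasks
    context: List["SketchTask"] = field(default_factory=list)


def run_sequential(tasks: List[SketchTask]) -> Dict[str, str]:
    """Execute tasks in list order, passing each task its context outputs."""
    outputs: Dict[str, str] = {}
    for task in tasks:
        for dep in task.context:
            if dep.name not in outputs:
                raise ValueError(f"{task.name} depends on {dep.name}, which has not run yet")
        outputs[task.name] = task.run([outputs[dep.name] for dep in task.context])
    return outputs


research = SketchTask("research", lambda ctx: "brief: 3 key trends")
write = SketchTask("write", lambda ctx: f"article based on [{ctx[0]}]", context=[research])
print(run_sequential([research, write])["write"])
```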
Description Best Practices
Write clear, specific descriptions that tell the agent exactly what to do. Include constraints, scope boundaries, and quality criteria. Poor: "Research AI". Good: "Research the top 5 multi-agent orchestration frameworks released in 2025-2026, comparing their architecture, adoption, and production readiness."
Expected Output
The expected_output field is critical -- it tells the agent exactly what format and content to produce. Be precise: "A JSON object with keys: frameworks (array of objects with name, architecture, stars, pros, cons)" is better than "A summary". Use output_json or output_pydantic for structured validation.
Context Chaining
Context creates data flow between tasks. A task with context=[task_a, task_b] receives the outputs of both tasks as additional input. This creates a pipeline where each stage builds on previous results. Context works in both sequential and hierarchical processes.
Async Execution
Set async_execution=True to run a task concurrently with the next task in the sequence. This is useful when tasks are independent and can run in parallel -- for example, researching two different topics simultaneously. The crew waits for all async tasks to complete before moving to dependent tasks.
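The effect of async_execution=True resembles submitting independent work to a thread pool and blocking only where a dependent task needs the results. This is a plain-Python analogy using concurrent.futures, not CrewAI's scheduler. The function names are hypothetical, and time.sleep stands in for LLM latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def market_analysis(topic: str) -> str:
    time.sleep(0.2)  # stand-in for an LLM call
    return f"market view of {topic}"


def tech_analysis(topic: str) -> str:
    time.sleep(0.2)  # stand-in for an LLM call
    return f"tech view of {topic}"


def synthesize(parts: List[str]) -> str:
    return " | ".join(parts)


def run_pipeline(topic: str) -> str:
    with ThreadPoolExecutor() as pool:
        # Both independent tasks start immediately...
        futures = [pool.submit(market_analysis, topic), pool.submit(tech_analysis, topic)]
        # ...and the dependent task blocks until both results are ready.
        return synthesize([f.result() for f in futures])


start = time.monotonic()
print(run_pipeline("agents"))
print(f"elapsed: {time.monotonic() - start:.2f}s")  # about 0.2s, not 0.4s
```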
Structured Output
Use output_json=MyModel or output_pydantic=MyModel to enforce structured output validation with Pydantic models. The agent's output is parsed and validated against the schema. If validation fails, the agent retries with feedback about what went wrong. This ensures reliable, machine-readable outputs.
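Conceptually, structured output means parsing the agent's raw text and rejecting anything that does not fit the schema. This dependency-free sketch uses a dataclass instead of Pydantic. parse_report is a hypothetical helper, and the 0 to 1 range on confidence_score is an assumed constraint added for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ResearchReport:
    topic: str
    key_findings: List[str]
    recommendations: List[str]
    confidence_score: float


def parse_report(raw: str) -> ResearchReport:
    """Parse JSON text into a ResearchReport, raising on missing or invalid fields."""
    data = json.loads(raw)  # JSONDecodeError (a ValueError) on malformed text
    report = ResearchReport(
        topic=str(data["topic"]),  # KeyError if a field is missing
        key_findings=[str(x) for x in data["key_findings"]],
        recommendations=[str(x) for x in data["recommendations"]],
        confidence_score=float(data["confidence_score"]),
    )
    # Assumed domain rule: confidence is a probability-like score
    if not 0.0 <= report.confidence_score <= 1.0:
        raise ValueError("confidence_score must be between 0 and 1")
    return report
```

Malformed JSON, a missing field, and an out-of-range score each raise an exception, which gives a caller a clear signal to retry or escalate.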
Task Callbacks
Attach a callback function to any task to execute custom logic when the task completes. Callbacks receive the task output and can trigger side effects: save to database, send notifications, update dashboards, or feed results into external systems. Useful for integrating CrewAI into larger application pipelines.
from crewai import Task
from pydantic import BaseModel
from typing import List
# Assumes researcher, market_analyst, tech_analyst, and writer are Agents defined earlier.
# Structured output model
class ResearchReport(BaseModel):
    topic: str
    key_findings: List[str]
    recommendations: List[str]
    confidence_score: float
# Task with structured output and context
research_task = Task(
    description="Research {topic} and produce a structured analysis. "
                "Focus on: current state, key trends, major players, "
                "and technical challenges. Use available search tools.",
    expected_output="A structured research report with key findings, "
                    "actionable recommendations, and confidence score.",
    agent=researcher,
    output_pydantic=ResearchReport
)
# Async tasks that run in parallel
market_task = Task(
    description="Analyze market trends for {topic}.",
    expected_output="Market analysis with growth projections.",
    agent=market_analyst,
    async_execution=True
)
tech_task = Task(
    description="Analyze technical landscape for {topic}.",
    expected_output="Technical analysis with architecture comparisons.",
    agent=tech_analyst,
    async_execution=True
)
# Task with callback: receives the task's output object when the task completes
def save_report(output):
    with open("report.md", "w", encoding="utf-8") as f:
        f.write(output.raw)
    print(f"Report saved: {len(output.raw)} characters")
final_task = Task(
    description="Synthesize all research into a final report.",
    expected_output="A comprehensive report combining all analyses.",
    agent=writer,
    context=[market_task, tech_task],  # waits for both async tasks
    callback=save_report
)
5. Process Types
The process type determines how CrewAI orchestrates task execution. Choosing the right process depends on your workflow's coordination needs: predictable pipelines use sequential, and complex delegation uses hierarchical. A consensual process for collaborative decision-making is on the roadmap but not yet available.
Sequential Process
Tasks execute one after another in the order they are listed. Each task's output is available as context for subsequent tasks. This is the default process and the simplest to reason about. Best for linear workflows: research then write then edit. Predictable execution order, predictable costs, easy to debug.
Hierarchical Process
A manager agent (automatically created or custom) receives all tasks and delegates them to the most appropriate agent based on their roles and goals. The manager can re-assign tasks, request revisions, and coordinate the workflow dynamically. Requires a manager_llm or manager_agent. Best for complex workflows where task assignment depends on intermediate results.
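In CrewAI the manager is itself an LLM that reads each agent's role and goal before delegating. The toy router below replaces that judgment with simple word overlap, purely to illustrate the matching step. pick_agent, the stopword list, and the agent descriptions are all hypothetical.

```python
import re
from typing import Dict, Set

# Common function words that carry no signal about which agent fits a task
STOPWORDS = {"a", "an", "the", "and", "of", "to", "for", "about", "from", "with"}


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def pick_agent(task_description: str, agents: Dict[str, str]) -> str:
    """Return the agent whose role/goal text shares the most words with the task."""
    task_words = tokenize(task_description)
    return max(agents, key=lambda name: len(task_words & tokenize(agents[name])))


agents = {
    "researcher": "Research Analyst: research and summarize the findings",
    "writer": "Technical Writer: write an article from research",
    "editor": "Editor: review and edit the draft",
}
print(pick_agent("Write an article about the findings", agents))
```

A real manager also weighs intermediate results and can re-assign work, which is exactly what keyword matching cannot do.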
Consensual Process (Planned)
A consensual process, in which agents collectively decide on task execution through discussion or voting, is listed in the CrewAI documentation as planned but not yet implemented. If it ships, expect higher token usage from inter-agent deliberation in exchange for more nuanced outputs on subjective, creative tasks.
from crewai import Agent, Crew, Process
# Assumes researcher, writer, editor and their tasks are defined as in earlier sections.
# Sequential: tasks run in listed order
sequential_crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.sequential,
    verbose=True
)
# Hierarchical: manager delegates to agents
hierarchical_crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.hierarchical,
    manager_llm="openai/gpt-4o",  # LLM for the auto-created manager
    verbose=True
)
# Hierarchical with custom manager (passed separately, not in agents)
manager = Agent(
    role="Project Manager",
    goal="Coordinate the team to produce the highest quality output",
    backstory="Experienced PM who excels at delegating tasks to the "
              "right specialists and ensuring quality standards.",
    allow_delegation=True
)
custom_manager_crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.hierarchical,
    manager_agent=manager,
    verbose=True
)
6. Tools Integration
Built-in Tools
CrewAI ships with crewai-tools, a package of ready-to-use tools: SerperDevTool (web search), ScrapeWebsiteTool (web scraping), FileReadTool, DirectoryReadTool, PDFSearchTool, CSVSearchTool, CodeInterpreterTool, YoutubeVideoSearchTool, and more. Install with pip install crewai-tools.
Custom Tools
Build custom tools by subclassing BaseTool or using the @tool decorator. The decorator approach is simpler -- annotate a function with @tool, add type hints and a docstring, and CrewAI auto-generates the schema. Custom tools can access APIs, databases, file systems, or any Python library.
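Schema generation can be pictured as reading the function's signature, type hints, and docstring. This sketch uses inspect and typing to build a JSON-Schema-style description. tool_schema is a hypothetical helper, and the schema CrewAI actually generates (built on Pydantic) will differ in detail.

```python
import inspect
from typing import get_type_hints

# Map Python annotations to JSON Schema type names; anything else falls back to "string"
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict:
    """Describe a function as a tool: name, docstring, typed parameters, required args."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    params = inspect.signature(func).parameters
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": {
                name: {"type": _JSON_TYPES.get(hints.get(name), "string")} for name in params
            },
            # Parameters without defaults must be supplied by the model
            "required": [n for n, p in params.items() if p.default is inspect.Parameter.empty],
        },
    }


def query_database(sql: str, limit: int = 100) -> str:
    """Execute a read-only SQL query and return the rows as text."""
    return ""  # body irrelevant for schema generation


print(tool_schema(query_database))
```

This is why the docstring and type hints matter: they are the only description of the tool the model ever sees.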
LangChain Tools
LangChain tools can be reused with CrewAI agents. Depending on your CrewAI version, you either pass them in directly or wrap them in a small BaseTool subclass. Either way, you gain access to LangChain's large tool ecosystem, with integrations covering databases, APIs, cloud services, and specialized AI capabilities.
MCP Server Tools
CrewAI integrates with Model Context Protocol (MCP) servers, giving agents access to the growing ecosystem of MCP tools. Connect any MCP server (file systems, GitHub, databases, Slack, custom servers) and its tools become available to your agents automatically.
RAG Tools
Built-in RAG tools enable agents to search through documents, PDFs, CSVs, and websites semantically. The PDFSearchTool, CSVSearchTool, and DirectorySearchTool use embeddings for semantic search over local files. For custom knowledge bases, use the RagTool with your own vector store.
Code Execution
The CodeInterpreterTool gives agents the ability to write and execute Python code in a sandboxed environment. Agents can perform calculations, generate visualizations, process data, and run experiments -- bridging the gap between reasoning and computation.
import os
import sqlite3
import requests
from crewai import Agent
from crewai.tools import tool, BaseTool
from crewai_tools import SerperDevTool, ScrapeWebsiteTool
# Built-in tools (SerperDevTool expects SERPER_API_KEY in the environment)
search = SerperDevTool()
scraper = ScrapeWebsiteTool()
# Custom tool with decorator
@tool("Database Query")
def query_database(sql: str) -> str:
    """Execute a SQL query against the analytics database and return results.
    Use standard SQL syntax. Tables: users, orders, products, events."""
    # Demo only: in production, run model-written SQL on a read-only connection.
    conn = sqlite3.connect("analytics.db")
    try:
        result = conn.execute(sql).fetchall()
    finally:
        conn.close()
    return str(result)
# Custom tool with class
class GitHubTool(BaseTool):
    name: str = "GitHub PR Reviewer"
    description: str = "Fetch and analyze pull request details from GitHub"
    # GitHub token read from the environment (GITHUB_TOKEN is an assumed variable name)
    api_key: str = os.environ.get("GITHUB_TOKEN", "")
    def _run(self, repo: str, pr_number: int) -> str:
        resp = requests.get(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}",
            headers={"Authorization": f"token {self.api_key}"},
            timeout=30
        )
        resp.raise_for_status()  # surface 401/404 errors instead of a KeyError below
        pr = resp.json()
        return (f"PR #{pr_number}: {pr['title']} ({pr['state']}, "
                f"+{pr['additions']}/-{pr['deletions']})")
# Assign tools to agents
researcher = Agent(
    role="Research Analyst",
    goal="Find accurate, up-to-date information",
    backstory="Expert researcher with strong analytical skills.",
    tools=[search, scraper, query_database]
)
7. Memory System
CrewAI includes a multi-layer memory system that gives agents the ability to learn and retain information across tasks and crew executions. Memory is disabled by default and enabled with memory=True on the Crew. When enabled, agents can recall previous interactions, build up entity knowledge, and improve their performance over time.
Short-Term Memory
Stores information within the current crew execution. Short-term memory is shared across agents in the crew, enabling them to reference each other's recent outputs and maintain coherence throughout the workflow. Implemented using RAG with embeddings for efficient retrieval. Automatically populated from task outputs.
Long-Term Memory
Persists across crew executions using a local SQLite database. Long-term memory stores task results, successful strategies, and learned patterns. Over time, agents develop institutional knowledge -- they remember what worked before and apply those lessons to new tasks. This creates a flywheel effect where crew performance improves with use.
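A minimal sketch of the long-term idea uses the standard sqlite3 module. It stores a lesson per task with a quality score, then recalls the best-scoring lessons the next time the same kind of task runs. The LongTermStore class and its table layout are hypothetical, not CrewAI's actual storage schema.

```python
import sqlite3
from typing import List


class LongTermStore:
    """Toy long-term memory: lessons keyed by task type, ranked by score."""

    def __init__(self, path: str = ":memory:"):
        # Use a file path instead of ":memory:" to persist across runs
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS lessons (task TEXT, lesson TEXT, score REAL)"
        )

    def save(self, task: str, lesson: str, score: float) -> None:
        with self.conn:  # commits the insert
            self.conn.execute("INSERT INTO lessons VALUES (?, ?, ?)", (task, lesson, score))

    def recall(self, task: str, limit: int = 3) -> List[str]:
        rows = self.conn.execute(
            "SELECT lesson FROM lessons WHERE task = ? ORDER BY score DESC LIMIT ?",
            (task, limit),
        ).fetchall()
        return [row[0] for row in rows]


store = LongTermStore()
store.save("write_article", "Lead with the key finding", 0.9)
store.save("write_article", "Avoid jargon in the intro", 0.6)
print(store.recall("write_article"))
```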
Entity Memory
Tracks knowledge about specific entities (people, organizations, projects, concepts) across interactions. When an agent encounters information about an entity, it is stored and can be recalled in future interactions. This lets agents accumulate structured knowledge about the people, organizations, and concepts in the domain they work in.
Custom Storage Backends
Override the default storage with custom backends by implementing the memory provider interface. Use external vector stores (Pinecone, Weaviate, Qdrant, ChromaDB), cloud databases, or enterprise knowledge management systems as the backing store for any memory layer.
from crewai import Crew
# Module paths follow the CrewAI docs at the time of writing and may move between releases.
from crewai.memory import LongTermMemory, ShortTermMemory, EntityMemory
from crewai.memory.storage.rag_storage import RAGStorage
from crewai.memory.storage.ltm_sqlite_storage import LTMSQLiteStorage
# Enable memory with default storage
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    memory=True,  # enables all memory layers
    verbose=True
)
# Custom memory configuration
embedder_config = {
    "provider": "openai",
    "config": {"model": "text-embedding-3-small"}
}
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    memory=True,
    embedder=embedder_config,
    long_term_memory=LongTermMemory(
        storage=LTMSQLiteStorage(db_path="./crew_memory.db")
    ),
    short_term_memory=ShortTermMemory(
        storage=RAGStorage(embedder_config=embedder_config, type="short_term")
    ),
    entity_memory=EntityMemory(
        storage=RAGStorage(embedder_config=embedder_config, type="entities")
    )
)
8. Flows API
The Flows API is CrewAI's higher-level abstraction for building event-driven, stateful workflows that coordinate multiple crews and external operations. While Crews handle multi-agent collaboration on a set of tasks, Flows handle the orchestration of multiple crews, conditional logic, state management, and integration with external systems in complex pipelines.
Flows use a decorator-based syntax: @start() marks the entry point, @listen() connects methods that react to events (outputs from other methods), and @router() implements conditional branching based on state or results. The Flow maintains typed state across all steps, making it easy to build complex, multi-stage workflows with clear data flow.
@start() Decorator
Marks a method as the entry point of the flow. When flow.kickoff() is called, all @start() methods execute. You can have multiple start methods to run initialization steps in parallel. Start methods set up state that downstream listeners consume.
@listen() Decorator
Connects a method to an upstream event. When the upstream method completes, the listener executes with the upstream's return value as input. Multiple listeners can react to the same event, enabling fan-out patterns. Listeners can also listen to other listeners, creating multi-stage pipelines.
@router() Decorator
Implements conditional branching. A router method inspects the current state or the upstream output and returns a route name. Downstream listeners annotated with the matching route are triggered. This enables if/else logic, retry loops, and dynamic workflow adaptation based on intermediate results.
State Management
Flows maintain typed state via a Pydantic model. State is accessible to all methods via self.state and persists across the entire flow execution. Because state is a Pydantic model, it is easy to inspect, debug, and serialize for persistence and resume.
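To see how @start(), @listen(), and @router() fit together, here is a toy event-driven runner in plain Python. MiniFlow and its add/kickoff methods are hypothetical and much simpler than CrewAI's Flow class. A listener fires when the event it watches completes, and a router's return value becomes the name of the next event.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class PipelineState:
    topic: str = ""
    draft: str = ""
    quality_score: float = 0.0
    published: bool = False


class MiniFlow:
    """Toy event-driven runner: a step fires when the event it listens to occurs."""

    def __init__(self, state: PipelineState):
        self.state = state
        self.steps: List[Tuple[str, Callable, bool]] = []  # (trigger, fn, is_router)
        self.trace: List[str] = []  # events in the order they fired

    def add(self, trigger: str, fn: Callable, router: bool = False) -> None:
        self.steps.append((trigger, fn, router))

    def _fire(self, event: str, payload: Any) -> None:
        self.trace.append(event)
        for trigger, fn, is_router in self.steps:
            if trigger == event:
                result = fn(self.state, payload)
                # A router's return value names the next event (the route);
                # an ordinary listener announces completion under its own name.
                self._fire(result if is_router else fn.__name__, result)

    def kickoff(self, start_fn: Callable) -> PipelineState:
        result = start_fn(self.state)
        self._fire(start_fn.__name__, result)
        return self.state


def gather_topic(state):
    state.topic = "Multi-Agent AI Orchestration"
    return state.topic


def write_draft(state, topic):
    state.draft = f"Draft about {topic}"  # stand-in for a writing crew
    return state.draft


def quality_check(state, draft):
    state.quality_score = 0.9 if len(draft) > 20 else 0.3  # stand-in scorer
    return "approved" if state.quality_score >= 0.8 else "needs_revision"


def publish_content(state, route):
    state.published = True
    return "published"


flow = MiniFlow(PipelineState())
flow.add("gather_topic", write_draft)
flow.add("write_draft", quality_check, router=True)
flow.add("approved", publish_content)
state = flow.kickoff(gather_topic)
print(flow.trace)
```

Keeping route names such as "approved" distinct from step names avoids ambiguous triggers, a sensible habit with real Flows as well.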
from crewai import Crew, Process
from crewai.flow.flow import Flow, listen, start, router
from pydantic import BaseModel
# Assumes researcher, writer, editor, research_task, writing_task, and
# editing_task are defined as in earlier sections. evaluate_quality is a
# hypothetical scoring function you supply (returning a score from 0.0 to 1.0).
class ContentPipelineState(BaseModel):
    topic: str = ""
    research: str = ""
    draft: str = ""
    quality_score: float = 0.0
    final_content: str = ""
class ContentPipeline(Flow[ContentPipelineState]):
    def _run_writing_crew(self, research: str) -> str:
        # Plain helper (not a flow step), reused by writing_phase and revise
        writing_crew = Crew(
            agents=[writer, editor],
            tasks=[writing_task, editing_task],
            process=Process.sequential
        )
        return writing_crew.kickoff(inputs={"research": research}).raw
    @start()
    def gather_topic(self):
        self.state.topic = "Multi-Agent AI Orchestration"
        return self.state.topic
    @listen(gather_topic)
    def research_phase(self, topic):
        # Run a research crew
        research_crew = Crew(
            agents=[researcher],
            tasks=[research_task],
            verbose=True
        )
        result = research_crew.kickoff(inputs={"topic": topic})
        self.state.research = result.raw
        return result.raw
    @listen(research_phase)
    def writing_phase(self, research):
        # Run a writing crew
        self.state.draft = self._run_writing_crew(research)
        return self.state.draft
    @router(writing_phase)
    def quality_check(self):
        # Evaluate quality from state and pick a route; route names
        # deliberately differ from method names so they cannot collide
        score = evaluate_quality(self.state.draft)
        self.state.quality_score = score
        return "approved" if score >= 0.8 else "needs_revision"
    @listen("approved")
    def publish(self):
        self.state.final_content = self.state.draft
        return f"Published: {len(self.state.final_content)} chars"
    @listen("needs_revision")
    def revise(self):
        # One extra writing pass; a direct helper call does not
        # loop back through quality_check
        self.state.final_content = self._run_writing_crew(self.state.research)
        return f"Revised: {len(self.state.final_content)} chars"
# Run the flow
flow = ContentPipeline()
result = flow.kickoff()
9. LLM Backends
OpenAI
The default backend. Supports GPT-4o, GPT-4o-mini, o1, o3, and o4-mini. Set OPENAI_API_KEY and use model strings like "openai/gpt-4o". Supports function calling, structured outputs, and streaming. Best overall ecosystem support and most battle-tested integration.
Anthropic Claude
Full support for Claude Opus, Sonnet, and Haiku via "anthropic/claude-sonnet-4-20250514". Set ANTHROPIC_API_KEY. Claude excels at long-context analysis, code generation, and nuanced reasoning. Ideal for agents that need to process large documents or produce high-quality written content.
Ollama (Local Models)
Run agents locally with open-source models via Ollama. Use "ollama/llama3.2", "ollama/mistral", or "ollama/deepseek-r1", and point the LLM at the Ollama server with base_url="http://localhost:11434" (Ollama's default port). Best for development, privacy-sensitive workloads, and offline operation. There are no API costs, but you need local compute, and a GPU is strongly recommended for anything beyond small models.
Azure OpenAI
Enterprise deployment via Azure OpenAI Service. Use "azure/your-deployment-name" with AZURE_API_KEY, AZURE_API_BASE, and AZURE_API_VERSION. Provides enterprise security, compliance, and data residency requirements. Same models as OpenAI with Azure's governance layer.
Groq
Ultra-fast inference for agents that need low latency. Use "groq/llama-3.3-70b-versatile" with GROQ_API_KEY. Groq's custom LPU hardware delivers sub-second response times, making it ideal for triage agents, classification tasks, and real-time pipelines where speed matters more than maximum model capability.
Mixed LLM Strategy
The most effective pattern: assign different LLMs to different agents based on their needs. Use GPT-4o or Claude for complex reasoning agents, Llama on Groq for fast triage agents, and GPT-4o-mini for formatting or extraction tasks. This optimizes both cost and quality across the crew.
from crewai import Agent, LLM
# OpenAI agent
planner = Agent(
    role="Strategic Planner",
    goal="Create comprehensive project plans",
    backstory="Expert strategist.",
    llm=LLM(model="openai/gpt-4o", temperature=0.2)
)
# Claude agent for long-context analysis
analyst = Agent(
    role="Document Analyst",
    goal="Analyze lengthy technical documents",
    backstory="Expert at processing dense technical content.",
    llm=LLM(model="anthropic/claude-sonnet-4-20250514", temperature=0.1)
)
# Groq agent for fast classification
triager = Agent(
    role="Request Triager",
    goal="Quickly classify and route incoming requests",
    backstory="Fast, accurate classification specialist.",
    llm=LLM(model="groq/llama-3.3-70b-versatile", temperature=0.0)
)
# Local Ollama agent for privacy-sensitive tasks
local_agent = Agent(
    role="PII Processor",
    goal="Process documents containing sensitive personal data",
    backstory="Privacy-first document processor.",
    llm=LLM(
        model="ollama/llama3.2",
        base_url="http://localhost:11434"
    )
)
10. Advanced Patterns
Human-in-the-Loop
Set human_input=True on a task to pause execution and request human feedback before finalizing the output. The human can approve, reject, or provide corrections. This is essential for high-stakes workflows where agent outputs need human validation before proceeding to downstream tasks.
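The pause-and-review cycle can be sketched as a loop that blocks on human input. review_loop is hypothetical. ask defaults to input() and revise stands in for sending feedback back to the agent, so both can be swapped for scripted functions in tests.

```python
from typing import Callable


def review_loop(
    draft: str,
    revise: Callable[[str, str], str],
    ask: Callable[[str], str] = input,
    max_rounds: int = 3,
) -> str:
    """Pause for human review: empty input approves, anything else is feedback."""
    for _ in range(max_rounds):
        feedback = ask(f"\n{draft}\n\nPress Enter to approve, or type feedback: ").strip()
        if not feedback:
            return draft  # approved as-is
        draft = revise(draft, feedback)  # e.g. hand the feedback back to the agent
    return draft  # stop after max_rounds to avoid an endless review cycle


# Scripted reviewer: asks for one change, then approves
answers = iter(["add sources", ""])
print(review_loop("draft", revise=lambda d, f: f"{d} +{f}", ask=lambda prompt: next(answers)))
```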
Step Callbacks
Register step_callback on the Crew to receive a notification after every agent reasoning step. The callback receives the step output (for example, a tool call or a final answer) and can log or inspect intermediate results. Use this for real-time monitoring, progress tracking, and cost accounting.
Output Guardrails
Combine output_pydantic with a task guardrail to enforce quality standards on agent outputs. A guardrail is a function that receives the task output and returns a (passed, result_or_feedback) tuple. When it fails, CrewAI sends the feedback back to the agent and retries. Layer Pydantic validation, guardrails, and post-processing callbacks for defense-in-depth quality control.
Conditional Tasks
Use ConditionalTask (from crewai.tasks.conditional_task) to make a task's execution depend on a function. The condition receives the previous task's output and returns a boolean. If it returns False, the task is skipped. This enables dynamic workflows where certain steps only run when specific criteria are met.
Pipeline Orchestration
Earlier CrewAI releases included a Pipeline class for chaining multiple crews into multi-stage workflows, with parallel stages and routing between them. It has since been deprecated in favor of the Flows API (section 8), which covers the same multi-crew, parallel, and conditional orchestration.
Training and Testing
Use crew.train(n_iterations=5) to run the crew repeatedly while a human reviews each result. The feedback is saved and injected into the agents' prompts on later runs, so training tunes behavior through prompts, not model weights. Use crew.test(n_iterations=3) to run the crew several times and have an evaluator LLM score each task's output, giving you a benchmark to compare changes against.
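from crewai import Task, Crew
from crewai.tasks.conditional_task import ConditionalTask
from crewai.tasks.task_output import TaskOutput
# Assumes editor, statistician, researcher, analyst, writer, research_task,
# analysis_task, and writing_task are defined as in earlier sections.
# Human-in-the-loop task
review_task = Task(
    description="Review the generated report for accuracy.",
    expected_output="A validated, human-approved report.",
    agent=editor,
    human_input=True  # pauses for human feedback
)
# Conditional task: the condition receives the previous task's output
def has_enough_data(output: TaskOutput) -> bool:
    # Illustrative heuristic: run only if the previous task returned > 100 lines
    return len(output.raw.splitlines()) > 100
advanced_analysis = ConditionalTask(
    description="Perform deep statistical analysis on the data.",
    expected_output="Statistical analysis with confidence intervals.",
    agent=statistician,
    condition=has_enough_data
)
# Step callback for monitoring
def monitor_step(step_output):
    # step_output is typically a tool-call action or a final answer;
    # their attributes differ, so read them defensively
    tool = getattr(step_output, "tool", None)
    text = str(getattr(step_output, "text", step_output))
    print(f"Tool used: {tool}" if tool else "Reasoning/final-answer step")
    print(f"Output: {text[:200]}...")
crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[research_task, analysis_task, writing_task],
    step_callback=monitor_step,
    verbose=True
)
# Training loop with human feedback
crew.train(
    n_iterations=5,
    filename="training_data.pkl",
    inputs={"topic": "AI agents"}
)
# Test and benchmark: an evaluator LLM scores each task's output.
# Recent releases take eval_llm; older ones used openai_model_name.
crew.test(
    n_iterations=3,
    eval_llm="gpt-4o",
    inputs={"topic": "AI agents"}
)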
11. Framework Comparison
The multi-agent framework landscape in 2026 includes several major players. Each framework makes different trade-offs between abstraction level, flexibility, and production readiness. The right choice depends on your workflow complexity, team expertise, and deployment requirements.
| Dimension | CrewAI | AutoGen | LangGraph | Claude Agent SDK | OpenAI Agents SDK |
|---|---|---|---|---|---|
| Approach | Role-based crews | Conversational agents | Graph-based state machines | Tool-first agents | Handoff-based pipelines |
| Abstraction Level | High (role/goal/backstory) | Medium (chat patterns) | Low (nodes/edges/state) | Medium (tools/prompts) | Medium (agents/handoffs) |
| Language | Python | Python, .NET | Python, TypeScript | Python, TypeScript | Python, TypeScript |
| Model Lock-in | None (any LLM) | None (any LLM) | None (any LLM) | Claude-optimized | OpenAI-optimized |
| Memory | Built-in (3 layers) | Conversation history | Checkpointing | Session-based | Sessions + compaction |
| State Management | Flows API | Chat history | TypedDict/Pydantic | RunContext | RunContextWrapper |
| Observability | CrewAI+, callbacks | AutoGen Studio | LangSmith | Tracing API | OpenAI Traces |
| Best For | Structured team workflows | Multi-turn conversations | Complex stateful agents | Computer-use, coding | Rapid multi-agent dev |
CrewAI excels when you need structured team collaboration with clear roles -- it has the highest abstraction level, making it the fastest to prototype multi-agent workflows. AutoGen (Microsoft) is strongest for multi-turn conversational patterns where agents debate and refine outputs. LangGraph offers the most control with explicit graph-based state machines, ideal for complex workflows that need checkpointing and time-travel debugging. Claude Agent SDK has the deepest system access with built-in file and shell tools. OpenAI Agents SDK provides the simplest path to multi-agent with minimal abstractions and sandbox execution.
12. Production Deployment
Moving CrewAI from prototyping to production requires addressing reliability, monitoring, error handling, cost control, and scaling. CrewAI+ (the managed platform) handles many of these concerns, but self-hosted deployments need explicit infrastructure around the framework.
CrewAI+ Platform
The managed deployment platform for CrewAI. Deploy crews as API endpoints with built-in monitoring, logging, trace visualization, and crew performance analytics. CrewAI+ handles scaling, error recovery, and provides a dashboard for real-time crew execution tracking. Enterprise tier includes SSO, audit logs, and custom LLM routing.
Monitoring and Logging
Use step_callback and task_callback to capture execution telemetry. Log agent reasoning steps, tool invocations, token usage, and timing data. Export to observability platforms (Datadog, Grafana, custom dashboards) for production alerting on execution failures, cost spikes, and quality degradation.
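As a minimal sketch of the callback approach, the collector below counts agent steps and records one event per finished task. It assumes you pass its bound methods as `step_callback` and `task_callback` when building the `Crew`. The class name `CrewTelemetry` and the event fields are illustrative, and because the exact objects CrewAI hands to each callback can vary between versions, the code reads the task output's `agent` and `raw` attributes defensively with `getattr` (their presence is an assumption).

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger("crewai_telemetry")


@dataclass
class CrewTelemetry:
    """Collects simple execution telemetry from CrewAI callbacks (illustrative)."""

    started_at: float = field(default_factory=time.monotonic)
    steps: int = 0
    task_events: list[dict[str, Any]] = field(default_factory=list)

    def on_step(self, step: Any) -> None:
        # Intended as step_callback: log only the step's type, since payloads
        # (reasoning text, tool results) can be large.
        self.steps += 1
        logger.debug("step %d: %s", self.steps, type(step).__name__)

    def on_task(self, output: Any) -> None:
        # Intended as task_callback: attribute names are assumptions, read defensively.
        event = {
            "agent": str(getattr(output, "agent", "unknown")),
            "output_chars": len(str(getattr(output, "raw", ""))),
            "elapsed_s": round(time.monotonic() - self.started_at, 3),
        }
        self.task_events.append(event)
        logger.info("task finished: %s", event)


# Assumed wiring:
#   telemetry = CrewTelemetry()
#   crew = Crew(agents=[...], tasks=[...],
#               step_callback=telemetry.on_step,
#               task_callback=telemetry.on_task)
```

Shipping `task_events` to Datadog or Grafana is then a matter of routing these dicts through your existing log pipeline.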
Error Handling
Set max_iter and max_execution_time on agents to prevent infinite loops. Use output_pydantic for schema validation with automatic retries. Wrap crew.kickoff() in try/except blocks with exponential backoff for transient API errors. Log failed executions with full context for debugging.
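The backoff advice can be sketched without a third-party retry library. The hypothetical `call_with_backoff` helper below retries only exception types you classify as transient (here `TimeoutError` and `ConnectionError`, an assumption; map your provider SDK's rate-limit and timeout errors into `TRANSIENT_ERRORS`), doubles the delay per attempt up to a cap, and adds jitter so parallel workers do not retry in lockstep. Any other error, such as a schema-validation failure, propagates immediately, because retrying it only burns tokens.

```python
import logging
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("crewai_production")

# Assumed classification of transient errors; extend with your provider's types.
TRANSIENT_ERRORS: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError)


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 4.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn on transient errors with capped exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TRANSIENT_ERRORS as exc:
            if attempt == max_attempts:
                logger.error("giving up after %d attempts: %s", attempt, exc)
                raise
            # base, 2*base, 4*base, ... capped, then scaled into [50%, 100%].
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)
            logger.warning("attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

Usage would look like `call_with_backoff(lambda: crew.kickoff(inputs={"topic": "AI agents"}))`. The injectable `sleep` parameter lets tests record the delays instead of actually waiting.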
Cost Optimization
Use mixed LLM strategies: expensive models for reasoning-heavy agents, cheap models for extraction and formatting. Set max_rpm to rate-limit API calls. Disable allow_delegation on agents that do not need it (reduces token usage from delegation overhead). Monitor token usage per agent per task to identify optimization targets.
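To make per-agent token monitoring concrete, here is a small ledger that aggregates token counts per agent and model and converts them to cost. The model names and prices are placeholders, not real rates, and `CostLedger` is an illustrative name. How you obtain per-call token counts (provider responses, callbacks, or `result.token_usage` for crew-level totals) depends on your setup.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder prices in USD per million tokens; substitute your provider's rates.
PRICES_PER_MTOK = {
    "reasoning-model": {"input": 3.00, "output": 15.00},
    "cheap-model": {"input": 0.15, "output": 0.60},
}


@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0


class CostLedger:
    """Accumulates token usage per (agent role, model) and prices it."""

    def __init__(self) -> None:
        self._usage: defaultdict[tuple[str, str], Usage] = defaultdict(Usage)

    def record(self, agent: str, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        usage = self._usage[(agent, model)]
        usage.prompt_tokens += prompt_tokens
        usage.completion_tokens += completion_tokens

    def cost_by_agent(self) -> dict[str, float]:
        totals: defaultdict[str, float] = defaultdict(float)
        for (agent, model), usage in self._usage.items():
            price = PRICES_PER_MTOK[model]
            totals[agent] += (
                usage.prompt_tokens * price["input"]
                + usage.completion_tokens * price["output"]
            ) / 1_000_000
        return dict(totals)
```

Sorting the result of `cost_by_agent()` in descending order points to the agents where a cheaper model or a tighter prompt would save the most.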
Scaling Patterns
For high throughput, run crews in worker processes with task queues (Celery, Redis Queue, or Temporal). Each crew execution is stateless (unless using memory), so horizontal scaling is straightforward. Use the Flows API to orchestrate multiple crews across distributed workers for complex enterprise pipelines.
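Here is an in-process approximation of the worker pattern: each job gets a freshly built crew, so no mutable state is shared between runs, and a thread pool runs the jobs concurrently. Threads are a reasonable stand-in because a crew run spends most of its time waiting on LLM API calls. In production, the same build-then-`kickoff` pair would sit inside a Celery, RQ, or Temporal worker instead. `run_batch` and the `build_crew` factory are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Protocol


class Kickoffable(Protocol):
    """Anything with a Crew-like kickoff(inputs=...) method."""

    def kickoff(self, inputs: dict[str, Any]) -> Any: ...


def run_batch(
    build_crew: Callable[[], Kickoffable],
    jobs: list[dict[str, Any]],
    max_workers: int = 4,
) -> list[Any]:
    """Run one fresh crew per job concurrently; results keep the job order."""

    def run_one(inputs: dict[str, Any]) -> Any:
        crew = build_crew()  # new instance per job: no state leaks between runs
        return crew.kickoff(inputs=inputs)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return list(pool.map(run_one, jobs))
```

Keep `max_workers` in line with your provider's rate limits (and `max_rpm`), since every worker issues its own LLM calls.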
Testing Strategies
Use crew.test() for automated quality benchmarking. Write unit tests for individual tools and custom callbacks. Integration-test full crew runs with mock LLMs to validate pipeline logic without API costs. Use the training loop to collect human-validated outputs as regression test fixtures.
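For tool unit tests, one pattern is to keep a tool's logic in a plain function and wrap it for CrewAI separately, so the logic can be tested without an LLM or API key. The `extract_urls` function is a hypothetical example. The commented wrapper assumes the `@tool` decorator from `crewai.tools` in recent CrewAI versions, and the `test_` functions are plain asserts that pytest discovers.

```python
import re

# Stops at whitespace and common closing brackets; trailing punctuation is stripped below.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def extract_urls(text: str) -> list[str]:
    """Plain logic behind a hypothetical 'Extract URLs' tool: dedupe, keep order."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for match in URL_PATTERN.findall(text):
        seen.setdefault(match.rstrip(".,;:"), None)
    return list(seen)


# Hypothetical CrewAI wrapper (assumes `from crewai.tools import tool`):
#   @tool("Extract URLs")
#   def extract_urls_tool(text: str) -> str:
#       """Return every URL found in the text, one per line."""
#       return "\n".join(extract_urls(text))


def test_dedupes_and_strips_trailing_punctuation():
    text = "See https://docs.crewai.com. Also https://docs.crewai.com, and http://x.io/a"
    assert extract_urls(text) == ["https://docs.crewai.com", "http://x.io/a"]


def test_returns_empty_list_without_urls():
    assert extract_urls("no links here") == []
```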
```python
# Production deployment with error handling and monitoring
import logging

from crewai import Crew, Process
from fastapi import BackgroundTasks, FastAPI
from tenacity import retry, stop_after_attempt, wait_exponential

logger = logging.getLogger("crewai_production")


@retry(
    stop=stop_after_attempt(3),                          # give up after 3 attempts
    wait=wait_exponential(multiplier=1, min=4, max=60),  # exponential wait clamped to 4-60s
    reraise=True,  # surface the original exception instead of tenacity's RetryError
)
def run_crew_with_retries(crew: Crew, inputs: dict):
    try:
        result = crew.kickoff(inputs=inputs)
        logger.info("Crew completed: %s", result.token_usage)
        return result
    except Exception:
        logger.exception("Crew execution failed")  # logs the full traceback
        raise  # re-raise so tenacity can retry


# Deploy as API endpoint (FastAPI example)
app = FastAPI()


@app.post("/api/analyze")
async def analyze(topic: str, background_tasks: BackgroundTasks):
    # researcher, analyst, writer and the three tasks are assumed to be
    # defined elsewhere in the application (not shown here).
    crew = Crew(
        agents=[researcher, analyst, writer],
        tasks=[research_task, analysis_task, report_task],
        process=Process.sequential,
        memory=True,
        max_rpm=30,  # rate limit across all agents
    )
    # Runs after the response is sent, so the client gets an immediate acknowledgement.
    background_tasks.add_task(run_crew_with_retries, crew, {"topic": topic})
    return {"status": "processing", "topic": topic}
```
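```shell
# CLI deployment with CrewAI+
crewai login          # authenticate the CLI with your CrewAI+ account
crewai deploy create  # create a deployment for the current project
crewai deploy push    # push and deploy the latest code
crewai deploy status  # check deployment status
crewai deploy logs    # view deployment logs
```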
Getting Started
CrewAI requires Python 3.10 or newer. Install with pip and create your first crew in minutes.
```shell
# Install CrewAI and its tools package
pip install crewai crewai-tools

# Create a new project with the CLI
crewai create crew my_project
cd my_project

# Configure agents in src/my_project/config/agents.yaml
# Configure tasks in src/my_project/config/tasks.yaml

# Install project dependencies, then run the crew
crewai install
crewai run
```
```python
# Minimal crew -- your first multi-agent workflow
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Find accurate information on {topic}",  # {topic} is filled from kickoff inputs
    backstory="Expert research analyst.",
)

writer = Agent(
    role="Writer",
    goal="Write clear, engaging content based on research",
    backstory="Skilled technical writer.",
)

research = Task(
    description="Research {topic} thoroughly.",
    expected_output="Comprehensive research brief.",
    agent=researcher,
)

article = Task(
    description="Write an article based on the research.",
    expected_output="A polished 1000-word article.",
    agent=writer,
    context=[research],  # the writer receives the research task's output
)

# With no process given, tasks run sequentially in list order.
crew = Crew(agents=[researcher, writer], tasks=[research, article])
result = crew.kickoff(inputs={"topic": "CrewAI multi-agent orchestration"})
print(result)
```
For production use, configure your LLM API keys as environment variables, enable memory for learning across runs, and use the Flows API for complex multi-crew orchestration. The official documentation at docs.crewai.com covers all concepts, integrations, and advanced patterns in depth.
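A small preflight check helps with the environment-variable advice: it fails fast at startup instead of on the first LLM call partway through a run. This sketch assumes LiteLLM-style `provider/model` strings and a hand-maintained mapping from provider to the environment variable its SDK reads. `missing_api_keys` and `REQUIRED_KEYS` are illustrative names, and providers missing from the mapping (for example a local Ollama server) are treated as needing no key.

```python
import os
from typing import Mapping, Optional

# Assumed provider -> env var mapping; extend for the providers you actually use.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def missing_api_keys(models: list[str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the env var names that are unset or empty for the given models."""
    env = os.environ if env is None else env
    missing: list[str] = []
    for model in models:
        provider = model.split("/", 1)[0]
        var = REQUIRED_KEYS.get(provider)
        if var and not env.get(var) and var not in missing:
            missing.append(var)
    return missing


# Usage at application startup:
#   missing = missing_api_keys(["openai/gpt-4o"])
#   if missing:
#       raise SystemExit(f"Missing API keys: {', '.join(missing)}")
```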