LangChain’s v1-alpha release introduces a middleware system that changes how developers interact with and control agent behavior. This architectural shift provides flexibility and control well beyond traditional agent implementations, enabling sophisticated production-grade applications while keeping rapid prototyping simple.
Understanding Middleware Architecture
The middleware system in LangChain 1.0 alpha operates by modifying the fundamental agent loop through strategic intervention points. The traditional agent architecture consists of a model node and a tool node, but middleware introduces three critical hooks that allow developers to intercept and modify agent behavior at precise moments during execution.
The execution pattern follows a sequential processing model similar to web server middleware architectures. When multiple middleware components are provided to an agent, their before_model and modify_model_request hooks execute in order on the inbound path to the model call. On the return path, after_model hooks execute in reverse order, creating a symmetric processing pipeline.
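To make the ordering concrete, here is a minimal sketch (the LoggingA and LoggingB classes are hypothetical, written against the AgentMiddleware interface used throughout this article). With middleware=[LoggingA(), LoggingB()], before_model runs A then B on the way in, and after_model runs B then A on the way out:

from typing import Any
from langchain.agents.middleware import AgentState, AgentMiddleware

class LoggingA(AgentMiddleware):
    def before_model(self, state: AgentState) -> dict[str, Any] | None:
        print("A: before_model")  # first on the inbound path
        return None

    def after_model(self, state: AgentState) -> dict[str, Any] | None:
        print("A: after_model")  # last on the return path
        return None

class LoggingB(AgentMiddleware):
    def before_model(self, state: AgentState) -> dict[str, Any] | None:
        print("B: before_model")  # second on the inbound path
        return None

    def after_model(self, state: AgentState) -> dict[str, Any] | None:
        print("B: after_model")  # first on the return path
        return None

# Expected order around a single model call:
# A: before_model -> B: before_model -> (model) -> B: after_model -> A: after_model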
The Three Core Middleware Hooks
before_model Hook
The before_model hook executes prior to model calls and provides the capability to update state or redirect execution to alternative nodes. This enables preprocessing of inputs, state validation, or conditional routing based on current context. It is the most powerful hook, capable of:
- Updating permanent state
- Redirecting execution using jump_to with the values "model", "tools", or "__end__"
- Implementing complex decision logic
- Validating and preprocessing state
Note: Jumping to "model" from within before_model itself is forbidden to maintain execution order guarantees.
from typing import Any
from langchain.agents.middleware import AgentState, AgentMiddleware

class CustomRoutingMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState) -> dict[str, Any] | None:
        # Check if we should redirect based on message count
        if len(state["messages"]) > 50:
            # Summarize and update messages (summarize_messages is an illustrative helper)
            return {"messages": self.summarize_messages(state["messages"])}
        # Check if we should end early (correct way using jump_to)
        if self.should_terminate(state):
            return {"jump_to": "__end__"}
        return None  # Continue normal flow
modify_model_request Hook
The modify_model_request hook operates immediately before model calls and allows modification of tools, the system prompt, the message list, model settings, output format, and tool choice for that particular request. Unlike before_model, this hook cannot modify permanent state or jump to different nodes.
Important: ModelRequest.model must be a BaseChatModel instance, not a string. While some documentation examples mistakenly show strings, the type contract requires BaseChatModel:
from typing import Any
from pydantic import BaseModel
from langchain.agents.middleware import ModelRequest, AgentState, AgentMiddleware
from langchain_openai import ChatOpenAI

class Answer(BaseModel):
    answer: str
    confidence: float

class DynamicModelMiddleware(AgentMiddleware):
    def modify_model_request(self, request: ModelRequest, state: AgentState) -> ModelRequest:
        # Dynamic model selection - use BaseChatModel instances, not strings
        if self.is_complex_query(state):
            request.model = ChatOpenAI(model="gpt-4o")
        else:
            request.model = ChatOpenAI(model="gpt-4o-mini")
        # Modify system prompt dynamically
        request.system_prompt = self.generate_contextual_prompt(state)
        # Control tool availability
        if self.should_restrict_tools(state):
            request.tools = [t for t in request.tools if self.is_tool_allowed(t, state)]
        # Schema-based structured output (v1 integrates schemas in the main loop)
        # Important v1 change: prompted JSON formats are NO LONGER supported via response_format;
        # you must use Pydantic models, TypedDict, or JSON Schema for structured output
        if self.needs_structured_output(state):
            request.response_format = Answer  # Pydantic model/schema
        return request
after_model Hook
The after_model hook runs following model calls, offering opportunities for post-processing, result validation, or additional state modifications. This hook executes after model completion but before tool execution. Like before_model, it can use jump_to for flow control:
from typing import Any
from langchain.agents.middleware import AgentState, AgentMiddleware

class ValidationMiddleware(AgentMiddleware):
    def after_model(self, state: AgentState) -> dict[str, Any] | None:
        last_message = state["messages"][-1]
        # Validate model output
        if self.contains_sensitive_info(last_message):
            # Redact and update state
            cleaned_message = self.redact_sensitive(last_message)
            return {"messages": state["messages"][:-1] + [cleaned_message]}
        # Add metadata or annotations
        if self.should_annotate(last_message):
            annotated = self.add_annotations(last_message)
            return {"messages": state["messages"][:-1] + [annotated]}
        # Can also control flow with jump_to
        if self.requires_immediate_tools(last_message):
            return {"jump_to": "tools"}
        return None
Integration with create_agent
The middleware system integrates seamlessly with the create_agent function, allowing multiple middleware components to be composed together. When middleware is present, the model parameter must be either a string or a BaseChatModel (functions are not permitted), and the prompt must be a string or None:
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware, HumanInTheLoopMiddleware
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import InMemorySaver

# Create custom middleware instances
routing_middleware = CustomRoutingMiddleware()
model_middleware = DynamicModelMiddleware()
validation_middleware = ValidationMiddleware()

# Create agent with middleware stack
agent = create_agent(
    model=ChatOpenAI(model="gpt-4o"),  # BaseChatModel instance
    tools=[...],  # Your tools here
    middleware=[
        routing_middleware,     # Executes first on inbound
        model_middleware,       # Executes second on inbound
        validation_middleware,  # Executes third on inbound
        SummarizationMiddleware(
            model="openai:gpt-4o-mini",
            max_tokens_before_summary=4000,
            messages_to_keep=20
        ),
        HumanInTheLoopMiddleware(
            tool_configs={
                "send_email": {"require_approval": True}
            }
        )
    ],
    checkpointer=InMemorySaver()  # Required for HITL interrupts
)

# Middleware executes in listed order for before_model and modify_model_request,
# and in reverse order for after_model hooks
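Because the agent above is compiled with a checkpointer, each invocation should carry a thread identifier so conversation state persists across turns. A minimal usage sketch, assuming the agent defined above (the thread_id value is arbitrary):

from langchain_core.messages import HumanMessage

config = {"configurable": {"thread_id": "demo-thread-1"}}
result = agent.invoke(
    {"messages": [HumanMessage("Summarize the open support tickets")]},
    config=config,
)
print(result["messages"][-1].content)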
Built-in Middleware Implementations
LangChain 1.0 alpha includes three production-ready middleware implementations:
Human-in-the-Loop Middleware
Utilizes the after_model hook to provide an off-the-shelf solution for adding interrupts that enable human feedback on tool calls. Important: HITL middleware requires a checkpointer (such as InMemorySaver()) for interrupt functionality to work properly:
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import InMemorySaver

hitl = HumanInTheLoopMiddleware(
    tool_configs={
        "delete_database": {"require_approval": True, "description": "Dangerous operation"},
        "send_email": {"require_approval": True}
    },
    message_prefix="Approval required:"
)

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[hitl],
    checkpointer=InMemorySaver()  # Required for interrupts
)
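When a gated tool is requested, the run pauses on a LangGraph interrupt and is resumed by invoking the agent again on the same thread with a Command carrying the reviewer's decision. A sketch under those assumptions; the exact resume payload shape is defined by the HITL middleware's interrupt schema, so the structure below is illustrative only:

from langgraph.types import Command

config = {"configurable": {"thread_id": "hitl-thread-1"}}

# First call pauses at the interrupt raised for the approval-gated tool
agent.invoke({"messages": [("user", "Email the quarterly report to the team")]}, config=config)

# Resume on the same thread with the reviewer's decision.
# NOTE: the payload shape is illustrative; check the middleware's interrupt schema.
agent.invoke(Command(resume={"decisions": [{"type": "approve"}]}), config=config)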
Summarization Middleware
Leverages the before_model hook to automatically summarize accumulated messages once they exceed a specified threshold:
from langchain.agents.middleware import SummarizationMiddleware

summarization = SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    max_tokens_before_summary=4000,
    messages_to_keep=20,
    summary_prompt="Summarize earlier context concisely."
)
Anthropic Prompt Caching Middleware
Employs the modify_model_request hook to dynamically add special prompt caching tags to messages. Import it from the prompt_caching submodule:
from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from langchain.agents.middleware.prompt_caching import AnthropicPromptCachingMiddleware
from langgraph.checkpoint.memory import InMemorySaver

LONG_PROMPT = "..."  # A long, stable system prompt worth caching

caching = AnthropicPromptCachingMiddleware(ttl="5m")

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-latest"),
    prompt=LONG_PROMPT,
    middleware=[caching],
    checkpointer=InMemorySaver()
)
Advanced Middleware Patterns
Composable State Management
Create sophisticated state management patterns by combining multiple middleware:
import json
from typing import Any
from langchain.agents.middleware import AgentState, AgentMiddleware
from langchain_core.messages import messages_to_dict

class StateManagementMiddleware(AgentMiddleware):
    def __init__(self, redis_client=None):
        super().__init__()
        self.redis_client = redis_client
        self.local_cache = {}

    def before_model(self, state: AgentState) -> dict[str, Any] | None:
        # Load persistent state
        session_id = state.get("session_id")
        if session_id:
            persistent_state = self.load_from_redis(session_id)
            if persistent_state:
                return {"context": persistent_state}
        return None

    def after_model(self, state: AgentState) -> dict[str, Any] | None:
        # Save state updates
        session_id = state.get("session_id")
        if session_id:
            self.save_to_redis(session_id, state)
        return None

    def load_from_redis(self, session_id):
        if self.redis_client:
            data = self.redis_client.get(f"session:{session_id}")
            return json.loads(data) if data else None
        return None

    def save_to_redis(self, session_id, state):
        if self.redis_client:
            serializable = self.serialize_state(state)
            self.redis_client.setex(
                f"session:{session_id}",
                3600,  # 1-hour TTL
                json.dumps(serializable)
            )

    def serialize_state(self, state: AgentState) -> dict:
        """Convert state to a JSON-serializable format."""
        out = dict(state)
        if "messages" in out:
            # Convert messages to dicts for JSON serialization
            out["messages"] = messages_to_dict(out["messages"])
        return out
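A usage sketch, assuming a redis-py client and that a session_id key has been added to the agent state (both are assumptions, not part of the middleware above):

import redis
from langchain.agents import create_agent

# Any client exposing get/setex works; redis-py is used here for illustration
redis_client = redis.Redis(host="localhost", port=6379, decode_responses=True)

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],  # Your tools here
    middleware=[StateManagementMiddleware(redis_client=redis_client)],
)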
Conditional Tool Access
Implement sophisticated tool access control based on context:
from langchain.agents.middleware import ModelRequest, AgentState, AgentMiddleware

class ToolAccessControlMiddleware(AgentMiddleware):
    def __init__(self, access_rules):
        super().__init__()
        self.access_rules = access_rules

    def modify_model_request(self, request: ModelRequest, state: AgentState) -> ModelRequest:
        user_role = state.get("user_role", "guest")
        # Filter tools based on user permissions
        allowed_tools = []
        for tool in request.tools:
            if self.is_tool_allowed(tool, user_role):
                allowed_tools.append(tool)
        request.tools = allowed_tools
        # Reset tool_choice if it points at a tool that is no longer allowed.
        # Note: some providers use tool names/IDs here, so compare against names
        allowed_names = {t.name if hasattr(t, "name") else str(t) for t in allowed_tools}
        if request.tool_choice and request.tool_choice not in allowed_names:
            request.tool_choice = None
        return request

    def is_tool_allowed(self, tool, role):
        tool_name = tool.name if hasattr(tool, 'name') else str(tool)
        return role in self.access_rules.get(tool_name, [])
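A usage sketch with a hypothetical access_rules mapping of tool name to the roles allowed to call it (the roles and tool names are illustrative):

from langchain.agents import create_agent

access_rules = {
    "delete_database": ["admin"],
    "send_email": ["admin", "support"],
    "search_docs": ["admin", "support", "guest"],
}

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],  # Your tools here
    middleware=[ToolAccessControlMiddleware(access_rules)],
)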
Installation and Setup
To begin working with middleware in LangChain 1.0 alpha:
# Python (alpha) - langchain-core is pulled in automatically
pip install --pre -U langchain
# Provider packages (per docs, --pre not required for providers)
pip install -U langchain-openai langchain-anthropic
# JavaScript (alpha)
npm install langchain@next
Python Version Requirements
LangChain 1.0 alpha requires Python 3.10 or later, dropping support for Python 3.9. This ensures middleware implementations can leverage modern Python features for improved performance and developer experience.
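As a small illustration (not part of the middleware API), the hook signatures used throughout this article rely on PEP 604 union annotations such as dict[str, Any] | None, and Python 3.10 also enables structural pattern matching; both appear in this hypothetical helper:

from typing import Any

def route(state: dict[str, Any]) -> dict[str, Any] | None:
    # PEP 604 unions in annotations and match/case both require Python 3.10+
    match state.get("status"):
        case "done":
            return {"jump_to": "__end__"}
        case _:
            return None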
Migration Considerations
When migrating to the middleware system:
- Legacy Hooks: The pre_model_hook and post_model_hook parameters are replaced by middleware methods (see the sketch below)
- Function Restrictions: When using middleware with create_agent, the model parameter must be a string or BaseChatModel (functions are not permitted)
- Prompt Constraints: Prompts must be strings or None when using middleware
- Structured Output Change: v1 no longer supports prompted JSON via response_format; you must use schemas (Pydantic, TypedDict, or JSON Schema)
- Package Structure: Legacy chains and agents moved to the langchain-legacy package
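As a rough illustration of the first point above, logic that previously lived in a pre_model_hook or post_model_hook function now becomes a middleware class. The sketch below shows only the new, middleware-based shape (trimming history is an arbitrary example):

from typing import Any
from langchain.agents.middleware import AgentState, AgentMiddleware

class TrimHistoryMiddleware(AgentMiddleware):
    """pre_model_hook logic moves into before_model; post_model_hook logic moves into after_model."""

    def before_model(self, state: AgentState) -> dict[str, Any] | None:
        # Keep only the most recent messages before each model call
        # (same state-update pattern as the earlier examples)
        if len(state["messages"]) > 30:
            return {"messages": state["messages"][-30:]}
        return None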
Integration with LangGraph Runtime
The middleware system operates within the broader context of LangChain 1.0’s architectural reorganization, where LangGraph serves as the runtime and orchestrator. This integration ensures that middleware components benefit from:
- Deterministic concurrency using the Pregel/BSP (Bulk Synchronous Parallel) execution model
- Support for loops and parallelism
- Built-in checkpointing and threading capabilities
- Multiple streaming modes (updates, messages, custom) for production-grade applications; a brief streaming sketch follows this list
- Time travel capabilities for debugging
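A brief sketch of consuming one of those streaming modes from an agent built with create_agent, assuming the agent and checkpointer configuration from the earlier examples:

config = {"configurable": {"thread_id": "stream-demo"}}

# stream_mode="updates" yields node-by-node state updates as they are produced
for chunk in agent.stream(
    {"messages": [("user", "What changed in the last release?")]},
    config=config,
    stream_mode="updates",
):
    print(chunk)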
Content Blocks and Provider Standardization
Middleware works in conjunction with LangChain 1.0’s new .content_blocks property on message objects, which provides a fully typed view of message content and standardizes modern LLM features across providers. In v1, content_blocks is a list of typed dictionaries with standardized type fields:
from typing import Any
from langchain.agents.middleware import AgentState, AgentMiddleware
from langchain_core.messages import AIMessage

class ContentProcessingMiddleware(AgentMiddleware):
    def after_model(self, state: AgentState) -> dict[str, Any] | None:
        last_message: AIMessage = state["messages"][-1]
        # Access standardized content blocks as typed dicts
        for block in getattr(last_message, "content_blocks", []):
            block_type = block.get("type")
            if block_type == "text":
                # Process text content
                self.process_text(block.get("text", ""))
            elif block_type == "reasoning":
                # Log reasoning blocks
                self.log_reasoning(block.get("reasoning", ""))
            elif block_type == "tool_call":
                # Process tool calls (name/args/id available)
                self.process_tool_call(block)
            elif block_type == "citation":
                # Validate citations
                self.validate_citation(block)
        return None
Best Practices and Recommendations
Middleware Design Principles
- Single Responsibility: Each middleware should focus on one specific aspect
- Stateless Operations: Prefer stateless operations in modify_model_request
- Error Handling: Implement robust error handling to prevent cascade failures (see the sketch below)
- Performance: Consider the performance implications of sequential execution
- Testing: Create comprehensive tests for each middleware component
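To illustrate the error-handling principle, a minimal sketch that catches failures inside a hook so one misbehaving middleware does not abort the whole agent run (enrich_context is a hypothetical helper):

import logging
from typing import Any
from langchain.agents.middleware import AgentState, AgentMiddleware

logger = logging.getLogger(__name__)

class SafeEnrichmentMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState) -> dict[str, Any] | None:
        try:
            # Hypothetical enrichment step that may fail (e.g. a network call)
            return {"context": self.enrich_context(state)}
        except Exception:
            # Log and fall through so the agent loop continues unaffected
            logger.exception("Context enrichment failed; continuing without it")
            return None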
Performance Optimization
from langchain.agents.middleware import ModelRequest, AgentState, AgentMiddleware

class OptimizedMiddleware(AgentMiddleware):
    def __init__(self):
        super().__init__()
        # Pre-compute expensive operations
        self.compiled_patterns = self.compile_patterns()
        self.cached_prompts = {}

    def modify_model_request(self, request: ModelRequest, state: AgentState) -> ModelRequest:
        # Use caching for expensive prompt generation
        cache_key = self.generate_cache_key(state)
        if cache_key in self.cached_prompts:
            request.system_prompt = self.cached_prompts[cache_key]
        else:
            prompt = self.generate_prompt(state)
            self.cached_prompts[cache_key] = prompt
            request.system_prompt = prompt
        return request
Conclusion
LangChain’s middleware system in v1-alpha represents a major shift in agent development, providing the flexibility and control necessary for production-grade applications while maintaining the simplicity needed for rapid experimentation. By leveraging the three core hooks (before_model, modify_model_request, and after_model), developers can create sophisticated agent behaviors that adapt dynamically to context, enforce security policies, manage state efficiently, and integrate seamlessly with existing infrastructure.
The middleware architecture’s composable nature enables teams to build reusable components that can be shared across projects, promoting best practices and reducing development time. As LangChain continues to evolve toward its 1.0 release, the middleware system stands as a foundational component that will enable the next generation of AI agent applications.
Resources and References
Essential Documentation
- Middleware (Python v1-alpha): complete guide to hooks, execution order, jump_to semantics, ModelRequest fields, create_agent restrictions, and built-in middleware
- v1 Release Notes: breaking changes, including .content_blocks, structured output in the main loop, the Python 3.10+ requirement, and the langchain-legacy package
- Messages & Content Blocks: standard content block structure and provider normalization
- Installation Guide: package installation (--pre for core, plain -U for providers)
LangGraph Integration
- LangGraph Runtime Documentation: Pregel/BSP architecture and core concepts
- Streaming Modes: available streaming modes (updates, messages, custom)
- Persistence & Time Travel: checkpointing, threading, and debugging capabilities
Community & Updates
- LangChain GitHub Repository: source code and issue tracking
- Official LangChain Blog: latest announcements and deep dives
- LangChain Forum: official community discussions
- AIMUG Discord: AI/ML community chat
- @LangChainAI on Twitter: real-time updates