Passing Variables in AI Agents: Pain Points, Fixes, and Best Practices
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen', 'Ubuntu', 'Cantarell', sans-serif;
line-height: 1.7;
color: #333;
background: #ffffff;
max-width: 900px;
margin: 0 auto;
padding: 40px 20px;
}
h1 {
font-size: 2.5em;
font-weight: 800;
margin: 40px 0 20px 0;
color: #1a1a1a;
line-height: 1.2;
}
h2 {
font-size: 2em;
font-weight: 700;
margin: 50px 0 20px 0;
color: #1a1a1a;
border-bottom: 3px solid #3b82f6;
padding-bottom: 10px;
}
h3 {
font-size: 1.5em;
font-weight: 600;
margin: 30px 0 15px 0;
color: #2563eb;
}
h4 {
font-size: 1.2em;
font-weight: 600;
margin: 20px 0 10px 0;
color: #1e40af;
}
p {
margin: 15px 0;
font-size: 1.05em;
}
strong {
font-weight: 600;
color: #1a1a1a;
}
em {
font-style: italic;
color: #4b5563;
}
ul, ol {
margin: 15px 0 15px 30px;
}
li {
margin: 8px 0;
font-size: 1.05em;
}
code {
background: #2e343e;
padding: 2px 6px;
border-radius: 3px;
font-family: 'Monaco', 'Courier New', monospace;
font-size: 0.9em;
color: #dc2626;
}
pre {
background: #1e293b;
color: #e2e8f0;
padding: 20px;
border-radius: 8px;
overflow-x: auto;
margin: 20px 0;
font-family: 'Monaco', 'Courier New', monospace;
font-size: 0.9em;
line-height: 1.5;
}
pre code {
background: transparent;
color: inherit;
padding: 0;
}
table {
width: 100%;
border-collapse: collapse;
margin: 30px 0;
background: white;
box-shadow: 0 1px 3px rgba(0,0,0,0.1);
border-radius: 8px;
overflow: hidden;
}
thead {
background: #3b82f6;
color: white;
}
th {
padding: 15px;
text-align: left;
font-weight: 600;
font-size: 0.95em;
}
td {
padding: 15px;
border-bottom: 1px solid #e5e7eb;
font-size: 0.95em;
}
tbody tr:hover {
background: #f9fafb;
}
tbody tr:last-child td {
border-bottom: none;
}
.tldr-box, .warning-box, .callout-box {
padding: 20px;
margin: 30px 0;
border-radius: 8px;
border-left: 4px solid;
}
.tldr-box {
background: #dbeafe;
border-left-color: #3b82f6;
}
.warning-box {
background: #fef3c7;
border-left-color: #f59e0b;
}
.callout-box {
background: #f3f4f6;
border-left-color: #6b7280;
font-family: monospace;
white-space: pre-wrap;
}
.image-placeholder {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
padding: 60px 20px;
margin: 30px 0;
border-radius: 8px;
text-align: center;
color: white;
font-weight: 600;
}
.image-placeholder p {
margin: 5px 0;
font-size: 1.1em;
}
.image-placeholder .caption {
font-style: italic;
font-size: 0.95em;
opacity: 0.9;
margin-top: 10px;
}
hr {
border: none;
border-top: 2px solid #e5e7eb;
margin: 40px 0;
}
.code-header {
background: #1e293b;
color: #10b981;
padding: 10px 20px;
border-radius: 8px 8px 0 0;
font-family: 'Monaco', 'Courier New', monospace;
font-size: 0.9em;
font-weight: 600;
margin-bottom: -10px;
}
.code-header.bad {
color: #ef4444;
}
.flowchart {
background: #f9fafb;
padding: 20px;
margin: 30px 0;
border-radius: 8px;
font-family: monospace;
white-space: pre;
overflow-x: auto;
border: 2px solid #e5e7eb;
}
blockquote {
border-left: 4px solid #3b82f6;
padding-left: 20px;
margin: 20px 0;
color: #4b5563;
font-style: italic;
}
.checklist {
background: white;
border: 2px solid #e5e7eb;
border-radius: 8px;
padding: 20px;
margin: 30px 0;
}
.checklist-item {
padding: 10px 0;
border-bottom: 1px solid #f3f4f6;
}
.checklist-item:last-child {
border-bottom: none;
}
.checklist-item input[type="checkbox"] {
margin-right: 10px;
transform: scale(1.2);
}
.section-number {
display: inline-block;
background: #3b82f6;
color: white;
width: 30px;
height: 30px;
border-radius: 50%;
text-align: center;
line-height: 30px;
margin-right: 10px;
font-weight: 600;
}
.ascii-art {
background: #1e293b;
color: #10b981;
padding: 20px;
border-radius: 8px;
font-family: 'Courier New', monospace;
white-space: pre;
overflow-x: auto;
font-size: 0.85em;
line-height: 1.4;
margin: 30px 0;
}
@media (max-width: 768px) {
body {
padding: 20px 15px;
}
h1 {
font-size: 1.8em;
}
h2 {
font-size: 1.5em;
}
table {
font-size: 0.85em;
}
th, td {
padding: 10px;
}
}
.meme-card{
border:1px solid #e5e7eb;
border-radius:12px;
padding:14px;
background:#fff;
margin:14px 0;
}
.meme-row{
display:flex;
gap:10px;
align-items:center;
padding:10px 12px;
border-radius:10px;
}
.meme-no{ background:#fff7ed; }
.meme-yes{ background:#ecfdf5; margin-top:10px; }
.meme-emoji{ font-size:18px; line-height:1; }
.meme-text{ font-weight:600; }
Intro: The Story We All Know
You build an AI agent on Friday afternoon. You demo it to your team Monday morning. The agent qualifies leads smoothly, books meetings without asking twice, and even generates proposals on the fly. Your manager nods approvingly.
Two weeks later, it’s in production. What could go wrong?
By Wednesday, customers are complaining: “Why does the bot keep asking me my company name when I already told it?” By Friday, you’re debugging why the bot booked a meeting for the wrong date. By the following Monday, you’ve silently rolled it back.
What went wrong? Not the model: it’s identical in demo and production. The failure was more fundamental: your agent can’t reliably pass and manage variables across steps, and it lacks the identity controls to keep it from accessing variables it shouldn’t.
What Is a Variable (And Why It Matters)
A variable is just a named piece of information your agent needs to remember or use:
Customer name
Order ID
Selected product
Meeting date
Task progress
API response
Variable passing is how that information flows from one step to the next without getting lost or corrupted.
Think of it like filling a multi-page form. Page 1: you enter your name and email. Page 2: the form should already show your name and email, not ask again. If the system doesn’t “pass” those fields from Page 1 to Page 2, the form feels broken. That’s exactly what’s happening with your agent.
Why This Matters in Production
LLMs are fundamentally stateless. A language model is like a person with severe amnesia. Every time you ask it a question, it has zero memory of what you said before unless you explicitly remind it by including that information in the prompt.
(Yes, your agent has the memory of a goldfish. No offense to goldfish.)
If your agent doesn’t explicitly store and pass user data, context, and tool outputs from one step to the next, the agent literally forgets everything and has to start over.
In a 2-turn conversation? Fine, the context window still has room. In a 10-turn conversation where the agent needs to remember a customer’s preferences, previous decisions, and API responses? The context window fills up, gets truncated, and your agent “forgets” critical information.
This is why it works in demo (short conversations) but fails in production (longer workflows).
The Five Pain Points
Pain Point 1: The Forgetful Assistant
After 3-4 conversation turns, the agent forgets user inputs and keeps asking the same questions repeatedly.
Why it happens:
Relying purely on prompt context (which has limits)
No explicit state storage mechanism
Context window gets bloated and truncated
Real-world impact:
User: “My name is Priya and I work at TechCorp”
Agent: “Got it, Priya at TechCorp. What’s your biggest challenge?”
User: “Scaling our infrastructure costs”
Agent: “Thanks for sharing. Just to confirm: what’s your name and company?”
User: 😡
At this point, Priya is questioning whether AI will actually take her job or if she’ll die of old age before the agent remembers her name.
Pain Point 2: Scope Confusion Problem
Variables defined in prompts don’t match runtime expectations. Tool calls fail because parameters are missing or misnamed.
Why it happens:
Mismatch between what the prompt defines and what tools expect
Fragmented variable definitions scattered across prompts, code, and tool specs
Real-world impact:
Prompt says: “Use customer_id to fetch the order”
Tool expects: “customer_uid”
Agent tries: “customer_id”
Tool fails
Pain Point 3: UUIDs Get Mangled
LLMs are pattern matchers, not randomness engines. A UUID is deliberately high-entropy, so the model often produces something that looks like a UUID (right length, hyphens) but contains subtle typos, truncations, or swapped characters. In long chains, this becomes a silent killer: one wrong character and your API call is now targeting a different object, or nothing at all.
If you want a concrete benchmark, Boundary’s write-up shows a big jump in identifier errors when prompts contain direct UUIDs, and how remapping to small integers significantly improves accuracy (UUID swap experiment).
How teams avoid this: don’t ask the model to handle UUIDs directly. Use short IDs in the prompt (001, 002 or ITEM-1, ITEM-2), enforce enum constraints where possible, and map back to UUIDs in code. (You’ll see these patterns again in the workaround section below.)
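The remapping fits in a few lines of plain Python (helper names are illustrative, not from any library). UUIDs never enter the prompt; the model only ever sees and returns short IDs:

```python
import uuid

def to_short_ids(uuids):
    """Build prompt-safe short IDs and the reverse map back to real UUIDs."""
    forward = {u: f"ITEM-{i + 1}" for i, u in enumerate(uuids)}
    backward = {short: u for u, short in forward.items()}
    return forward, backward

records = [str(uuid.uuid4()) for _ in range(3)]
forward, backward = to_short_ids(records)

# The prompt lists ITEM-1..ITEM-3; when the model answers "ITEM-2",
# code resolves it back to the exact identifier:
real_id = backward["ITEM-2"]
assert real_id == records[1]  # the UUID itself never passed through the model
```

Combined with an enum constraint on the model's output (only `ITEM-1`..`ITEM-3` are valid), a single-character typo becomes impossible rather than silent.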
Pain Point 4: Chaotic Handoffs in Multi-Agent Systems
Data is passed as unstructured text instead of structured payloads. Next agent misinterprets context or loses fidelity.
Why it happens:
Passing entire conversation history instead of structured state
No clear contract for inter-agent communication
Real-world impact:
Agent A concludes: “Customer is interested”
Passes to Agent B as: “Customer says they might be interested in learning more”
Agent B interprets: “Not interested yet”
Agent B decides: “Don’t book a meeting”
→ Contradiction.
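One way to tighten the handoff is a typed payload with enumerable fields, so Agent B parses a contract instead of a paraphrase. A sketch (field names are illustrative):

```python
from dataclasses import dataclass, asdict

@dataclass
class LeadHandoff:
    """Structured contract Agent A emits instead of free text."""
    customer_id: str
    interest_level: str  # one of: "high", "medium", "low"
    next_action: str     # one of: "book_meeting", "nurture", "drop"
    evidence: str        # short quote backing the classification

payload = LeadHandoff(
    customer_id="cust_123",
    interest_level="high",
    next_action="book_meeting",
    evidence="Customer asked for pricing and available dates",
)

# Agent B receives an unambiguous dict, not a vibe:
print(asdict(payload)["next_action"])  # book_meeting
```

Because `next_action` is an explicit field with a fixed vocabulary, Agent B cannot reinterpret "might be interested" as "not interested": the decision was made once, upstream, and serialized.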
Pain Point 5: Agentic Identity (Concurrency & Corruption)
Multiple users or parallel agent runs race on shared variables. State gets corrupted or mixed between sessions.
Why it happens:
No session isolation or user-scoped state
Treating agents as stateless functions
No agentic identity controls
Real-world impact (2024):
User A’s lead data gets mixed with User B’s lead data.
User A sees User B’s meeting booked in their calendar.
→ GDPR violation. Lawsuit incoming.
Your legal team’s reaction: 😱
Real-world impact (2026):
Lead Scorer Agent reads Salesforce
It has access to Customer ID = cust_123
But which customer_id? The one for User A or User B?
Without agentic identity, it might pull the wrong customer data
→ Agent processes wrong data
→ Wrong recommendations
💡 TL;DR: The Five Pain Points
Forgetful Assistant: Agent re-asks questions → Solution: Episodic memory
Scope Confusion: Variable names don’t match → Solution: Tool calling (mostly solved!)
Mangled UUIDs: Identifiers get corrupted in prompts → Solution: Short-ID remapping in code
Chaotic Handoffs: Agents miscommunicate → Solution: Structured schemas via tool calling
Identity Chaos: Wrong data to wrong users → Solution: OAuth 2.1 for agents
The 2026 Memory Stack: Episodic, Semantic, and Procedural
Modern agents now use Long-Term Memory Modules (like Google’s Titans architecture and test-time memorization) that can handle context windows larger than 2 million tokens by incorporating “surprise” metrics to decide what to remember in real-time.
But even with these advances, you still need explicit state management. Why?
Memory without identity control means an agent might access customer data it shouldn’t
Replay requires traces: long-term memory helps, but you still need episodic traces (exact logs) for debugging and compliance
Speed matters: even with 2M token windows, fetching from a database is faster than scanning through 2M tokens
By 2026, the industry has moved beyond “just use a database” to Memory as a first-class design primitive. When you design variable passing now, think about three types of memory your agent needs to manage:
1. Episodic Memory (What happened in this session)
The action traces and exact events that occurred. Perfect for replay and debugging.
{
  "session_id": "sess_123",
  "timestamp": "2026-02-03 14:05:12",
  "action": "check_budget",
  "tool": "salesforce_api",
  "input": { "customer_id": "cust_123" },
  "output": { "budget": 50000 },
  "agent_id": "lead_scorer_v2"
}
Why it matters:
Replay exact sequence of events
Debug “why did the agent do that?”
Compliance audits
Learn from failures
2. Semantic Memory (What the agent knows)
Think of this as your agent’s “wisdom from experience.” The patterns it learns over time without retraining. For example, your lead scorer learns: SaaS companies close at 62% (when qualified), enterprise deals take 4 weeks on average, ops leaders decide in 2 weeks while CFOs take 4.
This knowledge compounds across sessions. The agent gets smarter without you lifting a finger.
{
  "agent_id": "lead_scorer_v2",
  "learned_patterns": {
    "conversion_rates": {
      "saas_companies": 0.62,
      "enterprise": 0.58,
      "startups": 0.45
    },
    "decision_timelines": {
      "ops_leaders": "2 weeks",
      "cfo": "4 weeks",
      "cto": "3 weeks"
    }
  },
  "last_updated": "2026-02-01",
  "confidence": 0.92
}
Why it matters: agents learn from experience, better decisions over time, cross-session learning without retraining. Your lead scorer gets 15% more accurate over 3 months without touching the model.
3. Procedural Memory (How the agent operates)
The recipes or standard operating procedures the agent follows. Ensures consistency.
{
  "workflow_id": "lead_qualification_v2.1",
  "version": "2.1",
  "steps": [
    {
      "step": 1,
      "name": "collect",
      "required_fields": ["name", "company", "budget"],
      "description": "Gather lead basics"
    },
    {
      "step": 2,
      "name": "qualify",
      "scoring_criteria": "check fit, timeline, budget",
      "min_score": 75
    },
    {
      "step": 3,
      "name": "book",
      "conditions": "score >= 75",
      "actions": ["check_calendar", "book_meeting"]
    }
  ]
}
Why it matters: standard operating procedures ensure consistency, easy to update workflows (version control), new team members understand agent behavior, easier to debug (“which step failed?”).
The Protocol Moment: “HTTP for AI Agents”
In late 2025, the AI agent world had a problem: every tool worked differently, every integration was custom, and debugging was a nightmare. A few standards and proposals started showing up, but the practical fix is simpler: treat tools like APIs, and make every call schema-first.
Think of tool calling (sometimes called function calling) like HTTP for agents. Give the model a clear, typed contract for each tool, and suddenly variables stop leaking across steps.
The Problem Protocols (and Tool Calling) Solve
Without schemas (2024 chaos):
Agent says: “Call the calendar API”
Calendar tool responds: “I need customer_id and format it as UUID”
Agent tries: { "customer_id": "123" }
Tool says: “That’s not a valid UUID”
Agent retries: { "customer_uid": "cust-123-abc" }
Tool says: “Wrong field name, I need customer_id”
Agent: 😡
(This is Pain Point 2: Scope Confusion)
❌ Hand-rolled tool integrations (strings everywhere)
✅ Schema-first tool calling (contracts + validation)
With schema-first tool calling, your tool layer publishes a tool catalog:
{
  "tools": [
    {
      "name": "check_calendar",
      "input_schema": {
        "customer_id": { "type": "string", "format": "uuid" }
      },
      "output_schema": {
        "available_slots": [{ "type": "datetime" }]
      }
    }
  ]
}
Agent reads the catalog once, knows exactly what to pass, and constructs { "customer_id": "550e8400-e29b-41d4-a716-446655440000" }. The tool validates the input against its schema and responds { "available_slots": […] }. ✅ Zero confusion, no retries, no hallucinated field names.
Real-World 2026 Status
Most production stacks are converging on the same idea: schema-first tool calling. Some ecosystems wrap it in protocols, some ship adapters, and some keep it simple with JSON schema tool definitions.
LangGraph (popular in 2026): a clean way to make variable flow explicit via a state machine, while still using the same tool contracts underneath.
Net takeaway: connectors and protocols will be in flux (Google’s UCP is a recent example in commerce), but tool calling is the stable primitive you can design around.
Impact on Pain Point 2: Scope Confusion is Solved
By adopting schema-first tool calling, variable names match exactly (schema enforced), type mismatches are caught before tool calls, and output formats stay predictable. No more “does the tool expect customer_id or customer_uid?”
2026 Status: LARGELY SOLVED ✅. Schema-first tool calling means variable names and types are validated against contracts early. Most teams don’t see this anymore once they stop hand-rolling integrations.
2026 Solution: Agentic Identity Management
By 2026, best practice is to use OAuth 2.1 profiles specifically for agents.
{
  "agent_id": "lead_scorer_v2",
  "oauth_token": "agent_token_xyz",
  "permissions": {
    "salesforce": "read:leads,accounts",
    "hubspot": "read:contacts",
    "calendar": "read:availability"
  },
  "user_scoped": {
    "user_id": "user_123",
    "tenant_id": "org_456"
  }
}
When the agent accesses a variable: the agent requests “Get customer data for customer_id = 123”. The identity system checks “Does this agent have permission? YES” and “Is customer_id in user_123’s tenant? YES”, and only then provides the customer data. ✅ No data leakage between tenants.
The Four Methods to Pass Variables
Method 1: Direct Pass (The Simple One)
Variables pass immediately from one step to the next.
Step 1 computes: total_amount = 5000
↓
Step 2 immediately receives total_amount
↓
Step 3 uses total_amount
Best for: simple, linear workflows (2-3 steps max), one-off tasks, speed-critical applications.
2026 Enhancement: add schema/type validation even for direct passes (tool calling). Catches bugs early.
✅ GOOD: Direct pass with tool-calling schema validation
from pydantic import BaseModel

class TotalOut(BaseModel):
    total_amount: float

def calculate_total(items: list[dict]) -> dict:
    total = sum(item["price"] for item in items)
    return TotalOut(total_amount=total).model_dump()
⚠️ WARNING: Direct Pass might seem simple, but it fails catastrophically in production when steps are added later (you now have 5 instead of 2), error handling is needed (what if step 2 fails?), or debugging is required (you can’t replay the sequence). Start with Method 2 (Variable Repository) unless you’re 100% certain your workflow will never grow.
Method 2: Variable Repository (The Reliable One)
Shared storage (database, Redis) where all steps read/write variables.
Step 1 stores: customer_name, order_id
↓
Step 5 reads: same values (no re-asking)
2026 Architecture (with Memory Types):
✅ GOOD: Variable Repository with three memory types
# Episodic Memory: Exact action traces
episodic_store = {
    "session_id": "sess_123",
    "traces": [
        {
            "timestamp": "2026-02-03 14:05:12",
            "action": "asked_for_budget",
            "result": "$50k",
            "agent": "lead_scorer_v2"
        }
    ]
}

# Semantic Memory: Learned patterns
semantic_store = {
    "agent_id": "lead_scorer_v2",
    "learned": {
        "saas_to_close_rate": 0.62
    }
}

# Procedural Memory: Workflows
procedural_store = {
    "workflow_id": "lead_qualification",
    "steps": […]
}

# Identity layer (NEW 2026)
identity_layer = {
    "agent_id": "lead_scorer_v2",
    "user_id": "user_123",
    "permissions": "read:leads, write:qualification_score"
}
Who uses this (2026): yellow.ai, Agent.ai, Amazon Bedrock Agents, CrewAI (with tool calling + identity layer).
Best for: multi-step workflows (3+ steps), multi-turn conversations, production systems with concurrent users.
Method 3: File System (The Debugger’s Best Friend)
Quick note on agentic file search vs RAG:
If an agent can browse a directory, open files, and grep content, it can sometimes beat classic vector search on correctness when the underlying files are small enough to fit in context. But as file collections grow, RAG often wins on latency and predictability. In practice, teams end up hybrid: RAG for fast retrieval, filesystem tools for deep dives, audits, and “show me the exact line” moments. (A recent benchmark-style discussion: Vector Search vs Filesystem Tools.)
Variables saved as files (JSON, logs). Still excellent for code generation and sandboxed agents (Manus, AgentFS, Dust).
Best for: long-running tasks, code generation agents, when you need perfect audit trails.
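A minimal sketch of the pattern (paths and names are illustrative): each session’s variables live in a JSON file you can cat, diff, and grep during a post-mortem.

```python
import json
import tempfile
from pathlib import Path

# Illustrative: a temp directory stands in for the agent's sandbox.
state_dir = Path(tempfile.mkdtemp())

def save_state(session_id: str, state: dict) -> Path:
    """Write session variables to a JSON file (also an audit artifact)."""
    path = state_dir / f"{session_id}.json"
    path.write_text(json.dumps(state, indent=2))
    return path

def load_state(session_id: str) -> dict:
    """Read session variables back in a later step or a later process."""
    return json.loads((state_dir / f"{session_id}.json").read_text())

save_state("sess_123", {"customer_name": "Priya", "order_id": "ord_9"})
print(load_state("sess_123")["customer_name"])  # Priya
```

The file survives process restarts, which is exactly why this method suits long-running tasks: step 5 can run hours after step 1 in a different process and still see the same variables.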
Method 4: State Machines + Database (The Gold Standard)
Explicit state machine with database persistence. Transitions are code-enforced. 2026 Update: “Checkpoint-Aware” State Machines.
state_machine = {
    "current_state": "qualification",
    "checkpoint": {
        "timestamp": "2026-02-03 14:05:26",
        "state_data": {…},
        "recovery_point": True  # if the agent crashes here, it resumes from this checkpoint
    }
}
Real companies using this (2026): LangGraph (graph-driven, checkpoint-aware), CrewAI (role-based, with tool calling + state machine), AutoGen (conversation-centric, with recovery), Temporal (enterprise workflows).
Best for: complex, multi-step agents (5+ steps), production systems at scale, mission-critical, regulated environments.
The 2026 Framework Comparison
Framework
Philosophy
Best For
2026 Status
LangGraph
Graph-driven state orchestration
Production, non-linear logic
The Winner: tool calling integrated
CrewAI
Role-based collaboration
Digital teams (creative/marketing)
Rising: tool calling support added
AutoGen
Conversation-centric
Negotiation, dynamic chat
Specialized: agent conversations
Temporal
Workflow orchestration
Enterprise, long-running
Solid: regulated workflows
How to Pick the Best Method: Updated Decision Framework
🚦 Quick Decision Flowchart
START
↓
Is it 1-2 steps? → YES → Direct Pass
↓ NO
Does it need to survive failures? → NO → Variable Repository
↓ YES
Mission-critical + regulated? → YES → State Machine + Full Stack
↓ NO
Multi-agent + multi-tenant? → YES → LangGraph + tool calling + Identity
↓ NO
Good engineering team? → YES → LangGraph
↓ NO
Need fast shipping? → YES → CrewAI
↓
State Machine + DB (default)
By Agent Complexity
Agent Type
2026 Method
Why
Simple Reflex
Direct Pass
Fast, minimal overhead
Single-Step
Direct Pass
One-off tasks
Multi-Step (3-5)
Variable Repository
Shared context, episodic memory
Long-Running
File System + State Machine
Checkpoints, recovery
Multi-Agent
Variable Repository + Tool Calling + Identity
Structured handoffs, permission control
Production-Critical
State Machine + DB + Agentic Identity
Replay, auditability, compliance
By Use Case (2026)
Use Case
Method
Companies
Identity Control
Chatbots/CX
Variable Repo + Tool Calling
yellow.ai, Agent.ai
User-scoped
Workflow Automation
Direct Pass + Schema Validation
n8n, Power Automate
Optional
Code Generation
File System + Episodic Memory
Manus, AgentFS
Sandboxed (safe)
Enterprise Orchestration
State Machine + Agentic Identity
LangGraph, CrewAI
OAuth 2.1 for agents
Regulated (Finance/Health)
State Machine + Episodic + Identity
Temporal, custom
Full audit trail required
Real Example: How to Pick
Scenario: Lead qualification agent
Requirements: (1) Collect lead info (name, company, budget), (2) Ask qualifying questions, (3) Score the lead, (4) Book a meeting if qualified, (5) Send follow-up email.
Decision Process (2026):
Q1: How many steps? A: 5 steps → rules out Direct Pass ❌
Q2: Does it need to survive failures? A: Yes, can’t lose lead data → need a State Machine ✅
Q3: Multiple agents involved? A: Yes (scorer + booker + email sender) → need tool calling ✅
Q4: Multi-tenant (multiple users)? A: Yes → need Agentic Identity ✅
Q5: How mission-critical? A: Drives revenue → need an audit trail ✅
Q6: Engineering capacity? A: Small team, ship fast → use LangGraph ✅
(LangGraph handles state machine + tool calling + checkpoints)
2026 Architecture:
✅ GOOD: LangGraph with proper state management and identity
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
# Define state structure
class AgentState(TypedDict):
    # Lead data
    customer_name: str
    company: str
    budget: int
    score: int
    # Identity context (passed through state)
    user_id: str
    tenant_id: str
    oauth_token: str
    # Memory references
    episodic_trace: list
    learned_patterns: dict
# Create graph with state
workflow = StateGraph(AgentState)
# Add nodes
workflow.add_node("collect", collect_lead_info)
workflow.add_node("qualify", ask_qualifying_questions)
workflow.add_node("score", score_lead)
workflow.add_node("book", book_if_qualified)
workflow.add_node("followup", send_followup_email)
# Define edges
workflow.add_edge(START, "collect")
workflow.add_edge("collect", "qualify")
workflow.add_edge("qualify", "score")
workflow.add_conditional_edges(
    "score",
    lambda state: "book" if state["score"] >= 75 else "followup"
)
workflow.add_edge("book", "followup")
workflow.add_edge("followup", END)
# Compile with checkpoints (CRITICAL: Don’t forget this!)
checkpointer = MemorySaver()
app = workflow.compile(checkpointer=checkpointer)
# tool-calling-ready tools
tools = [
    check_calendar,  # tool-calling-ready
    book_meeting,    # tool-calling-ready
    send_email       # tool-calling-ready
]
# Run with identity in initial state
initial_state = {
    "user_id": "user_123",
    "tenant_id": "org_456",
    "oauth_token": "agent_oauth_xyz",
    "episodic_trace": [],
    "learned_patterns": {}
}
# Execute with checkpoint recovery enabled
result = app.invoke(
    initial_state,
    config={"configurable": {"thread_id": "sess_123"}}
)
⚠️ COMMON MISTAKE: Don’t forget to compile with a checkpointer! Without it, your agent can’t recover from crashes.
❌ BAD: No checkpointer
app = workflow.compile()
✅ GOOD: With checkpointer
from langgraph.checkpoint.memory import MemorySaver
app = workflow.compile(checkpointer=MemorySaver())
Result: the state machine enforces “collect → qualify → score → book → followup”, agentic identity prevents accessing the wrong customer’s data, episodic memory logs every action (replay for debugging), tool calling ensures tools are called with correct parameters, checkpoints allow recovery if the agent crashes, and you get a full audit trail for compliance.
Best Practices for 2026
1. 🧠 Define Your Memory Stack
Your memory architecture determines how well your agent learns and recovers. Choose stores that match each memory type’s purpose: fast databases for episodic traces, vector databases for semantic patterns, and version control for procedural workflows.
{
  "episodic": {
    "store": "PostgreSQL",
    "retention": "90 days",
    "purpose": "Replay and debugging"
  },
  "semantic": {
    "store": "Vector DB (Pinecone/Weaviate)",
    "retention": "Indefinite",
    "purpose": "Cross-session learning"
  },
  "procedural": {
    "store": "Git + Config Server",
    "retention": "Versioned",
    "purpose": "Workflow definitions"
  }
}
This setup gives you replay capabilities (PostgreSQL), cross-session learning (Pinecone), and workflow versioning (Git). Production teams report 40% faster debugging with proper memory separation.
Practical Implementation:
✅ GOOD: Complete memory stack implementation
# 1. Episodic Memory (PostgreSQL)
from sqlalchemy import create_engine, Column, String, JSON, DateTime
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class EpisodicTrace(Base):
    __tablename__ = 'episodic_traces'
    id = Column(String, primary_key=True)
    session_id = Column(String, index=True)
    timestamp = Column(DateTime, index=True)
    action = Column(String)
    tool = Column(String)
    input_data = Column(JSON)
    output_data = Column(JSON)
    agent_id = Column(String, index=True)
    user_id = Column(String, index=True)

engine = create_engine('postgresql://localhost/agent_memory')
Base.metadata.create_all(engine)

# 2. Semantic Memory (Vector DB)
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
semantic_index = pc.Index("agent-learnings")

# Store learned patterns
semantic_index.upsert(vectors=[{
    "id": "lead_scorer_v2_pattern_1",
    "values": embedding,  # vector embedding of the pattern
    "metadata": {
        "agent_id": "lead_scorer_v2",
        "pattern_type": "conversion_rate",
        "industry": "saas",
        "value": 0.62,
        "confidence": 0.92
    }
}])

# 3. Procedural Memory (Git + Config Server)
import yaml

workflow_definition = {
    "workflow_id": "lead_qualification",
    "version": "2.1",
    "changelog": "Added budget verification",
    "steps": [
        {"step": 1, "name": "collect", "required_fields": ["name", "company", "budget"]},
        {"step": 2, "name": "qualify", "scoring_criteria": "fit, timeline, budget"},
        {"step": 3, "name": "book", "conditions": "score >= 75"}
    ]
}
with open('workflows/lead_qualification_v2.1.yaml', 'w') as f:
    yaml.dump(workflow_definition, f)
2. 🔌 Adopt Tool Calling From Day One
Tool calling eliminates variable naming mismatches and makes tools self-documenting. Instead of maintaining separate API docs, your tool definitions include schemas that agents can read and validate against automatically.
Every tool should be schema-first so agents can auto-discover and validate them.
✅ GOOD: Tool definition with full schema
# Tool calling (function calling) = schema-first contracts for tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "check_calendar",
            "description": "Check calendar availability for a customer",
            "parameters": {
                "type": "object",
                "properties": {
                    "customer_id": {"type": "string"},
                    "start_date": {"type": "string"},
                    "end_date": {"type": "string"}
                },
                "required": ["customer_id", "start_date", "end_date"]
            }
        }
    }
]
# Your agent passes this tool schema to the model.
# The model returns a structured tool call with args that match the contract.
Now agents can auto-discover and validate this tool without manual integration work.
3. 🔐 Implement Agentic Identity (OAuth 2.1 for Agents)
Just as users need permissions, agents need scoped access to data. Without identity controls, a lead scorer might accidentally access customer data from the wrong tenant, creating security violations and compliance issues.
2026 approach: Agents have OAuth tokens, just like users do.
✅ GOOD: Agent context with OAuth 2.1
# Define agent context with OAuth 2.1
agent_context = {
    "agent_id": "lead_scorer_v2",
    "user_id": "user_123",
    "tenant_id": "org_456",
    "oauth_token": "agent_token_xyz",
    "scopes": ["read:leads", "write:qualification_score"]
}
When agent accesses a variable, identity is checked:
✅ GOOD: Complete identity and permission system
from functools import wraps
from typing import Callable, Any
from datetime import datetime
class PermissionError(Exception):
    pass

class SecurityError(Exception):
    pass
def check_agent_permissions(func: Callable) -> Callable:
    """Decorator to enforce identity checks on variable access"""
    @wraps(func)
    def wrapper(var_name: str, agent_context: dict, *args, **kwargs) -> Any:
        # 1. Check if agent has permission to access this variable type
        required_scope = get_required_scope(var_name)
        if required_scope not in agent_context.get('scopes', []):
            raise PermissionError(
                f"Agent {agent_context['agent_id']} lacks scope '{required_scope}' "
                f"required to access {var_name}"
            )
        # 2. Check if variable belongs to agent's tenant
        variable_tenant = get_variable_tenant(var_name)
        agent_tenant = agent_context.get('tenant_id')
        if variable_tenant != agent_tenant:
            raise SecurityError(
                f"Variable {var_name} belongs to tenant {variable_tenant}, "
                f"but agent is in tenant {agent_tenant}"
            )
        # 3. Log the access for audit trail
        log_variable_access(
            agent_id=agent_context['agent_id'],
            user_id=agent_context['user_id'],
            variable_name=var_name,
            access_type='read',
            timestamp=datetime.utcnow()
        )
        return func(var_name, agent_context, *args, **kwargs)
    return wrapper
return wrapper
def get_required_scope(var_name: str) -> str:
    """Map variable names to required OAuth scopes"""
    scope_mapping = {
        'customer_name': 'read:leads',
        'customer_email': 'read:leads',
        'customer_budget': 'read:leads',
        'qualification_score': 'write:qualification_score',
        'meeting_scheduled': 'write:calendar'
    }
    return scope_mapping.get(var_name, 'read:basic')

def get_variable_tenant(var_name: str) -> str:
    """Retrieve the tenant ID associated with a variable"""
    # In production, this would query your variable repository
    from database import variable_store
    variable = variable_store.get(var_name)
    return variable['tenant_id'] if variable else None

def log_variable_access(agent_id: str, user_id: str, variable_name: str,
                        access_type: str, timestamp: datetime) -> None:
    """Log all variable access for compliance and debugging"""
    from database import audit_log
    audit_log.insert({
        'agent_id': agent_id,
        'user_id': user_id,
        'variable_name': variable_name,
        'access_type': access_type,
        'timestamp': timestamp
    })
@check_agent_permissions
def access_variable(var_name: str, agent_context: dict) -> Any:
    """Fetch variable with identity checks"""
    from database import variable_store
    return variable_store.get(var_name)

# Usage
try:
    customer_budget = access_variable('customer_budget', agent_context)
except PermissionError as e:
    print(f"Access denied: {e}")
except SecurityError as e:
    print(f"Security violation: {e}")
This decorator pattern ensures every variable access is scoped, tenant-checked, and logged. For multi-tenant SaaS platforms, it is the foundation for preventing cross-tenant data leaks.
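The decorator assumes an `agent_context` dict carrying the agent's identity, tenant, and granted scopes. A minimal sketch of constructing one (the helper name and field values are illustrative; in production these would come from a decoded OAuth 2.1 token):

```python
def build_agent_context(agent_id: str, user_id: str, tenant_id: str,
                        scopes: list[str]) -> dict:
    """Assemble the context dict the permission decorator expects."""
    return {
        'agent_id': agent_id,
        'user_id': user_id,
        'tenant_id': tenant_id,
        'scopes': scopes,
    }

# Illustrative values; real deployments derive these from the token claims
agent_context = build_agent_context(
    agent_id='qualifier-01',
    user_id='user-123',
    tenant_id='acme-corp',
    scopes=['read:leads', 'write:qualification_score'],
)
```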
4. ⚙️ Make State Machines Checkpoint-Aware
Checkpoints let your agent resume from failure points instead of restarting from scratch. This saves tokens, reduces latency, and prevents data loss when crashes happen mid-workflow.
2026 pattern: Automatic recovery
# Add checkpoints after critical steps
state_machine.add_checkpoint_after_step("collect")
state_machine.add_checkpoint_after_step("qualify")
state_machine.add_checkpoint_after_step("score")

# If the agent crashes at "book", restart from the "score" checkpoint,
# not from the beginning (saves time and money)
In production, this means a 30-second workflow doesn’t need to repeat the first 25 seconds just because the final step failed. LangGraph and Temporal both support this natively.
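The `state_machine` API above is illustrative. The underlying mechanic is simple enough to sketch without a framework: persist the set of completed steps after each one, and skip them on restart. A minimal sketch (the checkpoint file path and function names are assumptions):

```python
import json
import os

CHECKPOINT_FILE = 'checkpoint.json'  # illustrative path

def run_workflow(steps, run_step):
    """Run steps in order, persisting a checkpoint after each success."""
    done = []
    if os.path.exists(CHECKPOINT_FILE):
        # A previous run crashed mid-workflow; resume from its checkpoint
        with open(CHECKPOINT_FILE) as f:
            done = json.load(f)['completed_steps']
    for step in steps:
        if step in done:
            continue  # already completed in a previous run
        run_step(step)
        done.append(step)
        with open(CHECKPOINT_FILE, 'w') as f:
            json.dump({'completed_steps': done}, f)
    os.remove(CHECKPOINT_FILE)  # workflow finished; clear the checkpoint
```

If `run_step("book")` raises, the checkpoint file still records `collect`, `qualify`, and `score`, so the next invocation re-runs only `book`. Frameworks like LangGraph and Temporal implement the same idea with durable stores instead of a local file.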
5. 📦 Version Everything (Including Workflows)
Treat workflows like code: deploy v2.1 alongside v2.0, roll back easily if issues arise.
# Version your workflows
workflow_v2_1 = {
    "version": "2.1",
    "changelog": "Added budget verification before booking",
    "steps": [...]
}
Versioning lets you A/B test workflow changes, roll back bad deploys instantly, and maintain audit trails for compliance. Store workflows in Git alongside your code for single-source-of-truth version control.
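A minimal sketch of what "deploy v2.1 alongside v2.0" looks like in code: a registry that keeps every deployed version and lets you switch the active one instantly (the class and method names here are illustrative, not a specific framework's API):

```python
class WorkflowRegistry:
    """Keep every deployed workflow version; serve the active one."""

    def __init__(self):
        self._versions = {}   # version string -> workflow definition
        self._active = None

    def deploy(self, workflow: dict) -> None:
        """Register a new version and make it active."""
        self._versions[workflow['version']] = workflow
        self._active = workflow['version']

    def rollback(self, version: str) -> None:
        """Point the active workflow back at a previously deployed version."""
        if version not in self._versions:
            raise KeyError(f'unknown workflow version {version}')
        self._active = version

    def active(self) -> dict:
        return self._versions[self._active]

registry = WorkflowRegistry()
registry.deploy({'version': '2.0', 'steps': ['collect', 'qualify', 'book']})
registry.deploy({'version': '2.1',
                 'steps': ['collect', 'qualify', 'verify_budget', 'book']})
registry.rollback('2.0')  # instant rollback if v2.1 misbehaves
```

Because old versions stay registered, rollback is a pointer flip rather than a redeploy, and in-flight runs pinned to v2.0 are unaffected by the v2.1 deploy.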
6. 📊 Build Observability In From Day One
┌───────────────────────────────────────────────────────────┐
│  📊 OBSERVABILITY CHECKLIST                               │
├───────────────────────────────────────────────────────────┤
│  ✅ Log every state transition                            │
│  ✅ Log every variable change                             │
│  ✅ Log every tool call (input + output)                  │
│  ✅ Log every identity/permission check                   │
│  ✅ Track latency per step                                │
│  ✅ Track cost (tokens, API calls, infra)                 │
│                                                           │
│  💡 Pro tip: Use structured logging (JSON) so you can     │
│     query logs programmatically when debugging.           │
└───────────────────────────────────────────────────────────┘
Without observability, debugging a multi-step agent is guesswork. With it, you can replay exact sequences, identify bottlenecks, and prove compliance. Teams with proper observability routinely resolve production issues several times faster.
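The pro tip above, structured JSON logging, needs nothing beyond the standard library. A minimal sketch (the `fields` attribute and field names are illustrative conventions, not part of the `logging` API):

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            'ts': datetime.now(timezone.utc).isoformat(),
            'level': record.levelname,
            'event': record.getMessage(),
        }
        # Merge in any structured fields attached via extra={'fields': ...}
        payload.update(getattr(record, 'fields', {}))
        return json.dumps(payload)

logger = logging.getLogger('agent')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Log a state transition with queryable fields
logger.info('state_transition', extra={'fields': {
    'from_state': 'qualify', 'to_state': 'score', 'latency_ms': 412}})
```

Each line is now a self-contained JSON object, so you can grep, `jq`, or load the log into any analytics store without writing a parser.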
The 2026 Architecture Stack
Here’s what a production agent looks like in 2026:
┌───────────────────────────────────────────────────────────┐
│   LangGraph / CrewAI / Temporal (Orchestration Layer)     │
│   - State machine (enforces workflow)                     │
│   - Checkpoint recovery                                   │
│   - Agentic identity management                           │
└──────────┬──────────────────┬───────────────┬─────────────┘
           │                  │               │
   ┌───────▼──────┐   ┌───────▼──────┐   ┌────▼─────────┐
   │   Agent 1    │──▶│   Agent 2    │──▶│   Agent 3    │
   │(schema-aware)│   │(schema-aware)│   │(schema-aware)│
   └───────┬──────┘   └───────┬──────┘   └────┬─────────┘
           │                  │               │
           └──────────────────┼───────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
┌─────────────▼──────────┐     ┌──────────────▼───────────┐
│  Variable Repository   │     │  Identity & Access Layer │
│  (Episodic Memory)     │     │  (OAuth 2.1 for Agents)  │
│  (Semantic Memory)     │     └──────────────────────────┘
│  (Procedural Memory)   │
└─────────────┬──────────┘
              │
┌─────────────▼───────────┐
│ Tool Registry (schemas) │
│ (Standardized Tools)    │
└─────────────┬───────────┘
              │
┌─────────────▼──────────────────────┐
│ Observability & Audit Layer        │
│ - Logging (episodic traces)        │
│ - Monitoring (latency, cost)       │
│ - Compliance (audit trail)         │
└────────────────────────────────────┘
Your 2026 Checklist: Before You Ship
Before deploying your agent to production, verify:
Core Framework
Is your framework tool-calling-ready? (LangGraph, CrewAI, or Temporal preferred)
Do you have an episodic memory store? (PostgreSQL, logs for replay and debugging)
Is your state machine checkpoint-aware? (Can resume from failures without restarting)
Identity & Security
Have you defined agentic identity controls? (OAuth 2.1 tokens, per-agent permissions)
Is identity checked before every variable access? (User-scoped, tenant-scoped, permission-checked)
Tools & Standards
Are all tools schema-validated? (Input/output schemas defined and enforced)
Memory Architecture
Do you have three memory types?
Episodic (action traces)
Semantic (learned patterns)
Procedural (workflows)
Observability
Are you logging every state transition?
Are you logging every variable change?
Are you logging every tool call?
Are you logging every permission check?
Recovery & Versioning
Can you replay the entire agent run? (From episodic traces, for debugging)
Is your workflow versioned? (Can roll back if issues arise)
Cost Management
Do you have cost tracking per agent? (Tokens, API calls, infrastructure)
Conclusion: The 2026 Agentic Future
The agents that win in 2026 won't be the ones with better prompts. They'll be the ones with proper state management, schema-standardized tool access, agentic identity controls, three-tier memory architecture, checkpoint-aware recovery, and full observability.
State management and identity and access control are arguably the two hardest parts of building AI agents.
Now you know how to get both right.
Start building. 🚀
About This Guide
This guide was written in February 2026, reflecting the current state of AI agent development. It incorporates lessons learned from production deployments at Nanonets Agents and also from the best practices we noticed in the current ecosystem.
Version: 2.1
Last Updated: February 3, 2026
