How to choose between workflows and agents, implement each pattern with OpenAI, and glue everything together with LangGraph, LangChain, LlamaIndex, and vector databases like Qdrant/Chroma.

TL;DR (what you’ll get)

1) Workflows vs. Agents: What’s the difference?

Anthropic’s definition is the cleanest I’ve seen: workflows are systems where LLMs and tools are orchestrated through predefined code paths, while agents are systems where LLMs dynamically direct their own processes and tool usage.

Quick comparison:

| Dimension | Workflows | Agents |
| --- | --- | --- |
| Control flow | Fixed / code-driven | Model-driven (plans/tools in a loop) |
| Predictability | High | Lower (but more adaptive) |
| Best for | Well-defined tasks, SLAs | Open-ended tasks, variable steps |
| Cost/latency | Lower, bounded | Higher, variable (looping/tool calls) |
| Guardrails | Straightforward | Essential (limits, checks, sandbox) |
| Typical framework | DAGs / graphs | Graphs with loops + memory / state |

Rule of thumb: start simple; only add agentic complexity when it measurably improves outcomes. (Source: Anthropic)

2) The building block: An augmented LLM

Before choosing a pattern, design an augmented LLM: a model equipped with retrieval, tools, and memory.

Mermaid diagram – augmented LLM context:

(DIAGRAM HERE)
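The idea can be sketched as a thin wrapper around any text-in/text-out model. This is a minimal sketch, not a specific API: the `llm`, `retrieve`, and `tools` callables are illustrative stand-ins you would replace with real model, vector-store, and tool calls.

```python
from typing import Callable, Dict, List

class AugmentedLLM:
    """Sketch: wrap a text-in/text-out model with retrieval, tools, and memory."""
    def __init__(self, llm: Callable[[str], str],
                 retrieve: Callable[[str], str],
                 tools: Dict[str, Callable[..., str]]):
        self.llm = llm
        self.retrieve = retrieve
        self.tools = tools
        self.memory: List[str] = []            # naive conversation memory

    def ask(self, question: str) -> str:
        context = self.retrieve(question)       # retrieval augmentation
        history = "\n".join(self.memory[-5:])   # short-term memory window
        prompt = f"Context:\n{context}\n\nHistory:\n{history}\n\nQ: {question}"
        answer = self.llm(prompt)
        self.memory.append(f"Q: {question}\nA: {answer}")
        return answer

# usage with stubbed components
bot = AugmentedLLM(
    llm=lambda p: f"answer based on {len(p)} chars of prompt",
    retrieve=lambda q: "relevant snippet",
    tools={"lookup": lambda oid: f"order {oid}: shipped"},
)
print(bot.ask("Where is my order?"))
```

The point is separation of concerns: the pattern you pick later changes how this wrapper is called, not what it contains.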

3) Anthropic’s patterns (and how to code them)

Below I summarize Anthropic’s workflows and agents (with when-to-use) and show OpenAI-first Python snippets. Patterns are not mutually exclusive. Combine as needed.

3.1 Prompt Chaining (Workflow)

When: Break a task into clear, fixed steps (e.g., outline → draft → QA)

# Prompt chaining with OpenAI (Python)
from openai import OpenAI
client = OpenAI()

def gen_outline(topic):
    return client.responses.create(
        model="gpt-4.1-mini",
        input=f"Create a bullet outline for a blog on: {topic}"
    ).output_text

def improve_outline(outline):
    return client.responses.create(
        model="gpt-4.1-mini",
        input=f"Critique & improve this outline, keep bullets:\n{outline}"
    ).output_text

def write_article(outline):
    return client.responses.create(
        model="gpt-4.1",
        input=f"Write a detailed article strictly following this outline:\n{outline}"
    ).output_text

outline = gen_outline("Agentic AI patterns")
outline = improve_outline(outline)
article = write_article(outline)
print(article)

OpenAI’s Responses API and Agents SDK are the recommended interfaces in 2025 for text + tool calling; the SDK gives you nice utilities for tools & tracing.

3.2 Routing (Workflow)

When: Inputs fall into distinct categories handled by different prompts/tools/models (e.g., billing vs. tech support; easy vs. hard routed to small vs. large model).

# Simple router → then task-specific prompts
from openai import OpenAI
client = OpenAI()

CATEGORIES = ["billing", "tech_support", "general"]

def route_query(q):
    r = client.responses.create(
        model="gpt-4.1-mini",
        input=f"Classify into {CATEGORIES}. Only output a single label.\n\nQuery: {q}"
    ).output_text.strip().lower()
    return r if r in CATEGORIES else "general"

def handle(q):
    c = route_query(q)
    prompt = {
        "billing": "You are a billing assistant. Answer precisely.",
        "tech_support": "You are a technical support assistant. Be diagnostic.",
        "general": "Be concise and helpful."
    }[c]
    return client.responses.create(model="gpt-4.1-mini",
                                   input=f"{prompt}\nUser: {q}").output_text

print(handle("Refund my last invoice, please"))

3.3 Parallelization (Workflow)

When: Subtasks are independent (speed via parallel) or you want voting across diverse prompts.

# Parallel sectioning / voting with asyncio
import asyncio
from openai import AsyncOpenAI
aclient = AsyncOpenAI()

async def summarize(section):
    r = await aclient.responses.create(model="gpt-4.1-mini",
                                       input=f"Summarize:\n{section}")
    return r.output_text

async def main(sections):
    results = await asyncio.gather(*[summarize(s) for s in sections])
    # Voting/aggregation:
    final = "\n".join(results)
    return final

summary = asyncio.run(main(["part A...", "part B...", "part C..."]))
print(summary)
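For the voting variant, instead of concatenating results you pick the most common answer across the parallel samples. A minimal aggregator sketch (the normalization and tie-break rule here are assumptions, not a library feature):

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; ties break by first occurrence."""
    normalized = [a.strip().lower() for a in answers]
    counts = Counter(normalized)
    winner, _ = max(counts.items(),
                    key=lambda kv: (kv[1], -normalized.index(kv[0])))
    return winner

print(majority_vote(["Yes", "yes", "No"]))  # → yes
```

Voting only helps when the parallel prompts are genuinely diverse; identical prompts at temperature 0 will just agree with themselves.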

3.4 Orchestrator-Workers (Workflow)

When: Complex tasks with unknown subtasks; central LLM plans & delegates to worker LLMs/tools, then synthesizes.

This is where LangGraph shines: it’s a low-level orchestration framework for stateful, long-running agents/workflows—with cycles, branches, persistence, and human-in-the-loop. (https://www.langchain.com/langgraph)

# Orchestrator-workers in LangGraph (minimal sketch)
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, List

class State(TypedDict):
    task: str
    plan: List[str]
    results: List[str]

def plan_node(state: State):
    # (call OpenAI to create a plan as bullet steps)
    # ... set state["plan"] = [...]
    return state

def worker_node(state: State):
    # (iterate one step of the plan: search, code, call a tool, etc.)
    # ... append to state["results"]
    return state

def decide_next(state: State):
    return END if len(state["results"]) >= len(state["plan"]) else "worker"

graph = StateGraph(State)
graph.add_node("plan", plan_node)
graph.add_node("worker", worker_node)
graph.set_entry_point("plan")
graph.add_edge("plan", "worker")
graph.add_conditional_edges("worker", decide_next, {END: END, "worker":"worker"})
app = graph.compile(checkpointer=MemorySaver())

# run (seed plan/results so the first nodes read valid state)
out = app.invoke({"task": "Implement feature X", "plan": [], "results": []},
                 config={"configurable": {"thread_id": "t1"}})

LangChain explicitly recommends building new agents on LangGraph.

3.5 Evaluator-Optimizer (Workflow)

When: Iterative improvement with clear evaluation criteria (e.g., code review, style guide conformance).

# Generator + Critic loop
from openai import OpenAI
client = OpenAI()

def generate(spec):
    return client.responses.create(model="gpt-4.1",
                                   input=f"Write code per spec:\n{spec}").output_text

def critique(code):
    return client.responses.create(model="gpt-4.1-mini",
                                   input=f"Critique this code; bullet list of fixes:\n```py\n{code}\n```").output_text

def apply_fixes(code, feedback):
    return client.responses.create(model="gpt-4.1",
                                   input=f"Improve the code using this feedback:\n{feedback}\n\nCode:\n```py\n{code}\n```").output_text

code = generate("CLI that greets a name")
for _ in range(2):
    fb = critique(code)
    code = apply_fixes(code, fb)
print(code)
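In practice you usually want the loop to stop early once the critic is satisfied, rather than always running a fixed number of rounds. A minimal sketch with injectable critic/fixer callables; the `APPROVED` sentinel is a convention you would put in the critic's prompt, not an OpenAI feature:

```python
def refine(draft, critic, fixer, max_rounds=3, approve_token="APPROVED"):
    """Run critique→fix until the critic approves or the round budget runs out."""
    for _ in range(max_rounds):
        feedback = critic(draft)
        if approve_token in feedback:    # critic emits APPROVED when satisfied
            break
        draft = fixer(draft, feedback)
    return draft

# usage with stubbed critic/fixer: approves once the draft mentions "docstring"
result = refine(
    "def greet(n): print(n)",
    critic=lambda d: "APPROVED" if "docstring" in d else "add a docstring",
    fixer=lambda d, fb: d + "  # docstring added",
)
print(result)
```

The round budget doubles as a cost guardrail: without it, a never-satisfied critic loops forever.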

3.6 Agents (Autonomous)

When: Open-ended, multi-step tasks where you can’t predefine the path; the model plans, uses tools, and loops—with limits & checkpoints.

# Minimal tool-using agent loop with OpenAI Responses (function tools)
import json

from openai import OpenAI
client = OpenAI()

# Responses API function tools are flat (no nested "function" key)
tools = [{
  "type": "function",
  "name": "lookup_order",
  "description": "Return order status by id",
  "parameters": {
    "type": "object",
    "properties": {"order_id": {"type": "string"}},
    "required": ["order_id"]
  }
}]

def lookup_order(order_id: str):
    # replace with real DB/API call
    return {"order_id": order_id, "status": "shipped"}

def step(messages):
    return client.responses.create(
        model="gpt-4.1",
        input=messages,
        tools=tools,
        tool_choice="auto"
    )

messages = [{"role": "system", "content": "You are a helpful agent."},
            {"role": "user", "content": "Where is order 12345?"}]

for _ in range(5):  # safety cap
    resp = step(messages)
    calls = [o for o in resp.output if getattr(o, "type", "") == "function_call"]
    if not calls:
        print(resp.output_text)  # final answer
        break
    for call in calls:
        if call.name == "lookup_order":
            result = lookup_order(**json.loads(call.arguments))  # arguments arrive as a JSON string
            messages += [call,  # echo the tool-call item back
                         {"type": "function_call_output",
                          "call_id": call.call_id,
                          "output": json.dumps(result)}]

For a higher-level wrapper, check the OpenAI Agents SDK (function tools, Pydantic validation, built-in tracing).
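The safety cap above generalizes into an explicit run budget that limits both loop iterations and tool calls and fails loudly when exceeded. A minimal sketch; the class and method names are illustrative, not from any SDK:

```python
class RunBudget:
    """Guardrail: cap agent loop iterations and tool calls."""
    def __init__(self, max_steps=10, max_tool_calls=5):
        self.max_steps, self.max_tool_calls = max_steps, max_tool_calls
        self.steps = self.tool_calls = 0

    def step(self):
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("agent exceeded step budget")

    def tool_call(self):
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise RuntimeError("agent exceeded tool-call budget")

# usage: a runaway loop is stopped after max_steps iterations
budget = RunBudget(max_steps=3)
try:
    while True:
        budget.step()     # call at the top of every agent iteration
except RuntimeError as e:
    print(e)
```

Raising (rather than silently breaking) makes budget exhaustion visible in logs and traces, which is where you want runaway agents to surface.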

4) Retrieval: Qdrant & Chroma (vector DBs)

Agentic systems usually need fast, filtered retrieval. Two strong options:

Qdrant (production-grade; filters, payloads, hybrid search)

Insert & search (Python client)

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

client = QdrantClient(host="localhost", port=6333)

if not client.collection_exists("docs"):
    client.create_collection(
        collection_name="docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

points = [
    PointStruct(id=1, vector=[0.1]*1536, payload={"title":"Policy A", "type":"policy"}),
    PointStruct(id=2, vector=[0.2]*1536, payload={"title":"Runbook X", "type":"runbook"}),
]
client.upsert(collection_name="docs", points=points)

# Search
hits = client.search(collection_name="docs", query_vector=[0.1]*1536, limit=3)
print(hits)

Qdrant supports payloads, rich filters, hybrid sparse/dense search (e.g., BM42), and async APIs; it integrates smoothly with LlamaIndex.

Chroma (fast local dev; simple API)

Add & query

import chromadb
from chromadb.utils import embedding_functions

client = chromadb.Client()
collection = client.create_collection(name="docs")

# (for custom embedders, see Chroma embedding functions)
collection.add(ids=["1","2"],
               documents=["Policy A text ...", "Runbook X text ..."])

res = collection.query(query_texts=["refund policy"],
                       n_results=3)
print(res)

Chroma is great for local prototyping and small apps; Qdrant is excellent when you need filters, performance, and scale.
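Under the hood, both stores rank vectors by similarity. A dependency-free sketch of cosine-similarity top-k search clarifies what `search`/`query` are doing (toy 2-D vectors stand in for real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query, corpus, k=2):
    """corpus: {doc_id: vector}; return doc ids ranked by cosine similarity."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

corpus = {"policy_a": [1.0, 0.0], "runbook_x": [0.0, 1.0], "faq": [0.7, 0.7]}
print(top_k([0.9, 0.1], corpus))  # → ['policy_a', 'faq']
```

Real vector DBs add what this sketch lacks: approximate-nearest-neighbor indexes for scale, payload filters, and persistence.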

5) Frameworks that help (and when)

Minimal LangGraph “router → retrieve → answer → reflect”

from langgraph.graph import StateGraph, END
from typing import TypedDict, Literal
from openai import OpenAI
client = OpenAI()

class S(TypedDict):
    question: str
    route: Literal["billing","tech","general"]
    context: str
    answer: str

def route_node(s: S):
    label = client.responses.create(model="gpt-4.1-mini",
        input=f"Label as one of [billing, tech, general]. Only label.\n\n{s['question']}"
    ).output_text.strip().lower()
    s["route"] = label if label in ("billing", "tech", "general") else "general"  # fall back on unexpected labels
    return s

def retrieve_node(s:S):
    # pretend retrieval (plug in Qdrant/Chroma here)
    s["context"] = "doc snippets ..."
    return s

def answer_node(s:S):
    msg = f"Context:\n{s['context']}\n\nQ: {s['question']}\nA:"
    s["answer"] = client.responses.create(model="gpt-4.1", input=msg).output_text
    return s

def reflect_node(s: S):
    improved = client.responses.create(model="gpt-4.1-mini",
        input=f"Rewrite this answer, improving correctness and tone. Output only the revised answer.\n\n{s['answer']}").output_text
    s["answer"] = improved
    return s

g = StateGraph(S)
g.add_node("route", route_node)
g.add_node("retrieve", retrieve_node)
g.add_node("answer", answer_node)
g.add_node("reflect", reflect_node)
g.set_entry_point("route")
g.add_edge("route","retrieve")
g.add_edge("retrieve","answer")
g.add_edge("answer","reflect")
g.add_edge("reflect", END)
app = g.compile()

print(app.invoke({"question":"How do I get a refund?"})["answer"])

6) Anthropic’s classification (at a glance)

Anthropic frames workflows as predictable compositions (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer) and agents as autonomous loops with tools + checkpoints. The “when to use” bullets above are lifted from their guidance; the full write-up is gold for teams deciding where to spend complexity. (Source: Anthropic)

7) End-to-end example: a Support Agent (hybrid)

Goal: Resolve customer queries with routing → retrieval → tool use (refund) → evaluation → optional human approval.

Code (condensed; swap your real search + refund API):

from openai import OpenAI
from qdrant_client import QdrantClient
client = OpenAI()
qdrant = QdrantClient(host="localhost", port=6333)

def retrieve(query: str, k: int = 5):
    # ... embed query; qdrant.search(...) return top docs as text
    return "retrieved docs ..."

def refund(order_id: str):
    # call real system
    return {"order_id": order_id, "status": "refunded"}

# Responses API function tools are flat (no nested "function" key)
tools = [{
  "type": "function",
  "name": "refund",
  "description": "Issue a refund by order ID",
  "parameters": {"type": "object",
                 "properties": {"order_id": {"type": "string"}},
                 "required": ["order_id"]}
}]

def route(q):
    r = client.responses.create(model="gpt-4.1-mini",
        input=f"Classify as billing, tech, general. Only label.\n\n{q}")
    return r.output_text.strip().lower()

def decide_action(question, context):
    r = client.responses.create(model="gpt-4.1",
        input=f"You can answer or call refund(order_id) if needed.\nContext:\n{context}\n\nQ:{question}",
        tools=tools, tool_choice="auto")
    return r

def agent(question):
    import json  # tool-call arguments arrive as JSON strings
    bucket = route(question)  # routing label (could select prompt/model per bucket)
    context = retrieve(question)
    for _ in range(4):  # safety cap
        r = decide_action(question, context)
        calls = [o for o in r.output if getattr(o, "type", "") == "function_call"]
        if calls and calls[0].name == "refund":
            res = refund(**json.loads(calls[0].arguments))
            # feed tool result back
            question += f"\nTool result: {res}"
            continue
        # Final answer (optionally send through evaluator/critic like earlier)
        return r.output_text

print(agent("Please refund order 12345; it arrived damaged."))

This blends routing, retrieval, tool use, and (optionally) an evaluator loop—a pragmatic hybrid most teams ship first.

8) Operational must-haves (production)
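At minimum, wrap every model and tool call in a bounded retry with exponential backoff. A minimal sketch; the delay values are illustrative and `sleep` is injectable so the logic is testable without waiting:

```python
import time

def with_retries(call, max_attempts=3, base_delay=0.5, sleep=None):
    """Retry `call` with exponential backoff; re-raise after the attempt budget."""
    sleep = sleep or time.sleep
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))   # 0.5s, 1s, 2s, ...

# usage: a flaky call that succeeds on the third try
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

print(with_retries(flaky, sleep=lambda s: None))  # → ok
```

Pair this with timeouts, structured logging/tracing of every step, and per-run cost caps; retries alone don't make an agent production-safe.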

9) LlamaIndex agent snippets (alternative stack)

LlamaIndex provides quick starts for agents and vector stores (Qdrant, etc.).

# Tiny LlamaIndex agent (function-agent style; API may vary by version)
import asyncio
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI as LIOpenAI

llm = LIOpenAI(model="gpt-4.1-mini")

def tool_search(query: str) -> str:
    """Search internal docs for a query."""
    # real search here
    return "doc snippet..."

# plain callables are wrapped into tools automatically
agent = FunctionAgent(tools=[tool_search], llm=llm)

async def main():
    print(await agent.run("Find me our refund policy and summarize it."))

asyncio.run(main())

(Check your installed version’s API; LlamaIndex evolves quickly.)

10) Choosing your pattern (decision cheatsheet)

  1. Single-shot question answering with docs? → Augmented LLM + retrieval (no agent).
  2. Predictable steps? → Workflow (prompt chaining or routing).
  3. Unknown steps / tool calls / retries? → Agent with limits + checkpoints.
  4. Needs scale and control (retries, reflection, approvals)? → LangGraph graph with loops, memory, and human-in-the-loop.
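The cheatsheet above can be encoded as a tiny triage function for design reviews; the flag names are made up for illustration:

```python
def choose_pattern(predictable_steps: bool,
                   needs_tools_or_retries: bool,
                   needs_scale_and_control: bool) -> str:
    """Map the decision cheatsheet onto a recommendation (most demanding need wins)."""
    if needs_scale_and_control:
        return "LangGraph graph with loops, memory, human-in-the-loop"
    if needs_tools_or_retries:
        return "agent with limits + checkpoints"
    if predictable_steps:
        return "workflow (prompt chaining or routing)"
    return "augmented LLM + retrieval"

print(choose_pattern(False, False, False))  # → augmented LLM + retrieval
```

The branch order encodes the rule of thumb from section 1: reach for the heavier machinery only when a lighter option can't meet the requirement.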

References & further reading
