The Hidden Cost of Building AI Agents with LangChain, CrewAI, and AutoGen
You found LangChain six months ago and built something impressive in a weekend. Or maybe it was CrewAI, or AutoGen, or the OpenAI Assistants API. The demo worked. Your team was excited. Then you tried to put it in production.
This is the story of almost every engineering team that has tried to take a framework-built AI agent into production.
What the frameworks are actually good at
Let's be honest about what LangChain, CrewAI, and AutoGen do well:
LangChain is excellent for rapid prototyping. Its chain abstractions make it easy to wire LLM calls together, and its massive library of integrations means you can connect to almost anything. If you need to build a proof of concept quickly and demonstrate it to stakeholders, LangChain is fast.
CrewAI makes multi-agent orchestration approachable. The "role, goal, backstory" pattern for defining agents is intuitive, and seeing multiple agents collaborate on a task is genuinely compelling in a demo.
AutoGen (and its community fork, AG2) is strong for experimental multi-agent research and complex reasoning chains. Microsoft's investment in the framework shows — it handles sophisticated conversational patterns that simpler frameworks can't.
None of this praise is misplaced. The frameworks work. The problem is what happens after the demo.
The production gap
Building an AI agent that works in a demo and building an AI agent that works reliably in production for real business users are two very different problems.
Here is what the frameworks don't handle:
Human-in-the-loop approvals — when the agent decides to send an email, book a meeting, update a CRM record, or execute any other real-world action, how does a human review and approve that action? The frameworks don't have a first-class approval primitive. You build it yourself. This ends up being a significant project — you need a UI, a notification system, a state machine for tracking pending approvals, and a way to get the approval decision back to the agent.
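To make the scope concrete, here is a minimal sketch in plain Python of just the data model and state machine for a pending approval; all names are hypothetical, and persistence, notifications, the review UI, and resuming the paused agent run all still sit on top of it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from uuid import uuid4


class ApprovalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"


@dataclass
class ApprovalRequest:
    """One real-world action the agent wants to take, awaiting a human decision."""
    action: str        # e.g. "send_email", "update_crm_record"
    payload: dict      # the exact arguments the agent intends to execute with
    requested_by: str  # agent or run identifier, so the decision can be routed back
    id: str = field(default_factory=lambda: uuid4().hex)
    status: ApprovalStatus = ApprovalStatus.PENDING
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def decide(self, approved: bool) -> None:
        # Guard against double-resolution: a request can only be decided once.
        if self.status is not ApprovalStatus.PENDING:
            raise ValueError(f"approval {self.id} already resolved as {self.status.value}")
        self.status = ApprovalStatus.APPROVED if approved else ApprovalStatus.REJECTED
```

Even this toy version raises the awkward questions: where pending requests live while the agent is suspended, who gets notified, and what happens when nobody answers before the request expires.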
Credential management — your agents need access to real integrations: Gmail, GitHub, HubSpot, Slack. Managing OAuth tokens, rotating credentials, handling token refresh, and ensuring tokens are scoped correctly is plumbing work. Lots of it.
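As a rough illustration of that plumbing, the sketch below shows the smallest possible OAuth2 refresh flow for one integration; the token endpoint, client credentials, and in-memory token object are assumptions, and a real system still needs encrypted storage, per-tenant scoping, and handling for providers that rotate refresh tokens.

```python
import time

import requests  # third-party HTTP client: pip install requests


class OAuthToken:
    """Minimal holder for one integration's OAuth2 access/refresh token pair."""

    def __init__(self, access_token: str, refresh_token: str, expires_at: float):
        self.access_token = access_token
        self.refresh_token = refresh_token
        self.expires_at = expires_at  # unix timestamp

    def is_expired(self, skew_seconds: int = 60) -> bool:
        # Refresh slightly early so in-flight requests don't race the expiry.
        return time.time() >= self.expires_at - skew_seconds


def get_valid_access_token(token: OAuthToken, token_url: str,
                           client_id: str, client_secret: str) -> str:
    """Return a usable access token, refreshing against the provider if needed."""
    if not token.is_expired():
        return token.access_token
    resp = requests.post(token_url, data={
        "grant_type": "refresh_token",
        "refresh_token": token.refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    token.access_token = data["access_token"]
    token.expires_at = time.time() + data.get("expires_in", 3600)
    # Some providers rotate the refresh token on every use; keep the new one if present.
    token.refresh_token = data.get("refresh_token", token.refresh_token)
    return token.access_token
```

Multiply this by every integration, every tenant, and every scope combination, and the plumbing adds up.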
Observability — when an agent fails, what do you look at? LangSmith (LangChain's tracing tool) helps with chain-level visibility, but it's another service to manage, another cost to track, and it only covers LangChain. If you're using multiple frameworks, you're stitching together multiple observability tools.
Tenant isolation — if you're building a product where multiple organizations each get their own agent, you need to ensure data never crosses tenant boundaries. Every integration call, every memory store read, every tool invocation needs to be scoped to the right tenant. Implementing this correctly is not conceptually hard, but it requires discipline, and discipline is hard to maintain as the codebase grows.
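A minimal sketch of what that discipline looks like, using an illustrative in-memory store: the tenant id is part of every key, so cross-tenant reads are impossible by construction rather than by code-review vigilance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Passed through every tool call so nothing can act outside its tenant."""
    tenant_id: str


class MemoryStore:
    """Illustrative key-value memory store that is always keyed by tenant."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], str] = {}

    def put(self, ctx: TenantContext, key: str, value: str) -> None:
        self._data[(ctx.tenant_id, key)] = value

    def get(self, ctx: TenantContext, key: str) -> str | None:
        # The tenant id is baked into the lookup key; reading another tenant's data
        # would require constructing another tenant's context.
        return self._data.get((ctx.tenant_id, key))
```

In a real deployment this pattern is usually pushed down into the database (row-level security, per-tenant schemas) rather than enforced in application code alone.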
Reliability and retries — LLMs fail. APIs rate-limit. Network calls time out. Production agent systems need retry logic, fallback models, graceful degradation, and clear error propagation. The frameworks give you primitives; the operational logic is yours to build.
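As a sketch of the operational logic that ends up on your plate, the helper below retries a primary model with exponential backoff and then degrades to a fallback model; the callables are stand-ins for whatever LLM client you actually use, and real code would catch only rate-limit and timeout errors rather than bare Exception.

```python
import time
from typing import Callable


def call_with_fallback(primary: Callable[[str], str], fallback: Callable[[str], str],
                       prompt: str, max_attempts: int = 3, base_delay: float = 1.0) -> str:
    """Try the primary model with exponential backoff, then degrade to the fallback."""
    last_error: Exception | None = None
    for attempt in range(max_attempts):
        try:
            return primary(prompt)
        except Exception as exc:  # narrow to rate-limit/timeout errors in real code
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    try:
        return fallback(prompt)
    except Exception:
        # Propagate the original failure so callers can see why the primary path degraded.
        raise RuntimeError("primary and fallback models both failed") from last_error
```

The hard part is not this function; it is deciding, per tool and per tenant, what graceful degradation actually means for the user on the other end.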
Versioning and rollback — when an agent behavior changes because you updated a prompt or added a tool, how do you roll back? How do you test changes before they affect production users? The frameworks treat agents as code, not as deployable services with versioning semantics.
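One common answer, sketched below under the assumption that an agent's behavior is fully captured by its prompt and tool set, is to treat each change as an immutable published version so a rollback is just a pointer move; none of the frameworks above model agents this way out of the box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentVersion:
    """Immutable snapshot of everything that defines the agent's behavior."""
    version: int
    system_prompt: str
    tool_names: tuple[str, ...]


class AgentRegistry:
    """Keeps every published version so deploy and rollback are pointer moves."""

    def __init__(self) -> None:
        self._versions: list[AgentVersion] = []
        self._live_index: int | None = None

    def publish(self, system_prompt: str, tool_names: tuple[str, ...]) -> AgentVersion:
        version = AgentVersion(len(self._versions) + 1, system_prompt, tool_names)
        self._versions.append(version)
        self._live_index = len(self._versions) - 1
        return version

    def rollback(self) -> AgentVersion:
        if self._live_index is None or self._live_index == 0:
            raise RuntimeError("no earlier version to roll back to")
        self._live_index -= 1
        return self._versions[self._live_index]

    @property
    def live(self) -> AgentVersion:
        if self._live_index is None:
            raise RuntimeError("nothing published yet")
        return self._versions[self._live_index]
```

Staging a candidate version and only flipping the live pointer after it passes evaluation is the "test before it reaches production users" half of the same problem.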
The maintenance tax
The frameworks move fast. LangChain alone has shipped multiple rounds of breaking changes from one release to the next. If your production system is pinned to an old version, you miss security fixes and new model support. If you upgrade, you often discover that APIs changed in ways that break your code.
A team at a B2B SaaS company recently shared their experience: they had three engineers who each understood different parts of their LangChain-based pipeline. When one left, the institutional knowledge of why certain workarounds existed disappeared with them. The codebase had accumulated months of patches for LangChain version incompatibilities, half-implemented retry logic, and a custom approval UI that worked well enough to not get fixed.
This is not an unusual story. It is the median outcome for teams that choose to own their agent infrastructure.
The build-vs-buy calculation
Before choosing to build with a framework, run the honest calculation:
What you're building:
- The agent itself (the valuable part — domain logic, prompts, outputs)
- The approval workflow
- The integration connectors and credential management
- The observability stack
- The tenant isolation layer
- The reliability and retry infrastructure
- The versioning and deployment system
- The HITL notification system
What you're maintaining indefinitely:
- Framework version upgrades
- Integration connector updates as APIs change
- Prompt engineering as model behaviors shift
- Infrastructure that grows as usage grows
For most teams, the non-agent infrastructure accounts for 60–80% of the engineering effort over the first year. The actual business logic — the part that creates value — is a fraction of the work.
The alternative
A purpose-built agent platform handles the infrastructure layer so your team builds only the valuable part.
Ariftly gives you:
- Human-in-the-loop approvals built into the protocol — no custom approval UI to build
- Pre-built integrations (GitHub, Gmail, Slack, Jira, HubSpot) with managed credentials
- Complete observability with event sourcing — every state change logged and replayable
- Tenant isolation by default — scoped to your organization
- Reliability infrastructure — retries, fallbacks, graceful degradation
- Versioned, deployable agents with rollback capability
If your use case is AI Readiness compliance or B2B sales outreach, you can deploy the vertical agent — built, tested, and running in production — in 10 minutes. No framework setup, no infrastructure plumbing, no approval workflow to build.
If your use case is something custom, the Remote Agent Protocol lets you build an agent in any language and register it on the platform. You write the domain logic; the platform handles everything else.
When to use a framework
Frameworks are the right choice when:
- You're building something experimental that doesn't need to be reliable or multi-tenant
- Your use case is so unusual that no existing agent covers it, and you're willing to invest in the infrastructure
- You're a research team that values flexibility over operational stability
- You want to contribute to the open-source ecosystem
Frameworks are the wrong choice when:
- You need this in production in weeks, not quarters
- You need multiple people to approve AI actions before they execute
- You're a small team with limited infrastructure bandwidth
- Your domain is one of the verticals that a purpose-built agent already covers
The framework vs. platform decision is fundamentally about where you want to spend your engineering time. Building on a framework is a bet that the infrastructure problems are worth solving yourself. Choosing a platform is a bet that the business logic — the thing that actually differentiates you — is where your time is better spent.
For most teams building real-world AI agents in 2026, that bet is the platform.