Ayush Gupta - AI Agent Engineer

We have seen this before. The internet reinvents itself roughly every decade, and each time, the shift rewrites the rules for everyone building on top of it. Static HTML gave way to dynamic web applications. Monoliths decomposed into microservices. REST APIs became the lingua franca of distributed systems. Cloud computing made infrastructure someone else's problem. Each wave did not replace what came before -- it absorbed it, extended it, and changed what "building software" meant.

We are in the middle of another one of those shifts right now. And this time, the change is not about a new runtime, a new protocol, or a new deployment model. It is about agency. Software is no longer just responding to requests. It is reasoning, planning, taking actions, and coordinating across tools -- autonomously. The patterns we built the modern internet on are not going away, but they are evolving into something fundamentally different.

01 Observability: From Logs to Reasoning Traces

Observability used to mean Grafana dashboards, Loki logs, and distributed tracing with Jaeger. The model was straightforward: a request enters the system, passes through services, and you inspect the artifacts after the fact.

Agentic systems break this. An agent does not follow a predetermined code path -- it reasons about which tools to use, retries with different strategies, and sometimes abandons an approach entirely. The "trace" is no longer a linear sequence of service calls. It is a tree of reasoning -- branching, backtracking, and self-correcting.

This means observability now needs to capture things that never existed before:

LLM call traces -- prompts, responses, token counts, latency, model version
Tool invocation chains -- which tools, in what order, with what inputs and outputs
Reasoning chains -- why the agent made a decision, especially when it chose not to do something
Memory reads and writes -- what context was pulled and what was stored
Cost attribution -- tokens consumed per task, per agent, per workflow

Platforms like LangSmith, Langfuse, Arize Phoenix, and Helicone are building exactly this. They are not replacing Grafana -- they are extending observability into territory it was never designed for. Meanwhile, traditional observability vendors are adding LLM tracing to their own products. The two worlds are colliding.
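To make the "tree of reasoning" concrete, here is a minimal sketch of a trace as nested spans. The Span class and its field names are assumptions made for illustration; they do not follow the schema of any of the platforms named above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Span:
    """One node in a reasoning trace (hypothetical structure)."""
    kind: str                      # "llm", "tool", "memory", or "decision"
    name: str                      # model name, tool name, or a short label
    inputs: dict[str, Any] = field(default_factory=dict)
    outputs: dict[str, Any] = field(default_factory=dict)
    tokens: int = 0                # tokens consumed by this span alone
    latency_ms: float = 0.0
    children: list["Span"] = field(default_factory=list)

    def add(self, child: "Span") -> "Span":
        """Attach a child span and return it so calls can be chained."""
        self.children.append(child)
        return child

    def total_tokens(self) -> int:
        """Tokens for this span plus everything beneath it (cost attribution)."""
        return self.tokens + sum(c.total_tokens() for c in self.children)

    def tool_chain(self) -> list[str]:
        """Tool names in depth-first order, assuming children were added in call order."""
        chain = [self.name] if self.kind == "tool" else []
        for c in self.children:
            chain.extend(c.tool_chain())
        return chain


# A task where the first plan is abandoned and a second one is tried.
root = Span(kind="decision", name="answer_refund_question")
plan = root.add(Span(kind="llm", name="planner-model", tokens=420, latency_ms=850.0))
plan.add(Span(kind="tool", name="lookup_order", inputs={"order_id": "A-17"}))
retry = root.add(Span(kind="llm", name="planner-model", tokens=310))
retry.add(Span(kind="tool", name="search_policy_docs"))
```

Because the trace is a tree, per-task cost and the tool chain both fall out of simple recursive walks: root.total_tokens() sums every branch, including the abandoned one.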

The deeper shift

In traditional software, observability is mostly retrospective -- you look at traces after something breaks. In agentic systems, observability becomes part of the runtime. Agents can inspect their own traces, realize they are stuck in a loop, and course-correct. Observability is no longer just for humans debugging production. It is for agents debugging themselves.
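One way an agent could notice it is stuck is to scan its own recent tool calls for repeats. The sketch below assumes each step is recorded as a (tool name, serialized arguments) pair; the function name, window size, and threshold are illustrative choices, not a standard.

```python
from collections import Counter
from typing import Optional

Step = tuple[str, str]  # (tool_name, serialized_arguments)


def find_repeated_step(steps: list[Step], window: int = 6, max_repeats: int = 3) -> Optional[Step]:
    """Return a step that occurs at least max_repeats times in the last `window` steps."""
    recent = steps[-window:]
    for step, count in Counter(recent).most_common():
        if count >= max_repeats:
            return step
    return None


history = [
    ("search_docs", '{"q": "refund policy"}'),
    ("search_docs", '{"q": "refund policy"}'),
    ("fetch_page", '{"url": "/help"}'),
    ("search_docs", '{"q": "refund policy"}'),
]
stuck_on = find_repeated_step(history)
```

When find_repeated_step returns a step, the runtime could feed that fact back into the agent's context so that it tries a different strategy instead of issuing the same call again.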

02 API Gateways: From Service Routing to Agent Orchestration

API gateways used to route HTTP requests to the right backend service. Deterministic, path-based, configured with rules. In the agentic world, routing means something different: a simple classification goes to a small, cheap model; a complex reasoning task goes to a frontier model; a coding task goes to a code-specialized model. Routing is no longer based on URL paths -- it is based on intent, cost, latency, and capability matching.

Dimension	Traditional Gateway	Agentic Gateway
Model	Deterministic path-based routing to known backend services	Intent-aware routing to models, agents, and workflows
Routes	/api/users -> user-service	task complexity -> model selection
Auth	JWT validation at the edge	Agent identity + capability verification
Limits	Fixed rate per API key	Token budgets + cost-aware throttling

But it goes further than model routing. Modern agentic gateways are also becoming workflow routers. A single user request -- "book me a flight and hotel in Tokyo for next week" -- might need to be decomposed into subtasks and routed to different specialized agents: a flight search agent, a hotel booking agent, a calendar agent, and a budget optimization agent. The gateway is no longer just forwarding requests. It is orchestrating multi-agent workflows.
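A rough sketch of that decomposition step, using asyncio to run sub-agents concurrently. The agents here are stubs and the keyword-based decompose function is an assumption for illustration; a real gateway would more likely use a model to split the request.

```python
import asyncio


async def flight_agent(task: str) -> str:
    await asyncio.sleep(0)  # stands in for real network I/O
    return f"flight options for: {task}"


async def hotel_agent(task: str) -> str:
    await asyncio.sleep(0)  # stands in for real network I/O
    return f"hotel options for: {task}"


# Hypothetical registry of specialized agents, keyed by the keyword that selects them.
AGENTS = {"flight": flight_agent, "hotel": hotel_agent}


def decompose(request: str) -> list[tuple[str, str]]:
    """Split a request into (agent_name, subtask) pairs by keyword match."""
    text = request.lower()
    return [(name, request) for name in AGENTS if name in text]


async def handle(request: str) -> dict[str, str]:
    """Fan the subtasks out to their agents concurrently and collect the results."""
    subtasks = decompose(request)
    results = await asyncio.gather(*(AGENTS[name](task) for name, task in subtasks))
    return {name: result for (name, _), result in zip(subtasks, results)}


result = asyncio.run(handle("Book me a flight and hotel in Tokyo for next week"))
```

asyncio.gather preserves the order of its inputs, which is what lets the results be zipped back to the subtasks that produced them.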

LLM routing is already real

This is not theoretical. Intelligent model routing -- analyzing a query and sending it to the best model for the job -- is running in production today. A coding task might route to DeepSeek, a complex reasoning task to Claude, a long-context analysis to Gemini, and a simple classification to a small, cheap model. The user sends one request and the router decides.
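The shape of such a router can be sketched with plain keyword rules. This is a deliberately crude stand-in for the trained classifiers that real routers use, and the model tier names below are placeholders rather than real model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # placeholder tier name, not a real model identifier
    reason: str  # kept so the routing decision can be logged and audited


CODE_HINTS = ("def ", "stack trace", "refactor", "compile")
REASONING_HINTS = ("prove", "plan", "trade-off", "step by step")


def route(query: str, long_context_chars: int = 20_000) -> Route:
    """Pick a model tier from simple signals in the query text."""
    text = query.lower()
    if len(query) > long_context_chars:
        return Route("long-context-model", "input exceeds long-context threshold")
    if any(hint in text for hint in CODE_HINTS):
        return Route("code-model", "looks like a coding task")
    if any(hint in text for hint in REASONING_HINTS):
        return Route("frontier-model", "needs multi-step reasoning")
    return Route("small-cheap-model", "default for simple requests")
```

For example, route("Refactor this function") selects the code tier, while route("Is this email spam?") falls through to the cheap default. Returning the reason alongside the model keeps each decision explainable in a trace.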

OpenRouter already does this at scale with its auto model -- one API key, 200+ models, automatic selection based on the query. LiteLLM (33k+ stars on GitHub) gives you the same thing self-hosted: a unified OpenAI-compatible proxy that sits in front of every provider and lets you define routing rules based on cost, latency, and fallback chains. RouteLLM from LMSYS/Berkeley goes further -- it trains classifiers on human preference data from Chatbot Arena to predict which model handles a query best, and its authors report cost reductions of up to 85% on some benchmarks while retaining 95% of GPT-4's performance.

These are not future concepts. They are production infrastructure running today. The API gateway of the agentic era is making intelligent, cost-aware decisions about which model, which agent, and which workflow handles every request.

03 Testing: From Unit Tests to Agent Evaluations

Unit tests still matter. But agentic systems need a new category of testing that is fundamentally harder, because they are non-deterministic. Give the same agent the same task twice and it might take a different path, use different tools, and arrive at the answer through completely different means. Traditional assertions -- "output must equal expected" -- break when the output is natural language or a multi-step plan.

The new testing stack

LLM-as-judge evaluations -- one model grades another on correctness, relevance, and safety. Sounds circular, works surprisingly well when the judge is more capable than the model being graded.
Human-in-the-loop evaluation -- domain experts reviewing agent outputs for high-stakes decisions. Does not scale, but provides ground truth that automated evals calibrate against.
Regression benchmarks -- curated task-output pairs. Run your agent against them after every change, track performance over time.
Behavioral testing -- does the agent ask for clarification when ambiguous? Does it refuse destructive actions without confirmation?
Adversarial testing -- prompt injection, contradictory constraints, edge cases. Break the agent on purpose to verify robustness.

The mindset shift: in traditional testing, you verify code does what it should. In agent evaluation, you verify that behavior is acceptable across a distribution of possible executions. Closer to QA for a human employee than QA for a codebase.
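Verifying behavior "across a distribution of possible executions" usually means running the same task many times and asserting on a pass rate rather than on a single output. A minimal sketch, with a toy agent and a toy judge standing in for real models (both are assumptions made for illustration):

```python
import random
from typing import Callable


def pass_rate(
    agent: Callable[[str, random.Random], str],
    judge: Callable[[str, str], bool],
    task: str,
    runs: int = 50,
    seed: int = 0,
) -> float:
    """Run the agent `runs` times on one task and return the fraction the judge accepts."""
    rng = random.Random(seed)  # seeded so the evaluation itself is repeatable
    passed = sum(judge(task, agent(task, rng)) for _ in range(runs))
    return passed / runs


def toy_agent(task: str, rng: random.Random) -> str:
    """Non-deterministic stand-in for an agent: the phrasing varies from run to run."""
    return rng.choice(["The total is 42.", "42", "I computed 42 as the answer.", "Unsure."])


def toy_judge(task: str, output: str) -> bool:
    """Stand-in for an LLM-as-judge: accepts any output that contains the right number."""
    return "42" in output


rate = pass_rate(toy_agent, toy_judge, "What is 6 * 7?")
```

A regression suite would then assert something like rate >= 0.7 for each task, with the threshold chosen per task, instead of comparing strings for equality.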

The feedback loop matters

The most effective teams are building continuous evaluation pipelines where agent performance is tracked like an ML metric -- with dashboards, alerts on regression, and A/B testing of different agent configurations. Testing is not a gate before deployment. It is a continuous process running alongside production.

04 Scalability: From Request Throughput to Workflow Orchestration

Scalability used to mean: how many requests per second can this handle? Autoscalers, load balancers, connection pools. Agentic systems redefine the question: how many autonomous workflows can an agent reliably orchestrate without losing coherence? A single task might involve dozens of LLM calls, multiple tool invocations, memory reads, and coordination with other agents. The "request" is no longer a millisecond-scale round trip -- it is a multi-minute workflow that consumes real money in API costs.

Dimension	Traditional Scalability	Agentic Scalability
Primary metric	Requests per second	Concurrent workflows orchestrated
Bottleneck	CPU, memory, I/O	LLM rate limits, token budgets, reasoning depth
Cost model	Infrastructure (compute hours)	API tokens consumed per task
Failure mode	Timeout, 5xx errors	Reasoning loops, hallucination, context overflow
Scaling mechanism	Horizontal pod autoscaling	Model cascading, parallelized sub-agents, task decomposition
Latency	Milliseconds (P99 target)	Seconds to minutes (acceptable for complex tasks)

New primitives are emerging: model cascading (cheap models first, escalate when needed), parallel sub-agents (decompose and execute concurrently), and speculative execution (start multiple approaches, use the first that succeeds). And the economics are different -- a poorly designed agent workflow can cost 10x more than an optimized one even at identical infrastructure load. Cost optimization is now an architectural concern, not just an infrastructure one.
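Model cascading is the easiest of these primitives to sketch. The version below assumes each model returns an answer together with a confidence score, which is itself an assumption: real systems have to derive that signal from a verifier, a judge model, or a similar check. Costs are in arbitrary units.

```python
from typing import Callable

# (name, cost per call in arbitrary units, function returning (answer, confidence))
Model = tuple[str, float, Callable[[str], tuple[str, float]]]


def cascade(query: str, models: list[Model], min_confidence: float = 0.8) -> tuple[str, str, float]:
    """Try models in the given (cheapest-first) order; return (answer, model_used, total_cost)."""
    total_cost = 0.0
    answer, used = "", ""
    for name, cost, call in models:
        answer, confidence = call(query)
        total_cost += cost          # every attempt is paid for, including failed ones
        used = name
        if confidence >= min_confidence:
            break                   # good enough, do not escalate further
    return answer, used, total_cost


def small(query: str) -> tuple[str, float]:
    """Toy cheap model: confident only on short queries."""
    return ("short answer", 0.9 if len(query) < 40 else 0.3)


def large(query: str) -> tuple[str, float]:
    """Toy expensive model: always confident."""
    return ("careful answer", 0.95)


MODELS: list[Model] = [("small", 1.0, small), ("large", 20.0, large)]
```

With these toy numbers, a short query costs 1.0 unit and an escalated one costs 21.0, so the share of traffic the cheap tier can absorb dominates the bill. That is the sense in which cost becomes an architectural concern.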

05 Authentication: From User Sessions to Agent Identity

Traditional auth assumes a human on one end -- JWTs, OAuth, sessions. Agents shatter this. When an agent calls an API, who is making the request? The user who triggered the workflow? The agent itself? The platform? The developer? The answer is "all of the above," and traditional auth has no clean way to represent this.

What is taking shape is a layered identity system:

Agent identity -- cryptographic identity independent of any user. Registered, revoked, audited. Service accounts on steroids.
Delegated authority -- scoped permissions far more granular than typical OAuth scopes: "read my emails but not delete them, draft replies but not send without approval."
Trust levels -- first-party agents get different capabilities than third-party agents from unknown developers.
Capability-based access -- not "has the admin role" but "can call this endpoint with these parameters."
Audit trails -- every action logged with full provenance: who authorized it, what triggered it, what the reasoning was.
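The layers above can be sketched as a grant that ties an agent to the user it acts for, plus an authorizer that logs every decision. The class names and the "email:read" style of action strings are assumptions made for illustration, not part of any existing standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Grant:
    """Delegated authority: what one agent may do on behalf of one user."""
    agent_id: str
    on_behalf_of: str
    allowed: frozenset[str]                       # e.g. "email:read"
    needs_approval: frozenset[str] = frozenset()  # allowed only with human sign-off


@dataclass
class Authorizer:
    audit_log: list[dict] = field(default_factory=list)

    def check(self, grant: Grant, action: str, approved: bool = False) -> bool:
        """Decide whether the action may proceed, and record the decision either way."""
        if action not in grant.allowed:
            decision = "denied"
        elif action in grant.needs_approval and not approved:
            decision = "pending_approval"
        else:
            decision = "allowed"
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": grant.agent_id,
            "user": grant.on_behalf_of,
            "action": action,
            "decision": decision,
        })
        return decision == "allowed"


grant = Grant(
    agent_id="mail-assistant",
    on_behalf_of="alice",
    allowed=frozenset({"email:read", "email:draft", "email:send"}),
    needs_approval=frozenset({"email:send"}),
)
auth = Authorizer()
```

Here auth.check(grant, "email:read") passes, "email:delete" is denied outright, and "email:send" passes only when approved=True is supplied. All three outcomes land in the audit log with both the agent and the user attached.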

This is not hypothetical. Google's Agent Registration protocol and emerging standards around verifiable agent credentials are being built right now. Agent identity will be as fundamental as SSL certificates were to the early web.

06 Service Discovery: From DNS to Agent Discovery

DNS solved service discovery: given a name, find the address. In the agentic world, the problem is harder. An agent needs to find the right agent for a specific capability -- and "right" is multidimensional: skill match, trustworthiness, cost, availability, protocol compatibility.

What agent discovery needs to solve

Capability advertisement -- agents publish what they can do (tools, skills, domains) in a machine-readable format
Semantic matching -- finding agents not by exact name, but by what they can accomplish ("I need an agent that can analyze financial documents")
Trust verification -- confirming that an agent is who it claims to be and is authorized to operate in a given context
Version negotiation -- agents evolve, and consumers need to handle capability changes gracefully
Real-time availability -- knowing whether an agent is currently operational, under load, or degraded

Several YC-backed startups are building agent registries -- searchable directories where agents publish capabilities and others discover them through semantic queries. Think DNS meets package registry meets marketplace, but the "users" doing the discovering are not humans. They are other agents.
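A toy registry shows how capability advertisement, matching, and availability fit together. Word overlap (Jaccard similarity) is used here as a crude stand-in for embedding-based semantic matching, and the AgentRecord fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentRecord:
    """What an agent publishes about itself (hypothetical fields)."""
    name: str
    description: str   # capability advertisement in plain language
    version: str
    healthy: bool = True


def _tokens(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {word.strip(".,").lower() for word in text.split()}


def discover(query: str, registry: list[AgentRecord]) -> list[AgentRecord]:
    """Rank healthy agents by word overlap between the query and their description."""
    wanted = _tokens(query)
    scored = []
    for record in registry:
        if not record.healthy:
            continue  # real-time availability: skip agents that are down
        offered = _tokens(record.description)
        union = wanted | offered
        score = len(wanted & offered) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


REGISTRY = [
    AgentRecord("fin-analyst", "analyze financial documents and filings", "1.2.0"),
    AgentRecord("trip-planner", "search flights and hotels", "0.9.1"),
    AgentRecord("old-fin", "analyze financial documents", "0.1.0", healthy=False),
]
matches = discover("I need an agent that can analyze financial documents", REGISTRY)
```

Trust verification and version negotiation are left out of the sketch; in practice they would be further filters applied before ranking.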

07 Databases: From Application Data to Agent Memory

An agent does not just need to store data -- it needs to remember. Remembering is fundamentally different from storage: deciding what is worth keeping, connecting information across time, updating beliefs when new data contradicts old data, and retrieving the right context at the right moment. What is emerging is a layered memory architecture:

Short-term memory

Conversation context and working state. Equivalent to the context window. Fast, ephemeral, limited by token budgets.

Episodic memory

Records of past interactions, decisions, and outcomes. Enables learning from experience and avoiding repeated mistakes.

Semantic memory

Structured knowledge -- facts, relationships, domain expertise. Stored in vector databases and knowledge graphs for retrieval.

Vector databases (Pinecone, Weaviate, Qdrant) store embeddings and enable similarity search -- searching by meaning, not exact match. Knowledge graphs add structure: entities, relationships, temporal context. The best systems combine both -- vectors for broad semantic recall, graphs for structured reasoning. Teams building these are solving problems database engineers never faced: memory consolidation, contradiction resolution, forgetting strategies, and relevance decay over time.
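Three of those problems (contradiction resolution, relevance decay, and forgetting) can be sketched in a few lines. The policies below are assumptions chosen for simplicity: the newest write wins, relevance halves every half-life, and anything below a threshold is dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    key: str          # what the fact is about, e.g. "user.home_city"
    value: str
    stored_at: float  # seconds on whatever clock the caller uses


class SemanticMemory:
    def __init__(self, half_life_s: float = 7 * 24 * 3600):
        self.half_life_s = half_life_s
        self._facts: dict[str, Fact] = {}

    def write(self, key: str, value: str, now: float) -> None:
        """Contradiction resolution by recency: a newer value replaces the older one."""
        current = self._facts.get(key)
        if current is None or now >= current.stored_at:
            self._facts[key] = Fact(key, value, now)

    def read(self, key: str) -> Optional[str]:
        fact = self._facts.get(key)
        return fact.value if fact else None

    def relevance(self, key: str, now: float) -> float:
        """Exponential decay: 1.0 when fresh, 0.5 after one half-life."""
        age = max(0.0, now - self._facts[key].stored_at)
        return 0.5 ** (age / self.half_life_s)

    def forget(self, now: float, threshold: float = 0.1) -> list[str]:
        """Drop facts whose relevance has decayed below the threshold."""
        stale = [k for k in self._facts if self.relevance(k, now) < threshold]
        for k in stale:
            del self._facts[k]
        return stale
```

A production system would more likely weigh source reliability alongside recency, and would reset the decay clock when a fact is retrieved and found useful. The structure stays the same: memory is a store plus policies about what to keep.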

08 Controllers: From Explicit Code to Dynamic Execution

In traditional software, every execution path exists in the codebase. Agentic systems flip this: you define boundaries -- tools available, constraints to follow, goals to pursue -- and the agent figures out execution dynamically. The developer's role shifts from writing code that does things to defining the space of things that can be done.

Traditional development: "Here is exactly what to do, step by step."
Agentic development: "Here is what you can do, here is what you cannot do, here is what success looks like. Figure it out."

Concretely: instead of REST endpoints, you expose tool definitions that agents interpret at runtime. Instead of hardcoded business rules, constraints and guardrails. Instead of workflow DAGs, agents that compose workflows on the fly. MCP (Model Context Protocol) is standardizing this -- a tool provider advertises capabilities, and agents integrate them dynamically. The "client" is not a developer writing code. It is an autonomous agent deciding in real time what to call and why.
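As a rough illustration of "tool definitions plus guardrails", the sketch below registers functions with machine-readable metadata and enforces constraints at call time. It is not the MCP wire format; the decorator, the TOOLS dictionary, and the destructive flag are all assumptions made for this example.

```python
from typing import Any, Callable

# Registry of tool definitions an agent could list and choose from at runtime.
TOOLS: dict[str, dict[str, Any]] = {}


def tool(name: str, description: str, params: dict[str, type], destructive: bool = False):
    """Register a function as a tool together with its machine-readable definition."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = {
            "description": description,
            "params": params,
            "destructive": destructive,
            "fn": fn,
        }
        return fn
    return register


@tool("get_weather", "Current weather for a city", {"city": str})
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # canned result for illustration


@tool("delete_file", "Delete a file by path", {"path": str}, destructive=True)
def delete_file(path: str) -> str:
    return f"deleted {path}"  # illustration only, does not touch the filesystem


def call_tool(name: str, args: dict[str, Any], confirmed: bool = False) -> str:
    """Validate an agent's tool call against the definition, then dispatch it."""
    spec = TOOLS.get(name)
    if spec is None:
        raise KeyError(f"unknown tool: {name}")
    for param, expected in spec["params"].items():
        if not isinstance(args.get(param), expected):
            raise TypeError(f"{name}: '{param}' must be {expected.__name__}")
    if spec["destructive"] and not confirmed:
        raise PermissionError(f"{name} requires confirmation")
    return spec["fn"](**args)
```

The developer writes the tools and the guardrail; which tool gets called, with which arguments, and in what order is left to the agent. call_tool is the boundary where that freedom is checked.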

09 The New Primitives: Protocols, SDKs, and Frameworks

Every platform shift produces new foundational tools. The agentic shift is producing its own, and they are maturing fast.

Protocols

MCP (Model Context Protocol) -- Anthropic's open standard for connecting agents to tools and data sources. Already supported across major platforms. Think of it as the USB of the agent world -- a standard interface that lets any agent connect to any tool.
A2A (Agent2Agent) -- Google's protocol for agent interoperability. Defines how agents discover, authenticate, and communicate with each other. The HTTP of agent-to-agent communication.

SDKs and frameworks

Vercel AI SDK -- streamlines building AI-powered applications with framework-agnostic streaming, tool calling, and structured output support
OpenAI Agents SDK -- production framework for multi-agent orchestration with built-in handoffs, guardrails, and tracing
Claude Agent SDK -- Anthropic's toolkit for building agents that use Claude's extended thinking and tool use capabilities
LangGraph -- state machine-based framework for building complex, multi-step agent workflows with checkpointing and human-in-the-loop support
CrewAI -- multi-agent framework focused on role-based collaboration between specialized agents
AutoGen -- Microsoft's framework for building conversational multi-agent systems

The ecosystem feels like JavaScript frameworks circa 2015 -- chaotic, competitive, moving fast. But the underlying patterns are stabilizing: tool integration, multi-agent coordination, memory management, evaluation, and guardrails. These will persist even as specific frameworks come and go.

10 What Agents Actually Are Now

Agents are no longer chatbots. They are goal-oriented systems that can:

Reason and plan

Decompose complex goals into subtasks, sequence them intelligently, and adapt the plan when reality does not match expectations.

Take actions

Call APIs, run code, interact with databases, browse the web, send messages -- not just generate text about doing these things, but actually do them.

Use memory

Maintain context across sessions, learn from past interactions, and build up domain knowledge over time. Each interaction can make the agent more capable.

Coordinate with other agents

Delegate subtasks, share context, negotiate resources, and collaborate on problems that exceed any single agent's capability.

Schedule and persist

Set up recurring workflows, monitor conditions, trigger actions based on events, and operate without continuous human oversight.

Self-correct

Detect when something went wrong, diagnose the issue, and retry with a different approach. The best agents fail gracefully and learn from failures.

Not functions, not microservices, not workflows. A new category: autonomous software entities that operate with delegated authority and adaptive behavior.

11 The Bigger Picture

Step back and look at the full pattern. Every major layer of the software stack is growing an agentic counterpart:

Layer	Traditional Internet	Agentic Internet
Observability	Grafana, Loki, Jaeger	LLM tracing, reasoning chains, tool usage logs
Routing	API gateways, load balancers	Model routers, workflow orchestrators
Testing	Unit tests, integration tests	LLM evals, behavioral testing, human-in-the-loop
Scalability	Requests per second	Concurrent autonomous workflows
Authentication	JWTs, OAuth, sessions	Agent identity, trust levels, capability-based access
Discovery	DNS, service registries	Agent registries, capability search
Storage	PostgreSQL, Redis, S3	Vector databases, knowledge graphs, memory systems
Logic	Controllers, APIs, workflows	Tool definitions, constraints, dynamic execution
Data	ETL pipelines, dashboards	Autonomous analysis, intelligent monitoring

The traditional stack is not going away -- agents still run on servers, call APIs over HTTP, store data in databases. The existing internet is the substrate. What is changing is where human attention goes. It is shifting to agent design: defining tools, writing system prompts, building evaluation suites, designing memory architectures, and establishing trust boundaries.

The internet is not being replaced. It is being promoted. The network layer became invisible when we started thinking in HTTP. HTTP became invisible when we started thinking in REST APIs. REST APIs are becoming invisible as we start thinking in agent capabilities. Each layer of abstraction makes the previous one disappear into the foundation.

The shift in one sentence

The internet is evolving from a network of applications that humans operate to a network of autonomous systems that collaborate with each other -- and with humans -- to get things done.

The patterns are everywhere. The infrastructure is being built. The standards are being written. The internet is changing. Again. And this time, the software is not just serving requests. It is thinking.