Building AI agents that can handle complex business tasks is a massive opportunity for enterprises, but it also introduces an entirely new set of challenges that traditional software development approaches simply can’t address. According to May Habib, CEO and co-founder of Writer, an AI platform that helps enterprises build and deploy AI agents, companies are hitting a “scaling cliff” when they try to expand agent deployments using conventional methods.
Speaking at VB Transform, a major enterprise technology conference, Habib outlined why agents require fundamentally different development, deployment, and maintenance strategies. Her insights come from Writer’s work with more than 350 Fortune 1000 companies, with over half of the Fortune 500 expected to scale agents using Writer’s platform by the end of 2025.
The core challenge? Unlike traditional software that follows predictable, step-by-step processes, AI agents are designed to interpret situations, adapt their approach, and work toward outcomes rather than simply executing predetermined workflows. This fundamental difference creates what Habib calls a “categorically different” technology that demands new approaches to building, testing, and scaling.
Traditional software development relies on deterministic systems—input A produces output B in a predictable, repeatable way. Developers can map out every possible scenario, create specific rules for each situation, and test whether the software performs exactly as designed.
AI agents operate differently. They don’t reliably follow rigid rules because they’re designed to interpret context, make decisions, and adapt their behavior based on real-world conditions. “Agents don’t reliably follow rules,” Habib explained. “They are outcome-driven. They interpret. They adapt. And the behavior really only emerges in real-world environments.”
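In code, the difference looks roughly like the sketch below. It is illustrative rather than anything Writer ships; `call_model` stands in for whatever LLM client an agent uses, and the point is simply that the second function cannot be verified with an exact-match test the way the first one can.

```python
# Deterministic logic: the same input always yields the same output,
# so a single assertion proves the behavior.
def calculate_late_fee(days_overdue: int) -> float:
    return 0.0 if days_overdue <= 0 else min(days_overdue * 1.5, 50.0)

assert calculate_late_fee(10) == 15.0  # holds on every run


def call_model(prompt: str) -> str:
    """Placeholder for any LLM client; responses can vary from run to run."""
    raise NotImplementedError


# Outcome-driven step: the same contract can produce different, equally
# reasonable reviews, so there is no single "correct" string to assert on.
def review_contract(contract_text: str) -> str:
    prompt = (
        "Identify clauses in this contract that create compliance risk "
        "and explain your reasoning:\n\n" + contract_text
    )
    return call_model(prompt)  # interpretation, not a fixed rule table
```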
This non-deterministic nature—where the same input might produce different outputs depending on context—can be “really nightmarish” when trying to scale agents across an organization, particularly without proper frameworks in place. Even though technical teams might be able to spin up individual agents without traditional product managers and designers, Habib emphasizes that a product management mindset remains essential for building, iterating, and maintaining agents effectively.
The stakes are particularly high for IT departments, which often end up responsible for managing these systems. “Unfortunately or fortunately, depending on your perspective, IT is going to be left holding the bag if they don’t lead their business counterparts into that new way of building,” Habib noted.
The key shift involves moving from task-based thinking to outcome-based design. Many companies initially request agents that can “assist legal teams in reviewing contracts” or “help customer service respond to inquiries.” However, these requests are too open-ended and don’t provide clear success metrics.
Instead, effective agent design focuses on specific, measurable outcomes. Rather than building an agent to “review contracts,” companies should design agents to “reduce contract review time by 50%” or “identify compliance risks with 95% accuracy.” This goal-oriented approach provides clearer direction for both the agent’s behavior and the team’s evaluation criteria.
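One way teams make that shift concrete is to write the outcome down as data before any agent is built. The snippet below is a hypothetical illustration of that practice; the metric names and baseline figures are invented for the example, not drawn from Writer.

```python
from dataclasses import dataclass


@dataclass
class OutcomeSpec:
    """A measurable target the agent is judged against."""
    description: str   # what "good" looks like in business terms
    metric: str        # how it is measured
    baseline: float    # where performance stands today (illustrative values)
    target: float      # the number the team commits to


# "Review contracts" restated as outcomes a team can actually verify:
contract_review_outcomes = [
    OutcomeSpec("Cut contract review turnaround in half",
                metric="median_review_hours", baseline=24.0, target=12.0),
    OutcomeSpec("Catch compliance risks reliably",
                metric="risk_flag_recall", baseline=0.70, target=0.95),
]
```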
In practice, this means creating business logic blueprints that guide agent decision-making rather than rigid workflows. Teams need to design reasoning loops—systematic approaches to problem-solving that agents can follow while maintaining flexibility. Success requires collaboration between technical teams and subject matter experts who understand the nuances of the business processes being automated.
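A reasoning loop of that kind can be sketched as ordinary control flow. The version below is a generic outline, not Writer’s runtime: the planner, the tool executor, and the outcome check are passed in as functions, because in practice each of them would be backed by an LLM, a tool integration, or a business-defined metric.

```python
from typing import Any, Callable


def run_agent(
    goal: str,
    plan: Callable[[str, list], Any],           # LLM-backed planner: picks the next action
    act: Callable[[Any], Any],                  # executes a tool call, returns an observation
    outcome_met: Callable[[str, list], bool],   # checks the goal, not a fixed step checklist
    max_steps: int = 10,
) -> list:
    """Minimal outcome-driven loop: plan, act, observe, re-check the outcome."""
    history: list = []
    for _ in range(max_steps):
        action = plan(goal, history)
        observation = act(action)
        history.append((action, observation))
        if outcome_met(goal, history):  # stop on the outcome, however the agent got there
            return history
    # Fail-safe: the agent could not reach the outcome within its step budget.
    raise RuntimeError("Outcome not reached; escalate to a human reviewer")
```

The loop stays flexible about how the work gets done while staying strict about what counts as done, which is the blueprint-versus-workflow distinction in miniature.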
Despite widespread enthusiasm for AI agents, most companies still build them one at a time because scaling presents unique challenges. The “scaling cliff” emerges when organizations attempt to expand agent deployments faster than they can develop proper governance structures.
Before scaling, companies must answer critical questions: Who owns each agent? Who audits its performance? Who ensures it remains relevant as business needs evolve? Who monitors whether it continues producing desired outcomes? Without clear answers, organizations quickly find themselves managing dozens of agents with inconsistent performance and unclear accountability.
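A lightweight way to force those answers is to refuse to deploy an agent without a governance record attached to it. The example below is purely illustrative; the field names and values are invented for this sketch, not a Writer feature.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AgentGovernanceRecord:
    """The ownership questions, answered before the agent ships."""
    agent_name: str
    business_owner: str        # accountable for the outcome
    technical_owner: str       # accountable for the system
    auditor: str               # reviews behavior and performance
    review_cadence_days: int   # how often relevance is re-checked
    outcome_metric: str        # what "still working" means
    last_reviewed: date


registry = {
    "contract-review-agent": AgentGovernanceRecord(
        agent_name="contract-review-agent",
        business_owner="legal-ops",
        technical_owner="platform-engineering",
        auditor="risk-and-compliance",
        review_cadence_days=30,
        outcome_metric="median_review_hours",
        last_reviewed=date(2025, 6, 1),
    ),
}
```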
This cliff becomes particularly steep when different departments develop agents independently without coordination. Marketing might build a content generation agent while customer service develops a support agent, each using different approaches, standards, and evaluation methods. The result is a patchwork of AI systems that become increasingly difficult to manage, maintain, and improve.
Traditional software quality assurance relies on objective checklists: does the login function work, does the payment system process transactions correctly, does the search feature return relevant results? These binary pass/fail tests work well for deterministic systems.
Agent evaluation requires a completely different approach. Instead of checking whether something broke, teams must assess whether agents “behaved well” in complex, real-world scenarios. This includes evaluating whether fail-safes activated appropriately, whether outcomes aligned with intentions, and whether the agent’s reasoning process was sound.
“The goal here isn’t perfection,” Habib explained. “It is behavioral confidence, because there is a lot of subjectivity here.” This means accepting that agents will sometimes make imperfect decisions while ensuring they operate within acceptable parameters and learn from experience.
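In testing terms, that shift might look something like the sketch below: instead of asserting exact outputs, recorded agent runs are graded against behavioral criteria, and the team commits to a confidence threshold rather than a perfect score. The criteria names and grading functions are placeholders; in practice they are often human reviewers or LLM-based rubrics.

```python
from typing import Callable


def behavioral_confidence(
    transcripts: list,
    checks: dict[str, Callable[[object], float]],  # criterion name -> grader returning 0.0 to 1.0
    threshold: float = 0.9,
) -> bool:
    """Grade recorded runs against behavioral criteria instead of pass/fail assertions.

    Example criteria: "fail_safe_triggered_when_needed", "outcome_matched_intent",
    "reasoning_was_sound".
    """
    scores = [
        grade(transcript)
        for transcript in transcripts
        for grade in checks.values()
    ]
    average = sum(scores) / len(scores) if scores else 0.0
    # The bar is confidence across many runs, not perfection on every run.
    return average >= threshold
```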
Companies that don’t embrace this iterative approach often get stuck in what Habib describes as “a constant game of tennis that just wears down each side until they don’t want to play anymore.” Success requires launching agents with appropriate safeguards, then rapidly iterating based on real-world performance rather than pursuing perfection before deployment.
Despite these challenges, properly implemented agents are already generating significant business value. Habib cited a major bank that worked with Writer to develop an agent-based system for customer onboarding. The system identifies opportunities to introduce customers to multiple product lines during the onboarding process, creating a new upsell pipeline worth $600 million.
This success demonstrates the potential return on investment when companies properly address the unique challenges of agent development and deployment. However, it also illustrates why the scaling cliff matters—without proper frameworks, such successes remain isolated rather than becoming repeatable, scalable business advantages.
Traditional software maintenance follows familiar patterns: when something breaks, developers examine the code, identify the problem, and implement a fix. The relationship between cause and effect is typically clear and traceable.
AI agents require entirely new approaches to maintenance and version control. Because agent behavior emerges from the interaction of multiple components—prompts, model settings, tool configurations, memory systems, and external data sources—tracking what influences performance becomes significantly more complex.
“You can update a large language model prompt and watch the agent behave completely differently even though nothing in the git history actually changed,” Habib explained. Model updates from AI providers, changes to retrieval systems, or modifications to external APIs can all alter agent behavior without any visible changes to the agent’s core configuration.
This creates what Habib describes as “debugging ghosts”—performance issues that seem to emerge from nowhere and prove difficult to trace back to their source. Proper agent maintenance requires tracking not just code changes but also model versions, prompt modifications, tool schema updates, and memory configurations.
Additionally, teams must implement comprehensive execution tracing that captures inputs, outputs, reasoning steps, tool calls, and human interactions. This detailed logging becomes essential for understanding why agents behaved in particular ways and for identifying areas for improvement.
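In practice, that often means persisting a structured record per run that pins both the configuration the agent was running on and everything it did. The sketch below is one illustrative shape for such a record, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConfigSnapshot:
    """Everything outside the git history that can silently change agent behavior."""
    model_name: str
    model_version: str
    prompt_version: str
    tool_schemas: dict[str, Any]
    retrieval_index_version: str
    memory_config: dict[str, Any]


@dataclass
class ExecutionTrace:
    """One end-to-end agent run, captured for later debugging."""
    trace_id: str
    config: ConfigSnapshot                       # pin what the agent was running on
    inputs: dict[str, Any]
    reasoning_steps: list[str] = field(default_factory=list)
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    human_interactions: list[dict[str, Any]] = field(default_factory=list)
    outputs: dict[str, Any] = field(default_factory=dict)
```

When an agent’s behavior shifts with no corresponding code change, comparing the `ConfigSnapshot` of a good run against a bad one is often the fastest way to catch the “ghost.”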
The transition to agent-based systems represents more than a technical shift—it requires organizational changes in how companies approach software development, quality assurance, and system maintenance. Companies that successfully navigate this transition will gain significant competitive advantages, while those that apply traditional approaches to agent development risk hitting the scaling cliff.
Success requires embracing uncertainty while maintaining appropriate controls, focusing on outcomes rather than rigid processes, and developing new frameworks for governance and evaluation. Most importantly, it requires recognizing that agents aren’t just another type of software—they’re a fundamentally different technology that demands new approaches to building, scaling, and maintaining intelligent systems.
For enterprise leaders, the message is clear: the agent economy is arriving quickly, but success requires abandoning familiar software development practices in favor of approaches designed specifically for adaptive, outcome-driven systems. Companies that make this transition thoughtfully and systematically will be best positioned to capture the substantial business value that AI agents can provide.