The Immediate Risk of AI No One Is Talking About

By now, most discussions about AI fall into one of two buckets: the utopian vision of infinite productivity or the dystopian fear of Skynet's birth. But what if the greatest risk isn't some distant sci-fi threat but a very real, very present challenge—one that hits on "day two" after your AI goes live?
Let's talk about Day Two AI Risks—specifically, what happens when your AI agent changes its behavior without your knowledge due to AI drift.
What Is AI Drift?
AI drift occurs when an AI system's behavior shifts unexpectedly, often due to updates in the underlying large language model (LLM), changes in training data, or evolving user inputs. Unlike traditional software bugs, these shifts can be subtle, invisible, and cascading, making them hard to detect until something goes wrong. The most immediate threat? Unannounced LLM updates that alter how your AI "thinks."
The Invisible Update That Changes Everything
AI-powered workflows rely on LLMs to interpret prompts, make decisions, and drive automation. Whether you use ServiceNow's LLM or bring your own (BYO) model from OpenAI, Anthropic, or Microsoft, the model behind the scenes is a moving target. AI vendors frequently release updates to improve accuracy, reduce hallucinations, or align with new safety guidelines. Sounds great—until those updates disrupt your trusted autonomous system.
Imagine this: you've built an autonomous workflow that screens requests, writes responses, and triggers downstream actions. It's tested and reliable. Then, one morning, the LLM gets updated without warning. What once interpreted "expedite this approval" as a nudge to prioritize now sees it as an emergency requiring escalation to an executive. That slight shift cascades through your system—triggering alerts, approvals, or changes you didn't intend.
We experienced something similar firsthand when developing our AI agent. The LLM on our development instance differed from the one in pre-production, causing a benign but unexpected interpretation issue. A prompt that worked perfectly in development produced less-than-desired output in pre-production. We had to tweak the prompt to restore the intended behavior. It was a minor hiccup, but it made me think: what if this had happened in a live, mission-critical system?
The Domino Effect of Personality Drift
This kind of personality drift can wreak havoc on autonomous workflows. The orchestration layer coordinating your Agentic AI relies on consistent prompt interpretation. A small change in how the LLM processes a prompt can cause it to:
- Pull or push different data
- Choose incorrect downstream actions
- Require new inputs or make faulty assumptions
- Create unexpected artifacts, logs, or outcomes
It's like the AI version of the children's game Telephone: a subtle misinterpretation at the start of a workflow ripples through each step, compounding into dramatically unintended results. In our case, the fix was simple, but the consequences could be far worse in a complex system. Think of Knight Capital in 2012, when errant code pushed to production triggered a $440 million loss in 45 minutes, bankrupting the firm. If a single, non-AI code update can cause such chaos, imagine what an unexpected LLM update could do to an AI-driven workflow.
Why It Matters to Everyone
AI drift isn't just a technical issue—it's a business risk. For IT leaders, it threatens operational stability. For executives, it could mean financial losses or reputational damage if customer-facing systems misbehave. For end-users, it could lead to frustrating or incorrect outcomes, like a chatbot escalating a routine query to a C-suite executive. And in regulated industries like finance or healthcare, drift could trigger compliance violations or financial loss.
Some might argue that LLM providers already test updates rigorously or that businesses can lock models to avoid changes. But testing can't catch every edge case, and version-locking isn't a full solution—it limits access to improvements and may not account for drift from other sources, like evolving user inputs or training data shifts.
Agentic AI Needs Observability and Governance
Like infrastructure and applications, AI agents and autonomous workflows need observability to track what's happening under the hood. We must:
- Know which LLM version is running
- Implement change controls for LLM updates
- Log past outputs and outcomes to monitor decision quality and variability (a minimal sketch follows this list)
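
To make that concrete, here is a minimal sketch of what "know which LLM version is running" and "log outputs" can look like in practice. It assumes the OpenAI Python SDK and an append-only JSONL audit file; the `logged_completion` helper and file name are illustrative, not a prescribed implementation—adapt the idea to whatever platform and logging stack you run.

```python
# Minimal observability wrapper: every LLM call records which model actually
# answered, plus the prompt, response, and a timestamp, so drift is visible later.
import json
import time

from openai import OpenAI  # assumes the OpenAI Python SDK; swap in your provider

client = OpenAI()

def logged_completion(prompt: str, model: str = "gpt-4o",
                      log_path: str = "llm_audit.jsonl") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content or ""
    record = {
        "timestamp": time.time(),
        "requested_model": model,          # what the workflow asked for
        "resolved_model": response.model,  # what the provider actually served
        "prompt": prompt,
        "response": answer,
    }
    # Append-only JSONL keeps a simple audit trail you can diff across days.
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return answer
```

Even this small amount of bookkeeping answers the first question you'll ask after a surprise: "which model was actually running when the behavior changed?"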
More importantly, governance must evolve to include AI state awareness. AI agents are prediction machines whose behavior can change without a single line of your software changing. Without proper oversight, even a minor update can erode trust, sabotage automation, and increase operational risk.
AI Drift Management
As companies adopt Agentic AI, we'll need new tools to manage drift, including:
- Version-pinning of LLMs in production workflows (see the sketch after this list)
- Interpretable diffing to compare AI responses before and after updates
- Audit trails that capture prompts, responses, and outcomes
- Automated monitoring suites to simulate and catch unexpected behavior
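
As a rough illustration of the first and last items, the sketch below pins a dated model snapshot and replays a small suite of "golden" prompts against it, flagging answers that stray from the approved behavior. The snapshot name, `GOLDEN_CASES`, and the keyword check are illustrative assumptions only; a real monitoring suite would use richer comparisons (semantic diffing, human review) and run on every model change.

```python
# Drift-check sketch: pin a dated model snapshot and replay golden prompts,
# flagging responses that move away from previously approved behavior.
from openai import OpenAI  # assumes the OpenAI Python SDK; adapt to your provider

client = OpenAI()

# Version-pinning: reference a dated snapshot, not a floating alias like "gpt-4o".
PINNED_MODEL = "gpt-4o-2024-08-06"  # illustrative snapshot name; use your approved version

# Golden prompts encoding the behavior your workflow depends on (hypothetical example).
GOLDEN_CASES = [
    {
        "prompt": "Expedite this approval for laptop request REQ0012345.",
        "expected_keyword": "prioritize",  # should nudge priority, not escalate
    },
]

def check_for_drift() -> list[str]:
    failures = []
    for case in GOLDEN_CASES:
        response = client.chat.completions.create(
            model=PINNED_MODEL,
            messages=[{"role": "user", "content": case["prompt"]}],
            temperature=0,  # reduce run-to-run variance so changes point to the model
        )
        answer = response.choices[0].message.content or ""
        if case["expected_keyword"].lower() not in answer.lower():
            failures.append(f"Drift suspected for prompt: {case['prompt']!r}")
    return failures

if __name__ == "__main__":
    for failure in check_for_drift():
        print(failure)
```

A keyword match is obviously a blunt instrument; the point is the pattern—pinned versions plus a repeatable regression suite—so a model change fails a check before it fails a customer.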
These aren't nice-to-haves—they're essential to keep your autonomous workflows from misbehaving on a random Tuesday morning when an LLM provider pushes an unannounced update.
Conclusion
The AI risk that should keep you up at night isn't Armageddon. It's the silent shift in your AI's behavior that goes unnoticed until it disrupts your business. Just as Knight Capital learned the hard way with errant code, we must treat AI drift as a critical risk to manage today, not tomorrow.
Have you encountered unexpected AI behavior in your workflows? How did you handle it? Share your thoughts—I'd love to hear your experiences.