Four months. $3 million in business value. Zero training sessions before launch.
Those numbers belong to Jean-Paul, which is not a human executive but an AI agent built and deployed internally by SnapLogic. If you haven't heard of it, that's understandable. The story hasn't made the rounds in the mainstream business press the way flashier AI announcements tend to. But for anyone trying to figure out whether enterprise AI agents actually deliver ROI, the Jean-Paul case study is one of the most instructive data points available right now.
Most people read that headline and focus on the number. The number is the wrong thing to focus on. The architecture is the right one.
The AI Pilot Graveyard Is Full of Siloed Deployments
Most enterprise AI pilots make the same mistake.
A company identifies a high-volume workflow. They pick one system (usually their CRM, their ticketing platform, or their document repository) and layer an AI assistant on top of it. The pilot shows promise in a controlled environment. The team celebrates. Then the initiative hits production, and something deflates.
The reason is almost always the same: the agent can only see a fraction of the information it needs to actually finish a task.
An AI assistant trained on Salesforce data can tell you what's in Salesforce. But a real business question like "What's the customer's full history, what issues have they opened recently, what's their contract value, and is there upsell potential here?" spans Salesforce, Zendesk, your billing system, your analytics warehouse, and probably a few spreadsheets someone emailed around last quarter. An agent that can only query one system returns a partial answer. Partial answers still require a human to synthesize the rest. That's not automation. It's assisted search.
MIT research cited across the industry found that roughly 95% of enterprise generative AI pilots fail to deliver measurable P&L impact. Model quality barely appeared on the list of culprits. Data silos and integration failures were the real problems. Gartner has flagged the same issue, estimating that through 2025, at least 30% of GenAI projects would be abandoned after proof of concept due to poor data and unclear business value.
The model isn't the bottleneck. The data cage is.
What Jean-Paul Actually Does
SnapLogic built Jean-Paul on its own Agentic Integration Platform, connected via its MCP server to every major system the company actually runs on: Salesforce, Zendesk, BigQuery, Box, ZoomInfo, Chorus, Jira, Loopio, and Saleshood. Employees access it through Slack, Microsoft Teams, email, or API. The key phrase in the case study is this one:
"It returns a finished answer in minutes, not a search result to work from."
A search result offloads synthesis to the human. A finished answer means the agent pulled context from every relevant system, reasoned across it, and handed back something usable: a production-quality document, a competitive summary, a customer health report. No analyst handoff. Task done.
The results over four months were documented via platform audit logs across 17 departments:
- 2,141 hours saved in a single 30-day period, equivalent to roughly 12.5 additional FTEs across Sales, Customer Success, Marketing, Engineering, Finance, HR, and Professional Services. This is drawn from actual platform audit logs, not a projection.
- 1,630 requests handled, producing 281 production-quality documents
- $380-540K per year in cost avoidance: analytics tooling and consultant engagements were eliminated because Jean-Paul now generates reports directly from live data
- 1-3 days to deploy, against an industry average of 8 months for comparable enterprise AI implementations
The one-to-three-day deployment time is the outlier number here. Jean-Paul runs on top of SnapLogic's existing integration layer, so all those system connections were already live. There was no IT project to scope. The first production requests came in before the rollout was formally announced.
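The FTE equivalence in the list above can be sanity-checked with simple arithmetic. The sketch below uses only the two figures quoted in the case study. The 40-hour work week is an assumption introduced here for illustration, since the case study does not state the basis it used.

```python
# Figures quoted in the case study.
HOURS_SAVED_IN_30_DAYS = 2141
FTE_EQUIVALENT = 12.5

# ASSUMPTION (not from the case study): a 40-hour work week.
ASSUMED_HOURS_PER_WEEK = 40


def implied_hours_per_fte(hours_saved: float, ftes: float) -> float:
    """Hours one FTE would have to work in the period for the claim to hold."""
    return hours_saved / ftes


hours_each = implied_hours_per_fte(HOURS_SAVED_IN_30_DAYS, FTE_EQUIVALENT)
weeks_each = hours_each / ASSUMED_HOURS_PER_WEEK

print(f"Implied hours per FTE in the period: {hours_each:.2f}")
print(f"Implied working weeks per FTE: {weeks_each:.2f}")
print(f"Weeks in a 30-day period: {30 / 7:.2f}")
```

Under that assumption, the implied working weeks per FTE come out very close to the number of weeks in 30 days, so the 12.5 FTE figure is consistent with full-time staff working the whole period.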
The Real Differentiator: What the Agent Can See
Most enterprise AI teams treat integration as a technical prerequisite, something IT handles before the real work begins. That's backwards.
For an AI agent, connectivity to live operational data is the core capability. The model is the reasoning engine; the integrations are what give it something real to reason about. Swap out the underlying model and Jean-Paul still works, because the systems it connects to haven't changed. Pull one of those system connections and every answer that touches that data gets worse. Immediately.
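A toy sketch can make the asymmetry concrete. Everything below is hypothetical: the `Agent` class, the stub connectors, the made-up customer facts, and the two stand-in "models" are illustrations only, not SnapLogic's implementation. The point it shows is that the facts available to the agent depend on the connectors, not on which model sits on top.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical shapes: a connector returns facts about a customer from one
# system; a model turns a question plus facts into an answer string.
Connector = Callable[[str], List[str]]
Model = Callable[[str, List[str]], str]


@dataclass
class Agent:
    model: Model
    connectors: Dict[str, Connector] = field(default_factory=dict)

    def gather(self, customer: str) -> List[str]:
        """Collect facts from every connected system, in name order."""
        facts: List[str] = []
        for name in sorted(self.connectors):
            facts.extend(self.connectors[name](customer))
        return facts

    def answer(self, question: str, customer: str) -> str:
        return self.model(question, self.gather(customer))


# Stub connectors with made-up data, standing in for live systems.
def crm(customer: str) -> List[str]:
    return [f"{customer}: contract value on file"]


def ticketing(customer: str) -> List[str]:
    return [f"{customer}: 3 open tickets"]


def billing(customer: str) -> List[str]:
    return [f"{customer}: invoices current"]


# Two stand-in "models"; each can only use the facts it is handed.
def model_a(question: str, facts: List[str]) -> str:
    return f"[A] {question} -> {len(facts)} facts used"


def model_b(question: str, facts: List[str]) -> str:
    return f"[B] {question} -> {len(facts)} facts used"


connected = {"crm": crm, "ticketing": ticketing, "billing": billing}
full = Agent(model_a, dict(connected))
swapped = Agent(model_b, dict(connected))                # new model, same data
cut = Agent(model_a, {"crm": crm, "billing": billing})   # ticketing removed

print(full.answer("Account health?", "Acme"))
print(swapped.answer("Account health?", "Acme"))
print(cut.answer("Account health?", "Acme"))
```

Swapping `model_a` for `model_b` leaves the gathered facts unchanged. Removing the ticketing connector means no model, however capable, can mention the open tickets.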
Evaluate AI agent investments on data connectivity first, model quality second. That order matters. The enterprise with a well-connected mid-tier model will outperform the one with a frontier model isolated inside a single system.
A 2026 analysis of enterprise AI agent platforms found exactly this pattern: "Enterprise teams consistently report agent projects fail because of data quality gaps, unclear ownership of edge cases, and governance infrastructure that was never built." The model itself barely came up as a root cause.
Martin Management Group: A Car Dealership Makes the Same Point
SnapLogic is a technology company, so its own internal deployment is only so convincing. Martin Management Group is a car dealership.
Martin is an Automotive News Top 150 dealership group running 16 locations nationwide. Their challenge was straightforward: inbound call volume was straining their Business Development Center staff, appointments were being missed, and revenue was slipping through the cracks. They deployed Toma's voice AI agents to handle inbound service calls.
In the first 90 days: more than 22,000 calls handled end-to-end without human intervention, over 9,000 service appointments booked, generating $2 million in revenue, and a 40% reduction in BDC workload.
Toma's agents weren't just answering calls. They were integrated with the service scheduler so they could actually book appointments, not just collect information and hand it off. The agent had access to the right system at the moment it needed it. That's the difference between an agent that completes a task and one that starts it.
Think about what happens without that integration. A voice agent connected only to the phone system can transcribe calls perfectly and detect intent accurately, but still generates zero appointments, because scheduling requires access to the scheduling system. The model quality is irrelevant. Integration closes that loop.
Three Questions Every Executive Sponsor Should Be Asking
If you're evaluating an AI agent proposal (or trying to diagnose why a current deployment isn't delivering), these questions will tell you more than any benchmark score or vendor demo.
How many systems does it connect to natively, and which ones?
The question is not "can it connect to" but which native, live, production connections exist today. An agent with one or two integrations is a workflow tool for one or two workflows. Map the ten most common research tasks your team performs and count how many systems each one touches. That's your minimum integration requirement, and most vendors can't meet it without months of custom integration work.
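The mapping exercise can be done in a spreadsheet, but a few lines of code show the logic. This is a minimal sketch under stated assumptions: the task inventory and the vendor's list of native connections are invented for illustration (the system names are borrowed from the Jean-Paul list above), and `required_systems` and `integration_gaps` are hypothetical helper names.

```python
from typing import Dict, Set


def required_systems(tasks: Dict[str, Set[str]]) -> Set[str]:
    """Union of every system touched by at least one task."""
    return set().union(*tasks.values())


def integration_gaps(tasks: Dict[str, Set[str]], native: Set[str]) -> Dict[str, Set[str]]:
    """Per task, the systems the vendor does not connect to natively."""
    return {task: systems - native for task, systems in tasks.items()}


# Hypothetical task inventory (a real one would list about ten tasks).
tasks = {
    "account review": {"Salesforce", "Zendesk", "BigQuery"},
    "RFP response": {"Loopio", "Box"},
    "bug status for a customer": {"Jira", "Zendesk"},
}
# Hypothetical vendor with two native connections.
vendor_native = {"Salesforce", "Zendesk"}

gaps = integration_gaps(tasks, vendor_native)
finishable = [task for task, missing in gaps.items() if not missing]

print("Minimum integration requirement:", sorted(required_systems(tasks)))
for task, missing in gaps.items():
    print(f"{task}: missing {sorted(missing) or 'nothing'}")
print(f"Tasks finishable end to end: {len(finishable)} of {len(tasks)}")
```

In this made-up inventory the vendor covers two of the six required systems, yet cannot finish a single task end to end, because every task touches at least one system it lacks.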
What does a "finished answer" look like without human routing?
Push any vendor to define this concretely. A finished answer doesn't require an employee to open three more tabs and compile the result themselves. Jean-Paul's benchmark (production-quality documents, competitive analyses, and customer summaries pulled simultaneously from nine systems) is worth treating as the standard. Anything less means replacing full-day tasks with half-day tasks, which is a much harder ROI story to tell the board.
Is the data actually accessible to the agent at inference time?
This question exposes the best-hidden implementation gaps. Data that exists in a system is not automatically data an agent can use. Security policies, API rate limits, data freshness issues, and permissions structures can silently degrade what an agent actually sees when it's answering a question in production. The gap between "we have all this data" and "the agent can access all this data" is often significant, and you won't discover it in a demo.
Require a live demonstration against real or realistic data, not a curated sandbox. Ask specifically how the agent handles a question that requires pulling from three different systems simultaneously. Watch what it actually returns.
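One way to make that gap visible is a per-source access check run with the agent's own credentials. The sketch below is an illustration, not any vendor's API: the `SourceStatus` fields, the 24-hour freshness threshold, and the sample values are all assumptions chosen to mirror the failure modes named above (permissions, freshness, rate limits).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class SourceStatus:
    name: str
    agent_can_read: bool        # does the agent's identity have read access?
    last_refreshed: datetime    # when the data was last synced
    api_calls_remaining: int    # quota left in the current rate-limit window


def access_problems(src: SourceStatus, now: datetime, max_age: timedelta) -> List[str]:
    """List the reasons the agent would see degraded data from this source."""
    problems: List[str] = []
    if not src.agent_can_read:
        problems.append("no read permission")
    if now - src.last_refreshed > max_age:
        problems.append("stale data")
    if src.api_calls_remaining <= 0:
        problems.append("rate limit exhausted")
    return problems


# Made-up snapshot; the timestamp is fixed so the example is repeatable.
now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
sources = [
    SourceStatus("crm", True, now - timedelta(hours=1), 500),
    SourceStatus("ticketing", True, now - timedelta(days=3), 200),
    SourceStatus("billing", False, now - timedelta(hours=2), 0),
]

report = {s.name: access_problems(s, now, timedelta(hours=24)) for s in sources}
for name, problems in report.items():
    print(f"{name}: {', '.join(problems) if problems else 'ok'}")
```

All three systems "have the data" in this snapshot, but only one of them would give the agent a clean, current answer.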
Why Mid-Market Companies Have an Advantage Here
Large enterprises often run multi-year AI programs with dedicated ML engineering teams and sprawling data platform projects. Mid-market companies don't have that runway, which means every AI dollar has to be justified faster and more clearly. That pressure is an asset.
Mid-market companies typically carry less legacy technical debt, operate smaller SaaS stacks, and can move without a six-month approval process. The Jean-Paul deployment (live in 1-3 days because the integration layer already existed) is a realistic model for organizations that already run modern SaaS and need results quickly. The prerequisite isn't a massive data platform rebuild. It's having the integration layer in place, or choosing an AI platform that brings it along.
Martin Management Group makes the same point from a different angle. This is a car dealership group, not a technology company. They didn't need to understand agentic AI architecture. They needed an agent that connected to their scheduler and picked up the phone. The technical complexity under the hood doesn't matter as long as the integration works.
The Due Diligence Question Most Sponsors Aren't Asking
The $3M from the Jean-Paul story is the kind of figure that gets AI initiatives approved for next quarter. But the organizations that will actually replicate that result aren't the ones chasing the number. They're the ones asking how many systems the agent connects to before they sign anything.
Jean-Paul recovered 2,141 hours in a single month. Toma's agents booked service appointments worth $2 million in revenue in 90 days. Both outcomes were made possible not by better models, but by agents that could see everything they needed to see. That's the capability worth buying.
Before the next proposal lands on your desk, ask whether the agent returns a finished answer or a starting point. The vendors who can demonstrate the difference clearly are rarer than the slide decks suggest.