The Data Massagist by Pablo Junco

The Real Magic Behind AI Accuracy Isn’t AI — It’s Your Data

February 3, 2026 · 12 min read
Data Agents MS Fabric MS Purview Newsletter
This content is mirrored from LinkedIn and may contain formatting inconsistencies. For the full experience — including comments and reactions — read the original on LinkedIn.

From messy data to measurable outcomes—governed platforms that power agentic AI.

Hello—my name is Pablo Junco Boquer

Here is the second edition of The Data Massagist newsletter, in which I explore what sits behind Agentic AI solutions such as Microsoft Copilot and Copilot for Power BI (yes, they’re not the same 😉).

Across every industry, leaders are looking to AI to accelerate their business transformation, drawn by the promise of doing things faster, acting with greater precision, and unlocking insights that were out of reach before generative AI. Surveys from prestigious research and advisory companies such as Gartner indicate that 86% of leaders believe AI will help maintain or grow revenue in the years ahead, and Ernst & Young (EY) reports that 97% of senior leaders are already seeing positive ROI on their AI investments.

I understand—and share—the growing excitement about what AI can do with data. Maybe that is why Copilot for Power BI is becoming a familiar companion for business users. Fabric Data Agents are quickly gaining adoption—especially among data engineers—because they are fast to build and easy to reason about. Teams working with Microsoft Fabric Real-Time Intelligence (RTI) are starting to appreciate Operations Agents for monitoring, detecting issues, and recommending actions on streaming data.

And yes — the progress is impressive. But, if we are honest, this is not where the real magic happens.

AI is only as good as the data we prepare for it.

It’s no secret that AI is only as good as the data it’s grounded on. Organizations simply won’t extract real value from AI until their data is trusted, connected, and ready to use. EY’s research reflects this urgency: 69% of leading companies have already modernized their data to support analytics and AI.

And yet—modernization alone is not the finish line. Without modern data, AI strategies remain just that: strategies. To be truly data‑driven, organizations must operate more efficiently, make faster decisions, and stay adaptable to whatever comes next.

In my experience, engineering excellence—regardless of how advanced or elegant the architecture is—is not enough to deliver accurate, reliable AI outcomes. The real differentiator lies in how well we understand, model, and prepare our data so AI systems can reason over it correctly.

What many research and advisory firms don’t emphasize is this: having your data sitting in the latest database engine, a modern warehouse, or the now‑ubiquitous lakehouse isn’t what unlocks AI’s full potential. These technologies are foundational, yes—but insufficient on their own.

To get the best results from AI, we must massage, refine, and shape the data. Because at the end of the day, AI is only as good as the data we prepare for it.

The uncomfortable truth about AI accuracy

I often ask a simple question:

If we, as humans, struggle to understand our own data schemas, why do we expect AI to do better?

When schemas are messy, definitions are unclear, and documentation is outdated, getting 70–80% accuracy from AI is already a success.

In many cases, the problem is not Copilot, not our AI-based assistant, not an AI agent, and not even the LLM you choose to meet business needs and budget — whether that’s OpenAI GPT-5x, DeepSeek R1, or Anthropic Claude-4x.

The problem is the data foundation we provide. AI quality is, fundamentally, a data quality problem.

So, the real question is not "Which model should we use?" but "How do we make our data understandable?"

Where the real magic actually happens

AI magic doesn’t happen in the prompt. It happens in the preparation.

By now, you’re probably wondering how to get your data ready for AI. There’s no single path, but in my work with large enterprises, I use Microsoft Fabric to unify data workloads and transform raw data into structured, governed, AI-ready knowledge. If you want to go deeper, check out my earlier article, “Microsoft Fabric: Redefining the Future of Enterprise Data Intelligence Platforms.”

Yeah! As Principal Solution Engineer for Data Platform, I’m advising customers to use Microsoft Fabric to bring together databases, analytics, and real‑time data—not just to deliver best‑in‑class insights and advanced interactive reports, but to establish a trusted knowledge‑driven data engine that truly enables AI experiences.

These experiences include using AI to accelerate solution development across each of Fabric’s data engines, as well as business‑focused capabilities like Copilot for Power BI and Data Agents (which can even be deployed within Microsoft 365).

And here’s the real sneak peek: the success of any Data Agent or Power BI Copilot experience ultimately depends on having a well‑designed Semantic Model: the secret ingredient behind a three‑star Michelin‑level data meal.

Semantic models: Our organization's data language

A Semantic Model in Microsoft Fabric is an organizational agreement on what the business means by its data. It defines:

  • Common business terms

  • Relationships and hierarchies

  • Measures that reflect real business logic

  • Governed, consistent definitions

In simple terms, it turns data into something both people and machines can reason about.
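To make the idea concrete, here is a minimal, hypothetical sketch of the kind of shared vocabulary a semantic model captures. The table, column, and measure names ("Sales", "Total Revenue", etc.) are illustrative only, not from any real Fabric model:

```python
# Hypothetical sketch: a semantic model as an organizational agreement on
# tables, relationships, and governed measure definitions.

semantic_model = {
    "tables": {
        "Sales": {
            "description": "One row per order line, at daily grain.",
            "columns": {"OrderDate": "date", "Amount": "decimal", "CustomerKey": "int"},
        },
        "Customer": {
            "description": "One row per customer (dimension table).",
            "columns": {"CustomerKey": "int", "Region": "text"},
        },
    },
    # Relationships tell both people and AI how tables join.
    "relationships": [("Sales.CustomerKey", "Customer.CustomerKey")],
    # Measures encode agreed business logic once, instead of ad-hoc math everywhere.
    "measures": {"Total Revenue": "SUM(Sales[Amount])"},
}

def describe(term: str) -> str:
    """Return the governed definition of a measure, if one exists."""
    return semantic_model["measures"].get(term, "undefined: not a governed measure")
```

The point of the sketch: once "Total Revenue" has exactly one governed definition, every consumer of the model, human or AI, reasons from the same logic.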

In Microsoft Fabric, semantic models can be:

  • Default models (auto-created with Lakehouse/Warehouse before Sep 2025) or

  • Custom models (full modeling control, recommended for enterprise-scale analytics)

The impact is profound: Copilot and AI-based agents are no longer guessing across disconnected tables—they are reasoning over a curated, governed model that reflects reality.

For AI-based systems, semantic models are essential. Copilot for Power BI and Fabric Data Agents do not reason over raw tables—they rely on the semantic model’s metadata to select fields, generate DAX, and answer business questions accurately.

AI only understands our business as well as our semantic model explains it.

Copilot for Power BI and Data Agents use elements such as table and column descriptions, synonyms, relationships, and data types to better interpret user questions and generate accurate responses. In short: AI learns your business only as well as your semantic model explains it.

You can learn more about multi-agents and the role of Microsoft Fabric in my article titled "Unlocking Business Agility with Multi-Agent AI and Microsoft Fabric’s Data Mirroring".

Storage Modes That Enable AI

For this topic, it's important to know that Microsoft Fabric offers three storage modes—each optimized for different AI scenarios—but all built upon the same semantic model.

1) Direct Lake: Real-Time AI at Scale

Direct Lake allows AI to analyze extremely large or frequently updated datasets directly in OneLake, with no duplication and minimal latency. It is the most powerful mode for AI, giving Data Agents and Copilot the real-time access to data needed for up-to-date insights.

2) DirectQuery: Always Live

When recency is critical, DirectQuery ensures that AI responses always reflect the latest source system values. Copilot and agents benefit from this direct, real-time connection to operational systems.

3) Import: Designed for Speed

Import mode keeps stable data in-memory, making AI interactions extremely fast—ideal for historical data or static datasets where performance matters more than recency.

The key point: Fabric lets you optimize freshness, performance, or scale without changing your business logic.

Preparing data for Agentic AI

To make a semantic model AI-ready, Fabric includes the dedicated Prep Data for AI feature set. These capabilities configure the model so Copilot and Data Agents can interpret questions correctly, generate high-quality DAX, and provide consistent answers. This is not marketing language—it is a practical set of capabilities that directly influence AI accuracy.

The following image presents a simplified “behind the scenes” view of how AI‑powered experiences in Microsoft Fabric—specifically Copilot—process a user request. The flow diagram shows four stages running horizontally across the top: Initial experience, Data used for answer, Copilot action, and Order of execution.

At the starting point, a box labeled Prompt Copilot represents the user prompt. From there, the workflow may split into two parallel paths:

  • One path is Item(s) attached as part of input or during the conversation.

  • The other is Search Results using the associated data sources (your data).

Both converge into two potential AI actions:

  • Get a summary of what was asked, or the thinking process that Copilot or the Agent is following, in natural language.

  • Get a data answer based on the data managed by Microsoft Fabric

From the “Get a data answer” branch, you can see how Microsoft Fabric determines the correct execution strategy. Depending on the type of request, Copilot for Power BI or the Data Agent may:

  1. Check the report visuals (only in the case of Copilot for Power BI)

  2. Look in the semantic model (that is why it is so important)

  3. (If required) Create an ad‑hoc DAX or T‑SQL query, triggered particularly when Fabric’s Data Agent is involved.

Let's go deeper into some of the most important elements:

1. Model readiness fundamentals

AI performs best when the model is designed for clarity:

  • Star schema with clear fact and dimension tables

  • Natural language–friendly table, column, and measure names

  • Rich descriptions that explain intent, not just structure

  • Correct row labels and key columns

  • Proper use of Summarize By to prevent misleading aggregations

These may sound like classic BI best practices—and they are. The difference is that AI now amplifies the cost of getting them wrong.

2. Synonyms and natural language alignment

By defining synonyms in the model, we help Copilot and agents map how users speak to how data is structured. This reduces friction, ambiguity, and misinterpretation—especially for business users who do not think in table or column names.
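As a hypothetical sketch of why this matters, consider a plain synonym map that resolves the words users actually say to the model's governed names. The terms and field names below are invented for illustration:

```python
# Hypothetical sketch: mapping user vocabulary to governed model fields.
# In Fabric you define synonyms on the semantic model itself; this just
# illustrates the resolution idea.

SYNONYMS = {
    "revenue": "Total Revenue",
    "sales": "Total Revenue",
    "turnover": "Total Revenue",
    "client": "Customer",
    "account": "Customer",
}

def resolve(user_term: str) -> str:
    """Map what users say to how the model names it; fall back to the raw term."""
    return SYNONYMS.get(user_term.lower(), user_term)
```

With a map like this, "show turnover by client" and "show revenue by customer" land on the same governed fields, which is exactly the ambiguity reduction business users need.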

3. Schema selection: simplifying what AI can see

Schema selection allows us to control which columns and measures Copilot should prioritize. This is not about security—it is about relevance.

By simplifying the AI data schema, we:

  • Reduce noise

  • Guide Copilot toward the right metrics

  • Improve consistency and confidence in answers

In practice, this often has a bigger impact on answer quality than tuning prompts.

4. Verified Answers by adding human expertise in the loop

One of the most powerful—and underrated—capabilities in Fabric is Verified Answers.

Verified Answers allow us to:

  • Capture high‑value business questions

  • Attach trusted reports and visuals

  • Define exact and semantic triggers

  • Ensure consistent, approved responses

This is where AI stops being probabilistic and starts being dependable. Over time, Verified Answers become a reusable layer of institutional knowledge that Copilot or the Data Agent can rely on.

The key insight: accuracy improves when humans teach AI what “right” looks like.
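To illustrate the mechanism (not the actual Fabric schema), a Verified Answer can be thought of as a curated record with trigger phrases and an approved artifact; matching a trigger short-circuits probabilistic generation. All names below are hypothetical:

```python
# Hypothetical sketch of Verified Answers: curated questions with trigger
# phrases that map to trusted, approved visuals instead of generated answers.

verified_answers = [
    {
        "question": "What was Q4 revenue?",
        "triggers": ["q4 revenue", "fourth quarter revenue"],
        "answer_visual": "Finance/Quarterly Revenue Card",  # approved artifact
    },
]

def find_verified(prompt: str):
    """Return the approved visual if the prompt matches a trigger, else None."""
    p = prompt.lower()
    for va in verified_answers:
        if any(trigger in p for trigger in va["triggers"]):
            return va["answer_visual"]
    return None  # no match: fall back to probabilistic generation
```

The design choice is the essence of "human in the loop": experts decide up front what "right" looks like for high-value questions, and the AI reuses that decision consistently.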

5. AI instructions to teach agents how to think about our data

AI instructions let us explicitly guide Copilot and agents on how to interpret business concepts. They provide context, rules, example queries, and guidance that shape how AI should interpret and talk about our data. These instructions can partition large models into domains and improve the precision of generated queries.

Together, these components form the grounding layer that allows AI to deliver answers aligned with business terminology and expectations.

Note: In Microsoft Fabric, we can provide AI instructions to each Semantic Model and each Data Agent (in the latter case, to redefine how the Agent will behave).

For me, well‑written instructions should:

  • Map vague terms to concrete definitions

  • Clarify grouping vs. filtering logic

  • Encode domain knowledge that is rarely obvious from schema alone

  • Reduce back‑and‑forth clarification with users

Yes, this requires upfront effort—but the payoff is fewer surprises and far more consistent results.
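As one hedged example of what such instructions might look like, here is an illustrative instruction block assembled in code; the wording, measure names, and fiscal-year rule are invented, and in Fabric you would paste text like this into the model's or agent's AI instructions field:

```python
# Hypothetical sketch of AI instructions for a semantic model or Data Agent.
# Each rule maps a vague term to a concrete definition, clarifies grouping
# vs. filtering, or encodes domain knowledge the schema alone cannot convey.

AI_INSTRUCTIONS = """
- "Revenue" always means the measure [Total Revenue]; never sum raw columns.
- "Last quarter" means the most recent complete fiscal quarter (FY starts July).
- When users ask for "top customers", group by Customer[Name] and rank by
  [Total Revenue]; do not filter to a single customer unless one is named.
- If a question mixes domains (Sales vs. Inventory), answer from Sales unless
  the user explicitly mentions stock, warehouse, or on-hand units.
""".strip()
```

Notice how each line pre-empts a clarification the AI would otherwise have to ask for, which is where the "fewer surprises" payoff comes from.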

Agents don’t replace understanding — they scale it

Once your data is truly AI-ready, agents become transformational—and the possibilities for agentic AI experiences keep expanding. Microsoft, for example, offers a spectrum of solutions that range from less autonomous to highly autonomous.

  • On the left, Power BI agents act as embedded assistants for data exploration and visualization, helping users quickly generate insights and summaries directly within Power BI.

  • Data agents in Fabric are interactive and respond to natural-language queries, with strong configurability for deeper data reasoning.

  • Copilot Studio agents are customizable for business logic and can trigger actions based on events or workflows, enabling process automation.

  • Operations agents in Fabric take it a step further as autonomous agents that monitor data, set goals, and recommend actions with minimal or no prompting.

  • On the far right, Azure AI Foundry agents enable multi-agent orchestration with enterprise grounding, supporting full-scale AI orchestration across complex scenarios.

However, their effectiveness is directly tied to how well data is prepared, modeled, and documented. Autonomous systems without semantic clarity don’t create value. They create confident mistakes.

Final thought

If AI accuracy is not where we want it to be, don’t start by blaming the model.

Start by asking:

  • Do we truly understand our data?

  • Have we modeled it in a way that reflects business reality?

  • Have we documented and grounded it for AI?

Microsoft Fabric is not just a unified data platform—it is an AI transformation engine. But AI can only deliver trusted, explainable insights when the underlying data is structured, described, and governed through a robust semantic model.

By investing in high-quality semantic modeling, activating Prep Data for AI, and embracing modern storage modes like Direct Lake, organizations create the foundation for reliable Copilot interactions, accurate Data Agent behaviors, and scalable enterprise-wide AI adoption.

In Fabric, AI doesn’t start with a question — it starts with a great semantic model, the place where raw data transforms into intelligence and real magic happens.

By the way, if you are an AI Solution Architect, I recommend reading an older but still relevant article: The Role of an IT Architect in the Era of AI


Let’s talk!
Let's have cafecito together.

If you’re a Chief Data Officer (CDO), a data leader, or simply someone who believes in the power of preparing data for AI—you’re already a Data Massagist.

Whether you have an idea, a challenge, or just want a fresh perspective, let’s connect. I’m always open to collaborating, learning, and helping others move forward.

You can find me on LinkedIn (feel free to connect and send me a message), or book time with me directly for a virtual coffee (or "cafecito").