The Data Massagist by Pablo Junco

From Blank Canvas to Data Architecture: Designing Greenfield Platforms

February 25, 2026 · 8 min read
Databricks MS Fabric Newsletter
This content is mirrored from LinkedIn and may contain formatting inconsistencies. For the full experience — including comments and reactions — read the original on LinkedIn.

The Data Massagist
From messy data to measurable outcomes—governed platforms that power agentic AI.


Created on 2026-02-19 12:06

Published on 2026-02-25 17:30

¡Hola! Hi, and thank you for following The Data Massagist newsletter. Let’s begin.

Imagine this for a moment.

  • Put yourself in the shoes of Pablo Picasso, standing in front of a blank canvas.

  • Or Diego Rivera, facing an empty wall.

  • Or Frida Kahlo, painting without a brief.

  • Or Fernando Botero, reshaping proportions without permission.

  • Or Antoni Gaudí, designing buildings the world had never seen.

No legacy to preserve. No constraints disguised as “requirements.” No obligation to migrate what already exists.

Just one question: What should exist for the years to come?

This is exactly what a greenfield moment feels like in data architecture.

The Rarity of a True Greenfield

In most data initiatives, our energy is consumed by the weight of the past. We migrate legacy data warehouses, re-implement aging ETL (Extract, Transform & Load) logic, and preserve design decisions made years—sometimes decades—ago. Progress often means adapting, translating, and modernizing what already exists rather than creating something entirely new.

A true greenfield is different.

There is no platform to replicate. No backward compatibility to maintain. No technical debt silently shaping decisions. This is one of the advantages startups have over well-established organizations.

And yet, what replaces those constraints is something even more significant: responsibility.

Because when nothing exists, every architectural choice carries greater weight. The data model, the governance approach, the platform design, the operating principles—each decision becomes a foundational layer upon which everything else will stand.

A greenfield is not freedom from constraints. It is the opportunity—and obligation—to design the future correctly from the very beginning.

When Art Becomes Architecture

Great artists do not begin by asking, “How do I migrate what already exists?”

They begin with a different question: “What structure will endure?”

Greenfield data platforms require the same mindset. Without the gravity of legacy constraints, architecture is no longer centered on simply reproducing the past. The focus shifts—fundamentally and intentionally.

From tool selection to system design. From delivery speed to long-term adaptability. From feature parity to architectural clarity.

In this space, success is no longer driven by checklists, migrations, or replication strategies. Instead, design principles take the lead—guiding decisions, shaping foundations, and ensuring that what is built today can evolve, scale, and endure tomorrow.

From Inspiration to Technical Intent

Once the blank canvas exists, technical reality begins to take shape. Greenfield platforms force us to confront a set of fundamental questions:

  • Where does data live by default?

  • How many copies of data are truly acceptable?

  • How can analytics, BI, and AI coexist without creating new silos?

  • How do we enforce governance without slowing innovation?

Modern greenfield architectures are converging on a clear and consistent set of answers.

A single, shared data foundation. Open, analytics-friendly data formats. Multiple compute engines operating on one source of truth. Governance designed in from the start—not retrofitted later.

This convergence is not accidental. It is a direct response to decades of fragmented data stacks, duplicated pipelines, and disconnected platforms.

Customers working with Microsoft are in a unique position. They can begin their greenfield journey on two powerful and proven platforms: Microsoft Fabric and Azure Databricks. Both have strong track records of continuous innovation, and both are consistently recognized as leaders by industry analysts such as Gartner and Forrester.

Importantly, these platforms are not mutually exclusive—they can work together when needed, enabling organizations to combine strengths, avoid lock-in, and design architectures that remain adaptable as technology and business needs evolve.

The Shift to Lakehouse‑First Design

In greenfield scenarios, the traditional “DW‑first” approach breaks down quickly. Instead, architects increasingly design:

  • A lakehouse‑first foundation

  • With raw, curated, and business‑ready layers

  • Supporting SQL, Spark, BI, and AI on top of the same data

The lakehouse is no longer just storage—it becomes the structural backbone of the platform.

Everything else is an interface.
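To make the three layers concrete, here is a minimal sketch in plain Python. It is only an illustration: the records, field names, and validation rules are hypothetical, and a real platform would do this over Delta tables with Spark or SQL rather than in-memory lists.

```python
from collections import defaultdict

# Raw layer: hypothetical records kept exactly as they arrived.
RAW_ORDERS = [
    {"order_id": "1", "country": "es", "amount": "120.50"},
    {"order_id": "2", "country": "MX", "amount": "80"},
    {"order_id": "2", "country": "MX", "amount": "80"},            # duplicate delivery
    {"order_id": "3", "country": "mx", "amount": "not-a-number"},  # fails validation
]


def to_curated(raw_rows):
    """Curated layer: typed, validated, de-duplicated rows."""
    seen_ids = set()
    curated_rows = []
    for row in raw_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if row["order_id"] in seen_ids:
            continue  # drop repeated deliveries of the same order
        seen_ids.add(row["order_id"])
        curated_rows.append({
            "order_id": int(row["order_id"]),
            "country": row["country"].upper(),
            "amount": amount,
        })
    return curated_rows


def to_business_ready(curated_rows):
    """Business-ready layer: an aggregate that BI and AI consumers can share."""
    totals = defaultdict(float)
    for row in curated_rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


curated = to_curated(RAW_ORDERS)
revenue_by_country = to_business_ready(curated)
print(revenue_by_country)  # {'ES': 120.5, 'MX': 80.0}
```

The raw list is never modified. Each layer is derived from the one before it, so every consumer reads from the same lineage.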

Three Greenfield Architectural Paths

A true greenfield is not a migration. Not a modernization. Not a like-for-like replacement. It is a deliberate design moment where nothing must be preserved, and no past decision is protected by obligation.

The objective is not to choose a tool. The objective is to design an architecture that can evolve, scale, and endure.

Option 1) Azure Databricks + Power BI

This architecture represents a mature and proven path for organizations prioritizing advanced analytics, data science, and AI from day one.

Strategic profile

  • Optimized for deep analytics and experimentation

  • Ideal when data science and predictive modeling drive competitive advantage

  • Emphasizes flexibility and depth over simplicity

Technical shape

  • Lakehouse on ADLS using open formats such as Delta

  • Unified engineering, streaming, ML, and AI in Spark-based compute

  • Power BI connected via SQL endpoints for analytics and visualization

  • Strong separation of compute and storage for elasticity and cost control
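One way to picture the separation of compute and storage is that a table is just a location in ADLS, and each engine is a reader pointed at that location. The sketch below assumes a hypothetical storage account (`contosolake`), container, and table path. The two "consumers" are plain dictionaries standing in for a Spark job and a SQL endpoint, not real API objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaTableLocation:
    """Storage-side description of a Delta table in ADLS Gen2 (names are hypothetical)."""
    storage_account: str
    container: str
    path: str

    @property
    def uri(self) -> str:
        # ADLS Gen2 URI scheme: abfss://<container>@<account>.dfs.core.windows.net/<path>
        return (
            f"abfss://{self.container}@{self.storage_account}"
            f".dfs.core.windows.net/{self.path}"
        )


sales = DeltaTableLocation("contosolake", "lakehouse", "curated/sales")

# Two illustrative consumers that reference the same storage location.
# Either can be scaled or replaced without moving the data.
spark_job = {"engine": "spark", "reads": sales.uri}
bi_endpoint = {"engine": "sql-endpoint", "reads": sales.uri}

print(sales.uri)
```

Because both consumers hold only a URI, changing the compute side never requires a second copy of the table.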

Design consideration

Governance and semantics span multiple layers, requiring strong platform engineering discipline to avoid fragmentation over time.

Architecture

In the Databricks Architecture Center, you can find Databricks' recommended end-to-end architecture for Azure Databricks, which provides a scalable, secure foundation for analytics, AI, and real-time insights across both batch and streaming data.

Data Intelligence End-To-End Architecture with Azure Databricks

Option 2) Microsoft Fabric

Choosing Fabric alone is a clear architectural statement: simplicity, speed, and integration matter more than modular complexity.

Strategic profile

  • End-to-end analytics in a single SaaS platform

  • Faster time-to-value and reduced operational overhead

  • Enables analytics to become ubiquitous, not specialized

Technical shape

  • Unified data foundation (OneLake) eliminating redundant copies

  • Shared data across engineering, analytics, and BI workloads

  • Integrated governance and lineage across the platform

Design consideration

Fabric introduces intentional abstraction—optimized for simplicity and broad adoption rather than extreme customization. For many greenfield scenarios, this is a strength, not a limitation.

Architecture

I always go first to the Azure Architecture Center to see Microsoft's recommended reference architecture. In this case, we can find Analytics end-to-end with Microsoft Fabric, a solution that combines a range of Microsoft services that ingest, store, process, enrich, and serve data and insights from different sources. These sources include structured, semistructured, unstructured, and streaming formats.

To give you another perspective, here is my own version. The only important element missing here is the Data Governance module of Microsoft Purview, which I plan to cover in the next edition of this newsletter.

Architecture Reference for an End-To-End Modern Advanced Analytics with Microsoft Fabric

Option 3) Azure Databricks + Microsoft Fabric Together

An architecture with full Azure Databricks or full Microsoft Fabric is not a compromise—it is about optionality. More importantly, there is no need to choose only one platform. Forcing that decision would take us back to the old days of rigid, monolithic data stacks. Modern data platforms are built to coexist.

Azure Databricks and Microsoft Fabric are uniquely interoperable by design, sharing an open Delta Lake foundation and a clear separation of storage and compute. This enables both platforms to operate over the same data estate, each optimized for different workloads and personas.

Interoperability is enabled through three core patterns:

  1. Direct read and write of Delta tables in OneLake using ADLS‑compatible endpoints, treating OneLake as native storage.

  2. Mirroring Databricks‑managed data into OneLake, so Fabric and Databricks work on the same datasets without duplication.

  3. Query federation via Unity Catalog, allowing Databricks to read OneLake data with no data copy (read‑only, still in "Beta").
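As a rough illustration of the first pattern, the sketch below builds an ADLS-style URI for a Delta table stored in OneLake. The workspace, lakehouse, and table names are hypothetical, and the URI layout reflects my understanding of OneLake's ADLS-compatible addressing, so please verify it against the current Microsoft documentation before relying on it.

```python
# Host name used by OneLake's ADLS-compatible endpoint (assumption: verify in the docs).
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build an abfss URI for a table under a Fabric lakehouse's Tables folder."""
    for name, value in (("workspace", workspace), ("lakehouse", lakehouse), ("table", table)):
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty name without '/'")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"


# Hypothetical names, for illustration only.
uri = onelake_table_uri("SalesWorkspace", "SalesLakehouse", "orders")
print(uri)
```

In a Spark session with Delta support and the right credentials, a URI like this could be passed to `spark.read.format("delta").load(uri)`. Authentication and permissions are outside the scope of this sketch.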

Together, these capabilities allow Azure Databricks to remain the execution engine for advanced data engineering and AI, while Microsoft Fabric becomes the engagement layer for analytics, BI, and Copilot‑driven experiences—all over a unified data foundation.

The result is freedom of choice without penalty: one data layer, multiple engines, and an architecture designed for interoperability—not lock‑in.


Conclusion — Build What Can Evolve

Greenfield architecture is not about predicting the future, but about designing in a way that the future does not break what you build today. The canvas remains blank by intention, not by chance. The objective was never to move fast just to fill space, but to create a foundation strong enough to grow, adapt, and scale without constant reinvention. When architecture is designed correctly from the beginning, change becomes an extension—not a disruption.

Technology will continue to evolve, tools will improve, and platforms will inevitably transform. What endures is a clear architectural structure grounded in a shared data foundation, open formats, thoughtful governance, and flexible compute. This is the real responsibility of a greenfield moment: not chasing speed, not optimizing for tools, but building a structure that can evolve with clarity and stability over time. In the end, tools will change, but architecture endures.


Let’s talk!
Let's have a cafecito together.

If you’re a Chief Data Officer (CDO), a data leader, or simply someone who believes in the power of preparing data for AI—you’re already a Data Massagist.

Whether you have an idea, a challenge, or just want a fresh perspective, let’s connect. I’m always open to collaborating, learning, and helping others move forward.

You can find me on LinkedIn (feel free to connect and send me a message), or book time with me directly for a virtual coffee (or "cafecito").