Data Mesh on AWS SageMaker Unified Studio

Every few months, a client shares their pain points about undiscoverable, non-interoperable, siloed data and how everything becomes a bottleneck, or walks us through their vision for company-wide data access. They want per-team governance and secure AI workloads that draw from every department's assets without violating privacy or compliance. They explain the architecture they imagine, the problems they've encountered, and the politics they're navigating. Then we say: "What you're describing is two words: Data Mesh. Have you heard of it?"

Nine times out of ten, the answer is no. It's the pattern we implement most often for our clients, yet those who need it most have rarely heard the term. This guide is for them, and for anyone who suspects their organisation's data problem is an ownership gap, not a technology gap.

Data mesh on AWS SageMaker Unified Studio is architecturally straightforward. AWS has productised the control plane. What makes or breaks it is whether your organisation can adopt federated ownership without creating bureaucracy. This guide covers what data mesh means, how SageMaker Unified Studio implements it, the three architectural decisions that shape everything downstream, and the organisational patterns that determine whether your mesh survives contact with reality.

What data mesh actually means (and why your CDO probably hasn't heard of it)

Data mesh is not a technology. It's not a specific AWS service. It's not a replacement for your data lake, and it's definitely not a reason to restructure your org chart overnight. It's an organisational architecture for data ownership: four principles, operationalised through technology.

Those four principles, as defined by Zhamak Dehghani and conforming to the AWS Well-Architected Data Analytics Lens:

Domain-oriented ownership. Data belongs to the team that produces it, not a centralised "Data" team that attempts to manage everyone's assets.
Data as a product. Each domain publishes discoverable, quality-controlled datasets with defined SLAs. Think of it as an internal API contract, but for data.
Self-serve data platform. Infrastructure that lets domain teams operate autonomously, without filing tickets or waiting on a central bottleneck.
Federated computational governance. Central standards, decentralised execution. The rules come from the top; the implementation lives with the teams.

Winston Churchill said, "We shape our buildings; thereafter they shape us." The same applies to how companies structure their teams. If your company groups people by technical specialisation (a "Data" team handling all company data regardless of ownership), your architecture will reflect that centralisation. It quickly gets out of hand as companies grow. Data mesh inverts this.

The analogy that resonates most with technical audiences is domain-driven design. You understand grouping code by business domain: a Payments module, not "all Python scripts" or "all Stripe-related files." Data mesh applies the same principle to data assets. Amazon's famous two-pizza teams follow the same idea: small, autonomous, cross-functional, outcome-oriented groups that own their domain end-to-end.

Just as application architecture evolved from monoliths to microservices, data teams are modularising their platforms into federated, decentralised solutions. The AWS Well-Architected Framework Data Analytics Lens explicitly draws this parallel.

Why do experienced CDOs miss this? Because the concept requires thinking at a different level of abstraction. It's not about new tools but about who owns what and why. That cognitive shift is difficult, even for people who've worked in data for decades. It's especially hard for data managers whose day-to-day work sits closer to hands-on implementation than to organisational design.

How SageMaker Unified Studio implements data mesh

AWS SageMaker Unified Studio, generally available since March 2025, is AWS's concrete implementation of data mesh principles. It sits atop SageMaker Catalog (the evolution of Amazon DataZone), Lake Formation, the Glue Data Catalog, and Athena. Together, these form the control plane for a federated data architecture.

The hierarchy: domains, domain units, projects

The full structure maps cleanly to organisational reality:

AWS Account → Domain(s) → Domain Unit(s) → Project(s) → Member(s)

A domain represents a major line of business. Most companies need only one. The exceptions are large, diversified enterprises, such as Siemens, where separate domains for energy, consumer electronics, and transport would make sense. When we built the event-driven data pipeline for Siemens Energy, the cross-account architecture naturally mapped to their divisional structure, a pattern data mesh formalises. As a rule of thumb, domains map to the major lines of business of companies that have their hands in many loosely related activities.

The recommended design is a single governance domain that contains no data or domain units and serves only as the mesh's control plane. This is where rules and best practices are enforced. Other data-rich domains are onboarded to participate. Data producers have data owners and engineers. Data consumers have data engineers, report builders, and data scientists. If possible, keep the governance AWS account separate from others. Otherwise, have them side by side in a single AWS account.

Domain units are organisational subdivisions within a domain, such as departments, teams, and capabilities. This is where most of the structure lives.

The rule of thumb: one governance domain, one root business domain with many domain units, many projects per domain unit. This structure is then replicated in the development, UAT, and production accounts.
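To make the hierarchy concrete, here is a minimal sketch that models it with plain Python dataclasses. It is not an AWS API: the class names, the "acme" business domain, and the example members are all hypothetical, chosen only to show one governance domain, one business domain with domain units and projects, and the same structure replicated per environment.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    # user -> role ("Owner" or "Contributor"); membership is expected to change over time
    members: dict[str, str] = field(default_factory=dict)


@dataclass
class DomainUnit:
    name: str
    projects: list[Project] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    units: list[DomainUnit] = field(default_factory=list)


def build_mesh(environment: str) -> list[Domain]:
    """Return the same domain structure for a given SDLC environment."""
    # Governance domain: control plane only, so it holds no domain units and no data.
    governance = Domain(name=f"governance-{environment}")
    # Hypothetical root business domain with two domain units.
    business = Domain(
        name=f"acme-{environment}",
        units=[
            DomainUnit("Sales", [Project("CRM Master Data", {"alice": "Owner"})]),
            DomainUnit("Marketing", [Project("Website Analytics Master Data", {"bob": "Owner"})]),
        ],
    )
    return [governance, business]


# One identical structure per environment account.
mesh = {env: build_mesh(env) for env in ("dev", "uat", "prd")}
for env, domains in mesh.items():
    for domain in domains:
        print(env, domain.name, [unit.name for unit in domain.units])
```

Generating every environment from one function is the point of the sketch: the structure is defined once and promoted, rather than hand-built three times.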

Projects as the unit of work

A project team is not a permanent team. It's a cross-functional intersection of people from different physical teams, grouped for a particular business goal. Members are either Contributors or Owners, and they come and go as the project needs them.

This is where the "do you want me to hire a data engineer for every department?" pushback dies. Data engineers temporarily join a specific project, configure the necessary machinery, then leave, returning when the owning team needs help. Same people as today. Different levels of abstraction.

Projects include the resources needed to achieve their business goals: a Data Lakehouse for data sources, ETL tools (scripts, notebooks, Airflow orchestration) for processing and migration, and MLflow tracking servers for data science work. Each project selects blueprints that provision these resources, and we strongly recommend setting all blueprints to ONDEMAND rather than ONCREATE (see the cost section below for more details).

A corporate domain uses the Tooling blueprint; a personal IAM domain uses the newer (but less capable) ToolingLite. The differences between these two, and when to use which, are a topic for another post in this series.

The catalog: producer/consumer pattern

The data mesh comes alive through the producer/consumer pattern, which maps directly to the Well-Architected reference architecture:

Producer projects publish data assets to SageMaker Catalog, complete with metadata, glossary terms, quality information, and lineage.
Consumer projects subscribe to those assets and query them via Athena from within Unified Studio.
Lake Formation enforces fine-grained access (down to individual rows and columns), serving as the governance layer between producers and consumers.

Each layer (producer, governance, consumer) resides in its own AWS account, per Well-Architected guidance. This best practice is not always possible. Many companies, even large ones, have workloads that share the same AWS account instead of being segregated. The most common convention is one AWS account per environment (dev/uat/prd).

Master Data producer projects: our recommendation

We recommend establishing dedicated projects around critical data sources, appending "Master Data" to the name: "CRM Master Data," "Website Analytics Master Data." Each is fed by a single data source that multiple consumers use across several projects.

A nuance worth clarifying: any project can be both a data producer and a data consumer. Most will be. However, Master Data projects are essential, atomic leaves. Typically, they don't consume. They expose one data source to the mesh catalogue. The Well-Architected Framework recommends setting these producers up as soon as possible, but the reality is more subtle. We encountered exceptions: one client's insurance policy management system and its predecessor were exposed as separate datasets, plus a consolidated view that mapped the old schema onto the new one. Consumers could query the policy repository as if it had never migrated. The inner plumbing and historical business decisions stayed hidden from consumers, which simplified the interface.

Only the data owners (the sales department for CRM, the marketing team for their analytics) are permanent members. Data engineers are invited temporarily to configure blueprints and connections, then removed until needed again. These Master Data projects expose cleaned datasets with friendly names, glossaries, metadata, descriptions, lineage, and quality information. They are the core blocks for all downstream analytics, BI, and AI work.

Three decisions that shape everything

Before writing infrastructure code, three decisions determine your implementation intricacy, governance granularity, and monthly bill.

Decision	Option A	Option B	Our recommendation	Reason
AWS accounts strategy	One account per environment (dev/UAT/prod) with all domains & projects	One account per domain (if multi-domain) or project (if uni-domain)	One per domain/project, each with dev/UAT/prod	Mirrors the SDLC and respects the best practice of one AWS account per workload and environment combination.
Identity model	SSO via AWS Identity Center	IAM	SSO	Per-user granularity, traceability, fine-grained Lake Formation governance, recommended by AWS.
Network posture	No VPC (default)	VPC-only	Start open unless policy mandates otherwise	VPC-only adds significant complexity; prove value first, tighten later.

1. AWS accounts strategy

The ideal is one AWS account per workload per environment. For a uni-domain mesh, that typically means one account per environment (dev/UAT/prd), with the full domain structure replicated in each. For multi-domain enterprises, each domain (or even each major project) gets its own set of accounts. The principle: isolate blast radius and mirror your SDLC.

Each account has its own root domain. Develop the mesh like an application: build in dev, promote to UAT, deploy to production.

Don't confuse AWS's environment concept (within Unified Studio) with dev/UAT/prod environments. They are different. Each SDLC environment will have the full domain structure replicated.

2. Identity model: SSO vs IAM

SSO via AWS Identity Center is strongly preferred. It provides per-user granularity for governance, traceability, and fine-grained data access control, which is exactly where IAM falls short. The SageMaker Unified Studio console will keep prompting you to configure SSO until you do.

The alternative, IAM with federated groups, loses granularity. The smallest access control unit becomes a group, which is too coarse for meaningful data governance. If your company mandates federated groups without Identity Center, you'll sacrifice traceability and per-user auditability.

Here's the political reality: SSO is often one of the hardest things to get clients to support. Their hands are tied by those higher up the chain of command. IAM domains work if IAM users are allowed, but that is often not guaranteed.

We often see confusion where clients conflate IAM/SSO (infrastructure access, who can log in to the AWS console) with Lake Formation (data governance, who can see which rows and columns in tables). These are separate access control planes. Most technologists know IAM well but have not encountered Lake Formation concepts such as row-based access, column-based access, role-based access control (RBAC), or tag-based access control (TBAC). Every implementation starts with a whiteboard session unpacking this distinction.
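The distinction between the two planes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Lake Formation's API: `CAN_SIGN_IN`, `DataGrant`, `GRANTS`, and `query` are hypothetical names, and the table contents are made up. The first check stands for identity (can this user get in at all?), the second for data governance (which rows and columns may they read?).

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class DataGrant:
    """Hypothetical data-plane grant: visible columns plus a row filter."""
    columns: tuple[str, ...]
    row_filter: Callable[[Row], bool]


# Plane 1: identity. Who can sign in at all.
CAN_SIGN_IN = {"alice", "bob"}

# Plane 2: data governance. Which rows and columns each user sees, per table.
GRANTS = {
    ("alice", "crm.customers"): DataGrant(
        columns=("customer_id", "country"),
        row_filter=lambda row: row["country"] == "BE",
    ),
}

TABLE = [
    {"customer_id": 1, "country": "BE", "email": "a@example.com"},
    {"customer_id": 2, "country": "FR", "email": "b@example.com"},
]


def query(user: str, table: str, rows: list[Row]) -> list[Row]:
    if user not in CAN_SIGN_IN:
        raise PermissionError(f"{user} cannot sign in")
    grant = GRANTS.get((user, table))
    if grant is None:
        return []  # signed in, but no data grant: sees nothing
    return [
        {column: row[column] for column in grant.columns}
        for row in rows
        if grant.row_filter(row)
    ]


print(query("alice", "crm.customers", TABLE))  # [{'customer_id': 1, 'country': 'BE'}]
print(query("bob", "crm.customers", TABLE))    # []
```

Bob can sign in yet sees no rows, which is the situation that surprises teams who assume console access implies data access.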

3. Network posture: VPC-only vs open

VPC-only domains are more secure but greatly increase implementation intricacy. Expect to configure VPC service endpoints, modify security groups, and coordinate with IT for networking changes. In organisations using hub-and-spoke architectures with Transit Gateways, where spoke VPCs have no internet traffic, the complexity expands further.

If VPC-only is required (by policy or regulation), start with it from the beginning. Don't leave it for later. Retrofitting VPC-only onto an existing mesh is much harder than building it in from day one.

For first-time deployments where VPC-only isn't mandated, start with the default open posture, prove the mesh's value, then tighten. But only if that is genuinely an option for your organisation.

A practical defence against future security scrutiny is to add cdk-nag to your CDK app from day one. The overhead is real but provides a head start when (not if) someone asks, "Has this been validated against AWS best practices?"

The operating model: where implementations succeed or fail

Technology accounts for about 30% of a data mesh implementation. The remaining 70% is organisational change management: roles, ownership, and the politics of who controls data.

Why domain teams resist ownership

The most common pushback is: "Do you want me to hire a data engineer for every department?" Managers hear "federated ownership" and immediately imagine headcount requests for every team.

The reframe: a project team can be fluid rather than permanent. A data team (the current "Data Department") is a fixed group of people. A project is a transient intersection of subsets from different teams, assembled for a business goal. Data engineers already exist. We're not hiring new ones. We're grouping the same people differently per project. If a team consistently keeps a technical member busy, as a manager, you have discovered a critical local need, and hiring a dedicated full-time employee (FTE) may be a good idea.

Conway's Law ensures your current org structure produces your current architecture. Data mesh inverts this: group by business outcome, not technical specialisation.

The governance paradox

Central governance defines the standards: naming conventions, quality thresholds, retention policies, and classification rules. Domain teams implement and maintain the contracts for their own data products.

The hardest part isn't technology. It's getting stakeholders who are paranoid about data access to agree on what "sharing" means, and to realise that it's not all or nothing. Here lies the paradox: the less mature an organisation's governance, the higher the bar they set for the new system.

Data contracts and accountability

Each data product needs a defined contract: schema, quality rules, freshness SLA, and a named owner. The Well-Architected lens specifies that data products must be autonomous, discoverable, secure, and reusable.
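A contract with those four elements can be expressed as a small data structure plus a check. The sketch below is one possible shape, assuming a single null-ratio threshold as the quality rule and Python types as the schema; `DataContract`, `violations`, and the sample values are hypothetical and not part of any AWS service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    product: str
    owner: str                  # the named, accountable owner
    schema: dict[str, type]     # column name -> expected Python type
    max_null_ratio: float       # quality rule: tolerated share of missing values per column
    freshness_sla: timedelta    # maximum age of the latest refresh


def violations(contract: DataContract, rows: list[dict], last_refresh: datetime, now: datetime) -> list[str]:
    """Return human-readable contract breaches for a batch of rows."""
    problems = []
    if now - last_refresh > contract.freshness_sla:
        problems.append("freshness SLA breached")
    for column, expected in contract.schema.items():
        values = [row.get(column) for row in rows]
        nulls = sum(value is None for value in values)
        if rows and nulls / len(rows) > contract.max_null_ratio:
            problems.append(f"{column}: too many nulls")
        if any(value is not None and not isinstance(value, expected) for value in values):
            problems.append(f"{column}: expected {expected.__name__}")
    return problems


contract = DataContract(
    product="CRM Master Data",
    owner="sales-owner@example.com",
    schema={"customer_id": int, "country": str},
    max_null_ratio=0.0,
    freshness_sla=timedelta(hours=24),
)
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [{"customer_id": 1, "country": "BE"}, {"customer_id": "2", "country": None}]
print(violations(contract, rows, last_refresh=now - timedelta(hours=30), now=now))
```

Whatever form the contract takes, the useful property is that a breach is detectable by a machine and attributable to a named owner.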

The data steward role (per AWS's reference architecture) ensures federated decision-making and metadata auditability. Without contracts and stewardship, a mesh degenerates into a distributed mess, worse than the centralised lake it replaced.

Getting buy-in

Implementation is as much political as engineering. Start with one Master Data producer project. Prove that controlled access works: that the sales team can see their CRM data, the marketing team can query website analytics, and neither can access the other's tables, all enforced with granular, per-project control.

The "Marie Kondo" argument: before you can run AI workloads at company scale, you need granular, context-specific access to well-governed data. Data mesh isn't a luxury alongside your AI roadmap. It's the prerequisite that makes AI possible.

What it costs (and what catches people off guard)

The mesh infrastructure itself is cheap. The surprise costs come from blueprints you activate without understanding what resources they provision.

The blueprint cost trap

SageMaker Unified Studio setup carries no per-domain cost. Lake Formation, the Glue Catalog, and project management are essentially free at a small scale. The bill comes from the compute resources that blueprints provision. (See SageMaker pricing for current rates.)

The most expensive surprise: the MLflow tracking server. Activated by an ML-oriented blueprint, it bills in the mid-single-digit thousands per month, even for the smallest compute option, even when nobody is using it. We've seen clients discover this weeks after activation and wonder where the bill came from. (Note: AWS announced a serverless MLflow option in late 2025 at no additional charge, but the managed tracking server provisioned by older blueprints still carries this cost.)

Code spaces (the remote JupyterLab and VS Code compute environments where you author scripts) are affordable but not free. A t3.medium instance costs roughly $0.60 per 10 hours of use, and the lowest idle timeout before automatic shutdown is 1 hour.

GP3 storage runs about $2 per 15 GB per month. Negligible.

The pattern: Infrastructure (domains, projects, catalog) = essentially free. Compute (endpoints, tracking servers, code spaces) = where the bill grows. Governance (Lake Formation, permissions) = free. Storage = cheap.
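As a back-of-the-envelope check, the code space and storage figures above can be turned into a monthly estimate. The rates below are the approximate ones quoted in this article, not current AWS pricing, and the function name and usage profile are hypothetical. The sketch assumes the worst case where every session ends by idling out, so the idle timeout is billed on top of active time.

```python
# Approximate rates quoted in this article; check current pricing before relying on them.
T3_MEDIUM_USD_PER_HOUR = 0.60 / 10   # ~$0.60 per 10 hours
GP3_USD_PER_GB_MONTH = 2.00 / 15     # ~$2 per 15 GB per month
MIN_IDLE_TIMEOUT_HOURS = 1.0         # lowest idle timeout before automatic shutdown


def monthly_code_space_cost(
    sessions_per_month: int,
    active_hours_per_session: float,
    storage_gb: float,
    idle_timeout_hours: float = MIN_IDLE_TIMEOUT_HOURS,
) -> float:
    """Estimate one user's monthly code space bill in USD."""
    # Worst case: each session runs until the idle timeout shuts it down.
    billed_hours = sessions_per_month * (active_hours_per_session + idle_timeout_hours)
    compute = billed_hours * T3_MEDIUM_USD_PER_HOUR
    storage = storage_gb * GP3_USD_PER_GB_MONTH
    return round(compute + storage, 2)


# 20 sessions of 3 active hours each, with 30 GB of storage.
print(monthly_code_space_cost(20, 3, 30))  # 8.8
```

Under these assumptions a regular user costs single-digit dollars per month, which is why the always-on resources, not the code spaces, dominate the bill.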

Our recommendation: if you don't know up front what functionality your project will need, you can create the "All Capabilities" project profile with all blueprints added and set them all to ONDEMAND, not ONCREATE. If you don't know what a blueprint provisions, don't activate it. Our upcoming articles in this series will explain each blueprint and what it creates.
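A simple guard against the blueprint cost trap is to lint the project profile before deploying it. The sketch below assumes the profile is available as a plain mapping of blueprint name to deployment mode. The mode labels follow this article's spelling and may differ in your tooling, and every blueprint name other than Tooling is illustrative only.

```python
# Deployment-mode labels as written in this article; the exact spelling in your tooling may differ.
ON_DEMAND = "ONDEMAND"
ON_CREATE = "ONCREATE"


def eager_blueprints(
    blueprints: dict[str, str],
    allowed_on_create: frozenset[str] = frozenset(),
) -> list[str]:
    """Return blueprint names that would provision resources at project creation."""
    return sorted(
        name
        for name, mode in blueprints.items()
        if mode != ON_DEMAND and name not in allowed_on_create
    )


# Hypothetical project profile; blueprint names are illustrative only.
all_capabilities = {
    "Tooling": ON_CREATE,
    "Lakehouse": ON_DEMAND,
    "Workflows": ON_DEMAND,
    "MLExperiments": ON_CREATE,
}

# Allow-list the blueprints you deliberately want provisioned at creation.
print(eager_blueprints(all_capabilities, allowed_on_create=frozenset({"Tooling"})))  # ['MLExperiments']
```

Run as a unit test or a pre-deployment step, a check like this fails the build whenever someone flips a blueprint to provision eagerly without adding it to the allow-list.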

How we help with costs

As an AWS Partner, we help clients access cost benefits beyond those available through customary compute optimisation, savings that aren't available when working with AWS directly. All clients receive DoIt PartnerOps, a cross-cloud FinOps and compliance platform, at no additional charge. It provides visibility into exactly which blueprint-provisioned resources are driving spend. We've seen clients save the equivalent of our consulting fee through cost-optimisation tooling alone.

The pragmatic path: start small, prove value

Don't boil the ocean. Pick one critical data source, one domain unit, one consumer, and prove the pattern works before asking anyone else to change.

Step 1: Identify your most-requested dataset, the one every project currently obtains via Slack DMs, shared drives, or manual exports. Alternatively, pick a pain point: "We overspend on this existing SAS platform and need to migrate to modern architecture."

Step 2: Create a Master Data producer project, assign the data owner, and configure Lake Formation access via SageMaker Unified Studio rather than directly in the Lake Formation console.

Step 3: Publish to SageMaker Catalog with metadata, glossary terms, and quality rules.

Step 4: Onboard one consumer project. Prove they can discover the data and query it without asking anyone. Connect Amazon QuickSight so business stakeholders can chat with the BI dashboard. The reaction is usually immediate.

Step 5: Celebrate the win. Then do the next dataset. If done right, every successful onboarding creates at least one internal evangelist, someone who saw it work and tells their colleagues. These evangelists are critical. Consultants don't benefit from the same trust. Peer advocacy spreads the data mesh philosophy far more effectively than any top-down mandate.

This is exactly what the Well-Architected lens recommends: quick delivery cycles with iteration from lessons learned. The alternative (a big-bang mesh rollout that requires everyone to change simultaneously) dies in governance committee meetings.

For teams that want to automate this process with Infrastructure as Code: we maintain an opinionated open-source L2 construct library for SageMaker Unified Studio, managed with projen. It provisions domains, domain units, projects, and blueprints in a single CDK deployment. We'll publish to npm and PyPI once stable, so follow our GitHub for updates.

A few practical notes for teams going the IaC route: as your mesh grows, you'll hit CloudFormation's 500-resource limit per stack. Our approach is a separate CDK stage for data mesh resources, one stack per domain, nested stacks for projects. This gives room to scale without refactoring the entire deployment.
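The layout described above can be sanity-checked with simple arithmetic before any deployment. The sketch below assumes one parent stack per domain and one nested stack per project, and relies on the fact that a nested stack counts as a single resource in its parent while having its own 500-resource limit. The function name and resource counts are hypothetical.

```python
CFN_RESOURCE_LIMIT = 500  # CloudFormation's per-stack resource limit cited above


def check_domain_stack(
    domain_level_resources: int,
    project_resources: dict[str, int],
    limit: int = CFN_RESOURCE_LIMIT,
) -> list[str]:
    """Return warnings for a domain stack whose projects live in nested stacks."""
    warnings = []
    # Each nested stack appears as exactly one resource in the parent stack.
    parent_count = domain_level_resources + len(project_resources)
    if parent_count > limit:
        warnings.append(f"parent stack has {parent_count} resources (limit {limit})")
    # Each nested stack is subject to the same limit on its own.
    for name, count in project_resources.items():
        if count > limit:
            warnings.append(f"nested stack '{name}' has {count} resources (limit {limit})")
    return warnings


# Hypothetical resource counts per project.
print(check_domain_stack(40, {"crm-master-data": 120, "web-analytics": 510}))
```

In practice the counts would come from the synthesised templates, so the check can run in the same test suite as the rest of your CDK assertions.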

Automate your CDK tests from day one. There will be significant code evolution, and you want to minimise the time spent waiting for failed deployments. Those wasted minutes compound.

AWS also provides useful starting points: a CI/CD CLI for SageMaker Unified Studio and an official utilities repository with CloudFormation templates for common patterns.

Conclusion

Data mesh on SageMaker Unified Studio is architecturally straightforward. AWS has productised the control plane: domains, projects, the catalog, and Lake Formation permissions. The technology works. What makes or breaks your implementation is whether your organisation can adopt federated ownership without generating more bureaucracy than it eliminates.

The prerequisite insight hasn't changed: there is no way to run secure AI workloads at company scale without granular, context-specific access to governed data. The pyramid of AI needs applies here: clean, governed, accessible data is the foundation on which everything else rests. Data mesh isn't a side project alongside your AI roadmap. It's what makes corporate AI possible.

If this describes your challenge, a vision for company-wide data access with no clear path from here to there, that's what we help with. As an AWS Partner, we bring certifications, cost benefits, and practitioner experience.

Malik Alimoekhamedov
Engineer, techno-optimist, entrepreneur, writer, musician, investor and minimalist. I strive to automate as much of my life as possible. Writing helps me crystallise and manage thoughts. Technology is my bread and butter, as well as a passion. You can safely reach out any time.