A Free Discussion on Designing Data Systems with a Governance-Priority Mindset
Introduction: Rethinking Governance
When I hear the word "governance," I immediately imagine people saying "No!", blocking access, asking for approvals, and perhaps even being a bit strict. For me, governance has always seemed more like an obstacle than an enabler.
To be honest, I find it boring. I try to avoid anything that looks like security or compliance. I skipped security-related certifications because delving into access policies and audit logs felt dull.
I remember early in my career, I needed access to a project's dataset. It took me days, and I had to go through different teams. By the time I finally got access, the project's direction had changed.
At that moment, a belief formed in my mind: Governance gets in the way of everything! Many people avoid this friction with workarounds: local copies, unofficial pipelines, and undocumented shortcuts.
I "sometimes" do this too, especially in research-type experiments where governance isn't an immediate concern. But what works in isolated experiments doesn't work in systems designed for scale, collaboration, or serving real users.
It took me a long time and several "unexpected events" to realize that governance isn't something to be dealt with "later." It's not a hindrance or an overhead; it's design! This article is a structured reflection on this realization from a different perspective.
Governance is no longer something I avoid. It's something I think about seriously and systematically, right from day one!
To bring structure to this shift in thinking, we can look at the DAMA model. I won't explain the entire model; it's very extensive and covers far more than I can touch on here. Instead, I'll map my learnings to selected parts of it, not as abstract theory, but as a way to anchor practical insights to something structured and widely recognized. In other words, I'll refer to the DAMA model when relevant, but it's not the focus; it's a useful reference to shape my thinking about governance.
➀ Governance Is More Than Access Control
For a long time, governance has been on the periphery of my awareness. As a data scientist, I usually saw it as a synonym for access control, i.e., who can read what and who can write where. Given the nature of my work, this made sense.
If you're a data scientist or analyst, you probably have your own view that governance is something in the background that you occasionally encounter when requesting access.
But this is just part of the picture.
My relationship with governance changed when I took on the role of an architect.
Suddenly, it wasn't just about "Can someone access this table?" but about trust, traceability, and long-term maintainability.
I had to think about how the system was built, how teams collaborated, and how today's decisions would be understood tomorrow. That's when I realized that governance includes access control but isn't defined by it.
The DAMA framework helped me realize this. Through its terminology, I found that what I used to call "governance" actually only covered a small part of the data security domain.
But the data governance domain in DAMA introduced me to concepts I hadn't paid much attention to before, such as management and decision-making authority.
At first glance, these terms may seem abstract. But on closer inspection, I found that they exactly describe the gaps I've encountered.
🏷️ The essence of data management is: Who manages this data? In fact, data managers are the ones who ensure that all data is documented, definitions are consistent, and quality issues are addressed promptly rather than ignored. When someone asks, "What does this column mean?" data managers are the ones who truly understand the data or know who to ask!
Yes, you may ask, "What's the difference between management and ownership?" I had the same question.
In short: Ownership is about responsibility, and management is about tasks.
Even if you don't handle the data yourself, you may own it, make key decisions, and be responsible for the results. Conversely, you can manage the data, maintaining its quality and availability, without taking ultimate responsibility when problems arise. For example, a data manager (e.g., a BI developer or business analyst) says, "This number seems wrong. I think the revenue figures are incorrect." The data owner (e.g., a finance partner) says, "Thanks. We'll fix this right away and make sure it doesn't happen again."
Decision-making authority addresses a frequently overlooked question: Who makes decisions about this data? Who decides to rename a field, change a data type, or redefine a metric? In many teams, changes happen ad hoc, driven by whoever needs them. Good governance introduces a clear process for these decisions, ensuring that changes are well thought out and their consequences are easy to understand.
As an architect, looking at governance from a broader perspective made me realize that governance isn't something to be patched up later; it needs to be designed early!
② A Glimpse of the DAMA Governance Architecture
By now, you may be wondering: What exactly is the DAMA model? What does it cover? What have I missed?
I had similar doubts. Especially because people often list familiar concepts like data quality, access control, metadata, and then casually add "etc.," as if everything else is just a vague afterthought.
It might be harmless shorthand, but in technical conversations, "etc." usually means we've reached the limit of our understanding. To be honest, I did this too in the metaphorical diagram above (the tip of the iceberg, where I said, "... etc.").
Specifically, the DAMA framework lists 11 different data management domains. It's not a checklist or a step-by-step guide; it's more like a map. Maps are useful, especially for architects.
The center of the wheel is data governance. You may notice that it's not just one domain among many; it's the coordinating layer that brings everything together.
The domains around it include data architecture, metadata management, data security, data quality, and more, but unlike a casual "etc.," these domains are well defined and have real meaning.
However, the situation is not as simple as it seems: If you're a data architect, only one of these domains, data architecture, literally has "architecture" in its name. But in practice, you'll be drawn to almost all of them.
You'll make design decisions that depend on good metadata management.
You'll inherit systems that aren't clearly managed and be asked to "make them scalable."
You'll build pipelines that assume the data is trustworthy and managed.
You'll need to model quality, integrate across silos, and "keep it secure" during execution.
So, an architect doesn't own the entire DAMA system. But if you don't understand its components, you may end up accidentally designing around them or, even worse, ignoring them until they fail in production.
③ Building Trust with Metadata and Lineage
Once I started seeing governance as design rather than a limitation, I couldn't stop noticing all the subtle, quiet ways trust breaks down in systems.
It's not always dramatic. It's not a data breach or a system failure. It's more subtle: Someone hesitates before using a dataset because they're not sure if it's up-to-date. Dashboards are abandoned because no one remembers how a certain number was calculated. Junior analysts recreate reports from scratch because they can't find or trust the existing ones.
These situations rarely show up in event logs, but they're the real cost of missing governance. They slow down teams, create redundancy, and erode confidence.
That's when I realized: Governance isn't just about who has access; it's about whether people can trust what they see once they have it.
This trust doesn't come from dashboards or documentation. It comes from metadata and lineage!
Metadata: The System's Own Memory
Metadata isn't just a description or a label. It's the system's memory!
It allows datasets to explain themselves. It enables people to deal with complexity without guessing. It improves discoverability, reduces misunderstandings, and frees teams from relying on tribal knowledge.
In modern data systems, metadata isn't something filled in later; it's considered during design. Metadata provides structure to otherwise chaotic tables and files.
I say this because I've been there. As a data scientist, I've opened tables with no description and no owner, where I didn't even know what half of the columns meant. Like many others, I'd guess or ignore the columns I didn't understand. Sometimes I was lucky, sometimes not.
Metadata covers aspects such as dataset ownership, field definitions, classifications, relationships between objects, usage patterns, and access context. Capturing all this information isn't just documentation; it's design.
Because good metadata isn't just sitting on a wiki; it affects how your system is understood, used, and extended.
From naming conventions to ownership models, from classifications to lineage, these choices affect the architecture. They're not just descriptive; they're structural.
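To make this concrete, here's a minimal sketch of metadata treated as a design artifact rather than wiki prose. The dataset name, team names, columns, and classification labels are all hypothetical illustrations, not a reference to any real catalog tool:

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMeta:
    name: str
    dtype: str
    description: str
    classification: str = "internal"  # e.g. public / internal / confidential

@dataclass
class DatasetMeta:
    name: str
    owner: str    # accountable for the data (ownership)
    manager: str  # maintains quality and definitions day to day (management)
    columns: list[ColumnMeta] = field(default_factory=list)

    def describe(self, column_name: str) -> str:
        """Let the dataset explain itself instead of relying on tribal knowledge."""
        for col in self.columns:
            if col.name == column_name:
                return f"{col.name} ({col.dtype}): {col.description}"
        raise KeyError(f"No metadata for column {column_name!r}")

# A hypothetical revenue table described at design time.
revenue = DatasetMeta(
    name="fact_revenue",
    owner="finance-team",
    manager="bi-team",
    columns=[
        ColumnMeta("amount_eur", "decimal", "Net revenue in EUR, excl. VAT", "confidential"),
        ColumnMeta("order_id", "string", "FK to fact_orders.order_id"),
    ],
)

print(revenue.describe("amount_eur"))
# → amount_eur (decimal): Net revenue in EUR, excl. VAT
```

Because the metadata lives next to the code, it can be versioned, reviewed, and queried like any other part of the system.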
Designing with governance in mind means you don't just ask, "Where should this go?"
You also ask, "Six months from now, when I'm not around, how will people find it, understand it, and more importantly, trust it?"
Lineage: The Footprint That Builds Confidence
If metadata is memory, then lineage is the story!
Lineage can tell you how a dataset was formed, which source systems fed into it, what transformations were applied, and which dashboards or models depend on it. It turns invisible logic into a visible process!
When something goes wrong or looks off, lineage is often the only reliable way to debug.
I've seen this in practice. If the numbers on a dashboard look suspicious and there's no lineage, you're left opening tabs, tracing pipelines, and guessing.
But with lineage? You follow the trail. You see the join that introduced null values. You notice the upstream field that was quietly deprecated. You're debugging the "system," not just the "symptom."
But lineage isn't just for emergencies. Even when everything is working fine, it builds confidence. It helps analysts ask better questions. It helps new team members get up to speed faster. It helps stakeholders trust the numbers they see.
That's why lineage awareness should be part of every design.
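As an illustration, lineage at its core is just a dependency graph you can walk. The dataset names and edges below are invented for this sketch; real systems would populate such a graph from pipeline definitions or a catalog:

```python
from collections import deque

# Hypothetical lineage edges: dataset -> its direct upstream sources.
LINEAGE = {
    "exec_dashboard": ["revenue_daily"],
    "revenue_daily": ["orders_clean", "fx_rates"],
    "orders_clean": ["raw_orders"],
    "raw_orders": [],
    "fx_rates": [],
}

def upstream(node: str) -> list[str]:
    """Walk the lineage graph breadth-first to find everything a node depends on."""
    seen, queue, order = set(), deque(LINEAGE.get(node, [])), []
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(LINEAGE.get(current, []))
    return order

print(upstream("exec_dashboard"))
# → ['revenue_daily', 'orders_clean', 'fx_rates', 'raw_orders']
```

When a dashboard number looks suspicious, this kind of traversal is exactly "following the trail": every upstream join and source is one query away instead of one guess away.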
④ Designing for Quality
Data quality is a buzzword that's hard to argue against. Everyone talks about it, and everyone says they care. But when it comes to actual systems, people's actions are often reactive: measure after the fact, report problems, and hope someone downstream will fix them.
It's seen as a technical issue: testing, validation, checking. But if you look closely, quality is governance put into practice. It's about setting expectations and integrating those expectations into the system structure, not the error log.
The DAMA model makes this distinction clearer. Data quality is an independent domain, but it is tightly coupled to data integration. Because real-world quality issues aren't just about invalid values; they include mismatched joins, undocumented assumptions, silent truncations, and decisions made by one team that affect another.
There's a story in aviation history that always sticks with me. During World War II, the British built an extraordinary plane: the de Havilland Mosquito bomber. Unlike most planes at the time, this one was mainly made of wood to save metal for other war uses. It was light, fast, and efficient and was considered an engineering achievement.
But then things started to go wrong. The planes started breaking up in the air. At first, the failures were blamed on enemy fire. But after investigation, the cause was far less dramatic and far more mundane: the glue used by one of the factories to bond the wooden parts had been mixed incorrectly!
There were no immediate signs of failure. But over time, under stress and humidity, it finally gave way. The planes passed inspections and carried out their missions. But inside, they were falling apart.
This story always reminds me that a system can "work" on the surface while quietly deteriorating inside.
In data systems, this is what missing quality control looks like. Pipelines run, dashboards load. But the numbers are based on silent assumptions: missing fields, default values, unknown joins. Like the Mosquito, the system will hold up until it fails!
That's why quality must be integrated into the design, not the inspection. Design starts with expectations.
Schema enforcement is one of the most obvious examples. It's often seen as restrictive and a hindrance to flexibility. But in reality, it means clarity. It's like the system saying, "This is what I expect."
Enforcing types, declaring required fields, and rejecting unknown columns isn't just a technical feature; it's a governance stance. You declare what valid data looks like, and the system maintains that definition.
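Here's a minimal sketch of what such enforcement can look like, with an invented schema for a hypothetical orders feed; in practice you'd lean on schema registries, table constraints, or validation frameworks rather than hand-rolled checks:

```python
# Hypothetical expected schema: "this is what I expect."
EXPECTED_SCHEMA = {
    "order_id": str,
    "amount": float,
    "currency": str,
}
REQUIRED = {"order_id", "amount"}

def validate(record: dict) -> list[str]:
    """Reject what the schema does not declare, instead of silently absorbing it."""
    errors = []
    for name in REQUIRED - record.keys():
        errors.append(f"missing required field: {name}")
    for key, value in record.items():
        if key not in EXPECTED_SCHEMA:
            errors.append(f"unknown column: {key}")  # reject, don't ignore
        elif not isinstance(value, EXPECTED_SCHEMA[key]):
            errors.append(f"wrong type for {key}: {type(value).__name__}")
    return errors

print(validate({"order_id": "A-1", "amount": "12.5", "channel": "web"}))
# → ['wrong type for amount: str', 'unknown column: channel']
```

The point isn't the mechanics; it's that the definition of "valid" lives in the system, not in someone's memory.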
But expectations don't stop at structure. They extend to integration: Do the keys align? Are the relationships reasonable? Do the dimensions match? Can this dataset be safely joined with that one, or will it introduce "silent" duplicates? These questions are entirely within the data integration domain and are often the biggest sources of quality issues.
Fortunately, modern architectures give us ways to ensure quality without relying on human memory. We can encode constraints directly into the model. We can create layers that only allow verified data to pass through. We can define contracts, i.e., clear agreements between data producers and consumers: "This is what we promise to deliver, and if we break that promise, the contract will tell you."
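Here's one way such a producer-consumer contract might be sketched in code; the dataset name, promised fields, and freshness window are assumptions for illustration, not a real contract format:

```python
from datetime import datetime, timedelta, timezone

# A hypothetical contract: what the producer promises the consumer.
CONTRACT = {
    "dataset": "orders_clean",
    "fields": {"order_id", "amount", "currency"},
    "max_staleness": timedelta(hours=24),
}

def check_contract(delivered_fields: set[str], last_updated: datetime) -> list[str]:
    """Make broken promises visible instead of letting consumers discover them later."""
    violations = []
    missing = CONTRACT["fields"] - delivered_fields
    if missing:
        violations.append(f"missing promised fields: {sorted(missing)}")
    age = datetime.now(timezone.utc) - last_updated
    if age > CONTRACT["max_staleness"]:
        violations.append(f"data is stale: last updated {age} ago")
    return violations
```

Run as a gate in the pipeline, a check like this turns "we hope downstream notices" into "the system tells you before downstream is affected."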
These aren't just practices; they're design principles. If implemented correctly, they'll change the nature of governance: from reactive to proactive.
⑤ Security, Classification, and Policies as Design Tools
In 2018, the world watched a seemingly innocent decision turn into a global scandal. A harmless-looking personality-test app collected personal information, not only from its users but also from tens of millions of their friends, thanks to the way Facebook's system was designed!
There was no hacking, and no firewalls were breached. The problem wasn't unauthorized access; it was that the system didn't label this data as sensitive from the start! There was no clear classification and no demarcation of what data could be collected, shared, and reused.
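To show what design-time classification can look like, here's a toy sketch in which every field carries a sensitivity label and unclassified fields default to the most restrictive one; all field names, labels, and purposes are hypothetical:

```python
# Hypothetical sensitivity labels attached at design time, not after an incident.
FIELD_CLASSIFICATION = {
    "user_id": "internal",
    "email": "pii",
    "friends_list": "pii",
    "quiz_score": "public",
}

# Which classifications each stated purpose is allowed to touch.
ALLOWED = {
    "analytics": {"public", "internal"},
    "support": {"public", "internal", "pii"},
}

def filter_for_purpose(record: dict, purpose: str) -> dict:
    """Strip fields whose classification the stated purpose may not use."""
    allowed = ALLOWED[purpose]
    return {k: v for k, v in record.items()
            if FIELD_CLASSIFICATION.get(k, "pii") in allowed}  # unclassified → most restrictive

row = {"user_id": 42, "email": "a@b.c", "friends_list": [7, 9], "quiz_score": 3}
print(filter_for_purpose(row, "analytics"))
# → {'user_id': 42, 'quiz_score': 3}
```

The design choice worth noting is the default: data that was never classified is treated as sensitive, so forgetting a label fails safe rather than leaking.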
The