Start Where You Are - A Practical Path to Data Products
A practical incremental approach to implementing data products using semantic web technologies
With the advent of LLMs, metadata is having a bit of a renaissance. Where before organizations could muddle along with a “people-oriented” approach to data context, now they find themselves needing to modernize rapidly or be left behind by their more organized competitors.
The problem? It’s hard to know where to start. Data products have had a lot of buzz, but execution remains a challenge, and leadership isn’t waiting around for perfect solutions. What if you could start with what you already have? Over the past few years I’ve landed on an incremental approach: Publish → Align → Replatform. And along the way, I discovered that technologies hiding in plain sight have been solving these problems for decades.
A Journey Towards Data Products
I spent my early career as a data consumer. The challenge I had, like most of the new engineering hires, was that we worked for a financial data company but didn’t have a background in finance. We had documentation, access to the data pipelines, and SMEs to help us figure things out, but all of those sources required active effort to chase down and interpret. When I discovered GraphQL, I fell in love not because I was developing highly optimized mobile applications and wanted to be gentle with clients’ data caps, but because GraphQL requires a schema. All the information a consumer needed was embedded right there. You could ignore it, but if you were curious, it was waiting for you.
In 2021 I was starting my transition from data consumer to data producer. While attending Subsurface that summer, I got to hear Zhamak Dehghani give her keynote presentation Data Mesh - Enabled with a Self-Serve Platform. I remember it distinctly because it was the first time I heard the term “data product”. I came out of that presentation with two thoughts: “I’m not sure I fully understand, but I think I want it” and “I have no idea how to get started”. In a lot of ways those two thoughts ended up driving the rest of my journey between then and now.
Analytical Data Products
In data mesh, Zhamak Dehghani defines a “data product” (or “analytical data product”) as a domain-owned, independently deployable unit that packages data plus the code, metadata, and policies needed to publish and consume it. Think of it as a product in the traditional sense: something owned by a team, designed for consumers, with clear interfaces and SLOs.
Dehghani outlines baseline characteristics that every data product should meet:
- Discoverable: easily found in a catalog with clear docs
- Addressable: stable, unique identifier to request it
- Understandable: explicit semantics, schema, and meaning
- Trustworthy: visible quality signals, lineage, and ownership
- Accessible: standard interfaces (files, tables, APIs)
- Interoperable: follows shared conventions for composition
- Valuable: solves real use cases without extra stitching
- Secure: built-in access control and policy enforcement
Practically, this means each product exposes interfaces for discovery and access, publishes contracts (schemas and policies), and is owned by a team that treats data consumers as customers.
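To make that contract concrete, here’s a minimal sketch of a catalog card using the W3C DCAT vocabulary; the product, team, and URLs are hypothetical:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.com/products/> .   # hypothetical namespace

ex:customer-orders a dcat:Dataset ;                # the catalog entry for the product
    dct:title       "Customer Orders" ;
    dct:description "Daily snapshot of confirmed orders, one row per order line." ;
    dct:publisher   <https://example.com/teams/order-management> ;   # owning team
    dcat:landingPage <https://example.com/products/customer-orders/docs> ;
    dcat:distribution [
        a dcat:Distribution ;                      # one supported access mode
        dct:format     "application/vnd.apache.parquet" ;
        dcat:accessURL <https://example.com/products/customer-orders/data>
    ] .

The exact vocabulary matters less than having a stable, machine-readable card that names an owner, documents the data, and lists the supported access modes.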
Data Products
Here’s where I differ a bit from the data mesh literature. Data Mesh uses “data product” and “analytical data product” interchangeably, but I think it’s useful to distinguish them. An analytical data product is just one type of data product. This matters because your enterprise probably has lots of different kinds of data with different access patterns and capabilities:
- Relational tables and views (SQL, warehouse schemas)
- Files in object storage (Parquet, CSV, JSON)
- APIs (REST, GraphQL, OpenAPI)
- Graph databases (SPARQL, Cypher)
- Streaming data (Kafka, PubSub topics)
- Semantic definitions (ontologies in TTL, SHACL, OWL)
- and more
Why does this matter? Because you can take these simpler building blocks and compose them into data products that actually reflect what your organization has today, not what some ideal future state looks like.
Example A: “Securities Master”
Let’s say you have a securities master database. This is one data product that exposes multiple interfaces:
Relational interface: Your standard tables and views with instruments, issuers, and identifiers. Someone can query holdings by ISIN, look up issuer details, trace identifier mappings across systems.
Semantic interface: A lightweight ontology that defines the core concepts: what’s an Instrument? An Issuer? A Listing? An Identifier? This gives context for anyone trying to understand what those relational tables actually mean.
One product, two ways to access and understand it.
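As a sketch of what that semantic interface might look like, here’s a lightweight ontology in Turtle; the ex: namespace and definitions are illustrative, not a real standard:

@prefix ex:   <https://example.com/ontology/securities#> .   # hypothetical namespace
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Instrument a owl:Class ;
    rdfs:label   "Instrument" ;
    rdfs:comment "A tradable financial asset, such as a bond or a share." .

ex:Issuer a owl:Class ;
    rdfs:label   "Issuer" ;
    rdfs:comment "The legal entity that issues an instrument." .

ex:issuedBy a owl:ObjectProperty ;    # links the two concepts
    rdfs:domain ex:Instrument ;
    rdfs:range  ex:Issuer .

ex:isin a owl:DatatypeProperty ;      # grounds an identifier from the tables
    rdfs:domain ex:Instrument ;
    rdfs:comment "The instrument's ISIN, as it appears in the relational tables." .

A few dozen triples like these are often enough to tell a consumer what the ISIN column in your holdings table actually identifies.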
Example B: “Pricing & Market Data”
Here’s another single product serving different consumption patterns:
Stream interface: Real-time pricing ticks on a Kafka topic. Perfect for live analysis, trading algorithms, monitoring dashboards.
File interface: Historical pricing partitioned by date and symbol in Parquet files. Sitting in object storage, optimized for cost-efficient backtesting and analysis.
Same product, same underlying data, multiple interfaces based on how people need to use it.
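In catalog terms, both interfaces can hang off the same product entry. A sketch, again with hypothetical URIs and endpoints:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.com/products/> .   # hypothetical namespace

ex:pricing-market-data a dcat:Dataset ;
    dct:title "Pricing & Market Data" ;
    dcat:distribution ex:pricing-stream, ex:pricing-history .

ex:pricing-stream a dcat:Distribution ;           # the real-time interface
    dct:description "Live pricing ticks on a Kafka topic." ;
    dcat:accessURL  <kafka://broker.example.com/prices.ticks> .

ex:pricing-history a dcat:Distribution ;          # the batch interface
    dct:description "Historical prices partitioned by date and symbol." ;
    dct:format      "application/vnd.apache.parquet" ;
    dcat:accessURL  <s3://example-bucket/pricing/history/> .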
But isn’t the idea that analytical data products should give us uniform semantics across the board? Isn’t that the promise?
Yes it is, but most enterprises don’t start with uniform capabilities. Data landscapes are messy, and boiling the ocean doesn’t work. Instead, start where you are and move through three stages: Publish → Align → Replatform.
- Publish what exists as a product (owner, catalog card, docs, supported access modes, basic schema)
  - e.g., a Postgres database → a relational data product, a curated training set → a training data product
- Align by expanding the shared interfaces you provide: add shared semantics, IDs, and units without moving the data
  - e.g., Example B could improve by publishing its own semantics, or better yet, by aligning to an already published company standard (see the sketch after this list)
- Replatform when the ROI is clear
  - Remodel storage, compute, and pipelines to meet native analytical data-product traits (performance, interoperability, governance by design)
Keep the common product contract throughout; let the implementation mature. Not every data product needs to reach Replatform on day one.
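For the Align stage in particular, semantic web standards already provide mapping predicates that declare shared meaning without touching the underlying data. Here’s a minimal sketch, assuming hypothetical pricing: (local) and fin: (company standard) namespaces:

@prefix pricing: <https://example.com/products/pricing#> .   # hypothetical local terms
@prefix fin:     <https://example.com/standards/finance#> .  # hypothetical company standard
@prefix owl:     <http://www.w3.org/2002/07/owl#> .

# Declare that the product's local terms mean the same thing as the
# company-standard terms; no data moves, consumers gain shared semantics.
pricing:Tick   owl:equivalentClass    fin:PricePoint .
pricing:symbol owl:equivalentProperty fin:tickerSymbol .
pricing:price  owl:equivalentProperty fin:lastTradedPrice .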
Getting Started with Data Products
Now you hopefully have some idea of what a data product can be conceptually, but what is it physically? If you read my last article, you have some idea where this is going: the semantic web has been delivering basic data products for years; they just call them ontologies!
At its simplest, a data product can be a document at a URL that describes itself and its data. An ontology is typically a file (often in RDF/Turtle format) that defines concepts, their relationships, and metadata about itself, all served over HTTP.
Let’s take as an example the Simple Knowledge Organization System (SKOS), which is what I would call a “semantic data product” above. Here’s a snippet from SKOS:
...
@prefix dct: <http://purl.org/dc/terms/> .
<http://www.w3.org/2004/02/skos/core>
    dct:contributor "Dave Beckett", "Nikki Rogers", "Participants in W3C's Semantic Web Deployment Working Group." ;
    dct:creator "Alistair Miles", "Sean Bechhofer" ;
    dct:description "An RDF vocabulary for describing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, 'folksonomies', other types of controlled vocabulary, and also concept schemes embedded in glossaries and terminologies."@en ;
    dct:title "SKOS Vocabulary"@en ;
    a owl:Ontology ;
    rdfs:seeAlso <http://www.w3.org/TR/skos-reference/> .
...
Here’s how SKOS (and semantic web ontologies in general) maps to the data product characteristics:
- Discoverable: Google it or ask your LLM of choice
- Addressable: Stable URL at http://www.w3.org/2004/02/skos/core
- Understandable: Documentation as HTML for humans, RDF formats for machines, reference links built right in
- Trustworthy: Clear ownership and attribution
- Accessible: Just HTTP
- Interoperable: Other ontologies can reference it freely using semantic web patterns
- Valuable: A framework for classification systems you can use immediately
- Secure: Standard HTTP access control
SKOS demonstrates that the semantic web community has been thinking about these problems for decades. They’ve created reusable, well-documented ontologies and processes that anyone can adopt or extend. There’s even a data product ontology (a data product of data products?) to help get you started.
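Extending SKOS is also about as gentle as ontology work gets. Here’s a minimal, hypothetical vocabulary that reuses it directly:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <https://example.com/vocab/asset-classes#> .   # hypothetical namespace

ex:scheme a skos:ConceptScheme ;
    skos:prefLabel "Asset Classes"@en .

ex:equity a skos:Concept ;
    skos:prefLabel    "Equity"@en ;
    skos:topConceptOf ex:scheme .

ex:commonStock a skos:Concept ;
    skos:prefLabel "Common stock"@en ;
    skos:inScheme  ex:scheme ;
    skos:broader   ex:equity .       # a narrower term under Equity

Everything you need to publish this as a data product is a URL to serve it from.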
The Knowledge Gap & Tooling Challenge
Here’s the catch: while semantic web standards offer a powerful foundation for data products, there’s a steep learning curve. Most organizations don’t have teams fluent in RDF, SPARQL, or ontology design. The tooling, while mature, either assumes fluency in the semantic web or sits behind a paywall.
These gaps are exactly why I started Semvia: to build the tools I wish I had when getting started with semantic web, knowledge graphs, and data products in the enterprise.
Upcoming previews to watch out for:
- Data Product Manager: A “package manager” for discovering, installing, and managing data products and RDF ontologies
- Data Product Editor: A user-friendly interface for creating and editing RDF-based data products, no PhD in semantic web required.
I’m building these tools in the open and would love to connect with others interested in this space. If you’re exploring data products at your organization, frustrated with the current RDF tooling ecosystem, or just want to follow along on the journey, let’s connect on LinkedIn or reach out at hello@semvia.io.