What is RDF, and Why is it Awesome for Metadata?

An introduction to RDF and why it works so well for metadata, based on my experience over the last four years.

By Greg Ruhnow
Tags: tutorials, rdf, semantic-web, tech

Simply put, Resource Description Framework (RDF) is a data model: a way of organizing your thoughts so that you can communicate about a topic, in this case data. There are many data models out there, some of which you may be more familiar with:

  • Relational (data is represented as tables with rows and columns)
  • Document (data is represented as documents)
  • Object-Oriented (data is represented as objects with properties and methods)
  • Graph (data is represented as nodes and edges)

Specifically, RDF is a type of graph data model that has the “triple” as its main organizing concept.

The Triple: RDF’s Building Block

Each triple consists of three parts:

  1. Subject - What you’re describing
  2. Predicate - How the subject and object relate
  3. Object - The value or target of the relationship

Think of it as a simple sentence: “John (subject) knows (predicate) Jane (object).”

To better describe John, we simply add more triples:

Subject | Predicate | Object
John    | fullName  | John Doe
John    | knows     | Jane
John    | age       | 30
John    | email     | john@example.com

JSON with the same information might look like this:

{
	"id": "John",
	"fullName": "John Doe",
	"knows": ["Jane"],
	"age": 30,
	"email": "john@example.com"
}
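
To make the correspondence concrete, here is a minimal Python sketch that stores the four triples as plain tuples and folds them into the JSON-style record above. The function name triples_to_record and the rule that knows is always collected into a list are assumptions made for this illustration; they are not part of RDF.

```python
import json

# Each triple is a (subject, predicate, object) tuple, mirroring the table above.
triples = [
    ("John", "fullName", "John Doe"),
    ("John", "knows", "Jane"),
    ("John", "age", 30),
    ("John", "email", "john@example.com"),
]

# Predicates that may repeat are collected into lists (an assumption for this sketch).
MULTI_VALUED = {"knows"}


def triples_to_record(triples, subject):
    """Fold every triple about `subject` into one JSON-style dict."""
    record = {"id": subject}
    for s, p, o in triples:
        if s != subject:
            continue
        if p in MULTI_VALUED:
            record.setdefault(p, []).append(o)
        else:
            record[p] = o
    return record


john = triples_to_record(triples, "John")
print(json.dumps(john, indent=2))
```

Adding another fact about John only means appending one more tuple; nothing about the existing triples has to change first.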

RDF: Web Native

While the above example is useful for understanding the concept of triples, it is also a simplification. RDF is standardized by the World Wide Web Consortium (W3C), and is designed to work as a graph model implemented on the web.

In reality, “John knows Jane” would look like this:

<https://example.com/John> <https://schema.org/knows> <https://example.com/Jane> .
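
As a rough sketch, the line above can be split back into its three parts with a few lines of Python. The regular expression only handles statements where all three terms are IRIs in angle brackets, which is an assumption that fits this example; literals and blank nodes are out of scope, and parse_iri_triple is a hypothetical helper name.

```python
import re

# Matches one statement whose three terms are all IRIs in angle brackets.
# Literals and blank nodes are deliberately out of scope for this sketch.
IRI_TRIPLE = re.compile(r"^\s*<([^<>\s]+)>\s+<([^<>\s]+)>\s+<([^<>\s]+)>\s*\.\s*$")


def parse_iri_triple(line):
    """Return (subject, predicate, object) IRIs, or raise ValueError."""
    match = IRI_TRIPLE.match(line)
    if match is None:
        raise ValueError(f"not an IRI-only triple: {line!r}")
    return match.groups()


line = "<https://example.com/John> <https://schema.org/knows> <https://example.com/Jane> ."
subject, predicate, obj = parse_iri_triple(line)
print(subject, predicate, obj, sep="\n")
```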

As you can see, each part of this triple is a URI that can be dereferenced to learn more about the resource being referenced. In this example, John and Jane are placeholders and don’t resolve to real pages. The predicate schema:knows, however, comes from schema.org and has a published definition. When you follow that URI in a browser, you’ll see HTML:

[Screenshot: the schema.org/knows page rendered as HTML]

The same page also embeds a more machine-friendly form of the information as JSON-LD, a JSON-based serialization format of RDF:

{
	"@context": {
		"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
		"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
		"schema": "https://schema.org/"
	},
	"@id": "schema:knows",
	"@type": "rdf:Property",
	"rdfs:comment": "The most generic bi-directional social/work relation.",
	"rdfs:label": "knows",
	"schema:domainIncludes": {
		"@id": "schema:Person"
	},
	"schema:rangeIncludes": {
		"@id": "schema:Person"
	}
}

This JSON-LD represents the following triples:

Subject      | Predicate             | Object
schema:knows | rdf:type              | rdf:Property
schema:knows | rdfs:comment          | “The most generic bi-directional social/work relation.”
schema:knows | rdfs:label            | “knows”
schema:knows | schema:domainIncludes | schema:Person
schema:knows | schema:rangeIncludes  | schema:Person

As you can see, the predicate “knows” from the “John knows Jane” example is itself a resource that has its own definition. If there is a part of the schema:knows definition that does not make sense, for example schema:domainIncludes, then the schema:domainIncludes URI can be followed to see that definition. This demonstrates one of RDF’s superpowers: resources can be interconnected through shared vocabularies, making it much easier to share meaning across your organization.
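
The mapping from the JSON-LD document to its five triples can be sketched in plain Python. This is a deliberately simplified illustration, not a conforming JSON-LD processor: it assumes a single flat node, a context made only of prefix mappings, and string literals. The helper names expand and to_triples are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The JSON-LD document from above, as a Python dict.
doc = {
    "@context": {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "schema": "https://schema.org/",
    },
    "@id": "schema:knows",
    "@type": "rdf:Property",
    "rdfs:comment": "The most generic bi-directional social/work relation.",
    "rdfs:label": "knows",
    "schema:domainIncludes": {"@id": "schema:Person"},
    "schema:rangeIncludes": {"@id": "schema:Person"},
}


def expand(term, context):
    """Turn a prefixed name like 'schema:knows' into a full IRI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term  # already absolute, or not a prefixed name


def to_triples(doc):
    """Flatten one flat JSON-LD node into (subject, predicate, object) tuples."""
    context = doc["@context"]
    subject = expand(doc["@id"], context)
    triples = []
    for key, value in doc.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            # "@type" is JSON-LD shorthand for the rdf:type predicate.
            triples.append((subject, RDF_TYPE, expand(value, context)))
        elif isinstance(value, dict) and "@id" in value:
            # An {"@id": ...} object points at another resource.
            triples.append((subject, expand(key, context), expand(value["@id"], context)))
        else:
            # Anything else is treated as a plain literal in this sketch.
            triples.append((subject, expand(key, context), value))
    return triples


for triple in to_triples(doc):
    print(triple)
```

Once the prefixes are expanded, every subject and predicate is a full URI again, which is what lets a reader (or a program) follow it to its definition.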

RDF as a Meta-Model

While RDF is itself a data model, another one of RDF’s superpowers is that it is capable of modeling other models. This flexibility allows RDF to represent and bridge between different data modeling paradigms.

Describing Other Data Models

RDF’s flexibility allows it to represent concepts from other data models:

Object-Oriented Models with SHACL

RDFS with SHACL (Shapes Constraint Language) can model object-oriented concepts like classes, inheritance, and properties with constraints:

@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Base Product class
ex:Product a rdfs:Class, sh:NodeShape ;
    rdfs:label "Product" ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ; sh:maxCount 1
    ] ;
    sh:property [
        sh:path ex:price ;
        sh:datatype xsd:decimal ;
        sh:minInclusive 0
    ] ;
    sh:property [
        sh:path ex:region ;  # Same region concept as Data Cube
        sh:datatype xsd:string ;
        sh:in ( "North America" "Europe" "Asia Pacific" )
    ] .

ex:PhysicalProduct a rdfs:Class, sh:NodeShape ;
    rdfs:subClassOf ex:Product ;
    sh:property [
        sh:path ex:weight ;
        sh:datatype xsd:decimal ;
        sh:minInclusive 0
    ] ;
    sh:property [
        sh:path ex:shippingCategory ;
        sh:datatype xsd:string
    ] .

ex:DigitalProduct a rdfs:Class, sh:NodeShape ;
    rdfs:subClassOf ex:Product ;
    sh:property [
        sh:path ex:downloadUrl ;
        sh:datatype xsd:anyURI ;
        sh:minCount 1
    ] ;
    sh:property [
        sh:path ex:fileSize ;
        sh:datatype xsd:integer ;
        sh:minInclusive 0
    ] .

This captures inheritance, properties, and constraints that you’d find in OOP languages like Java or C++. Notice that the Data Cube model in the next section references the same concepts (like ex:Product and ex:region) - this sharing is what we’re after!

Note: OWL is another, more common way to achieve this type of modeling in RDF, but SHACL tends to be more approachable for people new to the space.
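
To give a feel for what these shapes express, here is a hand-rolled Python sketch that checks a record against a few of the constraints above, including the ones a subclass inherits from ex:Product. It does not use a SHACL library and does not read the Turtle; the SHAPES dictionary, the rule names (min_count, one_of, and so on), and the mapping of xsd:anyURI to str are all assumptions made for illustration.

```python
from decimal import Decimal

# Hand-written mirror of part of the shapes above; a real SHACL processor
# would read these constraints from the Turtle instead.
SHAPES = {
    "Product": {
        "parent": None,
        "properties": {
            "name": {"type": str, "min_count": 1, "max_count": 1},
            "price": {"type": Decimal, "min_inclusive": 0},
            "region": {"type": str, "one_of": {"North America", "Europe", "Asia Pacific"}},
        },
    },
    "DigitalProduct": {
        "parent": "Product",
        "properties": {
            "downloadUrl": {"type": str, "min_count": 1},
            "fileSize": {"type": int, "min_inclusive": 0},
        },
    },
}


def constraints_for(class_name):
    """Collect property constraints from a class and all of its ancestors."""
    merged = {}
    while class_name is not None:
        shape = SHAPES[class_name]
        for prop, rules in shape["properties"].items():
            merged.setdefault(prop, rules)
        class_name = shape["parent"]
    return merged


def validate(class_name, node):
    """Return violation messages; `node` maps property name -> list of values."""
    violations = []
    for prop, rules in constraints_for(class_name).items():
        values = node.get(prop, [])
        if len(values) < rules.get("min_count", 0):
            violations.append(f"{prop}: too few values")
        if "max_count" in rules and len(values) > rules["max_count"]:
            violations.append(f"{prop}: too many values")
        for value in values:
            if not isinstance(value, rules["type"]):
                violations.append(f"{prop}: wrong datatype")
            elif "min_inclusive" in rules and value < rules["min_inclusive"]:
                violations.append(f"{prop}: below minimum")
            elif "one_of" in rules and value not in rules["one_of"]:
                violations.append(f"{prop}: value not allowed")
    return violations


# A digital product with a negative price and no download URL.
ebook = {"name": ["RDF Primer"], "price": [Decimal("-5")], "downloadUrl": []}
print(validate("DigitalProduct", ebook))
```

The price violation comes from a constraint declared on Product, not on DigitalProduct, which is the inheritance behavior the shapes above describe.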

Relational Data with RDF Data Cube

The RDF Data Cube Vocabulary lets you model multi-dimensional statistical data (like what you’d find in relational OLAP systems):

@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Define the structure (schema) of a product sales data cube
ex:productSalesStructure a qb:DataStructureDefinition ;
    qb:component
        [ qb:dimension ex:product ;
          qb:order 1 ] ,
        [ qb:dimension ex:region ;
          qb:order 2 ] ,
        [ qb:dimension ex:quarter ;
          qb:order 3 ] ,
        [ qb:measure ex:revenue ] ,
        [ qb:measure ex:unitsSold ] .

# Define the dimensions and measures
ex:product a qb:DimensionProperty ;
    rdfs:label "Product" ;
    rdfs:range ex:Product ; # Links to the product definition above
    qb:concept ex:ProductConcept .

ex:region a qb:DimensionProperty ;
    rdfs:label "Sales Region" ;
    rdfs:range xsd:string ;
    qb:concept ex:GeographicTerritory .

ex:quarter a qb:DimensionProperty ;
    rdfs:label "Quarter" ;
    rdfs:range xsd:string ;
    qb:concept ex:FiscalPeriod .

ex:revenue a qb:MeasureProperty ;
    rdfs:label "Revenue" ;
    rdfs:range xsd:decimal ;
    qb:concept ex:GrossRevenue .

ex:unitsSold a qb:MeasureProperty ;
    rdfs:label "Units Sold" ;
    rdfs:range xsd:integer ;
    qb:concept ex:SalesVolume .

This defines what would traditionally be a fact table schema in a data warehouse, but the schema itself is data that can be queried and extended.
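
Because the schema is itself a set of triples, it can be queried like any other data. The sketch below writes part of the structure definition as Python tuples and asks which dimensions and measures the cube has. The "_:c1" style strings are stand-ins for the blank nodes written as [ ... ] in the Turtle, and the helpers objects and components are hypothetical names for this illustration; a real setup would more likely run a SPARQL query against a triple store.

```python
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/"

# A subset of the structure definition above as (subject, predicate, object) tuples.
triples = [
    (EX + "productSalesStructure", QB + "component", "_:c1"),
    (EX + "productSalesStructure", QB + "component", "_:c2"),
    (EX + "productSalesStructure", QB + "component", "_:c3"),
    (EX + "productSalesStructure", QB + "component", "_:c4"),
    (EX + "productSalesStructure", QB + "component", "_:c5"),
    ("_:c1", QB + "dimension", EX + "product"),
    ("_:c1", QB + "order", 1),
    ("_:c2", QB + "dimension", EX + "region"),
    ("_:c2", QB + "order", 2),
    ("_:c3", QB + "dimension", EX + "quarter"),
    ("_:c3", QB + "order", 3),
    ("_:c4", QB + "measure", EX + "revenue"),
    ("_:c5", QB + "measure", EX + "unitsSold"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def components(structure, kind):
    """Dimension or measure IRIs of a structure, sorted by qb:order when present."""
    found = []
    for component in objects(structure, QB + "component"):
        for prop in objects(component, QB + kind):
            order = objects(component, QB + "order")
            found.append((order[0] if order else 0, prop))
    return [prop for _, prop in sorted(found)]


structure = EX + "productSalesStructure"
print(components(structure, "dimension"))
print(components(structure, "measure"))
```

Extending the cube with a new dimension would mean adding two or three more tuples, with no migration of the existing ones.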

The Power of Meta-Modeling

This meta-modeling capability is why RDF excels at metadata management - it can morph to whatever model is needed for a use case, while maintaining a single underlying implementation that allows you to share meaning across systems.

Semantic Building Blocks

In the modern data landscape, open, low-level formats like Apache Arrow and Apache Parquet have emerged as fundamental building blocks for the data ecosystem, with widespread adoption across libraries and vendors. They solve the problem of efficient data storage and transfer, but what they don’t solve for is data meaning. This is where I see the opportunity for RDF today - extending the open data stack to include semantic understanding using an already existing and open standard: RDF.

Looking Forward

RDF has quietly powered knowledge graphs at tech giants for years, but its potential extends far beyond that. As organizations increasingly need to share meaning across systems, RDF’s moment is arriving.

The challenge isn’t the technology itself; it’s accessibility. While other data technologies (Arrow, Parquet, Iceberg) get easier to use every day, RDF is still waiting for its breakthrough moment.

This is the opportunity I see: building tools that make RDF accessible and valuable to everyone who needs it:

  • Executives can visualize how their data connects across the enterprise
  • Subject matter experts can define and maintain vocabularies without first learning RDF and a dozen vocabularies
  • Engineers can integrate semantic understanding as easily as they import a library
  • LLM agents can ground their answers in structured knowledge, reducing the risk of hallucination

I’m actively exploring this space and would love to connect with others who see this potential. Whether you’re wrestling with metadata chaos, building knowledge graphs, or just RDF-curious—what’s your take on making semantic technologies more accessible?