What is a "language"?
Exploring the three layers of meaning in RDF: instances, schemas, and the modeling language layer (ABox/TBox/LBox)
When we say “language” in software, most people think of things like C++, Python, or SQL.
Those languages are relatively fixed:
- The keywords are defined.
- The grammar is locked in, or evolves slowly.
- The compiler/runtime is the authority on what is “valid.”
But in the semantic web / RDF world, “language” behaves very differently.
Three layers of meaning
A useful mental model:
- Instances / ABox / Data – The actual facts: instances, rows, events, edges.
- Schema / TBox / Domain – The business concepts: Customer, Order, Position, EmissionsReport, etc.
- Modeling Language / LBox – The modeling primitives you use to express schemas:
  - Things like Class, subClassOf, Property
In Python, the “LBox” is the language spec itself: class, def, @dataclass, etc. You don’t model it; you just use it.
In RDF, that “language layer” is itself modeled in RDF.
That’s very weird, but also very powerful!
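To make that concrete, here is a minimal sketch of the three layers living in one graph. It assumes a toy representation (a graph is a Python set of string triples) rather than a real RDF library, and the ex: names are hypothetical. The rdf: and rdfs: statements in the LBox are standard RDFS ones.

```python
# Toy model: a graph is a set of (subject, predicate, object) string triples.
# The ex: names are made up for illustration.

# LBox: the modeling language, described with the same kind of triples.
LBOX = {
    ("rdfs:Class", "rdf:type", "rdfs:Class"),
    ("rdf:Property", "rdf:type", "rdfs:Class"),
    ("rdfs:subClassOf", "rdf:type", "rdf:Property"),
}

# TBox: the domain schema, written using LBox terms.
TBOX = {
    ("ex:Customer", "rdf:type", "rdfs:Class"),
    ("ex:Order", "rdf:type", "rdfs:Class"),
    ("ex:placedBy", "rdf:type", "rdf:Property"),
}

# ABox: the data, written using TBox terms.
ABOX = {
    ("ex:alice", "rdf:type", "ex:Customer"),
    ("ex:order42", "rdf:type", "ex:Order"),
    ("ex:order42", "ex:placedBy", "ex:alice"),
}

# All three layers fit in one graph because they share one shape.
graph = LBOX | TBOX | ABOX


def instances_of(graph, cls):
    """Return every subject declared to be of type `cls`, sorted by name."""
    return sorted(s for s, p, o in graph if p == "rdf:type" and o == cls)


# The same query works at the data level and at the language level.
print(instances_of(graph, "ex:Order"))    # ['ex:order42']
print(instances_of(graph, "rdfs:Class"))  # ['ex:Customer', 'ex:Order', 'rdf:Property', 'rdfs:Class']
```

Note that the second query returns both domain classes and the language's own building blocks. Nothing in the data structure separates the layers; the separation is a convention we apply.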
A mutable language layer
Unlike traditional languages, RDF’s LBox is not a closed spec shipped by a single vendor. It’s:
- Extensible – you can define new modeling constructs as vocabularies.
- Negotiated – tools choose which constructs they understand.
- Stackable – RDFS, OWL, custom constraint vocabularies can all co-exist.
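One way to picture "negotiated" is a tool that simply skips the constructs it doesn't recognize. The sketch below is a toy under stated assumptions: the three "tools" are just sets of supported terms, ex:reviewCycle is a made-up custom construct, and real tools decide support in far more nuanced ways.

```python
# A small schema that mixes RDFS, OWL, and a hypothetical custom construct.
SCHEMA = [
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
    ("ex:hasAncestor", "rdf:type", "owl:TransitiveProperty"),
    ("ex:Employee", "ex:reviewCycle", "annual"),  # made-up custom construct
]

# Each hypothetical tool declares the modeling constructs it understands.
RDFS_TOOL = {"rdfs:subClassOf"}
OWL_TOOL = RDFS_TOOL | {"owl:TransitiveProperty"}
CUSTOM_TOOL = OWL_TOOL | {"ex:reviewCycle"}


def split_by_support(schema, supported):
    """Partition schema triples into (understood, ignored) for one tool."""
    understood, ignored = [], []
    for triple in schema:
        _, predicate, obj = triple
        # For rdf:type statements, the construct in play is the asserted class.
        construct = obj if predicate == "rdf:type" else predicate
        (understood if construct in supported else ignored).append(triple)
    return understood, ignored


for name, tool in [("rdfs", RDFS_TOOL), ("owl", OWL_TOOL), ("custom", CUSTOM_TOOL)]:
    understood, ignored = split_by_support(SCHEMA, tool)
    print(name, "understands", len(understood), "and ignores", len(ignored))
```

The same schema is valid input for all three tools; they just extract different amounts of meaning from it.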
Historically, RDFS and OWL have been the languages of choice for modeling in RDF, but they sit at two extremes:
- RDFS is intentionally simple.
- OWL is expressive, with a steep learning curve.
Here are some quick examples of the types of modeling and inference you can do with RDFS and OWL. (Inference in this context means deriving new facts from existing ones based on the language’s semantics.)
- RDFS inference: If you state Employee subClassOf Person, then whenever you encounter alice type Employee, you can infer alice type Person.
- OWL inference: OWL adds more expressive inference capabilities, for example transitive properties: Alice hasAncestor Bob and Bob hasAncestor Carol lets you infer Alice hasAncestor Carol.
- OWL gets much more complicated than this, and can quickly lead to situations where the inference is infeasible to compute.
Given the broad use of these vocabularies as modeling languages, many tools recognize and implement these basic types of inferences.
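The two inferences above can be sketched in a few lines of plain Python. This is an illustration, not how a real reasoner is built: triples are plain tuples, prefixes are dropped to match the examples above, and each function applies a single rule until nothing new appears.

```python
def rdfs_subclass_inference(triples):
    """From `x type C1` and `C1 subClassOf C2`, infer `x type C2` (to a fixpoint)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        types = [(s, o) for s, p, o in inferred if p == "type"]
        subclasses = [(s, o) for s, p, o in inferred if p == "subClassOf"]
        for instance, cls in types:
            for sub, sup in subclasses:
                if cls == sub and (instance, "type", sup) not in inferred:
                    inferred.add((instance, "type", sup))
                    changed = True
    return inferred


def transitive_inference(triples, transitive_properties):
    """From `a p b` and `b p c`, infer `a p c` for each p declared transitive."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p, b in list(inferred):
            if p not in transitive_properties:
                continue
            for b2, p2, c in list(inferred):
                if p2 == p and b2 == b and (a, p, c) not in inferred:
                    inferred.add((a, p, c))
                    changed = True
    return inferred


facts = {
    ("Employee", "subClassOf", "Person"),
    ("alice", "type", "Employee"),
    ("Alice", "hasAncestor", "Bob"),
    ("Bob", "hasAncestor", "Carol"),
}

print(("alice", "type", "Person") in rdfs_subclass_inference(facts))                   # True
print(("Alice", "hasAncestor", "Carol") in transitive_inference(facts, {"hasAncestor"}))  # True
```

Both loops terminate because the set of possible triples over a finite vocabulary is finite. The nested scans are quadratic per pass, which is fine for a demo and a hint at why richer rule sets get expensive.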
SHACL: New-ish kid on the block
On the surface, SHACL is “a constraint language for validating RDF data.”
That’s true, but I think that sells SHACL short as a potential piece of the “language layer”.
When you define SHACL shapes, you’re doing exactly what developers expect a schema language to do:
- Describe which properties are allowed/required.
- Constrain types and cardinalities.
- Capture expectations that are effectively closed-world for your system.
- Make the structure of the graph explicit and checkable.
In other words: SHACL isn’t just a validator tacked onto RDF. It’s a language for specifying the schema of the graph.
That puts SHACL squarely in the LBox: it defines the modeling constructs that describe how your data should look.
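To show why this feels like a schema language, here is a toy shape checker in plain Python. It is not SHACL and does not use a SHACL engine; the class and field names are my own, chosen to loosely mirror the SHACL terms sh:targetClass, sh:path, sh:datatype, sh:minCount, sh:maxCount, and sh:closed. A node is assumed to be a dict mapping property names to lists of values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PropertyRule:
    path: str                        # loosely mirrors sh:path
    datatype: type = str             # loosely mirrors sh:datatype
    min_count: int = 0               # loosely mirrors sh:minCount
    max_count: Optional[int] = None  # loosely mirrors sh:maxCount


@dataclass
class NodeShape:
    target_class: str                # loosely mirrors sh:targetClass
    properties: List[PropertyRule]
    closed: bool = False             # loosely mirrors sh:closed


def validate(node, shape):
    """Return a list of violation messages for one node; empty means it conforms."""
    violations = []
    for rule in shape.properties:
        values = node.get(rule.path, [])
        if len(values) < rule.min_count:
            violations.append(f"{rule.path}: expected at least {rule.min_count} value(s)")
        if rule.max_count is not None and len(values) > rule.max_count:
            violations.append(f"{rule.path}: expected at most {rule.max_count} value(s)")
        for value in values:
            if not isinstance(value, rule.datatype):
                violations.append(f"{rule.path}: {value!r} is not a {rule.datatype.__name__}")
    if shape.closed:
        allowed = {rule.path for rule in shape.properties}
        for prop in node:
            if prop not in allowed:
                violations.append(f"{prop}: not allowed by closed shape")
    return violations


customer_shape = NodeShape(
    target_class="ex:Customer",
    properties=[
        PropertyRule("ex:name", str, min_count=1, max_count=1),
        PropertyRule("ex:email", str),
    ],
    closed=True,
)

good = {"ex:name": ["Alice"], "ex:email": ["alice@example.org"]}
bad = {"ex:name": [], "ex:age": [42]}

print(validate(good, customer_shape))  # []
for message in validate(bad, customer_shape):
    print(message)
```

The shape reads like a class definition with typed, counted fields, which is the part of SHACL that tends to feel familiar to people coming from programming or data backgrounds.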
Why this framing matters
When I started thinking in this way, a few things snapped into focus:
- OWL, RDFS, and SHACL aren’t “just ontologies” – they’re languages you can choose from and combine.
- Combining languages often gives you access to more tools, and many OWL axioms provide value without requiring a full OWL reasoner.
- If a language doesn’t quite work for you, you can extend it! Just know that doing so puts you on the hook for tooling.
- SHACL offers a middle ground between RDFS and OWL that is much more friendly to people with traditional programming / data backgrounds.
- For teams building knowledge graphs, data products, or semantic APIs, being explicit about “This is our language for describing schemas, and these are our domain schemas, and here is our data” makes the whole system much more approachable.
I’m curious how others are thinking about this:
- Have you tried SHACL (or similar) as a first-class schema language, not just validation?
- How explicitly do you talk about your own “language layer” vs. your domain model vs. your data?
Would love to hear how you frame this with your teams.
Thanks to @HolgerKnublach for his article “Ontology Modeling with SHACL: Getting Started”, which has greatly influenced how I think about modeling in RDF.