How we query Wikipedia’s Knowledge Graph (DBpedia) in SQL using timbr

Bar Cohen
Apr 27, 2021

Querying a knowledge graph with SQL may sound a bit strange, and some would say it is unnecessary. But most of our data is stored in SQL-fluent platforms, so a practical solution is to let people keep querying their data in SQL while maintaining the semantics, relationships, and reasoning of a knowledge graph.

Transform existing databases into Knowledge Graphs

Before jumping ahead, let’s take a brief look at the main solutions that exist today.

Let’s say we have a large company’s relational database with tables representing companies, employees, customers, and so on, and we want to find all the CEOs of US companies who were born outside the US.
Tackling this in standard SQL is possible, but it would most likely involve many JOIN and WHERE clauses.
The usual way to query knowledge graphs today is SPARQL. Unfortunately, this raises two new issues:
1. We would need to have a deep understanding of the structure of Resource Description Framework (RDF) based data models, because SPARQL allows us to handle these types of questions only once we’ve modeled our data as RDF.
2. We would need to learn how to query using SPARQL, which at a glance may look similar to SQL but is a different language built around graph patterns.
To overcome these issues, we can use the timbr.ai platform, which employs standard SQL to implement the Semantic Web principles, enabling ontology modeling of data as connected, context-enriched concepts with inference and graph traversal capabilities.
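To make the standard-SQL side of this comparison concrete, here is a minimal sketch of the CEO question against a toy relational schema, run with Python's built-in sqlite3 module. The table names, column names, and rows are all hypothetical and only illustrate how the JOIN and WHERE clauses pile up; a real company database would need considerably more of them.

```python
import sqlite3

# Hypothetical toy schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (
    id INTEGER PRIMARY KEY, name TEXT,
    birth_country_id INTEGER REFERENCES country(id));
CREATE TABLE company (
    id INTEGER PRIMARY KEY, name TEXT,
    country_id INTEGER REFERENCES country(id));
CREATE TABLE employment (person_id INTEGER, company_id INTEGER, role TEXT);

INSERT INTO country VALUES (1, 'United States'), (2, 'Canada');
INSERT INTO person VALUES (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Carol', 2);
INSERT INTO company VALUES (1, 'Acme US', 1), (2, 'Maple Ltd', 2), (3, 'Liberty Inc', 1);
INSERT INTO employment VALUES (1, 1, 'CEO'), (2, 3, 'CEO'), (3, 2, 'CEO'), (3, 1, 'CFO');
""")

# The country table is joined twice: once for the company's country,
# once for the person's birth country.
QUERY = """
SELECT p.name AS ceo, c.name AS company, bc.name AS birth_country
FROM employment e
JOIN person  p  ON p.id  = e.person_id
JOIN company c  ON c.id  = e.company_id
JOIN country cc ON cc.id = c.country_id
JOIN country bc ON bc.id = p.birth_country_id
WHERE e.role = 'CEO'
  AND cc.name = 'United States'
  AND bc.name <> 'United States'
ORDER BY p.name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Even with four small tables, the query needs four joins and three filter conditions. The appeal of a semantic layer is that relationships such as "CEO of" or "born in" are modeled once and reused, rather than being re-expressed as joins in every query.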

Semantic Web Inference and Relationships in SQL

One way to really leverage any SQL knowledge graph built on the timbr.ai platform is to combine it with DBpedia.
For those who are unfamiliar with DBpedia, here's an explanation from the official site of DBpedia:

DBpedia is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG) which is available for everyone on the Web.

Connecting DBpedia to the timbr platform allows us to:
Query DBpedia in SQL as a Knowledge Graph
Explore DBpedia’s ontology
Integrate DBpedia with existing data-sources

In timbr-DBpedia, we can also find some interesting built-in dashboards, which allow us to explore DBpedia:

A summary of the DBpedia ontology by inheritance and relationships
A summary of the concept person
Insights into the concept person relationships

Looks nice? Let’s see some code examples:

Q: Which CEOs of US companies were born outside of the US?

Querying in timbr’s semantic SQL
Querying in SPARQL
Querying in standard SQL

We can see that this query took 54 lines of code in standard SQL and 7 lines in SPARQL, but only 4 lines in timbr-SQL.

In addition, timbr is fully integrated with Apache Spark and can act as middleware when querying data from a notebook environment such as Jupyter or Zeppelin:

Querying in timbr’s semantic SQL via Zeppelin pyspark interpreter

So, how do we start? EASY.

Every query starts with a question. We then follow five simple steps to get the answers.
Q: Which administrative regions have a leader who wasn’t born in that region?

Step A: Find possible interesting concepts by looking at concept relationships

Step B: Get a graph preview of the concept administrativeregion using dtimbr

Step C: Get detailed column information about the concept administrativeregion

Step D: Learn about the different entity types that are derived from administrativeregion

Step E: Gain more insights by viewing concept relationship data

All done! Now we can build a brand new query according to what we’ve learned.

A new query using timbr-SQL answering our question
A snippet from timbr’s query results
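The logic behind this question can also be sketched outside timbr. The plain-Python example below follows two relationships, from a region to its leader and from the leader to a birth place, and keeps the regions where the two places differ. The relationship names and all data are made up for illustration; they are not DBpedia results and not timbr's API.

```python
# Made-up toy data; names are placeholders, not DBpedia entities.
leader_of = {
    "Region A": "Person 1",
    "Region B": "Person 2",
    "Region C": "Person 3",
    "Region D": "Person 4",
}
birth_place = {
    "Person 1": "Region A",
    "Person 2": "Region C",
    "Person 3": "Region B",
    # Person 4 has no recorded birth place.
}


def regions_led_by_outsiders(leader_of, birth_place):
    """Return (region, leader, birth place) for leaders born in another region."""
    result = []
    for region, leader in sorted(leader_of.items()):
        born_in = birth_place.get(leader)  # None when the birth place is unknown
        if born_in is not None and born_in != region:
            result.append((region, leader, born_in))
    return result


outsiders = regions_led_by_outsiders(leader_of, birth_place)
for row in outsiders:
    print(row)
```

Leaders with an unknown birth place are skipped rather than counted as outsiders, which mirrors how an inner join would drop them in SQL.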

What else can we achieve with our data using timbr?

We can get all relationships between two concepts, see their direction (is_inverse) and names (to use in our SQL query), and check the relationship type: transitive, many-to-many, or perhaps one-to-many:

Get all relationships between concepts person and place
A snippet of the relationships between concepts person and place
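For readers less familiar with these relationship types, here is a small sketch of what one-to-many, many-to-many, and an inverse relationship mean in terms of data. It infers the cardinality from example (source, target) pairs purely to illustrate the terms; in an ontology these properties are normally declared on the relationship rather than inferred. The function names and sample pairs are hypothetical.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify a relationship given as (source, target) pairs."""
    targets_per_source = defaultdict(set)
    sources_per_target = defaultdict(set)
    for source, target in pairs:
        targets_per_source[source].add(target)
        sources_per_target[target].add(source)
    source_has_many = any(len(t) > 1 for t in targets_per_source.values())
    target_has_many = any(len(s) > 1 for s in sources_per_target.values())
    if source_has_many and target_has_many:
        return "many-to-many"
    if source_has_many:
        return "one-to-many"
    if target_has_many:
        return "many-to-one"
    return "one-to-one"


def inverse(pairs):
    """Flip the direction of every pair, e.g. person -> place becomes place -> person."""
    return [(target, source) for source, target in pairs]


# Hypothetical person -> place pairs.
born_in = [("p1", "x"), ("p2", "x"), ("p3", "y")]
lived_in = [("p1", "x"), ("p1", "y"), ("p2", "x")]

print(cardinality(born_in))
print(cardinality(inverse(born_in)))
print(cardinality(lived_in))
```

Reading a relationship in its inverse direction swaps the two sides, so a many-to-one relationship from person to place becomes one-to-many from place to person.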

We can use timbr’s dereferenced schema (dtimbr) to perform graph traversals on our data:

dtimbr schema example
dtimbr schema result

We can also use timbr’s exhaustive schema (etimbr), which lets us query, in one go, all the concepts and properties that are derived from a selected concept:

etimbr schema example
etimbr schema result

In this case, we queried all the derived concepts of the concept work containing one of the following properties: release_date, imdb, isbn.
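The idea of querying a concept together with everything derived from it can be sketched in plain Python. The miniature hierarchy below is hypothetical (the concepts and their property assignments are illustrative, not timbr's or DBpedia's actual model). It collects every concept that inherits from work, resolves inherited properties, and keeps the concepts that carry at least one of release_date, imdb, or isbn.

```python
# Hypothetical miniature ontology: child concept -> parent concept.
PARENT = {
    "work": "thing",
    "person": "thing",
    "film": "work",
    "book": "work",
    "software": "work",
    "novel": "book",
}

# Properties declared directly on each concept (illustrative only).
OWN_PROPERTIES = {
    "thing": {"name"},
    "work": {"author"},
    "person": {"birth_date"},
    "film": {"imdb", "release_date"},
    "book": {"isbn"},
    "software": {"release_date"},
    "novel": set(),
}


def derived_concepts(root):
    """All concepts that inherit, directly or indirectly, from root."""
    found = []
    frontier = [root]
    while frontier:
        current = frontier.pop()
        children = [c for c, parent in PARENT.items() if parent == current]
        found.extend(children)
        frontier.extend(children)
    return sorted(found)


def all_properties(concept):
    """Own properties plus everything inherited from ancestor concepts."""
    props = set()
    while concept is not None:
        props |= OWN_PROPERTIES.get(concept, set())
        concept = PARENT.get(concept)  # None once we pass the root
    return props


wanted = {"release_date", "imdb", "isbn"}
matches = {
    concept: sorted(all_properties(concept) & wanted)
    for concept in derived_concepts("work")
    if all_properties(concept) & wanted
}
for concept, props in matches.items():
    print(concept, props)
```

Note that novel matches through the isbn property it inherits from book, which is the kind of result a query limited to a single table would miss.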

Concept work and its derived concepts in timbr’s Ontology Modeler

What does the future hold for timbr?
We are currently working on our new Business Language Interface, which will enable us to ask the same questions in natural language!

The timbr BLI platform

Conclusion

We now understand the DBpedia knowledge graph with its concepts and relationships a lot better. We also learned in a nutshell how to query connected data using timbr’s semantic SQL.
I hope that the experience of going through timbr’s new platform encourages you to learn more about your data and how it’s all connected.

Creativity and curiosity are the fuel, timbr is the car and insights are the destination.
