How we query Wikipedia’s Knowledge Graph (DBpedia) in SQL using timbr
Querying a knowledge graph with SQL may sound a bit strange, and some would say it is unnecessary. But most of our data is stored in SQL-fluent platforms, so a practical solution is to let people keep querying their data in SQL while maintaining the semantics, relationships, and reasoning of a knowledge graph.
Before jumping ahead, let’s take a brief look at the main solutions that exist today.
Let’s say we have a large company’s relational database, with tables representing companies, employees, customers, and so on, and we would like to find all the CEOs of US companies who were born outside the US.
Tackling this in standard SQL is possible, but it would most likely involve many JOIN and WHERE clauses.
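To make the JOIN-heavy approach concrete, here is a minimal sketch that runs the question against a toy relational schema using Python's built-in sqlite3 module. The tables, columns, and rows are all hypothetical and far simpler than a real company database, which is exactly why a real version of this query grows so much longer.

```python
import sqlite3

# Toy relational schema; every table, column and row here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, birth_country_id INTEGER);
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
CREATE TABLE employment (person_id INTEGER, company_id INTEGER, role TEXT);

INSERT INTO country VALUES (1, 'United States'), (2, 'India'), (3, 'Germany');
INSERT INTO person VALUES (1, 'Alice', 1), (2, 'Bala', 2), (3, 'Clara', 3);
INSERT INTO company VALUES (1, 'Acme', 1), (2, 'Globex', 1), (3, 'Initech', 3);
INSERT INTO employment VALUES (1, 1, 'CEO'), (2, 2, 'CEO'), (3, 3, 'CEO');
""")

# Four JOINs are needed just to connect a CEO to two different countries:
# the company's country and the person's birth country.
query = """
SELECT p.name AS ceo, c.name AS company, bc.name AS birth_country
FROM employment e
JOIN person  p  ON p.id  = e.person_id
JOIN company c  ON c.id  = e.company_id
JOIN country cc ON cc.id = c.country_id
JOIN country bc ON bc.id = p.birth_country_id
WHERE e.role = 'CEO'
  AND cc.name = 'United States'
  AND bc.name <> 'United States'
ORDER BY p.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Even with only four tables, every hop between entities costs a JOIN, and each extra condition adds another WHERE predicate.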
Therefore, the usual way to query knowledge graphs today is SPARQL. Unfortunately, this raises two new issues:
1. We would need to have a deep understanding of the structure of Resource Description Framework (RDF) based data models, because SPARQL allows us to handle these types of questions only once we’ve modeled our data as RDF.
2. We would need to learn how to query using SPARQL, which may look similar to SQL at a glance but is a different language.
To overcome these issues, we can use the timbr.ai platform, which employs standard SQL to implement the Semantic Web principles, enabling ontology modeling of data as connected, context-enriched concepts with inference and graph traversal capabilities.
One way to really leverage any SQL knowledge graph built on the timbr.ai platform is to combine it with DBpedia.
For those who are unfamiliar with DBpedia, here's an explanation from the official site of DBpedia:
DBpedia is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG) which is available for everyone on the Web.
Connecting DBpedia to the timbr platform allows us to:
✔ Query DBpedia in SQL as a Knowledge Graph
✔ Explore DBpedia’s ontology
✔ Integrate DBpedia with existing data-sources
In timbr-DBpedia, we can also find some interesting built-in dashboards, which allow us to explore DBpedia:
Looks nice? Let’s see some code examples:
Q: Which CEOs of US companies were born outside of the US?
This query took 54 lines of code in standard SQL and 7 lines in SPARQL, but only 4 lines in timbr-SQL.
In addition, timbr is fully integrated with Apache Spark and can act as middleware when using a notebook environment such as Jupyter or Zeppelin to query data:
So, how do we start? EASY.
Every query starts with a question. We then follow five simple steps to get the answer.
Q: Find administrative regions whose leader wasn’t born in that region
Step A: Find possible interesting concepts by looking at concept relationships
Step B: Get a graph preview of the concept administrativeregion using dtimbr
Step C: Get detailed column information about the concept administrativeregion
Step D: Learn about the different entity types that are derived from administrativeregion
Step E: Gain more insights by viewing concept relationship data
All done! Now we can build a brand new query according to what we’ve learned.
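As a rough illustration of the shape such a query takes, the sketch below answers the question on a tiny, made-up dataset using plain SQL through Python's sqlite3 module. It is not the timbr query itself: the table names, columns, and rows are hypothetical stand-ins for the administrativeregion concept and its leader relationship.

```python
import sqlite3

# Hypothetical stand-ins for the concepts explored in steps A to E.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE administrative_region (name TEXT PRIMARY KEY, leader TEXT);
CREATE TABLE person (name TEXT PRIMARY KEY, birth_region TEXT);

INSERT INTO administrative_region VALUES
  ('Region A', 'Leader One'),
  ('Region B', 'Leader Two'),
  ('Region C', 'Leader Three');
INSERT INTO person VALUES
  ('Leader One', 'Region A'),
  ('Leader Two', 'Region C'),
  ('Leader Three', NULL);
""")

# Keep regions whose leader has a known birth region that differs from
# the region they lead; leaders with an unknown birthplace are skipped.
region_query = """
SELECT r.name, r.leader, p.birth_region
FROM administrative_region r
JOIN person p ON p.name = r.leader
WHERE p.birth_region IS NOT NULL
  AND p.birth_region <> r.name
ORDER BY r.name
"""
outsiders = conn.execute(region_query).fetchall()
print(outsiders)
```

On real DBpedia data a birthplace is often a city rather than a region, so a production query would also need to check whether the birthplace lies inside the region.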
What else can we achieve with our data using timbr?
We can get all relationships between two concepts, understand their direction (is_inverse) and names (to use in our SQL query), as well as understand the relationship type, whether it be transitive, many-to-many or perhaps one-to-many:
We can use timbr’s dereferenced schema (dtimbr) to perform graph traversals on our data:
We can also use timbr’s exhaustive schema (etimbr), which lets us query all the concepts and properties derived from a selected concept:
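For readers who have not written a graph traversal before, the following sketch shows what one looks like in plain SQL, using a recursive common table expression in SQLite. This is not dtimbr syntax; it only illustrates the kind of multi-hop walk a traversal performs. The part_of table and its rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical edge table: each row says "child is part of parent".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part_of (child TEXT, parent TEXT);
INSERT INTO part_of VALUES
  ('Munich', 'Bavaria'),
  ('Bavaria', 'Germany'),
  ('Germany', 'Europe'),
  ('Lyon', 'France'),
  ('France', 'Europe');
""")

# Walk upward from a starting node, one hop per recursion step,
# recording how many hops away each ancestor is.
traversal = """
WITH RECURSIVE ancestors(name, depth) AS (
    SELECT parent, 1 FROM part_of WHERE child = ?
    UNION ALL
    SELECT p.parent, a.depth + 1
    FROM part_of p
    JOIN ancestors a ON p.child = a.name
)
SELECT name, depth FROM ancestors ORDER BY depth
"""
path = conn.execute(traversal, ("Munich",)).fetchall()
print(path)
```

Writing the recursion by hand like this is the boilerplate that a traversal-aware schema is meant to hide.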
In this case, we queried all the derived concepts of the concept work containing one of the following properties: release_date, imdb, isbn.
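The idea can be sketched in plain SQL on a toy ontology: first collect every concept that descends from work, then keep those carrying one of the listed properties. This is only an illustration of the logic, not etimbr syntax. The concept and concept_property tables, and the concepts film, book, and painting, are hypothetical.

```python
import sqlite3

# Hypothetical ontology: a concept hierarchy plus the properties
# declared on each concept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (name TEXT PRIMARY KEY, parent TEXT);
CREATE TABLE concept_property (concept TEXT, property TEXT);

INSERT INTO concept VALUES
  ('work', NULL), ('film', 'work'), ('book', 'work'),
  ('painting', 'work'), ('person', NULL);
INSERT INTO concept_property VALUES
  ('film', 'release_date'), ('film', 'imdb'),
  ('book', 'isbn'), ('painting', 'museum'),
  ('person', 'birth_date');
""")

# Collect all concepts derived from 'work' (at any depth), then keep
# only the ones that carry at least one of the properties of interest.
derived_query = """
WITH RECURSIVE derived(name) AS (
    SELECT name FROM concept WHERE parent = 'work'
    UNION ALL
    SELECT c.name FROM concept c JOIN derived d ON c.parent = d.name
)
SELECT DISTINCT d.name, cp.property
FROM derived d
JOIN concept_property cp ON cp.concept = d.name
WHERE cp.property IN ('release_date', 'imdb', 'isbn')
ORDER BY d.name, cp.property
"""
matches = conn.execute(derived_query).fetchall()
print(matches)
```

Here painting is derived from work but drops out because it has none of the three properties, and person is never considered because it is not derived from work.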
What does the future hold for timbr?
We are currently working on our new Business Language Interface, which will enable us to ask the same questions in natural language!
Conclusion
We now understand the DBpedia knowledge graph with its concepts and relationships a lot better. We also learned in a nutshell how to query connected data using timbr’s semantic SQL.
I hope that the experience of going through timbr’s new platform encourages you to learn more about your data and how it’s all connected.
Creativity and curiosity are the fuel, timbr is the car and insights are the destination.