In any kind of data analysis, we look at how data points relate to each other rather than at individual data points in isolation. The real value of data therefore lies in the relationships between data points.

Graph databases like Neo4j are designed to work with relationships. This makes it easy to investigate known relationships, but also to discover previously unknown relationships in your data.
Once the relationships in your data have been uncovered, it becomes much easier to make predictions about future relationships, e.g. to make product recommendations to customers based on what customers with similar profiles typically buy.

Let's have a look at what exactly a graph database like Neo4j is. When talking about 'databases', people usually think of relational databases first (for now?).

In a relational database, data is organized in tables. Each table consists of a set of predefined columns, each with a predefined and fixed data type. Each record in a table contains a field that uniquely identifies that record (the primary key). Relationships between tables are defined by including references to other tables' primary keys in a table (foreign keys). The relationship between tables or data points is never stored as such, but is calculated by creating a 'join' between primary and foreign keys every single time a query is executed. Since relationships have to be calculated every time a query runs, relational databases (contrary to what their name implies) are not very good at working with 'relationships'.
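To make the join mechanism concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns and the sample rows are assumptions made up for this illustration. The link between a customer and an order exists only as a foreign key value, and it is resolved by a join each time the query runs.

```python
import sqlite3

# In-memory database with two hypothetical tables, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,           -- primary key
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
        product     TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse'), (12, 2, 'keyboard');
""")


def products_bought_by(name):
    """Return the products ordered by a customer, resolved through a join."""
    rows = conn.execute(
        """
        SELECT o.product
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id  -- matched again on every call
        WHERE c.name = ?
        ORDER BY o.id
        """,
        (name,),
    ).fetchall()
    return [product for (product,) in rows]


print(products_bought_by("Alice"))
```

Every call to products_bought_by has to match customer_id against id again, because the connection between the two records is not stored anywhere as an object of its own.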

Graph databases have a number of similarities with relational databases, but differ conceptually:

  • nodes are the primary entities in a graph database. Nodes can be annotated with 'properties' and can be grouped by applying 'labels' to them. A node can be thought of as the schema-less equivalent of a record in a relational table, with labels loosely playing the role of the table.
  • relationships specify the relation between two given nodes. Relationships can have a direction and, just like nodes, can contain properties. Just like nodes, relationships are stored in the database.
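The two concepts above can be sketched as a toy in-memory model in Python. This is an illustration only, not how Neo4j stores data internally; the Node and Relationship classes, the relate and neighbours helpers and the sample data are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # free-form, no fixed schema
    outgoing: list = field(default_factory=list, repr=False)
    incoming: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Relationship:
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def relate(start, rel_type, end, **properties):
    """Create a directed relationship and store it on both of its nodes."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel


def neighbours(node, rel_type):
    """Follow stored outgoing relationships of one type; no join is needed."""
    return [rel.end for rel in node.outgoing if rel.type == rel_type]


alice = Node({"Person", "Customer"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})
laptop = Node({"Product"}, {"name": "laptop", "price": 999})

relate(alice, "KNOWS", bob, since=2015)
relate(alice, "BOUGHT", laptop, quantity=1)

print([n.properties["name"] for n in neighbours(alice, "KNOWS")])
```

Because each relationship is stored as an object attached to its start and end node, finding a node's neighbours is a matter of following references rather than matching keys across tables.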


Storing nodes together with their relationships in the database, without having to recalculate the relationships (or joins) for every single query, opens up a whole series of use cases that would be very hard or impossible to implement with relational databases:

  • fraud detection: by monitoring relationships in real time, 'fraud rings' or other scams can be detected before they cause lasting damage
  • network and IT infrastructure monitoring: graphs are inherently more suitable than relational databases for storing and analysing complex interdependencies in networks and IT infrastructure
  • social network analysis: analysing relations within a social network, detecting communities and inferring or recommending new relations is easy when all existing relationships can be queried
  • recommendation engines: existing relationships can be used to predict new relationships through recommendation algorithms
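As a rough illustration of the recommendation use case, the sketch below scores products by how strongly the customers who bought them overlap with a given customer. The purchases data and the recommend function are assumptions invented for this example, and simple overlap counting stands in for whatever recommendation algorithm a real system would use.

```python
from collections import Counter

# Hypothetical BOUGHT relationships: customer -> set of products.
purchases = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "keyboard"},
    "carol": {"laptop", "monitor", "keyboard"},
    "dave": {"phone"},
}


def recommend(customer, purchases, limit=3):
    """Suggest products bought by customers with overlapping purchases."""
    own = purchases[customer]
    scores = Counter()
    for other, bought in purchases.items():
        if other == customer:
            continue
        overlap = len(own & bought)       # shared products = similarity
        if overlap == 0:
            continue                      # no relationship path to this customer
        for product in bought - own:      # only products not yet bought
            scores[product] += overlap
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:limit]]


print(recommend("alice", purchases))
```

In graph terms, each suggestion corresponds to a path customer → product → other customer → new product, which is the kind of traversal a graph database can follow directly over stored relationships.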

Although graph theory has been around for centuries, it took until recently for graph databases to become popular. Increased access to cheap and powerful computing resources, together with the development of graph databases, mainly by market leader Neo4j, has created a huge increase in demand for graph databases.



We discuss Neo4j and graph databases in general in much more detail on our Graph page.