With the basic understanding of what a graph is, let’s have a look at how this translates to graph databases.
We’ll look at a number of graph implementations in some detail, but remember that this is not an exhaustive list: there are other implementations, and some databases have graph additions bolted on to their relational engine. The bottom line, however, is that graph database popularity is skyrocketing.
Native graph databases are designed from the ground up to work with graphs and, for graph workloads such as deep traversals, generally outperform graph databases built on other storage types.
In a labeled property graph, data is organized as nodes and relationships, both of which can contain properties (key-value pairs).
Nodes can be tagged with a number (0 or more) of labels to represent different roles in a graph or business domain.
Relationships provide directed (see “Graph Theory”), named and semantically relevant connections between two nodes. Just like nodes, relationships can have properties, which can add weight or cost to a relationship. When, for example, you’re trying to find the shortest route between two nodes, it may be more efficient to follow a path that leads through three low-cost (e.g. short distance) relationships instead of one costly (e.g. long distance) relationship. Although relationships are created with a direction, this direction can be ignored when traversing the graph.
Examples of labeled property graphs are Neo4j, AWS Neptune, ...
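To make the cost example concrete, here is a minimal Python sketch of a weighted shortest-path search (Dijkstra's algorithm) over relationships that carry a cost property. The node names, the `ROAD` relationship type and the `distance` property are made up for illustration, and this is not how any particular graph database implements traversal internally.

```python
import heapq
from collections import defaultdict

# Hypothetical toy data: (start node, relationship type, end node, properties).
relationships = [
    ("A", "ROAD", "D", {"distance": 10}),  # one costly relationship
    ("A", "ROAD", "B", {"distance": 2}),   # three low-cost relationships
    ("B", "ROAD", "C", {"distance": 3}),
    ("C", "ROAD", "D", {"distance": 2}),
]

def cheapest_path(rels, source, target, cost_key="distance", directed=False):
    """Return (total cost, path) for the cheapest route, or None if unreachable."""
    adjacency = defaultdict(list)
    for start, _rel_type, end, props in rels:
        adjacency[start].append((end, props[cost_key]))
        if not directed:
            # Relationships are stored with a direction, but it can be ignored.
            adjacency[end].append((start, props[cost_key]))

    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in adjacency[node]:
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None

print(cheapest_path(relationships, "A", "D"))
```

In this toy data, the route through three cheap relationships (total cost 7) wins over the single direct relationship (cost 10). Passing `directed=True` makes the search respect the stored direction instead.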
A relationship in a hypergraph can connect to any number of nodes. This model is especially useful for data that contains a large number of many-to-many relationships. Hypergraphs can always be represented as labeled property graphs, but the reverse is not always the case.
An example of a hypergraph database is HyperGraphDB.
A triple store, or RDF (Resource Description Framework) store, stores data as triples. A triple (e.g. “Bob is 35”, “Bob knows John”) consists of a subject, a predicate and an object.
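One common way to express a hypergraph as a labeled property graph is to turn every hyper-relationship into a node of its own and connect each participant to it. The Python sketch below illustrates that idea only; the `CO_AUTHORED` data and the `PARTICIPATES_IN` relationship name are hypothetical.

```python
# Hypothetical hyperedge: one relationship that connects three nodes at once.
hyperedges = [
    {
        "type": "CO_AUTHORED",
        "members": ["Alice", "Bob", "Carol"],
        "properties": {"year": 2021},
    },
]

def to_property_graph(hyperedges):
    """Turn each hyperedge into a node plus one binary relationship per member."""
    nodes = {}
    relationships = []
    for index, edge in enumerate(hyperedges):
        edge_id = f"{edge['type']}_{index}"
        # The hyperedge becomes a labeled node that keeps its properties.
        nodes[edge_id] = {
            "labels": [edge["type"]],
            "properties": dict(edge["properties"]),
        }
        for member in edge["members"]:
            nodes.setdefault(member, {"labels": [], "properties": {}})
            relationships.append((member, "PARTICIPATES_IN", edge_id))
    return nodes, relationships

nodes, relationships = to_property_graph(hyperedges)
for relationship in relationships:
    print(relationship)
```

The price of this conversion is an extra node and an extra hop per hyper-relationship, which is worth keeping in mind when modeling data with many many-to-many relationships.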
A triple can be compared to a node in a labeled property graph. Relationships in a triple store are defined as ‘Arcs’, with a triple as the subject (start node), a triple for the object (end node) and an arc or type of relationship for the predicate.
Since arcs create logically linked triples or nodes, triple stores are considered graph databases. However, since their architecture is oriented towards individual triples, they are not as well suited for fast graph traversal as native graph databases, especially property graphs.
Examples of triple stores are AWS Neptune, AllegroGraph and Stardog.
Relational databases store data in tables. These tables consist of a highly structured, predefined set of columns with strict data types.
Relationships are defined as a combination of columns that serve as row identifiers in one table (primary keys) and references to similar row identifiers in other tables (foreign keys). Relationships in relational databases are not stored in the database, but built at runtime (through JOIN statements in queries). Because relationships do not exist as database objects, they can’t contain any additional meta-information. Creating relationships at runtime is expensive, which makes it hard to work with highly connected data.
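As a small illustration of relationships being built at query time, the Python sketch below uses the standard library's `sqlite3` module with a hypothetical `person` and `knows` schema. Finding friends of friends already takes three JOINs, and every extra hop adds more.

```python
import sqlite3

# In-memory database with a hypothetical schema: people and who they know.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE knows (
        person_id INTEGER NOT NULL REFERENCES person(id),
        friend_id INTEGER NOT NULL REFERENCES person(id)
    );
    INSERT INTO person VALUES (1, 'Bob'), (2, 'John'), (3, 'Mary');
    INSERT INTO knows VALUES (1, 2), (2, 3);
""")

# The "friend of a friend" relationship is not stored anywhere:
# it is reconstructed through JOINs each time the query runs.
friends_of_friends = conn.execute("""
    SELECT p3.name
    FROM person p1
    JOIN knows k1 ON k1.person_id = p1.id
    JOIN knows k2 ON k2.person_id = k1.friend_id
    JOIN person p3 ON p3.id = k2.friend_id
    WHERE p1.name = 'Bob'
""").fetchall()

print(friends_of_friends)
```

Note also that the `knows` table has nowhere natural to put information about the relationship itself unless more columns are added to the join table.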
In short, the limitations of relational databases in highly connected use cases are:
These shortcomings of relational databases are exactly where graph databases, and Neo4j in particular, really shine:
Graph databases represented a $1 billion market in 2019, and the market is projected to grow to almost $3 billion by 2024.
Below is an overview of the 5 largest (pure) graph databases according to DB Engines. The DB Engines score is calculated based on number of mentions on search engines, Google Trends, number of questions on StackOverflow, relevance in job offerings and more.
There are many other databases that offer graph functionality (Azure Cosmos DB, AWS Neptune). Since these are not native graph databases (built from the ground up to work with graphs), they are not included here.
Neo4j is the market leader in the graph market. The platform provides native graph storage and processing, an extensive library of graph algorithms, clustered deployments and much more.
At the Nodes convention in June 2021, Neo4j demonstrated a 1000+ node cluster, querying over one trillion (!!!) relationships with response times of milliseconds. At the same Nodes convention, Neo4j announced a next round of investment, worth 325 million USD, the biggest investment in database technology ever.
This will fuel Neo4j's growth, and will establish the company and the platform as the dominant graph platform for the coming decade(s).
Disclaimer: know.bi is a Neo4j partner and a core contributor to Apache Hop, which has full integration to load data to and extract data from Neo4j.
A graph database optimized for distributed clusters that runs on top of distributed NoSQL/key-value storage engines like Cassandra, HBase or Google BigTable.
Dgraph is a horizontally scalable transactional graph database with fast arbitrary-depth joins using a GraphQL-like query language.
Apache Giraph is an iterative graph processing system built for high scalability.
TigerGraph is a complete, distributed, parallel graph computing platform supporting web-scale data analytics in real time.
Not included in the DB Engines top 5 ranking, but worth an honorable mention, is AWS Neptune, which is a hybrid RDF/property graph database. In this case, “hybrid” only means that one of two options can be chosen when a database is created: there is no hybrid functionality, nor is there an option to switch from RDF to property graph or vice versa after creation.
Modern social networks allow people to communicate and share information with other individuals in large networks. These networks can range from intense interactions with a number of close friends to being part of a larger network (or “community”).
Social network analysis (SNA) is focused on these relationships. It tries to find the way in which individuals’ interactions with others influence their behavior or decisions.
Social networks tend to get very complex, consisting of thousands of individuals and millions of interactions (relations) between them. Analyzing these amounts of data requires building a model that simplifies the social network while remaining representative.
Whether you’re looking at credit card fraud, ecommerce, insurance or other types of fraud, fraudulent behavior is becoming increasingly complex.
In a graph database like Neo4j, transactions are stored as a graph where related pieces of data are connected, which makes it easy to traverse those relationships in real time and to find the fraudulent patterns quickly.
A lot of geographical or navigational problems work perfectly with graphs. Finding the best route from point A to point B is a matter of finding the shortest paths through a network (graph) of points along the way. Similarly, finding locations nearby is a matter of finding all points within a total distance of point A.
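The "locations nearby" case can be sketched as a shortest-path search that simply stops expanding once the total distance exceeds a limit. The Python example below assumes a small, hypothetical road network with distances in kilometres.

```python
import heapq

# Hypothetical road network: node -> list of (neighbour, distance in km).
roads = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("A", 1.0), ("C", 2.0), ("D", 5.0)],
    "C": [("A", 4.0), ("B", 2.0)],
    "D": [("B", 5.0)],
}

def within_distance(graph, source, max_distance):
    """Return {node: shortest total distance} for nodes within max_distance."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        distance, node = heapq.heappop(queue)
        if distance > best[node]:
            continue  # a shorter route to this node was already found
        for neighbour, weight in graph.get(node, []):
            candidate = distance + weight
            # Only keep routes that stay within the limit and improve on the best so far.
            if candidate <= max_distance and candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    best.pop(source)
    return best

nearby = within_distance(roads, "A", 3.0)
print(nearby)
```

Here C is reported at a total distance of 3.0 (via B), even though the direct road from A to C is 4.0 km long, which is exactly the "total distance" idea described above.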
Real-time recommendation engines are key to the success of any online business. To make relevant recommendations in real time requires the ability to correlate product, customer, inventory, supplier, logistics and even social sentiment data. Moreover, a real-time recommendation engine requires the ability to instantly capture any new interests shown in the customer’s current visit – something that batch processing can’t accomplish. Matching historical and session data is trivial for a graph database.
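A heavily simplified Python sketch of this idea: products are scored by walking from a customer to other customers with overlapping baskets, and the current session's views are merged in before scoring. The purchase data and the scoring rule (overlap count) are assumptions made for illustration, not a production recommendation algorithm.

```python
from collections import Counter

# Hypothetical purchase history: customer -> set of purchased products.
purchases = {
    "ann": {"laptop", "mouse"},
    "ben": {"laptop", "mouse", "dock"},
    "cat": {"laptop", "dock", "bag"},
    "dan": {"phone"},
}

def recommend(purchases, customer, session_views=frozenset(), limit=3):
    """Suggest products bought by customers whose baskets overlap with this one."""
    # Historical purchases and interests from the current visit are combined.
    known = purchases.get(customer, set()) | set(session_views)
    scores = Counter()
    for other, products in purchases.items():
        if other == customer:
            continue
        overlap = len(known & products)
        if overlap == 0:
            continue  # no shared products, so no path to follow
        for product in products - known:
            scores[product] += overlap
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _score in ranked[:limit]]

print(recommend(purchases, "ann"))
print(recommend(purchases, "ann", session_views={"dock"}))
```

The second call shows how an interest captured during the current visit changes the result immediately, without waiting for a batch job.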
Graph databases easily outperform relational and other NoSQL data stores for connecting masses of buyer and product data (and connected data in general) to gain insight into customer needs and product trends.
Traditionally, information about (groups of) people and resources (files, devices, products, legal documents, …) and access related to these people and resources has been stored in directory services (e.g. Active Directory). As these hierarchical systems start to grow, they start to struggle with a number of challenges:
With a graph engine that can traverse these complex networks of relationships in milliseconds, analyzing access to resources, identifying duplicate roles, etc. becomes trivial.
Consistent operational data needs to be managed across the entire organization. This requires a central area where master data about customers, products, processes and more is managed.
Not only do you need to make sense of all this distributed data that lives in various systems, formats, locations and levels of quality; even more important is a good comprehension of the relationships between all of this data.
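As an illustration of analyzing access through nested groups, the Python sketch below follows membership relationships transitively and collects every resource reachable along the way. The users, groups, resources and the `MEMBER_OF` / `CAN_ACCESS` relationship names are all hypothetical.

```python
from collections import deque

# Hypothetical edges: (source, relationship type, target).
edges = [
    ("alice", "MEMBER_OF", "developers"),
    ("developers", "MEMBER_OF", "engineering"),
    ("engineering", "CAN_ACCESS", "source-repo"),
    ("developers", "CAN_ACCESS", "build-server"),
    ("bob", "MEMBER_OF", "sales"),
    ("sales", "CAN_ACCESS", "crm"),
]

def accessible_resources(edges, user):
    """Collect resources reachable through any chain of group memberships."""
    seen = {user}
    queue = deque([user])
    resources = set()
    while queue:
        current = queue.popleft()
        for source, rel_type, target in edges:
            if source != current:
                continue
            if rel_type == "CAN_ACCESS":
                resources.add(target)
            elif rel_type == "MEMBER_OF" and target not in seen:
                seen.add(target)  # guards against cyclic group nesting
                queue.append(target)
    return resources

print(sorted(accessible_resources(edges, "alice")))
```

The sketch scans a flat edge list for simplicity. A graph database would follow stored relationships directly instead of scanning.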
Keeping an overview of how various infrastructure components and systems are interconnected in today’s large and complex organizations is a daunting task in the relational world.
Since all (virtual) hardware infrastructure, software servers, applications, data flows and user actions are connected, they already are a real-world graph that only needs to be persisted in a graph database.
A number of query results could then be:
Network and IT infrastructure easily become complex to manage, which sooner rather than later requires a configuration management database (CMDB). Where relational databases quickly start to struggle in managing the large number of interconnected systems, this is another use case that graph databases excel at.
A graph database enables you to keep track of your entire infrastructure. It also makes it easy to connect to your many monitoring tools and gain critical insights into the complex relationships between different network or data center operations. From dependency management to automated microservice monitoring, the uses for graphs in network and IT operations are endless.
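A typical dependency management question is "what is affected if this component fails?". The Python sketch below answers it by walking dependency relationships in reverse. The component names are hypothetical, and the dictionary stands in for what would be `DEPENDS_ON`-style relationships in a graph.

```python
# Hypothetical infrastructure: component -> components it depends on.
depends_on = {
    "webshop": ["api"],
    "api": ["database", "cache"],
    "reporting": ["database"],
    "database": ["vm-01"],
    "cache": ["vm-02"],
}

def impacted_by(depends_on, failed):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the relationships: component -> components that depend on it.
    dependants = {}
    for component, dependencies in depends_on.items():
        for dependency in dependencies:
            dependants.setdefault(dependency, []).append(component)

    impacted = set()
    stack = [failed]
    while stack:
        current = stack.pop()
        for dependant in dependants.get(current, []):
            if dependant not in impacted:
                impacted.add(dependant)
                stack.append(dependant)
    return impacted

print(sorted(impacted_by(depends_on, "vm-01")))
```

In a relational CMDB the same question needs a recursive query or one JOIN per dependency level. In a graph it is a single variable-length traversal.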
Most people who have been involved in the design or modeling of a relational database schema consider the process to be hard. A number of notable reasons that feed this perception are
Compared to relational modeling, graph modeling is almost a walk in the park:
SPARQL (pronounced “Sparkle”) is a recursive acronym for “SPARQL Protocol and RDF Query Language”. It is an RDF query language (a semantic query language for databases) that is able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
SPARQL queries come in 4 different forms: SELECT, CONSTRUCT, ASK and DESCRIBE.
Example (SELECT) query:
This query joins together all of the triples with a matching subject, where the type predicate, "a", is a person (foaf:Person), and the person has one or more names (foaf:name) and mailboxes (foaf:mbox).
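The join described above can be mimicked in a few lines of plain Python. This is not SPARQL and does not use an RDF library; it is only a sketch of the matching logic over a hypothetical list of (subject, predicate, object) triples, with made-up `ex:` identifiers.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # the predicate "a"

# Hypothetical triples: (subject, predicate, object).
triples = [
    ("ex:bob", RDF_TYPE, FOAF + "Person"),
    ("ex:bob", FOAF + "name", "Bob"),
    ("ex:bob", FOAF + "mbox", "mailto:bob@example.org"),
    ("ex:john", RDF_TYPE, FOAF + "Person"),
    ("ex:john", FOAF + "name", "John"),  # no mailbox, so no result row
    ("ex:acme", FOAF + "name", "Acme"),  # has a name, but is not a foaf:Person
]

def names_and_mailboxes(triples):
    """Join triples on a shared subject: typed as Person, with name and mbox."""
    people = {s for s, p, o in triples if p == RDF_TYPE and o == FOAF + "Person"}
    rows = []
    for person in sorted(people):
        names = [o for s, p, o in triples if s == person and p == FOAF + "name"]
        mailboxes = [o for s, p, o in triples if s == person and p == FOAF + "mbox"]
        # One row per (name, mailbox) combination, as a SELECT would return.
        rows.extend((name, mbox) for name in names for mbox in mailboxes)
    return rows

print(names_and_mailboxes(triples))
```

Only subjects that satisfy all three patterns produce a row, which is why John (no mailbox) and Acme (not a person) are left out.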
Cypher is Neo4j’s graph query language that allows users to store and retrieve data from the graph database. It is a declarative, SQL-inspired language for describing visual patterns in graphs using ASCII-Art syntax.
Similar to other query languages, Cypher contains a variety of keywords for specifying patterns, filtering patterns, and returning results. Among the most common are: MATCH, WHERE, and RETURN. These operate slightly differently than the SELECT and WHERE in SQL; however, they have similar purposes.
The most important keywords to mention are:
This query matches:
The core Cypher language can be extended through plugins. Two notable plugins that add hundreds of procedures and functions are
Standardization is required to avoid fragmentation in the increasingly popular world of labeled property graphs, which resulted in the creation of GQL (Graph Query Language). In June of 2019, the ISO/IEC’s Joint Technical Committee 1 (responsible for IT standards) started the voting process for GQL. With GQL as the ISO/IEC’s first new database query language standard since SQL, this is quite something!
GQL is created by combining the strengths of 3 graph query languages:
By combining the strengths of the leading graph query languages in the industry into what is intended to become the ‘SQL for graphs’, the ISO/IEC intends to prevent fragmentation and move the entire graph space forward.
Let’s have a closer look at the data loading options in market leader Neo4j:
This is another area where Neo4j is quite far ahead of the competition. A number of algorithm types that are supported through the algo-library are: