Creating Neo4J nodes and relationships with PDI

PDI Neo4J Output Step

Update 2017-09: a new version of this plugin was released. Find out more!

While relational databases have proven their value for storing application data with a fixed, predictable structure (tables and columns) and a limited number of joins, they struggle when the data isn't predictable or when a large number of joins is required. Graph databases provide an alternative for these use cases, since they don't require a strict, predictable data structure and treat relationships as first-class citizens.
This sdtimes article provides a more detailed comparison of relational versus graph databases. 
If you're eager to find out more about Graph Databases in general and Neo4J in particular, have a look at the (free) ebooks on the Neo4J website. 

The Neo4J Community and (trial) Enterprise Editions are available for download here. Follow the (very straightforward) installation instructions for your platform on the download page and you should be good to go.

As discussed in a previous post, standard PDI functionality allows you to load data into Neo4J. Because that process is rather complex, we have developed a PDI output step to ease the pain of creating Neo4J nodes and relationships.

To get started, download the plugin from our github page and unzip it into your PDI installation folder.
The plugin extracts itself to the plugins/Neo4JOutput folder of your PDI installation, and a couple of samples are extracted to your samples folder.

After Spoon is (re)started, the Neo4J Output step is available from the 'Output' category. 

The step takes Neo4J server connection parameters for host, port, username and password.
Use the 'Test' button to verify your settings. 

With the server parameters in place, nodes and relationships can be created through the corresponding tabs. 
For this example, we'll be loading nodes and relationships from the 'sales_data.csv' file that comes with your PDI installation. 
To create the nodes, we'll build distinct lists of geography levels (territory, country, city), customers, products and product lines, and orders and order lines.
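To make the "distinct lists" idea concrete outside of Spoon, here is a minimal Python sketch of the same de-duplication. It is not what the plugin does internally; in PDI you would typically build these streams with steps such as Sort rows and Unique rows. The sample rows are invented for illustration, and the column names are assumptions that should be checked against your own copy of 'sales_data.csv'.

```python
import csv
import io

# Invented sample rows; the column names are assumed, not taken from the post.
SAMPLE = """ORDERNUMBER,ORDERLINENUMBER,CUSTOMERNAME,CITY,COUNTRY,TERRITORY
1001,1,Customer A,City X,Country P,T1
1001,2,Customer A,City X,Country P,T1
1002,1,Customer B,City Y,Country Q,T2
"""


def distinct(rows, columns):
    """Return the unique combinations of `columns`, in first-seen order."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in columns)
        # Keep only the first occurrence of each combination.
        seen.setdefault(key, dict(zip(columns, key)))
    return list(seen.values())


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# One distinct list per node type, mirroring one PDI stream per node type.
countries = distinct(rows, ["COUNTRY"])
customers = distinct(rows, ["CUSTOMERNAME"])
orders = distinct(rows, ["ORDERNUMBER"])
order_lines = distinct(rows, ["ORDERNUMBER", "ORDERLINENUMBER"])
```

Each resulting list holds one entry per future node, so loading it cannot create the same node twice.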

For example, to create nodes for the order lines, we'll add a unique key along with a label and some details about the order line as properties: 
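As a rough illustration of what "key, label and properties" amounts to on the Neo4J side, the Python sketch below builds a parameterised Cypher MERGE statement for a single node. The statement the step actually sends may well differ. The label `OrderLine`, the property names and the values are hypothetical, and the `$name` parameter syntax assumes a reasonably recent Neo4J version.

```python
def merge_node(label, key, properties):
    """Build a parameterised Cypher MERGE for one node.

    `key` is the name of the property that uniquely identifies the node;
    all other entries in `properties` are set as plain node properties.
    """
    query = f"MERGE (n:{label} {{{key}: ${key}}})"
    assignments = ", ".join(
        f"n.{name} = ${name}" for name in properties if name != key
    )
    if assignments:
        query += f" SET {assignments}"
    return query, dict(properties)


# Hypothetical order line: label, key and property names are assumptions.
query, params = merge_node(
    "OrderLine",
    "id",
    {"id": "1001-2", "quantity": 30, "price": 95.7},
)
print(query)
```

Using MERGE on the unique key rather than CREATE means that re-running the load updates existing nodes instead of duplicating them.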

Now that the nodes are in place, we're ready to link them through relationships, again by creating separate streams for distinct combinations of nodes (e.g. orders and their order lines, customers and the orders they created, etc.).

The relationships are created by specifying the 'from' and 'to' nodes, a relationship type and optional relationship properties. 
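The four ingredients named above ('from' node, 'to' node, relationship type, optional properties) map fairly directly onto a Cypher pattern. The Python sketch below shows one way such a statement could be assembled; it is an illustration under assumptions, not the plugin's actual output. The labels, the `HAS_LINE` type, and the parameter names `from_value` and `to_value` are all hypothetical.

```python
def merge_relationship(from_label, from_key, to_label, to_key,
                       rel_type, rel_properties=()):
    """Build a Cypher statement that links two existing nodes.

    Both nodes are looked up by their key property ($from_value and
    $to_value); each relationship property is passed as a parameter
    with the same name as the property.
    """
    query = (
        f"MATCH (a:{from_label} {{{from_key}: $from_value}}), "
        f"(b:{to_label} {{{to_key}: $to_value}}) "
        f"MERGE (a)-[r:{rel_type}]->(b)"
    )
    if rel_properties:
        query += " SET " + ", ".join(f"r.{p} = ${p}" for p in rel_properties)
    return query


# Hypothetical example: an order that contains an order line.
rel_query = merge_relationship(
    "Order", "id", "OrderLine", "id", "HAS_LINE", ["position"]
)
print(rel_query)
```

Because the statement starts with MATCH, it only links nodes that already exist, which is why the nodes are loaded before the relationships.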

Once the nodes and relationships have been loaded, the graph can be queried from the Neo4J Browser (default http://HOSTNAME:7474).

The query 'match(n) return n;' returns every node in the database; the Browser then also displays the relationships between those nodes. More complex Cypher queries can unlock more detailed insights into your graph, but if you're just getting started, double-clicking a node will expand its related nodes and relationships.

If you'd like to take the Neo4J Output step for a spin, download the plugin and the transformations used for this post, and please let us know if you find any issues. 

