Load data to Neo4J
Whether you’re a Neo4J rock star or are just getting your feet wet, the biggest problem you’re probably facing is getting your data into Neo4J as quickly and easily as possible. Of course, you can create and import CSV files, but that process quickly becomes tedious and time-consuming. We think there is a better way.
As huge fans of both Kettle and Neo4J, we decided to bring the two together, and we are proud to announce a new version of our Neo4J plugin.
This plugin lets you prepare and load your nodes and relationships, with their labels and properties, all from a visual development environment.
To install the plugin from within Spoon, go to the Pentaho Marketplace through Tools → Marketplace.
In the marketplace, search for ‘neo4j’ and click the ‘install’ button next to the ‘Neo4J Output’ plugin.
The plugin will download and install, after which you'll need to restart Spoon. Once Spoon is back up, you’ll find a new ‘Neo4J Output’ step in the 'Output' category.
A sample graph, cheers!
Graphs are everywhere, and so is Belgian beer, so we didn't have to look hard for a sample graph to create through the plugin: we’ve recreated Neo4J rock star Rik Van Bruggen’s Beer Graph demo with the Neo4J Output step to show you how easy creating nodes and relationships can be. Get the ETL to create this sample graph here.
The sample job (jb_beer_graph.kjb) consists of two transformations:
- tr_beer_nodes.ktr: create nodes
- tr_beer_relationships.ktr: create relationships
The beer graph is created in two separate transformations (first nodes, then relationships) to ensure that no duplicate nodes are created because of transaction overlaps in the transformation that creates the relationships. An alternative approach could have been to create each of the three relationships in its own transformation. As always, there are many ways to skin a cat, so YMMV.
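The nodes-first ordering can be illustrated outside Kettle. The Python sketch below is not what the plugin generates; it only builds Cypher text in two phases to show why the second phase never needs to create a node. The `Brewery` label, the `BREWS` relationship type, the sample rows and both function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, Dict[str, str]]

# Illustrative input rows (assumed sample data, not taken from the demo files).
ROWS: List[Dict[str, str]] = [
    {"beer": "Orval", "brewery": "Brasserie d'Orval"},
    {"beer": "Petite Orval", "brewery": "Brasserie d'Orval"},
]


def node_statements(rows: List[Dict[str, str]]) -> List[Statement]:
    """Phase 1: emit one MERGE per distinct (label, name) pair."""
    seen = set()
    statements: List[Statement] = []
    for row in rows:
        for label, name in (("BeerBrand", row["beer"]), ("Brewery", row["brewery"])):
            if (label, name) not in seen:
                seen.add((label, name))
                statements.append(
                    (f"MERGE (n:{label} {{name: $name}})", {"name": name})
                )
    return statements


def relationship_statements(rows: List[Dict[str, str]]) -> List[Statement]:
    """Phase 2: MATCH nodes that already exist, then MERGE the relationship."""
    query = (
        "MATCH (b:Brewery {name: $brewery}), (r:BeerBrand {name: $beer}) "
        "MERGE (b)-[:BREWS]->(r)"
    )
    return [(query, {"brewery": row["brewery"], "beer": row["beer"]}) for row in rows]


if __name__ == "__main__":
    for query, params in node_statements(ROWS) + relationship_statements(ROWS):
        print(query, params)
```

Because the second phase only matches nodes and merges relationships, running it after the first phase has finished cannot add a second copy of a brewery or a beer.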
You'll need to add key/value pairs to your kettle.properties file, for example:
# the BOLT protocol port (default 7687), not the browser port (default 7474)
NEO4J_PORT=7687
After running the job, the graph can be queried from the Neo4J browser (e.g. http://localhost:7474) with the query below, which reads as "give me all nodes that have the label 'BeerBrand' and a 'name' property of 'Orval'":
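A minimal sketch of such a query, reconstructed from the description rather than copied from the original demo, is built below by a small Python helper. The helper name `build_node_lookup` is hypothetical. Cypher does not accept labels as parameters, so the label is validated and interpolated while the name is passed as a parameter.

```python
import re
from typing import Dict, Tuple


def build_node_lookup(label: str, name: str) -> Tuple[str, Dict[str, str]]:
    """Return Cypher text and parameters that match nodes by label and name."""
    # Labels cannot be Cypher parameters, so only plain identifiers are allowed.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", label):
        raise ValueError(f"unsafe label: {label!r}")
    query = f"MATCH (n:{label} {{name: $name}}) RETURN n"
    return query, {"name": name}


if __name__ == "__main__":
    query, params = build_node_lookup("BeerBrand", "Orval")
    print(query)
    print(params)
```

Typed directly into the Neo4J browser with a literal value, the equivalent query would be MATCH (n:BeerBrand {name: 'Orval'}) RETURN n.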
This query will return the node for the delicious Orval beer. By double-clicking on the node, you'll find its brewery, beer type and alcohol percentage, which will look very similar to the graph below:
Creating nodes and relationships
Using the plugin is straightforward. First, set the connection properties and verify that the connection works. Then, select the label, relationship and property fields for nodes and relationships. Properties for nodes and relationships use the field name as the property name by default, but this can be overridden by manually entering a value in the 'Property Name' field.
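The naming rule for properties can be summarised in a few lines of Python. This is a sketch of the behaviour described above, not the plugin's code; the function name and the sample field names are hypothetical.

```python
from typing import Dict, Optional


def resolve_property_names(fields: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map each input field to the property name it would be written under.

    `fields` maps a field name to the value typed in 'Property Name',
    or to None (or blank text) when nothing was entered.
    """
    resolved: Dict[str, str] = {}
    for field, override in fields.items():
        cleaned = (override or "").strip()
        # Fall back to the field name when no override was entered.
        resolved[field] = cleaned if cleaned else field
    return resolved


if __name__ == "__main__":
    print(resolve_property_names({"name": None, "alcohol_pct": "alcoholPercentage"}))
```

Here the `name` field keeps its own name as the property name, while `alcohol_pct` is written as `alcoholPercentage`.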