Creating graphs made fun with Pentaho Kettle

Easily load data to Neo4J

Whether you’re a Neo4J rock star or are just getting your feet wet, the biggest problem you’re probably facing is getting your data into Neo4J as quickly and easily as possible. Of course you can create and import CSV files, but that process quickly becomes tedious and time-consuming. We think there is a better way.

As huge fans of both Kettle and Neo4J, we decided to bring the two together, and are proud to present the new version of our Neo4J plugin.
This plugin allows you to do the data preparation and loading of your nodes and relationships, with their labels and properties, all from a visual development environment. 

We’ll assume you’re familiar with getting both Neo4J and Kettle up and running. If you’re not, check here to find out how to get started with Kettle, and here to get started with Neo4J.

To install the plugin from within Spoon, go to the Pentaho Marketplace through Tools → Marketplace. 

In the marketplace, search for ‘neo4j’ and click the ‘install’ button next to the ‘Neo4J Output’ plugin. 

The plugin will download and install, after which you'll need to restart Spoon. Once Spoon is back up, you’ll find a new ‘Neo4J Output’ step in the 'Output' category.

A sample graph, cheers! 

Graphs are everywhere, and so is Belgian beer, so we didn't have to look hard for a sample graph to create through the plugin. We’ve recreated Neo4J rock star Rik Van Bruggen’s Beer Graph demo with the Neo4J Output step to show you how easy creating nodes and relationships can be. Get the ETL to create this sample graph here.

The sample job (jb_beer_graph.kjb) consists of two transformations:

  • tr_beer_nodes.ktr: create nodes
  • tr_beer_relationships.ktr: create relationships

The beer graph is created in two separate transformations (first nodes, then relationships) so that transaction overlaps in the transformation that creates the relationships cannot produce duplicate nodes. An alternative approach would have been to create each of the three relationships in its own transformation. As always, there are many ways to skin a cat, so YMMV.
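To see why the split helps, here is a rough Cypher sketch of what the two phases amount to. It is an illustration only, not what the plugin generates: the `Brewery` label, the `BREWS` relationship type and the `$beer` and `$brewery` parameters are assumptions made for the example.

```cypher
// Phase 1 (nodes): MERGE creates a node only if no matching node exists yet.
MERGE (:BeerBrand {name: $beer});
MERGE (:Brewery {name: $brewery});

// Phase 2 (relationships): look up the existing nodes, then connect them.
MATCH (b:BeerBrand {name: $beer})
MATCH (br:Brewery {name: $brewery})
MERGE (br)-[:BREWS]->(b);
```

Because the second phase only matches nodes that already exist, it cannot create a node by itself, however its transactions happen to overlap.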

You'll need to add key/value pairs to your kettle.properties, for example:

NEO4J_HOST=localhost
NEO4J_USER=neo4j
NEO4J_PASS=knowbi
# the Bolt protocol port (default 7687), not the browser port (default 7474)
NEO4J_PORT=7687

After running the job, the graph can be queried from the Neo4J browser (e.g. http://localhost:7474) with a query that reads like 'give me all nodes that have the label BeerBrand and a name property of Orval'.
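In Cypher, a query matching that description could be sketched as follows. Only the `BeerBrand` label, the `name` property and the value `Orval` come from the description; the variable name `n` is arbitrary and the exact form is an assumption (an equivalent `WHERE n.name = 'Orval'` clause would work as well).

```cypher
// Match every node labelled BeerBrand whose name property equals 'Orval'.
MATCH (n:BeerBrand {name: 'Orval'})
RETURN n
```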

This query will return the node for the delicious Orval beer. By double-clicking on the node, you'll find its brewery, beer type and alcohol percentage, which will look very similar to the graph below:

Creating nodes and relationships

Using the plugin is straightforward. First, set the connection properties and verify that the connection works. Then, select the label, relationship and property fields for nodes and relationships. Properties for nodes and relationships use the field name as the property name by default, but this can be overridden by manually entering a value in the 'Property Name' field.

Detailed documentation about how to use this step can be found on GitHub.

Feedback!

Try the step, beat the hell out of it and let us know if you find any issues by mail, Twitter or directly on GitHub.
