MongoDB
MongoDB is a document-oriented database that stores data in JSON-like documents with a dynamic schema. This means you can store records without first defining their structure, such as the number of fields or the types of their values.
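The following minimal Python sketch illustrates the dynamic schema. It models a collection as a plain list of dictionaries, so no MongoDB server or driver is involved. The hypothetical "customers" documents carry different fields and value types, and MongoDB would accept both in the same collection without a schema change.

```python
import json

# A hypothetical "customers" collection modeled as a list of dicts.
# In MongoDB, both documents could live in the same collection.
customers = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "phone": "555-0100", "tags": ["vip", "newsletter"]},
]


def field_names(documents):
    """Return the sorted union of top-level field names across documents."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return sorted(names)


if __name__ == "__main__":
    for doc in customers:
        # JSON is close to how documents look; MongoDB itself stores them as BSON.
        print(json.dumps(doc))
    print(field_names(customers))
```

Because no document has to declare its fields up front, any tool that reads the collection as rows has to discover the set of fields by sampling documents.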
Apache Hop is a data engineering and data orchestration platform that is currently incubating at the Apache Software Foundation. Hop allows data engineers and data developers to visually design workflows and data pipelines to build powerful solutions. No other data engineering platform currently has the integration with Neo4j that Apache Hop offers.
With the following example, you will learn how to extract data from a MongoDB database using Apache Hop.
As always, the examples here use a Hop project with environment variables to separate code from configuration.
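The idea behind this separation is that pipeline metadata refers to placeholders in Hop's `${VARIABLE}` syntax, and each environment supplies the actual values. The Python sketch below mimics that substitution with `string.Template`. The variable names (`MONGODB_HOST`, `MONGODB_PORT`, `MONGODB_DATABASE`) and the host names are hypothetical, and the sketch illustrates only the concept. It does not reproduce how Hop resolves variables internally.

```python
from string import Template

# Hypothetical connection settings that reference variables instead of
# hard-coded values (the "code" side).
connection_settings = {
    "hostname": "${MONGODB_HOST}",
    "port": "${MONGODB_PORT}",
    "database": "${MONGODB_DATABASE}",
}

# Two hypothetical environments supplying different values (the "configuration" side).
environments = {
    "dev": {"MONGODB_HOST": "localhost", "MONGODB_PORT": "27017", "MONGODB_DATABASE": "sales_dev"},
    "prod": {"MONGODB_HOST": "mongo.example.com", "MONGODB_PORT": "27017", "MONGODB_DATABASE": "sales"},
}


def resolve(settings, variables):
    """Replace ${NAME} placeholders with values; raises KeyError if one is undefined."""
    return {key: Template(value).substitute(variables) for key, value in settings.items()}


if __name__ == "__main__":
    print(resolve(connection_settings, environments["dev"]))
    print(resolve(connection_settings, environments["prod"]))
```

The same settings resolve to different servers depending on the environment, so the pipeline itself never changes between development and production.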
Step 1: Create a MongoDB connection
The MongoDB connection is defined at the project level. You can reuse it across multiple transforms (or multiple instances of the same transform) and other plugin types.
To create a MongoDB connection, click New -> MongoDB Connection or Metadata -> MongoDB Connection. The system displays the New MongoDB Connection view, where you configure the connection fields.
The connection can be configured as in the following example:
Test the connection by clicking on the Test button.
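The values entered in the connection view, such as host, port, database and credentials, typically correspond to the parts of a standard MongoDB connection string. The Python sketch below assembles such a URI from hypothetical values to show how the pieces relate. Hop does not require you to write this. `mongodb_uri` is a hypothetical helper, not part of Hop or any driver. The MongoDB connection string format requires reserved characters in the username and password to be percent-encoded, and `quote_plus` handles that.

```python
from urllib.parse import quote_plus


def mongodb_uri(host, port, database, username=None, password=None, auth_source=None):
    """Build a standard mongodb:// connection string (hypothetical helper)."""
    credentials = ""
    if username is not None:
        # Percent-encode reserved characters such as '@', ':' and '/'.
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    uri = f"mongodb://{credentials}{host}:{port}/{database}"
    if auth_source:
        # authSource names the database that holds the user's credentials.
        uri += f"?authSource={quote_plus(auth_source)}"
    return uri


if __name__ == "__main__":
    print(mongodb_uri("localhost", 27017, "sales"))
    print(mongodb_uri("localhost", 27017, "sales", "hop", "p@ss:word", auth_source="admin"))
```

If the Test button fails, comparing the dialog values against a URI like this can help spot a wrong port, database or authentication database.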
Step 2: Add and configure a MongoDB input transform
The MongoDB input transform retrieves documents from a MongoDB collection. After creating your pipeline (read-from-mongodb), add a MongoDB input transform: click anywhere in the pipeline canvas, search for 'mongodb' and select MongoDB input.
Now it’s time to configure the MongoDB input transform. Open the transform and set your values as in the following example:
Tab: Input options
Tab: Query
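The Query tab takes a MongoDB query expression written as JSON. The following Python sketch shows what such a filter does by applying a small subset of MongoDB query semantics (plain equality plus the `$gt` and `$lt` operators) to in-memory documents. The filter, the documents and the `matches` function are hypothetical. In practice the MongoDB server evaluates the query and supports many more operators.

```python
import json

# A hypothetical filter, as it might be typed into the Query tab.
query_json = '{"country": "Belgium", "age": {"$gt": 30}}'

documents = [
    {"name": "Ann", "country": "Belgium", "age": 42},
    {"name": "Ben", "country": "Belgium", "age": 25},
    {"name": "Cas", "country": "Netherlands", "age": 51},
    {"name": "Dee", "country": "Belgium"},  # no "age" field: dynamic schema
]

# Only two comparison operators are modeled in this sketch.
OPERATORS = {
    "$gt": lambda value, arg: value > arg,
    "$lt": lambda value, arg: value < arg,
}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        if isinstance(condition, dict):
            # Operator form such as {"$gt": 30}; a missing field never matches.
            if field not in document:
                return False
            for op, arg in condition.items():
                if not OPERATORS[op](document[field], arg):
                    return False
        elif document.get(field) != condition:
            # Plain equality on the field value.
            return False
    return True


if __name__ == "__main__":
    query = json.loads(query_json)
    print([doc["name"] for doc in documents if matches(doc, query)])
```

Filtering in the query means only matching documents leave the database, which is usually cheaper than reading the whole collection and filtering later in the pipeline.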
Tab: Fields
In this example, the data is extracted as separate columns: uncheck Output single JSON field and click Get fields to retrieve the collection fields.
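Extracting documents as columns means each nested field becomes a path, and fields missing from a document become empty values. The Python sketch below approximates that flattening with hypothetical documents. The `$.field.subfield` path style follows the JSON-path notation the Fields tab appears to use, but treat the exact notation, and the decision to keep arrays as single values, as assumptions of this sketch rather than a description of how Get fields works internally.

```python
def flatten(document, prefix="$"):
    """Flatten nested dicts into {path: value}, e.g. {"$.address.city": "Ghent"}.

    Lists are kept as single values in this simplified sketch.
    """
    flat = {}
    for key, value in document.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_rows(documents):
    """Return (columns, rows); fields missing from a document become None."""
    flattened = [flatten(doc) for doc in documents]
    columns = []
    for flat in flattened:
        for path in flat:
            if path not in columns:
                columns.append(path)  # keep first-seen order
    rows = [[flat.get(path) for path in columns] for flat in flattened]
    return columns, rows


documents = [
    {"name": "Ann", "address": {"city": "Ghent", "zip": "9000"}},
    {"name": "Ben", "email": "ben@example.com"},
]

if __name__ == "__main__":
    columns, rows = to_rows(documents)
    print(columns)
    for row in rows:
        print(row)
```

The column list is the union of paths seen across the sampled documents, which is why fields that appear only in later documents can be missed if too few documents are sampled.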
To preview the data, click the Preview button.
Step 3: Add and configure a Text File output transform
The Text file output transform exports data to a text file. It is commonly used to generate comma-separated values (CSV) files that spreadsheet applications can read.
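For reference, the following Python sketch writes rows to CSV with the standard `csv` module. The column names and rows are hypothetical. The Text file output transform has its own settings for separator, enclosure and header. The sketch only shows two behaviors worth checking in the output: a value containing the separator is enclosed in quotes, and a missing value (here `None`, as a field absent from a document might become) is written as an empty field.

```python
import csv
import io

columns = ["name", "city", "notes"]
rows = [
    ["Ann", "Ghent", "prefers email"],
    ["Ben", "Antwerp", "VIP, call first"],  # contains the separator
    ["Cas", None, "no city on record"],  # missing value
]


def to_csv(columns, rows, delimiter=","):
    """Return CSV text with a header line.

    Fields containing the delimiter are quoted; None becomes an empty field.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(columns, rows), end="")
```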
Add a Text File output transform: click anywhere in the pipeline canvas, search for 'text' and select Text File output.
Tab: File
Tab: Fields
Step 4: Run your pipeline
Finally, run your pipeline by clicking on the Run -> Launch option.
The 'local' run configuration should have been created along with your Hop project. If it wasn't, check the Hop documentation to create a pipeline run configuration.
Open the CSV file to see the data read from MongoDB.
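Instead of opening the file by hand, you can run a quick check from a script. This minimal Python sketch reports the header and the number of data rows. It assumes a comma separator and UTF-8 encoding, so adjust both to match your Text file output settings. The script name `check_csv.py` is hypothetical.

```python
import csv
import sys


def summarize_csv(lines, delimiter=","):
    """Return (header, data_row_count) for CSV text given as an iterable of lines.

    Blank lines are ignored when counting data rows.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, [])
    row_count = sum(1 for row in reader if row)
    return header, row_count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_csv.py <path-to-output.csv>
    with open(sys.argv[1], newline="", encoding="utf-8") as handle:
        header, row_count = summarize_csv(handle)
    print(f"columns: {header}")
    print(f"data rows: {row_count}")
```

Comparing the row count with the number of documents matched by your query is a simple way to confirm that nothing was lost along the way.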
You can find the samples in the 5-minutes-to GitHub repository.