5 minutes to configure Pipeline Log in Apache Hop

Originally posted on May 24, 2022 11:00:00 AM
Last updated on July 24, 2024
Bart Maertens

Data architect and developer with over 20 years of experience in data engineering and analytics. Founder and lead of the know.bi expert team, Apache Hop co-founder and PMC member.

Pipeline Log

Apache Hop is a data engineering and data orchestration platform that allows data engineers and data developers to visually design workflows and data pipelines to build powerful solutions.

Once your project has gone through initial development and testing, knowing what is going on at runtime becomes important.

The Apache Hop Pipeline Log lets you log the activity of a pipeline with another pipeline: it streams logging information from a running pipeline to a second pipeline that processes it. The Pipeline Log is created in JSON format.

Hop will pass the logging information for each pipeline you run to the pipeline(s) you specify as pipeline log metadata objects. In this post, we'll look at an example of how to configure and use the pipeline log metadata to write pipeline logging information to a relational database.

The examples here use variables to separate code and configuration, in line with best practices for Apache Hop projects.

Step 1: Create a Pipeline Log metadata object

To create a Pipeline Log, click on the New -> Pipeline Log option or on the Metadata -> Pipeline Log option.

The system displays the New Pipeline Log view with the following fields to be configured.

The Pipeline Log can be configured as in the following example:

Name: the name of the metadata object (pipelines-logging).
Enabled: (checked).
Pipeline executed to capture logging: select or create the pipeline to process the logging information for this Pipeline Log (${PROJECT_HOME}/hop/logging/pipelines-logging.hpl).

Next, select or create the pipeline to be used for logging the activity. We'll create that pipeline in the next step. It is important to note that you can use all of the functionality of Apache Hop pipelines to work with the logging data. The only prerequisite is that this pipeline starts with a Pipeline Logging transform.

Execute at the start of the pipeline?: (checked).
Execute at the end of the pipeline?: (checked).
Execute periodically during execution?: (unchecked)

Finally, save the Pipeline Log configuration.

PRO TIP: by default, pipeline logging applies to every pipeline you run in the current project. That may not be necessary, or may even be undesirable. If you only want to work with logging information for a selected number of pipelines, add those pipelines to the table below the configuration options ("Capture output of the following pipelines"). In the default Apache Hop samples project, for example, you could add the single "generate-fake-books.hpl" pipeline so that logging is captured for that pipeline only.
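
To make the selection rule in this tip concrete, here is a minimal Python sketch of the idea. It is not Hop's implementation: the function name should_capture and the choice to match on file name are assumptions made purely for illustration.

```python
from pathlib import PurePosixPath


def should_capture(pipeline_path: str, selected: list[str]) -> bool:
    """Decide whether logging is captured for a pipeline (illustrative only).

    An empty selection means "capture every pipeline in the project".
    Otherwise only pipelines whose file name appears in the selection
    are captured. Matching on file name is an assumption of this sketch.
    """
    if not selected:
        return True
    name = PurePosixPath(pipeline_path).name
    return name in {PurePosixPath(p).name for p in selected}


# No selection: every pipeline is logged.
print(should_capture("samples/generate-fake-books.hpl", []))  # True

# With a selection: only the listed pipeline is logged.
selection = ["generate-fake-books.hpl"]
print(should_capture("samples/generate-fake-books.hpl", selection))  # True
print(should_capture("samples/some-other-pipeline.hpl", selection))  # False
```

The point to take away is the default: leaving the table empty captures everything, so adding a single entry narrows logging to that pipeline.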

Step 2: Create a new pipeline with the Pipeline Logging transform

To create the pipeline, you can either go to the perspective area or click on the New button in the New Pipeline Log dialog. Then, choose a folder and a name for the pipeline.

A new pipeline is automatically created with a Pipeline Logging transform connected to a Dummy transform (Save logging here).

Now it's time to configure the Pipeline Logging transform. This configuration is very simple: open the transform and set your values as in the following example:

Transform name: choose a name for your transform. Just remember that the name of the transform should be unique in your pipeline (log).
Also log transform: selected by default.

Step 3: Add and configure a Table output transform

The Table Output transform allows you to load data into a database table. Table Output is equivalent to the DML operator INSERT. This transform provides configuration options for the target table and a lot of housekeeping and/or performance-related options such as Commit Size and Use batch update for inserts.
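
To illustrate what "equivalent to INSERT" with a commit size means in practice, here is a small Python sketch using the standard sqlite3 module. It is an analogy, not how Hop implements Table Output: the table demo_log, its columns, and the insert_rows helper are all made up for this example.

```python
import sqlite3


def insert_rows(conn, rows, commit_size=100):
    """Insert rows in batches and commit after each batch of `commit_size` rows.

    Returns the number of commits performed.
    """
    commits = 0
    for start in range(0, len(rows), commit_size):
        batch = rows[start:start + commit_size]
        # One batched INSERT per chunk, similar in spirit to a batch update.
        conn.executemany("INSERT INTO demo_log (id, message) VALUES (?, ?)", batch)
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_log (id INTEGER, message TEXT)")

rows = [(i, f"row {i}") for i in range(250)]
commits = insert_rows(conn, rows, commit_size=100)
count = conn.execute("SELECT COUNT(*) FROM demo_log").fetchone()[0]

print(commits, count)  # 3 250
```

A larger commit size means fewer commits for the same number of rows, which is usually the trade-off you tune when loading logging data into a database.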

TIP: In this example, we use a relational database connection for logging, but you can also write to output files. If you decide to use a database connection, first make sure the database is installed and available.

Add a Table Output transform by clicking anywhere in the pipeline canvas, then Search 'table output' -> Table Output.

Now it’s time to configure the Table Output transform. Open the transform and set your values as in the following example:

Transform name: choose a name for your transform. Just remember that the name of the transform should be unique in your pipeline (pipelines logging).

Connection: The database connection to which data will be written (logging-connection). The connection was configured by using the logging-connection.json environment file, which contains the connection variables.
Target table: The name of the table to which data will be written (pipelines-logging).
Click on the SQL option to generate the SQL to create the output table automatically.
Execute the SQL statements.
Open the created table in your favorite database explorer (e.g. DBeaver) to see all the logging fields.
Close and save the pipeline.
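
The connection and the pipeline path above both rely on variables such as ${PROJECT_HOME}. As a rough sketch of how this kind of ${NAME} substitution keeps configuration out of the pipeline itself, consider the Python example below. The JSON layout, the variable names LOGGING_DB_HOST and LOGGING_DB_NAME, and the value given to PROJECT_HOME are assumptions for illustration and are not taken from the actual logging-connection.json file.

```python
import json
import re

# Hypothetical environment file content (assumed layout and names).
ENV_JSON = """
{
  "variables": [
    {"name": "LOGGING_DB_HOST", "value": "localhost"},
    {"name": "LOGGING_DB_NAME", "value": "hop_logging"}
  ]
}
"""

VAR_PATTERN = re.compile(r"\$\{([^}]+)\}")


def load_variables(text):
    """Turn the list of name/value entries into a plain dictionary."""
    return {v["name"]: v["value"] for v in json.loads(text)["variables"]}


def resolve(expression, variables):
    """Replace each ${NAME} with its value; unknown names are left untouched."""
    return VAR_PATTERN.sub(lambda m: variables.get(m.group(1), m.group(0)), expression)


variables = load_variables(ENV_JSON)
variables["PROJECT_HOME"] = "/projects/demo"  # placeholder value

print(resolve("${PROJECT_HOME}/hop/logging/pipelines-logging.hpl", variables))
# /projects/demo/hop/logging/pipelines-logging.hpl
print(resolve("${LOGGING_DB_HOST}/${LOGGING_DB_NAME}", variables))
# localhost/hop_logging
```

Because only the environment file changes between, say, development and production, the same pipeline and connection definitions can be reused unchanged.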

Step 4: Run a pipeline and check the logs

Finally, run a pipeline by clicking on the Run -> Launch option. In this case, we use a basic pipeline (generate-rows.hpl) that generates a constant and writes 1000 rows to a CSV file.

The data of the pipeline execution will be recorded in the pipelines-logging table.

Check the data in the pipelines-logging table.
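
If you prefer to check the table from a script rather than a database explorer, something along these lines could work. This is a sketch that uses SQLite as a stand-in for whatever database your logging connection points to. The columns pipeline_name and status are placeholders, not the fields Hop generates, and ordering by rowid is specific to SQLite. Note that a table name containing a hyphen, like pipelines-logging, has to be quoted in SQL.

```python
import sqlite3


def latest_rows(conn, table, limit=5):
    """Return the column names and the most recent `limit` rows of `table`."""
    # Quote the identifier so names with hyphens are accepted.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(
        f"SELECT * FROM {quoted} ORDER BY rowid DESC LIMIT ?", (limit,)
    )
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


# Stand-in table with placeholder columns, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "pipelines-logging" (pipeline_name TEXT, status TEXT)')
conn.executemany(
    'INSERT INTO "pipelines-logging" VALUES (?, ?)',
    [("generate-rows", "start"), ("generate-rows", "end")],
)

columns, rows = latest_rows(conn, "pipelines-logging")
print(columns)
for row in rows:
    print(row)
```

With "Execute at the start" and "Execute at the end" both checked, you would expect at least two log records per pipeline run, which is what the two placeholder rows mimic.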

Next Steps

You now know how to use the pipeline log metadata type to work with everything Apache Hop has to offer to process your pipeline logging information.

Feel free to reach out if you'd like to find out more or to discuss how we can help with pipeline logging or any other aspect of your data engineering projects with Apache Hop.

You can find the samples in the 5-minutes-to GitHub repository. As always, this post will be contributed back to the Apache Hop docs.
