5 minutes to configure Workflow Log in Apache Hop

Workflow Log

Apache Hop is a data engineering and data orchestration platform that allows data engineers and data developers to visually design workflows and data pipelines to build powerful solutions.

After your project has gone through initial development and testing, knowing what is going on at runtime becomes important.

The Workflow Logs in Hop allow workflow logging information to be passed down to a pipeline for processing as JSON objects. The receiving pipeline can process this logging information with all the functionality Hop pipelines have to offer, e.g. write to a relational or NoSQL database, a Kafka topic, etc.

Hop will send the logging information for each workflow you run to the Workflow Log pipeline you specify. 

In this post, we walk through an example of how to configure and use the Workflow Log metadata.

Step 1: Create a Workflow Log metadata object

To create a Workflow Log, click on the New -> Workflow Log option or on the Metadata -> Workflow Log option.

The system displays the New Workflow Log view with the following fields to be configured.

Apache Hop - New Workflow Log

The Workflow Log can be configured as in the following example:

Apache Hop - Workflow Log configuration
  • Name: the name of the metadata object (workflows-logging).
  • Enabled: (checked).
  • Pipeline executed to capture logging: select or create the pipeline to process the logging information for this Workflow Log (${PROJECT_HOME}/hop/logging/workflows-logging.hpl).

TIP: Select or create the pipeline that will capture the logging activity. This example uses a pipeline built around the Workflow Logging transform, which we create in Step 2.

  • Execute at the start of the workflow?: (checked).
  • Execute at the end of the workflow?: (checked).
  • Execute periodically during execution?: (unchecked).

Save the configuration.
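
As a rough mental model of the three execution options (not Hop's actual implementation), the sketch below lists the moments at which a logging pipeline would be triggered for a workflow of a given duration. The function name logging_trigger_points, the phase labels, and the interval parameter are hypothetical and exist only for illustration.

```python
def logging_trigger_points(duration_s, at_start=True, at_end=True,
                           periodic=False, interval_s=30):
    """Return (second, phase) pairs at which logging would fire.

    Conceptual model only: names and phase labels are illustrative
    assumptions, not taken from Hop's source code.
    """
    if periodic and interval_s <= 0:
        raise ValueError("interval_s must be positive")
    points = []
    if at_start:
        points.append((0, "start"))
    if periodic:
        # Fire every interval_s seconds, strictly between start and end.
        t = interval_s
        while t < duration_s:
            points.append((t, "interval"))
            t += interval_s
    if at_end:
        points.append((duration_s, "end"))
    return points


# Settings from the example: start and end checked, periodic unchecked.
print(logging_trigger_points(duration_s=90))
```

Under these assumptions, the example settings hand logging information to the pipeline twice per workflow run: once when the workflow starts and once when it finishes.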

Step 2: Create a new pipeline with the Workflow Logging transform

To create the pipeline, you can either go to the perspective area or click on the New button in the New Workflow Log dialog. Then, choose a folder and a name for the pipeline.

A new pipeline is automatically created with a Workflow Logging transform connected to a Dummy transform (Save logging here).

Apache Hop - Workflow Log - default pipeline

Now it’s time to configure the Workflow Logging transform. This configuration is very simple: open the transform and set your values as in the following example:

Apache Hop - Workflow Log input transform name
  • Transform name: choose a name for your transform; the name must be unique in your pipeline (log).
  • Also log action details: selected by default.

Step 3: Add and configure a Table output transform

The Table Output transform allows you to load data into a database table. Table Output is equivalent to the DML operator INSERT. This transform provides configuration options for the target table and a lot of housekeeping and/or performance-related options such as Commit Size and Use batch update for inserts.
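
To make the INSERT and Commit Size ideas concrete, here is a minimal sketch in Python that inserts rows in batches and commits once per batch. It uses an in-memory SQLite database as a stand-in for the logging database; the column names, the sample rows, and the helper insert_in_batches are assumptions made for illustration and do not reflect how Hop implements the transform.

```python
import sqlite3


def insert_in_batches(conn, rows, commit_size=100):
    """Insert rows with executemany, committing every commit_size rows."""
    sql = 'INSERT INTO "workflows-logging" (workflow_name, status) VALUES (?, ?)'
    inserted = 0
    for start in range(0, len(rows), commit_size):
        batch = rows[start:start + commit_size]
        conn.executemany(sql, batch)  # one batched INSERT per chunk
        conn.commit()                 # commit once per chunk
        inserted += len(batch)
    return inserted


conn = sqlite3.connect(":memory:")  # stand-in for the logging database
# Hypothetical two-column layout; the real table has more logging fields.
conn.execute('CREATE TABLE "workflows-logging" (workflow_name TEXT, status TEXT)')
rows = [(f"workflow-{i}", "Finished") for i in range(250)]
print(insert_in_batches(conn, rows, commit_size=100))
```

A larger commit size generally means fewer commits and less overhead, at the cost of more rows being rolled back if a batch fails.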

TIP: In this example, we use a relational database connection for logging, but you can also use output files. If you decide to use a database connection, make sure the database is installed and available as a prerequisite.

Add a Table Output transform by clicking anywhere in the pipeline canvas, then search for 'table output' and select Table Output.

Apache Hop - Workflow Log pipeline with table output

Now it’s time to configure the Table Output transform. Open the transform and set your values as in the following example:

workflow-logging-table-output
  • Transform name: choose a name for your transform; the name must be unique in your pipeline (workflows logging).
  • Connection: The database connection to which data will be written (logging-connection). The connection was configured by using the logging-connection.json environment file that contains the variables:

Apache Hop - Workflow Log - database connection

  • Target table: The name of the table to which data will be written (workflows-logging).
  • Click on the SQL option to generate the SQL to create the output table automatically:
    Apache Hop - Workflow Log - DDL statement
  • Execute the SQL statements. In this simple scenario, we'll execute the SQL directly. In real-life projects, consider managing your DDL in version control and through tools like Liquibase or Flyway.
    Apache Hop - Workflow Log - DDL statement executed
  • Open the created table to see all the logging fields:

Apache Hop - Workflow Log - Output table layout

  • Close and save the transform.
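
The Connection option above refers to a logging-connection.json environment file that holds the connection variables. The sketch below builds a file of that general shape in Python. The variable names, the values, and the assumed layout (a top-level "variables" list of name/value/description entries) are illustrative assumptions; check the file in your own project for the actual contents.

```python
import json

# Hypothetical variable names and values, invented for this sketch.
variables = {
    "LOGGING_DB_HOST": "localhost",
    "LOGGING_DB_PORT": "5432",
    "LOGGING_DB_NAME": "hop_logging",
}


def build_environment_config(variables):
    """Build a dict with a top-level "variables" list of name/value entries."""
    return {
        "variables": [
            {"name": name, "value": value, "description": ""}
            for name, value in variables.items()
        ]
    }


config = build_environment_config(variables)
print(json.dumps(config, indent=2))
```

Keeping host, port, and database name in variables like these lets the same logging-connection definition point at a different database in each environment.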

Step 4: Run a workflow and check the logs

Finally, run a workflow by clicking on the Run -> Launch option. The Workflow Log pipeline is executed for every workflow you run.
In this case, we use a basic workflow that executes a pipeline; both are included in the 5-minutes repository.

Apache Hop - Workflow Log - run workflow

The executed pipeline (generate-rows.hpl) generates 1000 rows containing a constant value and writes them to a CSV file:

Apache Hop - Workflow Log - run pipeline

The data of the workflow execution will be recorded in the workflows-logging table.

Apache Hop - Run Workflow Log

run-workflow-metrics

Check the data in the table.

Apache Hop - Workflow Log - Output
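
If you prefer to check the table from a script rather than a database client, a query along these lines could summarize what was logged. This is a minimal sketch against an in-memory SQLite stand-in: the column names (workflow_name, logging_phase, error_count) and the sample rows are assumptions for illustration, so adapt them to the fields in your generated table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the logging database
conn.execute(
    'CREATE TABLE "workflows-logging" '
    "(workflow_name TEXT, logging_phase TEXT, error_count INTEGER)"
)
# Sample rows invented for the sketch; real rows come from the logging pipeline.
conn.executemany(
    'INSERT INTO "workflows-logging" VALUES (?, ?, ?)',
    [("sample-workflow", "start", 0), ("sample-workflow", "end", 0)],
)


def summarize(conn):
    """Return (workflow_name, logged_rows, total_errors) per workflow."""
    return conn.execute(
        "SELECT workflow_name, COUNT(*), SUM(error_count) "
        'FROM "workflows-logging" GROUP BY workflow_name ORDER BY workflow_name'
    ).fetchall()


print(summarize(conn))
```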

You can find the samples in the 5-minutes-to GitHub repository.
