orchestrate unit and integration tests with Apache Hop in 5 minutes

In the two previous posts in this series of three, we looked at how to build unit tests in Apache Hop and how to bypass and remove transforms in those tests. In this third and final post, we'll take a close look at how your unit tests can really start to add value to your project by running them on a regular (daily) basis.

You'll typically build unit tests to check the new functionality in your next release and to verify that there are no regressions: bugs that were fixed need to remain fixed once and for all.

As you know by now, a unit test is a combination of zero or more input data sets and golden data sets, along with a number of tweaks you can apply to the pipeline prior to testing. Let's take a closer look at how you can run your tests on a regular basis. After all, testing only makes sense when you actually run your tests.

The steps we'll describe below will use Apache Hop to test your Apache Hop project, just like we do in the source code integration tests. That's right, we use Apache Hop to test Apache Hop.

Apache Hop uses Jenkins to run unit and integration tests, but since the unit tests are just Apache Hop workflows and pipelines, you can run your unit and integration tests in any CI/CD or even scheduling platform you use. 

Main components of a unit test

Hop uses the following concepts (metadata objects) to work with pipeline unit tests:

  • Dataset: A set of rows with a certain layout, stored in a CSV file. When used as input, we call it an input data set. When used to validate a transform’s output, we call it a golden data set.
  • Unit test tweak: The ability to remove or bypass a transform during a test.
  • Unit test: The combination of input data sets, golden data sets, tweaks, and a pipeline.

You can have 0, 1, or more input or golden data sets defined in a unit test.

You can have multiple unit tests defined per pipeline.
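As a mental model, these concepts tie together as in the following Python sketch. This is purely illustrative: all class and field names are assumptions for this post, not Hop's actual Java metadata classes.

```python
# Illustrative model of the unit test metadata objects; names are
# assumptions for this sketch, not Hop's real API.
from dataclasses import dataclass, field

@dataclass
class DataSet:
    name: str
    rows: list          # rows loaded from the backing CSV file

@dataclass
class UnitTest:
    name: str
    pipeline: str                                     # pipeline under test
    input_sets: dict = field(default_factory=dict)    # transform -> DataSet
    golden_sets: dict = field(default_factory=dict)   # transform -> DataSet
    tweaks: dict = field(default_factory=dict)        # transform -> "bypass" | "remove"

# A pipeline can carry several unit tests, each with 0, 1, or more data sets:
tests = [
    UnitTest(name="generate-rows-unit",
             pipeline="generate-rows.hpl",
             golden_sets={"write-to-csv": DataSet("data-set-calc", rows=[])}),
]
```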

The default dataset folder can be specified in the project dialog: check the 'Data Sets CSV Folder (HOP_DATASETS_FOLDER)' option. By default, the ${HOP_DATASETS_FOLDER} variable is set to ${PROJECT_HOME}/datasets.

Unit tests at runtime

When a pipeline is executed in the Hop GUI and a unit test is selected, the following happens:

  • All transforms marked with an input data set are replaced with an Injector transform.
  • All transforms marked with a golden data set are replaced with a Dummy transform (which does nothing).
  • All transforms marked with a "Bypass" tweak are replaced with a Dummy transform.
  • All transforms marked with a "Remove" tweak are removed.

These operations take place on a copy of the pipeline, in memory only, unless you specify a .hpl file location in the unit test dialog.

After execution, the transform output is validated against the golden data and logged. If the test produces errors, a dialog will pop up when running in the Hop GUI.
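Conceptually, the swap-and-validate steps above boil down to something like this Python sketch. It is illustrative only: Hop performs this on its in-memory pipeline metadata, and real validation also compares row contents, not just row counts.

```python
# Illustrative sketch of test-time transform replacement and the
# row-count part of golden data validation; not Hop's actual code.
import copy

def apply_test(pipeline, input_sets, golden_sets, tweaks):
    """Return a tweaked copy of the pipeline; the original is left untouched."""
    test_pipeline = copy.deepcopy(pipeline)          # work on a copy, in memory
    for name in list(test_pipeline):
        if name in input_sets:
            test_pipeline[name] = "Injector"         # feeds the input data set
        elif name in golden_sets:
            test_pipeline[name] = "Dummy"            # output captured for validation
        elif tweaks.get(name) == "bypass":
            test_pipeline[name] = "Dummy"
        elif tweaks.get(name) == "remove":
            del test_pipeline[name]
    return test_pipeline

def validate_row_count(golden_rows, received_rows):
    """Row-count check, similar to the message Hop reports in the logs."""
    if len(received_rows) != len(golden_rows):
        return (f"Incorrect number of rows received from transform, golden data "
                f"set has {len(golden_rows)} rows in it and we received "
                f"{len(received_rows)}")
    return None                                      # no error
```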

Execute in workflows

There is a workflow action called "Run Pipeline Unit Tests" that can execute all defined unit tests of a certain type. The test results can be stored in any format or location with regular Hop transforms. Execute the workflow through hop-run, in a scheduler, or through a CI/CD pipeline in, for example, Jenkins.

Step 1: Add and configure the Run Pipeline Unit Tests action in a new workflow

The Run Pipeline Unit Tests action runs a series of pipeline unit tests. The action is successful if all tests run without errors. Errors are logged.

After creating your workflow (run-pipeline-unit.hwf), add a Run Pipeline Unit Tests action: click anywhere in the workflow canvas, then search for 'run' -> Run Pipeline Unit Tests.

Now it’s time to configure the Run Pipeline Unit Tests action.

Open the action and set your values as in the following example:


  • Action name: choose a name for your action; just remember that the name of the action should be unique in your workflow (run-pipeline-unit-test).
  • Use the Get test names option in this action to specify which of the available unit tests you want to include in your workflow. In this case, we will use the unit test for the following pipeline, which performs a simple calculation on a generated number column and writes the results to a CSV file. Check the post '5 minutes to configure unit tests in Apache Hop'.
  • Connect a Success action to the run-pipeline-unit-test action.

Step 2: Add and configure the Write to log action

The Write To Log action writes a specific string to the Hop logging system. This action is similar to the Write To Log transform.

Add a Write to log action to the workflow and configure it as follows:


  • Action name: choose a name for your action; just remember that the name of the action should be unique in your workflow (test-failed).
  • Log level: The logging level to use (Error).
  • Log message: The log message to write to the log (test failed).

Step 3: Run the workflow

If the workflow runs with all tests passing, you’ll receive a success notification in the logs:

2022/12/10 15:21:56 - generate-rows - Unit test 'generate-rows-unit' passed successfully
2022/12/10 15:21:56 - generate-rows - ----------------------------------------------
2022/12/10 15:21:56 - generate-rows - write-to-csv - data-set-calc : Test passed successfully against golden data set
2022/12/10 15:21:56 - generate-rows - Unit test was successfully executed.
2022/12/10 15:21:56 - generate-rows - ----------------------------------------------
2022/12/10 15:21:56 - generate-rows - Pipeline duration : 0.211 seconds [ 0.211" ]
2022/12/10 15:21:57 - generate-rows - Execution finished on a local pipeline engine with run configuration 'local'
2022/12/10 15:21:57 - run-pipeline-unit - Starting action [Success]

Now try modifying the pipeline to make the test fail, for example by removing the Bypass option:

Execute the workflow after modifying the pipeline:


Notice the error message in the logs:

2022/12/10 15:39:41 - generate-rows - Unit test 'generate-rows-unit' failed, 1 errors detected, 1 comments to report.
2022/12/10 15:39:41 - generate-rows - ----------------------------------------------
2022/12/10 15:39:41 - generate-rows - write-to-csv - data-set-calc : Incorrect number of rows received from transform, golden data set 'data-set-calc' has 1000 rows in it and we received 1
2022/12/10 15:39:41 - generate-rows - ----------------------------------------------
2022/12/10 15:39:41 - generate-rows - Pipeline duration : 0.286 seconds [ 0.286" ]
2022/12/10 15:39:41 - generate-rows - Execution finished on a local pipeline engine with run configuration 'local'
2022/12/10 15:39:41 - run-pipeline-unit-test - ERROR: Error in validating test data set 'data-set-calc : Incorrect number of rows received from transform, golden data set 'data-set-calc' has 1000 rows in it and we received 1
2022/12/10 15:39:41 - run-pipeline-unit-test - ERROR: There were test result evaluation errors in pipeline unit test 'generate-rows-unit
2022/12/10 15:39:41 - run-pipeline-unit - Starting action [test-failed]
2022/12/10 15:39:41 - - ERROR: test failed

Your unit testing workflows can be scheduled or, even better, integrated with CI/CD platforms like Jenkins and GitHub Actions. Needless to say, any test failures should be fixed as soon as possible...
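A scheduler or CI job only needs the exit code of hop-run, which is non-zero when the workflow (and therefore a test) fails. A minimal sketch, assuming a local Hop install and the workflow from this post; the hop-run path, project name, and run configuration are assumptions:

```python
# Hedged sketch: launch the test workflow through hop-run and report success.
import subprocess

def run_unit_test_workflow(hop_run, workflow, project, runconfig="local"):
    """Return True when the workflow, and therefore every unit test, succeeded."""
    result = subprocess.run([
        hop_run,
        f"--project={project}",
        f"--file={workflow}",
        f"--runconfig={runconfig}",
    ])
    # CI platforms key on the exit code, so the Success / test-failed
    # branching in the workflow maps directly onto green / red builds.
    return result.returncode == 0

# e.g. run_unit_test_workflow("/opt/hop/hop-run.sh",
#                             "run-pipeline-unit.hwf", "samples")
```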

You now have all the required tools to build and orchestrate a unit testing framework to improve the quality of your Apache Hop projects. 

Let us know in the comments if you found this useful and how we can continue to improve unit testing in Apache Hop to make your data projects even more successful. 
