Apache Hop 2.0 released!
The Apache Hop PMC and community released Apache Hop 2.0.0 late last week. This is the second major release of the platform and the first major release after Hop graduated as a Top-Level ASF Project.
The Hop community is growing and continues to work hard. While the initial plan for this release was to upgrade to Java 11 as its main feature, almost three months and over 150 tickets later, the 2.0.0 release became a lot more.
Let's walk through the highlights in this release.
Upgrade to Java 11
Earlier Hop versions could already be used with Java 11 but were still developed and built with Java 8. More than 8 years after its initial release in 2014, Java 8 is reaching, or has already passed, its end of life. Upgrading the entire Hop codebase to a new Java version is no small feat, so this alone justified a new major release.
Apache Hop has been running on Java 11 in a separate branch for months, while the team gradually fixed all issues and ran all of the available unit and integration tests. With almost half a year of active testing and development before this release, Hop 2.0 is robust and reliable on Java 11.
Since code changes were unavoidable in the Java 11 upgrade, the Apache Hop team took the opportunity to make some breaking API changes in a never-ending quest to clean up, improve and simplify the codebase.
TIP: if you still need Java 8 for other applications, set the `HOP_JAVA_HOME` variable in your operating system or in one of the Hop startup scripts so that it points to the Java 11 runtime you want Hop to use.
Chinese translation
Just like many other Apache projects, Hop is increasingly popular in Asia, and the Hop community has welcomed a growing number of Asian members and contributions. One impressive contribution is a set of improvements to the Hop Translator, together with Chinese translations: the entire Hop Gui is now available in Simplified Chinese (zh_CN).
New Transform Plugins
One of the main architectural goals at the start of Apache Hop in 2019 was to move all non-core functionality to plugins. Three years later, Hop contains over 400 plugins, covering transforms, actions and over 20 other plugin types.
Hop 2.0.0 adds no fewer than 5 new transform plugins and one significantly updated transform to the platform.
Apache Avro File Output
Avro is a row-oriented remote procedure call and data serialization framework originally developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format (source).
The Apache Avro File Output transform allows Hop data engineers to write data to binary files or fields in the Avro Binary or JSON format.
This plugin complements the Avro File Input, Avro Encode and Avro Decode transforms that were already available in earlier Hop releases, and was added to Apache Hop with the help of know.bi.
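To make "compact binary format" concrete, the Java sketch below hand-encodes two primitive types the way the Avro specification describes them: `int` and `long` values use variable-length zig-zag coding, and a string is written as its UTF-8 byte length (encoded as a `long`) followed by the bytes. It is an illustration of those spec rules only, with a hypothetical class name; real code, and presumably the transform itself, would rely on the Avro library rather than encoding values by hand.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Illustration of Avro's binary encoding for long and string values,
 * written by hand from the Avro specification. Not the Avro library API.
 */
public class AvroBinarySketch {

    /** Encode a long with zig-zag + variable-length coding (Avro uses this for int and long). */
    static byte[] encodeLong(long n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, n);
        return out.toByteArray();
    }

    private static void writeLong(ByteArrayOutputStream out, long n) {
        // Zig-zag: maps small negative numbers to small positive ones (0->0, -1->1, 1->2, ...).
        long v = (n << 1) ^ (n >> 63);
        // Emit 7 bits at a time; a set high bit means "more bytes follow".
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Encode a string as its UTF-8 byte length (as a long) followed by the UTF-8 bytes. */
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }
}
```

For example, the string "foo" encodes to just four bytes (06 66 6f 6f), and the value 64 to two bytes (80 01), which matches the examples given in the Avro specification.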
Apache Doris Bulk Loader
From the Apache Doris website: "Apache Doris is a modern MPP analytical database product. It can provide sub-second queries and efficient real-time data analysis. With its distributed architecture, up to 10PB level datasets will be well-supported and easy to operate."
The Apache Doris Bulk Loader transform allows you to insert data into Apache Doris at high speed and volume, making it a faster way to load data than using the traditional database insert statements.
This new Apache Doris Bulk Loader transform was developed by the Apache Doris community and donated to Apache Hop. This shows the importance of the Apache community and the interaction and collaboration between Apache projects.
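Bulk loaders are usually faster than row-by-row INSERT statements because they ship a whole batch of rows to the database in a single request. Assuming the transform works along the lines of Doris' documented Stream Load interface (an HTTP PUT of a CSV or JSON payload to `/api/{db}/{table}/_stream_load` on the frontend's HTTP port), the Java sketch below builds such a request for one batch of rows. It only builds the request and never sends it. The host, port, database, table, credentials, label and class name are hypothetical placeholders, and this is not the transform's actual code.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;
import java.util.StringJoiner;

/**
 * Sketch of a batched Doris Stream Load request (assumption: this mirrors the general
 * idea behind bulk loading; it is not the Hop transform's implementation).
 */
public class DorisStreamLoadSketch {

    /** Serialize a whole batch of rows into one CSV payload instead of one INSERT per row. */
    static String toCsv(List<List<String>> rows, String columnSeparator) {
        StringJoiner lines = new StringJoiner("\n");
        for (List<String> row : rows) {
            lines.add(String.join(columnSeparator, row));
        }
        return lines.toString();
    }

    /** Build (but do not send) a Stream Load PUT request carrying the given batch. */
    static HttpRequest buildRequest(String feHost, int httpPort, String db, String table,
                                    String user, String password, String label,
                                    List<List<String>> rows) {
        String separator = ",";
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        URI uri = URI.create("http://" + feHost + ":" + httpPort
                + "/api/" + db + "/" + table + "/_stream_load");
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Basic " + auth)
                // Doris rejects a second load with the same label in the same database,
                // which guards against loading a batch twice when a request is retried.
                .header("label", label)
                .header("format", "csv")
                .header("column_separator", separator)
                .PUT(HttpRequest.BodyPublishers.ofString(toCsv(rows, separator)))
                .build();
    }
}
```

The design point is that thousands of rows travel in one HTTP round trip, whereas individual INSERT statements pay the per-statement overhead for every row.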
Drools transforms - Rules Accumulator and Rules Executor
From the Drools website: "Drools is a Business Rules Management System (BRMS) solution. It provides a core Business Rules Engine (BRE), a web authoring and rules management application (Drools Workbench), full runtime support for Decision Model and Notation (DMN) models at Conformance level 3, and an Eclipse IDE plugin for core development."
The Rules Accumulator transform collects incoming rows and evaluates them against a rule set. This can be useful to answer a question about a dataset or otherwise analyze it.
The Rules Executor transform evaluates the fields of each incoming row against a rule set. This can be useful to derive additional information or to route rows to another transform.
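To illustrate the difference between the two transforms, the plain-Java sketch below contrasts per-row evaluation (Rules Executor style) with collecting all rows first and evaluating a rule over the complete set (Rules Accumulator style). It deliberately does not use the Drools API: a "rule" here is just a named `Predicate`, and all class, method and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Plain-Java illustration of per-row vs. accumulated rule evaluation.
 * NOT the Drools API: rules are simple predicates, and all names are hypothetical.
 */
public class RulesSketch {

    /** Hypothetical rule: a name plus a condition on one row (field name -> value). */
    static final class Rule {
        final String name;
        final Predicate<Map<String, Object>> condition;

        Rule(String name, Predicate<Map<String, Object>> condition) {
            this.name = name;
            this.condition = condition;
        }
    }

    /** Executor style: evaluate every row on its own, emitting one result per incoming row. */
    static List<String> executePerRow(List<Map<String, Object>> rows, List<Rule> rules) {
        List<String> results = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            String matched = "none";
            for (Rule rule : rules) {
                if (rule.condition.test(row)) {
                    matched = rule.name; // first matching rule wins in this sketch
                    break;
                }
            }
            results.add(matched);
        }
        return results;
    }

    /** Accumulator style: collect all rows first, then evaluate one rule over the whole set. */
    static boolean accumulateThenEvaluate(Iterable<Map<String, Object>> incoming,
                                          Predicate<List<Map<String, Object>>> setRule) {
        List<Map<String, Object>> collected = new ArrayList<>();
        for (Map<String, Object> row : incoming) {
            collected.add(row); // nothing is evaluated until the last row has arrived
        }
        return setRule.test(collected);
    }
}
```

With two orders of 50 and 500, the per-row style tags only the second row as a large order, while the accumulator style can answer a set-level question such as "is the total above 500?".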
Formula
Formula is one of the transforms that couldn't make it into Apache Hop when the project forked from Kettle (Pentaho Data Integration). Even though many people missed it and asked for it, the Kettle plugin couldn't be included in Apache Hop: its license (LGPL) is incompatible with the Apache License 2.0, and the library it depends on (Pentaho's LibFormula) is badly outdated. LibFormula's last significant updates date back to early 2017 and were related to the acquisition of Pentaho by Hitachi rather than to new functionality.
The new Formula transform allows Hop data engineers to apply Excel-like formulas and functions on fields in a pipeline.
Since it is based on Apache POI, all major row-level functions are supported.
This transform was co-developed by Lean With Data and know.bi. Development was sponsored by BaselTech, who also did a great job of testing all of the functions in the transform. Contact us to find out how know.bi and our partner Lean With Data can help with your custom Apache Hop plugin or feature development.
INFO: the Date & Time function examples for this transform use 2021-12-17 for a reason: it is the date Apache Hop graduated as a Top-Level Project.
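The Formula transform relies on Apache POI to evaluate its functions. As an illustration only, the Java sketch below uses `java.time` to show what a few Excel-style date functions produce for 2021-12-17, including the serial number Excel stores for a date. The class and method names are hypothetical, and this is not how the transform or POI implements these functions.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/**
 * Illustration only: java.time equivalents of a few Excel-style date functions.
 * Not the Formula transform's (or Apache POI's) implementation.
 */
public class ExcelDateSketch {

    // Excel's default 1900 date system effectively counts days from 1899-12-30.
    // This offset absorbs Excel's fictitious 1900-02-29, so it is only valid from 1900-03-01 on.
    private static final LocalDate EXCEL_EPOCH = LocalDate.of(1899, 12, 30);

    /** Serial number Excel stores for a date, e.g. the value behind =DATE(2021,12,17). */
    static long toExcelSerial(LocalDate date) {
        return ChronoUnit.DAYS.between(EXCEL_EPOCH, date);
    }

    /** =WEEKDAY(date) with the default return type 1: Sunday = 1 ... Saturday = 7. */
    static int weekday(LocalDate date) {
        // java.time numbers Monday = 1 ... Sunday = 7, so shift Sunday to 1.
        return date.getDayOfWeek().getValue() % 7 + 1;
    }

    public static void main(String[] args) {
        LocalDate graduation = LocalDate.of(2021, 12, 17);
        System.out.println(toExcelSerial(graduation)); // 44547
        System.out.println(weekday(graduation));       // 6 (a Friday)
        System.out.println(graduation.getYear());      // 2021, like =YEAR(...)
    }
}
```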
Dimension Lookup/Update
The Dimension Lookup/Update transform came over from Kettle/PDI and has been in the codebase for ages. It is a very powerful transform that lets you build Slowly Changing Dimensions from a user interface, something that is often a lot harder than it should be in other data integration platforms.
Over time, more functionality was added to the plugin, making the user interface overcrowded and hard to use with complex dimensions.
Even though this transform is far from new, it deserves an honorable mention because of the UI cleanup Sergio (Serasoft) did.
The dialog options have been cleaned up and have been split into 4 tabs: keys, fields, technical key and versioning.
There is no impact on the functionality of the plugin, so your existing pipelines will continue to work. There's no impact on the Kettle/PDI importing process either if you still need to upgrade your PDI/Kettle projects to Hop.
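For readers new to Slowly Changing Dimensions, the Java sketch below shows the core Type 2 logic in memory, organised around the same concepts as the new dialog tabs: a natural key to look up, a tracked field whose change creates a new version, a generated technical key, and version and date-range columns. It is a simplified illustration under stated assumptions (one tracked field, a fixed 2199-12-31 end date, first versions starting at the load date), with hypothetical names. It is not the transform's implementation and ignores the other update types the transform supports, such as overwriting a field in place.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Minimal in-memory Type 2 Slowly Changing Dimension, organised around keys, fields,
 * technical key and versioning. Illustration only; all names are hypothetical.
 */
public class ScdType2Sketch {

    // Open-ended "valid until" date marking the current version of a record (assumption).
    static final LocalDate END_OF_TIME = LocalDate.of(2199, 12, 31);

    static final class DimensionRow {
        final long technicalKey;  // technical key: surrogate key referenced by fact tables
        final String naturalKey;  // keys: business key coming from the source system
        final String city;        // fields: attribute whose changes create a new version
        final int version;        // versioning: version number ...
        final LocalDate dateFrom; // ... and validity range
        LocalDate dateTo;

        DimensionRow(long technicalKey, String naturalKey, String city,
                     int version, LocalDate dateFrom, LocalDate dateTo) {
            this.technicalKey = technicalKey;
            this.naturalKey = naturalKey;
            this.city = city;
            this.version = version;
            this.dateFrom = dateFrom;
            this.dateTo = dateTo;
        }
    }

    private final List<DimensionRow> table = new ArrayList<>();
    private long nextTechnicalKey = 1;

    /** Returns the technical key to use, inserting a new version if the natural key is new or the field changed. */
    long lookupOrUpdate(String naturalKey, String city, LocalDate loadDate) {
        DimensionRow current = null;
        for (DimensionRow row : table) {
            if (row.naturalKey.equals(naturalKey) && row.dateTo.equals(END_OF_TIME)) {
                current = row;
                break;
            }
        }
        if (current != null && Objects.equals(current.city, city)) {
            return current.technicalKey; // unchanged: reuse the current version
        }
        int version = 1;
        if (current != null) {
            current.dateTo = loadDate;   // close the previous version
            version = current.version + 1;
        }
        DimensionRow added = new DimensionRow(
                nextTechnicalKey++, naturalKey, city, version, loadDate, END_OF_TIME);
        table.add(added);
        return added.technicalKey;
    }

    List<DimensionRow> rows() {
        return table;
    }
}
```

For example, loading customer C001 with city Antwerp twice returns the same technical key, while a later load with city Ghent closes the Antwerp version at the load date and returns a new technical key for version 2.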
Apache Beam upgrade
Last but not least, with Hop 2.0.0 comes an upgraded Apache Beam plugin.
Apache Beam has been a very important plugin for Apache Hop since the very early Hop days. Beam is actually the reason Hop waited to switch to Java 11: once Beam was completely Java 11 ready, Hop followed quickly.
Apache Beam is an advanced unified programming model that lets you implement batch and streaming data processing jobs that run on any supported execution engine. Popular execution engines include Apache Spark, Apache Flink and Google Cloud Dataflow.
Community
Community is an incredibly important aspect of every project in the Apache Software Foundation, and it is what drives the entire Hop team and its developers. No single organization determines Hop's roadmap or functionality: all of that is decided by a global community through discussions and votes.
Apache Hop's community has grown with every month and every release, and that continues to be the case. Since the 1.2.0 release just 3 months ago, over 200 people have joined the Hop community across its various channels.
Community has always been very important to us at know.bi (we organized 3 of the 11 PCMs, or Pentaho Community Meetings).
As contributors, developers and PMC members at Apache Hop, our community engagement has only become stronger. We're excited to see what the future brings for Hop!
Apache Hop 2.0.0 is available for download at the Apache Hop download page.