Apache Hop 2.9.0 is available

Jun 4, 2024 11:00:00 AM
Bart Maertens

Data architect and developer with over 20 years of experience in data engineering and analytics. Founder and lead of the know.bi expert team, Apache Hop co-founder and PMC member.

Two months after the 2.8.0 release, the Apache Hop community is proud to announce the availability of Apache Hop 2.9.0.

Just like in previous releases, our focus has been on hardening the Apache Hop platform while adding new functionality. This release contains two months of work by 9 contributors (2 of whom are new) on over 80 tickets.

Let's walk through what Apache Hop 2.9.0 brings for you.

Static Schema

Apache Hop offers a lot of functionality to read data from files through the CSV Input, Text File Input, JSON Input and other transforms. All of these transforms have a "Get Fields" button that scans the file to detect its layout. Sometimes, however, you know a file's layout in advance and don't want to scan the first rows of data to make a (smart) guess. This is where the static schema comes into play.

A Static Schema definition lets you specify a file layout that can be used in a CSV Input, Text File Input and other transforms.

After creating a Static Schema definition in the metadata perspective, you can now use that schema to specify a file layout in one of the supported transforms.

A new Schema Mapping transform lets you map your pipeline stream layout to a static schema definition. Fields not in your pipeline stream but specified in a static schema definition will be added at the right position in your stream with a blank value.
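
The mapping behavior described above can be illustrated with a small Python sketch. This is not Hop code: the schema, the field names and the map_to_schema helper are hypothetical, the "blank value" is represented as None, and the sketch simply ignores stream fields that are not part of the schema, which is an assumption rather than documented behavior of the transform.

```python
from typing import Any, Dict, List

# Hypothetical static schema: an ordered list of field names.
STATIC_SCHEMA: List[str] = ["id", "name", "email", "country"]


def map_to_schema(row: Dict[str, Any], schema: List[str]) -> Dict[str, Any]:
    """Lay the row out in schema order; fields missing from the row become None."""
    # Assumption: fields present in the row but absent from the schema are ignored.
    return {field: row.get(field) for field in schema}


# An incoming row that lacks "email" and has its fields in a different order.
row = {"name": "Ada", "id": 1, "country": "BE"}
mapped = map_to_schema(row, STATIC_SCHEMA)
print(mapped)
```

In this sketch the missing "email" field ends up between "name" and "country", the position the schema assigns to it, with an empty value.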

The static schema definition metadata type and schema mapping transform were developed by know.bi in cooperation with our partner Serasoft. The development of the static schema functionality was sponsored by one of our customers who is migrating from Talend to Apache Hop and is building an entirely new Apache Hop-based data engineering and data integration platform. We'll have more exciting news on that customer case soon.

CrateDB

Another new addition to Apache Hop 2.9.0 is CrateDB. CrateDB is an enterprise database for time series, documents, and vectors.

CrateDB is compatible with the PostgreSQL wire protocol, so it works with the PostgreSQL JDBC driver and relational database transforms like Table Input, Table Output, Insert/Update and others.

Since CrateDB is PostgreSQL-compatible but offers additional functionality of its own, Apache Hop 2.9.0 comes with a new CrateDB database dialect and bulk loader transform. The bulk loader transform lets you write data to CrateDB through the COPY command or the REST endpoint.
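
To give an idea of what loading rows over a REST endpoint involves, here is a minimal Python sketch that builds a bulk insert request body for CrateDB's HTTP /_sql endpoint, using its stmt and bulk_args fields. It does not show how the Hop bulk loader is implemented. The table name, column names, sample rows and the build_bulk_insert helper are all hypothetical, and the sketch only builds the request body without sending it.

```python
import json
from typing import Any, Dict, Sequence


def build_bulk_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> Dict[str, Any]:
    """Build a request body for CrateDB's HTTP /_sql endpoint with bulk_args."""
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    return {
        "stmt": f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})",
        # One inner list per row; each list supplies the values for the placeholders.
        "bulk_args": [list(r) for r in rows],
    }


# Hypothetical table and data, for illustration only.
payload = build_bulk_insert(
    "doc.sensor_readings",
    ["ts", "sensor_id", "value"],
    [[1717495200000, "s1", 21.5], [1717495260000, "s1", 21.7]],
)
body = json.dumps(payload)
print(body)
```

The resulting body would be sent as a POST request with Content-Type application/json to the /_sql path of a CrateDB node, for example http://localhost:4200/_sql on a default local installation.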

Other improvements

Apache Beam has been upgraded to 2.56.0
The REST client transform now has a configurable timeout
The database join transform now supports caching
Azure blob storage improvements
Redshift Bulk Loader dialog was rewritten to align with the CrateDB bulk loader
Improvements in database schema/table listing widget
Lots of new and updated translations. Peter Fabricius in particular has been doing a tremendous job on the German translations in recent months.
Lots of new and updated documentation

Community

The Hop community continues to grow!

The overview below shows the community growth compared to the 2.8.0 release in March:

Chat: 780 registered members (up from 729)
LinkedIn: 1,776 followers (up from 1,682)
Twitter/X: 947 followers (up from 915)
YouTube: 1,100 subscribers (up from 1,020)

Reach out if you want to find out more about Apache Hop, if you'd like to upgrade from PDI/Kettle or Talend, or if you'd like to discuss how we can help you build a successful data platform with Apache Hop.
