Imagine a couple of years ago. You are carefully exploring the market to select a data engineering platform. The platform you're looking for is innovative, extensible and customizable. You want it to be open source but commercially supported, either directly or through a local partner. Your platform of choice needs sufficient market adoption and needs to be backed by a significant community of developers and enthusiasts.
Imagine that a couple of years ago the platform you meticulously selected was Pentaho Data Integration (PDI, also known as Kettle). It definitely was the right choice at the time: PDI had an active and vibrant community of users and developers, Hadoop was happily surfing its way to the top of the hype cycle and PDI integrated perfectly with it. You could visually develop, run and debug data processing on MapReduce, and there was even support for a promising new member of the Hadoop ecosystem called Apache Spark. The future looked bright!
Fast forward to today. Your organization now depends on the PDI project you’ve built. However, Pentaho is now part of Hitachi Vantara, a large and hardware-oriented organization. Even with the best of intentions, Hitachi doesn’t seem to know what to do with this open source data engineering platform it now owns. Releases are sparse and bring no significant new functionality. New bugs appear faster than old ones are resolved. Even worse, the once vibrant community around PDI/Kettle has evaporated. What seemed to be the perfect choice back then doesn’t look so perfect right now.
As always, things needed to get worse before they got better. About a year and a half ago, in late 2019, Matt Casters (PDI/Kettle project founder and lead architect) and know.bi joined forces and created a PDI/Kettle fork: Project Hop. Being “just a fork” was never the long-term goal: we aggressively started to clean up and re-architect the code base. Compatibility meant compromise, so we broke just about every point in the API there was to break.
Joined by PDI/Kettle community members and a growing number of new contributors, Project Hop quickly took shape after its inception.
After almost a year of development, the project had moved so far away from the original code base that it was no longer considered a fork. The code base was donated to the Apache Software Foundation and has been known as Apache Hop (Incubating) since then. Check our previous post on why we think this has been a major leap forward for Hop.
Hop has come a long way since the project started: the GUI was rewritten from scratch to be pluggable and web/cloud ready, and there is life cycle support with projects, environments and Git integration. Docker is fully supported, and Kubernetes support is making progress. Pipelines designed in Hop GUI can run on the native Hop engine (in local and remote configurations), but also on Apache Spark, Apache Flink and Google Dataflow through Apache Beam. Configuration and administration have been rewritten from scratch with unified, easy-to-use command line tools. And there’s lots more!
PDI/Kettle and Hop are incompatible, so there’s no way to switch back and forth between the two platforms. However, given the shared history between PDI/Kettle and Apache Hop, there is a way forward.
Switching from PDI/Kettle to Hop offers a lot more than just a conversion from PDI jobs to Hop workflows and PDI transformations to Hop pipelines! Through the upgrade, your PDI project unlocks all of the added value in Hop: projects and environments, runtime configurations and a fast, single-click development UI, to name just a few.
In the upgrade process, you’ll receive pre- and post-upgrade overview reports as well as a detailed recommendations report. We’ll help you clean up some of your PDI work to take advantage of the optimized, cleaner and lighter Hop ways of working, either through coaching or as a service.
Finally, we’ll teach you how to change your working habits from PDI/Kettle to Hop.
Imagine... you can keep your entire PDI project history and can start innovating again!