APACHE HOP

Visually develop data pipelines to deliver high-quality work fast, integrated with bleeding-edge technology

What Is Apache Hop

Apache Hop is an innovative open source, metadata-driven data engineering and data orchestration platform that lets you visually describe your data pipelines and workflows. Apache Hop is a low-code or even no-code platform: scripting and writing code are an option, not a necessity.

Pluggable runtimes allow you to design pipelines once and run them in the environment where they fit best: in the native engine for relatively small volumes of data, or on Apache Spark, Apache Flink or Google Dataflow through the Apache Beam runtimes. Design once, run anywhere. 

Git integration, unit testing, and execution information help you to manage your project throughout its entire life cycle. 

Apache Hop Is An Open Source Platform

We believe that the only way to build an innovative software platform today is to rely on open standards and open-source software.

Global communities united around improving software introduce new concepts and capabilities faster and more efficiently. Everyone gets full visibility into the code base, as well as into all discussions about how the community develops features and addresses bugs.

Our Community

Apache Hop is developed by an open and friendly community. Everybody is welcome to join the community and contribute to Apache Hop.

There are several ways to interact with the community and to contribute to Apache Hop including asking questions, filing bug reports, proposing new features, joining discussions on the mailing lists, contributing code or documentation, improving the website, or testing release candidates.

You can also share documentation, blog posts, and other content with the Apache Hop community, or browse the content we have already published.

Data Preparation

Data comes from a wide variety of sources, often with widely varying quality. A lot of preparatory work is required to get results from data in a sound way.

Regardless of the goal of a project or the desired solution (data warehousing, big data analytics, machine learning), it is commonly assumed that around 80% of the time is spent on data preparation. Data preparation is a broad concept that can involve the collection, linking, cleaning, and writing of data.

Visual Development

By merging and cleansing data from different sources, you get better insights and turn data into useful information. Apache Hop enables people of all skill levels, including non-technical users, to be productive with data without the need to write code.

Data processes need to be easy to design, easy to test, easy to run, and easy to deploy. We believe that visually designing data processes greatly increases developer productivity.

With Apache Hop You Can Visually
  • Develop workflows and pipelines without writing a single line of code.
  • Unlock your data sources.
  • Clean, transform, and merge your data.
  • Obtain real-time insights based on consistent and correct information.
  • Turn data into useful information.

Although visually designed, all our work items can be managed like any other piece of software: version control, testing, CI/CD, and documentation are all first-class citizens in the Apache Hop platform.

Metadata Driven

Properly managed metadata is a powerful tool for making things happen. Putting that idea into practice is the basis of Apache Hop's metadata-driven model.

Apache Hop implements a strict separation of data and metadata, which allows you to design data processes regardless of the data itself. Apache Hop is entirely metadata-driven. Every object type in Hop describes how data is read, manipulated or written, or how workflows and pipelines need to be orchestrated.
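
To make the separation of data and metadata concrete, here is a minimal Python sketch. It is not Hop's actual metadata format or engine: the dictionary keys (`type`, `field`, `value`) and the `run_pipeline` function are hypothetical names chosen for illustration. The point is only that the pipeline description contains no data, so the same description can be applied unchanged to any set of rows.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# Hypothetical metadata: describes *how* rows are processed and contains no data.
PIPELINE_METADATA: List[Dict[str, Any]] = [
    {"type": "filter", "field": "country", "value": "BE"},
    {"type": "uppercase", "field": "name"},
]


def run_pipeline(metadata: List[Dict[str, Any]], rows: Iterable[Row]) -> List[Row]:
    """Apply each transform described in the metadata to the rows, in order."""
    result = [dict(row) for row in rows]  # copy so the input is not mutated
    for step in metadata:
        if step["type"] == "filter":
            result = [r for r in result if r.get(step["field"]) == step["value"]]
        elif step["type"] == "uppercase":
            for r in result:
                r[step["field"]] = str(r[step["field"]]).upper()
        else:
            raise ValueError(f"unknown transform type: {step['type']}")
    return result


customers = [
    {"name": "an", "country": "BE"},
    {"name": "bob", "country": "US"},
]
print(run_pipeline(PIPELINE_METADATA, customers))
# prints [{'name': 'AN', 'country': 'BE'}]
```

Swapping `customers` for another list of rows requires no change to `PIPELINE_METADATA`, which is the property the paragraph above describes.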

Metadata is what drives Apache Hop internally as well. Hop uses a kernel architecture with a robust engine. Plugins add functionality to the engine through their own metadata.

Apache Hop manages metadata like relational database connections, run configurations, servers, git repositories, and so on.

Apache Hop plugins can define their own metadata object types, so the types available to you depend on which plugins are installed.

Runtime Agnostic

Design once, run anywhere. Apache Hop lets you design a data pipeline and run it on the engine of your choice: your local laptop, a remote server, or Apache Spark, Apache Flink, or Google Dataflow through Apache Beam.

A Pipeline Run Configuration is a metadata object that decouples the design and execution phases of Apache Hop pipeline development. A pipeline is a definition of how data is processed; a run configuration defines where the pipeline is executed.
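
The decoupling of design and execution can be sketched in a few lines of Python. This is a conceptual model only, not Hop's API: the `Pipeline` and `RunConfiguration` classes, the `execute` function, and the engine keys `local` and `beam-spark` are all hypothetical, and the stand-in engines merely report what they would do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Pipeline:
    """What to do: an ordered list of transform names (hypothetical model)."""
    name: str
    transforms: List[str]


@dataclass(frozen=True)
class RunConfiguration:
    """Where to do it: a named choice of engine (hypothetical model)."""
    name: str
    engine: str


# Stand-in engines; real engines would execute the pipeline, these only report.
ENGINES: Dict[str, Callable[[Pipeline], str]] = {
    "local": lambda p: f"{p.name}: {len(p.transforms)} transforms on the local engine",
    "beam-spark": lambda p: f"{p.name}: {len(p.transforms)} transforms on Spark via Beam",
}


def execute(pipeline: Pipeline, config: RunConfiguration) -> str:
    """Run the same pipeline definition on whichever engine the config selects."""
    try:
        engine = ENGINES[config.engine]
    except KeyError:
        raise ValueError(f"no engine registered for {config.engine!r}") from None
    return engine(pipeline)


pipeline = Pipeline("load-customers", ["read", "clean", "write"])
print(execute(pipeline, RunConfiguration("dev", "local")))
print(execute(pipeline, RunConfiguration("prod", "beam-spark")))
```

The pipeline object is created once and never changes; only the run configuration passed to `execute` differs between the two calls.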

Runtime engines

Apache Hop supports a number of different runtime engines:
  • Beam DataFlow pipeline engine: runs pipelines on Google Dataflow over Apache Beam.
  • Beam Direct pipeline engine: runs pipelines on the direct Beam runner (mainly for testing purposes).
  • Beam Flink pipeline engine: runs pipelines on Apache Flink over Apache Beam.
  • Beam Spark pipeline engine: runs pipelines on Apache Spark over Apache Beam.
  • Local pipeline engine: runs pipelines locally in the native Hop engine.
  • Remote pipeline engine: runs pipelines in the native Hop engine on a remote machine.

Why Pluggable?

Because a pluggable architecture translates into flexibility, extensibility and maintainability. Apache Hop is focused on increasing flexibility and all the components in the platform should be pluggable.

As a developer, this makes it easy to add new functionality. As a system administrator, it gives you full control over the functionality you want to allow in your systems. As a data designer, it gives you full control to pick and choose the functionality you want to use. Being able to include only the plugins you need makes Hop a perfect fit for your DevOps and CI/CD environments.

What Is Pluggable In Apache Hop?

Hop is built around an ecosystem of plugins. This gives end users and infrastructure teams the ability to create a custom version of Hop tailored to the needs of a project or company.

The four most important plugin types are:

  1. Database Plugins
  2. Workflow Action Plugins
  3. Pipeline Transform Plugins
  4. Miscellaneous Plugins for testing
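
As a rough illustration of what a pluggable design means in practice, the Python sketch below keeps a registry keyed by plugin type and plugin id. It does not reflect how Hop actually discovers or loads plugins: the `plugin` decorator, the `installed` and `lookup` helpers, and the example plugins `trim` and `log` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical registry: plugin type -> plugin id -> implementation.
_REGISTRY: Dict[str, Dict[str, Callable[..., object]]] = defaultdict(dict)


def plugin(plugin_type: str, plugin_id: str):
    """Decorator that registers a callable under a plugin type and id."""
    def register(func):
        _REGISTRY[plugin_type][plugin_id] = func
        return func
    return register


@plugin("transform", "trim")
def trim(value: str) -> str:
    return value.strip()


@plugin("action", "log")
def log(message: str) -> str:
    return f"LOG: {message}"


def installed(plugin_type: str) -> List[str]:
    """List the ids of the plugins installed for one plugin type."""
    return sorted(_REGISTRY.get(plugin_type, {}))


def lookup(plugin_type: str, plugin_id: str) -> Callable[..., object]:
    """Fetch a plugin, failing clearly when it is not part of this build."""
    try:
        return _REGISTRY[plugin_type][plugin_id]
    except KeyError:
        raise LookupError(f"{plugin_type} plugin {plugin_id!r} is not installed") from None


print(installed("transform"))
print(lookup("transform", "trim")("  hop  "))
```

Because the core only knows about the registry, leaving a plugin out of a build removes its functionality without touching anything else, which is the kind of control the section above describes for administrators.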
 

Projects & Environments

Most developers who design and manage data processing on a daily basis work on a multitude of projects and modules.

Different sets of workflows and pipelines require management for at least development, acceptance, and production environments.

Every project or environment comes with its own set of variables and configurations for databases, file paths, etc.

Driven By Metadata

In Apache Hop, metadata items are objects used for the implementation of the data integration processes.

For example, if you need to extract or load data from a relational database, the connection to that database will be a metadata object in Apache Hop. This way, the connection is defined once and can be reused anywhere in your project.

Apache Hop allows developers to manage different projects and environments with their corresponding configurations and variables.
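
The combination of shared metadata and per-environment variables can be sketched as follows. This is a simplified stand-in, not Hop's implementation: the `${NAME}` placeholders follow the way Hop variables are usually written, but the field names, the environment names, the host values, and the `resolve` and `connection_for` functions are assumptions made for the example.

```python
import re
from typing import Dict

# One shared connection definition (hypothetical field names), written once per project.
CONNECTION = {
    "name": "crm",
    "host": "${DB_HOST}",
    "database": "${DB_NAME}",
}

# Hypothetical per-environment variable sets.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "development": {"DB_HOST": "localhost", "DB_NAME": "crm_dev"},
    "production": {"DB_HOST": "db.example.com", "DB_NAME": "crm"},
}

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(value: str, variables: Dict[str, str]) -> str:
    """Replace every ${NAME} placeholder with its value from the environment."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"variable {name} is not defined in this environment")
        return variables[name]
    return _PLACEHOLDER.sub(substitute, value)


def connection_for(environment: str) -> Dict[str, str]:
    """Resolve the shared connection definition for one environment."""
    variables = ENVIRONMENTS[environment]
    return {key: resolve(value, variables) for key, value in CONNECTION.items()}


print(connection_for("development"))
print(connection_for("production"))
```

The connection is described once, and moving from development to production changes only the variable set that is applied to it.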
