5 new features to look at in Pentaho 8.0


A look at what's new in Pentaho 8.0

On September 19th, 2017, Hitachi introduced Hitachi Vantara, a new digital company committed to solving the world's toughest business and societal challenges. The Pentaho software is a key part of this new company. On November 16th, 2017, Hitachi Vantara launched Pentaho 8.0, which brings real-time data processing to fast-track digital insights for enterprise customers. So what is this new release all about?

The first thing that catches the eye is the Hitachi rebranding: the user console and the applications have all been adjusted to the Hitachi color scheme.

Pentaho 8.0 is rebranded to the Hitachi Vantara look and feel

Let's dive a little deeper into the technical enhancements.

AEL: Enhanced and Simplified

The Adaptive Execution Layer (AEL) is used to run transformations on different engines. AEL translates the steps in your transformation into native operators of the engine you select, for example Spark on a Hadoop cluster. This lets ETL code developed in PDI run natively on Spark without modification. Compatibility is provided for the Spark libraries packaged with the Cloudera, Hortonworks and Apache distributions. Support for engines other than Spark will be added in future releases.
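
The core idea behind AEL, one engine-neutral transformation definition translated into each engine's own operators, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Pentaho's implementation: the step types (`filter`, `add_constant`) and the two "engines" (an eager list-based one and a lazy generator-based one) are hypothetical stand-ins for PDI steps and for engines such as Spark.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Step = Dict[str, object]

# Hypothetical engine-neutral transformation: just a list of step definitions.
TRANSFORMATION: List[Step] = [
    {"type": "filter", "field": "amount", "min": 100},
    {"type": "add_constant", "name": "currency", "value": "EUR"},
]


def to_eager_operator(step: Step) -> Callable[[Iterable[Row]], List[Row]]:
    """'Engine A': translate a step into an operator that builds a full list."""
    if step["type"] == "filter":
        return lambda rows: [r for r in rows if r[step["field"]] >= step["min"]]
    if step["type"] == "add_constant":
        return lambda rows: [{**r, step["name"]: step["value"]} for r in rows]
    raise ValueError(f"unsupported step type: {step['type']}")


def to_lazy_operator(step: Step) -> Callable[[Iterable[Row]], Iterable[Row]]:
    """'Engine B': translate the same step into a lazy, generator-based operator."""
    if step["type"] == "filter":
        return lambda rows: (r for r in rows if r[step["field"]] >= step["min"])
    if step["type"] == "add_constant":
        return lambda rows: ({**r, step["name"]: step["value"]} for r in rows)
    raise ValueError(f"unsupported step type: {step['type']}")


def run(steps: List[Step], translate: Callable, rows: List[Row]) -> List[Row]:
    """Translate every step for the chosen engine, then chain the operators."""
    data: Iterable[Row] = rows
    for operator in [translate(step) for step in steps]:
        data = operator(data)
    return list(data)


rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}]
print(run(TRANSFORMATION, to_eager_operator, rows))
# [{'id': 2, 'amount': 150, 'currency': 'EUR'}]
print(run(TRANSFORMATION, to_lazy_operator, rows))
# [{'id': 2, 'amount': 150, 'currency': 'EUR'}]
```

The transformation itself never changes; only the `translate` function does, which is the property that lets the same PDI code target a different engine.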

Pentaho 8.0 includes Adaptive Execution Layer (AEL) enhancements

Kafka and Streaming Ingestion in PDI

Pentaho added Kafka streaming and data publishing to PDI through a number of Kafka steps. Kafka was already available through input and output plugins in the marketplace; now Pentaho has added its own steps to PDI. With the 'Get records from stream' step you can connect to a streaming data source such as Kafka and process its records, enabling real-time processing, monitoring and aggregation. Other streaming sources will be added in the future.
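
Streaming ingestion of this kind is often implemented as micro-batching: records are pulled from the stream in small groups and each group is processed and aggregated as it arrives. The sketch below shows that general pattern in plain Python. It is an assumption-laden illustration, not how the PDI steps work internally: a simulated generator stands in for a real Kafka topic, and `batch_size` and the other names are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def micro_batches(stream: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Cut a (possibly unbounded) stream into lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:  # the stream is exhausted
            return
        yield batch


def simulated_topic() -> Iterator[Record]:
    """Hypothetical stand-in for a Kafka topic: yields keyed sensor readings."""
    readings = [("sensor-a", 3), ("sensor-b", 5), ("sensor-a", 4), ("sensor-b", 1), ("sensor-a", 2)]
    for key, value in readings:
        yield {"key": key, "value": value}


def running_totals(stream: Iterable[Record], batch_size: int) -> List[Dict[str, int]]:
    """Aggregate per key and keep a snapshot of the totals after every batch."""
    totals: Dict[str, int] = {}
    snapshots: List[Dict[str, int]] = []
    for batch in micro_batches(stream, batch_size):
        for record in batch:
            key = str(record["key"])
            totals[key] = totals.get(key, 0) + int(record["value"])
        snapshots.append(dict(totals))  # e.g. what a monitoring view would show
    return snapshots


for snapshot in running_totals(simulated_topic(), batch_size=2):
    print(snapshot)
# {'sensor-a': 3, 'sensor-b': 5}
# {'sensor-a': 7, 'sensor-b': 6}
# {'sensor-a': 9, 'sensor-b': 6}
```

Smaller batches give fresher results at the cost of more per-batch overhead; the right size depends on the data rate.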

Pentaho 8.0 includes support for Kafka streaming

Big Data Security: Named Clusters and Knox Support

Pentaho added support for the Apache Knox Gateway, which simplifies Hadoop security management by providing a secure, single point of access to the Hadoop components on a cluster. Apache Knox is a gateway security tool that provides perimeter security for Hadoop services in the Hortonworks distribution.
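
What "a single point of access" means in practice is that clients call one HTTPS endpoint and Knox routes each request to the right service inside the cluster, based on a topology name and a service path. The sketch below rewrites a direct WebHDFS URL into its Knox form. The host names and the `default` topology are hypothetical; port 8443 and the `gateway` path are Knox's usual defaults but depend on your configuration, and only WebHDFS is handled.

```python
from urllib.parse import urlsplit, urlunsplit


def via_knox(direct_url: str, gateway_host: str, topology: str,
             gateway_port: int = 8443, gateway_path: str = "gateway") -> str:
    """Rewrite a direct WebHDFS URL so the request is routed through a Knox gateway.

    Knox exposes services as https://{host}:{port}/{gateway_path}/{topology}/{service}.
    This sketch only handles WebHDFS; other services use other service paths.
    """
    parts = urlsplit(direct_url)
    if not parts.path.startswith("/webhdfs/"):
        raise ValueError("this sketch only rewrites WebHDFS URLs")
    # The cluster-internal host and port disappear: clients only ever see the gateway.
    path = f"/{gateway_path}/{topology}{parts.path}"
    return urlunsplit(("https", f"{gateway_host}:{gateway_port}", path, parts.query, ""))


print(via_knox("http://namenode.internal:50070/webhdfs/v1/tmp?op=LISTSTATUS",
               gateway_host="knox.example.com", topology="default"))
# https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
```

Because internal host names never reach the client, the cluster nodes can stay behind the firewall while Knox handles authentication at the perimeter.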

Pentaho 8.0 includes support for Apache Knox

Worker Nodes

The biggest change in Pentaho 8.0 is the addition of worker nodes. Worker nodes dynamically distribute and scale work items, such as PDI jobs, transformations and report executions, across multiple nodes.

Using worker nodes lets you:

  • Run PDI workloads at scale
  • Coordinate and monitor the items sent to the worker nodes

The worker nodes, based on Lumada technology, consist of two parts:

  • the container framework, based on Docker (the container platform developed by the company driving the container movement)
  • the orchestration framework, based on Mesos (an open-source project to manage computer clusters) and Marathon (a container orchestration platform for Mesos)
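
To make "dynamically distributing work items across nodes" more concrete, here is a generic least-loaded scheduler. It is a sketch of a common heuristic (largest item first, always to the least-busy worker), not a description of how Pentaho's worker nodes, Mesos or Marathon actually place work; the item names, cost estimates and worker names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_work(items: List[Tuple[str, int]], workers: List[str]) -> Dict[str, List[str]]:
    """Greedily give each work item to the currently least-loaded worker.

    items: (name, estimated_cost) pairs, e.g. PDI jobs, transformations or reports.
    Returns worker -> assigned item names, which a coordinator could use to monitor progress.
    """
    if not workers:
        raise ValueError("need at least one worker")
    heap = [(0, w) for w in workers]  # (current load, worker name)
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {w: [] for w in workers}
    # Placing the largest items first spreads the load more evenly.
    for name, cost in sorted(items, key=lambda item: item[1], reverse=True):
        load, worker = heapq.heappop(heap)
        plan[worker].append(name)
        heapq.heappush(heap, (load + cost, worker))
    return plan


work = [("nightly_job", 8), ("sales_report", 3), ("load_orders", 5), ("cleanup", 2)]
print(assign_work(work, ["worker-1", "worker-2"]))
# {'worker-1': ['nightly_job', 'cleanup'], 'worker-2': ['load_orders', 'sales_report']}
```

Scaling out then amounts to adding names to the `workers` list; a real orchestrator would also react to workers joining, leaving or failing at runtime.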

Pentaho 8.0 adds worker nodes to dynamically scale out large workloads

Filters to Inspect Your Data

Filters can now be added to the visualizations of your data within PDI:

  • Drill Down
  • Keep or Exclude Selected Data
  • Filters panel
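
What these filters do to the underlying data can be shown on a small in-memory table. This is only a sketch of the semantics, assuming a hypothetical region/country/sales dataset; the function names are illustrative and are not part of PDI.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def keep_selected(rows: List[Row], field: str, selected: Set[object]) -> List[Row]:
    """'Keep': show only rows whose value in `field` is one of the selected members."""
    return [r for r in rows if r[field] in selected]


def exclude_selected(rows: List[Row], field: str, selected: Set[object]) -> List[Row]:
    """'Exclude': hide rows whose value in `field` is one of the selected members."""
    return [r for r in rows if r[field] not in selected]


def drill_down(rows: List[Row], level: str, member: object,
               next_level: str, measure: str) -> Dict[object, float]:
    """'Drill down': restrict to one member of a level, then total the measure one level lower."""
    totals: Dict[object, float] = {}
    for r in rows:
        if r[level] == member:
            totals[r[next_level]] = totals.get(r[next_level], 0) + r[measure]
    return totals


sales = [
    {"region": "EMEA", "country": "NL", "sales": 10},
    {"region": "EMEA", "country": "DE", "sales": 20},
    {"region": "EMEA", "country": "NL", "sales": 5},
    {"region": "APAC", "country": "JP", "sales": 7},
]
print(len(keep_selected(sales, "region", {"EMEA"})))        # 3
print(len(exclude_selected(sales, "country", {"DE"})))      # 3
print(drill_down(sales, "region", "EMEA", "country", "sales"))  # {'NL': 15, 'DE': 20}
```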

Pentaho 8.0 adds support for filters in data visualization within PDI

Additional Big Data Formats

To extend the range of supported Big Data formats, Pentaho added support for Avro and Parquet.
Avro is an open-source data format that provides data serialization and data exchange services for Apache Hadoop. Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem. Input and output transformation steps are provided to make gathering raw data and moving data into the Hadoop ecosystem easier.
These steps can be used in transformations running on the Kettle engine or on the Spark engine via AEL.
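
The difference between a row-oriented serialization format like Avro and a columnar format like Parquet is easiest to see by pivoting a few records. The sketch below is a plain-Python illustration of the columnar idea, not the Parquet file format itself; the records and schema are hypothetical. It also shows why columns compress well: similar values sit next to each other, so even simple run-length encoding (one of the encodings Parquet supports) shrinks them.

```python
from itertools import groupby
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, object]


def to_columns(records: Sequence[Record], schema: Sequence[str]) -> Dict[str, List[Optional[object]]]:
    """Pivot row-oriented records into one list of values per schema field."""
    columns: Dict[str, List[Optional[object]]] = {field: [] for field in schema}
    for record in records:
        for field in schema:
            columns[field].append(record.get(field))  # None stands in for a null
    return columns


def run_length_encode(values: Sequence[object]) -> List[Tuple[object, int]]:
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


records = [
    {"id": 1, "country": "NL", "amount": 10},
    {"id": 2, "country": "NL", "amount": 20},
    {"id": 3, "country": "NL"},
    {"id": 4, "country": "DE", "amount": 5},
]
columns = to_columns(records, ["id", "country", "amount"])
print(columns["amount"])                      # [10, 20, None, 5]
print(run_length_encode(columns["country"]))  # [('NL', 3), ('DE', 1)]
```

A query that only needs `amount` can read that one column and skip the rest, which is the main reason columnar formats suit analytical workloads.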

Pentaho 8.0 comes with support for Avro and Parquet

