On May 16th, 2018, Hitachi Vantara released Pentaho 8.1. Although this is a minor follow-up release to 8.0 as far as version numbers go, a lot of exciting new features and improvements have been added.
Four important steps have been added or replaced in the 'Streaming' category. There are now consumer and producer steps for JMS and MQTT.
Pentaho Data Integration now has the ability to stop the input steps in a transformation. Previously, when you stopped a transformation or an abort was triggered, all steps would stop simultaneously. It is now possible to stop the input and process all remaining data currently in the transformation, which means you can safely stop a transformation processing streaming data.
Both changes fit in Hitachi Vantara's strong IoT focus.
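To make the stop-input behavior more concrete, here is a minimal Python sketch of the general pattern: the input stops reading, while rows already in flight are still processed. This is not PDI code or its API. The function names, the queue-based pipeline and the "multiply by 10" processing are all assumptions made purely for illustration.

```python
import itertools
import queue
import threading

_END = object()  # hypothetical end-of-stream marker passed between steps


def input_step(source, out_q, stop_requested):
    """Forward rows downstream until a stop is requested."""
    for row in source:
        if stop_requested.is_set():
            break  # stop reading new rows, but do not discard queued ones
        out_q.put(row)
    out_q.put(_END)  # tell downstream that no more rows will arrive


def downstream_step(in_q, results):
    """Process every row that was already emitted, then finish."""
    while True:
        row = in_q.get()
        if row is _END:
            break
        results.append(row * 10)  # stand-in for real processing


def run_until_stopped(stop_after):
    """Run an endless stream and request a stop after `stop_after` rows."""
    rows = queue.Queue()
    results = []
    stop_requested = threading.Event()

    def endless_source():
        for n in itertools.count(1):
            if n > stop_after:
                stop_requested.set()  # simulates the operator stopping the input
            yield n

    consumer = threading.Thread(target=downstream_step, args=(rows, results))
    producer = threading.Thread(
        target=input_step, args=(endless_source(), rows, stop_requested)
    )
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()  # returns only after all remaining rows are processed
    return results


if __name__ == "__main__":
    print(run_until_stopped(5))
```

The point of the sketch is that only the input reacts to the stop request. The downstream step keeps running until it sees the end marker, so no row that was already read is lost.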
First up, public clouds! Pentaho Data Integration's Virtual File System has been improved so that you can directly access files stored in Google Drive.
Another addition for the Google Cloud Platform (GCP) is a BigQuery loader. This step allows you to upload large amounts of data in bulk to BigQuery, and supports the CSV, Avro and JSON file formats.
Lastly, the connections menu now supports BigQuery connections in both PDI and the Business Analytics server.
There are some new features for the AWS platform as well.
First of all, the S3 CSV input and S3 output steps have been revised.
Another addition for AWS is that PDI is now able to assume IAM role permissions. This means that you no longer have to provide a secret key and access token in every step, which greatly increases security and flexibility, and is more in line with AWS security scenarios.
As with each of the latest releases, the list of steps that are supported in Spark through the Adaptive Execution Layer (or AEL) has grown once more.
The Group By step has been added, significantly increasing the number of scenarios where AEL and other distributed processing can be used.
Mappings (or sub-transformations), which bundle commonly used logic, are also supported from now on.
In addition to all this, you can now configure AEL to send event logging to the Spark History Server.
A lot of work has gone into the Big Data steps. PDI now supports the ORC format, for which input and output steps were added.
Many of the other Big Data steps have also been updated with new features and bug fixes in this release: Avro, Parquet, HBase, Cassandra, Splunk and MongoDB.