PCM17 - the tenth edition
After nine years, the annual Pentaho Community Meeting returned to where it started in 2008: Mainz, Germany.
Same city, same venue; the only difference was the crowd: instead of the 30-odd enthusiasts of 2008, there were now close to 300 registrations, of which over 200 showed up.
The weekend started with a hackathon on Friday evening, followed by 30+ presentations on Saturday and a brunch on Sunday.
The keynote talks are covered below.
Talks in the technical and business use case rooms are covered in separate posts:
Read our overview of the talks in the Technical room
Read our overview of the talks in the Business room
Keynotes
All about Pentaho 8.0 - Pedro Alves
Pedro, Senior VP of Community, opened with an overview of what Pentaho 8.0 will hold.
Version 8.0, which will be available on November 15th, will be a first release geared towards IoT implementations, where connections to the data have to be made before ETL.
The main focus areas are:
- Data scaling and streaming:
  - reading streaming data in and out with Kafka
  - stream processing with Spark
  - big data security with Knox
- Processing resource optimizations:
  - worker nodes to easily scale out
  - enhancements in the AEL (Adaptive Execution Layer)
- Native support for Avro and Parquet
Other improvements include:
- UI updates (a new theme to match the Hitachi branding, repository explorer)
- additional supported databases for the OpsMarts
- updates (filters) to the data exploration tool that landed in PDI with version 7.0.
What’s new in PDI 8.0 - Jens Bleuel
After Pedro, Jens Bleuel, Pentaho's Senior Product Manager for Data Integration and a community member since day one, took the stage to talk about the new features that come with PDI 8.0.
Jens explained the concept of worker nodes, introduced with PDI (and BA) 8.0.
Worker nodes will perform more or less the same function as Carte servers do today, but will be more scalable and flexible thanks to container technology. Worker nodes will be available through Docker, and support for other technologies like Kubernetes can be added in future releases.
Whereas processing on the AEL (Adaptive Execution Layer) or MapReduce scales out on data (distributes data across the cluster), worker nodes scale out on cluster size.
Jobs can be run with configuration options to run parallel loads on worker nodes; since the workers are containers, they can be shut down after their part of the job is done to reclaim CPU and memory resources.
What’s brewing in the Pentaho Labs? - Matt Casters
Matt, PDI founder and Pentaho Chief Data Architect, doesn't need any introduction ;-)
Matt talked about a number of projects currently going on at Pentaho Labs:
- PDI streaming plugin, which can be configured with the number of records or the amount of time to keep data in memory.
- PDI unit testing: this plugin allows unit testing of transformations, where steps can be excluded or skipped during testing.
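The count-or-time configuration behind the streaming plugin is classic windowing: buffer records and flush whenever either a record limit or a time limit is hit first. A minimal generic sketch of that idea (class and parameter names are illustrative, not PDI's actual API):

```python
from time import monotonic

class Window:
    """Buffer records and flush when either a maximum record count
    or a maximum time interval is reached, whichever comes first."""

    def __init__(self, max_records=1000, max_seconds=5.0):
        self.max_records = max_records
        self.max_seconds = max_seconds
        self.buffer = []
        self.started = monotonic()

    def add(self, record):
        # Restart the clock when a new window begins.
        if not self.buffer:
            self.started = monotonic()
        self.buffer.append(record)
        # Flush on whichever limit is reached first.
        if (len(self.buffer) >= self.max_records
                or monotonic() - self.started >= self.max_seconds):
            return self.flush()
        return None

    def flush(self):
        batch, self.buffer = self.buffer, []
        return batch
```

For example, `Window(max_records=3, max_seconds=60)` returns `None` for the first two records and emits the batch `[1, 2, 3]` on the third.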
Introducing Pentaho on the Hitachi Vantara Community - Jill Ross
Jill Ross, Enterprise Community Manager for Hitachi Vantara, introduced the Hitachi Vantara Community portal to the crowd.
The key takeaway from the presentation is that all Pentaho community activity will now move to the Vantara Community.
The Pentaho forums are now end-of-life: although they will remain online, they will become read-only soon.
Jill walked us through the portal, which appears to be based on the Jive platform, where users can manage their profiles and exchange content in places/spaces (open) and groups (invitation only).
CERN's Business Computing Accelerated by Pentaho - Jan Janke
After Jill Ross, CERN's Jan Janke (Deputy Group Leader of Administrative Information Systems) took the stage.
Jan started with some next-level number dropping to introduce CERN, which uses the Large Hadron Collider to increase our understanding of how the universe works, and more specifically of the first moments after the Big Bang. Some of the facts: CERN has the world's
- biggest electrical force
- biggest mass
- emptiest vacuum
- coolest environment
When the LHC is active, it:
- produces 2PB/s (2 petabytes per second!!)
- which is filtered down to 10.5GB/s
- on 91,000 cores
- uses 30PB of storage
CERN are full-stack Pentaho users. They've built a decentralized administration system to manage their multi-tenant Mondrian schemas. This entire platform is based on the Pentaho REST API, and automatically modifies parts of the schemas to apply the different naming conventions in different environments. After publication, test reports are automatically created and executed.
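To give an idea of what automating schema publication over a REST API looks like: a minimal sketch that builds an authenticated PUT request for uploading a Mondrian schema. The endpoint path, names and credentials here are hypothetical illustrations, not CERN's actual setup; check your Pentaho server's API documentation for the real endpoints.

```python
import base64
import urllib.parse
import urllib.request

def build_publish_request(server, schema_name, schema_xml, user, password):
    """Build a PUT request to upload a Mondrian schema to a Pentaho server.

    The endpoint path below is a hypothetical example; verify it against
    your server's REST API documentation before use.
    """
    url = (f"{server}/plugin/data-access/api/datasource/analysis/catalog/"
           f"{urllib.parse.quote(schema_name)}")
    req = urllib.request.Request(url, data=schema_xml.encode("utf-8"),
                                 method="PUT")
    # Pentaho's API typically accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/xml")
    return req  # the caller would pass this to urllib.request.urlopen

# Illustrative values only: server URL, schema name and credentials are made up.
req = build_publish_request("http://biserver:8080/pentaho",
                            "SalesTenantA", "<Schema/>", "admin", "password")
```

Wrapping the request construction in a function like this is what makes the "automatically modify and republish per environment" workflow scriptable: the same code can loop over tenants and environments, swapping in the right schema name and naming conventions each time.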
The Pentaho scheduler was enhanced to allow scheduling multiple reports of different types together and to save the executions and results of past runs.
Finally, version control integration via Git was implemented to enable change management and to keep the environment clean and approved.