PCM13 - Pentaho Community Meeting 2013

Jens Bleuel - Call Kettle from PostgreSQL

At PCM12, Jens showed how to run Kettle from within PostgreSQL through PL/Java. That was not geeky enough, so Jens secretly started his own secret project CCI, Community Confidential Information.

All of a sudden, Jens started talking about his Babel Fish project, that translates from French to English, trained with Matt's voice.

Believe it or not, in a demo, the transformation actually translated random words from French to English! Useless, but impressive nevertheless!!

Doug Moran - Adding new data sources to reporting

Doug elaborates on the new Kettle data source, which was already mentioned by Thomas. A transformation can be used as a data source, where XUL sits in between the PRD (Swing) and Kettle (SWT) user interfaces.

The steps needed to add a new data source are:

  • a transformation with input and output steps (names matter). Must be saved in resources/datasource dir
  • write a XUL description
  • subclass BaseStepGenericXulDialog
  • fill in lifecycle and bindings code

Maria Roldan - Advanced techniques for DW maintenance with PDI

Maria, author of a number of books on Kettle, demolishes the myth that WebDetails is about CTools and dashboards by adding ETL development to the list. After all, all of the data that is visualized in a dashboard needs to be loaded into a data warehouse before it can be shown.

Maria then walks into the main topic of her presentation: DWH techniques.

Her first point: deliver good, useful data. A transformation that completes correctly doesn't necessarily mean the loaded data is correct.

Second point: maintainability. With a date range processing example, Maria shows how to run recurring loads through parameterization and looping in versions up to 4. Starting from PDI 5.0, this process can be simplified by using the Job Executor transformation step.
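
The date range looping idea can be sketched outside PDI. The Python below is only an illustration, not Maria's job design: it splits a load window into consecutive smaller windows, the kind of values a looping job could pass to a parameterized transformation. The function name `date_windows` and the parameter names `START_DATE` and `END_DATE` are assumptions made up for this sketch.

```python
from datetime import date, timedelta


def date_windows(start, end, days_per_run=1):
    """Split [start, end) into consecutive windows of at most days_per_run days.

    Returns a list of dicts. The keys START_DATE and END_DATE are
    hypothetical parameter names, not names taken from the talk.
    """
    if days_per_run < 1:
        raise ValueError("days_per_run must be at least 1")
    windows = []
    current = start
    while current < end:
        # The last window is clipped so it never runs past the end date.
        upper = min(current + timedelta(days=days_per_run), end)
        windows.append({
            "START_DATE": current.isoformat(),
            "END_DATE": upper.isoformat(),
        })
        current = upper
    return windows


if __name__ == "__main__":
    # One parameter set per recurring load.
    for params in date_windows(date(2013, 9, 1), date(2013, 9, 4)):
        print(params)
```

Each dict stands for one run of the load. Keeping the windows half-open (start included, end excluded) avoids loading the boundary day twice.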

Finally, Maria announces updates to her existing books and a new one.

Matt Casters - Kettle 5 plugins galore

Matt -Mr. no powerpoint presentations- Casters talks about the development path to 5.0. His first topic is the meta store, a general store for all kinds of metadata.

He then jumps to extension points, which are a means to add plugins to the Kettle ecosystem for execution at specific points in the Kettle code. The rework of the plugin architecture for the 5.0 release, of which the extension points are one part, brings the number of plugin types in Kettle to 18.

Matt gives a crash overview of the instant preview, transformation metrics, usability improvements and more.

In demo mode, Matt shows how jobs can now be called from within transformations, which greatly simplifies looping over jobs. Small but meaningful improvements, like parameterizing the number of copies of a step to start, will make an ETL developer's life a lot easier.


Caio Souza - Brazilian Pentaho Community

Caio, turning 30 today (congratz!!), starts by showing some stats about Brazil. It turns out Brazil has 293 million potential Pentaho users, 200 of whom showed up at the 2013 Brazilian Community Event (BCM13?). In his animated talk, he gives a couple of other statistics about the Brazilian community, but above all invites the European community to come to Brazil for BCM14.

Caio goes on to make a warm request to contribute, even through 'simple' contributions like translations. +1 for that!

Finally, Caio highlights Saiku Chart Plus, his contribution to Saiku, a mapping component for Saiku.

Massimo Bonometto & Luca Pazzaglia - BTable, a drill anywhere component for CDE

Massimo and Luca present BTable, an extension of the standard CTools table component. This extension adds drilling and sorting functionality to Mondrian based CTools tables. Dimension levels can be added and pivoted in a dashboard table dynamically. Subtotals, parameters etc. can be added to a table, but can also be unlinked from a dashboard. A right-click on a measure opens a popup where you can select additional dimensions. When one of these dimensions is selected, a new dashboard tab is opened that drills into the selected dimension.

Francesco Corti - CMIS Input and Alfresco reporting integration

Francesco explains why he needed to be able to read data from Alfresco and other CMIS systems. He created a PDI plugin that does exactly that.

There also is an AAAR dashboard that shows auditing KPIs with Pentaho technology integrated in Alfresco.

Although more of an implementation story than a component development like the other presentations, it was really nice to have a real world story!

Thomas Morgner - Reporting SDK and chart magic

Thomas, who is not talking about crosstabs for the first time at a PCM, starts by talking about a couple of usability improvements: enhanced error reporting, a new formula dialog, and an improved color selector.

A couple of new data sources have been added to PRD. There is a printer list, but most importantly, the Kettle data source has been vastly improved.

Stylesheets can now be imported, so CSS classes and ids can now be used in a report. Additional effort has been put into windowing and orphans.

Against what was promised, Thomas mentions crosstabs anyway. They're not complete. Yet. Come back next year. Or use Saiku Reporting.

On to the SDK, which contains extension points, formulas, data sources, elements and more, with modules to create and run reports.

PRD now supports CGG charts, which is cool, and should be complete enough in the more or less near future to support the 2012 12 days of viz charts. Apart from that, the charting will be refactored to run charts either from its own query, or from the main query.

In a (very) quick demo, Thomas shows the CGG integration.

Pedro Vale - Community File Repository (CFR)

WebDetails' Pedro Vale (more than halfway through the day, and still no naming convention violations), explains that they needed a parallel solution to the Pentaho solution repository because Pentaho's repository has some problems with large files, or because customers wanted to use their own document stores.

CFR supports two repository types out of the box: file system repositories and ECMs. CFR allows permissions to be set on files and folders, and has an API to access it. CFR is available on GitHub, but is not yet part of the CTools release.

Pedro then quickly shows how a file browser component, reading from a CFR repository, can be added to any dashboard through CDE. CFR also contains a basic management interface.

David Duque & Ricardo Pires - Pentaho Solution Builder

Ricardo and David present their Pentaho configuration and deployment tool. Every serious project needs a lot of structured configuration, deployment, tests etc. At a certain point, manually maintaining all of this becomes a daunting, if not impossible task. XpandIT started using an Ant-based tool, but switched to PDI, using Ivy as a component repository, to have a more user friendly and maintainable solution. The tool still needs some work before it is production ready, but it sure is a nice start.

Pedro Alves - Sparkl, a Pentaho app builder

Sparkl is what Pedro calls 'the biggest change after CDE'. Sparkl, surprisingly not a CTool (by name), is a framework to build apps on top of the Pentaho platform.

Pedro starts by talking about and demoing an addition to dashboards that allows dashboards to provide operational capacities. The demo shows the BigWireless demo, with an option to directly order products from within the dashboard. This is not intended as a replacement for operational systems, nor will this become a major part of most dashboards, but it definitely is a nice possibility to have.

The most important part of Sparkl, however, is the application builder. These applications are a combination of PDI and CTools, and allow users to create Pentaho plugins without the need for any coding. With PDI to do the heavy data lifting and CTools to create dashboards as the front end, applications can be developed really quickly. Dashboards can be downloaded as a zipfile for easy transfer to other Pentaho servers.

As a proof of what Sparkl can do, Pedro and the WebDetails guys created and used a Sparkl application to pick two winners of Mondrian In Action hard copies. Very cool stuff!

12:50 - Tour of the palace + lunch break

100 geeks climbing up a hill.

12:20 - Julian Hyde - Optiq

After a quick introduction of his book 'Mondrian In Action', Julian talks about Optiq, a framework that creates a federated data architecture, so data from a variety of sources can be used for analytics. Data is all over the place, in different locations, formats, workloads, with different data and query latency. Julian continues to explain how we still expect the advantages of using databases, but have grown beyond using databases (exclusively).

At its core, Optiq holds a query optimizer, with a JDBC server and SQL parser and validator as optional components. On top of that, third party plugins can be added. Optiq doesn't store any data, and even gets its metadata through APIs. Julian moves on to a demo of how Optiq can be queried with SQL through sqlline.

Optiq already supports CSV, JDBC, MongoDB, Splunk and linq4j, with more data sources (HBase, Spark, Cassandra and Mondrian) to be added. Embedded adapters are Cascading (Lingual) and Apache Drill.

Julian goes on to explain in another example how Splunk and a MySQL database can be joined together through SQL, and explains the expression tree used by Optiq. He continues to explain how a little cheating is needed to provide answers to analytical questions over massive amounts of data fast, using materialized views and smart cache maintenance. All of this is exposed to the application on top of Optiq that thinks it's talking to a single database.

12:10 - Luc Boudreau - OLAP4J Ecosystem Update

Luc jumps into new features for OLAP4J, with ports to PHP, Ruby, JavaScript, attributes orientation, measure groups, parent-child hierarchies and more. OLAP4J now has a consolidated API, which allowed the BI server to provide generic OLAP4J support, and OLAP4J federation through XMLA. Demo time! After a quick demo, Luc makes a call to (plugin developers in) the audience to test the APIs and provide feedback.

11:50 - Paul Stoellberger - Saiku OLAP

Saiku Analytics 2.5 has been released recently. Recent changes include UI and performance enhancements, new MDX functions, new chart types and more. Paul switches to demo mode, shows sparklines, spark bars and a couple of charts, and shows how easy it is to embed Saiku content into your own application. Saiku Adhoc now runs on Pentaho 5.0 for a full 24 hours, and will be available for 5.0 CE when it is released. For Saiku 3, there will be more integration with dashboards, more visualizations and a new query model that will -among other functionality- allow selecting hierarchies instead of dimensions.

11:30 - Marius Giepz - Saiku Reporting

Marius starts with a little history about his Saiku Reporting project, a web-based report builder on top of the Pentaho report engine, using Pentaho metadata models. He just spent a year working on a 2.0 release. Although the front end hasn't changed drastically, a lot has changed behind the scenes. Marius switched to the latest releases of reporting and CTools, and Saiku Reporting 2.0 will only be available as a 5.0 plugin.

Saiku Reporting supports among others templates, an improved filtering interface, inline editing, a formula editor, a SQL viewer and much more. Reports are saved as an .srpt file, which is an extension of the default report format, but can be opened with the standard report viewer. Saiku Reporting even has an easy to use interface to create crosstabs, a first in Pentaho land!

Future plans include:

  • integration into Saiku OLAP
  • user defined groups and consolidations
  • import CDA data and SQL queries as data sources
  • charts
  • briefingbook

11:10 - Nelson Sousa - Mondrian and Roles, oh my!

Nelson starts with a little Pentaho 5.0 bashing, then continues to explain some of the problems he experienced using Mondrian in a scenario with a lot of roles.

Nelson describes two customer cases where hundreds, even thousands, of Mondrian roles had to be created. He solved this problem by using PDI to automate the role generation process. With a number of kettle.properties variables, Nelson reads roles from a Hibernate database, compares them to roles available in a Mondrian schema, and updates a text file accordingly. The contents of that file are pasted into the Mondrian schema, and can then be tested with another transformation.
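
The compare-and-generate step can be illustrated with a small sketch. This is not Nelson's PDI solution: it is plain Python, the list of wanted roles is passed in directly where the transformation would read it from the database, and the helper names `missing_roles` and `role_fragment` are invented for the example. The generated `SchemaGrant access="none"` is only a placeholder; the real grants depend on the security model of the project.

```python
import xml.etree.ElementTree as ET


def missing_roles(schema_xml, wanted_roles):
    """Return the wanted role names that have no <Role> element in the schema yet."""
    root = ET.fromstring(schema_xml)
    existing = {role.get("name") for role in root.iter("Role")}
    return sorted(set(wanted_roles) - existing)


def role_fragment(role_name):
    """Build a <Role> element as text, ready to paste into a Mondrian schema.

    The SchemaGrant below is a placeholder (assumption), not a
    recommended security setup.
    """
    role = ET.Element("Role", {"name": role_name})
    ET.SubElement(role, "SchemaGrant", {"access": "none"})
    return ET.tostring(role, encoding="unicode")


if __name__ == "__main__":
    # Tiny stand-in for a real schema file.
    schema = '<Schema name="Sales"><Role name="emea"/></Schema>'
    # Hypothetical role names; in practice these come from the database.
    wanted = ["emea", "apac", "latam"]
    for name in missing_roles(schema, wanted):
        print(role_fragment(name))
```

Only roles that are missing from the schema get a fragment, so rerunning the sketch against an updated schema produces nothing new.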

11:50 - Break

10:40 - Pedro Teixeira - Pentaho 5.0 Overview

Pedro Teixeira -compliant with the WebDetails naming convention- gives an overview of what has changed in Pentaho 5.0. First of all, the BI server switched to a JCR (Jackrabbit) based repository. More visibly, there has been a complete redesign of the Pentaho User Console. This includes, among others, new plugin interfaces and full screen configuration interfaces. The Administration Console has been rewritten as a perspective in the user console, including a new scheduler interface. For developers, there is now a REST API and a redesign in the user interface. In a quick demo of the 5.0 user console, Pedro shows how content can be opened and scheduled. He also shows a quick overview of the administration interface.

10:20 - Will Gorman - Sugar: An Architectural Overview (Daniel Einspanjer)

Will talks about updates to the plugin architecture and an increasingly jQuery-based thin client UI. He goes on to talk about marketplace metadata, and how that can be used to get your project on the marketplace. He then passes the floor to Daniel Einspanjer, a long time community member who recently joined Pentaho.

Daniel talks about Kettle Storm, a Pentaho Labs project he has been working on recently. Kettle Storm allows Kettle transformations to run on a Storm cluster. In a quick demo, Daniel shows how a transformation can be pushed to a Storm cluster to be executed there.

10:00 - Jake Cornelius - Sneak Peek into Pentaho Roadmap

Jake starts by stressing Pentaho's reconnection with the community. These guys seem to be serious about it. Jake then picks up Doug's key takeaways, and elaborates on the importance of usability. Part of that effort is the redesigned and simplified User Console for 5.0. This includes the switch from a database to a content management (Jackrabbit) based repository, which opens up a world of possibilities for all kinds of content. Another change in the 5.0 release is the development of an administration perspective as a replacement for the Administration/Enterprise console. The second takeaway is data blending, on-the-fly data combination. The technology powering this is PDI (is there anything this thing can't do?), serving the results of a transformation through a JDBC driver. Jake then highlights a couple of improvements in PDI: transactional jobs, checkpoints, and load balancing, to name just a few. For reporting and analysis, MongoDB has been opened up as a data source. Jake points out how Pentaho is extremely fit as a platform to build extensions on, or to integrate into other applications, and re-highlights the importance of the Labs for Pentaho. Finally, Jake gives a quick overview of the Pentaho roadmap, with a key focus on disclosing more and more data sources to the user.

09:40 - Doug Johnson - Pentaho Company Overview

Doug talks about how Pentaho fits into the BI marketplace. Three cornerstones in Pentaho's strategy are cost reduction of data warehousing, big data and business optimization. Key takeaways for Pentaho are simplified analytics with a modern interface, blended big data and enterprise ready data integration with simplified integration. Pentaho's business momentum is growing, with revenue up across business analytics, big data and embedded analytics. With this pure revenue growth, Pentaho was able to keep a 95% customer satisfaction. After an overview of these results and a quick presentation of the Pentaho management team, Doug mentions the good 5.0 market reception. Doug then goes into a little detail about the WebDetails acquisition, and stresses how that relates to Pentaho's renewed attention to the Community, which is a welcome change in direction. Doug's last point is the introduction of Pentaho Labs, Pentaho's think tank for innovation, with predictive analytics as a first project.

9:20 - Pedro Alves - opening PCM13

Pedro talks about the Community (CE) and the Enterprise Editions (EE). The Enterprise Edition is a stable and reliably supported environment for paying customers, whereas the Community Edition works as -among others- a developer and lab environment, where the community can not only download and use Pentaho software, but can also develop and test new functionality.

One of the cornerstones in the CE is the marketplace, where people can download plugins to the platform. Pedro sees the Pentaho BI platform as an application platform like iOS or Android, where people can create and upload their applications, and thus make Pentaho go as viral as possible. Pentaho 5.0 CE will be available somewhere during October 2013, but can't be released before the CTools live happily on the 5.0 server.

Pedro continues to give a quick overview of the new 5.0 landing page, the documentation and support channels (IRC among others, where each of the Pentaho architects has 'office hours').

8:50 - crowd moving in