With Jan Aertsen, last year's PCM blogger, out sick, live coverage of the Pentaho Community Meeting 2012 will be done by yours truly. The room is rather dark and the pictures are taken with my smartphone, so apologies for the poor quality...
Update 2012-10-01: added links to shared presentations and group photo.
After coffee and cake, Doug starts this year's community meeting with a round of introductions.
Will Gorman introduces the Pentaho development and test team at the Pentaho Ivory Towers in Orlando. After the introduction, he talks about the importance of big data, the open sourcing of the big data software components and the move to GitHub.
After this overview, Will talks about the major enhancements in the Sugar release, with features like the new repository (JCR-based), data source management, REST APIs and a new scheduler. The publish password will be replaced by action-based security, allowing users to publish content based on their role. The Administration Console will be put to rest and replaced by an administration perspective in the PUC (Pentaho User Console).
At the end of his session, Will mentions the new Pentaho Market Place, a collaboration between Pentaho and WebDetails, to be released in the fall of 2012.
Will ends his session with a Q&A, covering, among other topics, repository import/export and migration automation.
Slawo starts by saying he is working on smoothing the Pentaho CE contribution process. He then promises to fix a PDI bug if anyone correctly counts the 3D pie charts in his presentation.
Slawo uses the business user persona 'Linda' to give an overview of the BI stack (ETL, reporting, PME or Pentaho Metadata Editor, WAQR (sublimely renamed 'Wanker'), ...). From PME and WAQR, he jumps to Saiku and Saiku adhoc reporting as production-ready CE replacements for JPivot and Wanker respectively.
Matt starts by claiming he didn't want to write a presentation, then pulls out his PowerPoint presentation and bashes Apple and Dutch 'gastronomy' within one minute.
The current status of Kettle is:
Planned features for Kettle 5.0 are:
|Matt showing off the uebercool Kettle Metrics Gantt chart|
|Metadata in Kettle 5.0. Color effects are offered for free by the Christmas lights on the ceiling.|
|PCMAM's interpretation of CCC: Coffee, Cake, Chat|
Edwin starts with a quick introduction of Dan Lindstedt's Data Vault: hubs, links and satellites.
He then discusses how he manages a number of Data Vault projects through Kettle, mainly at the St. Antonius Ziekenhuis in Utrecht. He walks through the design decisions and shows how metadata is kept in Excel files, and how hubs, links and satellites are generated from that Excel metadata. Version management is done in Git.
Great stuff! Get Edwin's framework from SourceForge.net here.
Update 2012-10-01: slides!
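To give an idea of what "metadata driven" means for a Data Vault, here is a minimal Python sketch that turns a few metadata rows into table definitions for hubs, links and satellites. This is not Edwin's framework (which runs on Kettle and reads Excel files). The metadata layout, the table and column naming, and the data types are all assumptions made up for illustration.

```python
# Hypothetical metadata rows, standing in for what an Excel sheet might hold.
METADATA = [
    {"type": "hub", "name": "patient", "business_key": "patient_nr"},
    {"type": "hub", "name": "department", "business_key": "department_code"},
    {"type": "link", "name": "admission", "hubs": ["patient", "department"]},
    {"type": "satellite", "name": "patient_details", "parent": "hub_patient",
     "parent_key": "patient_hk", "attributes": ["name", "birth_date"]},
]

# Audit columns appended to every table (an assumed convention).
AUDIT_COLUMNS = [
    "load_dts TIMESTAMP NOT NULL",
    "record_source VARCHAR(50) NOT NULL",
]


def create_table(table, columns, constraints=()):
    """Render a CREATE TABLE statement from column and constraint definitions."""
    body = ",\n  ".join(list(columns) + AUDIT_COLUMNS + list(constraints))
    return f"CREATE TABLE {table} (\n  {body}\n);"


def build_ddl(row):
    """Generate the DDL for one metadata row (hub, link or satellite)."""
    kind, name = row["type"], row["name"]
    if kind == "hub":
        # A hub holds one business key plus a surrogate hash key.
        columns = [
            f"{name}_hk CHAR(32) PRIMARY KEY",
            f"{row['business_key']} VARCHAR(100) NOT NULL",
        ]
        return create_table(f"hub_{name}", columns)
    if kind == "link":
        # A link relates two or more hubs through their hash keys.
        columns = [f"{name}_hk CHAR(32) PRIMARY KEY"]
        columns += [
            f"{hub}_hk CHAR(32) NOT NULL REFERENCES hub_{hub}"
            for hub in row["hubs"]
        ]
        return create_table(f"link_{name}", columns)
    if kind == "satellite":
        # A satellite stores descriptive attributes, historized by load date.
        key = row["parent_key"]
        columns = [f"{key} CHAR(32) NOT NULL REFERENCES {row['parent']}"]
        columns += [f"{attr} VARCHAR(100)" for attr in row["attributes"]]
        return create_table(f"sat_{name}", columns,
                            [f"PRIMARY KEY ({key}, load_dts)"])
    raise ValueError(f"unknown metadata type: {kind}")


if __name__ == "__main__":
    for row in METADATA:
        print(build_ddl(row), end="\n\n")
```

The point of the approach is that adding a hub or satellite means adding a metadata row, not writing a new transformation by hand.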
After the family mutiny Jens had to deal with after stealing his son's model trains last year, we had to make do with a video of Jens's Kettle-controlled model trains. Steel Wheels like never before, or as Doug stated, 'this already blew one sock off'. Jens continues with a video of a Kettle-controlled model helicopter. But can it make coffee? If it can fly a helicopter, that can't be much of a challenge...
On to the serious stuff!
Integrate and embed Kettle into PostgreSQL. Jens uses Windows 7 (seriously?) 64-bit, Java, PostgreSQL + PL/Java + PDI to call Kettle from within PostgreSQL. There is still some work to be done (e.g. PostgreSQL Java calls are single-threaded, so he needs to use the Kettle Single Threader), but this is neat stuff! More information at kettle.bleuel.com.
|Cora and Yvonne preparing lunch while the bunch of us are growing sitzfleisch|
Mondrian 4 holds relatively few visible changes, but will make the life of schema developers far easier.
New in Mondrian 4, planned for beta release next week, are attributes, measure groups (groupings of measures on different levels of granularity etc), physical schema, internals improvements (performance, reliability). Because of the amount of changes, this is going to be a long beta.
While Paul is preparing his demo of Saiku on top of Mondrian 4, Luc mentions how each level in a hierarchy can now be used as a standalone object. This will offer a lot more flexibility and, as Pedro points out, will allow a year and a month level to be used on different axes.
|Vampire Luc drinks blood after dusk|
Mondrian 4 will contain:
Mondrian 4 is an omelette, so existing stuff had to be broken:
Downloads are available from Pentaho CI and will be pushed to Sourceforge in the course of next week. Test, file bugs, contribute if you want to speed this up!
Coming up: the Mondrian book, eta May 2013!
The road ahead:
Update 2012-10-01: slides!
Luc explains why and how Mondrian was scaled to run on top of 140 petabytes (compared to 140 years of HD video). Apart from the amount of data, security (through programmatic roles) and scalability turned out to be the main challenges.
In scalability, specific topics that needed to be covered were caching, synchronization without locks and blocks, memory rollup, indexing and aggregation.
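For those wondering what an in-memory rollup is: when the cache already holds cells at a fine grain (say year, month, product), a coarser request (year, product) can be answered by aggregating the cached cells instead of going back to the database. The Python sketch below only illustrates that idea. It is not Mondrian's implementation, the data and names are made up, and it only holds for additive measures such as sums and counts.

```python
from collections import defaultdict


def rollup(cells, dims, keep):
    """Aggregate cached cells to a coarser grain by summing over dropped dimensions.

    cells: dict mapping a tuple of dimension members to a summed measure
    dims:  names of the dimensions, in the order used in the cell keys
    keep:  names of the dimensions to keep in the result
    """
    positions = [dims.index(d) for d in keep]
    result = defaultdict(float)
    for key, value in cells.items():
        coarse_key = tuple(key[i] for i in positions)
        result[coarse_key] += value
    return dict(result)


# Hypothetical cached segment: sales by (year, month, product).
cached = {
    (2012, 1, "bikes"): 10.0,
    (2012, 2, "bikes"): 15.0,
    (2012, 1, "cars"): 7.0,
    (2011, 12, "bikes"): 4.0,
}

# Answer a (year, product) request from the cache, without touching the database.
by_year_product = rollup(cached, ("year", "month", "product"), ("year", "product"))

if __name__ == "__main__":
    print(by_year_product)
```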
|Vampire Luc, the second coming|
New features in Saiku 2.4 will mainly be a switch to the Apache license, an updated Excel export (contributed by Sergio Ramazzina, @sramazzina) with a summary sheet, and an explain plan.
Fun stuff that Paul has been working on are sparklines, heat grids, subtotals, parameters, new visualizations, and drilling.
Almost as a 'One more thing', Paul mentioned and showed crosstabs in Saiku adhoc. Way cool!
|Look through the colored bands to see the sparklines (right) in Saiku|
Update 2012-10-01: slides!
Julian shows how OptiQ allows you to query data through SQL from big data sources and from 2 or more data sources.
"OptiQ does a lot of database-like stuff, but it is not a database."
OptiQ is a really, really smart JDBC driver, a framework and a data source management system.
Thomas only brought 1 slide, and shifts to demoing PRD crosstabs immediately.
Rendering a crosstab takes a (whoooole) lot of time and modifying the layout is still a very tedious -or as Thomas calls it- "developerish" task. This functionality is -imho- not ready for prime time, but it definitely is a step in the right direction. The Big Release is planned for the Sugar release, spring 2013.
Roland gives us an update about his xmla4js project.
"We had to port HttpRequest to Node.js, because it didn't exist." O, it's just that...
xmla4js lets you create a thin client through REST, without having to deal with XML/A directly.
Roland goes on to show xmla4js as a browser-based XML/A command line tool to work on XML/A directly, or as a query tool.
xmla4js is also available as a BI server plugin.
Download xmla4js here.
OrgBox is a drag-and-drop UI to draw organization charts. Employees can be assigned to posts, files can be associated, KPIs can be identified for what-if scenarios etc. A tablet version of OrgBox is on the roadmap.
OrgBox is not open source (tssss....) and not really stable yet. The executable of OrgBox is available for free, but if an extension is requested, there is a cost involved (cough up and/or provide a customer reference).
A Mondrian schema can be run on top of OrgBox data.
Update 2012-10-17: presentation
CDM provides version control of dashboards, synchronization of multiple dashboards and support for multitenancy. For example, CDM can detect what changes have been made to a dashboard, and apply those changes to other dashboards.
CDP writes data from the BI server to databases, parameterizes SQL and code, and provides hot swappability of code, which is demoed by Cees by hacking into one of the CDP files and showing the changes in a dashboard.
Next, Cees demonstrates version management tools in CDM (commit, diff, drop last commit, ...).
CDB aims to provide a central repository for your data sources, based on CDA.
In short, what CDB does is:
CDC is a Hazelcast implementation that allows you to:
With CDC, a cluster of caching servers can be put in place, to provide more caching memory than a single machine can offer and/or to take the memory load away from the BA server.
WebDetails (Pedro Alves) - CDV - Community Data Validator
After showing the WebDetails timeline, Pedro says that nothing annoys him more than a customer telling him that the data for their project is wrong, no matter what the reason is. This annoyance triggered the development of CDV.
After showing CDV, Pedro did a bit of freewheeling with upcoming CCC charts and other CTools work.
Jos van Dongen / Aly van Zalk - Antonius Intelligence update/meta data driven dashboards
The previous sessions have taken more time than expected, so Aly and Jos promise to do a St. Antonius presentation on steroids in 5 instead of 15 minutes.
Aly shows a number of dashboards that would have gotten an approving nod from Stephen Few. These dashboards let you click through to the individual patient record.
Jos explains how the hospital uses a KPI generating framework that is totally metadata driven. A management interface was written in WaveMaker.
Jos mentions a contribution to PRD by Slawo that lets you set a color for a given data value. This functionality was needed because patients in the ER get a triage color code. That color code needs to be represented in the chart, whether there is data for a given triage code or not.
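The idea of tying a color to a data value rather than to a series position can be sketched in a few lines of Python. This is not the PRD contribution itself. The triage categories and hex colors below are assumptions chosen for illustration, not the hospital's actual codes.

```python
# Hypothetical triage categories with a fixed color each (assumed values).
TRIAGE_COLORS = {
    "immediate": "#cc0000",
    "very urgent": "#ff8800",
    "urgent": "#ffdd00",
    "standard": "#33aa33",
    "non-urgent": "#3366cc",
}


def chart_series(counts):
    """Return (code, count, color) for every triage code, in a fixed order.

    Codes without data get a count of 0, so each code keeps its own color
    and position no matter which codes happen to occur in the data.
    """
    return [
        (code, counts.get(code, 0), color)
        for code, color in TRIAGE_COLORS.items()
    ]


if __name__ == "__main__":
    # Only two triage codes occur in this (made-up) data set.
    for code, count, color in chart_series({"urgent": 3, "standard": 8}):
        print(f"{code:12} {count:3} {color}")
```

If colors were assigned by series position instead, a day without "immediate" patients would shift every other code to the wrong color.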
After sending off everyone who is not interested in compiling Java code, about 15 people were left in the room.
Slawo starts by explaining that this will be a hands-on session, but not a full-fledged class.
The rest of this session involves writing and running Java code.