Data comes in a large variety of formats, from different sources and often at different quality levels. To reliably extract results from data, a lot of preparatory work is required.
Regardless of a project's goal or the desired solution (data warehousing, big data analytics, machine learning), a common rule of thumb is that 80% of the time is spent on data preparation. Data engineering is a broad term that covers areas like data acquisition, linking data sets, data cleaning, and the actual loading of data into the desired format or target platform.
To be efficient at data preparation, having the right tools for the task is crucial. Kettle (also known as Pentaho Data Integration or PDI) is an open source data integration platform with over 15 years of history.
Kettle lets you develop data streams or pipelines visually. After the initial development, the Kettle code is managed like any other software, including version control, testing, CI/CD and so on.
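As a minimal sketch of what managing Kettle code in a CI/CD pipeline can look like: transformations and jobs developed visually are saved as files (.ktr and .kjb), committed to version control, and executed headlessly with PDI's command-line tools Pan (transformations) and Kitchen (jobs). The paths and file names below are hypothetical, and PDI_HOME is assumed to point at your PDI installation.

```shell
#!/bin/sh
# Hedged example: running Kettle artifacts headlessly, e.g. as a CI build step.
# Assumes PDI is installed locally and PDI_HOME points at its install directory;
# my_transformation.ktr and my_job.kjb are placeholder names for files kept
# under version control alongside the rest of the project.

# Run a transformation with Pan; -level controls log verbosity.
"$PDI_HOME/pan.sh" -file=my_transformation.ktr -level=Basic || {
  echo "Transformation failed" >&2
  exit 1
}

# Run a job with Kitchen the same way.
"$PDI_HOME/kitchen.sh" -file=my_job.kjb -level=Basic || {
  echo "Job failed" >&2
  exit 1
}
```

Both Pan and Kitchen return a non-zero exit code on failure, which is what lets a CI server mark the build as broken when a pipeline step fails.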
Additionally, visual development allows developers, data engineers and data scientists to focus on what needs to be done, rather than on the mechanics of preparing the data.
Kettle supports a large number of data formats, is able to talk to every significant data platform in the market, and has extensive options to build scalable and extendable solutions.
know.bi has been involved in the development of Kettle from a very early stage. We know the platform inside out and can maximize your return on investment. Apart from standard services, we can provide help in tailoring the Kettle platform to your needs.
To make sure we can keep serving our customers as well as possible, we're heavily involved in Project Hop, which started from the Kettle code base but focuses on innovative data engineering. Read more about Project Hop and our motivation to be involved in the project.