Easy to use with the power to integrate all data

Pentaho Data Integration



Pentaho Data Integration (PDI), based on the Kettle open source project, is the world's most popular ETL platform.
PDI provides a visual development environment (Spoon) to design ETL (transformation) and orchestration (job) workflows. 


In 'traditional' ETL, data is onboarded from a variety of data sources, such as application databases and flat or office file formats, and loaded into star schemas in a data warehouse. Pentaho Data Integration has all the required functionality to build these end-to-end ETL streams with ease.
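To make the flow above concrete, here is a minimal sketch in plain Python of an extract, transform and load run that fills a tiny star schema. It is not PDI code: the sample data, the table names (dim_customer, fact_sales) and all function names are assumptions made for illustration, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract: one row per sale.
SOURCE_CSV = """order_id,customer,country,amount
1,Acme,BE,120.50
2,Globex,NL,80.00
3,Acme,BE,42.25
"""


def extract(text):
    """Read the flat file into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Convert types and tidy up values."""
    return [
        {
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip(),
            "country": r["country"].strip().upper(),
            "amount": float(r["amount"]),
        }
        for r in rows
    ]


def load(conn, rows):
    """Load a customer dimension and a sales fact table."""
    conn.execute(
        "CREATE TABLE dim_customer ("
        "customer_key INTEGER PRIMARY KEY, name TEXT UNIQUE, country TEXT)"
    )
    conn.execute(
        "CREATE TABLE fact_sales ("
        "order_id INTEGER PRIMARY KEY, customer_key INTEGER, amount REAL)"
    )
    for r in rows:
        # Insert the dimension row only once per customer.
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer (name, country) VALUES (?, ?)",
            (r["customer"], r["country"]),
        )
        (key,) = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE name = ?",
            (r["customer"],),
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            (r["order_id"], key, r["amount"]),
        )
    conn.commit()


def run_etl():
    """Run the three stages against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(SOURCE_CSV)))
    return conn


def revenue_by_customer(conn):
    """A typical star-schema query: join the fact table to a dimension."""
    return conn.execute(
        "SELECT d.name, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_customer d ON d.customer_key = f.customer_key "
        "GROUP BY d.name ORDER BY d.name"
    ).fetchall()
```

In PDI, each of these stages would typically be a step in a visually designed transformation rather than hand-written code.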

Big Data

In Big Data environments, Pentaho can be used to work with your Hadoop distribution. Through adaptive layers, developers can use PDI to develop data integration code without having to worry about the underlying Hadoop version or distribution: the adaptive layer takes care of those differences.
By visually developing and orchestrating your Big Data engineering, Hadoop doesn't have to be hard.

With the new Adaptive Execution Layer, Pentaho Data Integration jobs and transformations that were built in Spoon can be executed against other runtimes like Apache Spark. 

Data Science

The Data Science Pack and other built-in functionality empower data scientists to work with a variety of libraries and algorithms in R, Weka and Python.

Data Blending and data services

PDI is the only data integration platform that bridges the gap between traditional ETL, Big Data and data science through data blending.
With data services, PDI transformations can be queried with standard SQL to unlock any combination of data sources in real time.

Extensible, embeddable, customizable

Pentaho Data Integration comes with hundreds of built-in steps to perform almost any task out of the box.
However, if there is a need to modify or extend the existing functionality, PDI comes with tens of different plugin types that allow PDI to be extended to perform any possible data task.


Pentaho Data Integration is used by hundreds of enterprise level customers around the globe. 
Know.bi has been involved with the Kettle project and Pentaho Data Integration from the very beginning, and is your go-to partner to make your Pentaho (Data Integration) journey a success.
We offer training, coaching, or can take care of your entire implementation for you. 
Contact us if you want to find out how we can help. 

Increase your Business Analytics flexibility and lower costs in the cloud

Amazon Web Services



With an expected combined revenue for 2017 of close to 250 billion USD, cloud computing is quickly taking over the enterprise. 

Instead of buying, configuring and maintaining all of your infrastructure on premise, cloud computing lets you use all the resources you need and pay for them only when you need them.
Without having to worry about your infrastructure, your organization can free up significant amounts of time and resources to focus on what really matters: your business!

Being a Business Intelligence/Analytics company, know.bi focuses on the analytics products in the AWS stack: 

  • Quicksight: an analytics solution that lives within and works with your AWS infrastructure. 
  • Redshift: a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools.
  • Glue: (not yet available) a fully managed ETL service that makes it easy to move data between your data stores.

Apart from the AWS offerings, we can help you in moving your existing implementation to the cloud to optimize flexibility and reduce costs. 

Amazon Web Services (AWS) is the undisputed market leader. With solutions for storage (S3), computing (EC2) and (analytical) databases (RDS), AWS has everything organizations need to move their business analytics implementation to the cloud. 

AWS Standard Consulting Partner

Know.bi is a Standard AWS Consulting Partner. We have certified consultants and extensive experience in implementing analytical solutions in the AWS cloud. 
Contact us to find out more.



Hadoop, IoT

Big Data


Big Data 

Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.


Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework.

The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across the nodes in a cluster. To process the data, Hadoop transfers packaged code to the nodes, which then process in parallel the blocks they hold. This approach takes advantage of data locality (nodes manipulating the data they have access to) to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.
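The split, map and reduce flow described above can be imitated in a few lines of single-process Python. This is only a sketch of the programming model, not Hadoop code: the function names and the tiny block size are assumptions made for illustration, and on a real cluster each block would be mapped on the node that stores it.

```python
from collections import defaultdict


def split_into_blocks(lines, block_size):
    """Split the input into fixed-size blocks, as a distributed file system would."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map phase: runs where the block lives and emits (word, 1) pairs."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Group all emitted values by key across the blocks."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce phase: combine the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, block_size=2):
    """Count words by mapping each block separately and reducing the results."""
    blocks = split_into_blocks(lines, block_size)
    mapped = [map_block(block) for block in blocks]  # parallel on a real cluster
    return reduce_counts(shuffle(mapped))
```

Because every block is mapped independently, adding nodes lets more blocks be processed at the same time without moving the data first.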

know.bi is a Hortonworks partner with several Hortonworks Certified consultants. 

Internet of Things 

The Internet of Things (IoT) is the network of physical objects (devices, vehicles, buildings and other items) that are embedded with electronics, software, sensors and network connectivity, which enables these objects to collect and exchange data. The Internet of Things allows objects to be sensed and controlled remotely across existing network infrastructure. This creates opportunities for more direct integration of the physical world into computer-based systems and results in improved efficiency, accuracy and economic benefit. When IoT is augmented with sensors and actuators, the technology becomes an instance of the more general class of cyber-physical systems, which also encompasses technologies such as smart grids, smart homes, intelligent transportation and smart cities. Each thing is uniquely identifiable through its embedded computing system but is able to interoperate within the existing Internet infrastructure. Experts estimate that the IoT will consist of almost 50 billion objects by 2020.

Contact us to find out how we can help!

Relationships matter




All data is related. In traditional relational databases, the data is stored in the database, but the actual relations between the data need to be calculated at query time through joins.
Graph databases like Neo4J, on the other hand, store data and the relations between data points directly in the database. By storing the relationships with the data, graph databases allow simple and fast retrieval of complex hierarchical structures that are difficult or impossible to model in relational systems.
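The difference between computing relations at query time and storing them with the data can be sketched in plain Python. This is an illustration of the idea only, not Neo4J code: the sample employees, the reports_to pairs and all function names are assumptions.

```python
# Relational style: entities and relationships live in separate tables,
# and the link between them is worked out when the query runs.
employees = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cid"},
    {"id": 4, "name": "Dee"},
]
reports_to = [(2, 1), (3, 1), (4, 2)]  # (employee_id, manager_id)


def direct_reports_join(manager_id):
    """Find direct reports by matching rows at query time (a nested-loop join)."""
    return sorted(
        e["name"]
        for (emp_id, mgr_id) in reports_to
        if mgr_id == manager_id
        for e in employees
        if e["id"] == emp_id
    )


# Graph style: each node keeps direct references to its related nodes.
class Node:
    def __init__(self, name):
        self.name = name
        self.reports = []


def build_graph():
    """Store every relationship on the node it starts from."""
    nodes = {e["id"]: Node(e["name"]) for e in employees}
    for emp_id, mgr_id in reports_to:
        nodes[mgr_id].reports.append(nodes[emp_id])
    return nodes


def all_reports(node):
    """Walk the whole hierarchy by following stored relationships."""
    names = []
    for child in node.reports:
        names.append(child.name)
        names.extend(all_reports(child))
    return names
```

In the relational version every extra level of the hierarchy needs another join, while the graph version simply keeps following references.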

Relations or graphs are at the core of how we deal with data. By treating relationships as first-class citizens, graph databases open an entirely new world of possibilities for enterprises to build intelligent solutions. Although graph databases are best known for their use in social networks, there are many other use cases, such as path finding and recommendation engines.
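Path finding, one of the use cases mentioned above, comes down to walking stored relationships. The sketch below uses a breadth-first search over a plain Python dictionary. It is an assumption-laden illustration, not how any particular graph database implements it: the ROUTES data and the shortest_path function are made up for this example.

```python
from collections import deque

# Hypothetical network: each key lists the places directly connected to it.
ROUTES = {
    "Antwerp": ["Brussels"],
    "Brussels": ["Antwerp", "Ghent", "Liege"],
    "Ghent": ["Brussels", "Bruges"],
    "Bruges": ["Ghent"],
    "Liege": ["Brussels"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # remembers how each node was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = node
            if neighbour == goal:
                # Rebuild the path by walking back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None
```

Calling shortest_path(ROUTES, "Antwerp", "Bruges") walks outward one hop at a time until it reaches the goal.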

Neo4J is the world's most popular graph database. It is built on open source and is available as a 100% open source community edition and as a commercial enterprise edition.
The enterprise edition adds, among other things, unlimited graph size, clustering capabilities and additional security functionality (e.g. LDAP, Kerberos).


Customers like Airbnb, Deutsche Bahn and Airbus use Neo4J to get insights into their data that wouldn't have been possible with relational databases.
Are you interested in finding out how Neo4J can help you take your data to the next level?
Contact us to find out how we can help!

Get lightning fast access to your data!





Relational databases have done a great job over the last half century of storing and retrieving data through CRUD (Create, Read, Update, Delete) operations.
However, the row-based relational architecture is a poor fit for analytical scenarios. Analytical workloads consist of a relatively small number of very large select queries, whereas row-based databases were designed to handle the large numbers of small CRUD operations typically found in application (OLTP) databases.

Column-oriented or analytical databases like Vertica were developed with analytical workloads in mind. The data is stored in a columnar (as opposed to row-oriented) architecture and is compressed to reduce physical disk operations. Because of this columnar architecture and compression, there is no longer a need for indexes, partitioning and other time-consuming tasks. Performance comes by design.
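A small Python sketch can show why a columnar layout suits analytical queries and compresses well. It illustrates the general idea only and says nothing about how Vertica stores data internally: the sample rows, the column names and the choice of run-length encoding are assumptions.

```python
# Hypothetical sales rows: (country, month, amount).
ROWS = [
    ("BE", "2017-01", 10),
    ("BE", "2017-01", 20),
    ("BE", "2017-02", 5),
    ("NL", "2017-02", 7),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def total(columns, name):
    """An aggregate only has to read the one column it needs."""
    return sum(columns[name])
```

Values within a single column tend to repeat, so a column such as country shrinks to a handful of runs, and a sum over amount never touches the other columns.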

Vertica is the industry's most advanced analytical database. It can be used on premise, in the cloud or on Hadoop. Vertica can be scaled up (bigger machine) or scaled out (more machines) through clustering, so you'll always be able to query even the largest data sets in no time.

On top of the pure SQL analytical functions, Vertica included embedded R from the early releases.
With Vertica 8, machine learning is more accessible and intuitive to use than ever before, allowing data science directly from the database.

Facebook uses Vertica on a cluster that started out at roughly 300 nodes and more than 6 PB of data.
The 2012 Obama election campaign used Vertica to store and analyse voter behaviour.

Are you interested to find out how you can use Vertica to get valuable insights from your data? 
Contact us to find out how we can help!


Pentaho Business Intelligence

Pentaho Suite




Pentaho is the only complete commercial open source business intelligence platform in the market, consisting of components for ETL, reporting, OLAP, dashboards, Big Data, data science and more. 

Pentaho is available as a 100% open source Community Edition and an Enterprise Edition. Apart from enhanced functionality for administration, deployment, visualization and more, the Enterprise Edition comes with professional support and training credits. The Enterprise Edition allows organizations to work with the flexibility of an open source architecture without the risk of working with pure open source software. 

The Pentaho Enterprise Edition uses a subscription model, without limits on the number of users, data sources or data volumes. This model allows organizations to have a clear view on the license cost without surprises or hidden costs.  

The overview below discusses the different components Pentaho provides in more detail.

(Big) Data Integration

Pentaho Data Integration (PDI), based on the Kettle open source project, is the most popular ETL platform on the planet. 

Apart from standard ETL functionality, such as support for databases (relational and NoSQL) and file formats, PDI is the only ETL platform that bridges the gap between traditional ETL, Big Data and data science. Through integration with other Pentaho components, PDI can be used as a report bursting engine, can use ETL processes to combine (blend) data from a variety of sources at query time, and much more.

Adaptive layers allow ETL developers to work with their Hadoop distribution and version, without having to worry about API changes in the underlying distribution. By visually developing and orchestrating your Big Data engineering, Hadoop doesn't have to be hard.  

With tens of plugin types, PDI can easily be extended to perform any possible data related task. 

Business Analytics

Pentaho OLAP, based on the Mondrian open source project, allows users to quickly analyze data in a web-based drag-and-drop interface that provides lightning-fast access to data through smart caching.


Pentaho Dashboards allows users to build self-service dashboards in a flexible drag-and-drop user interface. More advanced users can develop highly customizable dashboards through the CTools framework.

Data Science

With Weka, the open source machine learning workbench that originated at the University of Waikato and is supported by Pentaho, data scientists can choose from a variety of machine learning algorithms. Support for Spark, R and Python is included out of the box, and integration with other machine learning platforms can easily be added as well.


Through its open source architecture, Pentaho is the perfect platform to seamlessly embed in your application. Content can be integrated and tailored to your needs in look and feel, security etc. 

Know.bi is a Certified Pentaho Partner and System Integrator. We have extensive experience in implementing Pentaho. 
We can help you in your Pentaho project through training, coaching, implementation or by extending Pentaho for you. 

Contact us to find out how we can help!