GraphConnect 2018

Sep 21, 2018 3:30:40 PM / by Bart Maertens posted in data science, neo4j, graph databases, graphconnect


GraphConnect, the annual Neo4j event, was held in New York yesterday (2018-09-20). About 800 people gathered near Times Square for a day of talks about graphs and real-life relationship building (aka networking).

Fraud Detection with Graphs

Jul 17, 2018 10:00:00 AM / by Shila Casteels posted in neo4j, fraud detection, graph databases, graph analytics


Catching the "bad guys" using graphs.

Figure 1: Gartner layered model for fraud detection
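One common way graphs help here (sketched under our own assumptions, not necessarily the approach taken in the full post) is to link accounts that share identifiers such as a phone number, address or SSN, and then inspect the connected components. A large cluster of seemingly unrelated accounts that share details can point to a fraud ring. A graph database such as Neo4j can answer this directly; the minimal pure-Python sketch below uses union-find instead. The account data, the identifier format and the `min_size` threshold are hypothetical.

```python
from collections import defaultdict

def find(parent, x):
    # Walk up to the root, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def fraud_rings(accounts, min_size=3):
    """Group accounts that (transitively) share any identifier.

    accounts: dict mapping account id -> set of identifiers (hypothetical format).
    Returns groups with at least `min_size` accounts, as sorted lists.
    """
    parent = {acc: acc for acc in accounts}
    first_owner = {}  # identifier -> first account seen using it
    for acc, identifiers in accounts.items():
        for ident in identifiers:
            if ident in first_owner:
                union(parent, first_owner[ident], acc)  # shared detail: connect
            else:
                first_owner[ident] = acc
    groups = defaultdict(list)
    for acc in accounts:
        groups[find(parent, acc)].append(acc)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)

# Hypothetical data: A, B and C are chained together by shared details.
accounts = {
    "A": {"phone:555-0101", "addr:1 Main St"},
    "B": {"phone:555-0101", "ssn:123"},
    "C": {"ssn:123"},
    "D": {"addr:9 Elm St"},
}
print(fraud_rings(accounts))  # [['A', 'B', 'C']]
```

Note that A and C share nothing directly; they end up in the same ring only through B, which is exactly the kind of indirect link graphs make easy to follow.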

Amazon SageMaker at a glance

Jul 10, 2018 10:00:00 AM / by Yannick Mols posted in data science, aws, python


Amazon SageMaker is a "fully managed machine learning service". This means it provisions an environment for data scientists and developers so that they don't have to worry about managing servers.

Google Drive in Pentaho Data Integration

Jun 11, 2018 10:00:00 AM / by Hans Van Akelyen posted in pentaho data integration, cloud, Google Cloud Platform


One of the new features in Pentaho Data Integration 8.1 is the ability to connect directly to Google Drive. PDI uses the Virtual File System (VFS), which allows you to connect to a variety of file systems transparently.

What's new in Pentaho 8.1

Jun 5, 2018 10:00:00 AM / by Hans Van Akelyen posted in pentaho, cloud, Big Data, data integration, hitachivantara


On May 16th, 2018, Hitachi Vantara released Pentaho 8.1. Although this is a minor follow-up release to 8.0 as far as version numbers go, it nevertheless adds a lot of exciting new features and improvements.

3 reasons to automate your analytics projects

May 29, 2018 10:00:00 AM / by Bart Maertens posted in aws, devops, iac, infrastructureascode, cloudformation, codedeploy, ansible, automation, chef, puppet


Automate everything!

Analytics projects are often treated as ad-hoc projects. Code and content are often managed in a version control system (git), but without full release management, and infrastructure and releases are frequently deployed by hand. In this post, we'll take a look at why it makes sense to manage your analytics projects as full-blown software development projects.

Although many of the components are usually in place, analytics teams are often reluctant to go the extra mile and automate every aspect of the project life cycle.

A first step is to automate infrastructure deployment, or to apply "Infrastructure as Code" (IaC). According to Wikipedia, "Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools". 

In short, this means that all installation and configuration (including environment- and release-specific settings) should be run from code (scripts, templates) and managed like a software development project. Extending this beyond infrastructure to treating "Everything as Code", including deployment and testing, brings benefits that significantly outweigh the downsides.

Let's look at a number of these benefits in more detail. 

Basic Machine Learning - Linear Regression

Apr 26, 2018 10:00:00 AM / by Yannick Mols posted in data science, machine learning, algorithm, linear regression


What size is this?

Suppose you want to predict the length of a flower petal from its width (or vice versa).
To do this, we can look for a relationship between the two.
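A minimal sketch of simple linear regression with ordinary least squares, in plain Python. The petal widths and lengths below are made-up, noise-free numbers (length = 2.5 * width + 1) chosen only so the fitted line is easy to check; they are not real measurements, and the full post may fit its model differently.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical, noise-free data: length = 2.5 * width + 1.
widths = [0.2, 1.0, 1.5, 2.0]
lengths = [1.5, 3.5, 4.75, 6.0]

slope, intercept = fit_line(widths, lengths)
print(round(slope, 2), round(intercept, 2))  # 2.5 1.0

# Predict the length of a petal that is 1.2 wide.
print(round(slope * 1.2 + intercept, 2))  # 4.0
```

With real, noisy measurements the points will not sit exactly on the line; the same formulas then give the line that minimises the sum of squared vertical errors.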

Basic Machine Learning - Anomaly Detection

Apr 11, 2018 10:00:00 AM / by Yannick Mols posted in data science, outliers, anomaly detection


What's weird about this?

Sometimes you will be faced with unexpected patterns or events in your data. Let's take a look at how we can tackle such anomalies by detecting them.
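One of the simplest detectors (chosen here for illustration; the full post may use a different method) flags values that lie far from the mean, measured in standard deviations, i.e. the z-score. The readings and the threshold below are hypothetical.

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Return (index, value) pairs whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

# Hypothetical readings with one obvious spike at index 9.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(zscore_anomalies(readings, threshold=2.5))  # [(9, 50)]
```

The spike itself inflates the standard deviation: its z-score here is just under 3, so the common cut-off of 3 would miss it in such a small sample. Robust variants based on the median and the median absolute deviation are less sensitive to this effect.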

Basic Machine Learning - Clustering

Mar 27, 2018 10:00:00 AM / by Yannick Mols posted in data science, artificial intelligence, machine learning, python, algorithm


How is this related?

In this post, we'll look at how to discover the way data is structured or related.
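A common way to discover such structure is k-means clustering, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The pure-Python sketch below is a plain Lloyd's-algorithm implementation, not necessarily what the full post uses; the points are hypothetical, and `k`, the iteration cap and the seed are illustrative choices. `math.dist` needs Python 3.8 or later.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
# [[(1, 1), (1, 1.5), (1.5, 2)], [(8, 8), (8.5, 9), (9, 8)]]
```

k-means needs `k` up front and can be sensitive to the starting centroids, which is why the seed is fixed here.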

Basic Machine Learning - Classification

Mar 13, 2018 10:00:00 AM / by Yannick Mols posted in data science, python, classification


Is this A, or B?

As a follow-up to last week's machine learning tidbit, let's look at an example of how to solve a classification problem with machine learning (on recreational data).
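A simple classifier that fits the "A or B?" question is k-nearest neighbours: label a new point with the majority label of the k closest training examples. This is one illustrative choice, not necessarily the model used in the full post, and the training points and labels below are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours.

    training: list of (features, label) pairs; features is a tuple of numbers.
    """
    # Sort training examples by Euclidean distance to the query point.
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training data with labels "A" and "B".
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(training, (1.1, 0.9)))  # A
print(knn_predict(training, (4.9, 5.2)))  # B
```

An odd `k` avoids tied votes when there are only two classes.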
