Amazon SageMaker at a glance

Originally posted on Jul 10, 2018 10:00:00 AM
Last updated on July 24, 2024
Bart Maertens

Data architect and developer with over 20 years of experience in data engineering and analytics. Founder and lead of the know.bi expert team, Apache Hop co-founder and PMC member.

Amazon SageMaker is a "fully managed machine learning service". This means it provisions an environment for data scientists and developers without them needing to worry about managing servers.

Please note: at the time of this post Amazon SageMaker is only available in the Ireland region for Europe.

Building on the ease of use of Jupyter Notebooks, SageMaker lets you explore and analyze data with little setup. Sadly, the service does not (yet) support JupyterLab.

Training and hosting instances are billed per second of usage, while notebook instances are billed hourly.
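To make the difference between the two billing granularities concrete, here is a small sketch. The hourly rate is a made-up placeholder, not an actual AWS price, and the sketch assumes that hourly billing charges every started hour in full.

```python
import math


def per_second_cost(seconds_used: int, hourly_rate: float) -> float:
    """Cost when usage is metered per second (training and hosting)."""
    return seconds_used * hourly_rate / 3600


def hourly_cost(seconds_used: int, hourly_rate: float) -> float:
    """Cost when every started hour is charged in full (assumption)."""
    return math.ceil(seconds_used / 3600) * hourly_rate


if __name__ == "__main__":
    rate = 0.25          # hypothetical price per instance-hour
    usage = 15 * 60      # a 15-minute job, in seconds
    print("per-second billing:", per_second_cost(usage, rate))
    print("hourly billing:    ", hourly_cost(usage, rate))
```

For short jobs the gap is large: under these assumptions a 15-minute run costs a quarter of what it would cost with hourly billing.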

When we first visit the Amazon SageMaker dashboard, we are asked to create a notebook instance. Here we can choose a name, an instance type, an IAM role and a VPC, configure the instance's lifecycle, and choose an encryption key for the notebook data.
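The same choices can also be made from code. The sketch below builds the request for boto3's `create_notebook_instance` call; the helper names, the instance name, the instance type and the role ARN are placeholders for illustration, and the actual call needs boto3 plus valid AWS credentials.

```python
def notebook_instance_request(name, instance_type, role_arn,
                              subnet_id=None, security_group_ids=None,
                              lifecycle_config_name=None, kms_key_id=None):
    """Build the keyword arguments for create_notebook_instance.

    Optional settings (VPC subnet, lifecycle configuration, encryption key)
    are only included when a value is given.
    """
    request = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
    }
    optional = {
        "SubnetId": subnet_id,
        "SecurityGroupIds": security_group_ids,
        "LifecycleConfigName": lifecycle_config_name,
        "KmsKeyId": kms_key_id,
    }
    request.update({k: v for k, v in optional.items() if v is not None})
    return request


def create_notebook_instance(request, client=None):
    """Send the request to SageMaker (requires boto3 and AWS credentials)."""
    if client is None:
        import boto3  # imported lazily so the helper above works offline
        client = boto3.client("sagemaker")
    return client.create_notebook_instance(**request)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(notebook_instance_request(
        name="my-first-notebook",
        instance_type="ml.t2.medium",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    ))
```

Passing a `client` explicitly keeps the helper easy to test without touching a real AWS account.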

This shows how Amazon has built a product that works on top of its own services. In addition to Jupyter notebook instances, the service comes with the open-source SageMaker libraries for Python and Spark.

SageMaker also feels quite transparent: you can use Amazon's algorithms or bring your own algorithms and frameworks. Furthermore, the service hosts jobs, models and endpoints.

Taking a more detailed look, this is all done by building on Docker (with images stored in Amazon ECR) and S3 to create an environment many teams would strive for. Ready for you to use right out of the box!
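To give an idea of what "your own algorithm" looks like inside such a Docker container, here is a minimal sketch of a training entry point. It follows the `/opt/ml` directory convention SageMaker documents for custom training containers; the `train` channel name, the `scale` hyperparameter and the trivial "model" (a scaled mean) are assumptions made purely for illustration.

```python
import json
from pathlib import Path


def train(base_dir="/opt/ml"):
    """Read hyperparameters and input data, then write a model artifact."""
    base = Path(base_dir)

    # Hyperparameters arrive as a JSON object whose values are strings.
    hp_path = base / "input" / "config" / "hyperparameters.json"
    hyperparameters = json.loads(hp_path.read_text()) if hp_path.exists() else {}
    scale = float(hyperparameters.get("scale", "1.0"))  # hypothetical parameter

    # Data copied from S3 is made available per channel; "train" is assumed.
    values = []
    for csv_file in sorted((base / "input" / "data" / "train").glob("*.csv")):
        for line in csv_file.read_text().splitlines():
            if line.strip():
                values.append(float(line))

    # Stand-in for real training: the "model" is just a scaled mean.
    model = {"mean": scale * sum(values) / len(values)}

    # Whatever is written to the model directory becomes the model artifact.
    model_dir = base / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "model.json").write_text(json.dumps(model))
    return model


if __name__ == "__main__":
    print(train())
```

Taking the base directory as a parameter means the same function can be tried out locally against a temporary folder before it is packaged into an image.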

Amazon's algorithms are well documented, and Amazon provides numerous example notebooks for you to explore. These are also available on any newly created notebook instance.

An overview of the Amazon SageMaker workflow

The above image shows a simplified workflow on SageMaker.

After some data wrangling we train a model using a training image stored on Amazon ECR (green). We then have model artifacts (blue) which we can use to run, test and deploy (red). An endpoint is created to give applications access to the trained model and run inferences on new data (purple).
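The last step, an application calling the endpoint, could look roughly like the sketch below. It uses boto3's `sagemaker-runtime` client and its `invoke_endpoint` call; the helper name, the endpoint name and the CSV payload format are assumptions, since what an endpoint accepts depends on the model behind it.

```python
def predict_csv(endpoint_name, rows, client=None):
    """Send rows as CSV to an endpoint and return the raw response text."""
    if client is None:
        import boto3  # imported lazily; needs AWS credentials at call time
        client = boto3.client("sagemaker-runtime")

    # One CSV line per input row.
    body = "\n".join(",".join(str(value) for value in row) for row in rows)

    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=body.encode("utf-8"),
    )
    # The response body is a stream; read it fully and decode it.
    return response["Body"].read().decode("utf-8")


if __name__ == "__main__":
    # "my-endpoint" is a hypothetical endpoint name.
    print(predict_csv("my-endpoint", [[5.1, 3.5, 1.4, 0.2]]))
```

The optional `client` argument makes it possible to swap in a stub during tests, so the calling code can be checked without a deployed endpoint.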

A few noteworthy realizations

SageMaker can run multiple training jobs on a data set using a range of specified hyperparameters and determine which version of the model performs best
The underlying instance is accessible through the Terminal option in the notebook, but not through the AWS Console
Jobs can be run in parallel
TensorFlow and MXNet are supported natively, and both frameworks are open source
The SageMaker Python SDK supports local mode, which lets you deploy and work on your local machine
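The idea behind the first and third points (many training jobs over a range of hyperparameters, run in parallel, with the best one selected) can be sketched in plain Python. This is not SageMaker's API: the exhaustive grid, the `train_and_score` stand-in and the parameter names are assumptions chosen to keep the example small, whereas SageMaker's tuning service picks the configurations for you.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_score(params):
    """Hypothetical stand-in for one training job; returns a validation loss."""
    lr, depth = params["learning_rate"], params["max_depth"]
    return (lr - 0.1) ** 2 + (depth - 5) ** 2 / 100


def tune(search_space, max_parallel_jobs=2):
    """Try every combination in the search space and return the best one."""
    names = list(search_space)
    candidates = [
        dict(zip(names, values))
        for values in product(*(search_space[name] for name in names))
    ]
    # Run up to max_parallel_jobs "training jobs" at the same time.
    with ThreadPoolExecutor(max_workers=max_parallel_jobs) as pool:
        losses = list(pool.map(train_and_score, candidates))
    # The lowest loss wins.
    best_loss, best_params = min(zip(losses, candidates), key=lambda pair: pair[0])
    return best_params, best_loss


if __name__ == "__main__":
    space = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [3, 5, 7]}
    print(tune(space))
```

In a real setup each call to `train_and_score` would be a full training job, which is exactly why running them side by side and paying only for the seconds used matters.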

Amazon SageMaker streamlines the creation of ML pipelines and minimizes the need for maintenance, while cutting costs because you only pay for what you use. It comes with various open-source components and lets you bring and run your own code.

Be sure to keep an eye on our blog in the coming weeks as we take a deeper dive into Amazon SageMaker!

data science, aws, python
