What is AWS DMS
Every day, more and more companies are moving towards cloud computing, with Amazon Web Services (AWS) undoubtedly being the biggest player. Having all the possible AWS services available at your fingertips is great, but you still need to migrate your existing infrastructure and data into the (AWS) cloud. At re:Invent 2015, Amazon announced “AWS Database Migration Service”, aiming to make the process of moving data into databases on AWS a lot easier.
AWS DMS supports most open-source and commercial databases such as PostgreSQL, MySQL, MariaDB, Oracle and Microsoft SQL Server, as well as Amazon's own Aurora, Redshift, DynamoDB and S3 services. Both homogeneous (e.g. Postgres to Postgres) and heterogeneous migrations (e.g. Oracle to MySQL) are supported. At least one of the two databases, source or target, has to be in the AWS cloud. DMS is regularly updated with new features and supported engines.
At the highest level, you have three components to take care of when starting a migration using DMS:
- Replication instance: The replication instance provides the compute resources you need for the migration. As with other AWS services, you have the freedom to choose which size fits your needs best.
- Endpoints: An endpoint is either a ‘source’ or a ‘target’ and specifies the entry point to the database you are migrating from or to.
- Tasks: Here’s where you tell DMS what to do, specifying: the endpoints you want to use for the migration, task settings which can be tuned for each type of migration, and table mappings which allow you to perform some additional modifications at schema, table or column level during the migration.
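To make the table mappings a bit more concrete, here is a minimal sketch of how such a document could be built in Python. The rule layout follows the DMS table-mapping JSON format as I understand it. The schema names (`billing`, `landing_billing`) and the helper functions are made up for illustration.

```python
import json


def selection_rule(rule_id, schema, table="%", action="include"):
    """Build a selection rule: which schemas/tables the task picks up."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": f"select-{rule_id}",
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": action,
    }


def rename_schema_rule(rule_id, schema, new_name):
    """Build a transformation rule that renames a schema on the target."""
    return {
        "rule-type": "transformation",
        "rule-id": str(rule_id),
        "rule-name": f"rename-{rule_id}",
        "rule-target": "schema",
        "object-locator": {"schema-name": schema},
        "rule-action": "rename",
        "value": new_name,
    }


def build_table_mappings(rules):
    """Serialize the rules into a JSON string for the task's table mappings."""
    return json.dumps({"rules": rules}, indent=2)


# Hypothetical example: copy every table in "billing" and land it in a
# schema called "landing_billing" on the target.
table_mappings = build_table_mappings([
    selection_rule(1, "billing"),
    rename_schema_rule(2, "billing", "landing_billing"),
])
print(table_mappings)
```

The resulting string is what you would hand to the task as its table mappings, for example through the `TableMappings` parameter of boto3's `create_replication_task`, next to the endpoint and replication instance ARNs.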
When performing a homogeneous migration, schema migration will be the easiest, relying on the engine’s native schema tools. For heterogeneous migrations between different engine types, AWS offers the AWS Schema Conversion Tool to generate a target schema for your application. Getting your schema right is the most critical part of your migration, so make sure you’ve covered everything here before even continuing your migration plans.
Migrations can either be a one-time migration of the source database (“full load”) or a continuous migration which replicates all changes on the source database to the target once the full load has completed (“full load and ongoing replication”). Full load tasks were available from day one; support for continuous data replication was added in mid-2016. Full load tasks are the easiest to set up, requiring no additional configuration on the source database except for the endpoint configuration with the usual VPC and security group config. For tasks with ongoing replication, you might need to apply additional configuration to the source engine so that it exposes the transaction log DMS reads changes from (e.g. the Postgres WAL or the MySQL binlog).
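As a rough illustration of that extra source configuration, the sketch below checks two settings I am confident about: `wal_level = logical` on Postgres and `binlog_format = ROW` on MySQL. The function name and the shape of the `settings` dict are my own invention. The DMS documentation lists more prerequisites per engine and version, so treat this as a starting point rather than a complete check.

```python
# Settings this sketch checks per engine. Only the two I am sure about are
# listed; the DMS documentation describes further prerequisites.
REQUIRED_FOR_CDC = {
    "postgres": {"wal_level": "logical"},
    "mysql": {"binlog_format": "ROW"},
}


def cdc_config_problems(engine, settings):
    """Return a list of problems; an empty list means the checks passed.

    `settings` is a plain dict of values read from the source database,
    e.g. via `SHOW wal_level` or `SHOW VARIABLES LIKE 'binlog_format'`.
    """
    required = REQUIRED_FOR_CDC.get(engine)
    if required is None:
        return [f"no checks defined for engine '{engine}'"]
    problems = []
    for name, expected in required.items():
        actual = settings.get(name)
        # Compare case-insensitively, since engines report e.g. "ROW" or "row".
        if str(actual).lower() != expected.lower():
            problems.append(f"{name} is {actual!r}, expected {expected!r}")
    return problems


# A Postgres source still on a non-logical WAL level fails the check.
print(cdc_config_problems("postgres", {"wal_level": "replica"}))
```

Running a check like this before creating an ongoing-replication task is cheaper than finding out from a failed task.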
The setup and management of DMS resources can be done using the AWS Console, the AWS API or the AWS CLI. The AWS Console is sufficient in most cases, but when diving deeper into the task settings, you will definitely need to play with some JSON via the command-line interface. Surprisingly, the majority of those task settings cannot be edited in the AWS Console at all.
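To give an idea of what that JSON looks like, here is a small sketch that renders a subset of task settings from Python. The key names (`FullLoadSettings`, `TargetTablePrepMode`, `MaxFullLoadSubTasks`, `CommitRate`, `Logging`) follow the task settings document as I know it. The values are purely illustrative and not a recommendation, and `render_task_settings` is a hypothetical helper.

```python
import json

# Values DMS accepts for TargetTablePrepMode, to my knowledge.
VALID_PREP_MODES = {"DO_NOTHING", "DROP_AND_CREATE", "TRUNCATE_BEFORE_LOAD"}

# Illustrative subset of task settings; a real task has many more sections.
task_settings = {
    "FullLoadSettings": {
        "TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD",
        "MaxFullLoadSubTasks": 4,
        "CommitRate": 10000,
    },
    "Logging": {"EnableLogging": True},
}


def render_task_settings(settings):
    """Validate the table prep mode and return the settings as a JSON string."""
    prep_mode = settings.get("FullLoadSettings", {}).get("TargetTablePrepMode")
    if prep_mode is not None and prep_mode not in VALID_PREP_MODES:
        raise ValueError(f"unknown TargetTablePrepMode: {prep_mode!r}")
    return json.dumps(settings, indent=2)


print(render_task_settings(task_settings))

# Saved to a file, the output could then be applied to an existing task with:
#   aws dms modify-replication-task \
#       --replication-task-arn <task-arn> \
#       --replication-task-settings file://task-settings.json
```

Keeping the settings in a file under version control also gives you a history of what was changed on a task and when, which the Console does not.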
AWS offers comprehensive documentation on the features and limitations of the managed service, so I highly recommend diving into it to understand how DMS can be used in your particular migration case. The concept of DMS is relatively straightforward with the potential for fast results, but it doesn’t come without its own set of challenges. You’ll need to address those challenges before you can feel a hundred percent comfortable using it in your production system.
The pricing model for DMS is simple: you pay only for the replication instances, billed for each hour they run. DMS currently supports the T2 and C4 instance types in all regions. Scaling up or down is a simple process with limited downtime. Data transfer costs are only applied to migrations to target databases in a different Availability Zone, Region or outside of AWS. Worth highlighting here: DMS is not limited to migrating your databases into AWS. You can also use it to get databases out of the AWS ecosystem if you ever wish to do so.
Use case and experience
I have been using DMS extensively over the past year at a client, where DMS is deployed in a production BI solution. DMS is used to copy data from seven different source systems (billing, CRM etc.) into one ‘landing’ Postgres RDS instance within the BI environment. All source engines are Postgres, except for one MySQL database. The data warehouse is built on top of that landing database using the full Pentaho suite. An Amazon Redshift cluster, also continuously loaded from the Postgres data warehouse through DMS, is used to provide split-second access to reports and dashboards. Before using DMS for this step (Redshift was not yet supported by DMS at the time), we exported the data warehouse as CSV files to Amazon S3 and then loaded those files into Redshift. DMS eliminated a lot of complexity here.
As stated above, DMS is still very much in development, with updates to the core product as well as new features. After using it for quite a while now, there’s no denying AWS DMS is a powerful product which is easy to set up, but it requires more effort to implement properly. For a full load migration, translating the schema properly is crucial. For continuous migrations, things get more complicated, especially if the source system is also still in development (which is the case at my client). Monitoring and logging are vital to check if DMS runs as expected in those cases. Even with all those precautions in place, we’ve run into a variety of issues and bugs, which often resulted in a case for Amazon Support. A recent update to the Database Migration Service introduced new methods of migration validation, which is a perfect topic for a potential follow-up post…