Tags: Amazon DynamoDB, Apache Cassandra, NoSQL
Looking for greater flexibility, speed, and scale for your Apache Cassandra solutions? It’s time to consider Amazon DynamoDB as a compelling alternative. We would like to share our hands-on experience and insights for those looking to migrate Apache Cassandra to Amazon DynamoDB using the data extraction agents that AWS recently released with the Schema Conversion Tool (SCT). This migration approach involves adding a datacenter to an existing Cassandra cluster. This datacenter clones all data from the original cluster, and then SCT uses it to migrate the data to Amazon DynamoDB.
Background on Apache Cassandra and Amazon DynamoDB
Apache Cassandra is a free and open-source distributed NoSQL database management system. Initially created at Facebook, Cassandra is designed to handle large amounts of data across many commodity servers. Cassandra is a wide-column store: rows are organized into tables, and the first component of a table’s primary key is the partition key.
Amazon DynamoDB is a fully managed proprietary NoSQL database service. With DynamoDB, you can create database tables that can store and retrieve any amount of data and serve any level of request traffic. You can easily scale your tables’ throughput capacity up or down without downtime.
Why migrate Apache Cassandra to Amazon DynamoDB?
There are situations where Amazon DynamoDB can provide greater scale and performance than your existing Apache Cassandra or DataStax workloads. Of course, this depends on your Cassandra use cases. For example, DynamoDB is often a better fit for real-time bidding platforms, gaming applications, and recommendation engines.
In terms of performance, DynamoDB scans data much faster, especially when your query doesn’t filter on a primary key. DynamoDB also offers strongly consistent reads, while Cassandra can return stale results for frequently updated data due to replication latency between distributed nodes. DataStax support described an example of this problem in a post titled – Dude! Where’s my data? In addition, DynamoDB provides Global Tables for deploying multi-region, multi-master databases without you having to maintain your own replication solution. Security may also be a major concern for Cassandra’s data at rest, while DynamoDB encrypts data both at rest and in transit.
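To illustrate the consistency point, DynamoDB lets you request strong consistency per read. Below is a minimal sketch of the low-level GetItem request parameters; the table name and key are hypothetical, and by default reads are eventually consistent unless ConsistentRead is set.

```python
# Sketch: request parameters for a strongly consistent GetItem call.
# The table name and key below are hypothetical examples.
def consistent_get_item_params(table_name, key):
    """Build low-level GetItem parameters that request strong consistency."""
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": True,  # reads are eventually consistent by default
    }

params = consistent_get_item_params("users", {"user_id": {"S": "42"}})
```

You would pass these parameters to a DynamoDB client call such as boto3’s `get_item`; note that eventually consistent reads cost half as much in read capacity units.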
If you’re encountering similar issues, Amazon DynamoDB may be in your future! Let’s dig into the step-by-step workflow and then cover some of the key limitations we encountered while trying to migrate Apache Cassandra to Amazon DynamoDB for our customer.
Workflow to migrate Apache Cassandra to Amazon DynamoDB
DB Best was the first AWS partner to migrate Apache Cassandra to Amazon DynamoDB using data extraction agents. In this diagram, we highlight the key steps in the migration process.
Now let’s take a look at each of these steps in detail.
Step 1: Cloning the datacenter
To avoid interfering with production applications that use your Cassandra cluster, AWS SCT creates a clone datacenter. Then SCT copies your production data into this datacenter. Basically, the clone datacenter acts as a staging area. This way, AWS SCT performs further migration activities using the clone rather than your production datacenter.
AWS SCT creates a standalone Cassandra installation, enabling the clone datacenter to run on an Amazon EC2 instance. This allows for hosting your clone datacenter independently of your existing Cassandra datacenter.
All you need to do is provide the connection credentials to an empty Linux virtual machine in the AWS SCT wizard. After that, SCT will deploy Cassandra on this Amazon EC2 instance automatically. Moreover, the wizard will configure your VM according to the parameters of the source Cassandra cluster.
Once the EC2 deployment of the standalone Cassandra installation is complete, SCT starts the background cloning process. When SCT completes the data replication, we can disable further replication to the EC2 instance and make a fast backup of all the data. Then we can use this backup as the source for migrating to DynamoDB.
Step 2: Installing the data extraction agents
You can find the installation files of the data extraction agents in the archive with the latest SCT version. You can download AWS SCT from the official site.
If you already have AWS Schema Conversion Tool installed on your PC, you might need to update it in order to use the data extraction agents. Press the Help button and select the ‘Check for updates’ option from the menu to download the latest SCT update.
Also, read our blog post on using data extraction agents in AWS SCT to discover more details.
Amazon recommends running the data extraction agent on an Amazon EC2 instance.
After installing the data extraction agent, you need to configure it. Amazon provides you with a step-by-step guide, so, the process will not take too long. Also, you can find the commands that we used in the video below.
First, we take care of the prerequisites on the Cassandra node: we disable autocompaction, enable backups, and grant full access to the data files.
# nodetool -u <jmxuser> -pw <jmxpass> disableautocompaction
# nodetool -u <jmxuser> -pw <jmxpass> enablebackup
# sudo chmod 777 -R <path_to_cassandra_data_folder>
# ls -ltrh <path_to_cassandra_data_folder>
Then we install ‘sshfs’ and ‘expect’ packages.
# sudo apt-get install sshfs
# sudo apt-get install expect
After that, we edit the fuse.conf file.
# sudo nano /etc/fuse.conf
Next, we install the data extraction agent.
# sudo dpkg -i aws-cassandra-data-extractor.deb
And then we configure the data extraction agent.
# sudo java -jar <path_to_cassandra_data_extractor> -c
To start the data extraction agent we use the following command.
# sudo systemctl start <path_to_cassandra_data_extractor>
Step 3: Keyspace conversion
The very first thing you need to do in any migration project is to convert the schema, and an Apache Cassandra to Amazon DynamoDB migration project is no exception.
Even though Apache Cassandra doesn’t have traditional schemas, you still need to use the “Convert schema” option in AWS SCT.
Step 4: Applying converted code
After converting the database schema, AWS SCT provides you with an assessment report. This report includes all the issues (called action items) encountered during automatic conversion. You need to fix all the action items in this report before you can apply the converted code to the target database.
Together with the keyspaces, you have to upload the following metadata: tables, table columns, primary keys (partition keys, clustering keys). Then you can start the data migration.
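To make the key metadata mapping concrete, here is a sketch of how a Cassandra primary key could map onto a DynamoDB table definition: the partition key becomes the HASH key and the first clustering column becomes the RANGE key. The table name, column names, and capacity values below are hypothetical.

```python
# Hypothetical Cassandra table with PRIMARY KEY ((sensor_id), reading_time),
# mapped to a DynamoDB table definition.
table_definition = {
    "TableName": "sensor_readings",
    "AttributeDefinitions": [
        {"AttributeName": "sensor_id", "AttributeType": "S"},     # partition key
        {"AttributeName": "reading_time", "AttributeType": "N"},  # clustering key
    ],
    "KeySchema": [
        {"AttributeName": "sensor_id", "KeyType": "HASH"},     # from partition key
        {"AttributeName": "reading_time", "KeyType": "RANGE"}, # from clustering key
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
}
```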
In AWS SCT, you can specify the provisioned capacity for all DynamoDB tables. Particularly, you can increase the number of capacity units to increase the read/write speed. Of course, the more capacity units you use, the more money Amazon will charge you at the end of the month. You’ll want to reevaluate the capacity unit settings after your migration project to align with the performance needs of your applications.
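A rough way to size provisioned capacity follows from DynamoDB’s published units: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one write capacity unit covers one write per second of an item up to 1 KB. The item size and request rates below are hypothetical examples.

```python
import math

def read_capacity_units(item_kb, reads_per_sec, strongly_consistent=True):
    """Estimate RCUs: ceil(item size / 4 KB) per strongly consistent read."""
    units_per_read = math.ceil(item_kb / 4)
    if not strongly_consistent:
        units_per_read /= 2  # eventually consistent reads cost half
    return math.ceil(units_per_read * reads_per_sec)

def write_capacity_units(item_kb, writes_per_sec):
    """Estimate WCUs: ceil(item size / 1 KB) per write."""
    return math.ceil(item_kb) * writes_per_sec

# A 6 KB item read 100 times and written 20 times per second:
rcu = read_capacity_units(6, 100)   # 2 units per read -> 200
wcu = write_capacity_units(6, 20)   # 6 units per write -> 120
```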
Step 5: Data extraction
Now the target database is ready for the data transfer. During the first step of this complex operation, we extract the data from the source database using bulk extract.
To do so, we need to register the data extraction agent in SCT and press the “Start” button.
Step 6: Data migration
SCT can’t write to DynamoDB storage directly, so it writes the extracted data to an Amazon S3 bucket first. The SCT data extraction agents manage this task for you.
As you may already know, every table in Cassandra consists of a number of SSTables. We can specify how many of Cassandra’s SSTables to use for parallel data extraction. Increasing this value increases the file-system load on the Cassandra server, but SCT completes the data migration faster.
Step 7: Data upload
Next, SCT uploads the data from S3 to Amazon DynamoDB. AWS SCT uses a comprehensive two-step approach to migrate data from a Cassandra cluster to Amazon DynamoDB. First, SCT migrates the existing data. Then, SCT replicates ongoing changes in a continuous process: it uses the change data capture (CDC) feature and feeds the captured changes into AWS DMS replication until you are ready to redirect your applications to your new DynamoDB database.
After SCT completes the load stage, the CDC phase starts. So, the agent works indefinitely in the background. You need to click the “Stop” button to stop the change data capture replication process.
Step 8: Shutting down the clone cluster
The last step to migrate Apache Cassandra to Amazon DynamoDB is to switch off the cloned datacenter. Actually, once the migration succeeds, you might want to switch off all of your Cassandra instances. However, we will not discuss that strategic decision here, and will instead concentrate on decommissioning the datacenter that you created on Amazon EC2. For example, you might want to consider a database unification approach where data is replicated from DynamoDB back to your Cassandra database until you are completely satisfied with your DynamoDB deployment.
Of course, you don’t want to lose any information, so make sure no clients are still writing to any nodes in this datacenter. Then run a full repair with nodetool repair to ensure that all data is propagated from the datacenter being decommissioned. After that, change all keyspaces so they no longer reference the datacenter being removed. Finally, run nodetool decommission on every node in that datacenter.
Now, you can take a look at our step-by-step video that covers how to migrate Apache Cassandra to Amazon DynamoDB.
Challenges and limitations that we see with Cassandra to DynamoDB migration projects
As an AWS Advanced Consulting Partner, DB Best has done more migrations of on-premises databases to the Amazon cloud than any other partner. So, not only do we know the challenges that may emerge during a migration, we also know the possible workarounds and ways to fix these issues.
The first challenge relates to application conversion. Usually, you can use AWS Schema Conversion Tool to make application code compatible with the new target database, but this is not the case for Cassandra to DynamoDB migrations. So, you have to convert your application manually.
Managing data and applications is a hard task, especially when it comes to managing data in NoSQL environments. The key challenge here lies in validating the migration. Essentially, this means comparing data between the source and target databases. We implemented native Cassandra support into our in-house Database Compare Suite and we recommend using this tool to compare data after migration.
Target database limitations
All databases have limitations of some kind, and you need to consider them when migrating from another platform. This is also the case for Amazon DynamoDB. To maximize success when migrating Apache Cassandra to Amazon DynamoDB, carefully examine DynamoDB’s limitations before the start of the migration endeavor. You should consider these important details while designing the future-state architecture of your DynamoDB database. You can find information about Amazon DynamoDB limits on the official site.
Here are the top three DynamoDB limitations that we believe you really need to consider.
1. Item length
The maximum item size in DynamoDB is limited to 400 KB. (An item is DynamoDB’s equivalent of a table row.) This limit includes both the binary length of each attribute name and the length of each attribute value, so attribute names count toward the size limit. You need to plan for your items’ size when designing your Amazon DynamoDB database.
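A quick way to sanity-check your data against the 400 KB limit is to estimate item sizes before migration. The sketch below handles only string-valued attributes and counts the UTF-8 bytes of names and values, which matches how the limit is described; the item shown is hypothetical.

```python
MAX_ITEM_BYTES = 400 * 1024  # DynamoDB's 400 KB item limit

def estimate_item_size(item):
    """Rough size estimate: UTF-8 bytes of every attribute name and string value."""
    size = 0
    for name, value in item.items():
        size += len(name.encode("utf-8"))  # attribute names count, too
        size += len(str(value).encode("utf-8"))
    return size

item = {"user_id": "42", "bio": "x" * 1000}
size = estimate_item_size(item)  # 7 + 2 + 3 + 1000 = 1012 bytes
fits = size <= MAX_ITEM_BYTES    # well under the limit
```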
2. Collection types
Cassandra provides collection types as a way to group and store data together in a column. The Cassandra Query Language (CQL) contains three collection types: set, list, and map. The migration tooling doesn’t carry Cassandra’s collection types over to DynamoDB. What’s even more important, AWS Database Migration Service (DMS) doesn’t support uploading data of these types to Amazon S3. So, even if your extraction agent extracts data of these types from Cassandra, you will not see it in the target database.
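DynamoDB’s own low-level attribute-value format does define set, list, and map types (“SS”, “L”, “M”), so one possible workaround is to re-encode collection data yourself after the bulk migration. The sketch below shows this, assuming hypothetical column names and text-only collections.

```python
# Re-encode Cassandra collections as DynamoDB low-level attribute values.
# Column names are hypothetical; only text elements are handled here.
def encode_collections(tags, events, attrs):
    return {
        "tags": {"SS": sorted(tags)},                 # Cassandra set<text> -> string set
        "events": {"L": [{"S": e} for e in events]},  # Cassandra list<text> -> list
        "attrs": {"M": {k: {"S": v} for k, v in attrs.items()}},  # map<text,text> -> map
    }

item = encode_collections({"b", "a"}, ["x", "y"], {"color": "red"})
```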
3. Tables per region
You can store tables of practically unlimited size in DynamoDB. However, every AWS account has an initial limit of 256 tables per region. If you reach this limit, you can either restructure your database design or ask AWS Support for a service limit increase.
This completes our overview of how to migrate Apache Cassandra to Amazon DynamoDB. If you have any questions related to the database migration to the Amazon cloud, feel free to contact us.