Karl Robinson

November 12, 2019

Karl is CEO and Co-Founder of Logicata – he’s an AWS Community Builder in the Cloud Operations category, and AWS Certified to Solutions Architect Professional level. Knowledgeable, informal, and approachable, Karl has founded, grown, and sold internet and cloud-hosting companies.

You’re ready to take the plunge and initiate a data transfer to AWS, but you have terabytes or petabytes of data. You may wonder how you are ever going to transfer this much information from your on-premises data centre into the AWS cloud.

Don’t worry: AWS, of course, has a range of data transfer services that can help. But which is the right choice for you?

The answer to this question will largely depend on the following four factors:

  1. What is the volume of data to be transferred?
  2. How fast do you need to transfer the data?
  3. Is the data static or dynamic?
  4. Will you need to maintain a connection between AWS and your on-premises data centre once your data transfer is complete?
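As a rough preview, the way these factors point towards the services covered in this post can be sketched in a few lines of Python. This is only an illustration: the function name, its parameters and the 10,000 TB cut-over between Snowball and Snowmobile are assumptions made for the example, not AWS rules, and the speed question (factor 2) is left out for simplicity.

```python
def candidate_services(volume_tb, is_dynamic, needs_ongoing_link,
                       snowmobile_threshold_tb=10_000):
    """Return the services worth a first look for a given transfer.

    The threshold is an illustrative assumption, not an AWS limit.
    """
    candidates = []
    if is_dynamic:
        # Continuously generated data suits live streaming.
        candidates.append("Kinesis Data Firehose")
    if needs_ongoing_link:
        # A lasting link to AWS (e.g. hybrid cloud) suits a dedicated circuit.
        candidates.append("Direct Connect")
    if not is_dynamic and not needs_ongoing_link:
        # Static, one-off transfers suit a physical device.
        if volume_tb >= snowmobile_threshold_tb:
            candidates.append("Snowmobile")
        else:
            candidates.append("Snowball")
    return candidates


print(candidate_services(volume_tb=200, is_dynamic=False, needs_ongoing_link=False))
```

Treat the result as a starting point for the sections that follow rather than a recommendation.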

In this post I’m going to outline some of the services available, and where their use may be appropriate.

1. Direct Connect

AWS Direct Connect is a point-to-point connection from your on-premises data centre, directly into the AWS cloud. Direct Connect is available in speeds of 1 Gbps or 10 Gbps, and use of Direct Connect has several advantages over transferring your data over the public internet:

  • Guaranteed data transfer speeds
  • Secure connection which bypasses the public internet

Direct Connect is delivered as a point-to-point ethernet circuit that plugs directly into a router in your network. In order to access Direct Connect, you either need to be hosted in a Direct Connect facility, connected to an AWS partner offering Direct Connect services, or you need to run a private circuit into a Direct Connect location.

AWS pricing for Direct Connect is based on two elements: an hourly port fee for the 1 Gbps or 10 Gbps port, and a per-GB data transfer fee based on how much data you transfer over the Direct Connect. If you are working with a partner, they may also add charges for using their network to connect to AWS. If you are running your own private circuit into the Direct Connect location, you will need to bear the cost of this.
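To make the pricing structure concrete, here is a minimal sketch that adds up a port fee that accrues per hour, a charge that scales with the volume transferred, and any partner or private circuit costs. The function name and every rate in the example are placeholders chosen for illustration, not AWS prices; check the current AWS price list for your region.

```python
def direct_connect_cost(port_hours, port_rate_per_hour,
                        transfer_gb, transfer_rate_per_gb,
                        partner_or_circuit_fees=0.0):
    """Estimate a Direct Connect bill from its main components."""
    port_fee = port_hours * port_rate_per_hour        # port is billed by the hour
    transfer_fee = transfer_gb * transfer_rate_per_gb  # scales with data volume
    return port_fee + transfer_fee + partner_or_circuit_fees


# Placeholder figures only: one month (about 730 hours) on a single port.
estimate = direct_connect_cost(
    port_hours=730,
    port_rate_per_hour=0.30,        # hypothetical rate
    transfer_gb=5_000,
    transfer_rate_per_gb=0.02,      # hypothetical rate
    partner_or_circuit_fees=500.0,  # hypothetical third-party charge
)
print(f"Estimated monthly cost: {estimate:.2f}")
```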

Direct Connect is a great option for large data transfer volumes, and for those customers who need to maintain a secure high-speed connection between their premises and the AWS cloud—for hybrid cloud use cases, for example.

2. Snowball

AWS Snowball is a portable storage device that can be shipped to your on-premises data centre, where you can copy your data onto the device and ship it back to AWS for your data to be uploaded into the AWS cloud.

Snowball devices are available in 50 TB or 80 TB capacities. You simply create a job in the AWS management console and a Snowball device will be automatically shipped to your premises.

Once the local job is complete, the E Ink shipping label built into the Snowball (essentially a Kindle-style display) shows the return address, and you can arrange for the device to be collected. The Snowball is ruggedized to protect your data in transit, it is protected by a hardware TPM (Trusted Platform Module), and the data is of course encrypted.

Once the Snowball arrives back at Amazon, the data will be uploaded into an S3 bucket and then erased from the Snowball. Snowball is a great solution for customers who do not have a high-speed internet connection to upload data over, or who do not have a longer-term connection requirement to warrant installing a Direct Connect.

Snowball pricing has five components:

  1. Shipping
  2. Fixed service fee per job
  3. Per day rate for the time the device is with you
  4. S3 Data transfer (transfer in is free, transfer out is billable)
  5. S3 Storage Fees
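The five components above can be sketched as a simple estimate, together with a count of how many devices a dataset would need. The function names, the assumption of one job per device, and all the figures in the example are illustrative placeholders rather than AWS prices, and the calculation uses nominal device capacity.

```python
import math


def snowballs_needed(data_tb, device_capacity_tb=80):
    """Number of devices required, using nominal capacity."""
    return math.ceil(data_tb / device_capacity_tb)


def snowball_cost(devices, days_on_site, shipping_per_device,
                  service_fee_per_job, daily_rate,
                  transfer_out_gb, transfer_out_rate,
                  storage_gb_months, storage_rate):
    """Sum the five pricing components (assumes one job per device)."""
    shipping = devices * shipping_per_device
    service = devices * service_fee_per_job
    daily = devices * days_on_site * daily_rate
    transfer = transfer_out_gb * transfer_out_rate  # transfer in is free
    storage = storage_gb_months * storage_rate      # S3 storage after upload
    return shipping + service + daily + transfer + storage


devices = snowballs_needed(200)  # 200 TB needs three 80 TB devices
# Every rate below is a hypothetical placeholder.
total = snowball_cost(devices, days_on_site=10, shipping_per_device=100,
                      service_fee_per_job=250, daily_rate=15,
                      transfer_out_gb=0, transfer_out_rate=0.0,
                      storage_gb_months=200_000, storage_rate=0.02)
print(devices, f"{total:.2f}")
```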

3. Snowmobile

If the Snowball is not big enough for you, then you may need a Snowmobile—this is a shipping container pulled on an articulated lorry and has the capacity of 1,250 Snowballs!

The Snowmobile can move up to 100 petabytes of data in as little as a few weeks. Moving the same volume over a fully utilised network link would take around 25 years on a 1 Gbps connection, or around two and a half years on a 10 Gbps connection.
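The network comparison is simple arithmetic, and you can rerun it for your own data volume and link speed. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 Gbps = 10^9 bits per second) and a link that is fully utilised around the clock, which real links rarely achieve.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def transfer_years(data_petabytes, link_gbps):
    """Years needed to move the data over a fully utilised link."""
    bits = data_petabytes * 1e15 * 8    # petabytes -> bits
    seconds = bits / (link_gbps * 1e9)  # bits / (bits per second)
    return seconds / SECONDS_PER_YEAR


for gbps in (1, 10):
    print(f"100 PB at {gbps} Gbps: {transfer_years(100, gbps):.1f} years")
# 100 PB at 1 Gbps: 25.4 years
# 100 PB at 10 Gbps: 2.5 years
```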

The Snowmobile is, as you would expect, highly secure, both logically and physically. It is staffed exclusively with dedicated AWS security personnel, has GPS tracking, 24×7 CCTV and you can even elect to have a security escort vehicle to accompany your data in transit.

Snowmobile is available in most AWS regions, and is priced on a simple “per GB per month of data stored on the truck” formula. S3 charges also apply once data is uploaded.

4. Kinesis Data Firehose

Kinesis Data Firehose is the AWS solution for live streaming of data into the AWS cloud. If your data is being constantly generated and you have no window of opportunity to copy it onto a Snowball or Snowmobile, then the best solution is to live stream the data directly to the AWS cloud.

Example use cases include IoT sensor data, website clickstream data and application logs. Data can be streamed directly to an Amazon S3 bucket, into a Redshift data warehouse, or into an analytics service such as Amazon Elasticsearch Service, depending on what you want to do with the data when it gets to AWS.
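As an illustration of what streaming into Firehose can look like from the producer side, here is a minimal sketch that groups events into batches and hands them to a Firehose client's `put_record_batch` call. The helper names, the JSON-lines encoding and the stream name are assumptions made for this example. The batch limits (500 records and 4 MiB per call) reflect the documented PutRecordBatch quotas as I understand them, so verify them against the current API reference. Retrying failed records is left out.

```python
import json

MAX_RECORDS_PER_BATCH = 500            # assumed PutRecordBatch record limit
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024  # assumed PutRecordBatch size limit


def batch_records(events, max_records=MAX_RECORDS_PER_BATCH,
                  max_bytes=MAX_BYTES_PER_BATCH):
    """Yield lists of Firehose records that stay within the batch limits."""
    batch, size = [], 0
    for event in events:
        data = (json.dumps(event) + "\n").encode("utf-8")  # one JSON line each
        if batch and (len(batch) >= max_records or size + len(data) > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append({"Data": data})
        size += len(data)
    if batch:
        yield batch


def send_events(events, client, stream_name):
    """Send events in batches; return how many records Firehose rejected."""
    failed = 0
    for batch in batch_records(events):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch)
        failed += response.get("FailedPutCount", 0)
    return failed


# Usage, assuming boto3 is installed and a delivery stream already exists:
#   import boto3
#   send_events(sensor_readings, boto3.client("firehose"), "my-stream")
```

Passing the client in as a parameter keeps the batching logic testable without AWS credentials.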

Pricing for Kinesis Data Firehose is based on the volume of data ingested on a per GB basis, decreasing as your data volumes increase. There is also a per GB charge for data format conversion, should this be required.
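Volume-based pricing of this kind can be sketched as a tiered calculation. The example assumes marginal tiers (each rate applies only to the gigabytes that fall inside its band), and the tier sizes and rates are hypothetical placeholders, not the published Firehose prices.

```python
def tiered_cost(volume_gb, tiers):
    """Cost of ingesting volume_gb under marginal volume tiers.

    tiers is a list of (tier_size_gb, rate_per_gb); a size of None
    means the tier has no upper bound.
    """
    remaining = volume_gb
    total = 0.0
    for size, rate in tiers:
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return total


# Hypothetical tiers: the rate per GB falls as monthly volume grows.
EXAMPLE_TIERS = [(500_000, 0.03), (1_500_000, 0.025), (None, 0.02)]

print(f"{tiered_cost(600_000, EXAMPLE_TIERS):.2f}")
```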

5. CloudEndure

Originally designed as disaster recovery replication software, CloudEndure is one of the most widely used software tools to replicate server or virtual machine data into AWS.

CloudEndure was an independent vendor, but the company was acquired by AWS back in January 2019. AWS has a whole ecosystem of other vendor solutions which can be used to replicate server data into AWS, but because they own CloudEndure, they offer free use of the software for 90 days for migrations into AWS.

Like most of these solutions, CloudEndure requires you to install an agent on the source servers to be migrated. The agent is pointed at an AWS target region where a staging area is created for your server data to be replicated into.

The staging area consists of low-cost EC2 replication instances that handle the block-level, real-time replication of your on-premises data into EBS (Elastic Block Store) volumes. Once all your disks have been replicated, production EC2 instances are created and the EBS volumes are attached to them.

Data Transfer to AWS: Final Thoughts

So there you have it: five ways to get your data from your on-premises data centre to the AWS cloud. If you’re not sure which method is right for your business, or you’re not confident that you have the skills and experience required to manage large-scale data transfer to the AWS cloud, then you could use the services of an AWS Managed Services provider. AWS has a huge partner ecosystem whose members can help you to realise value faster from public cloud services.