Amazon Redshift Data Warehouse

What is Amazon Redshift?


Amazon Redshift is a Data Warehouse in the Cloud.

Amazon Redshift is the AWS data warehousing solution, enabling business intelligence in the AWS cloud. Redshift lets customers query petabytes of structured and semi-structured data using standard SQL.

AWS customers can start building a Redshift Data Warehouse for as little as $0.25 per hour, and can scale up their data warehouse in line with business requirements.

How does Amazon Redshift Work?

Redshift can be configured either as a single 160GB node, or as a multi-node cluster with a ‘Leader’ node that manages client connections and receives queries, in front of up to 128 compute nodes which store the data and execute the queries.

Redshift uses columnar compression. Because each column holds values of a single type that often repeat, compressing columns individually achieves significantly better compression than traditional row-based relational database stores. As a result, data stored in Redshift typically consumes less storage space.
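To make the idea concrete, here is a minimal sketch of run-length encoding, one of the column encodings Redshift supports, written in plain Python. It is an illustration only, not Redshift's implementation; the function names and the sample `country_column` data are hypothetical.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical 'country' column, stored contiguously as in a columnar layout.
country_column = ["UK"] * 5 + ["US"] * 3 + ["DE"] * 2

encoded = run_length_encode(country_column)
print(encoded)  # 10 stored values shrink to 3 (value, count) pairs
```

The more often neighbouring values in a column repeat, the fewer pairs need to be stored, which is why sorted, low-cardinality columns tend to compress best with this scheme.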

Redshift leverages ‘Massively Parallel Processing’ (MPP) technology – data and query workloads are automatically distributed across all compute nodes, enabling Redshift to resolve complex queries across large datasets quickly and efficiently.
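The scatter-and-gather pattern behind MPP can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Redshift is implemented: the node count, the `customer_id` distribution key, the CRC32 hash and the thread pool standing in for compute nodes are all illustrative choices.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, node_count, key):
    """Assign each row to a compute node by hashing its distribution key."""
    slices = defaultdict(list)
    for row in rows:
        node = zlib.crc32(str(row[key]).encode()) % node_count
        slices[node].append(row)
    return slices


def partial_sum(rows):
    """Work done independently on one node: sum its local rows."""
    return sum(row["amount"] for row in rows)


def leader_query_total(rows, node_count=4):
    """Leader role: fan the work out to the nodes, then combine the partial results."""
    slices = distribute(rows, node_count, key="customer_id")
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(partial_sum, slices.values()))
    return sum(partials)


# Hypothetical sales data: 100 rows spread over 10 customers.
sales = [{"customer_id": i % 10, "amount": i} for i in range(100)]
total = leader_query_total(sales)
print(total)
```

Each node only ever touches its own slice of the data, so adding nodes reduces the work per node; the leader's job is limited to planning and combining results.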

How Easy is it to Scale an Amazon Redshift Data warehouse?

AWS offer a number of ways to scale a Redshift data warehouse in line with growing demands from data scientists.

Firstly, with Amazon Redshift Spectrum, queries can be performed on data stored in Amazon S3 data lakes, without the data being loaded into the Redshift cluster. This means data scientists can perform queries without waiting for Extract, Transform, Load (ETL) jobs to complete.
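As a rough sketch of what querying S3 data in place looks like, the Python helper below assembles the kind of SQL statements involved: an external schema, an external table pointing at an S3 location, and an ordinary query against it. The schema, database, table, column, role and bucket names are all hypothetical placeholders, and the helper itself is only a convenience for this example.

```python
def spectrum_statements(schema, catalog_db, role_arn, s3_path):
    """Return illustrative SQL to expose S3 data to Redshift and query it.

    Values are interpolated directly for readability; do not do this with
    untrusted input.
    """
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema} "
        f"FROM DATA CATALOG DATABASE '{catalog_db}' "
        f"IAM_ROLE '{role_arn}' "
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )
    create_table = (
        f"CREATE EXTERNAL TABLE {schema}.sales "
        "(sale_id INT, amount DECIMAL(10,2), sold_on DATE) "
        "STORED AS PARQUET "
        f"LOCATION '{s3_path}';"
    )
    # The external table is queried with standard SQL; no load step is needed.
    query = f"SELECT sold_on, SUM(amount) FROM {schema}.sales GROUP BY sold_on;"
    return [create_schema, create_table, query]


statements = spectrum_statements(
    schema="spectrum_demo",  # hypothetical names throughout
    catalog_db="demo_db",
    role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
    s3_path="s3://example-bucket/sales/",
)
for statement in statements:
    print(statement)
```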

Next, a Redshift cluster can be scaled by adding additional compute nodes in a matter of hours, compared to the days it may take to scale a traditional on premises data warehouse solution.

Capacity can also be increased rapidly by deploying a new Redshift data warehouse from a snapshot.

And finally, with Redshift Elastic Resize, compute nodes can be added or removed in minutes, enabling your Redshift cluster to grow and shrink in line with peaks and troughs of demand.
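For readers who script their infrastructure, an elastic resize might be requested along these lines using the boto3 SDK's `resize_cluster` call. Treat this as a sketch: the cluster name and node count are hypothetical, the helper functions are not part of any library, and AWS credentials and a default region are assumed to be configured.

```python
def elastic_resize_request(cluster_id, node_count):
    """Build the keyword arguments for an elastic resize request."""
    if node_count < 1:
        raise ValueError("node_count must be a positive integer")
    return {
        "ClusterIdentifier": cluster_id,
        "NumberOfNodes": node_count,
        "Classic": False,  # False asks for an elastic rather than a classic resize
    }


def resize(cluster_id, node_count):
    """Send the request to AWS (needs boto3 and configured credentials)."""
    import boto3  # imported lazily so the sketch loads without the AWS SDK

    client = boto3.client("redshift")
    return client.resize_cluster(**elastic_resize_request(cluster_id, node_count))


# Hypothetical cluster scaled to four compute nodes; nothing is sent to AWS here.
request = elastic_resize_request("example-cluster", 4)
print(request)
```

Keeping the request builder separate from the AWS call makes it easy to check the parameters before anything is sent.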

How Secure is Data Stored in Amazon Redshift?

Redshift attempts to maintain 3 copies of the data stored in a Redshift Data Warehouse. One-day backup retention is enabled by default, but the retention period can be increased to up to 35 days.
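Changing the retention period can also be scripted. The sketch below validates the value against the range described above and prepares arguments for boto3's `modify_cluster` call; the cluster name is hypothetical, the helper functions are illustrative, and the range check reflects this article's figures, not an exhaustive statement of what the API accepts.

```python
MAX_RETENTION_DAYS = 35  # upper limit stated in the article


def retention_request(cluster_id, days):
    """Build modify_cluster arguments that set automated snapshot retention."""
    if not 1 <= days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention must be between 1 and {MAX_RETENTION_DAYS} days"
        )
    return {
        "ClusterIdentifier": cluster_id,
        "AutomatedSnapshotRetentionPeriod": days,
    }


def apply_retention(cluster_id, days):
    """Send the change to AWS (needs boto3 and configured credentials)."""
    import boto3  # imported lazily so the sketch loads without the AWS SDK

    client = boto3.client("redshift")
    return client.modify_cluster(**retention_request(cluster_id, days))


# Hypothetical cluster moved from the default to two weeks of retention.
request = retention_request("example-cluster", 14)
print(request)
```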

Redshift can also asynchronously replicate data to Amazon S3 storage in another AWS region for disaster recovery purposes.

Data in a Redshift data warehouse is encrypted both at rest and in transit. At rest, data is encrypted with the AES-256 algorithm, while SSL encrypts data in transit. Redshift handles encryption key management by default, but customers can manage their own keys via a Hardware Security Module (HSM) or the AWS Key Management Service (KMS).

How Resilient is an Amazon Redshift Data Warehouse?

Redshift is currently only available as a single availability zone deployment.  However, in the event of an outage, Redshift snapshots can be deployed into a different availability zone to get back up and running quickly.

How do AWS Charge for Amazon Redshift?

Like all AWS services, with Redshift you only pay for what you use.  AWS do not charge any fees for the Leader node in a Redshift cluster.  Compute nodes are billed by the hour, and you will also be charged for data backup and data transfer.
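A back-of-the-envelope estimate of the compute charge can be worked out as follows. This is a sketch under assumptions: it applies the $0.25 per hour entry price quoted earlier to every compute node, uses an average of 730 hours per month, and leaves out backup storage and data transfer charges. Actual rates vary by node type and region.

```python
HOURS_PER_MONTH = 730  # assumed average: 8,760 hours in a year / 12


def monthly_compute_cost(compute_nodes, hourly_rate_per_node):
    """Estimate monthly compute charges; the leader node is not billed."""
    return compute_nodes * hourly_rate_per_node * HOURS_PER_MONTH


# Entry-level price from the article applied to one and to four compute nodes.
single_node = monthly_compute_cost(1, 0.25)
four_nodes = monthly_compute_cost(4, 0.25)
print(f"1 node:  ${single_node:,.2f} per month")
print(f"4 nodes: ${four_nodes:,.2f} per month")
```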

AWS claim that Amazon Redshift is at least 50% cheaper than all other cloud data warehouses, with 1TB of data costing only $1,000 per year.


Paul Young

Paul is a content marketer at Logicata. With over 10 years' experience in the Cloud space, Paul enjoys writing about Public Cloud Technology, particularly in relation to AWS and Azure Managed Services.