I’ve been working with AWS for a while now, and almost everywhere I’ve worked, I’ve had to produce an AWS services glossary. It’s an explainer document so that senior management can understand what I’m going on about when I spout AWS terminology.
So, to save you all that pain, here it is – full, unabridged, and hopefully somewhat useful – my AWS terminology cheat sheet.
Well, not full. AWS has over 200 services and releases new services and service updates more often than most people change their underwear. So here’s the “core set” of AWS terms you’re likely to need to explain.
Yep, let’s start at the beginning. AWS is an acronym (and there are a few of these coming up) for Amazon Web Services. I’m guessing you already knew that one though.
Interesting(?) fact: AWS made the bulk of Amazon’s operating profit in 2021 (source: https://www.investopedia.com/how-amazon-makes-money-4587523). So you can feel better about all the money you’ve spent on Amazon. At least that’s what I’m clinging to. Read on for more AWS terms and definitions.
Compute & Storage Services
These start with things that most sysadmin types would recognize (servers, storage, etc.) and move into the more modern areas (containers, Lambda, that sort of thing). We’re going to be quite heavy on the AWS acronyms here, so I apologize in advance.
Elastic Compute Cloud.
In this sense, “Elastic” is not that far from an elastic band: the capacity of your resources can stretch and shrink to meet demand, within limits. “Compute” means running apps; in this case, it refers to virtual servers, or virtual machines (VMs).
This is the sort of thing that your friendly neighborhood sysadmin would be familiar with, until you get into the realms of auto-scaling groups, which use magic (metric tracking, really) to spin up new VMs in response to increased demand.
Elastic Block Store.
This is a virtual disk, but one suited to reading and writing in “blocks”. Databases tend to use this sort of storage, as it has much faster read & write speeds.
Elastic File System.
Basically a network drive, with a cool pricing model: you just use it and pay for what you use. Unlike disk-based storage pricing, such as EBS, where you have to provision and pay for a whole disk. One less headache.
Simple Storage Service: it’s got 3 S’s, so S3. This is an object store, rather than a file store. It’s interchangeable with file storage to an extent, but instead of using native OS commands, you interact with it using the AWS CLI tool.
S3 can also host static websites (so no server-side content, sorry PHP devs), and in combination with CloudFront (I’ll talk about that later) can publish your content globally, with extremely low latency.
ECS & ECR
Elastic Container Service & Elastic Container Registry.
Ooh look, containers! Told you we’d get modern eventually.
ECS is Amazon’s service for orchestrating Docker containers (sort of Amazon’s take on Docker Swarm I guess?). ECR is their version of Docker Hub so that you can store all your Docker images inside AWS. Great if your InfoSec people don’t like the idea of data leaving controlled environments.
More containers, yay!
This is the Elastic Kubernetes Service: AWS-managed Kubernetes, so you don’t have to worry about anything on the control plane, for the low price of $0.10/hour per cluster.
EKS is backed by EC2 to supply the compute nodes for the control plane to allocate pods to. This comes in two forms: EC2 (you provision the servers and tell them to register to the cluster) or Fargate (AWS does all of that for you, but for more money).
Fully into the realms of “that can’t be done” from 10 years ago, AWS Lambda is a way to “run code without thinking about servers” (from the mouth of the AWS horse). It’s part of a new generation of compute services called FaaS (Function as a Service). It supports all major programming languages, and for anything it doesn’t natively run, it will also run Docker images.
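To make that concrete: a Python Lambda is just a function with a well-known signature, taking an event and a context and returning something. A minimal sketch (the event shape and greeting logic are made up for illustration):

```python
import json

def lambda_handler(event, context):
    # `event` is whatever the trigger sends (a dict for most AWS
    # sources); `context` carries runtime metadata and can be
    # ignored in simple cases.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Invoked locally for illustration; in AWS, the Lambda runtime
# calls the handler for you in response to events.
print(lambda_handler({"name": "Logicata"}, None))
```

In AWS, that function would be wired up to a trigger (API Gateway request, S3 upload, queue message), and you only pay while it runs.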
Is this compute? I’m calling it compute-adjacent and leaving it here.
AWS has two offerings for caching services, both under the banner of ElastiCache: Redis and Memcached. There are reasons why you’d use one over the other, but I’m not going to go into that here (use Redis if you value your sanity). Again, a fairly traceable name: “cache” because it’s a cache, “elastic” because it implements elasticity in the same way the EC2 service does.
Again, compute adjacent.
This used to be AWS ElasticSearch, until Elastic somewhat blew up their public image (personal opinion, and does not necessarily reflect the view of Logicata) by changing the license it used, thereby preventing AWS from reselling it.
AWS responded by forking the last open source version of ElasticSearch and running with it from there.
More compute-adjacent services, but everything in compute sends either logs or metrics here, so here it goes.
You can “watch” your “cloud” resources. CloudWatch. This covers metrics and logs, but there are different charges depending on what you’re looking at.
You can do some cool stuff with logs, like exporting them to other tools for analytics and graphing. CloudWatch is also starting to venture into Application Performance Management (APM) through things like heartbeat monitoring, link checking, and basically anything that you’d historically go elsewhere for.
Right, time for a new category. These services allow communication between services within your account, without a “hard” coupling between them. This design pattern is called (inventively) “loose coupling” and is the darling of the microservices world. And so we continue with AWS terms and definitions…
Simple Queue Service
It’s a queuing service, with a ridiculously high free tier (1 million requests per month!). It was also AWS’s first available service, way back in 2004, predating AWS itself by 2 years, which is pretty cool.
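The loose-coupling idea is easy to sketch without touching AWS at all: the producer and consumer only know about the queue, never about each other. A toy in-memory stand-in for SQS (Python’s built-in queue module, not the real boto3 API):

```python
import queue

# The queue is the only thing producer and consumer share;
# neither knows the other exists. SQS plays this role in AWS.
orders = queue.Queue()

def producer():
    # Sends messages and moves on; no waiting for the consumer.
    for order_id in range(3):
        orders.put({"order_id": order_id})

def consumer():
    # Drains the queue and processes whatever it finds.
    processed = []
    while not orders.empty():
        processed.append(orders.get())
    return processed

producer()
print(consumer())
```

Real SQS adds the parts this toy skips: durability, at-least-once delivery, visibility timeouts, and (for standard queues) no strict ordering guarantee.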
SNS & SES
Simple Notification Service & Simple Email Service
It sends notifications (think text messages) and emails. This blog is sort of writing itself at this point.
SNS will send emails, but SES gives you more control over the email content. It is, however, more awkward to configure and requires some DNS shenanigans.
Managed Apache ActiveMQ/RabbitMQ. I can’t really talk about this one at length because I don’t use either, but again, messages → queue → thing to receive messages. I guess you’d use this if you already used ActiveMQ/RabbitMQ and didn’t want to (or couldn’t) migrate to SQS. It would probably be worth the pain to migrate though, because this is run from EC2s, so you pay quite a bit more than you would for SQS.
Kinesis & Firehose
These are two different tools, but I’ve lumped them together because they’re used together quite a lot.
Kinesis is (loosely) Greek for movement, and that’s what it does. Moves data.
It comes in two forms: Streams and Firehose.
Kinesis Streams takes streaming data and lets you do transformations on it before outputting it. You could use this to dynamically change the content of a webpage as a user is interacting with it. Pretty cool, huh?
Kinesis Firehose allows you to continuously stream data from disparate inputs (like IoT devices) into either analytics tools (e.g. Kinesis Streams or custom Lambdas) or S3.
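The “transform records as they flow past” idea looks roughly like this (a pure-Python sketch with made-up sensor data, not the actual Kinesis API; in practice the transformation would usually be a Lambda attached to the stream):

```python
def transform(record: dict) -> dict:
    # Example transformation: enrich each record with a derived field.
    out = dict(record)
    out["temp_f"] = record["temp_c"] * 9 / 5 + 32
    return out

def process_stream(records):
    # Records arrive continuously; a generator transforms each one
    # as it flows past, rather than batching everything up first.
    for record in records:
        yield transform(record)

# Simulated stream of IoT sensor readings (made-up data).
readings = [
    {"device": "sensor-1", "temp_c": 20},
    {"device": "sensor-2", "temp_c": 25},
]
for rec in process_stream(readings):
    print(rec)
```

The generator never holds the whole stream in memory, which is the same shape of processing Kinesis gives you, just at a much bigger scale.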
This is where the sysadmin types might shout at me because “database” here is used a little loosely.
AWS has a range of database services available, some more “traditional” than others.
Relational Database Service.
Amazon will set up and manage a “Highly Available” (HA) cluster of a database engine of your choice. Not all DBMSs are available (sorry, Sybase users), but the common ones are there.
You still get database CLI access too, which is nice if you need/want/like to fine-tune anything.
You can pretty much use this as a drop-in replacement for an on-premises DB cluster, but you can’t quite do without a DBA. You will also need some EBS (see above).
Aurora is part of the RDS family but is fully managed, so you don’t get access to the underlying servers. It’s MySQL- and PostgreSQL-compatible. Either/or, but not both at the same time.
DynamoDB is the next extension of AWS’s DB offering. Dynamo is a NoSQL (Not Only SQL) database. It’s largely cheaper to run than RDS/Aurora, and is fully serverless, but doesn’t enforce referential integrity (in short, the database won’t enforce foreign-key relationships between tables for you). So if you can work with that (and being honest, you probably can) go for DynamoDB.
DynamoDB pricing can be a little confusing, as you provision the number of reads and writes you need upfront, based on the size of the operations (reads can be bigger than writes for the same number of “capacity units”) and whether you need the latest data or not. The short version, though, is to just use Pay-As-You-Go pricing until you understand your usage patterns. Or just stay on PAYG forever, as it’s not that much more expensive, and has some free elements.
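If you do go provisioned, the capacity-unit arithmetic is simple enough to sketch. One read capacity unit covers a strongly consistent read of up to 4 KB per second (eventually consistent reads cost half), and one write capacity unit covers a write of up to 1 KB:

```python
import math

def read_capacity_units(item_kb: float, strongly_consistent: bool = True) -> float:
    # One RCU = one strongly consistent read of up to 4 KB per second;
    # eventually consistent reads cost half as much.
    units = math.ceil(item_kb / 4)
    return units if strongly_consistent else units / 2

def write_capacity_units(item_kb: float) -> int:
    # One WCU = one write of up to 1 KB per second.
    return math.ceil(item_kb / 1)

# A 6 KB item: 2 RCUs strongly consistent, 1 eventually consistent,
# but 6 WCUs to write. Reads really are cheaper than writes.
print(read_capacity_units(6), read_capacity_units(6, strongly_consistent=False), write_capacity_units(6))
```

That asymmetry is what the “reads can be bigger than writes” comment above is getting at.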
Redshift is AWS’s data warehousing solution, using columnar storage (most DBs are row-based, with the notable exception of Sybase IQ. Take 10 imaginary points if you’ve heard of that before).
This is one of those tools with a story behind the name, it’s one of two things, and I can’t find anything definitive to confirm either.
Option 1: Redshift is a physical phenomenon, part of the Doppler effect, where light from objects moving away from you shifts towards red. It’s usually associated with the expansion of the universe, so the name could be a nod to how easily you can expand your Redshift clusters.
Option 2: It’s a swipe at Oracle, who have a red logo. The idea being that teams would shift away from Oracle.
Take your pick which you believe. I think option 2 is more likely, but because I’m a nerd I like option 1 more.
This is a fully-managed graph database that supports Gremlin, SPARQL, and openCypher for queries. This is a serious competitor to Neo4j in this space, especially if you don’t want to run Neo4j (trust me, I’ve done it, you don’t), or don’t want to use Neo4j’s managed service “Aura”.
Aura honestly isn’t bad, but the scaling of Neptune is much more in line with the rest of AWS (it basically works as RDS does) and means that any traffic to your database doesn’t have to leave AWS, then come back again.
DocumentDB (with MongoDB compatibility)
Yes, that is actually the service name.
DocumentDB is a way of storing JSON documents (again, an obvious name when you think about it). This is useful because it saves a lot of conversion to/from JSON and whatever format the database is storing data in.
Personally, I’d just use DynamoDB, as it works similarly, and is much cheaper, but if you’re already running a MongoDB instance, it’s a good option.
Couldn’t come up with a good name for this arbitrary collection, so that’ll have to do.
CloudFront is AWS’s take on a Content Delivery Network (CDN).
CloudFront is pretty cool because it uses AWS’s existing (and massive) network of servers. It talks natively to other AWS services so that you can include the setup/teardown of your CDN with your web-app deployment. This can take a little while though, so if you’re deploying CloudFront, make a coffee.
An API is an Application Programming Interface (but you knew that already, right?). API Gateway is an easy way to create and publish your APIs so that you can use them with other services (both AWS and not).
The upshot of using API Gateway, rather than self-hosting your API, is that it talks natively to other AWS services, including CloudWatch, meaning you can monitor your API just like any other AWS-hosted service. And it’s PaaS. And it’s cheap: 1 million messages per month for free, and $1/million after that, with no “standing cost” for having it available 24/7.
AWS Certificate Manager
It manages SSL/TLS certificates. Really nice tool if you’ve ever had the displeasure of provisioning TLS certs manually *shudder*. It talks to pretty much all AWS services that could use a certificate (load balancers, CloudFront, API Gateway, you get the idea), and will automatically rotate any certificates that it issued, so you don’t have to fix the SSL error at 4 pm on Christmas Day.
Shield is DDoS protection. Sort of what it says on the tin. It works at OSI layers 3 and 4, with 24/7 coverage, and a human on the other end for when the heuristics fall over. Nice price protection too: it stops you running up a massive scaling bill because you’ve been DDoS’d.
Web Application Firewall. Firewall as a service, at a basic level. If you’re paying for Shield’s advanced tier, WAF comes free, which is nice.
Sorry, I know this one’s dull, but you need it to answer the question “can other AWS customers see our data?” (Hint: no, they can’t.)
Virtual Private Cloud.
To understand this, you need to understand the difference between public and private cloud. The short version is “with a private cloud you own it all and are the only person (or company) on the hardware. With public cloud, none of that is true (in most cases, but I’m not going to go into that here)”.
A VPC allows you to treat AWS as if it’s all yours. You’re not going to see anyone else’s resources when you log in, and they won’t ever see any of yours either.
For the most part in AWS, you have no idea that anyone else is using the service, except for a few unique naming rules.
You do have to be a touch careful with VPC though, because of the isolation that they provide. It’s quite easy to run up a large bill for running traffic between a server in a VPC, and S3 (which doesn’t sit in a VPC), unless you provision an endpoint in your VPC for S3. Why this isn’t done by default, I have no idea.
DNS, Amazon style.
Domain Name System is a translator between human-readable web addresses and IP addresses. For example, www.google.co.uk resolves to an IP address, and that IP is what your PC actually uses to talk to Google over the internet. Route53 is Amazon’s implementation.
Storage Gateway is a piece of kit that you put in your existing datacenter. It gives you access to the storage services (EBS, EFS, S3, etc.) using standard network file transfer protocols (SMB, NFS, iSCSI). It could be viewed as a stepping stone to getting into the cloud, but I think it’s more aimed at being an easier backup solution.
Direct Connect creates a direct link between your datacenter and the AWS backbone, so you’re not talking over a VPN or the public internet. This is significantly faster than a VPN and more consistent: no more spikes at peak times.
This is somewhere between compute and networking really (and in truth they live in the EC2 section of the AWS console), but I stretched the limits of what I could call a “compute service” already, so they’re going here instead.
There are 4 types of load balancers, and which one you’d pick depends on what you need them to do. This section really deserves its own blog post, so maybe I’ll do that eventually, but for now, the types are: Application Load Balancers (ALB), Network Load Balancers (NLB), Gateway Load Balancers (GWLB), and the older Classic Load Balancers (CLB).
These didn’t really fit into my arbitrary AWS terminology categories, so you get them as one homogenous lump instead.
Auditing. Well, an audit “trail” on your “cloud”: CloudTrail. You absolutely want to turn this on and have it log to an account that isn’t the one being recorded.
If you’ve ever had the displeasure of trying to work out where half of your servers went at 3 am without audit logs, you’ll understand what I mean.
Identity and Access Management (what?)
This is AWS’s “permissions” setup. It’s a way to control who gets access to what, broken down into users, groups, roles & policies. Users go into groups. Policies (the actual lists of permissions) are attached to users, groups, or roles. Roles are assumed by users or services that need temporary access.
The upshot of this is servers/other resources can hold an “IAM Role”. This allows them access to do/see/get/change something from another service, without having to create service accounts. If you’ve ever used them in the past, you’ll understand why this is “A Good Thing TM”.
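Under the hood, policies are just JSON documents. A minimal example (the bucket name is hypothetical) that lets whoever holds it read objects from one S3 bucket:

```python
import json

# A minimal IAM policy document: allow reading objects from one
# (hypothetical) bucket. Attach this to a role, hand the role to
# an EC2 instance, and the instance can read the bucket with no
# service account and no stored credentials.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```

The `Effect`/`Action`/`Resource` trio is the whole mental model: who can do what, to which resource (identified by its ARN).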
Secrets Manager manages secrets (shocking, I know) and allows you to refer to them via their ARN (Amazon Resource Name). This gives you powerful options inside your resource stacks, like not putting access keys or database passwords in source control, but referring to their ARN instead.
Athena was (is?) the Greek goddess of wisdom, and the tool allows you to query files directly in S3, using SQL. Thus gaining wisdom?
QuickSight is the graphing tool you can use on top of AWS Athena, to give quick (in)sight into your data.
Glue is an ETL (Extract, Transform, Load) tool. ETL is used a lot when creating derived data sets (e.g. data aggregations). Essentially it’s “gluing” data back together.
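ETL in miniature (plain Python with made-up sales data, nothing Glue-specific; Glue does this sort of thing at scale, with Spark underneath):

```python
from collections import defaultdict

# Extract: raw rows, as they might come out of S3 or a database.
raw_sales = [
    {"region": "eu", "amount": 100},
    {"region": "us", "amount": 250},
    {"region": "eu", "amount": 50},
]

# Transform: aggregate amounts by region (a derived data set).
totals = defaultdict(int)
for row in raw_sales:
    totals[row["region"]] += row["amount"]

# Load: write the derived set somewhere downstream (here, a dict).
derived = dict(totals)
print(derived)  # {'eu': 150, 'us': 250}
```

Swap the list for S3 objects, the dict for a Redshift table, and schedule it, and you’ve roughly described a Glue job.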
OpsWorks allows you to run your existing Chef (and Puppet) code in your AWS account. I think it’s called OpsWorks because in a traditional setup that’s the work of an Ops team.
AWS Config monitors your AWS estate and gives you some control over change management and compliance monitoring. Handy when you have a regulator to worry about.
Like EC2, SSM (Systems Manager) is a collection of services that do a range of things, the most useful of which (in my opinion) is Parameter Store.
SSM Parameter Store stores parameters (you’d think it would, wouldn’t you?). What makes this service interesting, though, is the pricing & throughput.
You can store an unlimited number of parameters of up to 4 KB each, for free. Yup, free. Above that size, it’s $0.05 per parameter per month. Encrypted parameters are included in the free offering, so unless you really value automatic rotation (and if you don’t mind writing a Lambda to do this instead, it’s a non-issue), it’s a very competitive offering compared to Secrets Manager ($0.40 per month per secret).
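Using the prices quoted above, the comparison is easy arithmetic (illustrative only; check the current pricing pages before trusting any of it):

```python
def ssm_monthly_cost(n_params: int, advanced: bool = False) -> float:
    # Standard parameters (up to 4 KB) are free; larger "advanced"
    # ones are $0.05 per parameter per month, per the prices above.
    return n_params * 0.05 if advanced else 0.0

def secrets_manager_monthly_cost(n_secrets: int) -> float:
    # $0.40 per secret per month, ignoring API-call charges.
    return n_secrets * 0.40

# 50 secrets: free in standard Parameter Store, $2.50 as advanced
# parameters, $20 in Secrets Manager.
print(ssm_monthly_cost(50), ssm_monthly_cost(50, advanced=True), secrets_manager_monthly_cost(50))
```

Not life-changing money either way, but across a few hundred config values it adds up.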
The throughput is so high (once you’ve turned on higher throughput) that if you hit the ceiling, I will personally buy you a drink of your choice (up to the value of £4.99, not exchangeable for cash, you must collect in person and supply appropriate evidence).
Ok, I’m done now. Hopefully, this long (but not exhaustive) AWS glossary saves you from having to write your own and saves me from having to do this again.
Or, you could completely skip the headache of explaining all of these AWS abbreviations to your leadership team, and let me do it instead (honestly, I don’t mind), by getting in touch with me via Logicata!