A Summary of Part I: Getting Started with AWS
AWS For Dummies
By Bernard Golden
Introduction
What is Cloud Computing?
The National Institute of Standards and Technology (NIST, a U.S. government agency) has created an agreed-upon definition of cloud computing:
“Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
The cloud model comprises five key characteristics:
- On-demand self-service: a user can immediately provision computing capabilities without requiring human interaction with the service provider.
- Broad network access: capabilities are accessed via standard network mechanisms.
- Resource pooling: the provider's resources are pooled and used by multiple clients, with resources assigned and re-assigned according to consumer demand.
- Rapid elasticity: capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand.
- Measured service: resource usage is automatically monitored, controlled, and reported, providing transparency for both the provider and the consumer.
Amazon Business Philosophy
AWS was released on March 13, 2006 with its first product, Simple Storage Service, which today has been shortened to S3. S3 simply gave users the ability to store objects over the web, be it a photo, a software package, a backup, and so on. When S3 launched, a user could simply upload, read, and delete objects, with limits on size and location (U.S. only).
A short while later AWS launched Simple Queue Service (SQS), which provided a way to pass messages between different programs. Later in 2006 AWS launched Elastic Compute Cloud (known as EC2), offering computing capacity on demand.
Network Effect
AWS benefits from a large network effect. Firstly, AWS's size encourages developers to build services for AWS clients (the AWS Marketplace). Secondly, because AWS has infrastructure across the world, it is able to reduce latency (the delay imposed by the distance traffic travels across a network, for example a user in London reaching servers in Mumbai).
Introducing the AWS API
On top of the AWS environment sits an API (Application Programming Interface). Every service you use is called via its API.
How does the API authenticate? AWS uses an access key ID and a secret access key to authenticate users. For example, if you wanted to upload an image to S3 using the API, your client would sign the request by computing a signature (a keyed hash) over it with your secret access key and then send the request, the signature, and your access key ID to AWS. On Amazon's end, AWS looks up the secret access key associated with your access key ID, recomputes the signature, and checks that the two match; the secret access key itself is never sent over the network.
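As a rough illustration (the bucket name and file name below are placeholders, and the AWS SDK for Python, boto3, does the signing for you), calling the S3 API looks something like this:

```python
# A minimal sketch of calling the S3 API with boto3, the AWS SDK for Python.
# boto3 reads the access key ID and secret access key from the environment or
# ~/.aws/credentials and signs every request with them automatically.
import boto3

s3 = boto3.client("s3")

# Upload a local image; the SDK computes the request signature with the secret
# access key and sends it along with the access key ID.
s3.upload_file("photo.jpg", "example-bucket", "photos/photo.jpg")

# Read the object back to confirm it is there.
obj = s3.get_object(Bucket="example-bucket", Key="photos/photo.jpg")
print(obj["ContentLength"])
```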
AWS Console
For users who do not want to use the APIs directly, Amazon offers a web-based console that allows you to send the calls through a graphical interface.
AWS Storage
Storage is becoming an increasingly important topic due to the rise of big data, digital media, and so on. AWS's first products were storage based, and therefore storage sits at the heart of its offering.
AWS offers four storage options: S3, Elastic Block Store (EBS), Glacier, and DynamoDB.
S3 = URL object store — allows data to be accessed from within AWS as well as externally from the Internet
EBS = Highly persistent volumes that can be attached to (and detached from) running EC2 instances
DynamoDB = Flexible, high-performance, and robust storage for webscale applications.
Glacier = Backups! Glacier’s job is to ensure that critical data is never lost, and it does its job well.
S3
S3 is one of the most widely used AWS offerings. Its object storage is used by Dropbox, Netflix, Medcommons…
Every S3 object has a unique URL and is stored within a bucket (a group of objects). S3's only limitation is object size, which is currently capped at 5TB. Although you can't update objects in S3 in place, AWS has gotten around this through versioning (you can modify version 2 of an S3 object, for example, and store the modified version as version 3). So, with versioning enabled, if you upload an object with the same name as an existing object, Amazon S3 creates another version of the object instead of replacing the existing one.
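As a small sketch of how this works in practice (the bucket name is a placeholder), versioning is switched on per bucket, and each upload of an existing key then creates a new version:

```python
# Hedged sketch: enabling and inspecting S3 versioning with boto3.
import boto3

s3 = boto3.client("s3")

# Turn versioning on for the bucket.
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Uploading the same key twice now produces two versions rather than an overwrite.
s3.put_object(Bucket="example-bucket", Key="report.txt", Body=b"version 1")
s3.put_object(Bucket="example-bucket", Key="report.txt", Body=b"version 2")

# List the versions that exist for the key.
response = s3.list_object_versions(Bucket="example-bucket", Prefix="report.txt")
for version in response["Versions"]:
    print(version["VersionId"], version["IsLatest"])
```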
S3 offers encryption of objects stored in the service, securing your data from anyone attempting to access it inappropriately. You can log requests made against S3 objects to audit when objects are accessed and by whom. S3 can even be used to host static websites: the pages aren't dynamically assembled when they are served, which removes the need to run a web server.
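Static hosting is a bucket-level setting. A minimal sketch with boto3 (bucket and document names are placeholders):

```python
# Hedged sketch: turning a bucket into a static website host with boto3.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},  # served for directory-style requests
        "ErrorDocument": {"Key": "error.html"},     # served when a key is missing
    },
)
```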
Security
- You are able to restrict access to S3 files by a range of IP addresses!
- Restricting Access to a Specific HTTP Referrer
“Suppose you have a website with domain name (www.example.com or example.com) with links to photos and videos stored in your S3 bucket, examplebucket. By default, all the S3 resources are private, so only the AWS account that created the resources can access them. To allow read access to these objects from your website, you can add a bucket policy that allows s3:GetObject permission with a condition, using the aws:Referer key, that the get request must originate from specific webpages. The following policy specifies the StringLike condition with the aws:Referer condition key.”
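The quoted passage refers to a policy document that is not reproduced above. A sketch of what such a policy could look like, applied with boto3 (the bucket name and domains come from the quote and are placeholders):

```python
# Hedged sketch: a bucket policy that allows s3:GetObject only when the request's
# Referer header matches the example site, applied with boto3.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGetRequestsFromExampleSite",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::examplebucket/*",
            "Condition": {
                "StringLike": {
                    "aws:Referer": [
                        "http://www.example.com/*",
                        "http://example.com/*",
                    ]
                }
            },
        }
    ],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="examplebucket", Policy=json.dumps(policy))
```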
CloudFront
An S3 bucket is assigned to a region. Latency can become an issue if a user in Australia is accessing files in a bucket in America. The solution to this is a Content Delivery Network (CDN), which places servers around the world. When that user requests the web page, the dynamic content can still come from America while the static content is served from a CDN server in Australia. The original issue with CDNs was that they were complicated and costly to maintain, limiting them to only a handful of enterprise clients.
CloudFront brings CDN technology to AWS's entire user base.
Elastic Block Storage
Every server needs a drive.
Amazon Elastic Block Store, or EBS, is essentially cloud-based storage for the drives of your virtual machines, just like a D: or E: drive on Windows (or a mount point such as /data on Linux). The core principle of EBS is that it stores data as blocks of the same size and organises them in a hierarchy, similar to a traditional file system.
An EBS volume is attached to an EC2 instance. However, it is separate from the instance: when the EC2 instance is terminated, the EBS volume is not lost but simply waits to be attached to another instance.
Amazon also offers “snapshots”, which take a copy of an EBS volume and store it in S3 for you.
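A hedged sketch of that lifecycle with boto3 (the availability zone and instance ID are placeholders):

```python
# Hedged sketch: create an EBS volume, attach it to an instance, and snapshot it.
import boto3

ec2 = boto3.client("ec2")

# Create an empty 10 GiB volume in the same availability zone as the instance.
volume = ec2.create_volume(AvailabilityZone="us-east-1a", Size=10)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# Attach it to a running instance as an extra drive.
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/sdf",
)

# Take a snapshot: a point-in-time copy of the volume stored in S3 on your behalf.
snapshot = ec2.create_snapshot(VolumeId=volume["VolumeId"], Description="nightly backup")
print("Snapshot:", snapshot["SnapshotId"])
```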
Elastic Compute Cloud
EC2 = Elastic Compute Cloud (get it? Two Cs!). EC2 is the most widely used AWS service and is a revolution in Information Technology. EC2 provides virtual servers in a matter of minutes, all via self-service.
In the old days, if you needed a server you had to buy a whole one, get it physically delivered, and set it up. It could take up to six months. Then virtualization came along, which allowed one server to be divided into multiple chunks. This meant that it now took three to six weeks to get a VM up and running, without the need to keep buying separate servers for different applications.
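For contrast with those timelines, here is a minimal sketch of the self-service model with boto3 (the AMI ID and key pair name are placeholders and depend on your region and account):

```python
# Hedged sketch: launching and terminating a virtual server with EC2 via boto3.
import boto3

ec2 = boto3.client("ec2")

# Launch one small instance; it is typically running within minutes.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder machine image (AMI)
    InstanceType="t2.micro",
    KeyName="my-key-pair",            # placeholder SSH key pair
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched", instance_id)

# When you no longer need it, stop paying for it.
ec2.terminate_instances(InstanceIds=[instance_id])
```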
Networking
Networking is a big deal in the AWS scheme of things. Without it, none of your AWS instances would be able to send and receive network traffic.
There are two network layers that are important in the cloud.
- The data link layer — Controls the flow of data between network entities on the same network.
- The network layer — Controls the flow of data between different networks. This network layer most commonly uses IP addresses.
Amazon assigns both a public and a private IP address to an instance. The private address is within the 10.X.X.X address range, a range designed to be unroutable over the public Internet and to enable private traffic within data centers.
Within AWS, instances can communicate using the private IP address (That traffic isn’t routed by the public Internet; instead, it’s confined within AWS) while people can access them over the internet using a public address.
Instance IP addresses aren't persistent. Every instance that's launched is assigned an address from the general pool of IP addresses. This is a problem: if you run a website, how do you handle frequent changes to your IP address, caused by launching new instances whenever you have to restart a crashed instance, update software, and so on? AWS solves this problem with Elastic IP addresses.
Elastic IP Addresses = A public IP address assigned to your account that can be substituted for the temporary public IP address that's assigned to your instance at launch time. You request an Elastic IP address from AWS, and it's provided to you so that you can assign a permanent IP address to your new instances. You can then create a public DNS entry with your URL (say, www.example.com) and the Elastic IP address AWS assigned to your account.
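A short sketch of that flow with boto3 (the instance ID is a placeholder for an instance you have already launched):

```python
# Hedged sketch: allocating an Elastic IP and attaching it to an instance.
import boto3

ec2 = boto3.client("ec2")

# Request an Elastic IP address from AWS for use within a VPC.
allocation = ec2.allocate_address(Domain="vpc")
print("Elastic IP:", allocation["PublicIp"])

# Associate it with an instance; if the instance is replaced later, re-associating
# the same address means the public DNS entry never has to change.
ec2.associate_address(
    AllocationId=allocation["AllocationId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
)
```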
AWS Security
VPC
As useful as EC2 undoubtedly is, many customers prefer a more secure offering. A potential vulnerability in applications is present when each EC2 instance has a public IP address.
Fortunately, AWS addresses this problem with its Virtual Private Cloud (VPC) offering. In broad terms, VPC lets users segregate their instances and shield them from direct Internet access. VPC makes it possible to implement AWS applications that are more secure.
VPC operates by providing you with a virtual network topology that’s separate from the general AWS environment. Another way to say this is that via the use of clever software, AWS provides you with a segregated computing environment. Instances are located within your own, private VPC, with no access to them other than via the VPC environment. In a certain sense, what you end up with isn’t dissimilar from a VLAN environment.
Using a VPC, you can create a separate set of resources that carry private IP addresses within a range you select. You set rules for how traffic enters and leaves instances within the VPC. You can choose to make instances accessible to the public Internet via Elastic IP addresses. Moreover, you can create subnets (in effect, subdivisions of the overall VPC) and control access to and from the subnets and between subnets.
Types of Subnets
Every VPC can have one or more subnets, which can be used in a number of different scenarios (a provisioning sketch follows this list).
- VPC + Public Subnet — public subnet is accessible to the public Internet, and instances within a public subnet can directly access the Internet with inbound or outbound traffic. By default, every VPC is created with a public subnet.
- VPC + Public Subnet + Private Subnet — a private subnet is located within a VPC and cannot access the Internet
- VPC + Public Subnet + Private Subnet + VPN Access — similar to the preceding scenario, but the private subnet is also connected to your own network (such as a corporate data center) via a hardware VPN connection.
- VPC with only private subnet and hardware VPN access — This scenario allows AWS resources to be completely isolated from public Internet access but to be accessible from an external location, such as your corporate data center.
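As promised above, a hedged provisioning sketch of the public-plus-private-subnet scenario with boto3 (the CIDR ranges are only illustrative):

```python
# Hedged sketch: a VPC with one public and one private subnet.
import boto3

ec2 = boto3.client("ec2")

vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]

public_subnet = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.1.0/24")["Subnet"]
private_subnet = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.2.0/24")["Subnet"]

# An Internet gateway plus a default route is what makes the first subnet "public".
igw = ec2.create_internet_gateway()["InternetGateway"]
ec2.attach_internet_gateway(InternetGatewayId=igw["InternetGatewayId"], VpcId=vpc["VpcId"])

route_table = ec2.create_route_table(VpcId=vpc["VpcId"])["RouteTable"]
ec2.create_route(
    RouteTableId=route_table["RouteTableId"],
    DestinationCidrBlock="0.0.0.0/0",
    GatewayId=igw["InternetGatewayId"],
)
ec2.associate_route_table(
    RouteTableId=route_table["RouteTableId"],
    SubnetId=public_subnet["SubnetId"],
)

# The private subnet keeps only the VPC's local route, so its instances
# cannot be reached directly from the public Internet.
print("Public:", public_subnet["SubnetId"], "Private:", private_subnet["SubnetId"])
```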
Additional AWS Services
Elastic Load Balancer
One useful benefit of cloud computing is that it supports scalability (the ability to provision large amounts of computing capacity) and elasticity (the ability to easily and rapidly grow and shrink the computing capacity assigned to your application). And one benefit of AWS is that it supports these aforementioned benefits more than any other cloud provider in the market. You can easily start and stop instances and add them to, or remove them from, your application, paying only for the computing capacity you consume.
One key requirement for taking advantage of these benefits is the ability to direct network traffic to these instances, and a load balancer is the solution to this requirement. A load balancer spreads load across multiple computing resources that offer the same functionality, improving the overall application performance. If you have four instances that operate as web servers, for example, a load balancer directs traffic to each of the four so that no web server is overloaded and all users experience better performance.
You can also run into the problem that, as capacity increases, you need to increase the capacity of the load balancers themselves!
AWS addresses this with Elastic Load Balancing, an easy-to-use, scalable, and automatic load-balancing service.
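A minimal sketch with boto3, using the original ("classic") Elastic Load Balancing API that matches the service described here (names, the availability zone, and instance IDs are placeholders):

```python
# Hedged sketch: create a classic load balancer and register two web servers behind it.
import boto3

elb = boto3.client("elb")

# Listen on port 80 and forward traffic to port 80 on the registered instances.
elb.create_load_balancer(
    LoadBalancerName="example-web-lb",
    Listeners=[{"Protocol": "HTTP", "LoadBalancerPort": 80, "InstancePort": 80}],
    AvailabilityZones=["us-east-1a"],
)

# Register the web server instances; incoming traffic is then spread across them.
elb.register_instances_with_load_balancer(
    LoadBalancerName="example-web-lb",
    Instances=[
        {"InstanceId": "i-0123456789abcdef0"},  # placeholder instance IDs
        {"InstanceId": "i-0fedcba9876543210"},
    ],
)
```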
CloudFront
The issue of latency — the length of time a network request takes to complete its roundtrip — is always a big deal as it relates to network traffic.
The solution to this problem occurred with the creation of the content delivery network (CDN), which places servers around the world and allows companies to locate their data on those servers. For example, a company located in the United States could use a CDN to place images in Australia; when an Australia-based user accessed the U.S.-based website, the pages were sent (initially) without images, and the images were then placed into the pages on their arrival in Australia. This approach allows important or changeable data to reside in the central location and allows static or infrequently changed large content files to be located near the user.
CloudFront is easy to use and inexpensive, and it makes CDN technology available to an entirely new user base that was previously unable to use existing CDN solutions.
Relational Database Services
The AWS Relational Database Service (RDS) is set up for a single purpose: to make it easier to run relational databases in AWS.
Traditionally, companies employed database administrators to handle the administrative tasks associated with running a relational database: configuring them, backing them up, and monitoring resource consumption and performance, for example. This approach has only two problems: It’s expensive and it’s error-prone. RDS addresses this by automating a lot of this work!
RDS:
- Supports MySQL, Microsoft SQL Server, and Oracle.
- Makes backups based on your pre-set schedule.
- In the case of MySQL and Oracle, RDS lets you seamlessly increase the amount of storage associated with your RDS service.
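A hedged sketch of provisioning and later growing a managed MySQL database with boto3 (identifier, credentials, and sizes are placeholders):

```python
# Hedged sketch: create an RDS MySQL instance with automated backups, then grow its storage.
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="example-db",
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    MasterUsername="admin",
    MasterUserPassword="choose-a-strong-password",
    AllocatedStorage=20,        # GiB
    BackupRetentionPeriod=7,    # RDS takes automated backups on this schedule (days)
)

# Later, increase the storage in place.
rds.modify_db_instance(
    DBInstanceIdentifier="example-db",
    AllocatedStorage=40,
    ApplyImmediately=True,
)
```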
ElastiCache
The most common performance bottleneck for webscale applications is the database.
If you want to stick to a relational database model, performance can be improved via caching.
You may have seen the use of caching earlier in this chapter, in my discussion of CloudFront. Database caching uses the same technique to solve a different problem: whereas CDNs are designed to address the issue of accessing data from widely dispersed locations, database caching addresses the issue of accessing data in centralised locations at speeds that hard drives cannot deliver. This type of caching is called memcaching (in-memory caching).
When sending a database request, an extra step is added which first checks whether the data is memcached. If it is, it’s retrieved from memcached; if not, it’s retrieved from the database and, before being returned to the requesting application, is placed in memcached. This enables subsequent queries to find the data in memcached and avoid accessing the database.
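A sketch of that check-then-fall-back ("cache-aside") flow against a memcached endpoint such as the one an ElastiCache cluster exposes; the endpoint hostname and the query_database() helper are hypothetical placeholders, and the pymemcache library stands in for any memcached client:

```python
# Hedged sketch: cache-aside reads backed by memcached (e.g. an ElastiCache endpoint).
from pymemcache.client.base import Client

cache = Client(("example-cluster.cache.amazonaws.com", 11211))  # placeholder endpoint


def query_database(key):
    """Placeholder for the real (slow) relational-database lookup."""
    raise NotImplementedError


def get_with_cache(key):
    # Step 1: check memcached first.
    value = cache.get(key)
    if value is not None:
        return value
    # Step 2: on a miss, fall back to the database...
    value = query_database(key)
    # ...and place the result in memcached so later requests skip the database.
    cache.set(key, value, expire=300)  # keep it for five minutes
    return value
```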