AWS Cert Notes

My AWS cert notes.

This project is maintained by jangroth


SysOps Administrator Associate

5/2018 - 9/2018


Monitoring And Metrics

Virtualization Types

Linux Amazon Machine Images use one of two types of virtualization:

| AMI | Type | Effect |
|---|---|---|
| PV | Paravirtual | Historically better performance than HVM, but no longer the case |
| HVM | Hardware virtual machine | More modern, same or better performance than PV |

EC2 Instance Types

General Purpose Balance of compute, memory and networking
M5
(2017)
* Require HVM AMIs
* Storage via EBS or NVMe SSD instance store (physically connected to the host server)
M4
(2015)
* Allows enhanced networking
* EBS-optimized
M3
(2012)
* SSD (instance) store
T3
(2018)
* 30% better price performance
T2
(2014)
* Intended for workloads that do not use the full CPU constantly (e.g. web server)
* Allows burstable performance
* Burst credits allow the instance to 'burst' past the baseline performance up to 100%
* 1 credit = 100% load per core per minute
* Credits are earned per hour, expire after 24h
* EBS storage only
Compute optimized Lowest price for compute performance
C5
(2016)
* Intel Skylake
* Use Nitro, Amazon’s lightweight hardware accelerated hypervisor
* Better performance and pricing than C4
C4
(2015)
* Intel Haswell
* Optimized for EC2
* Allows enhanced networking and clustering
* EBS-optimized
C3
(2013)
* SSD (instance) store
* Allows enhanced networking and clustering
Memory optimized Lowest price for memory performance
Z1d
(2018)
* Offer both high compute capacity and a high memory footprint
* Ideal for workloads with high per-core licensing costs
X1
(2016)
* One of the lowest prices per GiB of RAM
* SSD storage and EBS-optimized by default
* X1e has even more RAM
R5
(2018)
* Use Nitro, Amazon’s lightweight hardware accelerated hypervisor
R4
(2016)
* Improved networking and EBS performance
R3
(2014)
* SSD (instance) store
* High memory capacity
* Allows enhanced networking
GPU optimized .
P3
(2017)
* Faster than P2
P2
(2016)
* Intended for general-purpose GPU compute applications
G3
(2017)
* Optimized for graphics-intensive applications
* Faster than G2
G2
(2013)
* High frequency processors
* High-performance NVIDIA GPUs
Storage optimized Very fast SSD-backed instance storage optimized for high random I/O and high IOPS
H1
(2017)
* HDD-based local storage
* deliver high disk throughput
* Balance of compute and memory
I3
(2016)
* (NVMe) SSD-backed instance storage optimized for low latency
* very high random I/O performance
D2
(2015)
* Lowest price per disk throughput performance
I2
(2013)
* SSD (instance) store
* Allows enhanced networking
* Supports TRIM (more efficient SSD operations)
RDS instance types Optimized to fit different relational database use cases
db. General purpose, memory optimized, burstable performance
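
The T2 credit rule noted above (1 credit = 100% load per core per minute) can be turned into simple arithmetic. The following is a minimal sketch, not an AWS API: the function name `credits_consumed` is made up for illustration, and it only counts credits spent, ignoring credits earned over the same period.

```python
def credits_consumed(utilization_percent: float, vcpus: int, minutes: float) -> float:
    """CPU credits used, given 1 credit = one vCPU at 100% for one minute."""
    return (utilization_percent / 100.0) * vcpus * minutes


# One vCPU at 100% for a full hour.
print(credits_consumed(100, 1, 60))
# Two vCPUs at 25% for half an hour.
print(credits_consumed(25, 2, 30))
```

A lightly loaded web server therefore spends credits slowly and can bank the rest for short bursts.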


EC2 Monitoring

EC2 Status Checks

System Status Check

Instance Status Check

EBS Monitoring

EBS Status Checks

EBS Performance Essentials

IOPS (Input/Output Operations Per Second) is a common performance measurement used to benchmark computer storage devices like hard disk drives (HDD), solid state drives (SSD), and storage area networks (SAN).

| | gp2 | io1 | st1 | sc1 |
|---|---|---|---|---|
| Volume type | General purpose SSD | Provisioned IOPS SSD | Throughput optimized HDD | Cold HDD |
| Purpose | Balances price and performance | For mission-critical low-latency or high-throughput workloads | Low cost HDD volume designed for frequently accessed, throughput-intensive workloads | Lowest cost HDD volume designed for less frequently accessed workloads |
| Volume size | 1 GiB - 16 TiB | 4 GiB - 16 TiB | 500 GiB - 16 TiB | 500 GiB - 16 TiB |
| Max. IOPS(1)/volume | 10,000 | 32,000 | 500 | 250 |
| Max. throughput/volume | 160 MiB/s | 500 MiB/s | 500 MiB/s | 250 MiB/s |
| IOPS | 3 IOPS per GB (larger volume means more IOPS); baseline ranges from 100 IOPS to 10,000 IOPS; can burst to 3,000 IOPS if volume size is < 1 TB; bursting requires credits, which are acquired at 3 IOPS per GB per second; max 5.4 million credits (also the initial value), enough for 3,000 IOPS for 30 min; running out of credits reverts the volume back to baseline performance | 30 IOPS per GB (larger volume means more IOPS), up to 20,000; does not burst, delivers a consistent IOPS rate instead | | |

(1) gp2/io1 based on 16 KiB I/O size, st1/sc1 based on 1 MiB I/O size
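
The gp2 numbers in the table (3 IOPS per GB, 100 to 10,000 IOPS baseline, 3,000 IOPS burst, 5.4 million credits) can be checked with a little arithmetic. This is a sketch under the figures recorded in these notes; the function names are made up for illustration. The `with_refill` option assumes credits keep accruing at the baseline rate while the volume bursts, which stretches the burst somewhat beyond the 30 minutes quoted above.

```python
BURST_IOPS = 3_000       # burst ceiling from the table
MAX_CREDITS = 5_400_000  # full (and initial) credit bucket from the table


def gp2_baseline_iops(size_gb: int) -> int:
    """3 IOPS per GB, clamped to the 100..10,000 range given in the table."""
    return max(100, min(3 * size_gb, 10_000))


def gp2_burst_seconds(size_gb: int, with_refill: bool = False) -> float:
    """Seconds a full credit bucket can sustain 3,000 IOPS."""
    baseline = gp2_baseline_iops(size_gb)
    if baseline >= BURST_IOPS:
        return 0.0  # volumes of 1 TB and more already run at or above burst level
    # Assumption: with refill, credits drain at (burst - baseline) per second.
    drain_per_second = BURST_IOPS - baseline if with_refill else BURST_IOPS
    return MAX_CREDITS / drain_per_second


print(gp2_baseline_iops(100))               # baseline for a 100 GB volume
print(gp2_burst_seconds(100) / 60)          # minutes of burst, no refill
print(gp2_burst_seconds(100, True) / 60)    # minutes of burst, with refill
```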

EFS Monitoring

Performance comparison

| | Amazon EFS | Amazon EBS Provisioned IOPS (io1) |
|---|---|---|
| Per-operation latency | Low, consistent latency. | Lowest, consistent latency. |
| Throughput scale | 10+ GB per second. | Up to 2 GB per second. |

Storage Characteristics Comparison

| | Amazon EFS | Amazon EBS Provisioned IOPS |
|---|---|---|
| Availability and durability | Data is stored redundantly across multiple AZs. | Data is stored redundantly in a single AZ. |
| Access | Up to thousands of Amazon EC2 instances, from multiple AZs, can connect concurrently to a file system. | A single Amazon EC2 instance in a single AZ can connect to a file system. |
| Use cases | Big data and analytics, media processing workflows, content management, web serving, and home directories. | Boot volumes, transactional and NoSQL databases, data warehousing, and ETL. |

S3 vs EFS vs EBS Comparison

| Amazon S3 | Amazon EBS | Amazon EFS |
|---|---|---|
| Can be publicly accessible | Accessible only via the given EC2 machine | Accessible via several EC2 machines and AWS services |
| Web interface | File system interface | Web and file system interface |
| Object storage | Block storage | File storage |
| Scalable | Hardly scalable | Scalable |
| Slower than EBS and EFS | Faster than S3 and EFS | Faster than S3, slower than EBS |
| Good for storing backups | Is meant to be an EC2 drive | Good for shareable applications and workloads |

CloudWatch

Monitoring service that plugs into many other services

Key metrics for EC2

| Metric | Effect |
|---|---|
| CPUUtilization | The total CPU resources utilized within an instance at a given time. |
| DiskReadOps, DiskWriteOps | The number of read (write) operations performed on all instance store volumes. This metric is applicable for instance store-backed AMI instances. |
| DiskReadBytes, DiskWriteBytes | The number of bytes read (written) on all instance store volumes. This metric is applicable for instance store-backed AMI instances. |
| NetworkIn, NetworkOut | The number of bytes received (sent) on all network interfaces by the instance. |
| NetworkPacketsIn, NetworkPacketsOut | The number of packets received (sent) on all network interfaces by the instance. |
| StatusCheckFailed, StatusCheckFailed_Instance, StatusCheckFailed_System | Reports whether the instance has passed both status checks / the instance status check / the system status check in the last minute. |
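
To pull one of these metrics programmatically, the request needs a namespace, a metric name, a dimension, a time window, a period and the statistics wanted. The sketch below only builds the parameter dictionary, so it runs without AWS access. The helper name `ec2_metric_query` and the instance ID are placeholders, and the final call through boto3 is shown as a comment because it assumes boto3 is installed and credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def ec2_metric_query(instance_id, metric_name="CPUUtilization",
                     minutes=60, period=300, now=None):
    """Build keyword arguments for a CloudWatch GetMetricStatistics request."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Average", "Maximum"],
    }


# Placeholder instance ID, for illustration only.
query = ec2_metric_query("i-0123456789abcdef0")
print(query["MetricName"], query["Period"])

# With boto3 and credentials available, the dictionary could be passed on:
#   boto3.client("cloudwatch").get_metric_statistics(**query)
```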

Key metrics for EBS

| Metric | Effect |
|---|---|
| VolumeReadBytes, VolumeWriteBytes | Sum reports total bytes transferred; average is also useful |
| VolumeReadOps, VolumeWriteOps | Total number of I/O operations |
| VolumeQueueLength | Number of read/write operation requests waiting to finish |
| VolumeTotalReadTime, VolumeTotalWriteTime | Total number of seconds spent by all operations in a given time |
| VolumeThroughputPercentage | Percentage of IOPS that was achieved out of total provisioned IOPS |
| VolumeConsumedReadWriteOps | Total amount of read/write operations consumed within a specific time period |

Key metrics for EFS

| Metric | Effect |
|---|---|
| BurstCreditBalance | The number of burst credits that a file system has. |
| ClientConnections | The number of client connections to a file system. |
| DataReadIOBytes, DataWriteIOBytes | The number of bytes for each file system read (write) operation. |
| MetadataIOBytes | The number of bytes for each metadata operation. |
| PercentIOLimit | Shows how close a file system is to reaching the I/O limit of the General Purpose performance mode. |
| PermittedThroughput | The maximum amount of throughput a file system is allowed. |
| TotalIOBytes | The number of bytes for each file system operation, including data read, data write, and metadata operations. |

Key metrics for ELB (classic load balancer)

| Metric | Effect |
|---|---|
| Latency | Time it takes to receive a response. Measure max and average |
| BackendConnectionErrors | Number of connections to registered instances that were not successfully established. Measure sum and look at the difference between min and max |
| SurgeQueueLength | Total number of requests waiting to get routed. Look at max and average |
| SpilloverCount | Requests dropped because the surge queue was full. Look at sum |
| HTTPCode_ELB_3XX_Count, HTTPCode_ELB_4XX_Count, HTTPCode_ELB_5XX_Count | The number of HTTP 3XX/4XX/5XX response codes that originate from the load balancer. This count does not include any response codes generated by the targets. |
| RequestCount | Number of completed requests |
| HealthyHostCount, UnhealthyHostCount | Number of healthy (unhealthy) registered instances |

Spillover and surge queue length indicate that the ELB is overloaded.
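
The advice above (max and average for the surge queue, sum for spillover) can be sketched as a small summary over metric datapoints. The function name and the "any spillover or any queued request" rule are assumptions made for illustration; a real alarm threshold would depend on the workload.

```python
from statistics import mean


def summarize_elb_load(surge_queue_samples, spillover_samples):
    """Summarize SurgeQueueLength and SpilloverCount datapoints."""
    surge_max = max(surge_queue_samples, default=0)
    surge_avg = mean(surge_queue_samples) if surge_queue_samples else 0
    spillover_sum = sum(spillover_samples)
    return {
        "surge_max": surge_max,
        "surge_avg": surge_avg,
        "spillover_sum": spillover_sum,
        # Assumption: any dropped or queued request is worth investigating.
        "overloaded": spillover_sum > 0 or surge_max > 0,
    }


print(summarize_elb_load([0, 0, 0], [0, 0]))
print(summarize_elb_load([0, 12, 40], [0, 3]))
```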

Key metrics for ALB (application load balancer)

| Metric | Effect |
|---|---|
| RequestCount | Number of completed requests |
| HealthyHostCount, UnhealthyHostCount | Number of healthy (unhealthy) targets |
| TargetResponseTime | The time elapsed after the request leaves the load balancer until a response from the target is received. |
| HTTPCode_ELB_3XX_Count, HTTPCode_ELB_4XX_Count, HTTPCode_ELB_5XX_Count | The number of HTTP 3XX/4XX/5XX response codes that originate from the load balancer. This count does not include any response codes generated by the targets. |

Key metrics for NLB (network load balancer)

| Metric | Effect |
|---|---|
| ProcessedBytes | The total number of bytes processed by the load balancer, including TCP/IP headers. |
| TCP_Client_Reset_Count | The total number of reset (RST) packets sent from a client to a target. |
| TCP_ELB_Reset_Count | The total number of reset (RST) packets generated by the load balancer. |
| TCP_Target_Reset_Count | The total number of reset (RST) packets sent from a target to a client. |

Key metrics for ElastiCache

Supports memcached and redis

| Metric | Memcached | Redis |
|---|---|---|
| | Designed for simplicity | Supports a much richer set of features; can be backed up if in cluster mode |
| CPU utilization | Multithreaded; stay under 90%; otherwise increase the size of the node or add more nodes | Single threaded; stay under 90% divided by the number of cores; otherwise increase the number of read replicas or use a larger cache instance |
| Evictions | Increase node size or add nodes to the cluster | Increase node size |
| Concurrent connections | Check application logic | Check application logic |
| Swap usage | Avoid swapping; increase memcached_connections_overhead | Avoid swapping; increase node size; increase memory connection overhead (will decrease memory available for cache) |


Key metrics for RDS

| Metric | Effect |
|---|---|
| CPUUtilization | Percentage of CPU utilization |
| DatabaseConnections | Number of connections at a given point in time |
| DiskQueueDepth | Number of read/write requests waiting to access the disk |
| FreeableMemory | Amount of available RAM |
| FreeStorageSpace | Amount of available storage space |
| SwapUsage | Amount of memory contents swapped out to disk. An increase usually means the instance is running out of available RAM |
| ReadIOPS/WriteIOPS | Number of I/O operations completed per second. If we don't have enough IOPS, performance will slow down |
| ReadLatency/WriteLatency | Average amount of time taken per disk I/O operation. High latency can be solved with more IOPS |
| ReadThroughput/WriteThroughput | Average number of bytes read from or written to disk per second |



Costs

Consolidated Billing

Set up a billing account to pay for multiple linked accounts at the same time.

Limits:

Billing Metrics & Alarms

Costs Optimization

Cost Explorer


High Availability

Scalability & Elasticity Fundamentals

| | Elasticity | Scalability |
|---|---|---|
| | Scaling up/down on demand | Scaling for growth in order to meet long term requirements; typically does not focus on shrinking back |
| DynamoDB | Can provision more or less throughput | Stores as much data as we like, scales transparently |
| EC2 | Use autoscaling | More instances or bigger instance types |
| RDS | ./. | Bigger instances, more read replicas |

Reserved Instances

Autoscaling vs Resizing

Load Balancers

. ALB NLB ELB
. Application Load Balancer Network Load Balancer Classic Load Balancer
Layer 7 (application layer) 4 (transport layer) EC2-classic network (deprecated)
Protocol HTTP, HTTPS TCP TCP, SSL, HTTP, HTTPS
Health checks
CloudWatch metrics
Logging
Zone failover
Connection draining
Load balancing to different ports on the same instance .
WebSockets .
IP Addresses as targets .
Load balancing deletion protection .
Path-based routing . .
Host-based routing . .
Native http/2 . .
Configurable idle connection timeout .
Cross zone load-balancing
SSL-offloading .
Server-name indication .
Sticky-sessions .
Backend server encryption .
Static IP . .
Elastic IP . .
Preserve source IP address . .
Resource-based IAM permissions
Tag-based IAM permissions .
Slow start . .
User authentication . .
Redirects . .
Fixed responses . .

Elastic Load Balancer ('Classic LB')

Overview

Sticky Sessions

RDS HA

HA for IP-based Applications

HA/Fault Tolerance for Bastion Hosts


Analysis

Optimize the environment to ensure maximum performance

Offloading database workload

Looking at EBS volumes

Prewarming ELBs

Identify Performance Bottlenecks and Implement Remedies

Resizing or changing EBS root volumes

Setting up certificates for Elastic Load Balancers

Network bottlenecks

Identify Potential Issues on a Given Application Deployment

EBS Root Devices on Terminated Instances - Ensuring Data Durability

Troubleshooting Auto Scaling Issues


OpsWorks

Overview and components

Berkshelf

TODO: Quickstart OpsWorks

CloudFormation

Overview

Templates

Intrinsic Functions


Backups & Recovery

AWS Services with automated backups

Disaster Recovery Scenarios

DR of on-prem infra

DR of cloud infra

DR of RDS data

Storing log files and backups


Security

Implement and Manage Security Policies

IAM

IAM is a global service that helps to securely control access to AWS resources.

Policies

	{
		"Version": "2012-10-17",
		"Statement": [
			{
				"Effect": "Allow",
				"Action": "s3:ListAllMyBuckets",
				"Resource": "arn:aws:s3:::*"
			},
			{
				"Effect": "Allow",
				"Action": [
					"s3:ListBucket",
					"s3:GetBucketLocation"
				],
				"Resource": "arn:aws:s3:::productionapp"
			},
			{
				"Effect": "Allow",
				"Action": [
					"s3:GetObject",
					"s3:PutObject",
					"s3:DeleteObject"
				],
				"Resource": "arn:aws:s3:::productionapp/*"
			}
		]
	}
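
To see how the three statements in this policy combine, the sketch below checks an action and resource against each `Allow` statement using shell-style wildcard matching. It is a deliberately simplified illustration, not the real IAM evaluation logic: it ignores explicit `Deny`, conditions, principals and case-insensitive action matching, and the helper names are made up.

```python
from fnmatch import fnmatchcase

# The policy from above, written as a Python dictionary.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:ListAllMyBuckets",
         "Resource": "arn:aws:s3:::*"},
        {"Effect": "Allow", "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
         "Resource": "arn:aws:s3:::productionapp"},
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Resource": "arn:aws:s3:::productionapp/*"},
    ],
}


def _as_list(value):
    """Policies allow a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource):
    """True if some Allow statement matches both the action and the resource."""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        action_ok = any(fnmatchcase(action, p) for p in _as_list(statement["Action"]))
        resource_ok = any(fnmatchcase(resource, p) for p in _as_list(statement["Resource"]))
        if action_ok and resource_ok:
            return True
    return False  # nothing matched: implicit deny


print(is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::productionapp/report.csv"))
print(is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::otherbucket/report.csv"))
```

Note that bucket-level actions (`s3:ListBucket`) are granted on the bucket ARN, while object-level actions are granted on `productionapp/*`.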

IAM Policies

IAM roles and EC2

S3 IAM and bucket policy concepts

Defaults

Bucket policies (resource level)

{
"Version":"2012-10-17",
"Statement":
  [
    {
      "Sid":"PutObjectAcl",
      "Effect":"Allow",
      "Principal":
      {
        "AWS":
          [
           "arn:aws:iam::111122223333:user/tom", "arn:aws:iam::444455556666:user/chris"
          ]
      },
      "Action":
        [
          "s3:PutObject",
          "s3:PutObjectAcl"
        ],
        "Resource":
        [
          "arn:aws:s3:::examplebucket/*"
        ]
    }
  ]
}

ACLs

<?xml version="1.0" encoding="UTF-8"?>
<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner>
    <ID>*** Owner-Canonical-User-ID ***</ID>
    <DisplayName>owner-display-name</DisplayName>
  </Owner>
  <AccessControlList>
    <Grant>
      <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
               xsi:type="CanonicalUser">
        <ID>*** Owner-Canonical-User-ID ***</ID>
        <DisplayName>display-name</DisplayName>
      </Grantee>
      <Permission>FULL_CONTROL</Permission>
    </Grant>
  </AccessControlList>
</AccessControlPolicy> 

IAM policies (user level)

| Resource ARN | Matches |
|---|---|
| arn:partition:service:region:namespace:relative-id | General ARN format, e.g. arn:aws:s3:::mybucket |
| arn:aws:s3:::* | All buckets and objects in account |
| arn:aws:s3:::mybucket | mybucket |
| arn:aws:s3:::mybucket/* | All objects in mybucket |
| arn:aws:s3:::mybucket/mykey | mykey in mybucket |
| arn:aws:s3:::mybucket/developers/${aws:username}/ | Folder matching the accessing user's name |
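
The ARN layout `arn:partition:service:region:namespace:relative-id` can be taken apart by splitting on the first five colons. A minimal sketch follows; the `Arn` tuple and `parse_arn` helper are illustrative names, not part of any AWS SDK. For S3 the region and namespace fields are empty, which is why the examples contain `:::`.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    namespace: str
    relative_id: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its five fields after the leading 'arn' prefix."""
    # Limit the split so colons inside the relative ID are preserved.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


parsed = parse_arn("arn:aws:s3:::mybucket/mykey")
bucket, _, key = parsed.relative_id.partition("/")
print(parsed.service, bucket, key)
```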

CloudFront

Ensure Data Integrity and Access Controls when Using the AWS Platform

MFA

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["ALICE", "BOB"]},
    "Action": [ "s3:PutObject", "s3:DeleteObject" ],
    "Resource": ["arn:aws:s3:::Alice-Bucket/*"],
    "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
  }]
}
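
The `Condition` block in this policy only lets the statement apply when the request was made with MFA. As a rough illustration of that idea, the sketch below compares a `Bool` condition against a dictionary of request context values. It is a simplification for study purposes, not the real policy engine, and the function name is made up. A missing context key is treated as not satisfying the condition.

```python
def bool_condition_met(condition: dict, context: dict) -> bool:
    """True if every key under the 'Bool' operator matches the request context."""
    for key, expected in condition.get("Bool", {}).items():
        actual = context.get(key)
        # Assumption: a key absent from the context fails the condition.
        if actual is None or str(actual).lower() != str(expected).lower():
            return False
    return True


condition = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
print(bool_condition_met(condition, {"aws:MultiFactorAuthPresent": "true"}))
print(bool_condition_met(condition, {}))
```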

Secure Token Service (STS)

Terms

Scenarios

Shared responsibility model

AWS and IT Audits


Networking

Route53 Routing Policies

DNS Failover

Weighted

Latency-based

VPC Essentials

Default VPC (Amazon specific)

Non-default VPC (regular VPC)

VPC Peering

VPC Scenarios

Components

Security

Network ACL

Security Groups

Structure & packet flow

Connection To On-prem Network/Direct Connect

TODO: VPN vs direct connect. Can I use VPN instead of DC?

Limits:

| Resource | Limit |
|---|---|
| VPCs per region | 5 |
| Subnets per VPC | 200 |
| Customer gateways per region | 50 |
| Virtual private gateways per region | 5 |
| Virtual private gateways per VPC | 1 |
| Internet gateways per region | 5 |
| Elastic IPs per account per region | 5 |
| VPN connections per region | 50 |
| Route tables per region | 200 |
| Security groups per region | 500 |

Etc

Accessing the OS

SQS

DynamoDB