↖↓ DevOps Engineer Professional
10/2019 - 6/2020
↖↑↓ Exam Objectives
- Implement and manage continuous delivery systems and methodologies on AWS
- Implement and automate security controls, governance processes, and compliance validation
- Define and deploy monitoring, metrics, and logging systems on AWS
- Implement systems that are highly available, scalable, and self-healing on the AWS platform
- Design, manage, and maintain tools to automate operational processes
↖↑↓ Content
↖↑↓ Domain 1: SDLC Automation
- Apply concepts required to automate a CI/CD pipeline
- Determine source control strategies and how to implement them
- Apply concepts required to automate and integrate testing
- Apply concepts required to build and manage artifacts securely
- Determine deployment/delivery strategies (e.g., A/B, Blue/green, Canary, Red/black) and how to implement them using AWS Services
↖↑↓ Domain 2: Configuration Management and Infrastructure as Code
- Determine deployment services based on deployment needs
- Determine application and infrastructure deployment models based on business needs
- Apply security concepts in the automation of resource provisioning
- Determine how to implement lifecycle hooks on a deployment
- Apply concepts required to manage systems using AWS configuration management tools and services
↖↑↓ Domain 3: Monitoring and Logging
- Determine how to set up the aggregation, storage, and analysis of logs and metrics
- Apply concepts required to automate monitoring and event management of an environment
- Apply concepts required to audit, log, and monitor operating systems, infrastructures, and applications
- Determine how to implement tagging and other metadata strategies
↖↑↓ Domain 4: Policies and Standards Automation
- Apply concepts required to enforce standards for logging, metrics, monitoring, testing, and security
- Determine how to optimize cost through automation
- Apply concepts required to implement governance strategies
↖↑↓ Domain 5: Incident and Event Response
- Troubleshoot issues and determine how to restore operations
- Determine how to automate event management and alerting
- Apply concepts required to implement automated healing
- Apply concepts required to set up event-driven automated action
↖↑↓ Domain 6: High Availability, Fault Tolerance, and Disaster Recovery
- Determine appropriate use of multi-AZ versus multi-region architectures
- Determine how to implement high availability, scalability, and fault tolerance
- Determine the right services based on business needs (e.g., RTO/RPO, cost)
- Determine how to design and automate disaster recovery strategies
- Evaluate a deployment for points of failure
↖↑↓ Concepts
↖↑↓ Deployment Strategies
↖↑↓ Overview
General Strategies
| Strategy | Deploy time | Downtime | Testing | Deployment Costs | Impact of failed deployment | Rollback process |
|---|---|---|---|---|---|---|
| Single Target Deployment | 🕑 | complete deploy | limited | no extra costs | downtime | redeploy |
| All At Once | 🕑 | complete deploy | limited | no extra costs | downtime | redeploy |
| Minimum In Service | 🕑🕑 | none | can test new version while old is still active | no extra costs | no downtime | redeploy |
| Rolling | 🕑🕑🕑 | usually none | can test new version while old is still active | no extra costs | no downtime | redeploy |
| Rolling With Extra Batches | 🕑🕑🕑 | usually none | can test new version while old is still active | little extra costs | no downtime | redeploy |
| Blue/Green | 🕑🕑🕑🕑 | none | can test prior to cutover | extra costs for new stack | no downtime | revert cutover |
Strategies per AWS service
| Strategy | Auto Scaling Group | CodeDeploy EC2/On-Premises | CodeDeploy ECS | CodeDeploy Lambda | Elastic Beanstalk | OpsWorks |
|---|---|---|---|---|---|---|
| Single Target Deployment | . | . | . | . | redeploy | . |
| All At Once | AutoScalingReplacingUpdate | All-at-once | . | . | all at once | . |
| Minimum In Service | . | . | . | . | rolling | . |
| Rolling | AutoScalingRollingUpdate | One-at-a-time | . | . | rolling | . |
| Rolling With Extra Batches | . | . | . | . | rolling with extra batches | . |
| Blue/Green | . | Traffic is shifted to a replacement set of instances: all-at-once, half-at-a-time, or one-at-a-time | Traffic is shifted to a replacement task set: canary, linear, or all-at-once | Traffic is shifted to a new Lambda version: canary, linear, or all-at-once | immutable comes close, or: create new environment and use DNS | create new environment and use DNS |
| Canary | . | . | see above (canary) | see above (canary) | Traffic Splitting | . |
↖↑↓ Single target deployment
| System | Deploy |
|---|---|
| v1 | Initial State |
| v1-2 | Deployment Stage |
| v2 | Final State |

- When initiated, a new application version is installed on the (single) target server
- Practically not in use anymore

| pros | cons |
|---|---|
| Simple & very few moving parts | Downtime |
| Deployment is faster than other methods | Limited testing |
↖↑↓ All-at-once deployment
| System | Deploy |
|---|---|
| v1 v1 v1 v1 v1 | Initial State |
| v1-2 v1-2 v1-2 v1-2 v1-2 | Deployment Stage |
| v2 v2 v2 v2 v2 | Final State |

- Single build stage triggers multiple target environments

| pros | cons |
|---|---|
| Deployment is relatively fast | Downtime (like STD) |
| . | Limited testing (like STD) |
| . | Everything in-flight - can't stop deployment/rollback if targets fail |
| . | More complicated than STD, often requires orchestration |
↖↑↓ Minimum in-service style deployment
| System | Deploy | Notes |
|---|---|---|
| v1 v1 v1 v1 v1 | Initial State | . |
| v1 v1 v1-2 v1-2 v1-2 | Deployment Stage 1 | Minimum targets required for operational state: 2 |
| v1-2 v1-2 v2 v2 v2 | Deployment Stage 2 | . |
| v2 v2 v2 v2 v2 | Final State | . |

- Orchestration engines know how many targets are required for a minimum operational state
- System ensures that this number of instances is active while completing the rest of the deployment as quickly as possible
- Happens to as many targets as possible
- Suitable for large environments

| pros | cons |
|---|---|
| No downtime | Many moving parts, requires orchestration |
| Deployment happens in (two) stages | . |
| Generally quicker & with fewer stages than rolling deployments | . |
↖↑↓ Rolling deployment
| System | Deploy | Action |
|---|---|---|
| v1 v1 v1 v1 v1 | Initial State | . |
| v1-2 v1-2 v1 v1 v1 | Deployment Stage 1 | Deploy first set of targets |
| v2 v2 v1-2 v1-2 v1 | Deployment Stage 2 | Only if health checks succeed: deploy next set of targets |
| v2 v2 v2 v2 v1-2 | Deployment Stage 3 | Only if health checks succeed: deploy next set of targets |
| v2 v2 v2 v2 v2 | Final State | . |

- Do x deployments at once, then move on to the next x
- Flexible on failing health checks - roll back a stage or the whole deployment
- Was considered the cheapest and least risky method until hourly and consumption-based billing entered the market

| pros | cons |
|---|---|
| No downtime (if number of stage deployments is small enough) | Does not necessarily maintain overall application health |
| Can be paused to allow for multi-version testing | Many moving parts, requires orchestration |
| . | Can be the least efficient deployment method in terms of time taken |
↖↑↓ Rolling with extra batches deployment
| System | Deploy | Action |
|---|---|---|
| v1 v1 v1 v1 | Initial State | . |
| v1 v1 v1 v1 . . | Deployment Stage 1 | Deploy new batch of servers |
| v1 v1 v1 v1 v2 v2 | Deployment Stage 2 | Deploy new version to new servers |
| . . v1 v1 v2 v2 | Deployment Stage 3 | Undeploy first batch |
| v2 v2 v1 v1 v2 v2 | Deployment Stage 4 | Deploy new version to first batch |
| v2 v2 . . v2 v2 | Deployment Stage 5 | Undeploy second batch |
| v2 v2 v2 v2 | Deployment Stage 6 | Decommission servers from second batch |

- Deploy new batch of servers first
- Very similar to rolling deployment

| pros | cons |
|---|---|
| No downtime (if number of stage deployments is small enough) | Does not necessarily maintain overall application health |
| Can be paused to allow for multi-version testing | Many moving parts, requires orchestration |
| . | Can be the least efficient deployment method in terms of time taken |
↖↑↓ Blue/green deployment
| System | Deploy | Action |
|---|---|---|
| (v1 v1 v1 )() | Initial State | Blue environment, all traffic goes here |
| (v1 v1 v1 )(v v v ) | Deployment Stage 1 | Bring up green environment |
| (v1 v1 v1 )(v2 v2 v2 ) | Deployment Stage 2 | Deploy application into green environment |
| (v1 v1 v1 )(v2 v2 v2 ) | Deployment Stage 3 | Cutover - direct traffic from blue to green |
| () (v2 v2 v2 ) | Final State | Blue environment removed |

- Deploys to a separate environment to avoid outage risks
- Different cutover techniques
- DNS routing
- Swap Auto Scaling Group behind load balancer
- Elastic Beanstalk immutable: merge new ASG into old one
- Update auto scaling launch configuration
- Swap environment of an AWS Elastic Beanstalk application
- Clone stack in AWS OpsWorks and update DNS

| pros | cons |
|---|---|
| Rapid all-at-once deployment process, no need to wait for per-target health checks | Requires advanced orchestration tooling |
| Can test health prior to cutover | Significant cost for a second environment (mitigated by advanced billing models) |
| Clean & controlled cutover (various options) | . |
| Easy rollback | . |
| Can be fully automated using advanced templating | . |
| By far the best method in terms of risk mitigation and minimal user impact | . |
↖↑↓ Red/black deployment
... are just like blue/green, but they happen at a much faster rate.
Example:
- Before cutover: DNS -> LB -> ASG1
- After cutover: DNS -> LB -> ASG2
↖↑↓ A/B testing
- Like blue/green deployment, but only shift a percentage of the traffic over, not everything at once
- Gradually increase percentage of traffic sent to the new environment
- Allows for a different end goal:
- Measure user feedback and decide whether a feature should be rolled out or rolled back
- This can be decided well before the new environment gets 100% of the traffic
- Could roll out a feature to only 10% of the users and switch back after metrics have been collected
- -> different to blue/green deployment
- Achieved via AWS Route 53 weighted routing

| pros | cons |
|---|---|
| . | Different versions across the environment |
| . | DNS switching affected by caches and other DNS-related issues |
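The weighted-routing split above can be sketched as a Route 53 change batch; this is a minimal sketch, assuming hypothetical names (`app.example.com`, the ELB targets, and the hosted zone ID are placeholders, not from the source):

```python
# Sketch: build a Route 53 change batch that splits traffic 90/10 between a
# "blue" and a "green" environment using weighted CNAME records.
def weighted_record(name, target, set_id, weight, ttl=60):
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": set_id,  # distinguishes records sharing one name
            "Weight": weight,         # relative share of traffic
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        },
    }

change_batch = {
    "Comment": "A/B test: 90% blue, 10% green",
    "Changes": [
        weighted_record("app.example.com", "blue-elb.example.com", "blue", 90),
        weighted_record("app.example.com", "green-elb.example.com", "green", 10),
    ],
}

# With credentials configured, this would be submitted via:
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z123EXAMPLE", ChangeBatch=change_batch)
```

Raising the green weight step by step turns the same setup into a canary-style rollout.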
↖↑↓ Canary deployment
- Like A/B testing, but gradually increases percentage of traffic to green environment
↖↑↓ EC2 Concepts
↖↑↓ Instance Profile
- A container for an IAM role that you can use to pass role information to an EC2 instance when the instance starts.
- An EC2 Instance cannot be assigned a role directly, but it can be assigned an Instance Profile which contains a role.
- If you use the AWS Management Console to create a role for Amazon EC2, the console automatically creates an instance profile and gives it the same name as the role.
- If you manage your roles from the AWS CLI or the AWS API, you create roles and instance profiles as separate actions.
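When managing roles from the CLI/API, the separate actions can be sketched as an ordered list of boto3 calls; role, profile, and instance IDs below are hypothetical placeholders:

```python
# Sketch of the API steps for attaching a role to EC2 via an instance profile.
# Order matters: role and profile are separate resources that are then linked;
# finally the *profile* (not the role) is associated with the instance.
def instance_profile_steps(role_name, profile_name, instance_id):
    return [
        ("iam", "create_role",
         {"RoleName": role_name, "AssumeRolePolicyDocument": "..."}),
        ("iam", "create_instance_profile",
         {"InstanceProfileName": profile_name}),
        ("iam", "add_role_to_instance_profile",
         {"InstanceProfileName": profile_name, "RoleName": role_name}),
        ("ec2", "associate_iam_instance_profile",
         {"IamInstanceProfile": {"Name": profile_name},
          "InstanceId": instance_id}),
    ]

# Each tuple maps to boto3.client(service).method(**kwargs).
```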
↖↑↓ Load Balancing with ELB/ALB
↖↑↓ ELB/ALB Logs
- Access Logging is an optional feature of Elastic Load Balancing that is disabled by default.
- After you enable access logging for your load balancer, Elastic Load Balancing captures the logs and stores them in the Amazon S3 bucket as compressed files.
- You can disable access logging at any time.
- Can log every 5 or 60 minutes
- There's no additional charge
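Enabling access logging is an attribute change on the load balancer; a minimal sketch for an ALB, with bucket and prefix names as placeholders:

```python
# Sketch: ALB attributes that enable access logging to S3. The result maps to
# elbv2.modify_load_balancer_attributes(LoadBalancerArn=..., Attributes=attrs).
def access_log_attributes(bucket, prefix=""):
    attrs = [
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": bucket},
    ]
    if prefix:  # optional folder inside the bucket
        attrs.append({"Key": "access_logs.s3.prefix", "Value": prefix})
    return attrs
```

Setting `access_logs.s3.enabled` back to `"false"` disables logging again at any time.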
↖↑↓ ELB/ALB Health Checks
- If you have associated your Auto Scaling Group with a Classic Load Balancer, you can use the load
balancer health check to determine the health state of instances in your Auto Scaling Group. By
default, an Auto Scaling Group periodically checks the health state of each instance.
- Your Application Load Balancer periodically sends requests to its registered targets to test their
status. These tests are called health checks.
- The status of the instances that are healthy at the time of the health check is `InService`. The status of any instances that are unhealthy at the time of the health check is `OutOfService`.
↖↑↓ ELB Security
- Need end-to-end security
- Encrypt all communication
- Use HTTPS (layer 7) or SSL (layer 4)
- Need to deploy an X.509 certificate on ELB
- Can configure back-end authentication
- Once configured, ELB only communicates with an instance if it has a matching public key
↖↑↓ Auto Scaling
↖↑↓ Overview
Amazon EC2 Auto Scaling helps you ensure that you have the correct number of Amazon EC2 instances
available to handle the load for your application. You create collections of EC2 instances, called
Auto Scaling Groups. You can specify the minimum number of instances in each Auto Scaling Group,
and Amazon EC2 Auto Scaling ensures that your group never goes below this size. You can specify
the maximum number of instances in each Auto Scaling Group, and Amazon EC2 Auto Scaling ensures
that your group never goes above this size. If you specify the desired capacity, either when you
create the group or at any time thereafter, Amazon EC2 Auto Scaling ensures that your group has
this many instances. If you specify scaling policies, then Amazon EC2 Auto Scaling can launch or
terminate instances as demand on your application increases or decreases.
- Auto scaling can play a major role in deployments
- Need to avoid downtime during deployments
- How long does it take to deploy code and configure an instance?
- How do you test a new launch configuration?
- How do you phase out older launch configurations?
- Use lifecycle hooks for custom actions
- CloudFormation init scripts
- Cloud init scripts
- Scale out/scale in
- Can launch spot instances as well as on-demand instances (also configure ratio between these instance options)
- On AWS: Service - FAQs - User Guide
↖↑↓ Components
Auto Scaling Group
- Contains a collection of Amazon EC2 instances that are treated as a logical grouping for the purposes of automatic scaling and management
- To use Amazon EC2 Auto Scaling features such as health check replacements and scaling policies
Launch Configuration
- Instance configuration template that an Auto Scaling Group uses to launch EC2 instances
- One Launch Configuration per ASG, can be used in many ASGs though
- Can't be modified, needs to be recreated
- To change instance type, copy the old Launch Configuration, change instance type, double max and desired, wait till new values have propagated, revert max and desired
Launch Template
- Similar to Launch Configuration
- Launch Templates can be used to launch regular instances as well as Spot Fleets.
- Allows to have multiple versions of the same template
- Can source another template to build a hierarchy
- With versioning, you can create a subset of the full set of parameters and then reuse it to create other templates or template versions
- AWS recommends to use Launch Templates instead of Launch Configurations to ensure that you can use the latest features of Amazon EC2
Termination Policy
- To specify which instances to terminate first during scale in, configure a Termination Policy for the Auto Scaling Group.
- Policies will be applied to the AZ with the most instances
- Can be combined with instance protection to prevent termination of specific instances, this starts as soon as the instance is in service.
- Instances can still be terminated manually (unless termination protection has been enabled)
- Unhealthy instance will still be replaced
- Spot instance interruptions can still occur
- Instance protection can also be applied to an Auto Scaling Group - protecting the whole group: protect from scale in
- Can specify multiple policies, will be executed in order until an instance has been found
- Default policy being last in a list of multiple policies is like a catch-all, it will always find an instance
- Determine which AZ has most instances and at least one instance that's not protected from scale in
- [For ASG with multiple instance types and purchase options]: Try to align remaining instances to allocation strategy
- [For ASG that uses Launch Templates]: Terminate one of the instances with the oldest Launch Template
- [For ASG that uses Launch Configuration]: Terminate one of the instances with the oldest Launch Configuration
- If there are multiple instances to choose from, pick the one nearest to the next billing hour
- Choose one at random
| # | Termination Policy | Use case |
|---|---|---|
| 0 | Default | Designed to help ensure that your instances span Availability Zones evenly for high availability (applies 3 -> 4 -> random) |
| 1 | OldestInstance | Useful when upgrading to a new EC2 instance type |
| 2 | NewestInstance | Useful when testing a new launch configuration |
| 3 | OldestLaunchConfiguration | Useful when updating a group and phasing out instances |
| 4 | ClosestToNextInstanceHour | Next billing hour - useful to maximize instance use |
| 5 | OldestLaunchTemplate | Useful when you're updating a group and phasing out the instances from a previous configuration |
| 6 | AllocationStrategy | Useful when preferred instance types have changed |
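Ordering termination policies and protecting specific instances can be sketched as request payloads for the Auto Scaling API; the group and instance names are placeholder assumptions:

```python
# Sketch: kwargs for autoscaling.update_auto_scaling_group(...) that order
# termination policies explicitly, plus kwargs for set_instance_protection.
def termination_policy_update(asg_name):
    return {
        "AutoScalingGroupName": asg_name,
        # Evaluated in order until an instance is found; "Default" as the
        # last entry acts as a catch-all that always finds one.
        "TerminationPolicies": [
            "OldestLaunchConfiguration",
            "ClosestToNextInstanceHour",
            "Default",
        ],
    }

def protect_instances(asg_name, instance_ids):
    return {
        "AutoScalingGroupName": asg_name,
        "InstanceIds": instance_ids,
        # Scale-in protection, not EC2 termination protection: the instance
        # can still be terminated manually or by spot interruption.
        "ProtectedFromScaleIn": True,
    }

# autoscaling.update_auto_scaling_group(**termination_policy_update("my-asg"))
# autoscaling.set_instance_protection(**protect_instances("my-asg", ["i-..."]))
```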
Auto Scaling Lifecycle Hooks
The EC2 instances in an Auto Scaling Group have a path, or lifecycle, that differs from that of
other EC2 instances. The lifecycle starts when the Auto Scaling Group launches an instance and
puts it into service. The lifecycle ends when you terminate the instance, or the Auto Scaling
group takes the instance out of service and terminates it.
Allows to cater for applications that take longer to deploy/tear-down.
After Lifecycle Hooks are added to the instance:
- ASG responds to scale-out/scale-in events
- Lifecycle Hook puts instance into `Pending:Wait`/`Terminating:Wait` state; the instance is paused until we continue or a timeout occurs
- This can be extended by configuring a heartbeat
- Custom actions are performed through one or more of these options:
- CloudWatch Events target to invoke Lambda function
- Notification target for Lifecycle Hook is defined
- Script on instance runs as instance starts, script can control lifecycle actions
- Can also notify SQS or SNS, but Lambda is the preferred option
- Instance goes into `Pending:Proceed`/`Terminating:Proceed` after that
- By default, the instance remains in a wait state for one hour, and then the Auto Scaling Group continues the launch or terminate process
| Transition | State sequence |
|---|---|
| Scale out | Pending -> Pending:Wait -> Pending:Proceed -> InService |
| Scale in | Terminating -> Terminating:Wait -> Terminating:Proceed -> Terminated |
| Troubleshoot | InService -> Standby |
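The hook setup and its completion call can be sketched as request payloads for the Auto Scaling API; hook name, group name, and timeout are placeholder assumptions:

```python
# Sketch: a launch lifecycle hook that pauses new instances in Pending:Wait
# for up to 5 minutes, and the matching completion call issued by the
# custom action (e.g. a Lambda function) once setup is done.
def launch_hook(asg_name):
    return {
        "LifecycleHookName": "install-app-hook",
        "AutoScalingGroupName": asg_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
        "HeartbeatTimeout": 300,     # seconds in Pending:Wait (default 3600)
        "DefaultResult": "ABANDON",  # on timeout: abandon instead of proceed
    }

def complete_hook(asg_name, instance_id):
    return {
        "LifecycleHookName": "install-app-hook",
        "AutoScalingGroupName": asg_name,
        "InstanceId": instance_id,
        "LifecycleActionResult": "CONTINUE",  # move to Pending:Proceed
    }

# autoscaling.put_lifecycle_hook(**launch_hook("my-asg"))
# autoscaling.complete_lifecycle_action(**complete_hook("my-asg", "i-..."))
```

Recording a heartbeat (`record_lifecycle_action_heartbeat`) extends the wait beyond the configured timeout.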
Scaling
Scaling is the ability to increase or decrease the compute capacity of your application. Scaling
starts with an event, or scaling action, which instructs an Auto Scaling Group to either launch or
terminate Amazon EC2 instances.
- Manual scaling
- Scheduled scaling
- Scaling Policies
- Target Tracking Scaling Policy
- With target tracking scaling policies, you select a scaling metric and set a target value.
Amazon EC2 Auto Scaling creates and manages the CloudWatch alarms that trigger the scaling
policy and calculates the scaling adjustment based on the metric and the target value.
- Simple Scaling Policy
- With simple and step scaling policies, you choose scaling metrics and threshold values for the CloudWatch alarms that trigger the scaling process.
- Both require you to create CloudWatch alarms for the scaling policies.
- Both require you to specify the high and low thresholds for the alarms.
- Both require you to define whether to add or remove instances, and how many, or set the group to an exact size.
- Step Scaling Policy
- The main difference between the policy types is the step adjustments that you get with step
scaling policies. When step adjustments are applied, and they increase or decrease the current
capacity of your Auto Scaling Group, the adjustments vary based on the size of the alarm breach.
- We recommend that you use step scaling policies instead of simple scaling policies, even if you have a single scaling adjustment
- After a scaling activity is started, the policy continues to respond to additional alarms, even while a scaling activity or health check replacement is in progress.
- Therefore, all alarms that are breached are evaluated by Amazon EC2 Auto Scaling as it receives the alarm messages.
- However, scaling actions from previous alarms are taken into account (thereby not changing the absolute outcome of the scaling action)
- Predictive scaling
- AWS uses data collected from actual EC2 usage
Protect instances from scale in by setting instance scale-in protection, e.g. via API call
- Long running workers
- 'Special' instances, e.g. master of a cluster
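A target tracking policy from the list above can be sketched as a `put_scaling_policy` payload; group and policy names and the 50% target are placeholder assumptions:

```python
# Sketch: a target tracking scaling policy that keeps average CPU across the
# group at a target value. Amazon EC2 Auto Scaling creates and manages the
# CloudWatch alarms behind this policy for you.
def target_tracking_policy(asg_name, target_cpu=50.0):
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "keep-cpu-at-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
    }

# autoscaling.put_scaling_policy(**target_tracking_policy("my-asg", 50.0))
```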
Notifications
- Can send SNS notifications
- Success/failure on instance launch/termination
- Better to integrate with CloudWatch Events
- No direct integration with CloudWatch Logs
- However if the CloudWatch agent is installed on the instances they will send logs
Health Checks
Amazon EC2 Auto Scaling can determine the health status of an instance using one or more of the following:
- Status checks provided by Amazon EC2 to identify hardware and software issues that may impair an instance. The default health checks for an Auto Scaling group are EC2 status checks only.
- Health checks provided by Elastic Load Balancing (ELB). These health checks are disabled by default but can be enabled.
- Your custom health checks.
↖↑↓ Integration with other services
ALB
ALB -> Target Group <- ASG
- Configure ASG
- Target Group from ALB
- Health Check Type ELB
- Can configure Slow Start Mode on Target Group level (up to 15min), so that new instances don't get the full load immediately
- Should redirect from http (80) to https (443)
- Can put instances into Standby so that they don't receive traffic.
- Can put instances into Scale In Protection so that they don't get terminated on Scale In.
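Switching the group to ELB health checks and moving an instance to Standby can be sketched as request payloads; the group and instance names are placeholders:

```python
# Sketch: switch an ASG to ELB health checks, and move one instance into
# Standby so it stops receiving traffic while being troubleshot.
def use_elb_health_checks(asg_name):
    return {
        "AutoScalingGroupName": asg_name,
        "HealthCheckType": "ELB",       # default is "EC2" (status checks only)
        "HealthCheckGracePeriod": 300,  # seconds before checks count, so new
                                        # instances can finish booting
    }

def standby_request(asg_name, instance_id):
    return {
        "AutoScalingGroupName": asg_name,
        "InstanceIds": [instance_id],
        # Don't launch a replacement while we troubleshoot this instance.
        "ShouldDecrementDesiredCapacity": True,
    }

# autoscaling.update_auto_scaling_group(**use_elb_health_checks("my-asg"))
# autoscaling.enter_standby(**standby_request("my-asg", "i-..."))
# ...troubleshoot... then autoscaling.exit_standby(...)
```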
CodeDeploy
- Install CodeDeploy agent on instances as per UserData
- Create CodeDeploy Application and Deployment and tie it to the Auto Scaling Group
- CodeDeploy will create Deployments for existing and new instances
- Can choose between in-place and blue-green deployments
- Blue-green will provision new Auto Scaling Group
- Must have a load balancer configured so that traffic can be switched over
- When deploying a new application version this will not be considered 'latest' until it has succeeded
- So for Auto Scaling events, the old version is still getting deployed
- CreationPolicy
- Wait for notification from instances that they were created successfully
- Instances use `cfn-signal` in the UserData section
- UpdatePolicy
- If Launch Configuration or Launch Template are changing, deployed instances will not update unless defined in UpdatePolicy
- Can configure policy for rolling, replacing and scheduled updates
SQS
- It's a common pattern to have instances from within an ASG consuming messages from SQS
- Can implement custom metric in CloudWatch to control Auto Scaling behaviour
- E.g. size of individual backlog on instances
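The custom-metric idea can be sketched as a CloudWatch `put_metric_data` payload driven by the queue depth; namespace and metric names are placeholder assumptions:

```python
# Sketch: publish "backlog per instance" as a custom CloudWatch metric so a
# target tracking policy can scale the worker ASG on it.
def backlog_metric(queue_depth, in_service_instances):
    # Avoid division by zero when the group has scaled to zero.
    backlog = queue_depth / max(in_service_instances, 1)
    return {
        "Namespace": "MyApp/Workers",
        "MetricData": [{
            "MetricName": "BacklogPerInstance",
            "Value": backlog,
            "Unit": "Count",
        }],
    }

# queue_depth would come from sqs.get_queue_attributes(
#     QueueUrl=..., AttributeNames=["ApproximateNumberOfMessages"])
# cloudwatch.put_metric_data(**backlog_metric(queue_depth, n_instances))
```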
↖↑↓ Deployment Concepts
| Name | Before | Intermediate | After |
|---|---|---|---|
| In Place | [ASG [Instance 1]] | - | [ASG [Instance 2]] |
| Rolling | [ASG [Instance 1]] | [ASG [Instance 1,2]] | [ASG [Instance 2]] |
| Replace | [ALB [ASG1 [...]]] | [ALB [ASG1 [...]] [ASG2 [...]]] | [ALB [ASG2 [...]]] |
| Blue/Green | [R53 [ALB1 [ASG1 [...]]]] | [R53 [ALB1 [ASG1 [...]]]] [ALB2 [ASG2 [...]]] | [R53 [ALB2 [ASG2 [...]]]] |
↖↑↓ Troubleshooting
Possible Problems
- Attempting to use wrong subnet
- AZ no longer available or supported (outage)
- Security group does not exist
- Associated keypair does not exist
- Auto scaling configuration is not working correctly
- Instance type specification does not exist in that AZ
- Invalid EBS device mapping
- Attempt to attach EBS block device to instance-store AMI
- AMI issues
- Attempt to use placement groups with instance types that don't support that
- AWS running out of capacity in that AZ
- If an instance is stopped, e.g. for updating it, Auto Scaling will consider it unhealthy, terminate it, and launch a replacement. Need to suspend the relevant Auto Scaling processes first.
Suspending ASG processes
You can suspend and then resume one or more of the scaling processes for your Auto Scaling Group.
This can be useful for investigating a configuration problem or other issues with your web
application and making changes to your application without invoking the scaling processes.
| Process | Effect when suspended |
|---|---|
| Launch | Disrupts other processes as no more scale out |
| Terminate | Disrupts other processes as no more scale in |
| HealthCheck | . |
| ReplaceUnhealthy | . |
| AZRebalance | . |
| AlarmNotification | Suspends actions normally triggered by alarms |
| ScheduledAction | . |
| AddToLoadBalancer | Will not automatically add instances later |
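For the stopped-instance scenario above, suspending just the health-check related processes can be sketched as a request payload; the group name is a placeholder:

```python
# Sketch: suspend only the processes that would fight a manual maintenance
# window (marking the stopped instance unhealthy and replacing it), while
# leaving Launch/Terminate alone so normal scaling continues.
def suspend_for_maintenance(asg_name):
    return {
        "AutoScalingGroupName": asg_name,
        "ScalingProcesses": ["HealthCheck", "ReplaceUnhealthy"],
    }

# autoscaling.suspend_processes(**suspend_for_maintenance("my-asg"))
# ...stop instance, patch it, start it again...
# autoscaling.resume_processes(**suspend_for_maintenance("my-asg"))
```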
↖↑↓ On-Premises strategies
↖↑↓ EC2 and On-Premises VMs
- Can download Amazon Linux 2 AMI in VM format to run on-premises
- Can import existing VMs into EC2
↖↑↓ AWS Application Discovery Service
- Gather information about On-premises instances to plan a migration
- Server utilization and dependency mappings
- Track with AWS Migration Hub
↖↑↓ AWS Database Migration Service
- Replicate
- On-prem -> AWS
- AWS -> On-prem
- AWS -> AWS
↖↑↓ AWS Server Migration Service
- Incremental replication of on-prem instances into AWS
↖↑↓ Tagging
↖↑↓ Overview
A tag is a label that you or AWS assigns to an AWS resource. Each tag consists of a key and a value.
For each resource, each tag key must be unique, and each tag key can have only one value. You can
use tags to organize your resources, and cost allocation tags to track your AWS costs on a
detailed level. After you activate cost allocation tags, AWS uses the cost allocation tags to
organize your resource costs on your cost allocation report, to make it easier for you to
categorize and track your AWS costs. AWS provides two types of cost allocation tags:
AWS-generated tags and user-defined tags. AWS defines, creates, and applies the AWS-generated
tags for you, and you define, create, and apply user-defined tags. You must activate both types of tags
separately before they can appear in Cost Explorer or on a cost allocation report.
↖↑↓ Data/Network Protection
↖↑↓ Data Protection
↖↑↓ In Transit
- TLS for transit encryption
- ACM to manage SSL/TLS certificates
- Load Balancers
- ELB/ALB/NLB provide SSL termination
- Can have multiple SSL certificates per ALB
- Optional SSL/TLS encryption between ALB and EC2
- CloudFront with SSL
- All AWS services expose https endpoint
- S3 also has http (shouldn't use it)
↖↑↓ At Rest
- S3
- SSE-S3: Server-side encryption using AWS' key
- SSE-KMS: Server-side encryption using own KMS key
- SSE-C: Server-side encryption using own key
- Client-side encryption: already encrypted data is sent through to AWS
- Can enable default encryption on S3 buckets
- Can enforce encryption via bucket policy
- Glacier is encrypted by default
- Other services:
- Easy to configure for EBS, EFS, RDS, ElastiCache, DynamoDB, ...
- Usually either service encryption key or own KMS key
- Data categories:
- PHI - protected health information
- PII - personally-identifying information
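Default encryption plus enforcement via bucket policy can be sketched as S3 request payloads; bucket name and KMS key ID are placeholder assumptions:

```python
# Sketch: default bucket encryption (SSE-KMS) plus a bucket policy statement
# that rejects unencrypted uploads.
def default_encryption(bucket, kms_key_id):
    return {
        "Bucket": bucket,
        "ServerSideEncryptionConfiguration": {
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",  # or "AES256" for SSE-S3
                    "KMSMasterKeyID": kms_key_id,
                },
            }],
        },
    }

def deny_unencrypted_statement(bucket):
    # Enforce encryption: deny PutObject when no SSE header is present.
    return {
        "Sid": "DenyUnencryptedPuts",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
        "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
    }

# s3.put_bucket_encryption(**default_encryption("my-bucket", "my-key-id"))
# The statement goes into the bucket policy via s3.put_bucket_policy(...).
```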
↖↑↓ Network Protection
- Direct Connect
- Private direct connection between on-site and AWS
- Public Internet: Use VPN
- Site-to-site VPN that supports Internet Protocol Security (IPsec)
- Network ACLs for instance protection
- WAF - Web Application Firewall
- Security Groups
- System Firewalls running on EC2 instances
↖↑↓ Multi AZ
↖↑↓ Services where multi AZ needs to be enabled manually
- Assign AZ
- ELB, EFS, ASG, Elastic Beanstalk
- Synchronous database for failover in different AZ
- RDS, ElastiCache, Aurora (for DB itself, data is already multi AZ)
- Elasticsearch
↖↑↓ Services that are implicitly multi AZ
- S3 (with the exception of One Zone-Infrequent Access)
- DynamoDB
- All of AWS' proprietary services
↖↑↓ Multi Region
↖↑↓ Services that have a concept of multi region
| Service | Multi-region capability |
|---|---|
| DynamoDB Global Tables | Multi-way replication, implemented by Streams |
| AWS Config Aggregators | Multi region as well as multi account |
| RDS | Cross-region read replicas |
| Aurora Global Database | One region is master, others for read & DR |
| EBS/AMI/RDS | Snapshots |
| VPC Peering | Private traffic between VPCs between regions |
| Route 53 | Uses global network of DNS servers |
| S3 | Cross-region replication |
| CloudFront | Global CDN at Edge Locations |
| Lambda@Edge | For global Lambda functions at Edge Locations |
| CloudFormation | StackSets |
| CodePipeline | Actions can be region-specific -> multi-region deploys |
↖↑↓ Multi Region with Route 53
- Deploy stacks behind ALB in different regions
- Use Route 53 routing
- Configure health checks
- Trigger automated DNS failover
- E.g. base health checks on CloudWatch Alarms
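The failover setup above can be sketched as Route 53 failover records for two regional ALBs; domain, ALB DNS names, hosted zone IDs, and the health check ID are placeholder assumptions:

```python
# Sketch: primary/secondary failover alias records. Route 53 serves the
# primary while its health check passes, and fails over to the secondary
# otherwise.
def failover_record(name, role, alb_dns, alb_zone_id, health_check_id=None):
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": role.lower(),
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "AliasTarget": {
            "DNSName": alb_dns,
            "HostedZoneId": alb_zone_id,  # the ALB's zone, not your own
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id:
        # e.g. a health check backed by a CloudWatch alarm
        record["HealthCheckId"] = health_check_id
    return record

primary = failover_record("app.example.com", "PRIMARY",
                          "alb-eu.example.com", "Z1EXAMPLE", "hc-123")
secondary = failover_record("app.example.com", "SECONDARY",
                            "alb-us.example.com", "Z2EXAMPLE")
# Both records would be submitted in one change batch via
# route53.change_resource_record_sets(...).
```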
↖↑↓ Multi Account
↖↑↓ Services that have a concept of multi account
| Service | Multi-account capability |
|---|---|
| IAM | Define IAM Trust to enable cross-account actions; use STS to assume roles in different accounts |
| CodePipeline | Trigger CodeDeploy across accounts |
| AWS Config | Aggregate across accounts |
| CloudWatch Events | Use EventBus to share events across accounts |
| CloudWatch Logs | Use Logs Destination to send events into a logging account |
| CloudFormation | StackSets can be deployed across accounts |
| CloudTrail | Can deliver trails into a cross-account bucket |
↖↑↓ Disaster Recovery
- DR is about preparing for and recovering from a disaster
- Recovery Point Objective - RPO
- How often do you run backups? How much data will be lost (since last backup)
- Recovery Time Objective - RTO
- How much downtime is acceptable?
| From | To | Comment |
|---|---|---|
| On-prem | On-prem | Traditional DR, very expensive |
| On-prem | Cloud | Hybrid recovery |
| Cloud Region A | Cloud Region B | . |
| Strategy | RPO | RTO | Costs | Comment | What to do for DR |
|---|---|---|---|---|---|
| Backup & Restore | High | High | $ | Regular backups | Restore |
| Pilot Light | Medium | Medium | $$ | Core system is always running | Add non-critical systems |
| Warm Standby | Low | Low | $$$ | Full system at minimum size always running | Add resources |
| Multi Site/Hot Site | Lowest | Lowest | $$$$ | Full system at production size always running | Only switch traffic |
↖↑↓ Security Automation & Compliance
| Service | What it does | Will warn about (example) |
|---|---|---|
| Amazon Inspector | Application and service security, scans EC2 instances for CVEs; network scans | Root login via ssh not disabled |
| Config | Ensure instance has proper AWS configuration, e.g. no open SSH port; track audit and compliance over time | Checks whether Amazon SNS topic is encrypted with KMS |
| GuardDuty | Scans accounts and workloads | Instance has bitcoin activity, unusual console logins (e.g. new location) |
| Macie | Protects data | SSH private key uploaded to S3 |
| Security Hub | Aggregates views from GuardDuty, Amazon Inspector, Macie, IAM Access Analyzer, AWS Firewall Manager; also integrates 3rd-party services | Whatever was integrated with Security Hub |
| Service Catalog | Restrict how instances are launched by minimizing configuration | . |
| Systems Manager | Run automations, patches, commands, inventory at scale | . |
| Trusted Advisor | Scans accounts, recommends cost optimisations, fault tolerance, performance, service limits, security | Open security groups, EBS snapshot permissions |
↖↑↓ Notifications
| Service | SNS (native) | CloudWatch/EventBridge Events | CloudWatch Metrics/Alarms | Comment |
|---|---|---|---|---|
| Amazon Inspector | + | - | + (every 5 min) | Notify SNS on assessment run and findings |
| API Gateway | - | - | + (API monitoring) | . |
| Auto Scaling Lifecycle Hooks | + SNS or SQS | + | + | . |
| CloudFormation | + | - | - | . |
| CloudTrail | + | - | - | . |
| CodeBuild | - | + | + | . |
| CodeCommit | + Trigger to SNS or Lambda; Notification to SNS or Chatbot (Slack) | + | - | . |
| CodeDeploy | + Trigger to SNS; Notification to SNS or Chatbot (Slack) | + | - | . |
| CodePipeline | + Notification to SNS or Chatbot (Slack) | + | - | . |
| Config | + All events only | + | - | . |
| ECS | - | + | + | . |
| Elastic Beanstalk | + | - | + minimal, environment health only | . |
| GuardDuty | - | + | - | . |
| Kinesis | - | - | - | . |
| Lambda | - | - | + | . |
| Macie | - | + | - | . |
| OpsWorks | - | + | + | . |
| S3 | + Event notifications: SNS, SQS, Lambda | - | + | documentation |
| Server Migration Service | - | + | - | . |
| Service Catalog | + | - | + | . |
| Systems Manager | + | + | + Run Command metrics | Various CloudWatch Events |
| Trusted Advisor | - | + | + | documentation |
- All services have API calls delivered to EventBridge via CloudTrail.
- EventBridge supports many services.
- All services that publish CloudWatch Metrics.
↖↑↓ Jenkins
- Can replace CodeBuild, CodePipeline, CodeDeploy
- Tight integration with those services
- Master/Slave setup
- Master and slaves can run on the same instance, but usually run on separate instances
- Can have multiple masters, each with its own set of slaves
- Must manage Multi-AZ yourself, deploy on EC2, ...
- A Jenkinsfile is used to configure the CI/CD pipeline
- Many AWS plugins
↖↑↓ Integrating into CodePipeline
- CodePipeline can send build jobs to Jenkins instead of CodeBuild
- Jenkins can pull from CodeCommit and e.g. upload build results to ECR, invoke Lambda, ...
- Direct Jenkins support in CodePipeline, requires CodePipeline-plugin on the Jenkins end
↖↑↓ Plugins
- EC2-Plugin
- Allows Jenkins to start agents on EC2 on demand, and terminate them when they become idle.
- Also supports spot instances
- CodeBuild-Plugin
- Send builds to CodeBuild
- Official AWS plugin
- ECS-Plugin
- Runs Jenkins agents as tasks on an ECS cluster
↖↑↓ Services
↖↑↓ Amazon EMR
↖↑↓ Overview
Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as
Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these
frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process
data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon
EMR to transform and move large amounts of data into and out of other AWS data stores and databases,
such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
Use cases:
- Machine learning
- Extract transform load (ETL)
- Clickstream analysis (from S3, using Apache Spark and Apache Hive)
- Real-time streaming
- Interactive analytics
- Genomics
↖↑↓ Amazon Inspector (Core Service)
↖↑↓ Overview
Amazon Inspector is an automated security assessment service that helps improve the security and
compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications
for exposure, vulnerabilities, and deviations from best practices. After performing an assessment,
Amazon Inspector produces a detailed list of security findings prioritized by level of severity.
These findings can be reviewed directly or as part of detailed assessment reports which are
available via the Amazon Inspector console or API.
Amazon Inspector security assessments help you check for unintended network accessibility of your
Amazon EC2 instances and for vulnerabilities on those EC2 instances. Amazon Inspector assessments
are offered to you as pre-defined rules packages mapped to common security best practices and
vulnerability definitions. Examples of built-in rules include checking for access to your EC2
instances from the internet, remote root login being enabled, or vulnerable software versions
installed. These rules are regularly updated by AWS security researchers.
- Network Assessments
- Host Assessments
- Can automate assessments via scheduled CloudWatch Events
- Can use tags to find instances to assess
- Cannot assess an AMI directly, requires a running instance
- Assessment templates can notify SNS
- Could trigger Lambda to remediate EC2 findings via SSM documents
- On AWS: Service - FAQs - User Guide
↖↑↓ API Gateway (Core Service)
↖↑↓ Overview
Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish,
maintain, monitor, and secure APIs at any scale. With a few clicks in the AWS Management Console,
you can create REST and WebSocket APIs that act as a “front door” for applications to access data,
business logic, or functionality from your backend services, such as workloads running on Amazon
Elastic Compute Cloud (Amazon EC2), code running on AWS Lambda, any web application, or real-time
communication applications.
↖↑↓ Benefits
- RESTful (stateless) or Websocket (stateful) APIs
- Powerful, flexible authentication mechanisms, such as AWS IAM policies,
Lambda authorizer functions, and Amazon Cognito user pools.
- Developer portal for publishing your APIs.
- Canary release deployments for safely rolling out changes.
- CloudTrail logging and monitoring of API usage and API changes.
- CloudWatch access logging and execution logging, including the ability to set alarms.
- Ability to use AWS CloudFormation templates to enable API creation
- Support for custom domain names.
- Integration with AWS WAF for protecting your APIs against common web exploits.
- Integration with AWS X-Ray for understanding and triaging performance latencies.
↖↑↓ Concepts
↖↑↓ Endpoint
A hostname for an API in API Gateway that is deployed to a specific region. The hostname is of the
form {api-id}.execute-api.{region}.amazonaws.com.
The following types of API endpoints are supported:
- Regional - deployed to the specified region and intended to serve clients in the same AWS region.
- Edge Optimized - deployed to the specified region while using a CloudFront distribution to facilitate client access typically from across AWS regions
- Private - exposed through interface VPC endpoints
↖↑↓ Stage
A logical reference to a lifecycle state of your REST or WebSocket API (for example, dev, prod, beta, v2).
- API stages are identified by API ID and stage name.
- Each stage has its own configuration parameters.
- Can be rolled back in history.
- Have stage variables, which are like environment variables for API Gateway.
- Can be used to configure the endpoints that the stage talks to.
- Accessible from the Lambda context as well.
- Can enable canary deployments for a stage (usually prod)
- Canary releases attach a new version to an existing stage deployment and randomly shift a portion of traffic over
- Logs and metrics are generated separately for all canary requests
- This is blue/green for API Gateway/Lambda
↖↑↓ Deployment
After creating your API, you must deploy it to make it callable by your users. To deploy an API,
you create an API deployment and associate it with a stage. Each stage is a snapshot of the API
and is made available for client apps to call.
↖↑↓ Canary Deployments
- Use stage variables for canary deployments
- Integrate Lambda via alias:
GetStartedLambdaProxyIntegration:${stageVariables.lambdaAlias}
- Overwrite stage variable in canary deployment
- Could also use Lambda's canary functionality with weighted aliases
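The stage-variable approach above can be sketched in a CloudFormation template; the resource names and alias values below are illustrative, not from the source:

```yaml
# Hypothetical sketch: shift 10% of prod traffic to a canary deployment
# whose stage-variable override points the integration at a new Lambda alias.
ProdStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    RestApiId: !Ref MyApi              # assumed REST API resource
    StageName: prod
    DeploymentId: !Ref StableDeployment
    Variables:
      lambdaAlias: live
    CanarySetting:
      PercentTraffic: 10.0
      DeploymentId: !Ref CanaryDeployment
      StageVariableOverrides:
        lambdaAlias: canary            # canary requests hit the new alias
      UseStageCache: false
```

Promoting the canary then means replacing the stage's DeploymentId with the canary's and clearing the canary setting.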
↖↑↓ Integration
- Lambda Proxy - the request is passed straight through to a Lambda
  - The proxy Lambda deals with the complete HTTP request
- Lambda Non-Proxy/Custom
  - Allows integration of a mapping template
  - Can transform the request as well as the response
  - Allows the API to evolve while keeping the Lambda function static
- Any service
- All AWS services support dedicated APIs to expose their features. However, the application
protocols or programming interfaces are likely to differ from service to service. An API Gateway
API with the AWS integration has the advantage of providing a consistent application protocol
for your client to access different AWS services.
↖↑↓ Mapping Template
- A script in Velocity Template Language (VTL) that transforms a request body from the frontend
data format to the backend data format.
- Cannot add default values to fields, only add new static fields
↖↑↓ Model
A data schema specifying the data structure of a request or response payload.
↖↑↓ Throttling
- Account-wide limit of 10,000 requests per second.
- Applies at service/account level
- Can create a usage plan:
  - Rate, burst, quota
  - Can associate with a stage/method/API key (to limit certain clients)
↖↑↓ Athena
↖↑↓ Overview
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3
using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay
only for the queries that you run.
Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start
querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no
need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with
SQL skills to quickly analyze large-scale datasets.
Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing you to create a unified
metadata repository across various services, crawl data sources to discover schemas and populate
your Catalog with new and modified table and partition definitions, and maintain schema versioning.
↖↑↓ CloudFormation (Core Service)
↖↑↓ Overview
AWS CloudFormation provides a common language for you to describe and provision all the
infrastructure resources in your cloud environment. CloudFormation allows you to use a simple text
file to model and provision, in an automated and secure manner, all the resources needed for your
applications across all regions and accounts. This file serves as the single source of truth for
your cloud environment.
AWS CloudFormation is available at no additional charge, and you pay only for the AWS resources
needed to run your applications.
- Allows you to create and provision resources in a reusable template fashion
- Declarative - no need for ordering and orchestration
- Separation of concerns - different stacks for different purposes
- On AWS: Service - FAQs - User Guide
↖↑↓ Components
↖↑↓ Template
A CloudFormation template is a JSON- or YAML-formatted text file.
| Element | Comment |
|---|---|
| AWSTemplateFormatVersion | 2010-09-09 |
| Description | |
| Metadata | Details about the template |
| Parameters | Values to pass in right before template creation |
| Mappings | Maps keys to values (e.g. different values for different regions) |
| Conditions | Check values before deciding what to do |
| Resources | Creates resources - the only mandatory section in a template |
| Outputs | Values to be exposed from the console or from API calls |
Parameters
- Type: String, Number, List, CommaDelimitedList, AWS-specific types like AWS::EC2::KeyPair::KeyName, SSM parameter keys (AWS::SSM::Parameter)
  - SSM SecureString is not supported
- Description, Default Value, Allowed Values, Allowed Pattern
- Validation: regular expression/MinLength/MaxLength/MinValue/MaxValue
- Can set NoEcho for secrets, will be masked with ****
- Can set UsePreviousValue for stack updates
- Pseudo parameters are parameters that are predefined by AWS CloudFormation. You do not declare them in your template. Use them the same way as you would a parameter, as the argument for the Ref function.
  - AWS::AccountId, AWS::NotificationARNs, AWS::NoValue, AWS::Partition, AWS::Region, AWS::StackId, AWS::StackName, AWS::URLSuffix
- Reference a parameter within the template: !Ref myParam
- Usage of parameters might make it hard to instantiate stacks
Mappings
- Fixed (hardcoded) variables within CloudFormation template
  RegionMap:
    us-east-1:
      HVM64: ami-0ff8a91507f77f867
      HVMG2: ami-0a584ac55a7631c0c
    us-west-1:
      HVM64: ami-0bdb828fd58c52235
      HVMG2: ami-066ee5fd4a9ef77f1
  ...
  myEC2Instance:
    Type: "AWS::EC2::Instance"
    Properties:
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", HVM64]
Conditions
  Conditions:
    CreateProdResources: !Equals [ !Ref EnvType, prod ]
- Can use (and combine) Fn::And, Fn::Equals, Fn::If, Fn::Not, Fn::Or
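A condition declared as above can be attached to resources or used inline; the resource names here are illustrative:

```yaml
Resources:
  ProdOnlyBucket:
    Type: AWS::S3::Bucket
    Condition: CreateProdResources   # only created when the condition is true
  AppInstance:
    Type: AWS::EC2::Instance
    Properties:
      # inline Fn::If picks a value based on the same condition
      InstanceType: !If [CreateProdResources, m5.large, t3.micro]
```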
Resources
Outputs
- Can be
  - a constructed value/parameter reference/pseudo parameter/output from a function like Fn::GetAtt or Ref
- Can be used in a different stack (cross-stack references): !ImportValue NameOfTheExport
- Cannot delete a stack if its outputs are used in another stack
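A cross-stack reference can be sketched as an exporting stack plus an import; the export name is a made-up example:

```yaml
# Exporting stack: publishes its VPC id under a well-known export name.
Outputs:
  VpcId:
    Description: VPC shared with other stacks
    Value: !Ref Vpc                  # assumed AWS::EC2::VPC resource
    Export:
      Name: shared-vpc-id
# A consuming stack would then use:
#   VpcId: !ImportValue shared-vpc-id
```

While any stack imports shared-vpc-id, the exporting stack can neither be deleted nor have that output changed.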
Intrinsic Functions
- Used to pass in values that are not available until runtime
- Usable in resources, outputs, metadata attributes and update policy attributes (auto scaling). You can also use intrinsic functions to conditionally create stack resources.
- Most intrinsic functions have a short and a long form (not Ref):
  Fn::GetAtt: [ logicalNameOfResource, attributeName ]
  !GetAtt logicalNameOfResource.attributeName
| Name | Attributes | Description |
|---|---|---|
| Ref | logicalName | Returns the default value of the specified parameter. For a resource, typically the physical id |
| Fn::Base64 | valueToEncode | Converts plain text into base64 |
| Fn::Cidr | ipBlock, count, cidrBits | Returns an array of CIDR address blocks. The number of CIDR blocks returned depends on the count parameter |
| Fn::FindInMap | MapName, TopLevelKey, SecondLevelKey | Returns the value corresponding to keys in a two-level map that is declared in the Mappings section |
| Fn::GetAtt | logicalNameOfResource, attributeName | Returns the value of an attribute from an object, either the default or the specified attribute. The object is either from the same or a nested template |
| Fn::GetAZs | region | Returns an array that lists Availability Zones for a specified region. If region is omitted, returns the AZs of the region the template is applied in |
| Fn::If | boolean, string1, string2 | Returns string1 if boolean is true, string2 otherwise |
| Fn::And, Fn::Equals, Fn::Or, Fn::Not | | Good for the condition element |
| Fn::ImportValue | sharedValueToImport | Returns the value of an Output exported by another stack. You can't delete a stack if another stack references one of its outputs, and you can't modify or remove an output value that is referenced by another stack |
| Fn::Join | delimiter, [ comma-delimited list of values ] | Joins a set of values into a single value separated by the specified delimiter |
| Fn::Select | index, listOfObjects | Returns a single object from a list of objects by index |
| Fn::Split | delimiter, source string | Splits a string into a list of string values so that you can select an element from the resulting list |
| Fn::Sub | String or [ String, { key: Value, ... } ] | Substitutes variables in an input string with values that you specify, e.g. !Sub 'arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:vpc/${vpc}' |
| Fn::Transform | Name: String, Parameters: { key: Value, ... } | Specifies a macro to perform custom processing on part of a stack template |
↖↑↓ Stacks
- Related resources are managed in a single unit called a stack
- All the resources in a stack are defined by the stack's CloudFormation template
- Controls lifecycle of managed resources
- Stack has name & id
- Can be updated directly or via change set
- Will rollback stack if it fails to create (can be disabled via API/console)
- Possible to detect stack drift, if supported by the created resources
- Can enable termination protection
- Can send stack events to SNS topic
- A stack policy is a set of IAM-style policy statements that governs who can do what
  - Defines which actions can be performed on specified resources.
  - With CloudFormation stack policies you can protect all or certain resources in your stacks from being unintentionally updated or deleted during the update process.
  - Check the stack policy if updates are allowed
    - No policy present: all updates are allowed -> This differs from the IAM default!
  - Once a policy is applied
    - It can be modified, but not removed from the stack
    - All resources that are not explicitly allowed are denied
    - The default deny can be explicitly overwritten
  - Policy format: JSON
    - Contains policy documents
  - Don't confuse with the DeletionPolicy, UpdatePolicy, UpdateReplacePolicy attributes
| Element | Comment |
|---|---|
| Effect | |
| Principal | Must be wildcard for stack policies |
| Action | Update:Modify, Update:Replace, Update:Delete, Update:* |
| Resource, NotResource | |
| Condition | Typically evaluates based on resource type |
↖↑↓ Processes
Stack Creation
- Template upload into S3 bucket
- Template syntax check
- CloudFormation will check for any IAM resources being created, and require CAPABILITY_IAM | CAPABILITY_NAMED_IAM if so
  - Will raise InsufficientCapabilities otherwise
- Stack name & parameter verification & ingestion (apply default values)
- Template processing & stack creation
- Resource ordering
  - Natural ordering
    - CloudFormation knows about 'natural' dependencies between resources.
  - DependsOn
    - The DependsOn attribute allows you to direct CloudFormation on how to handle more complex dependencies
    - Applies to creation as well as deletion & rollback
    - DependsOn can be a single resource or a list of resources
    - Will error on circular dependencies
    - DependsOn is problematic if the target resource needs a more complex setup than just stack creation
      -> Wait conditions allow further control over what happens when
- Resource creation
- Will try to create as many resources as possible in parallel
- Includes pausing and waiting for other resources to be created first
- Associate the CreationPolicy attribute with a resource to prevent its status from reaching
create complete until AWS CloudFormation receives a specified number of success signals or the
timeout period is exceeded.
- Output creation
- Stack completion or rollback
- Rollback settings can be provided while creating the stack: onFailure = ROLLBACK | DELETE | DO_NOTHING
- Can try to manually resolve problems if in state UPDATE_ROLLBACK_FAILED
Stack Updates
- Direct updates
- You submit changes and AWS CloudFormation immediately deploys them
- Change sets
- You can preview the changes AWS CloudFormation will make to your stack, and then decide whether
to apply those changes by executing the change set
- Change sets are JSON-formatted documents that summarize the changes AWS CloudFormation will
make to a stack
- Use the UpdatePolicy attribute to specify how AWS CloudFormation handles updates to the AWS::AutoScaling::AutoScalingGroup, AWS::ElastiCache::ReplicationGroup, AWS::Elasticsearch::Domain or AWS::Lambda::Alias resources.
  - Values depend on the resource type, e.g. ASG replacing vs rolling update
- Use the UpdateReplacePolicy attribute to retain or (in some cases) back up the existing physical instance of a resource when it is replaced during a stack update operation.
- On failure, the stack will roll back automatically to the last known working state
- Interruption while updating
  - An update can impact a resource in 3 possible ways
    - No interruption
      - E.g. change ProvisionedThroughput of a DynamoDB table
    - Some interruption
      - E.g. change EbsOptimized of an EC2 instance (EBS-backed)
      - E.g. change InstanceType of an EC2 instance (EBS-backed)
    - Replacement
      - E.g. change AvailabilityZone of an EC2 instance
      - E.g. change ImageId of an EC2 instance
      - E.g. change TableName of a DynamoDB table
Stack Deletion
- Specify the stack to delete, and AWS CloudFormation deletes the stack and all the resources in that stack.
- With the DeletionPolicy attribute you can preserve or (in some cases) backup a resource when its
stack is deleted.
- If AWS CloudFormation cannot delete a resource, the stack will not be deleted.
- A stack can have termination protection enabled, which will prevent it from being deleted accidentally
- Resource deletion policy (DeletionPolicy)
  - Policy/statement that is associated with every resource of a stack
  - Controls what happens if the stack is deleted
  - Delete (default)
    - Creates a transient environment - immutable architecture
  - Retain
    - Obviously needs further cleanup - non-immutable architecture
  - Snapshot
    - Takes a snapshot prior to deletion
    - Some resource types only: AWS::EC2::Volume, AWS::ElastiCache::CacheCluster, AWS::ElastiCache::ReplicationGroup, AWS::Neptune::DBCluster, AWS::RDS::DBCluster, AWS::RDS::DBInstance, AWS::Redshift::Cluster
    - Allows data recovery at a later stage
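The three policies look like this on concrete resources; the logical names are illustrative and the properties are elided:

```yaml
Resources:
  AppDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot   # final snapshot is taken before deletion
    Properties:
      ...                      # engine, storage, credentials elided
  AuditBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain     # bucket survives stack deletion, cleanup is manual
```

Resources without the attribute fall back to Delete and disappear with the stack.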
↖↑↓ StackSets
StackSets lets you create stacks in multiple AWS accounts across multiple regions by using a single CloudFormation
template. All the resources included in each stack are defined by the stack set's
AWS CloudFormation template. As you create the stack set, you specify the template to use, as well
as any parameters and capabilities that template requires.
- Can roll out from Organizations master account
- To all accounts
- To all accounts of an OU
Concept
- Stack sets are created in a region of an administrator account
- A stack instance is a reference to a stack in a target account within a region
- Can exist without a stack, e.g. if stack failed to create, then the stack instance shows the reason for that
- Operations: Create, Update, Delete
- For updates, can choose to overwrite parameters only for some accounts/regions
| Operation option | Comment |
|---|---|
| Maximum concurrent accounts | Maximum number or percentage of target accounts in which an operation is performed at one time |
| Failure tolerance | Maximum number or percentage of stack operation failures that can occur, per region, beyond which AWS CloudFormation stops an operation automatically |
| Retain stack (delete operations only) | Keep stacks and their resources running even after they have been removed from a stack set |
↖↑↓ Concepts
↖↑↓ Running code on instance boot
| Script Name | Purpose |
|---|---|
| cfn-init | Use to retrieve and interpret resource metadata, install packages, create files, and start services. |
| cfn-signal | Use to signal with a CreationPolicy or WaitCondition, so you can synchronize other resources in the stack when the prerequisite resource or application is ready. |
| cfn-get-metadata | Use to retrieve metadata for a resource or path to a specific key. |
| cfn-hup | Use to check for updates to metadata and execute custom hooks when changes are detected. |
Define code and scripts to run
- CloudFormation User Data
  - Scripts and commands to be passed to a launching EC2 instance.
  - Failing user data scripts don't fail the CFN stack
  - Logged to /var/log/cloud-init-output.log
  - Needs to be base64-encoded

    UserData:
      Fn::Base64: |
        #!/bin/bash -x
        ...
- cfn-init
  - Use the AWS::CloudFormation::Init type to include metadata on an Amazon EC2 instance for the cfn-init helper script. If your template calls the cfn-init script, the script looks for resource metadata rooted in the AWS::CloudFormation::Init metadata key.
  - Different sections: packages, groups, users, sources, files, commands, services
  - Need to make sure aws-cfn-bootstrap is in place and up to date
  - Logged to /var/log/cfn-init.log
- Can use WaitCondition/cfn-signal to make CloudFormation wait for the successful finish of code
- By default, user data scripts and cloud-init directives run only during the boot cycle when you first launch an instance.
- Can use cfn-hup to detect changes in resource metadata and run user-specified actions when a change is detected
- Use cfn-get-metadata to fetch a metadata block from AWS CloudFormation and print it to standard out
Signal outcome of installation back to CFN
Creation Policy
- Can (only) be used for EC2 instances and Auto Scaling Groups
- Creation policy definition
  - Defines desired signal count & waiting period
- Signal configuration
  - Call to cfn-signal from EC2 user data
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    ...
  CreationPolicy:
    ResourceSignal:
      Count: '3'
      Timeout: PT15M
LaunchConfig:
  Type: AWS::AutoScaling::LaunchConfiguration
  Properties:
    ...
    UserData:
      "Fn::Base64":
        !Sub |
          #!/bin/bash -xe
          yum update -y aws-cfn-bootstrap
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource AutoScalingGroup --region ${AWS::Region}
Wait Conditions and Handlers
- For other resources (external to the stack)
- Wait condition handler
- CloudFormation resource with no properties
- Generates signed URL to communicate success or failure
- The URL can be used by cfn-signal to send data to
- Takes custom data as well
- Wait condition
- Links handler and resource
- Know which resource they depend on
- Hold reference to handler
- Have response timeout
- Have a desired count (defaults to 1)
- Allows to define complex wait order
WebServerGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    ...
WaitHandle:
  Type: AWS::CloudFormation::WaitConditionHandle
WaitCondition:
  Type: AWS::CloudFormation::WaitCondition
  DependsOn: "WebServerGroup"
  Properties:
    Handle:
      Ref: "WaitHandle"
    Timeout: "300"
    Count:
      Ref: "WebServerCapacity"
↖↑↓ Custom Resources
- Problems with existing CloudFormation resources:
  - Sometimes lags behind AWS services
  - Cannot deal with non-AWS resources
  - Cannot do much logic beyond the scope of intrinsic functions
  - Cannot interact with external services
- Solvable with custom resources:
  - Custom resource type that is backed by SNS or Lambda

    Type: Custom::NameOfResourceType
    Properties:
      ServiceToken: arnOfSnsOrLambda

  - If the stack is created, updated or deleted, a payload is sent to ServiceToken
  - The payload contains any custom data that's defined with the resource, together with the action type
  - This invokes a Lambda that performs any sort of custom action
    - Or an SNS topic, e.g. to communicate with on-prem resources
  - Returns the outcome of the operation back to CloudFormation, typically includes custom data as well
↖↑↓ Drift detection
- Can detect drift on an entire stack or on a particular resource
- Not supported by all resource types
- CloudFormation
- Compares the current stack configuration to the one specified in the template that was used to create or update the stack
- Reports on any differences, providing you with detailed information on each one.
↖↑↓ Stacks Nesting
- Resources in a stack can be referenced by other stacks
- How to nest
  - Declare resources as AWS::CloudFormation::Stack
  - Point TemplateURL to the S3 URL of the nested stack
  - Use Parameters to provide the nested stack with input values (defaults will be used otherwise)
  - Output values of the nested stack are returned to the parent (root) stack: !GetAtt nestedStack.Outputs.db_name
- Benefits
  - Allows infrastructure to be split over many templates
  - Allows infrastructure reuse
  - Allows working around limitations like max resources or max template size
- Considered best practice by AWS
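A parent stack embeds a child like this; the bucket, key, and parameter names are illustrative:

```yaml
# Hypothetical nested stack: the parent passes a parameter down and reads
# an output back via GetAtt on the child's Outputs.
NetworkStack:
  Type: AWS::CloudFormation::Stack
  Properties:
    TemplateURL: https://s3.amazonaws.com/my-templates/network.yaml   # assumed location
    Parameters:
      CidrBlock: 10.0.0.0/16
    TimeoutInMinutes: 10
# Elsewhere in the parent template:
#   VpcId: !GetAtt NetworkStack.Outputs.VpcId
```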
↖↑↓ Limits
| Limit | Value |
|---|---|
| Max stacks per region | 200 |
| Max templates per region | unlimited |
| Max template size (stored in S3) | 460 kB |
| Parameters per stack | 60 |
| Mappings per stack | 100 |
| Resources per stack | 200 |
| Outputs per stack | 60 |
↖↑↓ CloudFront
↖↑↓ Overview
Amazon CloudFront is a web service that speeds up distribution of your static and dynamic web
content, such as .html, .css, .js, and image files, to your users. CloudFront delivers your
content through a worldwide network of data centers called edge locations. When a user requests
content that you're serving with CloudFront, the user is routed to the edge location that provides
the lowest latency (time delay), so that content is delivered with the best possible performance.
- CloudFront is usually the cheapest and simplest way to add caching to a web application
↖↑↓ Lambda@Edge
Lambda@Edge is a feature of Amazon CloudFront that lets you run code closer to users of your
application, which improves performance and reduces latency. With Lambda@Edge, you don't have to
provision or manage infrastructure in multiple locations around the world. You pay only for the
compute time you consume - there is no charge when your code is not running.
With Lambda@Edge, you can enrich your web applications by making them globally distributed and
improving their performance — all with zero server administration. Lambda@Edge runs your code in
response to events generated by the Amazon CloudFront content delivery network (CDN). Just upload
your code to AWS Lambda, which takes care of everything required to run and scale your code with
high availability at an AWS location closest to your end user.
- A master Lambda function, associated with a CloudFront distribution trigger, is replicated to the edge locations
- Can use Lambda@Edge for A/B testing
↖↑↓ CloudSearch
↖↑↓ Overview
Amazon CloudSearch is a fully managed service in the cloud that makes it easy to set up, manage,
and scale a search solution for your website or application.
With Amazon CloudSearch you can search large collections of data such as web pages, document files,
forum posts, or product information. You can quickly add search capabilities without having to
become a search expert or worry about hardware provisioning, setup, and maintenance. As your
volume of data and traffic fluctuates, Amazon CloudSearch scales to meet your needs.
↖↑↓ CloudTrail (Core Service)
↖↑↓ Overview
AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk
auditing of your AWS account. With CloudTrail, you can log, continuously monitor, and retain
account activity related to actions across your AWS infrastructure. CloudTrail provides event
history of your AWS account activity, including actions taken through the AWS Management Console,
AWS SDKs, command line tools, and other AWS services. This event history simplifies security
analysis, resource change tracking, and troubleshooting. In addition, you can use CloudTrail to
detect unusual activity in your AWS accounts. These capabilities help simplify operational
analysis and troubleshooting.
CloudTrail is enabled by default in every account. All activities in an AWS account are being
recorded as CloudTrail events.
↖↑↓ Concepts
- JSON format, who did what (API calls).
- ~15min delay
- Stored for 90 days
- Can configure what type of events to log
- Management events
- CloudWatch Insights events
- Data events
- One region/all regions/organization-wide
- Store data in nominated S3 bucket, this can be encrypted as well
- Can be in a different region
- Can also deliver and analyse events in a trail with CloudWatch Logs and CloudWatch Events
- Can validate integrity of log files using digest files
- Can deliver trails from multiple accounts into the same bucket
- Change bucket policy to allow that
↖↑↓ CloudWatch (Core Service)
↖↑↓ Overview
Amazon CloudWatch is a monitoring and management service built for developers, system operators,
site reliability engineers (SRE), and IT managers. CloudWatch provides you with data and
actionable insights to monitor your applications, understand and respond to system-wide
performance changes, optimize resource utilization, and get a unified view of operational health.
CloudWatch collects monitoring and operational data in the form of logs, metrics, and events,
providing you with a unified view of AWS resources, applications and services that run on AWS, and
on-premises servers. You can use CloudWatch to set high resolution alarms, visualize logs and
metrics side by side, take automated actions, troubleshoot issues, and discover insights to
optimize your applications, and ensure they are running smoothly.
- Access all your data from a single platform
- Easiest way to collect custom and granular metrics for AWS resources
- Visibility across your applications, infrastructure, and services
- Improve total cost of ownership
- Optimize applications and operational resources
- Derive actionable insights from logs
- On AWS: Service - FAQs - User Guide
- See also: AWS Geek 2019
↖↑↓ Concepts
↖↑↓ CloudWatch Logs
- Log events are records of some activity recorded by the application or resource being monitored
- Log streams are sequences of log events from the same source
- Log groups are groups of log streams that share the same retention, monitoring, and access control settings
- Metric filters allow you to extract metric observations from ingested events and transform them into data points in a CloudWatch metric
- Search for and match terms, phrases, or values in log events
- Can increment the value of a CloudWatch metric
- Retention settings can be used to specify how long log events are kept in CloudWatch Logs
  - Default: indefinitely
- Logs can be exported to S3 for durable storage.
  - This can be automated with EventsFilter -> Lambda
Subscriptions
- Allow real-time delivery of log events
- Can create subscription filters for Lambda, Elasticsearch, Kinesis Data Streams/Firehose (not supported from the console)
- Only Kinesis Stream supports cross-account
- Need to establish log data sender and log data recipient.
- The log group and the destination must be in the same AWS region. However, the AWS resource
that the destination points to can be located in a different region.
AWS-Managed Logs
| Service | Target(s) |
|---|---|
| Load Balancer Access Logs (ELB, ALB, NLB) | S3 |
| CloudTrail Logs | S3, CloudWatch |
| VPC Flow Logs | S3, CloudWatch |
| Route 53 Access Logs | CloudWatch |
| S3 Access Logs | S3 |
| CloudFront Access Logs | S3 |
↖↑↓ CloudWatch Metrics
Namespaces
- Container for CloudWatch metrics
- Metrics in different namespaces are isolated from each other
- The AWS namespaces typically use the naming convention AWS/service
Metrics
- Metrics are the fundamental concept in CloudWatch Metrics
- A metric represents a time-ordered set of data points that are published to CloudWatch.
- Available metrics are based on currently used service
- Not everything is available out of the box, e.g. no data on memory usage of EC2 instances
- Can also create Custom Metrics
- Publish individual data points via AWS CLI or API
- Exist only in the region where they were created
- Expire after 15 months if no data is published
aws cloudwatch put-metric-data --metric-name PageViewCount --namespace MyService --value 2 --timestamp 2016-10-20T12:00:00.000Z
- Can also export metrics
aws cloudwatch get-metric-statistics --namespace <value> --metric-name <value> --start-time <value> --end-time <value> ...
- Metrics produced by AWS services are standard resolution by default.
- When you publish a custom metric, you can define it as either standard resolution or high resolution.
- When you publish a high-resolution metric, CloudWatch stores it with a resolution of 1 second,
and you can read and retrieve it with a period of 1 second, 5 seconds, 10 seconds, 30 seconds,
or any multiple of 60 seconds.
- Higher resolution data automatically aggregates into lower resolution data
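Publishing a high-resolution custom metric is the same CLI call with --storage-resolution added (metric and namespace names are illustrative):

```shell
# Store this data point at 1-second resolution instead of the standard 60 seconds
aws cloudwatch put-metric-data \
  --namespace MyService \
  --metric-name RequestLatency \
  --value 0.25 \
  --storage-resolution 1
```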
| Resolution | Data retention |
| --- | --- |
| <60s | 3h |
| 60s | 15d |
| 300s (5min) | 63d |
| 3600s (1h) | 15 months (455d) |
Time Stamps
- Each metric data point must be associated with a time stamp.
- Can be up to two weeks in the past and up to two hours into the future.
Dimension
- A dimension is a name/value pair that is part of the identity of a metric.
- You can assign up to 10 dimensions to a metric.
- Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics.
- For example 'ec2 instance id'
Statistics
- Statistics are metric data aggregations over specified periods of time
- Average, Sum, Minimum, Maximum, Sample Count, pNN.NN (value of specified percentile)
- Can be computed for any time period between 60 seconds and 1 day
Period
- The length of time associated with a specific Amazon CloudWatch statistic
Aggregation
- CloudWatch aggregates statistics according to the period length that you specify when retrieving statistics
↖↑↓ CloudWatch Alarms
- Based on thresholds defined on metrics, including custom metrics
- Can only be based on a single metric
- Can trigger Lambda, SNS, email, ...
- Also Auto Scaling or EC2 action
- Alarms do not raise CloudWatch Events themselves
- High resolution alarms down to 10 seconds
- Actions are invoked once, when the alarm changes state, not continuously while the alarm remains in that state
- Disable with mon-disable-alarm-actions via CLI
- Can be added to dashboard
- Using alarm actions, you can create alarms that automatically stop, terminate, reboot, or recover your EC2 instances.
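As a sketch, an alarm that recovers an instance when the system status check fails could be created like this (instance ID is illustrative; the recover action ARN follows the arn:aws:automate:region:ec2:recover format):

```shell
# Recover the instance after two consecutive failed system status checks
aws cloudwatch put-metric-alarm \
  --alarm-name recover-instance \
  --namespace AWS/EC2 \
  --metric-name StatusCheckFailed_System \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Maximum --period 60 --evaluation-periods 2 \
  --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:automate:us-east-1:ec2:recover
```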
Billing Alarms
- Notifications on billing metrics
- Only available in us-east-1
↖↑↓ CloudWatch Events
- Define actions on things that happened
- Or schedule cron-based events
- Events are recorded constantly over time
- Targets process events
- Lambda functions
- Amazon EC2 instances
- Streams in Amazon Kinesis Data Streams
- Delivery streams in Amazon Kinesis Data Firehose
- Log groups in Amazon CloudWatch Logs
- Amazon ECS tasks
- Systems Manager Run Command
- Systems Manager Automation
- AWS Batch jobs
- AWS Step Functions state machines
- Pipelines in AWS CodePipeline
- AWS CodeBuild projects
- Amazon Inspector assessment templates
- Amazon SNS topics
- Amazon SQS queues
- Built-in targets: EC2 CreateSnapshot API call, EC2 RebootInstances API call, EC2 StopInstances API call, and EC2 TerminateInstances API call
- The default event bus of another AWS account
- Rules match incoming events and route them to targets
- CloudTrail integration allows to trigger events on API calls
- ReadOnly calls (List*, Get*, Describe*) are not supported
- E.g. CodeCommit automatically triggers CodePipeline on new commits
- Can deliver cross-account
- Must be in the same region
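Rules match events via an event pattern; for example, a pattern that matches EC2 instances entering the running state:

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": { "state": ["running"] }
}
```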
EventBridge
- CloudWatch Events at its core
- Adding other (3rd party service partners) event sources into the mix
S3 Events
- On bucket events send to SNS, SQS or Lambda
- Not everything is covered by S3 notifications
- Only object-level operations, not bucket-level
- Can also integrate with CloudTrail, but need a trail configured for that bucket
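A bucket notification configuration for invoking Lambda on object creation might look like this (function ARN is illustrative; the JSON is passed to aws s3api put-bucket-notification-configuration):

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "OnNewObject",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-upload",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
```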
↖↑↓ Dashboards
- Customizable home pages in the CloudWatch console that you can use to monitor your resources in a single correlated view
- Even those resources that are spread across different Regions.
- You can use CloudWatch dashboards to create customized views of the metrics and alarms for your AWS resources.
↖↑↓ Unified CloudWatch Agent
- Collects metrics and logs from
- EC2 instances (Linux/Windows)
- On-Prem instances (Linux/Windows)
- Stores configuration in SSM parameters
- Easy to share
- Can configure CloudWatch Agent to boot directly from SSM parameter store
- Offers a variety of metrics on top of the EC2 standard metrics
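On an instance, the agent can be started directly from a configuration stored in Parameter Store (parameter name is illustrative):

```shell
# Fetch the agent config from SSM Parameter Store and (re)start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -c ssm:AmazonCloudWatch-Config -s
```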
↖↑↓ Key metrics
- EC2 metrics are based on what is exposed to the hypervisor.
- Basic Monitoring (default) submits values every 5 minutes, Detailed Monitoring every minute
| Metric | Effect |
| --- | --- |
| CPUUtilization | The total CPU resources utilized within an instance at a given time. |
| DiskReadOps, DiskWriteOps | The number of read (write) operations performed on all instance store volumes. Applicable for instance store-backed AMI instances. |
| DiskReadBytes, DiskWriteBytes | The number of bytes read (written) on all instance store volumes. Applicable for instance store-backed AMI instances. |
| NetworkIn, NetworkOut | The number of bytes received (sent) on all network interfaces by the instance. |
| NetworkPacketsIn, NetworkPacketsOut | The number of packets received (sent) on all network interfaces by the instance. |
| StatusCheckFailed, StatusCheckFailed_Instance, StatusCheckFailed_System | Reports whether the instance has passed the combined/instance/system status check in the last minute. |
- Cannot monitor memory usage, available disk space, or swap usage
- This can be achieved with the Unified CloudWatch Agent
↖↑↓ Auto Scaling Group
| Metric | Effect |
| --- | --- |
| GroupMinSize, GroupMaxSize | The minimum/maximum size of the Auto Scaling Group. |
| GroupDesiredCapacity | The number of instances that the Auto Scaling Group attempts to maintain. |
| GroupInServiceInstances, GroupPendingInstances, GroupStandbyInstances, GroupTerminatingInstances | The number of instances that are running/pending (not yet in service)/standby (still running)/terminating as part of the Auto Scaling Group. |
| GroupTotalInstances | The total number of instances in the Auto Scaling Group: in service, pending, and terminating. |
Classic Load Balancer (ELB)

| Metric | Effect |
| --- | --- |
| Latency | Time it takes to receive a response. Measure max and average. |
| BackendConnectionErrors | Number of connections to registered instances that were not successfully established. Measure sum and look at the difference between min and max. |
| SurgeQueueLength | Total number of requests waiting to get routed. Look at max and average. |
| SpilloverCount | Requests dropped because the surge queue was exceeded. Look at sum. |
| HTTPCode_ELB_4XX, HTTPCode_ELB_5XX | The number of HTTP 4XX/5XX error codes that originate from the load balancer. Does not include response codes generated by the targets. Look at sum. |
| HTTPCode_Backend_2XX ... HTTPCode_Backend_5XX | The number of HTTP 2XX-5XX response codes that originate from the backend. Look at sum. |
| RequestCount | Number of completed requests. |
| HealthyHostCount, UnHealthyHostCount | Number of healthy/unhealthy registered instances. |
- Elastic Load Balancing reports metrics only when requests are flowing through the load balancer. If there are requests flowing through the load balancer, Elastic Load Balancing measures and sends its metrics in 60-second intervals.
- Spillover and surge queue give an indication of the ELB being overloaded
- Typically this means that the backend system cannot process requests as fast as they are coming in
- Ideally load balance into an Auto Scaling Group.
Application Load Balancer (ALB)

| Metric | Effect |
| --- | --- |
| RequestCount | Number of completed requests. |
| HealthyHostCount, UnHealthyHostCount | Number of healthy/unhealthy targets. |
| TargetResponseTime | The time elapsed after the request leaves the load balancer until a response from the target is received. |
| HTTPCode_ELB_3XX_Count, HTTPCode_ELB_4XX_Count, HTTPCode_ELB_5XX_Count | The number of HTTP 3XX/4XX/5XX error codes that originate from the load balancer. Does not include response codes generated by the targets. |
Network Load Balancer (NLB)

| Metric | Effect |
| --- | --- |
| ProcessedBytes | The total number of bytes processed by the load balancer, including TCP/IP headers. |
| TCP_Client_Reset_Count | The total number of reset (RST) packets sent from a client to a target. |
| TCP_ELB_Reset_Count | The total number of reset (RST) packets generated by the load balancer. |
| TCP_Target_Reset_Count | The total number of reset (RST) packets sent from a target to a client. |
↖↑↓ Tutorials
| Service | Tutorial | Steps |
| --- | --- | --- |
| CloudWatch Events/EventBridge | Relay Events to AWS Systems Manager Run Command | Configure event: ASG, Instance Launch and Terminate, EC2-Instance-Launch lifecycle action. Target: SSM Run Command, add command(s) |
| CloudWatch Events/EventBridge | Log the State of an Amazon EC2 Instance | Configure event: EC2, Instance State Change, specific state: Running. Invoke Lambda that logs state from the incoming event |
| CloudWatch Events/EventBridge | Log the State of an Auto Scaling Group | Configure event: ASG, Instance Launch and Terminate, EC2-Instance-Launch lifecycle action. Invoke Lambda that logs state from the incoming event |
| CloudWatch Events/EventBridge | Log Amazon S3 Object-Level Operations | Configure a CloudTrail trail to monitor the S3 bucket(s) (no events otherwise). Implement Lambda that logs state. Configure event to trigger on PutObject. Invoke Lambda |
| CloudWatch Events/EventBridge | Use Input Transformer to Customize What Is Passed to the Event Target | |
| CloudWatch Events/EventBridge | Log AWS API Calls | |
| CloudWatch Events/EventBridge | Schedule Automated Amazon EBS Snapshots | |
| CloudWatch Events/EventBridge | Set AWS Systems Manager Automation as an EventBridge Target | |
| CloudWatch Events/EventBridge | Relay Events to a Kinesis Stream | |
| CloudWatch Events/EventBridge | Run an Amazon ECS Task When a File Is Uploaded to an Amazon S3 Bucket | |
| CloudWatch Events/EventBridge | Schedule Automated Builds Using AWS CodeBuild | |
| CloudWatch Events/EventBridge | Log State Changes of Amazon EC2 Instances | |
| CloudWatch Events/EventBridge | Download Code Bindings for Events using the EventBridge Schema Registry | |
↖↑↓ CodeBuild (Core Service)
↖↑↓ Overview
AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs
tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to
provision, manage, and scale your own build servers. CodeBuild scales continuously and processes
multiple builds concurrently, so your builds are not left waiting in a queue. You can get started
quickly by using prepackaged build environments, or you can create custom build environments that
use your own build tools. With CodeBuild, you are charged by the minute for the compute resources
you use.
- Provides preconfigured environments for supported versions of Java, Ruby, Python, Go, Node.js, Android, .NET Core, PHP, and Docker
- Build images are Amazon Linux 2, Ubuntu 18.04, Windows Server Core 2016
- Can use build image from ECR, even cross-account
- On AWS: Service - FAQs - User Guide
↖↑↓ Benefits
- Fully managed build service
- Serverless
- Leverages Docker under the hood
- Can use own docker images
- Integrates with KMS, IAM, VPC and CloudTrail
- Continuous scaling
- Extensible
- Can use own build tools and runtimes
- Can attach to VPC
↖↑↓ Components
- Source code from CodeCommit, S3, GitHub, Bitbucket
- Build defined in
buildspec.yml
- Build timeouts up to 8h
- Uses queue to process build jobs
- Triggers can schedule a build, can also use cron expressions
- CloudWatch integration
- Logs (can also go to S3)
- Metrics to monitor CodeBuild statistics
- Can set up Alarms on top of those
- Can schedule CloudWatch Events
↖↑↓ How it works
- CodeBuild is provided with a build project.
- Defines how CodeBuild runs
- Source code location, build environment to use, build commands
- CodeBuild uses information from build project to create build environment
- Build runs in a container, in phases:
submitted, queued, provisioning, download_source, install, pre_build, build, post_build, upload_artifacts, finalizing, completed
- Download source code into build environment, use buildspec to build
| Section | Keys | Notes |
| --- | --- | --- |
| version | 0.2 | |
| run-as | | |
| env | variables, parameter-store, exported-variables, secrets-manager, git-credential-helper | |
| phases | install, pre_build, build, post_build | Every phase has a finally section. A failed build transitions to post_build, all others to finalizing |
| reports | | |
| artifacts, secondary-artifacts | | |
| cache | paths | |
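A minimal buildspec.yml might look like this (commands, runtime version, and artifact paths are illustrative):

```yaml
# Minimal buildspec sketch for a Node.js project
version: 0.2
env:
  variables:
    ENVIRONMENT: staging
phases:
  install:
    runtime-versions:
      nodejs: 12
  pre_build:
    commands:
      - npm ci
  build:
    commands:
      - npm test
      - npm run build
artifacts:
  files:
    - 'dist/**/*'
```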
- Environment variables
- Can come from SSM/Secrets Manager
- Precedence: start-build (CLI) > project definition > buildspec.yml
- If there is build output and CodeBuild is not managed by CodePipeline
- Uploaded to S3, encrypted by default
- With default configuration, artifacts are overwritten each time
- Can provide Namespace type to save output to different location every time
- While the build is running, the build environment sends information to CloudWatch and CodeBuild
- Can also use console, CLI, SDK to retrieve information about running build
↖↑↓ CodeCommit (Core Service)
↖↑↓ Overview
AWS CodeCommit is a fully-managed source control service that hosts secure Git-based repositories.
It makes it easy for teams to collaborate on code in a secure and highly scalable ecosystem.
CodeCommit eliminates the need to operate your own source control system or worry about scaling
its infrastructure. You can use CodeCommit to securely store anything from source code to binaries,
and it works seamlessly with your existing Git tools.
↖↑↓ Benefits
- Fully managed
- Highly available
- Faster development cycle
- Code lives close to actual environments
- Secure
- Encryption, IAM integration
- Collaborate on code
- Use existing tools
- CodeBuild, Jenkins, other CI tools
- Fully enabled for automation
- Provides notifications and triggers for all repository events
↖↑↓ How To
↖↑↓ Protect branches
Use IAM policy:
"Condition": {
"StringEqualsIfExists": {
"codecommit:References": [
"refs/heads/master"
]
},
"Null": {
"codecommit:References": false
}
}
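A complete statement built around this condition, following the pattern AWS documents for protecting branches (account ID and repository name are illustrative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "codecommit:GitPush",
        "codecommit:DeleteBranch",
        "codecommit:PutFile",
        "codecommit:MergePullRequestByFastForward"
      ],
      "Resource": "arn:aws:codecommit:us-east-1:111122223333:MyDemoRepo",
      "Condition": {
        "StringEqualsIfExists": {
          "codecommit:References": ["refs/heads/master"]
        },
        "Null": {
          "codecommit:References": "false"
        }
      }
    }
  ]
}
```

The Null condition ensures the deny also applies when the request carries no reference at all.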
↖↑↓ Send Notifications
- CodeStar Notifications integration
- Uses CloudWatch Events Rules under the hood
- Set up notification/notification rules:
- Pick triggering event (on commit, ..., branch updated)
- Pick SNS topic as target
- Add subscriber to target
- Set up CloudWatch Events directly:
- Pick event source (AWS service)
- Pick event type (different types, also includes CloudTrail)
- Pick target (many different types, e.g. Lambda, SNS, SSM, Code*)
↖↑↓ Triggers
- Can trigger SNS or Lambda
- Up to 10 triggers
- Can be augmented with custom data (an uninterpreted string) that you can use to distinguish the trigger from others that run for the same event.
- More limited in scope than Notifications. Does not use CloudWatch Events under the hood.
↖↑↓ CodeDeploy (Core Service)
↖↑↓ Overview
AWS CodeDeploy is a fully managed deployment service that automates software deployments to a
variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises
servers. AWS CodeDeploy makes it easier for you to rapidly release new features, helps you avoid
downtime during application deployment, and handles the complexity of updating your applications.
You can use AWS CodeDeploy to automate software deployments, eliminating the need for error-prone
manual operations. The service scales to match your deployment needs.
- CodeDeploy can be chained into CodePipeline and can use artifacts from there
- CodeDeploy does not provision resources
- Automated deployments
- EC2, on-prem, (ASG), ECS, (Fargate), Lambda
- Minimize downtime
- Centralized control
- Easy to adopt
- Integrates with AWS SAM
- On AWS: Service - FAQs - User Guide
↖↑↓ Components
- Application
- The application that should be deployed
- Deployment
- Revision
- Specific version of deployable content, such as source code, post-build artifacts, web pages, executable files, and deployment scripts, along with an AppSpec file
- AppSpec File
- Environment variables are exposed as well (e.g. DEPLOYMENT_ID, DEPLOYMENT_GROUP_NAME), allowing you to implement logic in the installation process.
- Slightly different format for EC2/ECS/Lambda.
- Deployment group
- Set of instances or Lambda functions, defined by tags, e.g. 'environment=prod'
- Can define multiple deployment groups within an application, such as prod or staging
- Can be associated with
- CloudWatch Alarms that would stop the deployment if triggered
- Triggers for notification
- Rollbacks
- Deployment configuration
- Specifies how the deployment should proceed
CodeDeployDefault.OneAtATime, ..., CodeDeployDefault.LambdaCanary10Percent10Minutes
- Can create own
- Define minimum healthy hosts by percentage or number
- E.g. 9 instances in total, 6 minimum healthy hosts, deploy 3 at a time. Deployment is successful after 6 hosts have been successfully deployed
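That example corresponds to a custom deployment configuration created like this (configuration name is illustrative):

```shell
# Require at least 6 healthy hosts, so CodeDeploy deploys to 3 of 9 at a time
aws deploy create-deployment-config \
  --deployment-config-name ThreeAtATime \
  --minimum-healthy-hosts type=HOST_COUNT,value=6
```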
| Platform | AppSpec sections | hooks section |
| --- | --- | --- |
| EC2/On-Premises | version: 0.0, os: operating-system-name, files: source-destination-files-mappings, permissions: permissions-specifications, hooks: deployment-lifecycle-event-mappings | Contains mappings that link deployment lifecycle events to one or more scripts |
| ECS | version: 0.0, resources: ecs-service-specifications, hooks: deployment-lifecycle-event-mappings | Specifies Lambda validation functions |
| Lambda | version: 0.0, resources: lambda-function-specifications, hooks: deployment-lifecycle-event-mappings | Specifies Lambda validation functions |
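A minimal EC2/On-Premises AppSpec file might look like this (paths and script names are illustrative):

```yaml
# appspec.yml: copy the revision into place and wire lifecycle hooks to scripts
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/my-app
hooks:
  BeforeInstall:
    - location: scripts/stop_server.sh
      timeout: 60
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 60
  ValidateService:
    - location: scripts/health_check.sh
```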
↖↑↓ How it works
↖↑↓ Overview
- CodeDeploy (Agent) continuously polls CodeDeploy
- CodeDeploy sends AppSpec file
- Revision is pulled from S3/GitHub/BitBucket
- Actual revision always comes from S3, needs IAM permissions for the bucket
- Runs in phases:
ApplicationStop, (DownloadBundle), BeforeInstall, (Install), AfterInstall, ApplicationStart, ValidateService
- CodeDeploy (Agent) reports back success/failure
↖↑↓ Notifications and logging
- CodeStar Notifications integration (goes out to SNS only)
- CloudWatch Event integration
- Deployment triggers can notify SNS
- Can trigger based on deployment or instance events
- CloudWatch Log Agent on machine can push logs to CloudWatch
- No logs for ECS/Lambda deploys
↖↑↓ Rollback
- Manual Rollback by simply deploying a previous version
- Automated Rollbacks
- Options
- When deployment fails
- Rollback when any deployments on any instance fails
- When alarm thresholds are met
- Add CloudWatch Alarm to deployment group
- Can define a cleanup file to help remove already-installed files
↖↑↓ Deploys
↖↑↓ To EC2/On-premises
| Step | Comment |
| --- | --- |
| Create application | |
| Specify deployment group | Tags and/or ASG name |
| Specify deployment configuration | AllAtOnce, HalfAtATime, OneAtATime (default); In-place (default) or Blue/green |
| Upload revision | |
| Deploy | |
| Check results | |
| Redeploy as needed | |
- In-place
- The application on each instance in the deployment group is stopped
- The latest application revision is installed
- The new version of the application is started and validated.
- You can use a load balancer so that each instance is deregistered during its deployment and then restored to service after the deployment is complete.
- Only deployments that use the EC2/On-Premises compute platform can use in-place deployments.
- Blue/green
- Instances are provisioned for the replacement environment.
- The latest application revision is installed on the replacement instances.
- An optional wait time occurs for activities such as application testing and system verification.
- Instances in the replacement environment are registered with an Elastic Load Balancing load balancer, causing traffic to be rerouted to them.
- Instances in the original environment are deregistered and can be terminated or kept running for other uses.
- If you use an EC2/On-Premises compute platform, be aware that blue/green deployments work with Amazon EC2 instances only.
Integration with Elastic Load Balancing
- During deployments, a load balancer prevents internet traffic from being routed to instances when they are not ready, are currently being deployed to, or are no longer needed as part of an environment.
- Blue/Green Deployments
- Allows traffic to be routed to the new instances in a deployment group that the latest application revision has been deployed to (the replacement environment), according to the rules you specify
- Blocks traffic from the old instances where the previous application revision was running (the original environment).
- After instances in a replacement environment are registered with a load balancer, instances from the original environment are deregistered and, if you choose, terminated.
- Specify a Classic Load Balancer, Application Load Balancer, or Network Load Balancer in your deployment group.
- In-Place Deployments
- Prevents internet traffic from being routed to an instance while it is being deployed to
- Makes the instance available for traffic again after the deployment to that instance is complete.
- If a load balancer isn't used during an in-place deployment, internet traffic may still be directed to an instance during the deployment process.
- When you use a load balancer with an in-place deployment, instances in a deployment group are deregistered from a load balancer, updated with the latest application revision, and then reregistered with the load balancer as part of the same deployment group after the deployment is successful.
- Specify a Classic Load Balancer, Application Load Balancer, or Network Load Balancer. You can specify the load balancer as part of the deployment group's configuration, or use a script provided by CodeDeploy to implement the load balancer.
Integration with Auto Scaling Groups
- When new Amazon EC2 instances are launched as part of an Amazon EC2 Auto Scaling group, CodeDeploy can deploy your revisions to the new instances automatically.
- During blue/green deployments on an EC2/On-Premises compute platform, you have two options for adding instances to your replacement (green) environment:
- Use instances that already exist or that you create manually.
- Use settings from an Amazon EC2 Auto Scaling group that you specify to define and create instances in a new Amazon EC2 Auto Scaling group.
- If an Amazon EC2 Auto Scaling scale-up event occurs while a deployment is underway, the new instances will be updated with the application revision that was most recently deployed, not the application revision that is currently being deployed.
- Suspend scaling during rolling deploys
- Or redeploy
Register on-premises instances
- Configure each on-premises instance, register it with CodeDeploy, and then tag it.
- Can create IAM User per instance
- Needs configuration file with AK/SAK
- Use register or register-on-premises-instances command
- Best for only few instances
- Can create IAM Role
- Needs credentials to call STS with
- Best for many instances, also more secure
- Setup more complicated
- Use register-on-premises-instances command together with the STS token service
- Need to install CodeDeploy agent, obviously
- On-prem instances cannot blue/green, as CodeDeploy cannot create new infrastructure
↖↑↓ To Lambdas
| Step | Comment |
| --- | --- |
| Create application | |
| Specify deployment group | Only a name; Lambdas are specified in the AppSpec |
| Specify deployment configuration | LambdaCanary10Percent5Minutes (/10/15/30), LambdaLinear10PercentEvery1Minute (/2/3/10), LambdaAllAtOnce; Blue/green only |
| Specify an AppSpec file | S3 (local with AWS CLI) |
| Deploy | |
| Check results | |
| Redeploy as needed | |
- Lambda deploys a new version under an alias, and traffic is shifted between old and new version
- (Versions and Aliases are native Lambda features)
Integration with AWS Serverless
- Deploys new versions of your Lambda function, and automatically creates aliases that point to the new version.
- Gradually shifts customer traffic to the new version until you're satisfied that it's working as expected, or you roll back the update.
- Defines pre-traffic and post-traffic test functions to verify that the newly deployed code is configured correctly and your application operates as expected.
- Rolls back the deployment if CloudWatch alarms are triggered.
DeploymentPreference:
Type: Canary10Percent10Minutes
Alarms:
# A list of alarms that you want to monitor
- !Ref AliasErrorMetricGreaterThanZeroAlarm
Hooks:
# Validation Lambda functions that are run before & after traffic shifting
PreTraffic: !Ref PreTrafficLambdaFunction
PostTraffic: !Ref PostTrafficLambdaFunction
↖↑↓ To ECS
| Step | Comment |
| --- | --- |
| Create ECS service | Set its deployment controller to CodeDeploy |
| Create application | |
| Specify deployment group | Specify: ECS cluster and service name; production listener, optional test listener, and target groups; deployment settings, such as when to reroute production traffic to the replacement ECS task; optional settings such as triggers, alarms and rollback behaviour |
| Specify deployment configuration | ECSCanary10Percent5Minutes (/15), ECSLinear10PercentEvery1Minutes (/3), ECSAllAtOnce; Blue/green only |
| Specify an AppSpec file | S3 (local with AWS CLI) |
| Deploy | |
| Check results | |
| Redeploy as needed | |
- CodeDeploy reroutes traffic from the original version of a task set to a new, replacement task set
- Target groups specified in the deployment group are used to serve traffic to the original and replacement task sets
- After the deployment is complete, the original task set is terminated.
- Can specify an optional test listener to serve test traffic to your replacement version before traffic is rerouted to it
↖↑↓ CodePipeline (Core Service)
↖↑↓ Overview
AWS CodePipeline is a fully managed continuous delivery service that helps you automate your
release pipelines for fast and reliable application and infrastructure updates. CodePipeline
automates the build, test, and deploy phases of your release process every time there is a code
change, based on the release model you define. This enables you to rapidly and reliably deliver
features and updates. You can easily integrate AWS CodePipeline with third-party services such as
GitHub or with your own custom plugin. With AWS CodePipeline, you only pay for what you use. There
are no upfront fees or long-term commitments.
↖↑↓ Benefits
- Rapid delivery
- Configurable workflow
- Get started fast
- No CI/CD infrastructure needs to be provisioned
- Easy to integrate
- Plugin concept to integrate with other components
- Integrates with CloudWatch Events
- CodeStar Notifications integration
↖↑↓ Components
- stage
- action group (run in sequence)
- action (run in sequence or parallel)
- Various action providers provide functionality
- runOrder allows choosing between sequential and parallel execution
- region replicates the source bucket into the target region. This enables multi-region deploys
- Can invoke Lambda as an almost arbitrary pipeline action
- Stages create artifacts
- Stored in S3, passed on to the next stage
- Default setting for artifact store would create one bucket per pipe, can also specify Custom location
- Store must be in the same region as pipeline
- Always encrypted, default KMS or CMK
- Artifacts are the way different pipeline stages 'communicate' with each other
- CodePipeline artifacts are slightly different to CodeBuild artifacts
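As a sketch, a stage definition with two actions: increasing runOrder values run sequentially, while actions sharing the same value run in parallel (application, group, and artifact names are illustrative):

```json
{
  "name": "Deploy",
  "actions": [
    {
      "name": "DeployStaging",
      "runOrder": 1,
      "actionTypeId": {
        "category": "Deploy",
        "owner": "AWS",
        "provider": "CodeDeploy",
        "version": "1"
      },
      "inputArtifacts": [{ "name": "BuildOutput" }],
      "configuration": {
        "ApplicationName": "my-app",
        "DeploymentGroupName": "staging"
      }
    },
    {
      "name": "DeployProd",
      "runOrder": 2,
      "actionTypeId": {
        "category": "Deploy",
        "owner": "AWS",
        "provider": "CodeDeploy",
        "version": "1"
      },
      "inputArtifacts": [{ "name": "BuildOutput" }],
      "configuration": {
        "ApplicationName": "my-app",
        "DeploymentGroupName": "prod"
      }
    }
  ]
}
```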
↖↑↓ Pipeline Actions
- Source
- S3
- CodeCommit
- -> Must have one pipeline per branch!
- Change detection CloudWatch (recommended), or CodePipeline (poll periodically)
- Creates CloudWatch Events rule in the background
- Github
- ECR
- Build
- CodeBuild
- Jenkins
- TeamCity
- etc
- Test
- CodeBuild
- Jenkins
- Ghost Inspector
- Deploy
- CodeDeploy
- Can deploy into different region
- CloudFormation
- Beanstalk
- ECS
- etc
- Invoke
- Approval
↖↑↓ Scenarios
CodePipeline with
- Amazon S3, AWS CodeCommit, and AWS CodeDeploy
- Third-party Action Providers (GitHub and Jenkins)
- AWS CodeStar to Build a Pipeline in a Code Project
- Compile, Build, and Test Code with CodeBuild
- Amazon ECS for Continuous Delivery of Container-Based Applications to the Cloud
- Elastic Beanstalk for Continuous Delivery of Web Applications to the Cloud
- AWS Lambda for Continuous Delivery of Lambda-Based and Serverless Applications
- AWS CloudFormation Templates for Continuous Delivery to the Cloud
↖↑↓ CodeStar
↖↑↓ Overview
AWS CodeStar enables you to quickly develop, build, and deploy applications on AWS. AWS CodeStar
provides a unified user interface, enabling you to easily manage your software development
activities in one place. With AWS CodeStar, you can set up your entire continuous delivery
toolchain in minutes, allowing you to start releasing code faster. AWS CodeStar makes it easy for
your whole team to work together securely, allowing you to easily manage access and add owners,
contributors, and viewers to your projects. Each AWS CodeStar project comes with a project
management dashboard, including an integrated issue tracking capability powered by Atlassian JIRA
Software. With the AWS CodeStar project dashboard, you can easily track progress across your
entire software development process, from your backlog of work items to teams’ recent code
deployments.
- CodeStar provides a central console where you can assign project team members the roles they need to access tools and resources. These permissions are applied automatically across all AWS services used in your project, so you don't need to create or manage complex IAM policies.
owner, contributor, viewer
- On AWS: Service - FAQs - User Guide
↖↑↓ Benefits
- Start developing on AWS in minutes
- Manage software delivery in one place
- Work across your team securely
- Choose from a variety of project templates
↖↑↓ Under the hood
Uses a CloudFormation transform (cfn-transform) to generate CloudFormation from template.yml
↖↑↓ Config (Core Service)
↖↑↓ Overview
AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your
AWS resources. Config continuously monitors and records your AWS resource configurations and
allows you to automate the evaluation of recorded configurations against desired configurations.
With Config, you can review changes in configurations and relationships between AWS resources,
dive into detailed resource configuration histories, and determine your overall compliance against
the configurations specified in your internal guidelines. This enables you to simplify compliance
auditing, security analysis, change management, and operational troubleshooting.
- Evaluate your AWS resource configurations for desired settings.
- Get a snapshot of the current configurations of the supported resources that are associated with your AWS account.
- Retrieve configurations of one or more resources that exist in your account.
- Retrieve historical configurations of one or more resources.
- Receive a notification whenever a resource is created, modified, or deleted.
- View relationships between resources. For example, you might want to find all resources that use a particular security group.
- On AWS: Service - FAQs - User Guide
↖↑↓ Config Rules
- Evaluate the configuration settings of AWS resources
- A Config rule represents your ideal configuration settings
- Predefined rules called managed rules to help you get started
- Can also create custom rules
- AWS Config continuously tracks the configuration changes that occur among your resources
- Checks whether these changes violate any of the conditions in your rules.
- If a resource violates a rule, AWS Config flags the resource and the rule as noncompliant.
- Can remediate using AWS Systems Manager Automation Documents
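Enabling a managed rule from the CLI might look like this (the S3 public-read check is one of the predefined managed rules):

```shell
# Flag S3 buckets that allow public read access as noncompliant
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "s3-bucket-public-read-prohibited",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED"
  }
}'
```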
↖↑↓ Automation
- SNS notification on all Config events (cannot configure which events)
- CloudWatch Events to observe specific events/rules
↖↑↓ Aggregation
An aggregator is an AWS Config resource type that collects AWS Config configuration and compliance data from the following:
- Multiple accounts and multiple regions.
- Single account and multiple regions.
- An organization in AWS Organizations and all the accounts in that organization.
- Limited to 50 aggregators per account; exceeding a limit yields: "We are unable to complete the request at this time. Try again later or contact AWS Support"
↖↑↓ DynamoDB
↖↑↓ Overview
Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond
performance at any scale. It's a fully managed, multi-region, multi-master, durable database with
built-in security, backup and restore, and in-memory caching for internet-scale applications.
DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20
million requests per second.
- Fully managed NoSQL database
- HA through different AZs, automatically spreads data and traffic across servers
- Three geographically distributed replicas of each table (within a region)
- Can scale up and down depending on demand (no downtime, no performance degradation)
- Automatic or user-controlled read/write capacity provisioning
- No joins - create references to other tables manually (table1#something)
- Conditional updates and concurrency control (atomic counters)
- Option between eventually consistent or strongly consistent reads
- Built-in monitoring
- Big Data: Integrates with AWS Elastic MapReduce and Redshift
- Can configure TTL to expire table entries
- On AWS: Service - FAQs - User Guide
- See also: AWS Geek 2018
↖↑↓ Keys and indexes
- Partition key is also called hash attribute or primary key
- Must be unique, used for internal hash function (unordered)
- Used to retrieve data
- You should design your application for uniform activity across all logical partition keys in the table and its secondary indexes
↖↑↓ PK & Sort key
- Composite PK: index composed of hashed PK (unordered) and SK (ordered)
- Sort key is also called range attribute or range key
- Different items can have the same PK, must have different SK
↖↑↓ Secondary indexes
- Associated with exactly one table, from which it obtains its data
- Allows to query or scan data by an alternate key (other than PK/SK)
- Only for read operations; write is not supported
↖↑↓ Projected attributes
- Attributes copied from the base table into an index
- Makes them queryable
- Different projection types:
  - KEYS_ONLY - Only the index and primary keys are projected into the index
  - ALL - All of the table attributes are projected into the index
  - INCLUDE - Only the specified table attributes are projected into the index
↖↑↓ Local secondary index
- Local as in "co-located on the same partition"
- Uses the same PK, but offers different SK
- Every partition of a local secondary index is scoped to a base table partition that has the same
partition key value
- Local secondary indexes are extra tables that DynamoDB keeps in the background
- Can only be created together with the base table
- Can choose eventual consistency or strong consistency at creation time
- Can request not-projected attributes for query or scan operation
- Consumes read/write throughput from the original table.
↖↑↓ Global secondary index
- Global as in "over many partitions"
- Uses different PK and offers additional SK (or none).
- PK does not have to be unique (unlike base table)
- Queries on the global index can span all of the data in the base table, across all partitions
- Can be created after the base table has already been created.
- Only support eventual consistency
- Have their own provisioned read/write throughput
- A global secondary index's data is itself distributed across multiple partitions
- Cannot request not-projected attributes for query or scan operation
↖↑↓ Capacity provisioning
- Unit for operations:
  - 1 strongly consistent read per second (up to 4KB/s)
  - 2 eventually consistent reads per second (up to 8KB/s)
  - 1 write per second (up to 1KB)
- Algorithm (example: 300 strongly consistent reads of 11KB per minute)
  - Calculate reads/writes per second: 300r / 60s = 5r/s
  - Multiply by the payload factor, rounded up to 4KB units: 5r/s * ceil(11KB / 4KB) = 15cu
  - If eventually consistent, divide by 2 and round up: 15cu / 2 = 8cu
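The algorithm above can be expressed as a small helper; the function name and workload numbers are just for illustration:

```python
import math

def read_capacity_units(reads_per_minute, item_kb, strongly_consistent=True):
    """Provisioned RCUs for a steady read workload, per the algorithm
    above: DynamoDB rounds the payload up to 4 KB units, and eventually
    consistent reads cost half."""
    per_second = reads_per_minute / 60
    units = per_second * math.ceil(item_kb / 4)
    if not strongly_consistent:
        units /= 2
    return math.ceil(units)

# 300 strongly consistent 11 KB reads per minute:
print(read_capacity_units(300, 11))         # 15
print(read_capacity_units(300, 11, False))  # 8
```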
↖↑↓ DynamoDB Accelerator (DAX)
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for
DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds –
even at millions of requests per second. DAX does all the heavy lifting required to add in-memory
acceleration to your DynamoDB tables, without requiring developers to manage cache invalidation,
data population, or cluster management. Now you can focus on building great applications for your
customers without worrying about performance at scale. You do not need to modify application logic,
since DAX is compatible with existing DynamoDB API calls. You can enable DAX with just a few
clicks in the AWS Management Console or using the AWS SDK. Just as with DynamoDB, you only pay for
the capacity you provision.
↖↑↓ DynamoDB Streams
DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB
table and stores this information in a log for up to 24 hours. Applications can access this log
and view the data items as they appeared before and after they were modified, in near-real time.
A DynamoDB stream is an ordered flow of information about changes to items in a DynamoDB table.
When you enable a stream on a table, DynamoDB captures information about every modification to
data items in the table.
- Typically targets Lambda function
- Only up to 2 Lambda functions on the same stream, throttling issues otherwise
- Underlying implementation is a Kinesis Stream
- Can enable Global Table
  - Requires DynamoDB Streams and an empty table
  - Replicates the table in different regions (read and write)
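A stream-consuming Lambda receives batches of records in the documented stream event shape; the handler logic below (collecting the keys of modified items) is purely illustrative:

```python
# Sketch of a Lambda handler consuming a DynamoDB stream. The event
# shape follows the stream record format (eventName, dynamodb.Keys,
# NewImage/OldImage); what the handler does with it is made up.

def handler(event, context=None):
    modified = []
    for record in event.get("Records", []):
        if record.get("eventName") == "MODIFY":
            # Keys identify the changed item; NewImage/OldImage (not
            # used here) show it after/before the change.
            keys = record["dynamodb"]["Keys"]
            modified.append(keys["pk"]["S"])
    return {"modified_keys": modified}

sample = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"Keys": {"pk": {"S": "a"}}}},
    {"eventName": "MODIFY", "dynamodb": {"Keys": {"pk": {"S": "b"}}}},
]}
print(handler(sample))  # {'modified_keys': ['b']}
```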
↖↑↓ Global Tables
Amazon DynamoDB global tables provide a fully managed solution for deploying a multi region, multi
master database, without having to build and maintain your own replication solution. With global
tables you can specify the AWS Regions where you want the table to be available. DynamoDB performs
all of the necessary tasks to create identical tables in these Regions and propagate ongoing data
changes to all of them.
- A global table is a collection of one or more replica tables, all owned by a single AWS account.
- A replica table is a single DynamoDB table that functions as a part
of a global table. Each replica stores the same set of data items.
- When you create a DynamoDB global table, it consists of multiple replica tables (one per Region)
that DynamoDB treats as a single unit. Every replica has the same table name and the same primary
key schema. When an application writes data to a replica table in one Region, DynamoDB propagates
the write to the other replica tables in the other AWS Regions automatically.
- You can add replica tables to the global table so that it can be available in additional Regions.
↖↑↓ ECS
↖↑↓ Overview
Amazon Elastic Container Service (Amazon ECS) is a highly scalable, fast, container management
service that makes it easy to run, stop, and manage Docker containers on a cluster. You can host
your cluster on a serverless infrastructure that is managed by Amazon ECS by launching your
services or tasks using the Fargate launch type. For more control you can host your tasks on a
cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances that you manage by using the EC2
launch type.
↖↑↓ Benefits
- Containers without servers
- Containerize Everything
- Secure
- Performance at Scale
- AWS Integration
↖↑↓ Components
[Cluster
[Services
[Task Definitions
[Family]
[Task role/execution role]
[Network mode]
[Container Definitions
[Name/Image]
[Memory/Port Mappings]
[Health Check]
[Environment]
[Network Settings]
[Storage and Logging]
[Security]
[Resource Limit]
[Docker labels]
]
]
]
]
- Cluster
- Logical grouping of EC2 instances that you can place tasks on
- Instances run ECS agent as a Docker container
- Cluster is an Auto Scaling Group with a Launch Configuration using a special ECS AMI
- Service
- Runs and maintains a specified number of tasks simultaneously
- Created on cluster-level, launch type EC2 or Fargate
- Can be linked to ALB/NLB/ELB
- Service type
- Replica - places and maintains the desired number of tasks across your cluster
- Daemon - deploys exactly one task on each active container instance that meets all of the task placement constraints
- Good for e.g. monitoring that should run on every container instance
- Deployment type
- Rolling
- Controlled by Amazon ECS
- Service scheduler replacing the current running version of the container with the latest version
- Blue/Green
- Controlled by CodeDeploy
- Allows to verify a new deployment of a service before sending production traffic to it
- Task Definition
- ECS allows to run and maintain a specified number of containers in a task definition
- Group by responsibility, e.g. separate task definitions for frontend and backend
- The task definition is a text file, in JSON format, that describes one or more containers, up to a maximum of ten, that form your application
- Specify various parameters, eg:
- Container image to use
- Port to be opened & networking
- Data volumes
- Either for ECS or Fargate
- Container Definitions
- Can mark container as essential - if that container fails or stops for any reason, all other containers that are part of the task are stopped
- Task
- A task is the instantiation of a task definition within a cluster
- If a task should fail or stop, the ECS scheduler launches another instance of the task
definition to replace it and to maintain the desired count of tasks in service
- Static host port mapping: Only one task per container instance allowed, e.g. mapping host port 80 to a fixed container port
- Dynamic host port mapping: Uses randomized host ports, can work together with ALB to run multiple task instances per container instance
- Tasks can have individual IAM roles
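A task definition tying these pieces together might look like the following JSON; the account ID, image, memory values and log group names are made up:

```json
{
  "family": "web-app",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/web:latest",
      "essential": true,
      "memory": 512,
      "portMappings": [{ "containerPort": 80 }],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "web"
        }
      }
    }
  ]
}
```

Because `essential` is true here, a failure of this container stops every other container in the task.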
↖↑↓ Auto Scaling
↖↑↓ Logging
- For tasks, configure logging agent with task definition
- Typically CloudWatch, also supports Splunk
- For cluster instances, install CloudWatch Agent
- Various ECS-specific CloudWatch metrics available
- Various ECS-specific CloudWatch Events available
- Can enable CloudWatch Container Insights
- Sends per-container metrics into CloudWatch metrics
- Needs login:
aws ecr get-login --no-include-email --region us-west-2
docker pull aws_account_id.dkr.ecr.us-west-2.amazonaws.com/amazonlinux:latest
↖↑↓ Fargate
- Don't need to provision cluster
- Does not need EC2 instance roles to create cluster
- Requires VPC
↖↑↓ Elastic Beanstalk (Core Service)
↖↑↓ Overview
AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and
services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar
servers such as Apache, Nginx, Passenger, and IIS.
You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from
capacity provisioning, load balancing, auto-scaling to application health monitoring. At the same
time, you retain full control over the AWS resources powering your application and can access the
underlying resources at any time.
- Allows to deploy, monitor and scale applications quickly
- Focuses on components and performance, not configuration and specification
- Aims to simplify or even remove infrastructure management
- Provides different options for low cost and high availability quick starts
- Underlying instances can be automatically patched
- Platform-specific application source bundle (e.g. Java war for Tomcat)
- Go
- Java SE/with Tomcat
- .NET on Windows Server with IIS
- Node.js
- PHP
- Python
- Ruby Standalone/Puma
- Docker Single-/Multicontainer - runs on ECS
- Docker Preconfigured Glassfish/Python/Go
- On AWS: Service - FAQs - User Guide
- See also: AWS Geek 2019
↖↑↓ Concepts
↖↑↓ Components
- Application
- A logical collection of Elastic Beanstalk components, including environments, versions, and environment configurations.
- An application is conceptually similar to a folder.
- Either application in the traditional sense
- Or application component, e.g. service, frontend, backend
- Application Version
- Refers to a specific, labeled iteration of deployable code for a web application
- Unique package that represents a version of the application
- Uploaded as a zipped application source bundle
- Each application can have many application versions
- Can be deployed to one or more environments within an application
- Limit of 1000 versions -> can configure application version lifecycle management
- Environment
- Collection of AWS resources running an application version
- Isolated, self-contained set of components of infrastructure
- Single-instance or multi-instance scalable
- Application can have multiple environments
- Either represent development stages (PROD, STAGING, ...)
- ...or application component, e.g. service, frontend, backend
- Environment has one application
- Can be cloned as a whole environment
- Can rebuild environment - complete delete and rebuild
- Environment tier
- Designates the type of application that the environment runs, and determines what resources Elastic Beanstalk provisions to support it
- Type Web server environment tier
- Represents a web application
- Type Worker environment tier
- Acts upon output created by another environment - pulls work from an SQS queue
- Elastic Beanstalk runs a daemon process on each instance that reads from the queue
- Ideal for long-running workloads
- Should only be loosely coupled to web server environments, e.g. via SQS
- Can also be invoked on a schedule (cron.yaml)
- Environment configuration
- Identifies a collection of parameters and settings that define how an environment and its associated resources behave
- Saved configuration
- A template that you can use as a starting point for creating unique environment configurations
- Platform
- Combination of an operating system, programming language runtime, web server, application server, and Elastic Beanstalk components
| Web Application (non-docker) | Web Application (docker, runs on ECS) | Worker |
| --- | --- | --- |
| AWS::AutoScaling::AutoScalingGroup | AWS::AutoScaling::AutoScalingGroup | AWS::CloudFormation::WaitConditionHandle |
| AWS::AutoScaling::LaunchConfiguration | AWS::AutoScaling::LaunchConfiguration | AWS::DynamoDB::Table |
| AWS::AutoScaling::ScalingPolicy | AWS::CloudFormation::WaitCondition | AWS::EC2::SecurityGroup |
| AWS::CloudFormation::WaitCondition | AWS::CloudFormation::WaitConditionHandle | AWS::SQS::Queue |
| AWS::CloudFormation::WaitConditionHandle | AWS::EC2::EIP | |
| AWS::CloudWatch::Alarm | AWS::EC2::SecurityGroup | |
| AWS::EC2::SecurityGroup | | |
| AWS::EC2::SecurityGroupIngress | | |
| AWS::ElasticLoadBalancing::LoadBalancer | | |
↖↑↓ Configuration precedence
Configuration options, sorted by precedence (highest first):
- Settings applied directly to the environment
  - Via console, eb-cli, ...
- Existing configuration saved into .elasticbeanstalk
  - Can use eb config ... to save/snapshot a configuration of an application
  - Saved into .elasticbeanstalk, format is *.cfg.yml
  - Only non-default values are saved
  - Can be modified, uploaded and applied
  - Good for backing up existing applications
  - Can also be applied elsewhere, e.g. in a different region
- .ebextensions in project
  - Folder is part of the application source bundle
  - Allows granular configuration & customisation of the EB environment and its resources
  - Contains *.config files in YAML format with different sections:
    - option_settings - Global configuration options
    - resources - Specify additional resources, allows granular configuration of these
      - Put CloudFormation here, export outputs into the Beanstalk environment
      - Good for e.g. a DDB table, an SNS topic, ...
      - However, those resources are part of the environment and would be deleted with it
      - If this is not desired, create the resources independently - usually best for e.g. databases
    - commands - Executed on the EC2 instance, run before application and web server are set up
    - container_commands - Executed on the EC2 instance, affects application source code
      - Run after the application archive has been extracted, but before the application version has been deployed
      - Can use leader_only to run on a single instance only, e.g. to migrate a database
    - packages, sources, files, users, groups, services
      - Various package managers are supported (yum, rubygems, ...)
  - Applied with the next eb deploy
- Default values
  - As the name says.
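A small .ebextensions file combining several of these sections could look like the following; the option namespace shown is real, but the environment variable, package, and command contents are made-up examples:

```yaml
# .ebextensions/01-app.config — illustrative sketch, not a template to
# copy verbatim: APP_ENV, jq, and migrate.sh are assumptions.
option_settings:
  aws:elasticbeanstalk:application:environment:
    APP_ENV: production

packages:
  yum:
    jq: []

container_commands:
  01_migrate:
    command: "./scripts/migrate.sh"
    leader_only: true   # run on a single instance only
```

Files in the folder are applied in alphabetical order on the next eb deploy.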
↖↑↓ Deployment Types
- Single instance deploys
- HA with Load Balancer
- All at once
- Rolling update
- Rolling with additional batches
- Immutable
- Adds new Auto Scaling Group to existing environment
- Blue/Green - not natively supported, however:
- Can manually create new environment
- Either use swap URL feature or manually create DNS
- Traffic Splitting via Application Load Balancer
- Can configure Elastic Beanstalk to ignore health checks during deployments
↖↑↓ Limits
| Resource | Limit |
| --- | --- |
| Applications | 75 |
| Application Versions | 1000 |
| Configuration Templates | 2000 |
| Environments | 200 |
↖↑↓ Elasticsearch Service
Amazon Elasticsearch Service is a fully managed service that makes it easy for you to deploy,
secure, and run Elasticsearch cost effectively at scale. You can build, monitor, and troubleshoot
your applications using the tools you love, at the scale you need. The service provides support
for open source Elasticsearch APIs, managed Kibana, integration with Logstash and other AWS
services, and built-in alerting and SQL querying. Amazon Elasticsearch Service lets you pay only
for what you use – there are no upfront costs or usage requirements. With Amazon Elasticsearch
Service, you get the ELK stack you need, without the operational overhead.
↖↑↓ Overview
- Managed version of Elasticsearch
- Runs on servers
- Use cases:
- Log Analytics
- Real-time application monitoring
- Security analysis
- Full text search
- Clickstream analytics
- Indexing
- Not a good choice for record processing
- On AWS: Service - FAQs - User Guide
- Elasticsearch
- Provides search and indexing capabilities
- Logstash
- Log ingestion mechanism
- Agent-based
- Kibana
- Provides real-time dashboards on top of data in ES
- DynamoDB -> DynamoDB Stream -> Lambda -> AWS ES
- CloudWatch Logs -> Subscription Filter -> Lambda -> AWS ES (real time)
- CloudWatch Logs -> Subscription Filter -> Kinesis Firehose -> AWS ES (near real time)
↖↑↓ GuardDuty (Core Service)
↖↑↓ Overview
Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity
and unauthorized behavior to protect your AWS accounts and workloads. With the cloud, the
collection and aggregation of account and network activities is simplified, but it can be time
consuming for security teams to continuously analyze event log data for potential threats. With
GuardDuty, you now have an intelligent and cost-effective option for continuous threat detection
in the AWS Cloud. The service uses machine learning, anomaly detection, and integrated threat
intelligence to identify and prioritize potential threats. GuardDuty analyzes tens of billions of
events across multiple AWS data sources, such as AWS CloudTrail, Amazon VPC Flow Logs, and DNS logs.
With a few clicks in the AWS Management Console, GuardDuty can be enabled with no software or
hardware to deploy or maintain. By integrating with Amazon CloudWatch Events, GuardDuty alerts are
actionable, easy to aggregate across multiple accounts, and straightforward to push into existing
event management and workflow systems.
↖↑↓ Kinesis (Core Service)
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can
get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities
to cost-effectively process streaming data at any scale, along with the flexibility to choose the
tools that best suit the requirements of your application. With Amazon Kinesis, you can ingest real-time
data such as video, audio, application logs, website clickstreams, and IoT telemetry data
for machine learning, analytics, and other applications. Amazon Kinesis enables you to process and
analyze data as it arrives and respond instantly instead of having to wait until all your data is
collected before the processing can begin.
[various data sources] -> Kinesis Streams -> Kinesis Analytics -> Kinesis Firehose -> [S3]
↖↑↓ Overview
↖↑↓ Kinesis Data Stream
- Enables you to build custom applications that process or analyze streaming data for specialized needs
- Real-time data delivery
- Streams are divided into ordered shards/partitions
- Shards can evolve over time (reshard/merge)
- Records are ordered per shard (but not across shards)
- Data retention is 1 day (default), up to 7 days
- Ability to process/replay data
- Multiple applications can consume the same stream
- Once data is inserted into Kinesis, it can't be deleted (immutability)
- Records
  - Data blob, up to 1MB
  - Record key - helps grouping into shards, should be highly distributed
  - Sequence number - unique identifier for each record put into shards
- Producers
- Kinesis SDK, Kinesis Producer Library (KPL), Kinesis Agent, CloudWatch Logs
- 3rd party libraries: Spark, Log4j Appenders, ...
- Consumers
- Kinesis SDK, Kinesis Client Library (KCL), Kinesis Connector Library, AWS Lambda
- 3rd party libraries: Spark, Log4j Appenders, ...
↖↑↓ Kinesis Data Firehose
- The easiest way to load streaming data into data stores and analytics tools
- Near Real-time data delivery (~60 seconds)
- Automatic Scaling
- Can do data transformation through Lambda
- Supports compression for S3
- Pay for the amount of data going through Firehose
- Producers
- Kinesis Data Streams, CloudWatch Logs & Events, ...
- Consumers (data receivers)
- Redshift, S3, ElasticSearch, Splunk
| Kinesis Data Streams | Kinesis Firehose |
| --- | --- |
| Must manage scaling | Fully managed |
| Real time | Near real time |
| Data storage | No data storage |
| Can write custom code for consumers/producers | Serverless Lambda |

For real-time delivery, Kinesis Data Streams are the only option.
↖↑↓ Kinesis Data Analytics
- The easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time
- Performing real time analytics on Kinesis Streams using SQL
- Managed, auto-scaling
- Can create new Kinesis Streams out of the real-time queries
↖↑↓ Limits
- Kinesis Streams limits:
  - Producer: 1MB/s or 1000 messages/s write per shard (otherwise ProvisionedThroughputException)
  - Consumer Classic: 2MB/s read per shard, 5 API calls per second per shard
  - Data retention: up to 7 days
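These per-shard limits imply a minimum shard count for a given workload; a rough sizing helper (function name and the sample workload are illustrative):

```python
import math

def shards_needed(write_mb_s, writes_per_s, read_mb_s):
    """Minimum shard count implied by the per-shard limits above:
    1 MB/s or 1000 records/s in, 2 MB/s out (classic consumers)."""
    return max(
        math.ceil(write_mb_s / 1.0),
        math.ceil(writes_per_s / 1000.0),
        math.ceil(read_mb_s / 2.0),
    )

# e.g. 5 MB/s in, 3000 records/s, 12 MB/s of consumer read fan-out:
print(shards_needed(5, 3000, 12))  # 6
```

Here the read side dominates (12 / 2 = 6 shards), which is typical when several applications consume the same stream.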
↖↑↓ Lambda (Core Service)
↖↑↓ Overview
AWS Lambda lets you run code without provisioning or managing servers. You pay only for the
compute time you consume - there is no charge when your code is not running. With Lambda, you can
run code for virtually any type of application or backend service - all with zero administration.
Just upload your code and Lambda takes care of everything required to run and scale your code with
high availability. You can set up your code to automatically trigger from other AWS services or
call it directly from any web or mobile app.
- Features
  - No servers to manage
  - Continuous scaling
  - Cold start - if no idle container is available to run the Lambda
  - Very cheap
  - Can assign more RAM, which proportionally increases CPU as well
- Supported languages: Node.js, Java, C#/PowerShell, Python, Go, Ruby
- Can pass in environment variables
- These can be KMS-encrypted as well (need SDK to decrypt)
- On AWS: Service - FAQs - User Guide
- See also: AWS Geek 2020
↖↑↓ Managing Functions
triggers
-> function & layers
-> destinations
↖↑↓ Versions
- If you work on a Lambda function, you work on $LATEST
- The system creates a new version of your Lambda function each time that you publish the function.
The new version is a copy of the unpublished version of the function.
- Version is code and configuration
- Versions are immutable, you can change the function code and settings only on the unpublished
version of a function.
- Each version gets its own ARN
↖↑↓ Aliases
- You can create one or more aliases for your AWS Lambda function. A Lambda alias is like a pointer
to a specific Lambda function version.
- Aliases are mutable
- Users can access the function version using the alias ARN.
- Can create e.g. dev, test and prod
- Aliases can point to multiple versions with a weight - for canary-style deployments
↖↑↓ Layers
- You can configure your Lambda function to pull in additional code and content in the form of layers.
- A layer is a ZIP archive that contains libraries, a custom runtime, or other dependencies.
- With layers, you can use libraries in your function without needing to include them in your deployment package.
↖↑↓ Network
- You can configure a function to connect to private subnets in a VPC in your account.
- Use VPC to create a private network for resources such as databases, cache instances, or internal services.
- Connect your function to the VPC to access private resources during execution.
- Provisioning process for Lambda takes longer
↖↑↓ Database
- You can use the Lambda console to create an RDS database proxy for your function.
- A database proxy manages a pool of database connections and relays queries from a function.
- This enables a function to reach high concurrency levels without exhausting database connections.
↖↑↓ Invoking Functions
↖↑↓ Synchronous/Asynchronous/Event Source Invocation
- When you invoke a function synchronously, Lambda runs the function and waits for a response.
- -> API Gateway, ALB, Cognito, Lex, Alexa, CloudFront (Lambda@Edge), Kinesis Data Firehose
- When you invoke a function asynchronously, Lambda sends the event to a queue. A separate
process reads events from the queue and runs your function.
- Lambda manages the function's asynchronous invocation queue and attempts to retry failed
events automatically. If the function returns an error, Lambda attempts to run it two more times
- When all attempts to process an asynchronous invocation fail, Lambda can send the event to an
Amazon SQS queue or an Amazon SNS topic.
- -> S3, SNS, SES, CloudFormation, CloudWatch Logs & Events, CodeCommit, Config
- An event source mapping is an AWS Lambda resource that reads from an event source and invokes a Lambda function.
- -> Kinesis, DynamoDB, SQS
↖↑↓ Function Scaling
- The first time you invoke your function, AWS Lambda creates an instance of the function and runs
its handler method to process the event.
- When the function returns a response, it sticks around to process additional events.
- If you invoke the function again while the first event is being processed, Lambda creates another instance.
- This continues until there are enough instances to serve all requests, or a concurrency limit is reached.
- When the number of requests decreases, Lambda stops unused instances to free up scaling capacity for other functions.
- Concurrency is invocations/s * runtime (e.g. 10/s * 4s = 40)
- Can configure reserved concurrency
- No other function can use that concurrency
- Can configure provisioned concurrency before an increase in invocations
- Can ensure that all requests are served by initialized instances with very low latency.
- Default burst concurrency limits:
  - 3000 – US West (Oregon), US East (N. Virginia), Europe (Ireland)
  - 1000 – Asia Pacific (Tokyo), Europe (Frankfurt)
  - 500 – Other Regions
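The concurrency rule of thumb above is just arithmetic; a tiny helper for sizing reserved or provisioned concurrency (the function name is made up):

```python
import math

def required_concurrency(invocations_per_s, avg_duration_s):
    """Concurrent executions = request rate x average runtime,
    per the rule above, rounded up to whole instances."""
    return math.ceil(invocations_per_s * avg_duration_s)

print(required_concurrency(10, 4))  # 40
```

If the result exceeds the regional burst limit, traffic above the limit is throttled until Lambda scales further.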
↖↑↓ Monitoring and troubleshooting
- AWS Lambda automatically monitors Lambda functions on your behalf and reports metrics through
Amazon CloudWatch. To help you monitor your code as it executes, Lambda automatically tracks the
number of requests, the execution duration per request, and the number of requests that result in an error.
- It also publishes the associated CloudWatch metrics.
- Need custom metric for memory usage
↖↑↓ License Manager
↖↑↓ Overview
AWS License Manager makes it easier to manage your software licenses from software vendors such as
Microsoft, SAP, Oracle, and IBM across AWS and on-premises environments. AWS License Manager lets
administrators create customized licensing rules that emulate the terms of their licensing
agreements, and then enforces these rules when an instance of EC2 gets launched. Administrators
can use these rules to help prevent licensing violations, such as using more licenses than an
agreement stipulates. The rules in AWS License Manager enable you to help prevent a licensing
breach by stopping the instance from launching or by notifying administrators about the
infringement. Administrators gain control and visibility of all their licenses with the AWS
License Manager dashboard and reduce the risk of non-compliance, misreporting, and additional
costs due to licensing overages.
↖↑↓ Macie
↖↑↓ Overview
Amazon Macie is a security service that uses machine learning to automatically discover, classify,
and protect sensitive data in AWS. Amazon Macie recognizes sensitive data such as personally
identifiable information (PII) or intellectual property, and provides you with dashboards and
alerts that give visibility into how this data is being accessed or moved. The fully managed
service continuously monitors data access activity for anomalies, and generates detailed alerts
when it detects risk of unauthorized access or inadvertent data leaks. Amazon Macie is available
to protect data stored in Amazon S3.
↖↑↓ Managed Services
↖↑↓ Overview
As enterprise customers move towards adopting the cloud at scale, some find their people need help
and time to gain AWS skills and experience. AWS Managed Services (AMS) operates AWS on your behalf,
providing a secure and compliant AWS Landing Zone, a proven enterprise operating model, on-going
cost optimization, and day-to-day infrastructure management. By implementing best practices to
maintain your infrastructure, AWS Managed Services helps to reduce your operational overhead and
risk. AWS Managed Services automates common activities, such as change requests, monitoring, patch
management, security, and backup services, and provides full-lifecycle services to provision, run,
and support your infrastructure. AWS Managed Services unburdens you from infrastructure operations
so you can direct resources toward differentiating your business.
↖↑↓ OpsWorks Stacks (Core Service)
↖↑↓ Overview
AWS OpsWorks is a configuration management service that provides managed instances of Chef and
Puppet. Chef and Puppet are automation platforms that allow you to use code to automate the
configurations of your servers. OpsWorks lets you use Chef and Puppet to automate how servers are
configured, deployed, and managed across your Amazon EC2 instances or on-premises compute
environments.
- Declarative desired state engine
- Automate, monitor and maintain deployments
- AWS' implementation of Chef
- Original Chef
- AWS-bespoke orchestration components
- Cookbooks define recipes
- OpsWorks has three offerings:
- AWS OpsWorks Stacks (<- exam relevant)
- AWS Opsworks for Chef Automate
- AWS OpsWorks for Puppet Enterprise
- For Chef 11, Berkshelf is often used
- Allows to use external cookbooks
- On AWS: Service - FAQs - User Guide
↖↑↓ Components
- Stack
- Set of resources that are managed as a group
* Need to enable custom cookbooks, this has to be enabled on stack level
- Layer
- Represent and configure components of a stack
- Share common configuration elements
- E.g. load balancer layer, app layer, db layer
- Type: OpsWorks, ECS or RDS
- If a layer has auto healing enabled—the default setting—AWS OpsWorks Stacks automatically replaces the layer's failed instances
* After 5 minutes a non-responsive instance becomes failed.
- Instance
- Units of compute within the platform
- Must be associated with at least one layer
- Linux or Windows, but not both
- Can run
- 24/7
- Load-based
- Time-based
- Instances have to exist already and be pre-assigned to be e.g. load-based; they are not created on demand.
- App
- Applications that are deployed on one or more instances
- Deployed through source code repo or S3
- Deployments
- Deploy application code and related files to application server instances
- Deployment operation is handled by each instance's Deploy recipes, which are determined by the instance's layer
- Can roll back up to 4 versions
↖↑↓ Lifecycle Events
- Each layer has a set of five lifecycle events, each of which has an associated set of recipes that are specific to the layer
- When an event occurs on a layer's instance, AWS OpsWorks Stacks automatically runs the appropriate set of recipes
| Event | Description |
| --- | --- |
| Setup | Occurs after a started instance has finished booting |
| Configure | Occurs on all of the stack's instances when one of the following happens: an instance enters or leaves the online state; you associate an Elastic IP address with an instance or disassociate one; you attach an Elastic Load Balancing load balancer to a layer or detach one |
| Deploy | Occurs when you run a Deploy command |
| Undeploy | Occurs when you run an Undeploy command |
| Shutdown | Occurs after you direct AWS OpsWorks Stacks to shut an instance down but before the associated Amazon EC2 instance is actually terminated |
↖↑↓ Under the hood
- CloudWatch Events integration
- Can configure event rules to trigger alarms
- Under the hood
- OpsWorks agent
- Configuration of machines
- OpsWorks automation engine
- Create, update & delete of various AWS components
- Handles load balancing, auto scaling and auto healing
- Supports lifecycle events
- Berkshelf
- Addresses an OpsWorks shortcoming from old versions - only one repository for recipes
- Was added in OpsWorks 11.10 and allows you to install cookbooks from many repositories
↖↑↓ Organizations
↖↑↓ Overview
AWS Organizations offers policy-based management for multiple AWS accounts. With Organizations,
you can create groups of accounts, automate account creation, apply and manage policies for those
groups. Organizations enables you to centrally manage policies across multiple accounts, without
requiring custom scripts and manual processes.
Using AWS Organizations, you can create Service Control Policies (SCPs) that centrally control AWS
service use across multiple AWS accounts. You can also use Organizations to help automate the
creation of new accounts through APIs. Organizations helps simplify the billing for multiple
accounts by enabling you to setup a single payment method for all the accounts in your
organization through consolidated billing. AWS Organizations is available to all AWS customers at
no additional charge.
↖↑↓ Benefits
- Centrally manage policies across multiple accounts
- Control access to AWS services
- Automate AWS account creation and management
- Consolidated billing
- One paying account linked to many linked accounts
- Pricing benefits (Volumes, Storage, Instances)
- Create account hierarchy with Organizational Units (OUs)
- Apply SCPs across the hierarchy
- Apply Tag Policies across the hierarchy
↖↑↓ Service Control Policies (SCP)
Service control policies (SCPs) are one type of policy that you can use to manage your organization.
SCPs offer central control over the maximum available permissions for all accounts in your
organization, allowing you to ensure your accounts stay within your organization’s access control
guidelines. SCPs are available only in an organization that has all features enabled. SCPs aren't
available if your organization has enabled only the consolidated billing features. SCPs do not apply
to the master account itself.
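As a sketch of what an SCP can look like, the following hypothetical policy denies every action outside two allowed regions. The region list and `Sid` are illustrative, not from the source:

```python
import json

# Hypothetical SCP denying all actions outside two allowed regions.
# Remember: SCPs only cap the maximum available permissions for the
# accounts they apply to; they never grant access by themselves.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideAllowedRegions",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": ["eu-central-1", "us-east-1"]
                }
            },
        }
    ],
}

print(json.dumps(scp, indent=2))
```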
↖↑↓ Tag Policies
Tag policies are a type of policy that can help you standardize tags across resources in your
organization's accounts. In a tag policy, you specify tagging rules applicable to resources when they are tagged.
↖↑↓ Limits
| Limit | Value |
| --- | --- |
| Maximum linked accounts | 20 |
↖↑↓ Personal Health Dashboard
↖↑↓ Overview
AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing
events that may impact you. While the Service Health Dashboard displays the general status of AWS
services, Personal Health Dashboard gives you a personalized view into the performance and
availability of the AWS services underlying your AWS resources.
The dashboard displays relevant and timely information to help you manage events in progress, and
provides proactive notification to help you plan for scheduled activities. With Personal Health
Dashboard, alerts are triggered by changes in the health of AWS resources, giving you event
visibility, and guidance to help quickly diagnose and resolve issues.
↖↑↓ QuickSight
↖↑↓ Overview
Amazon QuickSight is a fast, cloud-powered business intelligence service that makes it easy to
deliver insights to everyone in your organization.
As a fully managed service, QuickSight lets you easily create and publish interactive dashboards
that include ML Insights. Dashboards can then be accessed from any device, and embedded into your
applications, portals, and websites.
- Data sources
- Amazon Athena
- Amazon Aurora
- Amazon Redshift
- Amazon S3
- Various SQL databases, Snowflake
- On AWS: Service - FAQs - User Guide
↖↑↓ Redshift
↖↑↓ Overview
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. An Amazon
Redshift data warehouse is a collection of computing resources called nodes, which are organized
into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or
more databases.
- Queries are written in SQL
↖↑↓ Relational Database Service
↖↑↓ Overview
Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a
relational database in the cloud. It provides cost-efficient and resizable capacity while
automating time-consuming administration tasks such as hardware provisioning, database setup,
patching and backups. It frees you to focus on your applications so you can give them the fast
performance, high availability, security and compatibility they need.
Amazon RDS is available on several database instance types - optimized for memory, performance or I/O -
and provides you with six familiar database engines to choose from, including Amazon Aurora,
PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. You can use the AWS Database
Migration Service to easily migrate or replicate your existing databases to Amazon RDS.
↖↑↓ Backups
- Automated backups
- Enabled by default
- Allow recovering the database within a retention period (1d - 35d)
- Transactional storage engine recommended as DB engine
- Will take daily snapshot and use transaction logs
- PITR down to 1 second (within retention period)
- Backups are taken within defined window
- Degrades performance if multi-AZ is not enabled!
- Taken from slave if multi-AZ is enabled
- Backups are stored internally on S3
- Free backup storage space equal to DB size
- Deleting an instance deletes all automated backups
- Cannot be shared with other accounts (need to be turned into manual snapshots first)
- Database Snapshots
- Only manually, always user initiated
- Won't be deleted with DB instance
↖↑↓ Multi-AZ deployments
Provide enhanced availability for database instances within a single AWS Region.
- Meant for disaster recovery, not for performance improvement (-> Read Replica)
- Configure RDS for multi-AZ deployments and turn replication on
- Keeps a synchronous standby replica in a different AZ
- Automatic failover in case of planned or unplanned outage of the first AZ
- Most likely still has downtime
- Can force failover by rebooting
- Other benefits
- Aurora can replicate across 3 AZs
↖↑↓ Read replicas
- Read queries are routed to read replicas, reducing load on the primary DB instance (source instance)
- To create a read replica, AWS initially creates a snapshot of the source instance
- The Multi-AZ failover instance (if enabled) is used for snapshotting
- After that, all changes are asynchronously replicated to the read replica
- This implies data latency, which typically is acceptable. The `ReplicaLag` metric can be monitored and CloudWatch alarms can be configured
- No AWS charges for data replication in same region
- A single master can have up to 5 read replicas
- Can be in different regions
- Can have Multi-AZ enabled themselves
- Read replicas are not the same as multi-AZ failover instances which
- are synchronously updated
- are designed to handle failover
- don't receive any load unless failover actually happens
- Often it is beneficial to have both read replicas and multi-AZ failover instances
- Read replicas can be promoted to normal instances
- E.g. use a read replica to implement bigger changes on the DB level; after these have been finished, promote it to a master instance
- This will break replication
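The replica lifecycle described above can be sketched with boto3. The instance identifiers and region below are placeholders, and the promote call breaks replication, as noted:

```python
def create_and_promote_replica(source_id, replica_id, region="eu-west-1"):
    """Sketch of the read-replica lifecycle (identifiers are placeholders)."""
    import boto3  # AWS SDK; calls below require credentials when actually run

    rds = boto3.client("rds", region_name=region)

    # RDS first snapshots the source (the Multi-AZ standby, if enabled)
    # and then starts asynchronous replication to the new replica.
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,
        SourceDBInstanceIdentifier=source_id,
    )

    # ... once the replica is 'available' and any changes are applied,
    # promotion turns it into a standalone instance and breaks replication:
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)
```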
↖↑↓ Route 53
↖↑↓ Overview
Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service. You can
use Route 53 to perform three main functions in any combination: domain registration, DNS routing,
and health checking. If you choose to use Route 53 for all three functions, perform the steps in this order:
- Register domain names
- Your website needs a name, such as example.com. Route 53 lets you register a name for your
website or web application, known as a domain name.
- Route internet traffic to the resources for your domain
- When a user opens a web browser and enters your domain name (example.com) or subdomain name
(acme.example.com) in the address bar, Route 53 helps connect the browser with your website or web application.
- Check the health of your resources
- Route 53 sends automated requests over the internet to a resource, such as a web server, to
verify that it's reachable, available, and functional. You also can choose to receive notifications when a resource becomes unavailable and choose to route internet traffic away from unhealthy resources.
↖↑↓ Terminology
- Hosts - Computers or services accessible within a domain
- Name Server - Translates domain names into IP addresses
- Zone File - Text file that contains mappings between domain names and IP addresses
- Records - Entries in zone file, mappings between resources and names
↖↑↓ How it works
↖↑↓ Basic Flow
Root Server -> TLD Server -> Domain-Level Name Server -> Zone File
↖↑↓ Zone File & Records
The zone file stores records. Various record types exist:

| Type | Definition | Example |
| --- | --- | --- |
| SOA | Start of Authority - mandatory first entry, defines various things, e.g. name servers & admin contact | ns1.dnsimple.com admin.dnsimple.com 2013022001 86400 7200 604800 300 |
| A | Maps host name to IPv4 address | px01.vc.example.com. 198.51.100.40 |
| AAAA | Maps host name to IPv6 address | px01.vc.example.com. 2a00:1450:4014:80c:0:0:0:2004 |
| CNAME | Defines alias for host name (maps one domain name to another) | www.dnsimple.com. dnsimple.com. |
| MX | Defines mail exchange | example.com. 1800 MX 10 mail1.example.com. |
| PTR | Maps IPv4 address to host name (inverse of A record) | 10.27/1.168.192.in-addr.arpa. 1800 PTR mail.example.com. |
| SRV | Points one domain to another domain name using a specific destination port | _sip._tcp.example.com. 86400 IN SRV 0 5 5060 sipserver.example.com. |
Route53 specific:
- Alias record
- Amazon Route 53 alias records provide a Route 53-specific extension to DNS functionality.
Alias records let you route traffic to selected AWS resources, such as CloudFront distributions
and Amazon S3 buckets. They also let you route traffic from one record in a hosted zone to another
record.
- Unlike a CNAME record, you can create an alias record at the top node of a DNS namespace, also
known as the zone apex. For example, if you register the DNS name example.com, the zone apex is
example.com. You can't create a CNAME record for example.com, but you can create an alias record
for example.com that routes traffic to www.example.com
- Preferred choice over CNAME: alias records work at the zone apex, Route 53 answers queries for aliases to AWS resources free of charge, and changes to the target's IP addresses are tracked automatically
↖↑↓ Route53 Routing Policies
- Simple
- Default policy, typically used if only a single resource performs functionality
- Weighted
- Control distribution of traffic with DNS entries
- This can be based on a certain percentage
- Set routing policy to weighted (instead of failover)
- Latency
- Control distribution of traffic based on latency.
- Failover
- Can set up health checks for endpoints or domains from within Route 53
- Route 53 has health checkers in locations around the world. When you create a health check that
monitors an endpoint, health checkers start to send requests to the endpoint that you specify
to determine whether the endpoint is healthy.
- For alias records, the `Evaluate Target Health` option can be used instead of a dedicated health check
- DNS entries are then associated with health checks and can be configured to fail over as
well (1 primary and n secondary record sets)
- Geolocation
- Geolocation routing lets you choose the resources that serve your traffic based on the geographic
location of your users, meaning the location that DNS queries originate from. For example, you
might want all queries from Europe to be routed to an ELB load balancer in the Frankfurt region.
- Geoproximity Routing (Traffic Flow Only)
- Geoproximity routing lets Amazon Route 53 route traffic to your resources based on the geographic
location of your users and your resources. You can also optionally choose to route more traffic or
less to a given resource by specifying a value, known as a bias. A bias expands or shrinks the
size of the geographic region from which traffic is routed to a resource.
- Multivalue Answer Routing
- Multivalue answer routing lets you configure Amazon Route 53 to return multiple values, such as
IP addresses for your web servers, in response to DNS queries. You can specify multiple values for
almost any record, but multivalue answer routing also lets you check the health of each resource,
so Route 53 returns only values for healthy resources. It's not a substitute for a load balancer,
but the ability to return multiple health-checkable IP addresses is a way to use DNS to improve
availability and load balancing.
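A weighted policy, as described above, can be sketched as a Route 53 change batch. The record names, targets, and the 80/20 canary split below are hypothetical; weights are relative values, not strict percentages:

```python
import json

# Hypothetical weighted records for a canary setup. This dict is shaped
# so it could serve as the ChangeBatch argument of the
# change_resource_record_sets API call.
change_batch = {
    "Changes": [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com.",
                "Type": "CNAME",
                "SetIdentifier": "primary",   # must be unique per record
                "Weight": 80,
                "TTL": 60,
                "ResourceRecords": [{"Value": "primary-elb.example.com"}],
            },
        },
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com.",
                "Type": "CNAME",
                "SetIdentifier": "canary",
                "Weight": 20,
                "TTL": 60,
                "ResourceRecords": [{"Value": "canary-elb.example.com"}],
            },
        },
    ]
}

# Share of responses a record receives = its weight / sum of all weights
weights = [c["ResourceRecordSet"]["Weight"] for c in change_batch["Changes"]]
print(json.dumps(weights))  # [80, 20]
```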
↖↑↓ Overview
Amazon Simple Storage Service (S3) is object storage with a simple web service interface to store and
retrieve any amount of data from anywhere on the web. It is designed to deliver 11x9 (99.999999999%)
durability and to scale past trillions of objects worldwide.
- Key-value storage (folder-like structure is only a UI representation)
- Bucket size is unlimited. Objects from 0B to 5TB.
- HA and scalable, transparent data partitioning
- Data is automatically replicated to at least 3 separate data centers in the same region
- Bucket lifecycle events can trigger SNS, SQS or AWS Lambda
- New object created events
- Object removal events
- Reduced Redundancy Storage (RRS) object lost event
- Buckets are per region, but the AWS console is global (displaying all buckets for that account)
- Bucket names have to be globally unique, should comply with DNS naming conventions.
- On AWS: Service - FAQs - User Guide
- See also: AWS Geek 2018
↖↑↓ Versioning
- Works on bucket level (for all objects)
- Versioning can either be unversioned (default), enabled or suspended
- Version ids are automatically assigned to objects
↖↑↓ Logging
- AWS CloudTrail logs S3-API calls for bucket-level operations (and many other information) and
stores them in an S3 bucket. Could also send email notifications or trigger SNS notifications for
specific events.
- S3 Server Access Logs log on object level.
- Provide detailed records for the requests that are made to a bucket
- Needs to be enabled on bucket level
↖↑↓ Cross-Region Replication
- Buckets must be in different regions
- Can replicate cross-account
- Must have versioning enabled
- Only new/changed objects will be replicated
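A minimal replication configuration, shaped as it could be passed to `put_bucket_replication`, might look like this sketch. The IAM role ARN and bucket names are placeholders, and versioning must already be enabled on both buckets:

```python
import json

# Sketch of a cross-region replication configuration. Role ARN and
# destination bucket are placeholders, not real resources.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [
        {
            "ID": "replicate-all-new-objects",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},  # empty prefix = every new/changed object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::my-replica-bucket"},
        }
    ],
}

print(json.dumps(replication_config, indent=2))
```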
↖↑↓ Storage classes
| Class | Durability | Availability | AZs | Costs per GB | Retrieval Fee | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| S3 Standard | 11x9 | 4x9 | >=3 | $0.023 | No | - |
| S3 Intelligent Tiering | 11x9 | 3x9 | >=3 | $0.023 | No | Automatically moves objects between two access tiers based on changing access patterns |
| S3 IA (infrequent access) | 11x9 | 3x9 | >=3 | $0.0125 | Yes | For data that is accessed less frequently, but requires rapid access when needed |
| S3 One Zone IA (infrequent access) | 11x9 | 99.5% | 1 | $0.01 | Yes | For data that is accessed less frequently, but requires rapid access when needed |
| Glacier | 11x9 | - | >=3 | - | Yes | For archival only; retrieval comes as expedited, standard or bulk |
| Glacier Deep Archive | 11x9 | - | >=3 | - | Yes | Longer time span to retrieve |
| S3 RRS (reduced redundancy storage) | 4x9 | 4x9 | >=3 | $0.024 | - | Deprecated |
↖↑↓ Access Control
- Effect – This can be either allow or deny
- Principal – Account or user who is allowed access to the actions and resources in the statement
- Actions – For each resource, S3 supports a set of operations
- Resources – Buckets and objects are the resources
- Authorization works as a union of IAM & bucket policies and bucket ACLs
↖↑↓ Defaults
- Bucket is owned by the AWS account that created it
- Ownership refers to the identity and email address used to create the account
- Bucket ownership is not transferable
- Bucket owner gets full permission (ACL)
- The person paying the bills always has full control.
- A person uploading an object into a bucket owns it by default.
- IAM policies (in general) specify what actions are allowed or denied on what AWS resources
- Defined as JSON
- Attached to IAM users, groups, or roles (so they cannot grant access to anonymous users)
- Use if you’re more interested in “What can this user do in AWS?”
↖↑↓ Bucket policies
- Specify what actions are allowed or denied for which principals on the bucket that the policy is
attached to
- Defined as JSON
- Attached only to S3 buckets. Can, however, affect objects in buckets.
- Contain principal element (unnecessary for IAM)
- Use if you’re more interested in “Who can access this S3 bucket?”
- Easiest way to grant cross-account permissions for all `s3:*` permissions. (Cannot do this with ACLs.)
↖↑↓ Access Control Lists (ACLs)
- Defined as XML. Legacy, not recommended any more.
- Can..
- be attached to individual objects (bucket policies only work on bucket level)
- control access to objects uploaded into a bucket from a different account
- Cannot..
- have conditions
- explicitly deny actions
- grant permission to bucket sub-resources (e.g. lifecycle or static website configurations)
- Other than object ACLs there are bucket ACLs as well - only used for writing access log objects to a
bucket.
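A minimal cross-account bucket policy along the lines described above might look like this sketch. The account ID and bucket name are placeholders:

```python
import json

# Hypothetical bucket policy granting another AWS account read access.
# Note the Principal element, which IAM policies don't need.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CrossAccountRead",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",    # for ListBucket
                "arn:aws:s3:::example-bucket/*",  # for GetObject
            ],
        }
    ],
}

print(json.dumps(bucket_policy, indent=2))
```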
↖↑↓ Secrets Manager
↖↑↓ Overview
AWS Secrets Manager helps you protect secrets needed to access your applications, services, and IT
resources. The service enables you to easily rotate, manage, and retrieve database credentials,
API keys, and other secrets throughout their lifecycle. Users and applications retrieve secrets
with a call to Secrets Manager APIs, eliminating the need to hardcode sensitive information in
plain text. Secrets Manager offers secret rotation with built-in integration for Amazon RDS,
Amazon Redshift, and Amazon DocumentDB. Also, the service is extensible to other types of secrets,
including API keys and OAuth tokens. In addition, Secrets Manager enables you to control access to
secrets using fine-grained permissions and audit secret rotation centrally for resources in the
AWS Cloud, third-party services, and on-premises.
- Allows for easier rotation than SSM Parameter Store
- Can trigger Lambda
- Deeply integrates into RDS
- On AWS: Service - FAQs - User Guide
↖↑↓ Automatically Rotating Your Secrets
- Define and implement rotation with an AWS Lambda function
- Creates a new version of the secret.
- Stores the secret in Secrets Manager.
- Configures the protected service to use the new version.
- Verifies the new version.
- Marks the new version as production ready.
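The actions above map onto the four-step contract Secrets Manager uses when invoking a rotation Lambda (`createSecret`, `setSecret`, `testSecret`, `finishSecret`). A skeleton, with the actual work elided, might look like:

```python
def lambda_handler(event, context):
    """Skeleton of a Secrets Manager rotation function (sketch only).

    Secrets Manager invokes the function once per rotation step and
    passes the step name, the secret ARN, and a version token.
    """
    step = event["Step"]
    secret_arn = event["SecretId"]        # which secret to rotate
    token = event["ClientRequestToken"]   # identifies the new version

    if step == "createSecret":
        pass  # generate a new secret value and store it as AWSPENDING
    elif step == "setSecret":
        pass  # configure the protected service (e.g. RDS) to use it
    elif step == "testSecret":
        pass  # verify the AWSPENDING version actually works
    elif step == "finishSecret":
        pass  # move the AWSCURRENT staging label to the new version
    else:
        raise ValueError(f"Unknown rotation step: {step}")
```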
↖↑↓ Service Catalog (Core Service)
↖↑↓ Overview
AWS Service Catalog allows organizations to create and manage catalogs of IT services that are
approved for use on AWS. These IT services can include everything from virtual machine images,
servers, software, and databases to complete multi-tier application architectures. AWS Service
Catalog allows you to centrally manage commonly deployed IT services, and helps you achieve
consistent governance and meet your compliance requirements, while enabling users to quickly
deploy only the approved IT services they need.
- Ensure compliance with corporate standards
- Help employees quickly find and deploy approved IT services
- Centrally manage IT service lifecycle
- Connect with ITSM/ITOM software
- Self-service for users
- Integrates with self-service portals like ServiceNow
- Users of Service Catalog only require IAM permissions for the product, not for the underlying services
- On AWS: Service - FAQs - User Guide
↖↑↓ Components
- Admins define
- Product
- Defined in CloudFormation
- Can be versioned
- Portfolio
- Collection of products
- IAM permissions to govern access
- Users choose
- from product list
- launches automatically
↖↑↓ Step Functions
↖↑↓ Overview
AWS Step Functions is a web service that enables you to coordinate the components of distributed
applications and microservices using visual workflows. You build applications from individual
components that each perform a discrete function, or task, allowing you to scale and change
applications quickly.
Step Functions provides a reliable way to coordinate components and step through the functions of
your application. Step Functions offers a graphical console to visualize the components of your
application as a series of steps. It automatically triggers and tracks each step, and retries when
there are errors, so your application executes in order and as expected, every time. Step
Functions logs the state of each step, so when things go wrong, you can diagnose and debug problems quickly.
Step Functions manages the operations and underlying infrastructure for you to ensure your
application is available at any scale.
↖↑↓ States
| State | Description | Notes |
| --- | --- | --- |
| Pass | Passes its input to its output, without performing work | - |
| Task | Represents a single unit of work performed by a state machine | Can Retry after error |
| Choice | Adds branching logic | - |
| Wait | Delays the state machine from continuing for a specified time | - |
| Succeed | Stops an execution successfully | - |
| Fail | Stops the execution of the state machine and marks it as a failure | - |
| Parallel | Create parallel branches of execution | Can Retry after error |
| Map | Run a set of steps for each element of an input array | - |
- `InputPath` - Selects which parts of the JSON input to pass to the task of the Task state
- `OutputPath` - Filters the JSON output to further limit the information that's passed to the output
- `ResultPath` - Selects what combination of the state input and the task result to pass to the output
- `Parameters` - Collection of key-value pairs that are passed as input
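To build intuition for how `InputPath` narrows the state input, here is a toy helper that handles only `$` and simple `$.a.b` child paths. Real Step Functions paths support full JsonPath; this is an illustration, not the service's implementation:

```python
def apply_path(data, path):
    """Select a sub-tree of a JSON-like dict, as InputPath/OutputPath do.

    Only '$' (whole document) and dotted child access are handled here.
    """
    if path is None or path == "$":
        return data
    node = data
    for key in path.lstrip("$.").split("."):
        node = node[key]
    return node

state_input = {"order": {"id": 42, "items": ["a", "b"]}, "meta": {"trace": "xyz"}}

# InputPath "$.order" means the task sees only the order object:
task_input = apply_path(state_input, "$.order")
print(task_input)  # {'id': 42, 'items': ['a', 'b']}
```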
↖↑↓ Error handling
- By default, when a state reports an error, AWS Step Functions causes the execution to fail entirely.
- `Task` and `Parallel` states can have a field named `Retry`, whose value must be an array of objects known as retriers.
- An individual retrier represents a certain number of retries, usually at increasing time intervals.
- ErrorEquals (Required)
- IntervalSeconds (Optional)
- MaxAttempts (Optional)
- BackoffRate (Optional)
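The retrier fields combine into a simple geometric backoff. This sketch computes the wait before each retry attempt; the defaults mirror the documented ones (IntervalSeconds=1, MaxAttempts=3, BackoffRate=2.0):

```python
def retry_intervals(interval_seconds=1, max_attempts=3, backoff_rate=2.0):
    """Wait time before each retry attempt of a Step Functions retrier.

    The first retry waits IntervalSeconds; each further retry multiplies
    the previous wait by BackoffRate (geometric backoff).
    """
    return [interval_seconds * backoff_rate ** attempt
            for attempt in range(max_attempts)]

print(retry_intervals())           # [1.0, 2.0, 4.0]
print(retry_intervals(3, 4, 1.5))  # [3.0, 4.5, 6.75, 10.125]
```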
↖↑↓ Best Practices
- Use Timeouts to Avoid Stuck Executions
- Specify a reasonable timeout when you create a task in your state machine
- Use ARNs Instead of Passing Large Payloads
- Avoid Reaching the History Quota
- Hard quota of 25,000 entries in the execution history. To avoid reaching this quota for long-running executions, implement a pattern that uses an AWS Lambda function that can start a new execution of your state machine to split ongoing work across multiple workflow executions
- Handle Lambda Service Exceptions
- Lambda can occasionally experience transient service errors - proactively handle these exceptions in your state machine
- Avoid Latency When Polling for Activity Tasks
- Choosing Standard or Express Workflows
- Choose Standard Workflows when you need long-running, durable, and auditable workflows
- Choose Express Workflows for high-volume event-processing workloads
↖↑↓ Systems Manager (Core Service)
↖↑↓ Overview
AWS Systems Manager gives you visibility and control of your infrastructure on AWS. Systems
Manager provides a unified user interface so you can view operational data from multiple AWS
services and allows you to automate operational tasks across your AWS resources. With Systems
Manager, you can group resources, like Amazon EC2 instances, Amazon S3 buckets, or Amazon RDS
instances, by application, view operational data for monitoring and troubleshooting, and take
action on your groups of resources. Systems Manager simplifies resource and application management,
shortens the time to detect and resolve operational problems, and makes it easy to operate and
manage your infrastructure securely at scale.
Group resources -> Visualize data -> Take action
- Manage EC2 and on-prem instances at scale
- On-prem requires generation of secret activation code/activation id
- Get operational insights of infrastructure
- Easily detect problems
- Patching automation for enhanced compliance
- Both Linux and Windows
- Tightly integrated with CloudWatch, AWS Config
- Free service
- SSM Agent
- Installed on instances
- Need correct IAM permissions, then shows up on SSM dashboard
- On AWS: Service - FAQs - User Guide
↖↑↓ Components
↖↑↓ Resources groups
- Organize your AWS resources.
- Make it easier to manage, monitor, and automate tasks on large numbers of resources at one time.
- Define groups based on tags or on CloudFormation stacks
↖↑↓ Insights
- Insights dashboards
- Automatically aggregates and displays operational data for each resource group
- Inventory
- Discover and audit the software installed
- Configuration Compliance
- Scan your fleet of managed instances for patch compliance and configuration inconsistencies
↖↑↓ Parameter store
- Centralized store to manage your configuration data, whether plain-text data such as database
strings or secrets such as passwords
↖↑↓ Action & Change
- Automation
- Simplifies common maintenance and deployment tasks of EC2 instances and other AWS resources.
- Build Automation workflows to configure and manage instances and AWS resources.
- Create custom workflows or use pre-defined workflows maintained by AWS.
- Receive notifications about Automation tasks and workflows by using Amazon CloudWatch Events.
- Monitor Automation progress and execution details by using the Amazon EC2 or the AWS Systems Manager console.
- Can integrate manual approval step
- Complete list of tasks, unlike run command which is a one-off
- E.g. create a Golden AMI
- Maintenance windows
- Define a schedule for when to perform potentially disruptive actions on your instances
- Change Calendar
- Set up date and time ranges when actions you specify may or may not be performed in your AWS account
↖↑↓ Instances & Nodes
- Run command
- Lets you remotely and securely manage the configuration of your managed instances
- Commands are in document format
- Can run on resource group, individually or tag-based
- Session manager
- Fully managed AWS Systems Manager capability that lets you manage your EC2 instances, on-premises
instances, and virtual machines (VMs) through an interactive one-click browser-based shell or through the AWS CLI.
- Session Manager provides secure and auditable instance management without the need to open
inbound ports, maintain bastion hosts, or manage SSH keys
- Patch manager
- Automates the process of patching managed instances with both security related and other types of updates
- AWS predefined patch baselines per operating system
- Can also define own patch baselines
- Patch items in approved or rejected list, e.g. 'CVE-2020-1234567'
- Can define own patch source
- Define Maintenance Window when patches are possibly executed
- Use the `AWS-RunPatchBaseline` run command
- Can also evaluate compliance without applying patches
- State manager
- Secure and scalable configuration management service that automates the process of keeping
your Amazon EC2 and hybrid infrastructure in a state that you define
- SSM Documents
- JSON format
- Different types:
- Command
- Automation
- Policy
- Session
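Run Command and Patch Manager meet in the `AWS-RunPatchBaseline` document. A boto3 sketch, where instance IDs and region are placeholders:

```python
def run_patch_baseline(instance_ids, operation="Scan", region="eu-west-1"):
    """Sketch: trigger the AWS-RunPatchBaseline document via Run Command.

    'Scan' only evaluates patch compliance; 'Install' applies missing
    patches. Instance IDs and region are placeholders.
    """
    import boto3  # AWS SDK; the call requires credentials when actually run

    ssm = boto3.client("ssm", region_name=region)
    return ssm.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunPatchBaseline",
        Parameters={"Operation": [operation]},
    )
```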
↖↑↓ Trusted Advisor (Core Service)
↖↑↓ Overview
- AWS Trusted Advisor is an online tool that provides you with real-time guidance to help you provision
your resources following AWS best practices. Whether establishing new workflows, developing
applications, or as part of ongoing improvement, take advantage of the recommendations provided by
Trusted Advisor on a regular basis to help keep your solutions provisioned optimally.
- Global service
- Creates recommendations for
- Cost optimization
- Performance
- Security
- Fault tolerance
- Service limits
- Trusted Advisor check results are raised as CloudWatch Events
- Automate by triggering Lambdas
- Events are only available in `us-east-1`
- Checks are refreshed on visits to the dashboard (max every 5 minutes)
- Otherwise weekly
- Can trigger refresh via API
- On AWS: Service - FAQs - User Guide
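Triggering a refresh via the API can be sketched as follows. This assumes a support plan that includes the AWS Support API, which is served out of us-east-1:

```python
def refresh_all_checks():
    """Sketch: refresh every Trusted Advisor check via the Support API.

    Requires a Business or Enterprise support plan; the Support API
    lives in us-east-1.
    """
    import boto3  # AWS SDK; calls require credentials when actually run

    support = boto3.client("support", region_name="us-east-1")
    checks = support.describe_trusted_advisor_checks(language="en")
    for check in checks["checks"]:
        support.refresh_trusted_advisor_check(checkId=check["id"])
```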
↖↑↓ X-Ray (Core Service)
↖↑↓ Overview
AWS X-Ray helps developers analyze and debug production, distributed applications, such as those
built using a microservices architecture. With X-Ray, you can understand how your application and
its underlying services are performing to identify and troubleshoot the root cause of performance
issues and errors. X-Ray provides an end-to-end view of requests as they travel through your
application, and shows a map of your application’s underlying components. You can use X-Ray to
analyze both applications in development and in production, from simple three-tier applications to
complex microservices applications consisting of thousands of services.
- The X-Ray daemon runs on EC2 instances/Elastic Beanstalk instances/ECS
- X-Ray SDK to send signals
- X-Ray API collects information
- Automation could be based on regular polling of `GetServiceGraph`
- X-Ray Console displays information in service map
- On AWS: Service - FAQs - User Guide
---
↖↑↓ Aurora
- Can have up to 15 read replicas
- Can have a Global Database, spawning across multiple regions
- Available for both MySQL and PostgreSQL
- Can have 'Reader Endpoint' - load-balances connections to available Aurora Replicas in an Aurora DB cluster
↖↑↓ CloudFront
- You must ensure that the certificate you wish to associate with your Alternate Domain Name is
from a trusted CA, has a valid date and is formatted correctly. Wildcard certificates do work with
Alternate Domain Names providing they match the main domain, and they also work with valid Third
Party certificates. If all of these elements are correct, it may be that there was an internal
CloudFront HTTP 500 being generated at the time of configuration, which should be transient and
will resolve if you try again.
↖↑↓ CloudWatch Metrics
- `aws cloudwatch put-metric-data` publishes metric data points to Amazon CloudWatch
- Use the `EstimatedCharges` metric to track your estimated AWS charges
↖↑↓ CodeBuild
- `CODEBUILD_SOURCE_VERSION` - For CodeCommit, it is the commit ID or branch name associated with the version of the source code to be built.
- An `MSBuild` container image is required for building .NET applications
↖↑↓ CodeCommit
- Value for local ssh-config needs to match SSH key id from IAM Security Credentials
↖↑↓ CodePipeline
- `CodePipeline Pipeline Execution State Change` is the detail-type of the CloudWatch Event raised for pipeline failures
- AWS Device Farm is an action provider
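An event rule for that detail-type could use a pattern like this sketch, matching failed executions of one pipeline. The pipeline name is a placeholder:

```python
import json

# Hypothetical CloudWatch Events rule pattern for failed executions of
# a single pipeline, using the CodePipeline detail-type.
event_pattern = {
    "source": ["aws.codepipeline"],
    "detail-type": ["CodePipeline Pipeline Execution State Change"],
    "detail": {
        "state": ["FAILED"],
        "pipeline": ["my-pipeline"],  # placeholder name
    },
}

print(json.dumps(event_pattern))
```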
↖↑↓ Cognito
- Amazon Cognito provides authentication, authorization, and user management for your web and
mobile apps. Your users can sign in directly with a user name and password, or through a third
party such as Facebook, Amazon, Google or Apple.
↖↑↓ Direct Connect
- Direct Connect is the only way to access your AWS resources from a Data Center without traversing the internet
↖↑↓ EBS & EC2
- In order to encrypt an EBS snapshot, copy the unencrypted snapshot and tick the checkbox to encrypt the target
- Amazon Data Lifecycle Manager (DLM) for EBS Snapshots provides a simple, automated way to back up data stored on Amazon EBS volumes. You can define backup and retention schedules for EBS snapshots by creating lifecycle policies based on tags.
- In order to get access to the CPU sockets for billing purposes, you need to use EC2 Dedicated Hosts
- To maximise networking performance, Jumbo frames (9001 MTU) allow more than 1500 bytes of data per frame
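A DLM lifecycle policy targeting tagged volumes could use policy details like the fragment below (the tag key/value, schedule name, and retention count are example values):

```json
{
  "ResourceTypes": ["VOLUME"],
  "TargetTags": [{"Key": "Backup", "Value": "true"}],
  "Schedules": [
    {
      "Name": "DailySnapshots",
      "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
      "RetainRule": {"Count": 7},
      "CopyTags": true
    }
  ]
}
```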
↖↑↓ Elastic Beanstalk
- Dockerrun.aws.json v2 is the file used to configure a multi-container Docker environment
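A minimal Dockerrun.aws.json v2 for a single-container web service might look like this (the container name, image, and memory figure are example values):

```json
{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "web",
      "image": "nginx:1.17",
      "essential": true,
      "memory": 128,
      "portMappings": [
        {"hostPort": 80, "containerPort": 80}
      ]
    }
  ]
}
```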
↖↑↓ Elastic Load Balancing
- IPv6 is only supported by Application Load Balancers, not NLB, not Classic
- Network Load Balancers do not use security groups. This is different from Classic Load Balancer or Application Load Balancer.
- Adding the SHA256 digest to a Docker image URL makes sure that ECS gets the exact image. Otherwise, it might still pull the previous ':latest' image.
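Pinning by digest in a task definition looks like the fragment below; the account, repository, and digest are placeholders, with the digest replacing the mutable `:latest` tag:

```json
{
  "family": "web",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/web@sha256:<image-digest>",
      "memory": 128,
      "essential": true
    }
  ]
}
```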
↖↑↓ Fargate
- If a container image requires many network connections (e.g. Websocket) it's better installed as multiple tasks across an ECS Cluster
- One ENI per task
- ECS: ENIs come from underlying instance
↖↑↓ GitHub
- The number of OAuth tokens is limited and CodePipeline might stop working with older tokens
↖↑↓ IAM
- To federate with Active Directory: create and configure an IAM SAML Identity Provider, create a role with a SAML trusted entity, configure AD, configure ADFS with a Relying Party, and create custom claim rules
- Can store server certificates - only recommended for regions that don't support ACM
- Access Advisor shows the services last accessed per OU
↖↑↓ Kinesis Data Streams
- When using the KCL, make sure getRecords is not throwing unhandled exceptions
- Ensure the maxRecords value for the GetRecords call isn't set below the default setting
- When resharding, sometimes a small shard is left over
- This occurs when the width of a shard is very small in size in relation to other shards in the stream. This is resolved by merging with any adjacent shard.
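The first KCL point above can be sketched as a record processor that never lets an exception escape the per-batch loop; the class shape and logging behaviour are illustrative choices, not the actual KCL interface:

```python
import logging

logger = logging.getLogger("processor")

class RecordProcessor:
    """Toy processor: counts records, swallows per-record failures."""

    def __init__(self):
        self.processed = 0

    def handle(self, record):
        # Application-specific work; here we just count the record.
        if record is None:
            raise ValueError("empty record")
        self.processed += 1

    def process_records(self, records):
        for record in records:
            try:
                self.handle(record)
            except Exception:
                # Log and continue rather than crash the shard consumer.
                logger.exception("failed to process record")
```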
↖↑↓ Personal Health Dashboard
- The AWS_RISK_CREDENTIALS_EXPOSED event is raised by the Personal Health Dashboard service.
- Integrates with CloudWatch Events, but cannot send notifications directly
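Since notifications must go through CloudWatch Events, a rule matching the exposed-credentials event could use a pattern like this:

```json
{
  "source": ["aws.health"],
  "detail-type": ["AWS Health Event"],
  "detail": {
    "eventTypeCode": ["AWS_RISK_CREDENTIALS_EXPOSED"]
  }
}
```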
↖↑↓ RDS
EngineVersion
- The version number of the database engine to use.
↖↑↓ S3
- When encrypting at rest, SSE-S3 is more performant than SSE-KMS, as the latter gets throttled above 10,000 requests per second
- The Amazon S3 notification feature enables you to receive notifications when certain events happen in your bucket.
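A bucket notification configuration routing object-created events to a queue could look like this fragment (the queue ARN is a placeholder):

```json
{
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:eu-west-1:123456789012:uploads",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
```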
↖↑↓ Secrets Manager
- Maximum length of a secret: 65,536 bytes
- You should ask the external party for a DB user with at least two credential sets or the ability to create new users yourself. Otherwise, you might encounter client sign-on failures. The risk is because of the time lag that can occur between the change of the actual password and - when using Secrets Manager - the change in the corresponding secret that tells the client which password to use.
↖↑↓ Server Migration Service
- The Server Migration Service replication job generates an AMI after the job is finished. However, it does not automatically launch EC2 instances.
↖↑↓ SQS
- You can use Amazon S3 and the Amazon SQS Extended Client Library for Java to manage Amazon SQS messages. This is especially useful for storing and consuming messages up to 2 GB in size.
↖↑↓ Trusted Advisor
- In CloudWatch Events, use the check item refresh status to observe only specific events
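A rule matching Trusted Advisor check refreshes could use a pattern like the one below; the detail-type value is my best reading of the documented event, so verify it against the current Trusted Advisor docs:

```json
{
  "source": ["aws.trustedadvisor"],
  "detail-type": ["Trusted Advisor Check Item Refresh Notification"]
}
```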
↖↑↓ Directory Service
- You can use the User Principal Name (UPN) or the DOMAIN\UserName format to authenticate with AD, but you can't use the UPN format if you have two-step verification and Context-aware verification enabled.
- AWS Organisations and the AWS Managed Microsoft AD must be in the same account and the same region
- AD Connector is a directory gateway with which you can redirect directory requests to your on-premises Microsoft Active Directory without caching any information in the cloud.