AWS Exam Notes
.
. Groups Identity-Policy
.
. IAM-User Principal
.
. Role Session-Policy STS
.
. Resources Resource-ACLs
.
. Actions Resource-Policies
.
. Permission-Boundaries Trust-Policy
.
. SCP
.
. SSO SAML OIDC IAM-Identity-Center
.
. Federated User
.
Policy evaluation starts with no permissions (implicit deny). An explicit Deny in any applicable policy always takes precedence.
- Attach managed and inline policies to IAM identities (users, groups or roles).
- Grants permissions to an identity.
- The keywords in the policy are: Effect: Allow | Deny, Action, Resource.
- The user to which this policy is applied is the implicit Principal.
- Resource-based policies are attached inline to specific resources (no managed policies), e.g. S3 buckets, SQS queues, VPC endpoints.
- Not all services support resource level policies for their resources.
- An IAM role trust policy is a kind of resource-based policy (the resource being the role).
- Grants permissions to the principal that is specified in the policy.
- Principals can be in the same account as the resource or in other accounts.
- Principal can be User, Role, Group, Service or AnonymousUser.
- The keywords in the policy are: Effect, Action, Resource, "Principal", Condition.
- The resource being attached is the implicit Resource for the policy.
- If the "Resource" keyword is present, it must not conflict with the attached resource, i.e. it can be * or the same resource.
- Defines the maximum permissions but does not grant permissions.
- Use a managed policy as permissions boundary for an IAM entity (user or role).
- Permissions boundaries do not limit permissions granted by resource-based policies.
- Note: e.g. a resource-based S3 policy can allow access even though the user's permission policy does not, provided there is no explicit Deny in the user policy. (A CLI sketch for setting a boundary follows below.)
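A minimal CLI sketch for attaching a permissions boundary (the user name and the managed policy ARN below are example values):
# Cap whatever the user's identity policies grant to at most S3 read-only.
aws iam put-user-permissions-boundary \
    --user-name example-dev-user \
    --permissions-boundary arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
# Remove the boundary later if needed.
aws iam delete-user-permissions-boundary --user-name example-dev-user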
Use an AWS Organizations service control policy (SCP) to define the max permissions.
You can attach SCP to either of the following:
- Organization root - an SCP attached to the root does not affect the management account, but applies to all child OUs and accounts.
- Organizational unit (OU) - applies to the OU and the member accounts (and nested OUs) under it.
- Member account.
SCPs limit permissions that identity-based policies or resource-based policies grant to entities (users or roles) within the account, but do not grant permissions.
Note: an SCP is stricter in scope than a permissions boundary: it caps every identity in the member account, whereas a boundary applies only to the IAM identity it is attached to.
- Use ACLs to control which principals in other accounts can access the attached resource. ACLs are similar to resource-based policies but use a non-JSON format and apply to cross-account access only; an ACL cannot grant permissions to entities in the same account.
- Note: S3 ACL can be attached to both bucket and objects.
- Inline policy passed at session creation. Defines max permissions. Applies to a role session or a federated user session.
- Session policies limit permissions for a created session, but do not grant permissions.
- Applies only to dynamically created sessions: either (1) a role session or (2) a federated user session.
- A role session is created by the AssumeRole* APIs with an optional inline session policy JSON document and up to 10 managed session policy ARNs. Can be invoked by an IAM user or from an existing role session.
- A federated user session is created using the GetFederationToken API. This can only be invoked by an IAM user, not from a role session. Pass at least one session policy.
The following Trust Policy allows 2 services to assume the associated role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [ "elasticmapreduce.amazonaws.com", "datapipeline.amazonaws.com" ]
        /* "AWS": "arn:aws:iam::987654321098:root" to allow external account */
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Principal mainly refers to an "Actor" as opposed to "resource" being acted upon:
. Action
. Principal ----------------------> Resource
.
.            { User }
.            { Role or Service }
.            { Role Sessions }
.            { Federated User Sessions }
.            { AnonymousUser }
It is explicitly referenced from a "Resource Policy".
A group is only a collection of permissions, not an identity, so it is not allowed as a Principal.
There is a subtle difference between the Principal being a role vs. a role session:
/* Role: */
"Principal": { "AWS": "arn:aws:iam::AWS-account-ID:role/role-name" } /* Role */
/* Assumed-role session */
"Principal": { "AWS": "arn:aws:sts::AWS-account-ID:assumed-role/role-name/role-session-name" }
If you specify only the role, all role sessions based on that role qualify. If you specify a role session, only that specific session of the role qualifies.
Other example Principals:
/* Federated OIDC Provider Principal. Works with OAuth. IdP is OIDC */
"Principal": { "Federated": "accounts.google.com" }
"Principal": { "Federated": "cognito-identity.amazonaws.com" }
/* Assumed Role session principal for AssumeRoleWithWebIdentity is similar to AssumeRole. */
"Principal": { "AWS": "arn:aws:sts::AWS-account-ID:assumed-role/role-name/role-session-name" }
/* Federated SAML Provider Principal IdP is SAML. e.g. Active Directory */
"Principal": { "Federated": "arn:aws:iam::AWS-account-ID:saml-provider/provider-name" }
/* Assumed role session Principal for SAML is similar but with no session name */
"Principal": { "AWS": "arn:aws:sts::AWS-account-ID:assumed-role/role-name" } /* SAML Session */
"Principal": { "AWS": "arn:aws:iam::AWS-account-ID:user/user-name" } /* Regular IAM User */
/* STS Federated User session. GetFederationToken API.
* IAM Center multi-account permissions.
*/
"Principal": { "AWS": "arn:aws:sts::AWS-account-ID:federated-user/user-name" }
/* Service Principal. */
"Principal": { "Service": "s3.amazonaws.com" }
/* Region name may be required if cross region access is involved */
"Principal": { "Service": "s3.us-east-1.amazonaws.com" }
Consider:
    Resource Policy    Permissions      => Resource
    User Policy        Permissions      => User   (Identity-based)
    Role Policy        Permissions      => Role   (Identity-based)
    Session Policy     Max Permissions  => Session  (Inline JSON Policy / Managed Session Policies)
Scenarios:
    IAM User access to any resource        := (Resource + User) permissions
    AssumeRole user access to any resource := (Resource + Role) AND max session perms
    Federated user access to any resource  := (Resource + User) AND max session perms
    If a Permission Boundary exists for the user or role := above permissions AND max boundary perms
    If an SCP (Service Control Policy) exists             := any scenario is limited by max SCP perms
Here is an Admin identity-based policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
Policies can contain variables and tags, e.g. ${aws:username}; AWS-global keys such as aws:CurrentTime; service-specific keys such as s3:prefix; and tag-based keys such as iam:ResourceTag/key-name and aws:PrincipalTag/key-name. For example:
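A sketch of an identity policy using the ${aws:username} variable to give each user a private prefix (the bucket name is a placeholder):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "s3:GetObject", "s3:PutObject" ],
      "Resource": "arn:aws:s3:::my-team-bucket/home/${aws:username}/*"
    }
  ]
}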
.
. Service Linked Role is not same as Service Role.
.
::
# To create a service linked role ...
aws iam create-service-linked-role --aws-service-name SERVICE-NAME.amazonaws.com ...
# To create a service role with the trust policy ...
aws iam create-role --role-name Test-Role --assume-role-policy-document file://Test-Role-Trust-Policy.json
You just need the "iam:CreateRole" action permission to create a service role.
To create a service-linked role, you need "iam:CreateServiceLinkedRole" on:
arn:aws:iam::*:role/SERVICE-ROLE-NAME # This is Service Role. Below is Linked Role!
arn:aws:iam::*:role/aws-service-role/SERVICE-NAME.amazonaws.com/LINKED-ROLE-NAME-PREFIX*
Some services support multiple service roles.
The linked service also defines how you create, modify, and delete a service-linked role.
A service might automatically create or delete the role.
It might allow you to create, modify, or delete the role as part of a wizard or process in the service.
Or it might require that you use IAM to create or delete the role.
Regardless of the method, service-linked roles simplify the process of setting up a service.
Concepts:
STS AssumeRole
Federation
Role-Session
Session-Policy SAML2 OIDC
STS is useful in following scenarios:
Federation with SAML 2.0 and OIDC, and fine-grained access control:
To compare AWS STS APIs, see: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_sts-comparison.html
--------------------------------------------------------------------------------------------
API Who can call and Comments
--------------------------------------------------------------------------------------------
AssumeRole Caller: IAM user or IAM Role. (Role chaining allowed. Expires in 1 hr)
Session tags are transitive and persist after role chaining.
Optional Session Policy.
--------------------------------------------------------------------------------------------
AssumeRoleWithSAML Caller: Any user;
SAML setup with mutual trust must already be done with the SAML IdP.
The SAML IdP (like AD FS) issues the claims required by AWS using assertions.
The app must pass a SAML assertion to STS to assume the preferred role.
Use Case:
Map (SAML) external users to AWS Role. No link to local IAM users.
Inline session policies can apply user specific restrictions.
--------------------------------------------------------------------------------------------
AssumeRoleWithWebIdentity
Caller: Any user;
Must pass an OIDC or OAuth 2.0 compliant JWT token from a known IdP
You configure a trust relationship between AWS and the IdP.
e.g. GitHub and accounts.google.com are well-known IdPs that AWS can be configured to trust.
Role's trust policy should point to the external IdP.
Use aud condition in role trust policies to verify that the tokens used
to assume roles are intended for that purpose e.g. AppName.
Use Case:
Github actions to access AWS resources.
Map (OIDC) external users to AWS Role. No link to local IAM users.
Inline session policies can apply user specific restrictions.
--------------------------------------------------------------------------------------------
GetFederationToken Caller: IAM user or AWS account root user. Not by role-session.
The resulting session can not call AssumeRole.
Supports session policy to restrict permissions.
Use Case: Grant proxy application limited temp credentials.
Application internal or federated users need credentials.
Can create this session on behalf of internal or federated user.
Different session policies per user.
--------------------------------------------------------------------------------------------
GetSessionToken Caller: IAM user or AWS account root user. Not by role-session.
Session Policy not supported.
Use Case: Protect long term credentials and use temp credentials as
proxy to IAM user.
--------------------------------------------------------------------------------------------
.
. AssumeRole AssumeRole
. IAM User ------------------> Role-Session ---------------> Role-Session
. or [Policy] [Expires 1 hr]
. Role Session [Tags]
. [Session-Name]
. [external-Id]
.
Session tags are transitive so they persist when roles are chained.
Optional session policies can further restrict permissions.
A role session with temp credentials is valid from 15 minutes up to 12 hours.
With role chaining, the max validity is 1 hour only.
You can have role permission policy restricting based on tags/sessionName/ExternalId.
The externalId is optional and typically used to specify the application name.
You can restrict permission based on session Name and ExternalId as well in that role policy:
{
...
"Effect": "Allow",
"Action": "sts:AssumeRole",
...
"Condition": { "StringEquals": {"sts:ExternalId": "my-app-name"} }
// Or: "StringEquals": { "aws:PrincipalTag/Dept": "HR" }
}
aws sts assume-role \
--role-arn arn:aws:iam::123456789012:role/myRole \
--role-session-name my-session \
--tags Key=dept,Value=dev \
--external-id my-app-name \
--policy-arns arn=<arn1> arn=<arn2>
Output:
{
"AssumedRoleUser": {
"AssumedRoleId": "AROA3XFRBF535PLBIFPI4:my-session",
"Arn": "arn:aws:sts::123456789012:assumed-role/myRole/my-session"
},
"Credentials": {
"SecretAccessKey": "9drTJvcXLB89EXAMPLELB8923FB892xMFI",
"SessionToken": "...",
"Expiration": "2016-03-15T00:05:07Z",
"AccessKeyId": "ASIAJEXAMPLEXEG2JICEA"
}
}
# Note: Principal Of Session: "arn:aws:sts::123456789012:assumed-role/myRole/my-session"
#
# Session Policy is optional and can further restrict permissions.
# Role Policy can restrict permissions based on Tags and external-Id
# (also by session name but difficult and not recommended)
.
.
.
. AssumeRoleWithSAML
. IAM User -------------------------> Role-Session
. or [Session-Policy]
. Role Session [Saml-Assertion]
. [Saml-Provider-ARN]
.
. Note: Tags are passed through SAML assertion by IdP using PrincipalTag attribute.
.
.
. Note: Role should Trust Saml-Provider Principal.
.
aws sts assume-role-with-saml \
--role-arn arn:aws:iam::123456789012:role/my-role \
--principal-arn arn:aws:iam::123456789012:saml-provider/my-onpremise-saml-Idp \
--policy-arns arn=<arn1> arn=<arn2> \
--saml-assertion "..."
Output:
{
"Issuer": "https://my-onpremise.example.com/idp/shibboleth", # SAML Idp. e.g. On-premise AD
"AssumedRoleUser": {
"Arn": "arn:aws:sts::123456789012:assumed-role/my-role",
"AssumedRoleId": "ARO456EXAMPLE789:my-role" # Internal RoleId:rolename
},
"Credentials": {
"AccessKeyId": "ASIAV3ZUEFP6EXAMPLE",
"SecretAccessKey": "8P+SQvWIuLnKhh8d++jpw0nNmQRBZvNEXAMPLEKEY",
"SessionToken": "...",
"Expiration": "2019-11-01T20:26:47Z"
},
"Audience": "https://signin.aws.amazon.com/saml", // Service Provider: i.e. AWS
"SubjectType": "transient",
"PackedPolicySize": "6",
"NameQualifier": "SbdGOnUkh1i4+EXAMPLExL/jEvs=",
"Subject": "my-onpremise-user-john"
}
For more information about SAML, See: * https://awskarthik82.medium.com/saml-faq-frequently-asked-questions-ba0ab447e3f5
.
.
.
. IAM User AssumeRoleWithWebIdentity
. Or Role ------------------------------> Role-Session
. [Session-Policy]
. [Session-Name]
. [Identity-Token]
. [Provider-Id-like-google-cognito]
.
. Note: Principal_Tags and Transitive_Tags are passed through claims in Token only.
.
. Note: Cognito could be Web IdP but not a SAML IdP.
# Provider-id e.g. www.amazon.com; could also be google, facebook, cognito, etc.
aws sts assume-role-with-web-identity \
--duration-seconds 3600 \
--role-session-name "my-app-session" \
--provider-id "www.amazon.com" \
--policy-arns arn=<arn1> arn=<arn2> \
--role-arn arn:aws:iam::123456789012:role/my-role-for-web \
--web-identity-token "..."
Output:
{
"SubjectFromWebIdentityToken": "amzn1.account.AF6RHO7KZU5XRVQJGXK6HB56KR2A",
// Subject is Unique Id in provider. e.g. your-email in google.com
"Audience": "client.5498841531868486423.1548@apps.example.com",
// Audience is either service provider or client application that must be registered
// with OIDC Idp for requesting login and return claims.
"AssumedRoleUser": {
"Arn": "arn:aws:sts::123456789012:assumed-role/my-role-for-web/my-app-session",
"AssumedRoleId": "AROACLKWSDQRAOEXAMPLE:my-app-session"
},
"Credentials": {
"AccessKeyId": "AKIAIOSFODNN7EXAMPLE",
"SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY",
"SessionToken": "...",
"Expiration": "2020-05-19T18:06:10+00:00"
},
"Provider": "www.amazon.com"
}
# Note Assumed Role Principal Format:
# "arn:aws:sts::123456789012:assumed-role/my-role-for-web/my-app-session
#
.
.
.
. GetFederationToken
. IAM User ------------------------------> Federated User Session (Not a Role-Session)
. only [Federated-User-Name]
. [Session-Policy]
. [Tags]
.
.
.
aws sts get-federation-token \
--name Bob \
--policy file://myfile.json \
--policy-arns arn=arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
--duration-seconds 900 \
--tags Key=dept,Value=dev
Output:
{
"Credentials": {
"AccessKeyId": "ASIAIOSFODNN7EXAMPLE",
"SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
"SessionToken": "...",
"Expiration": "2023-12-20T02:06:07+00:00"
},
"FederatedUser": {
"FederatedUserId": "111122223333:Bob",
"Arn": "arn:aws:sts::111122223333:federated-user/Bob"
},
"PackedPolicySize": 36
}
# Note Principal of the User session (does not include Caller IAM user info):
# "Arn": "arn:aws:sts::111122223333:federated-user/Bob"
.
.
.
. GetSessionToken
. IAM User ------------------------------> Temporary User Session
. only [Duration-12hrs-Default]
.
.
aws sts get-session-token --duration-seconds 900 --serial-number "YourMFADeviceSerialNumber" \
--token-code 123456
Output:
{
"Credentials": {
"AccessKeyId": "ASIAIOSFODNN7EXAMPLE",
"SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY",
"SessionToken": "...",
"Expiration": "2020-05-19T18:06:10+00:00"
}
}
# Note principal ARN of session is same as the calling user's IAM user ARN.
# "AWS": "arn:aws:iam::AWS-account-ID:user/user-name"
.
. Assertions Protocol Bindings
.
. Trusts
. IdP <-----------------> FederationServer ServiceProvider(AWS)
. ^ ^
. | Auth Access Service |
. +--------- User/App ------------------------------------------+
.
OpenID Connect is an interoperable authentication protocol based on OAuth 2.0 framework.
Use Cases:
Concepts :
. OP - OpenID Provider or IdP
. Client - Client Software (Must be registered with OP)
. User - User uses the Client to initiate authentication with OP.
. RP - Relying Party (aka SP - Service Provider e.g. AWS or Web Application)
.
Flow:
.
. OP(IdP) -----> Client (WebApp) -----> RP/SP (AWS)
. User
.
Technology :
.
. Protocols: Discovery Dynamic-Client-Registration
. Session Management
. --------------------------------------------
.
. OAuth2.0: Core Bearer Assertions JWT-Profile
.
IAM Identity Center is a service that is enabled in the Organization's management account.
The target of integration for IAM Identity Center is multiple AWS accounts (SSO) or Applications (e.g. back-end for a mobile application).
It can be used for SSO or just federated identity broker for your mobile or web application.
Identity Center is the recommended solution for integrating on-premises AD and SAML applications, but it is not required if you do not need SSO; you can also integrate with plain IAM.
One login (single sign-on) for all your:
Identity Providers:
AWS Directory Services:
AWS Managed Microsoft AD :
| auth Trust Mutual AWS Managed auth
| <-----> On-Prem-AD <---------------> MS AD <------->
|
| Note: MFA Supported.
AD Connector: Proxy for on-premise AD in AWS :
| Proxy auth
| On-Prem-AD <------------ AD Connector <------->
|
| Note: MFA Supported.
Simple AD: AD Compatible managed directory on AWS :
| auth
| Simple AD <------->
|
| Note: MFA Not Supported.
IAM Identity Center can be configured with a TTI (Trusted Token Issuer) for integrating with OIDC IdPs, so that a token from the OIDC IdP can be exchanged for an AWS token.
Keycloak and SuperTokens are good open-source alternatives to Cognito.
For application federation, IAM Identity Center supports SAML only, not OIDC; it targets workforce/browser-based integration, not mobile application integration.
For integrating mobile applications, it is better to use Cognito identity pools with an external IdP instead of Identity Center; or use plain IAM with an external IdP if you do not need SSO.
.
. Identity-Pools User-Pools Hosted-UI SignIn Refresh-Tokens
.
. Renew Using
. SignIn/SignUp Idp Issues Cognito Issues Refresh Token -
. User --------------> Authenticate OR ---> IdToken ---> AccessToken ----> New IdToken
. HostedUI Redirect IdP JWT (Short Lived) New AccessToken
. UserPool (OIDC) RefreshToken |
. (Long Lived) |
. (AccessKey, |
. SecretKey, <----- Identity Pool <----------+
. Session Token) STS
.
.
. Login to AssumeRoleWithWebIdentity Temp AWS Credential
. User -----------> IdentityToken ----------------------------> (AccessKey, SecretKey, SessionToken)
. IdP STS
.
.
Use Access Analyzer
When you assume a role (as a user, app or service), you give up your original permissions and take on the permissions of the role.
With a resource-based policy, the principal does not need to give up its original permissions.
IAM Permissions Boundaries: set the max permissions for an IAM entity. Can be used along with an AWS Organizations SCP (Service Control Policy) and identity-based policies (e.g. the user's original permissions).
AssumeRole is typically called by an IAM user or by an externally authenticated user (SAML or OIDC) already using a role.
AssumeRole can also be chained. i.e. A role can assume another role.
Root user can not call AssumeRole.
Suppose you want user A not to be able to terminate EC2 instances by default, but you want to allow the user to explicitly assume role R which can terminate EC2 instances. You can protect that role using MFA, if need be. It is like sudo in Unix: you act only by actively performing the "AssumeRole" operation, so you can't accidentally delete anything. Also, when assuming a role, you lose your original privileges. These actions are all audited using CloudTrail logs. :
{User A} ----AssumeRole-->{Role A}-->{Can Terminate EC2}
You can let services assume a role on your behalf. You create a role and make the service a trusted entity for that role. When you initiate an action like EC2 RunInstances, the iam:PassRole permission is used to pass the role to the service. Note that the role's trust policy must also allow the service to call sts:AssumeRole. (See the policy sketch after the diagram below.):
{User A} ---PassRole-->{EC2-Service}---AssumeRole-->
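A sketch of the identity policy for User A (the account id and role name are placeholders); iam:PassedToService limits which service the role may be passed to:
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "ec2:RunInstances", "Resource": "*" },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::111122223333:role/EC2-App-Role",
      "Condition": { "StringEquals": { "iam:PassedToService": "ec2.amazonaws.com" } }
    }
  ]
}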
For cross-account access, say from an origin account (Dev) to a destination account (Production), you create a role in the destination account and associate a "trust policy" that identifies the origin (Dev) account as a trusted entity. The Dev account admin should also allow selected IAM users to assume that remote role in the Production account; not all users in the trusted account (Dev) get access to the role in the trusting (Production) account. (A sketch of both policies follows the diagram below.):
{ Production Account } { Dev Account }
{ Role: UpdateProdS3 } <---Req Access to Role--- { Group: Dev }
{ Admin allows remote Dev Group } -----STS Credentials----> { }
{ to assume UpdateProdS3 Role }
{ S3Bucket: ProductionS3 }
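A sketch of the two policies involved (account IDs are placeholders). Trust policy on the UpdateProdS3 role in the Production account:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
Identity policy attached to the Dev group, allowing its members to assume the remote role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::222222222222:role/UpdateProdS3"
    }
  ]
}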
Providing 3rd Party Access:
Zone of Trust = Accounts, Organizations that you own.
For granting access to 3rd Party you need:
SSO and SAML: Security Assertion Mark-up Language (SAML) is an authentication standard for federated identity management and can support single sign-on (SSO). SSO allows a user to log in with one ID/password to other federated software systems. SAML is an XML-based open-standard for transferring identity data between two parties: an identity provider (IdP) and a service provider (SP) like web application or SSO provider.
SAML supports both authentication and authorization where as OAuth is primarily for authorization.
STS Important APIs:
AssumeRole: access role within your account or cross-account.
AssumeRoleWithSAML: return credentials for users logged in with SAML:
AssumeRoleWithWebIdentity: return creds for users logged with an IdP:
GetSessionToken: for MFA, from a user or AWS account root user
GetFederationToken: obtain temp creds for a federated user. Calling IAM user credentials
are used as the basis. Unlike AssumeRole, there is no Role involved
here.
Notes on LDAP vs Active Directory:
Example SAML based SSO service providers are:
Amazon Single Sign-on (AWS SSO) Federation is new recommended method compared to old SAML 2.0 Federation.
Tagging of resources:
Root User Considerations:
There are various ways an on-premises server connects to AWS:
* Using an access key.
* Getting an instance role by installing the SSM agent (only SSM-related basic permissions).
* Using CodeDeploy to deploy initial STS temp credentials along with the application. More later.
* Using IAM Roles Anywhere with certificates.
. Trust
. Private CA <-------------> On-premise
. ------------> Workstation
. Temp Credential
.
. Certificate
. IAM Role <--------------- On-premise
. Cert Condition CN=onprem Workstation
.
.
Commands:
# Create trust anchor first using AWS Certificate manager private CA.
aws iam create-role --role-name ExampleS3WriteRole \
--assume-role-policy-document file://<path>/rolesanywhere-trust-policy.json
# You can optionally use condition statements based on the attributes of X.509 certificate
# to further restrict the trust policy.
aws iam put-role-policy ... --policy-document file://<path>
aws_signing_helper credential-process \
--certificate /path/to/certificate.pem \
--private-key /path/to/private-key.pem \
--trust-anchor-arn <TA_ARN> \
--profile-arn <PROFILE_ARN> \
--role-arn <ExampleS3WriteRole_ARN>
.
.
. On-Premise Server
. Code Deploy Agent (/etc/codedeploy-agent/conf)
. IAM User/Role Credential
. SSM Agent (Role for SSM to Assume)
.
.
Main Use Case: Generate policy for IAM user/role based on past activity across accounts.
It is an IAM feature that works across the AWS Organization.
All features:
# View the list of users, their last used access keys, age, last used service etc.
aws iam generate-credential-report
aws iam get-credential-report
Condition Keys classified as:
Note: The keys won't be present when it is not applicable.
Condition keys can be used in policies such as the following:
{
"Version": "2012-10-17",
"Id": "ExamplePolicy",
"Statement": [
{
"Sid": "AllowGetRequestsReferer",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::your-bucket-name/*",
"Condition": {
"StringLike": {
"aws:Referer": "https://www.example.com/*"
}
}
}
]
}
Properties of Principal:
aws:PrincipalArn aws:PrincipalAccount aws:PrincipalOrgPaths aws:PrincipalOrgID
aws:PrincipalTag/tag-key aws:PrincipalIsAWSService aws:PrincipalServiceName
aws:PrincipalServiceNamesList aws:userid aws:username
aws:PrincipalType (Account | User | FederatedUser | AssumedRole | Anonymous)
Properties of network:
aws:SourceIp aws:SourceVpc aws:SourceVpce aws:VpcSourceIp
Properties of resource:
aws:ResourceAccount aws:ResourceOrgID aws:ResourceOrgPaths aws:ResourceTag/tag-key
Properties of Request:
aws:RequestTag/tag-key aws:TagKeys
aws:CalledVia aws:CalledViaFirst aws:CalledViaLast aws:ViaAWSService
aws:CurrentTime aws:EpochTime aws:Referer aws:RequestedRegion
aws:SecureTransport aws:SourceArn aws:SourceAccount
aws:SourceOrgPaths aws:SourceOrgID aws:UserAgent
Properties of Role Session:
aws:FederatedProvider - (e.g. cognito-identity.amazonaws.com)
aws:TokenIssueTime
aws:MultiFactorAuthAge - (Time elapsed since MFA)
aws:MultiFactorAuthPresent
aws:ChatbotSourceArn -
aws:Ec2InstanceSourceVpc
aws:Ec2InstanceSourcePrivateIPv4
ec2:SourceInstanceArn
lambda:SourceFunctionArn
ssm:SourceInstanceArn
identitystore:UserId - IAM Identity Center workforce identity user id.
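A sketch of a common use of these keys: an identity policy statement that denies everything when no MFA is present:
{
  "Effect": "Deny",
  "Action": "*",
  "Resource": "*",
  "Condition": { "BoolIfExists": { "aws:MultiFactorAuthPresent": "false" } }
}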
.
. Backup-Plan Tags-Based Backup-Policy Backup-Service-Role
.
. Incremental-Backups Cold-Warm-Backups Backup-Gateway-For-VMWare
.
. Vault-Uses-Internal-S3-buckets.
. Auto Run (Cross-Region Backup OK)
. Frequency Backup Job
. Backup-Plan -------------------> Assign --------------> Vault
. Retention Period Resources Incr/full (Encrypted)
. (Recovery Point)
.
. Attach to
. Backup-Policy -------------> AWS Organization or Account.
. Backup Plan
.
. Backup Audit Manager ---> Generate/View Compliance Reports and Resources
.
aws backup create-backup-plan --cli-input-json file://path/to/backup-plan.json
# Assigning resources to backup plan.
aws backup create-backup-selection --backup-plan-id <backup-plan-id> \
--backup-selection '{"SelectionName":"MyTagBasedAssignment", "IamRoleArn":"arn:*:AWSBackupRole",
"ListOfTags":[{"ConditionKey":"Environment", "ConditionValue":"Production",
"ConditionType":"STRINGEQUALS"}]}'
DLM is another simpler (less featured) alternative to AWS Backup -- It automates the creation, retention, and deletion of Amazon EBS snapshots and EBS-backed AMIs. (The destination is AWS internal EBS snapshot storage. However you can move to S3 or Glacier.)
AWS Backup does all that DLM does and can also lock your backup using backup vault.
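A sketch of locking a backup vault (vault name and retention values are examples):
# Vault Lock prevents deleting recovery points earlier than the retention window allows.
aws backup put-backup-vault-lock-configuration \
    --backup-vault-name MyVault \
    --min-retention-days 30 \
    --max-retention-days 365 \
    --changeable-for-days 3    # After 3 days the lock becomes immutable (compliance mode).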
# Create DLM Policy (if not created)
aws dlm create-lifecycle-policy ... (Daily, etc.)
# Create On-Demand Snapshot
aws ec2 create-snapshot --volume-id vol-XXXXXXXX --description "On-demand snapshot"
aws ec2 create-volume --volume-type io1 --iops 1000 --snapshot-id snap-066877671789bd71b \
--availability-zone us-east-1a
# Copy to your S3 bucket.
aws s3 cp ...
AWS Organizations helps you centrally manage and govern your environment.
.
. Root(OU) +---> Management (Ac) == First Top Level Account ---> root IAM user + Admin Users
. |
. +---> Security (OU) +---> Audit aka Security Tooling (Ac) Read-Only
. | +---> Log Archive (Ac) CloudTrail Logs etc.
. |
. +---> Infrastructure OU (Empty by default. Can use Network Mgmt etc Accounts here)
. |
. +---> Sandbox OU (Dev/Test Accounts)
. |
. +---> Production OU (Production Accounts)
. |
. +---> Exceptions OU
.
.
.
Use Cases:
- Create/Group accounts into OU (Organizational Units) to easily Govern.
(Using Console, CLI or Cloudformation Stacks)
- Apply policies (SCP - Service Control Policy) to all or selectively.
- Many Features easily apply to Organizations such as GuardDuty, CloudTrail, Backup,
AWS Config, MS Active Directory, etc.
- Easily share resources within Organization using RAM.
- Manage Consolidated Billing and Costs
Contains Audit aka Security Tooling Account:
Contains Log Archive Account:
.
. Master Account --> Send Invite --> Member Account -> Accept Invite -> Grant Access to Master.
.
. For Control Tower created new accounts, master account has auto access.
.
. AWS Control Tower > Account Factory > Enroll Account > Choose OU
.
If an existing member account joins the organization by accepting an invitation, the management account does not automatically get access to it. The member account should create the OrganizationAccountAccessRole and grant the management account access.
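A sketch of that role's trust policy in the member account (the management account id is a placeholder), after which the management account can assume it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::999999999999:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
aws sts assume-role --role-arn arn:aws:iam::111122223333:role/OrganizationAccountAccessRole \
    --role-session-name mgmt-admin-session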
You can move account between OUs. :
aws organizations move-account \
--account-id 111122223333 \
--source-parent-id r-a1b2 \
--destination-parent-id ou-a1b2-f6g7h111
.
. Orchestration-Tool Organization GuardRail LandingZone AccountFactory
.
. Compliance-Check
.
Summary :
| Guardrail
| Control Type
|
| Preventive Deny using SCP
| Detective Record using AWS Config event
| Proactive Deny during creation using Cloudformation Hook.
|
    Feature         Guardrails (Control Tower)                      AWS Config Managed Rules
    Scope           Multi-account, organizational                   Account and resource-level compliance
    Application     Organizational Units (OUs)                      Individual AWS resources
    Enforcement     SCPs for preventive; AWS Config for detective   Detective, with optional remediation
    Visibility      Compliance dashboard in Control Tower           Resource-level reporting in Config
    Best Use Case   Organizational compliance                       Granular config compliance and auditing
A control is a high-level rule for governance. It's expressed in plain language. Example controls are:
Some more:
Three kinds of controls exist:
Three categories of guidance apply to controls:
.
.
. Admin
. resource-share-invitation Accept
. Owner VPC ------------------------------> Another AC ----------> Visible
. Subnet, IAM User/Role Resource
. PHZ, TGW, ... Service
.
.
. Implicit-Read-Permission No-explicit-Resource-policy-needed
.
. enable-sharing-with-aws-organization (Bypass invitation)
.
.
You can share resources with another account, IAM User/Role or Service.
Some resources can only be shared at account level
some can be shared between both at account level and user level.
Some resources can use customer managed permissions.
Using RAM share eliminates the need for creating/managing explicit resource policies in the owner account. But you still need explicit IAM policies on the external account!
You can share resources like the following (but not limited to):
VPC subnets can be shared within the same AWS Organization or with external accounts:
# By pass invitation-accept requirement across accounts within org only.
aws ram enable-sharing-with-aws-organization
aws ram create-resource-share --name MyNewResourceShare \
--no-allow-external-principals --principals "arn:<orgn_ou_arn>" \
--resource-arns "resource-arn1 arn2 arn3"
# Same organization sharing auto-accepted. External accounts generate invitation.
aws ram accept-resource-share-invitation --resource-share-invitation-arn <arn>
# Suppose SSM parameters are shared with OU. The OU admin can associate this with all accounts in OU.
aws ram associate-resource-share \
--resource-share-arn arn:*:resource-share/xyz --principals arn:...:ou/...
# Sometimes resource share is associated with account, sometimes with VPCs, sometimes with Users.
aws ram disassociate-resource-share --resource-arns "arn1 arn2"
Sharing usually gives only implicit read access; you cannot write or delete. The default permissions depend on the resource type; there is no fixed rule.
Some more Examples:
# Share SSM Parameter with another account in same org or outside org.
# Sharing with the account root principal === sharing with the whole account.
# This resource type supports customer managed permissions.
aws ram create-resource-share --name "ShareSSMParameter" --resource-arns arn:..*::parameter/my-db/my-password \
--principals arn:aws:iam::other-account-id:root \
--allow-external-principals
aws ram list-resources --resource-owner SELF # List shared resources owned by me.
aws ram list-resources --resource-owner OTHER-ACCOUNTS # List shared resources owned by others
aws ssm describe-parameters --shared # SSM supported command to list shared parameters.
You can create and use a customer managed permission while sharing itself. The terminology differentiates it from the implicit AWS managed default permissions for resource sharing. :
aws ram create-permission --name "ReadOnlySSMParameterPermission" --resource-type "ssm:Parameter" \
--policy "{... }" --client-token "unique-client-token" --description "Allow custom access to SSM Parameter"
aws ram create-resource-share --name "MySSMParameterShare" --resource-arns "arn:*:parameter/my-parameter" \
--principals "arn:aws:iam::external-account-id:root" \
--permission-arn "arn:aws:ram:region:account-id:permission/ReadOnlySSMParameterPermission" \
--allow-external-principals
You can share many things like:
Other things that can be shared include:
Sharing does not mean full access.
You can share all your resources with all the accounts in same organization:
aws ram enable-sharing-with-aws-organization
# AWS RAM creates a service-linked role called AWSServiceRoleForResourceAccessManager.
# and makes all accounts in same organization as trusted entities for this role.
You can view all resources available for you, shared by other accounts:
aws ram get-resource-shares --region us-east-1 --resource-owner OTHER-ACCOUNTS
You can also share VPC subnets with other accounts (in same org) from VPC console.
.
.
. CloudTrail -----> S3 -----> Athena
. CloudWatch Logs -----> Alarms
. 90 Days -----> Data Firehose
. EventBridge -----> Lambda
.
. Note: AWS config uses it to record history of resources.
.
. Organization Trail - Single trail for all Organization member accounts.
.
.
. Org Member Account1 ----> Organization Trail ----> S3 Bucket (Enable trust policy)
. Account2 (Multi-Region-Optional) (For Organization principal)
. Account3
.
.
.
# This command validates the integrity of log files within the specified date range.
aws cloudtrail validate-logs --trail-arn <YourTrailArn> --start-time <StartTime> --end-time <EndTime>
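A sketch of creating the organization trail shown above, run from the management account (bucket name is a placeholder; the bucket policy must already allow CloudTrail):
aws cloudtrail create-trail --name org-trail \
    --s3-bucket-name my-org-trail-bucket \
    --is-organization-trail --is-multi-region-trail
aws cloudtrail start-logging --name org-trail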
.
. Concepts
.
. Most Services Routing Rules
. -----------------> Default-Event-Bus ------------------> Lambda | Step Function | ECS Task
. Custom-Event-Bus SNS | SQS | FireHose | DataStreams
. Partner-Event-Bus HTTP | Remote Event Bus!
. API GW | Batch Job | ...
.
. Schema Registry, Infer Schema Optional EventArchive and Replay
. Schedule Jobs
.
. Note: For Lambda target, You can better use Event Source Mapping.
.
Event Format:
{
version: 0,
id: "..." ,
source: "aws.s3",
...
resources: [ "arn:aws:s3:::my-s3-bucket" ],
detail-type: "Object Created",
detail : {
/* Your json or Service Specific Json here ... */
}
}
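A sketch of routing the event above from the default bus to a Lambda target (the bucket must have EventBridge notifications enabled; names and ARNs are placeholders):
aws events put-rule --name s3-object-created \
    --event-pattern '{"source":["aws.s3"],"detail-type":["Object Created"],"detail":{"bucket":{"name":["my-s3-bucket"]}}}'
aws events put-targets --rule s3-object-created \
    --targets 'Id=1,Arn=arn:aws:lambda:us-east-1:111122223333:function:my-handler'
# The Lambda function also needs a resource policy (lambda add-permission) allowing events.amazonaws.com to invoke it.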
EventBridge pipe:
SQS -----> EventBridge Pipe ---> StepFunction
Filter and Enrich Data
# SQS can not invoke Step Functions directly. A pipe connects one source to one target (many source and target types are supported).
# This way, most AWS services can invoke other services if native integration not available.
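A sketch of the pipe above (ARNs are placeholders; the role must allow reading from the queue and starting the state machine):
aws pipes create-pipe --name sqs-to-stepfn \
    --source arn:aws:sqs:us-east-1:111122223333:my-queue \
    --target arn:aws:states:us-east-1:111122223333:stateMachine:my-state-machine \
    --role-arn arn:aws:iam::111122223333:role/my-pipe-role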
ECS service specifies the deployment type associated. The Deployment types supported are:
.
. AWS Service Catalog === Restricted Product Catalog
.
. Product - Predefined product by CloudFormation
.
. Product Product-Portfolios LaunchConstraints Share-With-OU
.
. Import-Portfolio Role-For-Launch
.
Define your cloud infrastructure using: Javascript, Python, Java or .Net
The code is "compiled" into CloudFormation template (JSON/YAML)
You can deploy infrastructure and application runtime code together:
AWS Security Hub # Integrated Dashboard for Compliance. Costs little.
# Prepackaged Security Std checks (like Payment Card Industry PCI) available.
# Receive and Consolidate findings from GuardDuty, Inspector, Config, all
AWS Inspector        # Continuous scanning of Lambda, EC2, ECS, and ECR push images.
# Report to Security Hub and Events Bridge
AWS Config # Use Managed Config rules (over 75) to check compliance. Cheap.
# View compliance (Green/Red) for resource in timeline.
# View CloudTrail API calls! (Auditing)
# Can even remediate using SSM Documents!
# AWS Config is used to implement Guardrails by AWS Tower.
# Implements specific events recording for auditing purposes.
AWS Firewall Manager # Manage rules in all accounts of Organization. WAF, Shield, SG,
# AWS network firewall (VPC Level), Route 53 DNS firewall.
# Costs $100 per protection policy!
AWS Network Firewall # VPC Level network firewall. Stateful inbound/outbound inspection.
AWS WAF # Protects CloudFront, ALB or API Gateway. Meant for HTTP.
# Can also be used for white/black listing, custom header checking
AWS Shield # Mainly DDoS protection. Premium is 3k $ per month per Org!
# CloudFront, Route 53 protected by default by shield.
# Protects ElasticIP, ELB, Global Accelerator, etc.
AWS GuardDuty # Auto threat discovery using Logs and ML, Anomaly Detection, Real time!
# By default, Analyzes: CloudTrail event Logs, VPC Flow logs, S3 Data events,
# DNS Query Logs, EKS control plane logs.
# Send to Security Hub, EventBridge, Amazon Detective.
# Also useful in PCI DSS Compliance (in addition to Security Hub)
Amazon Detective # Analyze and visualize security data to investigate root of security issues.
# Uses Logs, Guard Duty, Security Hub, Inspector, CloudTrail, etc.
# Bit expensive: $2 per GB ingested.
Amazon Macie         # Continuously scans S3, using ML, to detect personal data; does periodic full scans.
# Send finding to Security Hub.
Note:
1. Security Hub and Amazon Detective are primarily aggregation service.
2. AWS Control Tower is mainly for Governance, Accounts Provisioning and Policy enforcements.
It enforces policy compliance using GuardRail (which uses set of AWS Config Rules).
3. AWS Audit manager helps you with auditing your compliance wrt prebuilt frameworks.
Some automatic evidence collection built-in but it does not do strict compliance check.
For example, you can upload your evidence from on-premises resources, no check is done.
4. AWS Artifact helps you access Reports (demonstrating AWS compliance with standards)
and manage Agreements (between you and AWS) to comply with GDPR, PCI, etc.
.
.
. HSM Cluster WebServer LoadBalancers (SSL)
. Digital Signature
.
. HSM-AZ1 HSM-AZ2 HSM-AZ3 Bulk Encryption PKI
.
CloudHSM - Cloud Hardware Security Module - is hardware module for encryption.
CloudHSM can be used to store encryption keys and also perform encryption operations. The key storage is protected by Hardware and even AWS can not access it.
In a dev environment, you can create/delete/recreate a CloudHSM cluster across Availability Zones as needed. No need to keep it running all the time unless you use it for storing keys.
By using CloudHSM, you can manage your own encryption (not AWS).
AWS KMS is FIPS 140-2 Level 2 compliant. But CloudHSM provides Level 3 compliance. FIPS - Federal Information Processing Standards
If you want to encrypt using your own key before writing into S3, then use cloudHSM using SSE-C encryption (server-side encryption using customer provided keys).
CloudHSM can be deployed and managed in VPC.
In FIPS mode - Only selected FIPs approved algorithms are allowed.
AWS CloudHSM M of N access control requires a minimum quorum of COs (Crypto Officers) to authorize sensitive operations (e.g. a minimum of 3 COs authorizing out of a total of 5).
CloudHSM can be integrated into Webservers and Load Balancers for SSL termination. The key can be stored inside CloudHSM itself.
Uses Public Key Infrastructure (PKI) for managing certificates and keys in a secure environment.
You can use cloudHSM to generate key that can be imported to KMS. The cloudHSM is on-premise HSM alternative. BYOK (Bring your own Key) and maintaining own HSM (Hardware Security Module) is a common pattern used in multi-cloud organizations. :
.
. Import
. CloudHSM/On-premise HSM ------------> KMS -----> Use with S3-KMS
. CMK
.
. Note: Create empty KMS key with external type and import key by generating a token.
.
.
Web Application Firewall (WAF) filters specific requests based on rules:
AWS Firewall Manager manages rules in all accounts of an AWS organization:
.
. Source
.
. VPC | Subnets | Instances | ENI | TGW | EndPoints | ELB-ENI
.
Best Practices:
. Configuration Management, Conformance, History Tracking
.
. Managed-Config-Rules SSM-Documents-Remediate
.
. Conformance-Pack-Apply-Across-Organization
.
Analyze and visualize security data to investigate potential and root of security issues. Uses Logs, events from Guard Duty, Security Hub, Inspector, Access Analyzer, CloudTrail, VPC Flow Logs.
A bit expensive: $2 per GB of logs ingested.
Secrets are encrypted using AWS managed KMS key aws/secretsmanager; If you want to use your own KMS key for encryption, you can do so using update-secret operation.
Commands:
aws secretsmanager create-secret \
--name MyTestSecret \
--description "My test secret created with the CLI." \
--secret-string "{\"user\":\"diegor\",\"password\":\"EXAMPLE-PASSWORD\"}"
aws secretsmanager put-secret-value \
--secret-id MyTestSecret \
--secret-string "{\"user\":\"diegor\",\"password\":\"EXAMPLE-PASSWORD\"}"
aws secretsmanager get-random-password --require-each-included-type --password-length 20
# Reencrypt the secret using my own KMS key.
# Re-encrypt using my own KMS key instead of the default AWS managed key.
aws secretsmanager update-secret --secret-id MyTestSecret \
--kms-key-id arn:aws:kms:.*:key/*
aws secretsmanager list-secrets
# By default secrets deletion does not happen for 7 days.
aws secretsmanager delete-secret --secret-id MyTestSecret --recovery-window-in-days 7
aws secretsmanager delete-secret --secret-id MyTestSecret --force-delete-without-recovery
# Restore secret that was previously scheduled for deletion.
aws secretsmanager restore-secret --secret-id MyTestSecret
# RDS offers managed rotation where it updates DB password as well.
aws secretsmanager rotate-secret \
--secret-id MySecret \
--rotation-rules "<cron-expression>"
# If DB credentials also need to be updated, you can specify lambda along with rotation.
aws secretsmanager rotate-secret \
--secret-id MyTestDatabaseSecret \
--rotation-lambda-arn arn:.* \
--rotation-rules "<cron-expression>"
DDOS - Distributed Denial of Service -- types:
.
. KMS-Key Key-Policy Symmetric Asymmetric
.
. Encrypt Decrypt Sign-Verify Generate-Data-Key HMAC
.
. Multi-Region-Key
.
KMS lets you create, manage, and control cryptographic keys across your applications and AWS services.
KMS cannot encrypt/decrypt more than 4 KB of data per call. It is intended to encrypt your data keys, not bulk data.
The encrypted data is stored along with the encrypted data key. While decrypting, the data key is decrypted first, and then openssl or a standard library is used to decrypt the data using the plain key.
master_key_id = aws kms create-key of type symmetric # This is CMK - Customer Master Key.
(my_plain_key, my_encrypted_key) = aws kms generate-data-key and encrypt using master_key_id
my_encrypted_data = openssl encrypt input_data + Append master_key_arn + my_encrypted_key
For decryption:
my_plain_key = kms.decrypt(my_encrypted_key, master_key_id)
my_decrypt_data = openssl decrypt (my_encrypted_data, my_plain_key)
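A CLI sketch of the same envelope-encryption flow (assumes openssl, jq and base64 are available; alias/my-cmk and file names are placeholders):
aws kms generate-data-key --key-id alias/my-cmk --key-spec AES_256 > datakey.json
jq -r .CiphertextBlob datakey.json | base64 --decode > encrypted.key   # store this next to the data
jq -r .Plaintext datakey.json > plain.key.b64                          # use, then delete
openssl enc -aes-256-cbc -pbkdf2 -in secret.txt -out secret.enc -pass file:./plain.key.b64
rm plain.key.b64 datakey.json
# Decrypt later: recover the plaintext data key from KMS, then decrypt locally.
aws kms decrypt --ciphertext-blob fileb://encrypted.key --query Plaintext --output text > plain.key.b64
openssl enc -d -aes-256-cbc -pbkdf2 -in secret.enc -out secret.txt -pass file:./plain.key.b64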
# KMS encrypted Data Block (e.g. SSE-KMS )
.
. {--Encrypted-Data--} {KMS-key-id} {encrypted-data-key}
.
. Data-Key = Decrypt(kms-key-id, encrypted-data-key)
.
For symmetric key usage, the KMS key maintains an internal unique key (previously called the Customer Master Key) which is never revealed to you. You generate a symmetric data key, encrypt that key itself using the CMK, and store the encrypted key in your application along with the data.
Use Cases:
Type KeyUsage and (KeySpec)
Symmetric - SYMMETRIC_DEFAULT
Asymmetric - ENCRYPT_DECRYPT (RSA_4096,etc)
SIGN_VERIFY (ECC_NIST_P521) ECDSA (elliptic curve key pair) for sign and verify.
This can not be used for encryption/decryption.
KEY_AGREEMENT (ECC_NIST_P521) Pair of ECDH keys used to derive a shared key.
Not for sign/verify or encrypt/decrypt.
"SigningAlgorithms": [ # For SIGN_VERIFY with keyspec of RSA_2048
"RSASSA_PKCS1_V1_5_SHA_256",
"RSASSA_PKCS1_V1_5_SHA_384",
"RSASSA_PKCS1_V1_5_SHA_512",
"RSASSA_PSS_SHA_256",
"RSASSA_PSS_SHA_384",
"RSASSA_PSS_SHA_512"
]
"KeyAgreementAlgorithms": [ # For KEY_AGREEMENT with keyspec of ECC_NIST_P521
"ECDH" # Elliptic key pair to derive shared secret
],
"EncryptionAlgorithms": [ # For symmetric usage, keyspec also SYMMETRIC_DEFAULT
"SYMMETRIC_DEFAULT"
]
Single Symmetric key used for encrypt/decrypt operation is the default type and most Common.
secret_key_id = aws kms create-key --key-spec HMAC_512 --key-usage GENERATE_VERIFY_MAC
# You get ARN and keyId and algorithm etc info in output. GenerateMac operation.
Hash_value = hmac(data, secret_key_id, algorithm) # data: max 4k; Hash fixed length.
# The HMAC KMS key keeps the secret_key secret. Same key for sign and verify.
# algorithm = HMAC_SHA_224 or 256 or 384 or 512; key spec is HMAC_224, etc.
After receiving data, you recalculate Hash_value using same secret_key_id and verify the hash.
Command operations are: GenerateMac and VerifyMac
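A sketch of those operations via the CLI (the key id is a placeholder; the algorithm must match the key spec):
aws kms generate-mac --key-id <hmac-key-id> --mac-algorithm HMAC_SHA_512 \
    --message fileb://data.bin --query Mac --output text | base64 --decode > data.mac
aws kms verify-mac --key-id <hmac-key-id> --mac-algorithm HMAC_SHA_512 \
    --message fileb://data.bin --mac fileb://data.mac    # Returns MacValid: true on success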
Commands :
aws kms create-key
[--policy <value>] # Resource policy for the key
[--description <value>]
[--key-usage <value>] # Omit for default symmetric key.
SIGN_VERIFY
ENCRYPT_DECRYPT
GENERATE_VERIFY_MAC
KEY_AGREEMENT
[--customer-master-key-spec <value>]
[--key-spec <value>]
[--origin <value>] # Use "External" for later to import key.
[--custom-key-store-id <value>]
[--tags <value>]
[--multi-region | --no-multi-region]
[--endpoint-url <value>]
[--output <value>]
[--query <value>]
[--profile <value>]
[--region <value>]
[--no-sign-request]
[--ca-bundle <value>]
aws kms generate-data-key --key-id alias/ExampleAlias --key-spec AES_256
{
    "Plaintext": "VdzKNHGzUAzJeRBVY+uUmofUGGiDzyB3+i9fVkh3piw=",
    "KeyId": "<arn>",
    "CiphertextBlob": "AQEDA..."
}
aws kms get-public-key --key-id alias/my_RSA_3072 # Get public key portion of asymmetric key
aws kms describe-key --key-id alias/my_RSA_3072 # Note KeySpec, KeyUsage in output
# To prepare for import, you need to get the wrapping public key and the import token,
# and use the wrapping key to encrypt your key material.
# Note: wrapping key means a temporary key used to encrypt your key during import.
aws kms get-parameters-for-import ....
# Encrypt key material using openssl
openssl pkeyutl -encrypt -in PlainKeyMaterial.bin ...
# Import your own key material. Save your copy. You can never export it from KMS.
aws kms import-key-material --key-id <key_id> \
--encrypted-key-material fileb://EncryptedKeyMaterial.bin \
--import-token fileb://ImportToken.bin ...
# To keep your keys in cloudHSM hardware module ...
aws kms create-custom-key-store \
--custom-key-store-name ExampleCloudHSMKeyStore \
--cloud-hsm-cluster-id cluster-1a23b4cdefg \
--key-store-password kmsPswd \
--trust-anchor-certificate <certificate-goes-here>
aws kms describe-custom-key-stores
# External keystore is also supported.
# Both CloudHSM and external key stores are used only for symmetric keys.
You can add key policy to KMS Key.
You can allow specific role (preferred) or external account.
Allowing external account means specify the root principal or specific role:
Principal: { "AWS": "arn:aws:iam::444455556666:root" } /* External Account Allow */
Principal: { "AWS": "arn:aws:iam::444455556666:role/ExampleRole"} /* External Role Allow */
In addition, the external account IAM policy must allow that as well.
The external account can further restrict the original access given by the owner account but cannot grant more permissions:
{
Effect: "Allow",
Action: [ ... ],
"Resource": "arn:aws:kms:us-west-2:111122223333:key/xxx" # Source Key
}
Attach a key policy to the KMS key to allow the cloudfront.amazonaws.com service principal to use the key when the distribution origin is an S3 bucket that uses SSE-KMS server-side encryption. For example:
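A sketch of the key policy statement (the account id and distribution id are placeholders):
{
  "Sid": "AllowCloudFrontServicePrincipalSSEKMS",
  "Effect": "Allow",
  "Principal": { "Service": "cloudfront.amazonaws.com" },
  "Action": [ "kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*" ],
  "Resource": "*",
  "Condition": {
    "StringEquals": { "AWS:SourceArn": "arn:aws:cloudfront::111122223333:distribution/EDFDVBD6EXAMPLE" }
  }
}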
You need a multi-region key, if you want to manage encrypted backups across regions.:
. Replicate
. Primary-KMS-Key -----------> Replicated-KMS-Key
. Region-1 Region-2
. KeyId-1 KeyId-2
Create the primary multi-Region KMS key in the source region. :
aws kms create-key --description "Primary multi-Region key for my application" --region us-east-1 \
--multi-region    # This key is multi-region capable!
# Note the KeyId from output!
Replicate the key in the target region:
aws kms replicate-key --key-id "<PrimaryKeyId>" --replica-region us-west-2
aws kms describe-key --key-id "<ReplicatedKeyId>" --region us-west-2
# Examine the details of the replicated key, its MultiRegion status
Notes:
aws iam upload-server-certificate --server-certificate-name ExampleCertificate \
--certificate-body file://Certificate.pem \
--certificate-chain file://CertificateChain.pem \
--private-key file://PrivateKey.pem \
[--path /cloudfront/test]    # For access with CloudFront
FIPS 140-2 defines a cryptographic module as "the set of hardware, software, and/or firmware that implements approved security functions and is contained within the cryptographic boundary." It is an approved standard from NIST.
aws kms replicate-key \
--key-id <primary-key-id> \
--description "Replica KMS key in eu-west-1" \
--region eu-west-1
Traffic mirroring is used to analyze inbound/outbound/both traffic asynchronously using other appliances or network monitoring tools (a CLI sketch follows the diagram below).
.
.
. EC2 Traffic Mirroring
. ENI -------------------------> NLB | Another ENI | Monitor-Appliance-IP
.
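A CLI sketch of the mirroring setup above (all IDs are placeholders):
aws ec2 create-traffic-mirror-target --network-load-balancer-arn <nlb-arn>    # or --network-interface-id eni-xxxx
aws ec2 create-traffic-mirror-filter --description "mirror all"
aws ec2 create-traffic-mirror-filter-rule --traffic-mirror-filter-id tmf-xxxx \
    --traffic-direction ingress --rule-number 100 --rule-action accept \
    --source-cidr-block 0.0.0.0/0 --destination-cidr-block 0.0.0.0/0
aws ec2 create-traffic-mirror-session --network-interface-id eni-source \
    --traffic-mirror-target-id tmt-xxxx --traffic-mirror-filter-id tmf-xxxx --session-number 1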
.
. S3TA SSE-S3 SSE-KMS SSE-C CRR
.
. Bucket-Policy ACL Access-Point S3-Partitioning Parquet Datalake
.
. UploadFree (Minimal for PUT Request) Download:9c/GB
.
. Transfer-Acceleration S3-LifeCycle-Rules-per-bucket
.
. Storage-Class-Per-Object Default-Storage-Class-For-Bucket
.
. Bucket-Owner-Enforced Cache-Control
.
Object Storage, serverless, unlimited storage, pay-as-you-go
Flat object storage service.
Looks like /bucket/myfolder/mysubfolder/myfile
Note that /bucket/myfolder/ is a zero-length object with that folder name. S3 does not recognize folders.
Good for static content. Image/video, etc.
Access objects by key, no indexing.
Anti patterns:
Supports Multi-part upload. Recommended for files >100MB. Tools available to scan incomplete S3 objects created with multi-part upload.
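A sketch of finding and cleaning up incomplete multipart uploads (bucket and key are placeholders):
aws s3api list-multipart-uploads --bucket my-bucket
aws s3api abort-multipart-upload --bucket my-bucket --key big-file.bin --upload-id <UploadId>
# A lifecycle rule with AbortIncompleteMultipartUpload can automate this cleanup.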
S3 Transfer Acceleration:
S3 pre-signed URLs are used to download/upload S3 objects; they are valid for 1 hour by default. For example:
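A sketch of generating one (bucket and key are placeholders):
aws s3 presign s3://my-bucket/report.pdf --expires-in 3600    # URL valid for 1 hour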
S3 Storage Classes and Access Tiers:
- S3 Standard - General Purpose
- S3 Standard - Infrequent Access (IA) - Access time instant. Storage fee less. Access fee more.
- S3 One Zone- Infrequent Access - 99.5% Availability vs 99.99 for others.
- S3 Intelligent Tiering - Access Tiers: Frequent, Infrequent (30+ days),
Archive Instant (90+ days), (Default, automatic)
Archive, (Optional, days configurable, 3-5 hours)
Deep Archive. (optional, days configurable, 9-12 hours)
(Note: Small monthly auto-tiering fee.)
- S3 Glacier Instant Retrieval - Instant Access. More storage cost vs Glacier. Less access fee.
- S3 Glacier Flexible Retrieval - Retrieval Options and Access Times:
(aka Glaicer) Expedited: Typically 1-5 minutes
Standard: Takes about 3-5 hours
Bulk: Takes 5-12 hours
(Less Storage cost)
- S3 Glacier Deep Archive - Retrieval Options and Access Times:
Standard: 12 hours
Bulk: up to 48 hours
(Least Storage cost)
S3 Storage cost:
S3 Standard : .023 $/GB-Month (4x cheaper than GP2) 1TB - 23 USD
( 0.10 $/GB-Month)
S3 IA : .0125 $/GB-Month (2x cheaper than S3 std) 1TB - 12 USD
S3 Glacier Instant : .004 $/GB-Month (3x cheaper than S3 IA) 1TB - 4 USD
S3 Deep Archive : .00099 $/GB-Month (4x Cheaper than Glacier) 1TB - 1 USD
vs
GP2 EBS : 1 TB - 100 USD (10c / GB-Month)
GP3 EBS : 1 TB - 80 USD ( 8c / GB-Month; Per 1000 IOPS, 5 USD more above 3000; )
EFS : 1 TB - 300 USD (30c / GB-Month)
io1 : 1 TB - 125 USD (12.5c / GB-Month)
io2 : 1 TB - 125 USD (Provisioned IOPS costly - Every 1000 IOPS 50 USD more.)
Minimum storage duration cost is applied for IA and Glacier types. IA minimum is 30 days and Glacier is 90 days, Glacier Deep is 180 days.
Restore Fees:
For S3 Deep Archive, std retrieval time is 12 hrs and bulk is 48 hrs.
There's no data transfer fee to upload data to Glacier. But uploading an object is a PUT request. PUT request fees are billed at $.03 per 1,000 requests. Not a huge charge but 6x the PUT request for S3 Standard.
Data transfer out from Glacier to the internet costs approx 9 cents per GB. In addition, there is per-object GET request pricing that is very cheap (like 1 cent per 1,000 objects).
The special S3 storage class S3 Intelligent tiering has built-in life cycle management and moves the object into different "access tiering" still in the same storage class.
The storage class is for Object, not bucket. There is "Default Storage Class" for bucket.
S3 put Object can override the default storage class of the bucket.
Storage class of object and Default storage class of bucket can be changed anytime.
S3 Std IA is different storage class and S3 Glacier is a different storage class.
S3 Glacier Flexible Retrieval is same as S3 Glacier storage class.
S3 life cycle policies can be used to move objects between tiers. The policy is based on rules with filters of objects for which the policy applies. The prefix of object keyname is a supported filter.
S3 Event notifications possible:
S3 Cost Saving Tips:
S3 Analytics same as Storage Class Analysis: Helps you to transition to right storage class.
S3 Storage Lens: Analyzes storage usage across your organization and generates report! There are around 28 free usage metrics. Advanced metrics cost extra.
Tip: You can index objects in S3 in DynamoDB and use that index to search and filter!
Durability is very high: 99.999999999% (eleven nines) for objects, stored across multiple AZs. S3 Standard availability is 99.99%, i.e. unavailable for at most about 53 minutes a year.
.
. After 1 year Move to Glacier
. Bucket -----------------> Delete
. Tag filter Move to S3 IA
. Name Filter etc.
.
. Note: Does not apply for per object but can use filters in bucket.
. Note: There is no Storage class for bucket.
.
There is no standard managed policy for S3 life cycle policy. Use this for Example :
# Note: moving from S3 Standard -> Glacier can be done after 1 day,
# but a transition to STANDARD_IA requires a minimum of 30 days.
aws s3api put-bucket-lifecycle-configuration --bucket your-bucket-name --lifecycle-configuration '{
  "Rules": [
    {
      "ID": "TransitionToGlacier",
      "Prefix": "",
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30,  "StorageClass": "STANDARD_IA" },
        { "Days": 90,  "StorageClass": "GLACIER" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 730 }
    }
  ]
}'
.
. PutObject ----> x-amz-server-side-encryption: AES256
. GetObject ----> No headers needed.
.
.
. PutObject ----> {sse-encryption: KMS, kms-id, optional-encryption-context}
.
. GetObject ----> {optional-encryption-context}
.
.
. PutObject ---> { sse-customer-algorithm: AES256, customer-key, customer-key-md5 }
.
. GetObject ---> { sse-customer-algorithm: AES256, customer-key, customer-key-md5 }
.
At the time of object creation with the REST API, you should specify following header:
Name Description
x-amz-server-side-encryption-customer-algorithm Specify `AES256`
x-amz-server-side-encryption-customer-key provide the 256-bit, base64-encoded encryption key
for S3 to use to encrypt or decrypt your data.
x-amz-server-side-encryption-customer-key-MD5 Specify base64-encoded MD5 digest of the encryption key.
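For example, uploading with SSE-C via the CLI (bucket, key and key values are placeholders):
aws s3api put-object --bucket <bucket-name> --key <object-key> --body <local-file> \
    --sse-customer-algorithm AES256 \
    --sse-customer-key <Base64-encoded-encryption-key> \
    --sse-customer-key-md5 <Base64-encoded-MD5-of-encryption-key>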
You should remember the key and specify it while reading::
aws s3api get-object --bucket <bucket-name> --key <object-key> <local-file> \
--sse-customer-algorithm AES256 \
--sse-customer-key <Base64-encoded-encryption-key> \
--sse-customer-key-md5 <Base64-encoded-MD5-of-encryption-key>
.
.
. PutObject --> Data Layout
. {--Encrypted-Data--} {encrypted-data-key}
.
. GetObject --> Retrieve Content --> Use SDK (or CloudHSM) to Decrypt.
.
. CMK: Customer Master Key is Secret.
.
. CSE-KMS : Client Side Encryption using KMS (Optional)
.
With CSE, the server does not know whether the data is encrypted. You do the encryption/decryption yourself.
You need to remember the Customer Master Key for later decryption.
For each object you can use different encryption data key encrypted by CMK.
data_plain_key = Decrypt(encrypted_data_key, CMK)
There are two ways to use CMK:
Using On-premise HSM is a common requirement in some environments.
Can be used with key stored in CloudHSM which is deployed into your VPC. :
# sudo yum install aws-cloudhsm-client aws-cloudhsm-pkcs11
....
# Encrypt the file using the AES key in CloudHSM
pkcs11-tool --module /opt/cloudhsm/lib/libcloudhsm_pkcs11.so \
--login --pin <crypto-user-password> --key <key-handle> \
--encrypt --input-file <file-to-encrypt> --output-file <encrypted-file>
AWS SDKs support client-side encryption library.
By using KMS, you can do:
aws s3api put-bucket-encryption --bucket my-bucket --server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "aws:kms", /* OR "AES256" for SSE-S3 */
"KMSMasterKeyID": "arn:aws:kms:region:account-id:key/key-id",
"EncryptionContext": { /* optional and applicable only for SSE-KMS */
"AppName": "MyApp"
}
}
}]
}'
.
. Method Description Relevant Headers
..............................................................................................
. SSE-S3 Dynamic Server Side. x-amz-server-side-encryption: AES256
.
. SSE-KMS Specify KMS Key-id. x-amz-server-side-encryption: aws:kms
. Get: No need to specify key. x-amz-server-side-encryption-aws-kms-key-id: arn:*
. Context is optional. x-amz-server-side-encryption-context: base64-json
.
. SSE-C Both Put and Get x-amz-server-side-encryption-customer-algorithm: `AES256`
. Requires 3 headers x-amz-server-side-encryption-customer-key: base64-encoded
. Server forgets keys. x-amz-server-side-encryption-customer-key-MD5: base64
. But aware it is SSE-C.
.
. CSE Client Side Encryption No Headers.
. CMK is secret. Store encrypted datakey along with data.
.
.
. Bucket-Policy AccessPoint-Policy IAM-Policy Endpoint-Policy ACLs (Deprecated)
.
.
. VPC (Bi-Directional S3-CRR )
. S3-Gateway-Endpoint ------> Access-Point -------> S3-Bucket-Region-1
. | (Optional Multi-Region) |
. Policy | S3-Bucket-Region-2
. Policy |
. Policy
.
Also See: https://aws.amazon.com/blogs/security/iam-policies-and-bucket-policies-and-acls-oh-my-controlling-access-to-s3-resources/
There are multiple ways you can control access (bucket policy, access point policy, IAM policy, endpoint policy, ACLs):
Example Access Point Policy:
# This policy allows Jane to use this Access Point to access the bucket.
{
"Version":"2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::123456789012:user/Jane"
},
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:us-west-2:123456789012:accesspoint/my-access-point/object/Jane/*"
}]
}
# For the above to be effective the bucket policy also must permit Jane to access the bucket!
# This is equivalent to the "Block all public access" setting in the console, applied here at the
# account level (use `aws s3api put-public-access-block --bucket ...` for a single bucket).
aws s3control put-public-access-block \
    --account-id 123456789012 \
    --public-access-block-configuration '{"BlockPublicAcls": true, "IgnorePublicAcls": true,
        "BlockPublicPolicy": true, "RestrictPublicBuckets": true}'
# Blocking public access settings Flags:
#
# BlockPublicAcls: Put bucket ACL or Put object ACL blocked if public.
# IgnorePublicAcls: You can have public access ACL, but all ACL will be ignored.
# This is same as "Turn off public Access" button in S3 console for bucket.
# BlockPublicPolicy: Put bucket policy will fail if public.
# RestrictPublicBuckets: Even if the bucket policy is public, access is limited to principals in the
#                        bucket owner's account and to AWS services; cross-account access is blocked.
.
. Bucket-Owner-Enforced Bucket-Owner-Preferred Object-Writer
.
. Bucket-ACL Object-ACL
.
. S3-Condition-Keys x-amz-acl:read
.
The Object Ownership property controls whether ACLs are honored and how they are interpreted.
aws s3api put-bucket-ownership-controls --bucket your-bucket-name
--ownership-controls '{ "Rules": [ { "ObjectOwnership": "BucketOwnerEnforced" } ]}'
Note: It is recommended to set bucket ACL to private before disabling ACL.
During S3 object creation (PutObject), you can specify ACL grants using headers; the same values can be used as policy condition keys to allow/deny the operation (see the policy sketch after the list of values below).
If the bucket's Object Ownership is set to BucketOwnerEnforced, using these headers results in an error during PutObject (only the bucket-owner-full-control canned ACL is still accepted).
s3:x-amz-grant-read ‐ Read access to specified account id. e.g. "id=1234-account-id"
s3:x-amz-grant-write ‐ Grant Write.
s3:x-amz-grant-read-acp ‐ Read Access Control Policy (acl==acp). Example value: "id=1234-account-id"
s3:x-amz-grant-write-acp ‐ Grant Write ACL to account.
s3:x-amz-grant-full-control ‐ e.g. "s3:x-amz-grant-full-control": "id=AccountA-CanonicalUserID"
s3:x-amz-acl ‐ Valid values:
- private - Owner gets full control. Recommended. (Object/Bucket)
- public-read - Owner gets full control. Public Read. (Object/Bucket)
- public-read-write - Owner gets full control. Public Read-Write. (Object/Bucket)
- authenticated-read - Owner gets full control. Authenticated reads only. (Object/Bucket)
- bucket-owner-read - Object owner FULL_CONTROL. Bucket owner READ. (For Object Only)
- bucket-owner-full-control - Both Object/Bucket Owner gets FULL_CONTROL (For Object Only)
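For instance, a hedged bucket-policy sketch using the s3:x-amz-acl condition key to reject uploads that do not grant the bucket owner full control (bucket name is a placeholder):
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Sid": "RequireBucketOwnerFullControl",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        "Condition": { "StringNotEquals": { "s3:x-amz-acl": "bucket-owner-full-control" } }
      }]
    }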
. 1:N
. One S3 Bucket ------- Multiple Access Points
.
. DataLake-Application ------> S3-Accesspoint-Per-Application
.
. Client ------> S3-Accesspoint-Per-VPC (Restricts within that VPC)
.
. Every accesspoint has user friendly name and policy.
.
. Simplifies Permission management without messing with Global bucket policy and IAM policies.
.
aws s3control create-access-point --account-id 123456789012 --bucket business-records
--name finance-ap
[ --vpc-configuration '{ "VpcId": "my-vpc-id" }' ]
# The following commands operate at the object level. Use aws s3api (rather than aws s3) for API-level control.
aws s3api get-object --key my-image.jpg
--bucket arn:aws:s3:us-west-2:123456789012:accesspoint/prod download.jpg
aws s3api put-object --bucket my-access-point-xyz-s3alias --key my-image.jpg --body my-image.jpg
# private, public-read, etc. are canned ACLs.
aws s3api put-object-acl --key my-image.jpg
--bucket arn:aws:s3:us-west-2:123456:accesspoint/prod --acl private
S3 partitioning is usually internal to AWS: as the number of objects in a bucket grows, S3 internally partitions the key space.
However, you can explicitly partition your data layout for better performance when you use S3 + Glue + Athena, which gives you a "Table" abstraction over the S3 contents.
When you use Athena and create table using Athena IDE, it creates a table with S3 underneath with proper partitioning.
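A small sketch of the Hive-style prefixes that Glue/Athena treat as partitions (bucket, table, and column names are illustrative):
    # Each key=value prefix segment becomes a partition column of the table
    s3://my-datalake/sales/year=2024/month=01/part-0000.parquet
    s3://my-datalake/sales/year=2024/month=02/part-0000.parquet
    # Athena can then prune partitions: ... WHERE year='2024' AND month='02'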
You can simply enable this option (S3 Transfer Acceleration) for your bucket; for a small price, uploads are accelerated through edge locations.
Uploading to S3 through CloudFront is technically possible, but S3 Transfer Acceleration is the preferred method.
It is compatible with SSE-S3 and with some limitations with SSE-KMS (due to KMS being regional and permissions). It is not compatible with SSE-C.
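A quick sketch of enabling acceleration and uploading through the accelerated endpoint (bucket name is a placeholder):
    aws s3api put-bucket-accelerate-configuration --bucket my-bucket \
        --accelerate-configuration Status=Enabled
    aws s3api get-bucket-accelerate-configuration --bucket my-bucket     # should report "Enabled"
    # Route the upload through the s3-accelerate endpoint
    aws s3 cp big-file.bin s3://my-bucket/ --endpoint-url https://s3-accelerate.amazonaws.com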
.
. Service Name S3 S3-Glacier
. -----------------------------------------------------
. Container Bucket Vault
. Files Object Archive
. API S3-API Glacier-API
. Access Policy Bucket Policy Vault Access Policy
. Lock Mechanism Object Lock Vault Lock Policy
.
aws glacier create-vault                 # Create a vault, like a bucket in S3.
aws glacier upload-archive               # Upload an archive.
aws glacier initiate-job                 # Prepare an archive for reading.
aws glacier get-job-output               # Read the archive.
aws glacier initiate-vault-lock          # You lock the vault container, not individual archives.
                                         # Specify a vault lock policy, e.g. Deny DeleteArchive for 10 years.
aws glacier complete-vault-lock          # Finalize the lock. You can abort until you complete it.
aws glacier set-data-retrieval-policy    # Specify upper limits to control cost.
aws glacier set-vault-access-policy
aws s3api create-bucket
aws s3api put-bucket-policy
aws s3api put-bucket-lifecycle-configuration
aws s3api put-bucket-versioning
aws s3api put-bucket-cors
aws s3api put-bucket-accelerate-configuration
aws s3api put-bucket-encryption
aws s3api put-bucket-replication
aws s3api put-object
aws s3api put-object-acl # object level acls are not recommended
aws s3api put-bucket-acl # bucket level acls also not recommended
aws s3api put-public-access-block
aws s3api put-object-legal-hold
aws s3api put-object-lock-configuration # Specify Retention mode (governance vs compliance) and period.
aws s3api put-object-retention # Change object lock
aws s3api restore-object
.
. WORM Retention-Modes
.
. Lock = Retention Mode (Governance|Compliance) + Retention Period
.
.
WORM - Write once read many
Object lock involves 2 things: Lock Retention mode and Lock retention period.
Retention Modes: Governance (can be shortened/removed by users with s3:BypassGovernanceRetention) and Compliance (cannot be changed or removed by anyone until it expires).
Retention Period: Specify on object creation.
With s3:PutObjectLegalHold permission you can apply/remove legal hold on object to prevent delete.
Use Cases for S3 Object Lock: Regulatory Compliance, Data Protection
Bucket-level configuration: When creating a new S3 bucket, enable Object Lock for entire bucket. You can also enable it on existing bucket.
Object-level configuration: You can apply Object Lock on individual objects after enabling Object Lock on the bucket. Each object can have its own retention mode and period.
Once retention is applied, it can be shortened or removed only if the mode is Governance (not Compliance):
aws s3api put-object-retention --bucket <bucket_name> --key <object_key>
--bypass-governance-retention --retention '{}'   # an empty retention element clears the lock
Monitor and Manage: You can view the retention and legal hold settings using the S3 console
Bucket Versioning: S3 Object Lock requires bucket versioning to be enabled.
The object lock is supported on all storage classes except S3 Intelligent Tiering and Glacier Deep Archive.
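A minimal sketch of enabling Object Lock and applying retention / legal hold (bucket, key, and dates are placeholders):
    aws s3api create-bucket --bucket my-locked-bucket --object-lock-enabled-for-bucket
    aws s3api put-object-lock-configuration --bucket my-locked-bucket \
        --object-lock-configuration '{"ObjectLockEnabled":"Enabled",
            "Rule":{"DefaultRetention":{"Mode":"GOVERNANCE","Days":30}}}'
    aws s3api put-object-retention --bucket my-locked-bucket --key report.pdf \
        --retention '{"Mode":"COMPLIANCE","RetainUntilDate":"2026-01-01T00:00:00Z"}'
    aws s3api put-object-legal-hold --bucket my-locked-bucket --key report.pdf \
        --legal-hold Status=ON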
S3 supports Replication of buckets within or across regions.
|
| Region1 Region2
| CRR
| Bucket --------------------------------------> Bucket
| | (Re-encrypt as per Destination)
| | SRR Live or Batch (aka On-Demand)
| |
| Bucket
|
| (Enable Versioning) Meta-Data Permissions
|
| (One Time S3 Batch Operations to copy is Required For Replication of Existing Objects)
|
| aws s3api put-bucket-replication --bucket <source> --replication-configuration file://config.json
| # config.json: Specify Dest, Role, Filters, desired Storage class.
|
Enabling CRR or SRR does not auto-copy existing objects. You should (batch) copy the objects one-time first. S3 Batch Operations requires an input manifest file that explicitly lists the objects you want to operate on. :
aws s3api list-objects --bucket my-bucket --query 'Contents[].Key'
--output text | tr '\t' '\n' | awk '{print "my-bucket," $0}' > manifest.csv
my-bucket,object1.txt (manifest.csv file)
my-bucket,object2.jpg
my-bucket,folder/object3.pdf
# Alternatively you can use S3 inventory report to generate this manifest.
aws s3control create-job ... (with manifest file and copy operation)
Cross Region Replication (CRR)
Same Region Replication (SRR)
The tiers and life cycle rules apply independently on source and destination buckets.
There are two types: Live Replication and On-Demand Replication (aka Batch Replication).
SRR does not cost data transfer fee but costs PUT requests.
CRR data transfer costs around 2c/GB, roughly 4x cheaper than data transfer out to the internet.
Simply having replication enabled costs nothing by itself; only the PUT requests and data transfer are charged.
Different encryption settings between source and destination buckets - The destination encryption policy prevails.
Encryption metadata (such as encryption method SSE or KMS ) will be updated as needed.
Ensure proper KMS key permissions if using SSE-KMS on either side to prevent replication failures.
The SSE-C objects can not be auto replicated.
S3 Replication Time Control (S3 RTC) can be enabled to guarantee most of objects will be replicated in seconds and 99.99% will be within 15 mins. It costs some additional cost per GB replication transfer (around 1.5 cents/GB ??)
The cache control headers for S3 object can be controlled using the object meta data as "max-age=xxxx" (seconds).
aws s3 cp myfile.txt s3://my-bucket/myfile.txt --cache-control "max-age=3600"
curl -I https://your-cloudfront-url/myfile.txt
HTTP/1.1 200 OK
Cache-Control: max-age=86400 # max-age=1 (expire early) max-age=0 no-cache (cache but recheck using ETag)
# no-store (Don't cache at all)
EBS Volume types gp2/gp3 (general purpose), io1/io2 (High performance Provisioned SSD) st1 (HDD) Throughput Optimized HDD, sc1 (Cold HDD).
IOPS - IO Operations Per Second. Typical blocksize is 256 KB. 1000 IOPS means 256 MB/s throughput.
Boot volumes must be SSD, not HDD.
Bigger volumes (like 16 TB) offer bigger max IOPS.
EBS gp2 3000 to 16K IOPS (gp2 IOPS are not provisioned; they scale with disk size)
gp3 3000 to 16K IOPS; supports provisioned IOPS (a bigger disk does not mean bigger IOPS)
io1 and io2 IO Optimized: Faster than general purpose gp2/gp3.
io1 Max 32K IOPS (64K on Nitro instances). Max 16 TB disk (Provisioned IOPS)
io2 Block Express Max 256K IOPS;
Compare io1, io2, io2 block express price:
. gp2 gp3 io1 io2 io2-express
-----------------------------------------------------------------------------------
Max IOPS 16K 16K 32K 64K 256K
-----------------------------------------------------------------------------------
Max Throughput 250MB/s 1GB/s 1GB/s 1GB/s 4GB/s
-----------------------------------------------------------------------------------
Storage Price
per 100 GB $10 $8 $12.5 $12.5 $12.5
-----------------------------------------------------------------------------------
Max IOPS
per 100 GB 300 5K 50K 50K
per GB 3 50 500 500
-----------------------------------------------------------------------------------
Min IOPS 3K
-----------------------------------------------------------------------------------
1K IOPS price $5 $65 $48 $20
-----------------------------------------------------------------------------------
Durability 99.9 99.999 99.999
-----------------------------------------------------------------------------------
st1 HDD (throughput optimized):
Local instance store IOPS can go from 100K to millions.
By default EBS volumes are not encrypted.
Account level setting to encrypt new EBS volumes by default.
EBS Snapshots are taken using incremental backup. You can create AMI from snapshot.
FSR Feature - Fast Snapshot Restore - pre-warms the snapshot so a volume restored from it delivers full performance immediately (no lazy initialization from S3).
EBS Multi-Attach (attaching one volume to multiple EC2 instances) is supported for io1/io2 volumes only. The filesystem must be cluster-aware (a normal filesystem like ext4 will not work).
Data Lifecycle Manager (DLM) - for when you want to automate the creation, retention and deletion of EBS snapshots. It is free.
Attached to one AZ.
The root EBS volume is deleted on instance termination by default (DeleteOnTermination flag).
To migrate an EBS volume across AZ, take a snapshot, restore on another AZ.
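For example (volume/snapshot/instance IDs are placeholders):
    aws ec2 create-snapshot --volume-id vol-0abc123 --description "migrate to us-east-1b"
    aws ec2 create-volume --snapshot-id snap-0def456 --availability-zone us-east-1b --volume-type gp3
    aws ec2 attach-volume --volume-id vol-0new789 --instance-id i-0123456789abcdef0 --device /dev/sdf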
EBS volume is marked optionally as "Encrypted volume" on creation. If so, encryption happens transparently and all snapshots are also encrypted.
AWS Backup - to manage & monitor backups across AWS services (including EBS volumes) from a single place. It is a more recent and more advanced service that covers many resource types, whereas DLM only automates EBS snapshots/AMIs.
.
. MaxSize: PBs (Unlimited)
.
. EFS On-Premise
. Remote NFS Mount
. Mount-Target-ENI ----------------------> Server (Linux)
. SG DirectConnect
. 1 Mount Per AZ
.
.
. Also Note: EFS Access Point, EFS CRR
.
. Pay per use. Performance Mode; Throughput Mode
Note: EFS does not work on Windows. Use FSx for Windows instead.
sudo yum -y install amazon-efs-utils # Installs nfs-utils, efs-utils, openssl, etc
sudo yum install -y gcc openssl-devel tcp_wrappers-devel
# Install stunnels for encryption on transit
brew upgrade stunnel
# Mount using the DNS name of the EFS; it resolves to the mount target in the same AZ.
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 <DNS_NAME>:/ /efs/
mount -t efs -o tls,iam,accesspoint=fsap-abcdef0123456789a fs-abc0123def456789a: /localmountpoint
# EFS file system driver understands the accesspoint.
# Local user has POSIX user privileges (UID, mount permission, etc)
.
. High-Performance SSD Max-Size-Varies
.
. Specify-Volume-Size-(unlike EFS)
.
. FSx File Gateway;
FSx - High performance file system on AWS. Classified as:
FSx For Windows File Server:
FSx For Lustre:
FSx For NetApp ONTAP:
FSx for OpenZFS:
Uses SSD. Max capacity of File system varies per type:
FSx for Windows File Server: Up to 64 TB.
FSx for Lustre: Up to 1 PB (persistent file systems).
FSx for NetApp ONTAP: Up to 192 TB.
FSx for OpenZFS: Up to 64 TiB.
Solution Architecture Tips:
|
| On-Premise AWS Cloud
|
| S3 File Gateway +------------ S3
| |
| Storage FSx File Gateway ------+------------ Fsx For Windows| FSx For Lustre| FSx For OpenZFS
| Gateway |
| Tape Gateway +------------ Virtual Tape
|
| Volume Gateway -------------------> S3
| Send Volume Backup
|
.
. NFS / SMB
. On-Premise <--------------> S3 File Gateway <----> S3 Bucket <--- Cloud Clients
. AD Cache / Sync
. Auto Refresh
.
. Useful for cloud-Native applications as well for NFS/SMB.
.
.
. Snapshot Schedule/Retention
. iSCSI Block Incremental
. On-Premise --------------> Volume Gateway ----> S3 (Volume EBS Snapshots)
. Any Stored / Cached |
. File System +---> Restore to EBS disk.
. (Not EFS)
.
. Note: Max Volume Size: Stored: 512TB; Cached: 1024TB
.
.
. SMB
. On-Premise -----------> FSx For Windows Gateway
. -----------> FSx For Lustre Gateway
. NFS
.
.
# You can mount Lustre File system using DNS name in EC2.
sudo mount -t lustre <fsx-file-system-dns>:<mount-name> /mnt/fsx-lustre
.
. Snowcone : 8 TB (Upto 14 TB SSD) (Appliance Machines)
. Snowball Edge : 80 TB (Upto 210 TB SSD)
. Snow Mobile : 100 PB (45 feet truck) (== 1200 Snowball Edge)
. (Recommended for >10 PB i.e. 125 Edge and more)
.
. 1Gbps - 1 Week - 75 TB (Snowball Edge)
. 100Mbps - 1 Week - 7 TB; 1 Month 28 TB; 2.5 Months 75 TB
. 50Mbps - 1 Week - 3.5 TB; 1 Month 14 TB; 2.5 Months 35 TB; 5 Months 75 TB
.
.
. Snowball -------> S3 ----> Glacier (No direct upload to Glacier)
.
|
| "Lift-and-Shift Rehost" "Replication Template" "Replication Agent"
| "Schema Conversion Tool"
|
| DMS
| Oracle, MsSQL, MySQL, PostGres ========> RDS, Aurora, S3, OpenSearch, Kinesis Datastreams
| MongoDB, SAP, DB2, Azure SQL, S3
|
| Migration Task SCT-Schema Conversion Tool
| Validation Task
|
Use Cases:
- Migrate on-premise VMs by installing AWS Replication Agent on them.
- Migrate on-premise Databases
Migration Validation Task: the tables and every row can be validated after migration using a validation task.
You can also use the Table Statistics window in the DMS console to view the migrated table and row counts and compare them against the source database manually.
Replication of RDS MySQL to an on-premise server is possible either with a DMS replication task or with MySQL native replication: seed with (mysqldump | RDS snapshot), then start binlog replication (see the sketch below). A VPN connection is recommended for this replication (for security) but not mandatory.
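A rough sketch of the native-replication route (endpoints, credentials, and binlog coordinates are placeholders):
    # On RDS: keep binlogs long enough for the replica to catch up
    mysql -h mydb.abc.us-east-1.rds.amazonaws.com -u admin -p \
        -e "CALL mysql.rds_set_configuration('binlog retention hours', 24);"
    # Seed the on-premise server from a dump (or restored snapshot), then point it at RDS:
    mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='mydb.abc.us-east-1.rds.amazonaws.com',
        MASTER_USER='repl_user', MASTER_PASSWORD='***',
        MASTER_LOG_FILE='mysql-bin-changelog.000123', MASTER_LOG_POS=456; START SLAVE;"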
Migration service for VMs and software applications.
Use Cases:
|
|
| On-Premise AWS Cloud
| Other Clouds =====> EC2 DR
| VMs
| Region1 Region2
|
To migrate a VM, install the AWS Replication Agent on it and create a job in Application Migration Service. Replication runs in the background from on-premise to AWS, and the target instance is not launched until you choose to launch it. After that, test the instance and do the cut-over. :
# sudo python3 aws-replication-installer-init.py
Choose volumes: /dev/svda ....
Enter Access Key, Secret Key: ....
Your source server Id is ....
Replication Started. Manage using Application Migration Console.
Note: You can use the Active Directory Migration Toolkit (ADMT) along with the Password Export Service (PES) to migrate users from your on-premise Active Directory to your AWS Managed Microsoft AD directory.
Note: This is not part of Application Migration Service.
AWS Migration Hub delivers a guided end-to-end migration and modernization journey through discovery, assessment, planning, and execution.
Docker Concepts:
| (Container Apps)
|
| Container Layer Read/Write C1 C2 C3
| Layer 3 Read Only ------------------------------
| Layer 2 Read Only Bin/Lib Bin/Lib Bin/Lib
| Layer 1 Read Only ------------------------------
| Docker Engine
| Host OS
|
|
| Docker uses Containerd container runtime (Kubernetes uses Containerd, not docker)
|
.
. Build Run Commit
. DockerFile --------> DockerImage ------> DockerContainer ---------> Docker Image
.
. Docker Container can also be inactive.
.
A Dockerfile defines how the Docker image is built. It starts from a Layer-1 parent image and contains commands that create the upper layers using a copy-on-write strategy.
For example:
FROM php:7.4-apache # This is Layer 1 - Parent Image
RUN apt-get update && apt-get upgrade -y # This is Layer 2
COPY code /var/www/html # This is Layer 3
EXPOSE 80
CMD ./my_script.sh # Default command to run if not specified in docker run.
Commands:
docker build -t my-username/my-image . # -t for tagging.
docker image ls
docker run --name my-app -p 80:80 -d my-username/my-image
docker push my-username/my-image
# To push my image to private registry ...
docker login repo.company.com:3456 --username my_username
docker tag 518a41981a6a repo.company.com:3456/myappimage   # repository names must be lowercase
docker push repo.company.com:3456/myappimage
The SSM Agent is open sourced by AWS (see https://github.com/aws/amazon-ssm-agent). This is a classic example of using Docker to run a make command to build the agent: the Docker image contains the Go compiler and all required libraries. Use this to build locally! :
docker build -t ssm-agent-build-image . # Build using ./Dockerfile and tag image.
docker run -it --rm --name ssm-agent-build-container # name of running instance
-v `pwd`:/amazon-ssm-agent # Mount ./ to docker container
ssm-agent-build-image # Docker image name
make build-release
docker cp <containerId>:/file/path/within/container /host/path/target # works even if inactive.
docker start <container_name> # restart inactive container
docker exec -it container_name bash # Attach an interactive shell to a running container
If a single application needs multiple Docker images to run, it is a multi-container app.
Suppose each Docker container needs to be started with certain port mappings, local volume mounts, etc. This can be achieved with Docker Compose (for example):
version: '3.9'
services:
my-nginx-service:
container_name: my-website
image: my-nginx-image:latest
cpus: 1.5
mem_limit: 2048m
ports:
- "8080:80"
volumes:
- /host/dir/log:/log
my-db-service:
....
# docker compose up -d
.....
All containers will be created and running.
Docker images can be stored in Docker Hub and Amazon Elastic Container Registry.
Amazon ECR - Elastic Container Registry supports both private and public repositories:
https://gallery.ecr.aws
ECR private registry supports cross-region and cross-account replication.
ECR images are scanned for CVEs (basic scanning) or with enhanced scanning via Amazon Inspector.
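A typical push sequence to a private ECR repository (account id, region, and repo name are placeholders):
    aws ecr create-repository --repository-name my-app
    aws ecr get-login-password --region us-east-1 \
        | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
    docker tag my-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest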
docker history --no-trunc <image-name> # Find out base image
docker inspect <image-name> # Env variables, entry points
brew install dive
dive <image-name> # Visual info about Layers, files, commands.
To update a Docker image with new files, you can either write a new Dockerfile that uses the existing image as its base, add the files and build; or start the container, copy the new files in, and docker commit a new image.
|
| Route-53
| |
| | Web App | | ElastiCache
| Client ---> ELB ------> ASG + EC2s |--> Application | ----> Redis Multi-AZ
| Public Private Subnet| Private Subnet | ----> RDS/Database
| Multi-AZ Multi-AZ | Multi-AZ |
|
|
|
.
. 1 Beanstalk == 1 Web Application == N EC2 Instances
.
. Java NodeJS PHP Python Docker
.
. Environment Application
.
.
Web Server Tier vs Worker Tier :
| |
| Web Environment | Worker Environment
| |
| myapp.us-east-1.elasticbeanstalk.com |
| |
| (ASG) | (ASG)
| |
| +--> EC1 in AZ1 | EC1 in AZ1
| ELB | | <--- SQS Queue
| +--> EC2 in AZ2 | EC2 in AZ2
| |
EB Deployment Mode:
- Single Instance : All components on single EC2. Great For Dev.
- HA with LB: For production.
Runtimes support for:
Great to replatform on-premise to the cloud.
Instance config/OS is handled by Beanstalk
Deploy application for Lazy people; Auto-magically creates ELB, ASG, EC2 instances, etc.
Deployment strategy configurable but performed by EB.
Just the application code (.war for Java or .zip for PHP) is responsibility of the developer.
Deployment models:
Decoupling application using webworker + web tiers is common pattern. i.e. one with LB + ASG and another with ASG Only.
Beanstalk is free, only you pay for the underlying resources.
Environment: Collection of AWS resources.
It uses CloudFormation to provision the infrastructure underneath. The CloudFormation template primarily defines the Environment and the Application.
Application defines version and the source .zip file in S3. Environment defines ASG, LB, EC2 instance type, and docker version and such.
HostManager is the agent running in each EC2 machine to help with EB admin tasks such as deploying application and monitoring.
The application is uploaded as a zip or war or etc file depending on the platform. For PHP application it is a zip file of top folder.
You may have to deploy different tiers and submit them to EB separately. Once you create a web tier, you get an application DNS name; a worker environment does not get one.
It is not a serverless solution; it is a PaaS solution.
Elastic Beanstalk allows you to choose instance types, configure load balancers, and set scaling parameters.
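A minimal EB CLI sketch (application, platform, and environment names are placeholders):
    pip install awsebcli
    eb init my-app -p php --region us-east-1    # writes .elasticbeanstalk/config.yml
    eb create my-env                            # provisions ELB + ASG + EC2 via CloudFormation
    eb deploy                                   # zips and uploads a new application version
    eb status && eb logs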
The following deployment strategies are supported: All at once, Rolling, Rolling with additional batch, Immutable, Traffic splitting (canary), and Blue/Green (via CNAME swap).
|
| (ECS Service) (Service Scheduler) (Task Placement Strategy)
|
| VPC (ECS Cluster) (Tasks Launched in Fargate)
| (Uses Shared Pool of ENIs from App Subnet)
| ALB ASG (ALB Target - Use these Target Groups of IP Addresses)
|
| (App Subnet) (Data Subnet) (Task Definition)
| (Task)
| (Service = N Tasks)
| (Task = N Containers)
|
| EC2 (EC2 is an ECS Container instance running ECS container Agent)
|
| (ECS+Fargate Service AutoScaling)
|
Commands:
aws ecs create-cluster --cluster-name MyCluster
# Note: A default ECS cluster with name "default" already exists in your account.
aws ecs create-capacity-provider --name "MyCapacityProvider"
--auto-scaling-group-provider "autoScalingGroupArn=arn:.*"
# create-service defines and starts running the service as well.
# Specify launch-type or capacity provider not both.
aws ecs create-service --cluster MyCluster --service-name MyService \
--task-definition sample-fargate:1 --desired-count 2 --launch-type FARGATE \
--network-configuration
"awsvpcConfiguration={subnets=[xxx],securityGroups=[sg-x],assignPublicIp=ENABLED}"
aws ecs create-service --cluster MyCluster --service-name ecs-simple-service \
--task-definition sleep360:2 --desired-count 1 # Simple single instance task.
# The deploy action updates the service definition with a new task definition and initiates CodeDeploy.
aws ecs deploy --service <value> --task-definition <value> --codedeploy-appspec <value>
# Stop task. Equivalent to docker stop. Agent sends SIGTERM on task process.
aws ecs stop-task --task 666fdccc2e2d4b6894dd422f4eeee8f8
# Run new task directly. Auto placement
aws ecs run-task --cluster default --task-definition sleep360:1
# Start task with better control on placement. override execution roles, networking etc.
aws ecs start-task --cluster default --task-definition sleep360:1 \
--container-instances "<ec2-instance-ids>"
aws ecs list-clusters
aws ecs list-services --cluster <cluster_arn>
aws ecs list-tasks --cluster <cluster_arn>
aws ecs list-container-instances --cluster <cluster_arn>
Task definition represents a group of docker containers to run that task:
| 1:N
| Task Definition -------------> Docker Containers
It also includes the task execution IAM role, Docker image details, port mappings, environment variables, etc.
ECS Task is an instance of a Task Definition.
All docker container instances from single ECS Task run together on single machine:
|
| Task = 1 Task runs N docker container instances together
| on any single machine in cluster.
|
Task can be directly started or indirectly started through service.
Task definition optionally contains Task Launch Type. This is one of:
| Task Launch Type
|
| Fargate - Launch Task in Fargate
| EC2 - Launch in one of the available EC2 container Instance.
| External - Launch in one of the available external container Instance.
|
| Fallback mechanism is Cluster's capacity provider strategy.
Defines the min and max number of tasks to run, e.g. a web application service may run 4 to 10 task instances in the cluster. It also defines the scaling policy, like Step Scaling, etc. :
|
| Running ECS Service = N Tasks 1<=N<=max running across many EC2 instances.
|
Also includes Task Definition, desired/min/max no of tasks, VPC, subnet, security group, Load Balancer type (ALB), Container port mapping, EC2 target group, Listener Rules, Scaling Policy (Target Tracking or Step Scaling).
The Target Tracking scaling policy adjusts the number of tasks to keep a chosen metric (e.g. average CPU utilization) at a target value; see the sketch below.
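A hedged sketch of wiring that up with Application Auto Scaling (cluster/service names and limits are placeholders):
    aws application-autoscaling register-scalable-target --service-namespace ecs \
        --resource-id service/MyCluster/MyService \
        --scalable-dimension ecs:service:DesiredCount --min-capacity 4 --max-capacity 10
    aws application-autoscaling put-scaling-policy --service-namespace ecs \
        --resource-id service/MyCluster/MyService --scalable-dimension ecs:service:DesiredCount \
        --policy-name cpu70 --policy-type TargetTrackingScaling \
        --target-tracking-scaling-policy-configuration '{"TargetValue": 70.0,
            "PredefinedMetricSpecification": {"PredefinedMetricType": "ECSServiceAverageCPUUtilization"}}'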
.
. Target Group == Place Holder(HTTP, port 80, VPC-Id, HealthCheck-path, TargetGroup-Type)
.
. 1:1
. TargetGroup ----------- Service (Multiple Tasks of Single Service)
.
. TargetGroup Type == Instance (EC2) | IP (For Fargate) | Lambda (Not supported for ECS)
.
. +-------- TargetGroup1
. ALB1 ---+-------- TargetGroup2
. ALB2 +-------- TargetGroup3
.
.
The Service tasks automatically gets registered to the ALB and inbound requests are forwarded to that Target group. ECS makes sure Target group is populated with Fargate IP addresses automatically.
.
. (Tasks - TargetGroups)
. Serving WebApplications
. ALB ------ ECS -----------> MicroServices,
. (Host+Path based) Cluster Back-End APIs
. ( Routing )
.
. (Optional ALB2+ )
.
.
. ALB ---> Listener ---> Rule --> TargetGroup --> Register Task --> Create Service
. (Register Targets (e.g. EC2 OR IP) with Target Group)
.
. Note: Tasks get registered with TargetGroup.
. Multiple ALBs can share TargetGroups i.e. Tasks.
. Tasks may run on EC2 or FARGATE.
.
aws elbv2 create-target-group --name my-target-group --protocol HTTP --port 80 --vpc-id vpc-abc12345 \
... --health-check-path /health
aws elbv2 create-load-balancer --name my-load-balancer ...
aws elbv2 create-listener --load-balancer-arn arn:* --protocol HTTP --port 80 \
--default-actions Type=forward,TargetGroupArn=*
aws elbv2 create-rule --listener-arn arn:*
--conditions Field=path-pattern,Values='/app*'
--actions Type=forward,TargetGroupArn=* ...
aws ecs register-task-definition ... # ECS knows container name and container port mappings etc.
aws ecs create-service ... --task-definition my-task-def ...
--load-balancers "targetGroupArn=*:,containerName=my-container,containerPort=80" ...
# Targets like EC2 gets implicitly registered with Target Group by ecs create-service:
# aws elbv2 register-targets --target-group-arn arn:.* --targets Id=i-0598c7d356eba48d7,Port=80 ...
An ECS Container Instance is an EC2 instance with the Docker daemon and the ECS Container Agent running on it. Any single task is typically deployed onto an available ECS Container Instance.
Even though a task can also be deployed on Fargate, Fargate capacity is not an ECS Container Instance by definition. :
|
| ECS Container Instance is EC2 instance running docker and ECS agent.
|
| 1:N
| EC2 -------> May run total 8 tasks coming
| (Docker Agent) from 4 different services.
| (ECS Agent)
|
|
| ECS Cluster contains group of ECS Container Instances.
|
| 1:N
| ECS Cluster -------- EC2
|
|
A default cluster is created in your account (which is empty). You can create additional named clusters.
Cluster Launch Type could be: EC2 | Fargate | External
You should register EC2 instances before launching tasks and services.
For external instances, you should prepare it using ecs-anywhere-install.sh script.
Each cluster can use multiple capacity providers to spread tasks across different ASGs or Fargate using capacity provider strategies.
A capacity provider is an abstraction over the cluster's underlying compute capacity (Fargate, Fargate Spot, or an ASG).
|
| Capacity Provider
|
| -----> Fargate (Predefined Capacity Provider)
| -----> Fargate Spot (Predefined Capacity Provider)
| Capacity Provider -----> EC2 Capacity Provider ---> ASG EC2 | ASG Spot
| (This Capacity Provider is like alias for an ASG)
| -----> External Instances
|
| N:1
| Capacity Provider ------ Cluster (Auto Scaling turned on)
|
| upto 1:6
| Capacity Provider Strategey --------------- Capacity Provider
| Contains
| (Fargate, ASG1, ASG2)
| (50%, 25%, 25%)
|
|
A service uses either a Launch Type or a Capacity Provider strategy, not both. If you specify the service launch type as "FARGATE", you do not also have to specify the "FARGATE" capacity provider.
ECS dynamically allocates ENIs with private IPs from the subnet before launching the task on Fargate.
ECS clusters can contain a mix of tasks hosted on AWS Fargate, Amazon EC2 instances, or external instances. They can also contain a mix of Auto Scaling group capacity providers and Fargate capacity providers
In Amazon Elastic Container Service (ECS), a capacity provider strategy determines how tasks are spread across the cluster's capacity providers. The strategy is made up of one or more capacity providers, along with a base and weight for each.
ECS may transition container instance to "DRAINING" state in order to prepare to remove it. For example, SPOT instance may have to be released or replaced. Spot instance will receive interrupt and ECS will mark that instance in "Draining" state.
Suppose desiredCount = 4 and minimumHealthyPercent = 50%; then only 2 tasks may be running temporarily during this transition. A draining instance will not accept new tasks. If maximumPercent = 200%, then up to 8 tasks may be running temporarily as part of the transition.
Spot instance draining should be enabled in the ECS agent config (ECS_ENABLE_SPOT_INSTANCE_DRAINING=true).
/usr/bin/docker run --name ecs-agent --init
--volume=/var/run:/var/run ...
--net=host # awsvpc in EC2
--env-file=/etc/ecs/ecs.config
--env ECS_DATADIR=/data ... --detach
amazon/amazon-ecs-agent:latest
| On-Premise
| Node
| Runs SSM Agent, ECS Agent
Use Case: Create ECS using on-premises machines.
Commands:
aws ssm create-activation --iam-role ecsAnywhereRole | tee ssm-activation.json
bash /tmp/ecs-anywhere-install.sh
Amazon ECS Anywhere lets you register your own on-premise servers (external instances) into your ECS cluster. Just install the ECS Container Agent and the SSM Agent on the on-premise server, register it with SSM, and then register it with your cluster.
. (Service)
. ECS -------> FarGate Task1 --- Target Tracking: CPU Usage = 70%
. Task2 Enable Service AutoScaling.
. Task3
.
.
The task definition specifies launch type: Fargate | EC2 | External;
In addition ECS service (i.e. Tasks) specifies the deployment type associated.
The deployment types supported are: Rolling update (ECS-managed), Blue/Green (via CodeDeploy), and External.
The deployment itself may be initiated from the console, the CLI, or CloudFormation; the CloudFormation template specifies one of the above deployment types and drives it.
aws ecs run-task --cluster my-cluster --launch-type FARGATE
--network-configuration "awsvpcConfiguration=*" --task-definition my-fargate-task --count 1
# --overrides ... (Pass env Variables)
# Configure EventBridge Rule to Invoke task every 1 hour
aws events put-rule --name "TriggerFargateTask" --schedule-expression "rate(1 hour)"
aws events put-targets --rule "TriggerFargateTask" --targets '[{"Id":"1",
    "Arn":"arn:aws:ecs:<region>:<account>:cluster/my-cluster", "RoleArn":"arn:aws:iam::<account>:role/ecsEventsRole",
    "EcsParameters":{"TaskDefinitionArn":"arn:aws:ecs:<region>:<account>:task-definition/my-fargate-task","LaunchType":"FARGATE"}}]'
# The target Arn is the cluster; the task definition goes inside EcsParameters (add NetworkConfiguration for awsvpc tasks).
Key Historical Milestones For Containers:
1990s: Early virtualization technologies emerge (VMware, FreeBSD Jails).
2008: Linux Containers (LXC) introduced.
2013: Docker launched, revolutionizing container usage.
[ Tools, Images, Dockerfile. Revolutionised Adoption ]
2014: Google open-sourced Kubernetes.
2015: OCI formed to standardize containers, and Kubernetes became dominant.
2017: Cloud providers introduce managed Kubernetes services (EKS, GKE, AKS).
2020s: Kubernetes leads the container orchestration space, with an expanding cloud-native ecosystem.
Key Milestones For Serverless Computing:
1990s–2000s: Virtualization, VMs (VMware, FreeBSD Jails)
2006–2014: IaaS (AWS EC2), followed by PaaS solutions (Google App Engine, Heroku).
2014: AWS Lambda (FaaS) launched, revolutionizing serverless.
2015–2016: Competitors like Azure Functions and Google Cloud Functions enter the serverless market.
2017–2020: Ecosystem matures with tools. Expands beyond FaaS to include databases, storage, and containers.
2020–Present: Serverless moves into edge computing, AI/ML, and containers.
.
. Control-Plane
.
. AWS Load Balancer Controllers
. Ingress Resources (ALB) Service Resources (NLB)
.
. Node Pods
. EC2
.
.-----------------------------------------------------------------------------------------------------
. EKS ECS
.-----------------------------------------------------------------------------------------------------
.
. Control Plane ECS Control Plane (internal)
.
. Nodes with kubelet Container Instance EC2 with ECS agent.
.
. Pod (mutli containers, Shares Storage) Task (multi Container)
.
. Deployment - Pod Replicas (Pod Instances) Service (Group of Tasks)
. (Also Scaling mechanisms, Rolling updates, etc)
. (Manifest yaml with placement constraints)
. (Includes ReplicaSets - Stateless Set of Pods)
.
. StatefulSet - Persistent Volume, DNS Name, etc. Task is stateless only. No alternatives.
. (For Databases, Kafka, etc)
.
. DaemonSets (Force pod on every/selected Nodes) Fixed ECS Agent. No Alternatives.
. (For Log, monitor etc. Fluentd, Prometheus, etc)
.
. Services (Pods network endpoints) Load Balancer, TargetGroup, Task Definition
. (ClusterIP (internal), NodePort and LB)
.
. Ingress, Ingress Controller (Traffic Routing) Load Balancer
.
. Namespaces (Cluster Partition - Isolation) Use different ECS Clusters.
.
. ConfigMaps Env variables, SSM Parameters
. Secrets (Inject into Pod as env var/files) Secrets Manager, SSM Parameters
.
. Persistent Volumes (PV) and Claims (PVC) EBS, EFS
.
. Helm (Uses Helm charts- Defines Pods, Services. Task Definition. Defines container, CPU, etc.
. Pkg Manager. Defines all resources)
.
. Horizontal Pod Autoscaler (HPA) ECS Autoscaling
.
.
ECS
Docker Swarm
Docker Compose - Very simple, Run multiple containers on single host. For dev.
AKS - Azure Kubernetes Services
GKE - Google Kubernetes Engine
Nomad - HashiCorp Nomad (Terraform company)
OpenShift - Redhat's Offering
Run containerized tasks in serverless environment.
.
. Serverless Containerized-Task-Only ECS OR EKS Tasks
.
. Autoscaling (with ECS only)
.
. Load Balancing (with ECS only)
.
If you run Fargate Task as a service with ECS with Load balancer, then use HTTP Proxy integration with API Gateway :
API Gateway -----> Http Integration ----> ALB -> Fargate Service
You can directly invoke Fargate task from Lambda:
API Gateway ---> Lambda ---> Invoke Fargate Task.
You specify the following while invoking an ECS task from Lambda: cluster, task definition, launch type, network configuration, and overrides.
The overrides (e.g. environment variables) are used as input when invoking the specific ECS task from Lambda.
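For example, a run-task sketch with container overrides (all names and IDs are placeholders); the same parameters map 1:1 to the SDK call a Lambda function would make:
    aws ecs run-task --cluster my-cluster --launch-type FARGATE --task-definition my-fargate-task \
        --network-configuration "awsvpcConfiguration={subnets=[subnet-0abc],securityGroups=[sg-0abc],assignPublicIp=ENABLED}" \
        --overrides '{"containerOverrides":[{"name":"my-container",
            "environment":[{"name":"JOB_ID","value":"42"}]}]}'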
Classic Load Balancer - v1 - 2009 - CLB - HTTP/HTTPS, TCP, SSL - Layer 7 or 4 (TCP)
Application Load Balancer - v2 - 2016 - ALB - HTTP/HTTPS, WebSocket
Network Load Balancer - v2 - 2017 - NLB - Layer 4 i.e. at TCP/UDP Level.
Gateway Load Balancer - 2020 - GWLB - Layer 3 Network Layer - IP protocol.
`Cross Zone Load Balancing`:
.
. Cross-Zone Load Balancing === Round-Robin of All Targets. All Targets Gets equal load.
. May Generate (unnecessary) Cross Zone Traffic (Rare)
.
Load balancers support sticky sessions using cookies, but stickiness can cause load imbalance.
A load balancer is attached to the VPC and spans the subnets/AZs you enable; its nodes get dynamic IPs in those subnets. An ALB gets a DNS name that is dynamically resolved via Route 53; it does not have a static IP.
Dispatch algorithms include: round robin and least outstanding requests (ALB), and flow hash (NLB).
.
. Target Group
. /user
. +-------------------> ASG (TargetGroup) | HealthCheck Task1
. |
. ALB -------| /action
. +-------------------> Lambda | HealthCheck Task2
. App Listener |
. | ?process=reports
. https Listener +-------------------> IP (ECS Tasks) | HealthCheck
. http listner | (Port mapping) (awsvpc net)
. |
. domain, path, | /batchjob
. port based +-------------------> EC2 | HealthCheck
. routing |
. | /chat
. Dynamic +-------------------> IP (Fargate) | HealthCheck
. Public IP | (awsvpc net)
. | /offers
. Security Grp +-------------------> S3 | HealthCheck
. |
. No Elastic-IP | Host=abc.com
. +-------------------> EC2 | HealthCheck Multi-Domain; Host+Path Routing OK.
. Many ENIs |
. | /app1/action
. SSL Certs +-------------------> EC2 | HealthCheck Multi MicroServices/apps;
.
Commands:
# Create ALB enable multi AZ by specifying subnets
aws elbv2 create-load-balancer --name my-alb --subnets subnet-b7d581c0 subnet-8360a9e7
aws elbv2 create-target-group --name my-targets --protocol HTTP --port 80 # source port
--target-type instance --vpc-id vpc-3ac0fb5f
aws elbv2 register-targets \
--target-group-arn arn:.*
--targets Id=i-0598c7d356eba48d7,Port=80 Id=i-0598c7d356eba48d7,Port=766
# target Id could be instance-id, ip or arn of lambda or another alb.
# Note: target group port is source and instance port is destination.
# Add http listener
aws elbv2 create-listener --load-balancer-arn arn:.* --protocol HTTP --port 80 \
--default-actions Type=forward,TargetGroupArn=arn:.*
# default actions could be in JSON for complex specification.
# actionType: forward | authenticate-oidc|authenticate-cognito|redirect|fixed-response
# RedirectConfig: { protocol: "HTTP,HTTPS", ... }
# AuthenticateOidcConfig: {
# "Issuer": "string",
# "AuthorizationEndpoint": "string",
# "ClientId": "string",
# "SessionCookieName": "string", ...
# ....
# }
aws elbv2 create-listener --load-balancer-arn arn:.* --protocol HTTPS --port 443 \
--certificates CertificateArn=arn:.* --ssl-policy ELBSecurityPolicy-2016-08 \
--default-actions Type=forward,TargetGroupArn=arn:.*
# Network load balancers support TCP and TLS as protocols and can do SSL termination as well.
aws elbv2 create-listener ... --protocol TLS --port 443 --certificates ...
aws elbv2 create-rule --listener-arn arn:* --priority 5
--conditions file://conditions-pattern.json # Specifies path e.g. /action
--actions Type=forward,TargetGroupArn=arn:*
.
.
. CloudFront -------------> ALB
. (us-east-1 SSL Cert) (Same SSL custom domain Cert OK only if ALB in us-east-1)
. Custom Domain (Otherwise use new ACM certificate in same Region)
.
.
. TCP Listener Rules
.
. EIP OK. TCP/3306
. Many ENIs Forwards +-------------> TargetGroup MySQL <--> HealthCheck
. TCP and UDP |
. ---> NLB --------------->|
. (One Per AZ) | TCP/80
. (Dynamic IPs) +-------------> Web Applications <--> HealthCheck
. | TargetGroup
. | TCP/8080
. No Sec. Group +-------------> ALB ----> ASG
. NACL Subnet OK.
.
Forwards TCP and UDP packets.
NLB preserves client IP by default!
Rewrites Destination IP from NLB to target EC2 address. On reverse rewrites the Source IP address.
This could pose problem if NLB is used by the target itself where source and destination will have same IP and packet may be dropped! :
.
. Rewrites Dest IP
. Source --------------> NLB -------------------> EC2
. <--------------
. Rewrite Source IP
.
. Note: If Source === Destination, then Problem!
.
The workaround for above problem is just to disable client-IP Preservation or use Proxy Protocol v2 which disables client IP preservation plus prepends TCP stream with client IP information. (equivalent to X-forwarded-for):
PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n # Sends same header even for SSH, FTP
GET / HTTP/1.1\r\n # If App does not expect, it will fail!
Host: 192.168.0.11\r\n
Less Latency ~100ms vs ~400ms for ALB.
Handles millions of requests per seconds.
One public IP per AZ.
Note: NLB Supports Elastic IP unlike ALB!
If you enable the NLB for multiple AZs (recommended), you get one NLB node per AZ. The NLB then gets one static IP address per enabled AZ; there is no additional global IP allocated.
If you enable cross-zone load balancing (off by default), the NLB may route traffic across AZ (may not be desired in many cases)
The NLB private IP is auto-assigned at creation from the subnet's CIDR block in each AZ (or you can pin it; see the sketch below).
Target Groups could be: EC2 instances, IP addresses, or an ALB.
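A sketch of pinning the per-AZ addresses at creation time using subnet mappings (subnet and allocation IDs are placeholders):
    aws elbv2 create-load-balancer --name my-nlb --type network \
        --subnet-mappings SubnetId=subnet-0aaa,AllocationId=eipalloc-0111 \
                          SubnetId=subnet-0bbb,AllocationId=eipalloc-0222
    # For an internal NLB you can pin private IPs instead:
    #   --subnet-mappings SubnetId=subnet-0aaa,PrivateIPv4Address=10.0.1.10 ...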
NLB can optionally do SSL termination, if you enable it. You need to install SSL certificates at NLB for that.
The NLB typically has a DNS name like my-nlb-xxx.elb.us-east-1.amazonaws.com, which resolves to multiple IP addresses, one per AZ. To get a zonal DNS name, add the AZ as a prefix, e.g. us-east-1a.my-nlb-xxx.elb.us-east-1.amazonaws.com
. GLBE - Gateway Load Balancer Endpoint
.
. Internet +---------------------------------+
. | | |
. | | VPC2 (GLB Service Provider) |
. +-------------------------- IGW------+ | |
. | | | | |
. | AZ | | | |
. | | | | AZ |
. | (Apps) V | | |
. | Subnet-1 <--> Subnet-2 GLB | <========> | GLB <------> EC2-Instances |
. | EndPoint | | (Appliances) |
. | (Consumer)VPC | | (Firewalls) |
. +------------------------------------+ +---------------------------------+
.
Note: The ELB health check is an application-level health check rather than a load-balancing feature: it monitors an endpoint (a web page or health page) of the application. An ASG can use the ELB health check to mark an instance unhealthy and terminate/replace it.
.
. +----> Lambda-Auth
. REST/HTTP |
. Req/Response (Caching) Lambda | HTTPS | Step Func | S3
. Client <----------------> API Gateway -----> DataStream | SQS | SNS | AppRunner
. Max: 10MB 29s (IAM - SIGV4) DynamoDB | VPC-Link
. API Key/Usage Plan (WebSockets)
. (Private/Pub)
.
Edge-optimized is the default. An edge-optimized API Gateway uses CloudFront under the hood. If you call the API from within AWS, or put your own CloudFront distribution in front of it, use a Regional deployment instead.
.
. Fully Managed GraphQL API Service.
.
.
. Mobile -----------> AppSync Endpoint -----> HTTP REST | Lambda | DynamoDB
. Web App <---------- [Schema Introspection] Local Pub/Sub
. Websocket [Resolvers ]
.
.
. Lambda Publish Pub/Sub
. EventBridge -----------> AppSync Pub/Sub API [Serverless WebSockets] <--------> Mobile/Web
.
. AppSync === Sync Mobile and Enterprise Apps using GraphQL - 2 way Websockets.
.
# Example GraphQL Schema
type User {
id: ID!
name: String!
email: String!
}
type Query {
getUser(id: ID!): User
}
# Example Resolver Request mapping template
{
"version": "2017-02-28",
"operation": "GetItem",
"key": {
"id": $util.dynamodb.toDynamoDBJson($ctx.args.id)
}
}
# Example Response mapping template
$util.toJson($ctx.result)
.
. Root DNS Server (ICANN)
.
.
. Client ---> Local DNS ------> TLD DNS Server (ICANN)
. Server (.com)
.
.
. SLD DNS Server
. (mydomain.com)
.
. Records:
.
. mydomain.com A 2.3.4.5 # Maps domain name to IP
. mydomain.com A 5.6.7.8 # Multiple A records are Okay!
. www.mydomain.com CNAME mydomain.com # Subdomain alias. Must be unique.
. # Top domain can't be CNAME'ed.
. # Can't co-exist with A record for same name.
.
. mydomain.com ALIAS xyz.com # Top domain can be aliased.
. # Non-std extn DNS record type.
.
. app.xyz.com ALIAS myalb.amazonaws.com # AWS may do recursive lookup and
. # return the result as A record.
. # ALIAS lookups are free for AWS!
. # No TTL. Because it is not propagated!
.
. ALIAS Record Targets:
. Load Balancers, API Gateway, S3 Websites, VPC Interface EndPoints,
. Global Accelerator accelerator, CloudFront Distributions, Elastic Beanstalk env
.
.
. Public-Hosted-Zones Private-Hosted-Zones DNS-Server DNSSEC
.
. Routing-Policy Health-Check Resolver
.
.
. 1:N Can Associate
. Route 53 Resolver ----- Resolver Rules ----- PrivateHostedZones ---------------- Other VPCs
. |
. +-- External Domains
.
. Implicit Route 53 Resolver == VPC+2 == Amazon Provided DNS == VPC DNS Resolver
.
. RAM-Share-PHZ Rule === DNS Records (PHZ) + Forward Pointer (External Domains)
.
. Target Resources: Cloudfront, ELB, S3, API Gateway, EC2, Global Accelerator
.
.
.
.
. +--- On-Premise to Resolve VPC domains
. |
. V (Local IP, SG)
. DNS Clients in VPC ----> VPC+2 --------- Inbound Resolver Endpoint (On-premise connects to this)
. | (Optional) (Single Endpoint Resolves All PHZ)
. |
. +---- Outbound Resolver Endpoint (Local IP, SG) (Route 53 uses this)
. (One Per external domain)
.
. One Outbound Resolver ---- One external Domain only. (Create multiple Resolver Rules for multiple Domains)
.
. VPC --- (DHCP Option Set)
.
.
. 1 Resolver Rule <-----> 1 Domain only.
. 1 Resolver Rule <-----> Atmost 1 Resolver endpoint Only (For outbound resolver rule only)
.
.
aws route53 list-hosted-zones
aws route53 list-health-checks
# All resolver rules are meant to be active. You don't disable/enable resolver rules.
aws route53resolver list-resolver-rules # List resolver rules in current account.
# Domain name: . ==> Internet Resolver
#
# Domain name: example.com ==> TargetIps: 10.21.1.5:53; (Target DNS Server. External Domain Only)
# RuleType: FORWARD
# ResolverEndpointId: rslvr-out-xxxx (Local IP endpoint.)
# STATUS: COMPLETE | FAILED | ACTION_NEEDED
#
# Domain name: anothervpc.com OwnerId: <Owner-account-id> (For external VPC only)
# ShareStatus: 'SHARED_WITH_ME'
# RuleType: SYSTEM (No resolver endpoint for this)
#
# List VPC-level resolver endpoints. Not all resolver endpoints need to be associated with resolver rules.
# Endpoint does not include the domains that it is responsible for.
# You should name the endpoint properly if you plan to associate rules later.
aws route53resolver list-resolver-endpoints
{
"ResolverEndpoints": [
{
"Id": "rslvr-out-1234567890abcdef0",
....
"Name": "OutboundResolverEndpoint",
"Direction": "OUTBOUND",
"IpAddresses": [ {
...
"SubnetId": "subnet-0abc1234def567890",
"Ip": "10.0.1.10", # Local of VPC local IP addresses where DNS server runs.
# This can be used by on-premise servers also depending on SG.
# Usually one Outbound Endpoint for One domain.
}, ... ]
},
{
"Id": "rslvr-in-abcdef1234567890",
...
"Name": "InboundResolverEndpoint",
"Direction": "INBOUND",
}
]
}
.
.
. Active-Active And Active-Passive (Failover) Application-Routing
.
. Target: NLB, ALB, API Gateway Cross-Region
.
. [All Healthcheck + Fallback Support]
.
. +----------> Simple (Single A Record)
. +----------> Fail Over (Health Check) (Forced Passive)
. +----------> Latency Based (Health Check)
. DNS Name ----> Route 53 |----------> Weighted Records e.g. 70% / 30%
. +----------> Geo Location (Check client IP)
. +----------> Geo Proximity (Check client IP)
. +----------> Multi Valued (upto 8) A Records (Client LB)
. +----------> IP Based (Client IP CIDR <--> A Record Mapping)
.
.
Different Routing Policies exist for returning values for resolution:
|
| Private-Hosted-Zone ------ VPC1, VPC2, VPC3 (Same Account)
| |
| +--- External VPC (Using CLI only)
There are public and private hosted zones. A private hosted zone is for using Route 53 from within a VPC; you must enable enableDnsHostnames and enableDnsSupport in the VPC settings.
A public hosted zone serves your public DNS name to the world.
You can associate more VPCs in the same account with a private hosted zone:
aws route53 associate-vpc-with-hosted-zone --hosted-zone-id <value> --vpc VPCRegion=<region>,VPCId=<vpc-id>
To associate another account (B) VPC to your (A) private hosted zone:
aws route53 list-hosted-zones
aws route53 list-vpc-association-authorizations --hosted-zone-id <hosted-zone-id>
aws route53 create-vpc-association-authorization --hosted-zone-id <hosted-zone-id>
--vpc VPCRegion=<region>,VPCId=<vpc-id> --region us-east-1
# From Account B
aws route53 associate-vpc-with-hosted-zone --hosted-zone-id <hosted-zone-id>
--vpc VPCRegion=<region>,VPCId=<vpc-id> --region us-east-1
# From Account A
aws route53 delete-vpc-association-authorization --hosted-zone-id <hosted-zone-id>
--vpc VPCRegion=<region>,VPCId=<vpc-id> --region us-east-1
Here is an example to monitor an ALB where IP address is not known:
aws route53 create-health-check --caller-reference unique-alb-check-98765 \
--health-check-config '{
# Use IPAddress to monitor EIP or known endpoint.
"FullyQualifiedDomainName": "my-alb-1234567890.us-east-1.elb.amazonaws.com",
"Port": 80,
"Type": "HTTP", # "TCP" to monitor NLB.
"ResourcePath": "/health", # Or just "/"; It should just return 200
"RequestInterval": 30,
"FailureThreshold": 3
}'
Healthcheck monitors endpoints such as application (ALB), server, other AWS resource. Healthcheck can also monitor other healthchecks (Calculated Health Checks).
Healthcheck that monitors CloudWatch Alarms.
There are about 15 global health checkers available.
Route 53's global health checkers sit outside your VPC, so they cannot reach private hosted zone (VPC-internal) endpoints; instead publish a custom CloudWatch metric, create an alarm on it, and have the health check monitor that alarm.
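A hedged sketch of attaching such a health check to a failover record set (zone ID, DNS names, and health-check ID are placeholders):
    aws route53 change-resource-record-sets --hosted-zone-id Z123EXAMPLE --change-batch '{
      "Changes": [{ "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "app.example.com", "Type": "CNAME", "TTL": 60,
          "SetIdentifier": "primary", "Failover": "PRIMARY",
          "HealthCheckId": "<health-check-id>",
          "ResourceRecords": [{ "Value": "my-alb-123.us-east-1.elb.amazonaws.com" }] } }]
    }'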
.
. Resolver Forwarding Rules
. VPC |
. V
. Route 53 -----> Outbound EndPoint ---> On-Premise Server
. Resolve domain.com ---> Resolver <----- Inbound EndPoint <--- On-Premise Server
. (VPC+2)
.
Multi-Availability Zone recovery:
Commands:
# Cells in ARC define the logical groups of resources
# Readiness Checks validate the health of your resources in each cell.
aws route53-recovery-readiness create-cell --cell-name PrimaryAppCell
aws route53-recovery-readiness create-cell --cell-name SecondaryAppCell
aws route53-recovery-readiness create-readiness-check --readiness-check-name PrimaryAppReadinessCheck --resource-set-name PrimaryResources
aws route53-recovery-readiness create-readiness-check --readiness-check-name SecondaryAppReadinessCheck --resource-set-name SecondaryResources
# Create Routing Controls to Manage Traffic
aws route53-recovery-control-config create-routing-control --routing-control-name PrimaryRegionControl --cluster-arn <cluster-arn>
aws route53-recovery-control-config create-routing-control --routing-control-name SecondaryRegionControl --cluster-arn <cluster-arn>
# Set Up Route 53 Health Checks and Failover Policies
aws route53 create-health-check --caller-reference primary-health-check \
--health-check-config IPAddress=xx.xx.xx.xx,Port=80,Type=HTTP,ResourcePath="/"
# Perform Manual Failover (Test) Using ARC
# Run these against one of the ARC cluster's regional endpoints.
aws route53-recovery-cluster update-routing-control-state \
    --routing-control-arn <primary-routing-control-arn> --routing-control-state Off
aws route53-recovery-cluster update-routing-control-state \
    --routing-control-arn <secondary-routing-control-arn> --routing-control-state On
.
. Global-Application-Router Global-AWS-Network TCP+UDP
.
. Endpoint-Group-per-Region Health-Check Listeners-with-Port
.
. Intelligent-Routing Static-IPs+DNS-Name
.
. Upto 10 Endpoints
. Global-Accelerator ---------------------> ALB, NLB, EC2, IP
. Listeners
. :80 :443
.
. 1 Endpoint Group Per Region.
.
. Routing and Healthcheck independent of Route 53.
.
Its static IPs are anycast, so multiple servers can share the same IPs, as stateless services like DNS servers do.
Better auth support and standard well documented.
With HTTP APIs you can integrate directly with AWS services using first-class integrations. Note: there is a payload limit of 10 MB going through API Gateway.
The following services are supported (see the API Gateway diagram above): Lambda, Step Functions, SQS, SNS, Kinesis Data Streams, DynamoDB, and private resources via VPC Link.
You can create VPC Link by specifying the subnets and then integrate using that VPC Link.
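For instance (names, IDs, and the listener ARN are placeholders):
    aws apigatewayv2 create-vpc-link --name my-vpc-link \
        --subnet-ids subnet-0aaa subnet-0bbb --security-group-ids sg-0ccc
    # Then create an HTTP_PROXY integration that targets an internal ALB/NLB listener via the link
    aws apigatewayv2 create-integration --api-id a1b2c3 --integration-type HTTP_PROXY \
        --integration-method ANY --connection-type VPC_LINK --connection-id <vpc-link-id> \
        --integration-uri arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/xxx/yyy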
Expose third party http service with auth and other support.
|
| CDN Regions(34) AZs(108) Regional-Edge-Caches(13) Edge-Locations(215) POPs(600)
|
| WebSockets Origins OrginGroups OAI OAC Lambda@Edge RestrictViewerAccess
|
| CloudFront-Signed-URL Signed-Cookies Cache-Behaviour Geo-Restrictions-Blocking
|
|
| new HTTP session Supported Origins
| https https/http
| Viewer ---------> Cloudfront ---------------> ALB | EC2 | API GW | HTTP
| us-east-1-SSL-Cert
| Lambda@Edge
|
| +--- Origin Group ------+
| | Fail-Over |
| CloudFront ----------->| Primary Origin |
| | Secondary Origin |
| | Health Check |
| +------------------------+
|
Content Delivery Network (CDN)
Content cached at the edge
225+ points of presence (215 Edge Locations and 13 Regional Edge Caches)
Protection against DDoS attacks and integration with AWS Shield, WAF and Route 53
can talk to internal HTTPS backends
Supports Websockets
Supported Origins:
- S3 Bucket for distributing files.
- Above Works with Origin Access Control (OAC) replacing Origin Access Identity(OAI)
- Can be used as ingress to upload files to S3. (using S3 transfer acceleration)
- S3 Bucket configured as website.
- Mediastore Container to deliver Video on Demand (VOD) using AWS Media Services
- Custom origin HTTP:
+ API Gateway
+ EC2 instance
+ ALB or CLB
+ Any HTTP backend
Custom Origins:
Custom Origins (like EC2 and ALB) need not whitelist client IPs
but should whitelist edge location IPs.
The EC2 (or ALB) must be available using public IP not private.
To prevent others from directly accessing the EC2/ALB origin, you can configure CloudFront to add
a custom HTTP header name=value as a shared secret, and filter requests on that header at the
backend (see the rule sketch after this list).
You can also use the security group of EC2 to allow only edge location IPs.
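A sketch of enforcing that secret header at the ALB, as referenced above (listener ARN, header name/value, and target group are placeholders):
    aws elbv2 create-rule --listener-arn <listener-arn> --priority 1 \
        --conditions '[{"Field":"http-header",
            "HttpHeaderConfig":{"HttpHeaderName":"X-Origin-Verify","Values":["<random-secret>"]}}]' \
        --actions Type=forward,TargetGroupArn=<target-group-arn>
    # Make the listener's default action a fixed 403 response so requests without the header are rejected.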
CloudFront `Origin Groups`:
Origin Groups help to increase HA and failover.
CloudFront ---------> Origin Group (Two EC2s in different regions)
You can specify 2 Origins in a group, e.g. EC2 in 2 different regions.
If the request returns error code, it will be retried in second Origin.
Cloudfront + API Gateway -- Multi-Region Architecture:
. DynamoDB
. Lambda@Edge +--> API GW Region1 -- Lambda --> Global DB
. Client ---> CloudFront ----------------| |
. +--> API GW Region2 -- Lambda -------+
.
def lambda_handler(event, context):
request = event['Records'][0]['cf']['request']
headers = request['headers']
country_code = headers.get('cloudfront-viewer-country', [{}])[0].get('value')
if country_code in ['US', 'CA']:
request['origin'] = {
"custom": {
"domainName": "us-origin.example.com", /* Must be already configured */
"port": 443,
"protocol": "https",
"path": "",
"sslProtocols": ["TLSv1.2"],
"readTimeout": 5,
"keepaliveTimeout": 5,
"customHeaders": {}
}
}
elif ...:
Signed URLs and signed cookies achieve the same purpose. The following query parameters are reserved for signing; do not use them in your application:
Expires
Policy # Canned policy or Custom Policy
Signature
Key-Pair-Id Trusted-signers
.
. https://abc.com
.
. OAC
. Restrict Viewer Access (/private or /public URL based)
.
. Application Signed URL: Sign using Certificate trusted by Cloudfront.
. /private/file?Signature=xxxxx
.
. OAI - Virtual user for Cloudfront. Used to give permission to read S3 from cloudfront.
.
. Viewer-Protocol-Policy Origin-Protocol-Policy
.
. Enable Restrict-Viewer-Access == Require SignedURL
.
A CloudFront distribution can be configured with Restrict Viewer Access (signed URLs only).
In that case, you also need to configure a trusted key group (the RSA public key; the matching private key is used to sign URLs).
You can use Cache-Behaviour path patterns to enable Restrict-Viewer-Access only for some paths.
A CloudFront Signed URL is generated by a trusted signer using the CloudFront key pair.
A CloudFront Signed URL applies to any origin path (S3 or not) and leverages caching, whereas an S3 pre-signed URL applies only to S3 objects.
Note that an S3 pre-signed URL uses the access-key/secret-key and HMAC for its signature, whereas CloudFront uses a special key pair attached to CloudFront for signing.
A CloudFront signed URL looks like this (uses the key pair):
https://d111111abcdef8.cloudfront.net/path/to/file.jpg?
Expires=1669999200&
Signature=EXAMPLESIGNATURE& <-- Algorithm: RSA-SHA256 (RSA-SHA1 is legacy)
Key-Pair-Id=APKAIXXXXXXXXXXXX
S3 Signed URL looks like this (Uses your IAM access Keys):
https://my-bucket.s3.amazonaws.com/my-object?
X-Amz-Algorithm=AWS4-HMAC-SHA256&
X-Amz-Credential=YOUR-ACCESS-KEY-ID/20240329/us-east-1/s3/aws4_request&
X-Amz-Date=20240329T120000Z&
X-Amz-Expires=3600&
X-Amz-SignedHeaders=host& <-- Host: my-bucket.s3.amazonaws.com is also signed.
X-Amz-Signature=EXAMPLESIGNATURE <-- HMAC Hash based Msg Auth Code signature.
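A minimal Python sketch (not from the notes) that generates both kinds of URL; the key file, key-pair id, distribution domain, bucket and object below are placeholder values:
import datetime
import boto3
import rsa                                     # pip install rsa
from botocore.signers import CloudFrontSigner

def rsa_signer(message):
    # Private half of a key whose public half is in a CloudFront trusted key group (placeholder file).
    with open('private_key.pem', 'rb') as f:
        key = rsa.PrivateKey.load_pkcs1(f.read())
    return rsa.sign(message, key, 'SHA-1')     # SHA-1 signer as in the common boto3 example

cf_signer = CloudFrontSigner('APKAIXXXXXXXXXXXX', rsa_signer)
cf_url = cf_signer.generate_presigned_url(
    'https://d111111abcdef8.cloudfront.net/path/to/file.jpg',
    date_less_than=datetime.datetime(2026, 1, 1))

# S3 pre-signed URL: signed with the caller's IAM access keys (SigV4 / HMAC).
s3_url = boto3.client('s3').generate_presigned_url(
    'get_object', Params={'Bucket': 'my-bucket', 'Key': 'my-object'}, ExpiresIn=3600)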
Use Cases: Authentication, Geo customizations.
|
| Lambda@Edge, CloudFront Functions can intercept Requests.
|
| +----> External-Auth
| |
| Lambda@Edge
| |
| Viewer Request | Origin Request
| -----------------> | ---------------->
| Viewer <---------------- CloudFront <---------------- Origin(S3)
| Viewer Response Origin Response
|
| (CloudFront Func) (Lambda@Edge 4 Hooks)
| (2 Hooks only)
|
.
. Encrypt-Using Decrypt Using
. Public Key Private Key
. Request ----> Cloudfront ------------------> Origin (S3 or Custom)
.
| Managed Redis or Memcached (Key value Store)
|
| Read/Write CacheMiss
| App <----------------> ElastiCache <----------------> RDS
| SessionData
|
| Backup
| REDIS <------------> Disk Note: HA but no horizontal scaling.
| AZ1 AZ2 Restore
|
|
| Read/Write Sharded Note: Partition scaling. No HA.
| Memcached --------------> Partitions No persistence.
|
|
| Serverless Option
|
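A cache-aside (lazy loading) sketch with the redis-py client; the endpoint, key names and the load_from_rds() helper are illustrative placeholders:
import json
import redis                                    # pip install redis

r = redis.Redis(host='my-cluster.xxxxxx.ng.0001.usw2.cache.amazonaws.com', port=6379)

def get_user(user_id):
    cached = r.get(f'user:{user_id}')
    if cached is not None:
        return json.loads(cached)               # cache hit
    user = load_from_rds(user_id)               # cache miss -> read from RDS (placeholder helper)
    r.setex(f'user:{user_id}', 300, json.dumps(user))   # write back with a 5-minute TTL
    return user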
.
. Cluster Nodes Shards Primary-node Secondary-nodes
.
. Sources
. Single Region
. Lambda --------------> DynamoDB
. API Gateway Unlimited Storage
. Step Functions WCU/RCU Provisioned/OnDemand or
. Glue WCU/RCU Optional Autoscaling
. IOT Core Rules Strong or Eventual Consistent.
.
. DynamoDB Global Tables
.
. client -----------> Region-1 (Read/Write) (Master-Master)
. (Application Region-2 (Read/Write)
. Auto-failover)
. (Eventual Consistency Only)
.
.
. Last-Writer-Wins-By-Request-Timestamp WCU RCU
.
. PartitionKey + SortKey LSI GSI Stream DAX TTL
.
. Global Tables Items Attributes Storage-Always-Unlimited
.
. Active-Active AutoScaling-AutoScales-WCU-RCU
.
. AutoScaling AdaptiveCapacity
.
.
.
. Last Writer Wins based on request timestamp;
. DAX PartitionKey or (PartitionKey+SortKey) Composite Key
. Storage Autoscaled
. Capacity Provisioned or OnDemand
. Item (Rows), Attributes (Columns), Different rows different attributes OK.
. Multiple Indexes Okay. (Global Secondary Index).
. Global Tables - Cross Region - Active
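A minimal boto3 sketch of writing and querying a table keyed by PartitionKey + SortKey; the table and attribute names (Orders, CustomerId, OrderDate) are illustrative only:
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Orders')

# Item = row; different items can have different attributes.
table.put_item(Item={'CustomerId': 'c-1', 'OrderDate': '2024-03-29', 'Total': 42})

# Query by partition key + sort-key condition; ConsistentRead gives a strongly
# consistent read (not available on a GSI).
resp = table.query(
    KeyConditionExpression=Key('CustomerId').eq('c-1') & Key('OrderDate').begins_with('2024'),
    ConsistentRead=True)
print(resp['Items'])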
|
| ELK (ElasticSearch Logstash Kibana)
| Realtime Indexing FullText
|
| Logs --------> OpenSearch -----------> Kibana (Realtime)
| Clickstream
| Cloudwatch Logs
|
. Auto FailOver
. <--------------------> Manual Failover
. 60s 35s
.
. Multi-AZ [Complementary]
.
. Primary StandBy Readable-Standby Read-Replica AutoFailover Storage
. AZ1 AZ2 AZ3
. RDS yes Sync Sync Up to 15 By standby (60s/35s) Independent
. Aurora yes .... ............... Async (max:15) By Replica (35s) Cluster
. Global A. yes .... ............... 1+5 Regionsx15 By Replica (60s) Cluster
. Srvless A. Automatic Cluster
.
. Aurora implies Cluster and Logical Shared Storage Layer.
.
. RDS-PITR AutoScaling (Read Replicas and Storage)
.
.
. Multi-AZ Deployment and Read Replicas are complementary and independent features.
. Multi-AZ + 1 Standby Instance = Multi AZ RDS Instance Deployment.
. Multi-AZ + 2 Standby Instances = Multi AZ RDS Cluster Deployment. (Standby Is Readable)
.
. PITR == Just enable Backup Retention Time: 1 to 35 days.
.
- KMS encryption at rest for underlying EBS volumes/snapshots.
- Transparent Data Encryption (TDE) for Oracle and SQL server.
- SSL encryption to RDS is possible for all DB (in-flight)
- IAM auth for MySQL, PostgreSQL and MariaDB; Auth still happens within RDS.
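A sketch of IAM database authentication, assuming a placeholder MySQL endpoint and a DB user created for the AWS authentication plugin; the caller also needs rds-db:connect permission:
import boto3
import pymysql                                   # pip install pymysql

host = 'mydbinst.abcdxxx.us-west-2.rds.amazonaws.com'
# Short-lived (15 min) token used in place of a password; auth still happens within RDS.
token = boto3.client('rds').generate_db_auth_token(
    DBHostname=host, Port=3306, DBUsername='iam_user')

conn = pymysql.connect(host=host, port=3306, user='iam_user', password=token,
                       ssl={'ca': 'rds-ca-bundle.pem'})   # SSL required; CA bundle path is a placeholder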
Aurora is single region only. Only Aurora Global supports multi-region.
.
. Primary(Writer) Replica (Upto 15) Max: 3 AZs
.
.
.
. AZ1 AZ2 AZ3 Total
. Storage 2 2 2 6 copies using Storage Based Replication
.
. Quorum based writes - 4 out of 6 writes should be complete.
.
. Multi-master Active-Active possible for Aurora MySQL only (not postgres) but deprecated.
.
. Auto-Failover to ReadReplica.
.
.
. RDS Aurora
.
. Independent DB Instances DB Cluster. Always.
. (Multi-AZ+2RR is called cluster but independent.)
. Separate Storage Logical Shared Storage Volume
. Fail-Over with Multi-AZ Standby Auto Fail-Over by Read-Replica
. Multi-AZ + Passive or 2 RR Passive Standby NA. Only Read-Replicas.
. Can increase storage size, type later online Aurora storage is always auto scalable.
. (Just specify min and max storage limits)
. Support Oracle, SQL Server also. Only MySQL and Postgres
. Multi-AZ replication synchronous. Read-Replicas Asynchronous.
.
. Disk: gp2, gp3, io1, io2 (Explicit) 16 TB Max. Disk: Auto managed. Auto IOPS scaling. 64 TB Max.
.
. Example Writer/reader endpoints:
.
. mydbinst.abcdxxx.us-west-2.rds.amazonaws.com my-aurora-cluster.cluster-abcdefghij.us-west-2.rds.amazonaws.com
. mydbinst-ro.abcdxxx.us-west-2.rds.amazonaws.com my-aurora-cluster-ro.cluster-abcdefghij.us-west-2.rds.amazonaws.com
.
- Cluster Endpoint (aka Writer Endpoint): Connects to primary DB instance
- `Reader Endpoint`: List of (ip+port) of all Read Replicas.
- `Custom Endpoint`: use a subset of DB instances for a specific purpose, e.g. some instances configured as xlarge and others as large, etc.
- `Instance Endpoint`: Specific instance endpoint to troubleshoot/fine tune that instance.
- Note: RDS Proxy for Aurora is also available for read-only endpoints.
- `Performance Insights`: find issues by waits, SQL statements, hosts and users.
- `CloudWatch Metrics`: CPU, Memory, Swap Usage
- `Enhanced monitoring metrics`: At host level.
- Slow Query logs
- Automated DB instantiation and auto scaling.
- Proxy fleet
- Data API (no JDBC connection needed). Secure HTTPS endpoint to run SQL statements. Users must be granted permissions to Data API and Secrets manager.
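A sketch of using the Data API from boto3; the cluster ARN, secret ARN, database and SQL are placeholders, and the caller needs permissions for the Data API and Secrets Manager:
import boto3

rds_data = boto3.client('rds-data')
resp = rds_data.execute_statement(
    resourceArn='arn:aws:rds:us-west-2:123456789012:cluster:my-aurora-cluster',
    secretArn='arn:aws:secretsmanager:us-west-2:123456789012:secret:my-db-secret',
    database='mydb',
    sql='SELECT id, name FROM users WHERE id = :id',
    parameters=[{'name': 'id', 'value': {'longValue': 1}}])
print(resp['records'])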
| Aurora Global
|
| Primary-Region Secondary-Region (Up to 5)
| Cluster Cluster
| (Write Forwarding)
|
| 1 + 5 Regions
| 16 Read Replicas (Instances)
|
|
| RPO RTO < 1min
| <--------------><------------------>
| --------CheckPoint------Disaster----------Recovered----------
|
|
| Fail-Over within Region is automatic.
| Fail-Over cross-region requires manual selection of read-replica to promote.
|
|
1 Primary Region and 5 secondary (read-only) regions with replication lag < 1 sec,
up to 16 read replicas per secondary region.
Available for both MySQL and PostgreSQL
Need to specify on creation of Aurora instance itself. (e.g. Engine MySQL; Edition: Global, etc)
Recommended only for truly distributed applications.
Promoting another region for DR has an RTO (Recovery Time Objective) < 1 minute.
You can manage RPO - Recovery Point Objective - Tolerance for data loss.
Aurora Global databases provide a managed RPO. For Aurora PostgreSQL you can set the RPO as low as 20 seconds; typical cross-region replication lag is under 1 second.
Provides Write Forwarding from secondary DB clusters to Primary cluster. It reduces the number of endpoints to manage. Behaves like Active/Active though it is Active/Passive.
Switch over (aka Managed Planned failover) can be used to trigger switch on healthy instance.
You can also trigger "Failover" to recover from unplanned failures, possibly with some data loss. There is an RPO setting for Aurora PostgreSQL.
. Backup Expires
. RDS --------> Backup-Vault -------> After Retention Period
.
. Manual (RDS Internal) +-----> Delete to remove
. RDS --------> Snapshot ----+ Import
. Snapshot +-----------> S3-Bucket ---------------> RDS/DB Instance
. S3-Export (Parquet Files) Glue ETL
. Custom Scripts
.
.
. RDS (Automated) Backup ===> (RDS-Instance, Backup-Time) (No separate backup Identifier)
.
. RDS (Manual) Snapshot ===> Snapshot-Identifier
.
Following notes are specific to (manual) `snapshots`:
Can not change encryption status during backup/snapshot :
.
. Backup/Snapshot
. RDS Instance ------------------> Encrypted Only
. (Encrypted)
. Backup/Snapshot
. RDS Instance ------------------> UnEncrypted Only
. (UnEncrypted)
.
# For Automated Backups.
aws rds modify-db-instance --db-instance-identifier mydbinstance --backup-retention-period 3
# For manual DB Snapshot
aws rds create-db-snapshot --db-instance-identifier database-mysql --db-snapshot-identifier mydbsnapshot
On the fly encryption during a Restore is supported :
.
. RDS Restore Encrypted
. Backup/Snapshot --------------------> RDS Instance
. (Unencrypted) Specify KMS Key
.
# From manual snapshot, use restore-db-instance-from-db-snapshot or restore-db-cluster-from-snapshot
# Restores the manual snapshot into a new cluster; --kms-key-id encrypts/re-encrypts with that key.
aws rds restore-db-cluster-from-snapshot --db-cluster-identifier newdbcluster \
    --snapshot-identifier my-db-snapshot \
    --engine aurora-mysql \
    --kms-key-id xxx
# From automated backup
aws rds restore-db-cluster-to-point-in-time --source-db-cluster-identifier database-4 \
--db-cluster-identifier sample-cluster-clone \
--restore-type copy-on-write \
--use-latest-restorable-time
aws rds restore-db-instance-to-point-in-time --source-db-instance-automated-backups-arn "arn:*" \
--target-db-instance-identifier my-new-db-instance \
--restore-time 2020-12-08T18:45:00.000Z [--use-latest-restorable-time]
Decryption of backup/snapshot possible only through native export/import :
.
. Decryption is only possible through mysqldump or a native export:
.
. RDS Restore Encrypted Export Restore UnEncrypted
. Backup/Snapshot ---------> RDS -------> mysqldump -------> RDS Instance
. (Encrypted) Instance
.
.
Restoring from an encrypted backup/snapshot across regions involves a KMS copy-grant operation:
.
. RDS Backup Copy Grant Key
. Region-1 ------------------> Restore From Region-2
. (local KMS Key to Re-encrypt)
.
aws kms create-grant --key-id xxx --grantee-principal arn:*:role/keyUserRole --operations Decrypt
aws kms list-grants [--key-id xxx ]
# Execute following from destination region (us-east-1)
aws rds copy-db-snapshot \
--source-db-snapshot-identifier arn:aws:rds:us-west-2:123456789012:snapshot:mysql-instance1-snapshot-20161115 \
--target-db-snapshot-identifier mydbsnapshotcopy \
--kms-key-id my-us-east-1-key # Re-encrypts using new key in destination region.
aws rds restore-db-instance-from-db-snapshot ... # Restore from the new copied snapshot
You can create on-the-fly encrypted snapshot from unencrypted snapshot by copying :
.
. Copy Snapshot
. Unencrypted-Snapshot ------------------> Encrypted Snapshot
. Encrypt KMS
.
# Possible for snapshots only. Note: You can not copy a backup.
aws rds copy-db-snapshot ... --kms-key-id arn:*:key/my-kms-key
You cannot add an encrypted Read Replica to an unencrypted RDS instance, and vice versa:
.
. Source-RDS ---------> New Read-Replica (Encryption status should match)
. Encrypted ---------> Encrypted
. UnEncrypted ---------> UnEncrypted
.
| Restore
| Backup ------------> New RDS or Aurora
| (RDS/Aurora)
|
| MySQL On-Premises Upload Restore
| Backup --------> S3-Backup-File ----------> RDS/Aurora
|
| Restore
| RDS -----> Snapshot -----------> New Aurora DB
|
| Create New New Promote
| RDS -----------------------> Aurora ----------> New Aurora DB
| Aurora Read Replica Replica
| Clone
| Aurora-DB1 -------------------> cloned-DB
| Copy-on-write
Fast cloning to create a new Aurora cluster is supported. The original volume data is reused until a write happens (copy-on-write).
.
. Enable Logging using
. RDS MySQL ------------------------> File | Table | CloudWatch
. DB Parameter Group
.
aws rds describe-db-log-files --db-instance-identifier <your-instance-id>
aws rds download-db-log-file-portion ... --log-file-name <name> --output text > logfile.txt
SELECT * FROM mysql.slow_log ORDER BY start_time DESC LIMIT 10;
Use Cases: Product recommendation, fraud detection, ads targeting, sentiment analysis.
Amazon Aurora machine learning (ML) enables you to add ML-based predictions to your applications via the familiar SQL programming language.
It provides secure integration between Aurora DB and AWS ML services without having to build custom integrations or move data around.
.
.
.
. SQL Query
. Application ------------------------------------> Aurora ML
. Recommended Products?
.
. SageMaker AWS Comprehend
. (ML Modeling)
.
. DocumentDB Cluster == 1 Primary (Writes) + upto 15 Read Replicas.
.
With AWS Step Functions, you can create and run complex workflows based on state machine. Serverless Solution.
Max Duration: 1 Year (std workflow); 5 mins (express workflow)
Alternatives: Run Batch Job or Simple Lambda
.
. Invoke
. EventBridge Or ------> StepFunction --->Task1 -> Lambda--> Task3 --> ...
. CloudWatch Alarm+Lambda Workflow-in-Workflow (Parallel)
.
. Task = HTTP Call | Glue:StartJobRun | AWS SDK | AWS Batch Job | Athena
.
.
. Batch-Input OK.
.
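A minimal boto3 sketch that starts a Standard workflow execution and checks its status; the state machine ARN and input payload are placeholders:
import json
import boto3

sfn = boto3.client('stepfunctions')
execution = sfn.start_execution(
    stateMachineArn='arn:aws:states:us-west-2:123456789012:stateMachine:MyWorkflow',
    input=json.dumps({'orderId': '1234'}))

# Standard workflows can run up to 1 year; poll the status or react via EventBridge.
status = sfn.describe_execution(executionArn=execution['executionArn'])['status']
print(status)     # RUNNING | SUCCEEDED | FAILED | ...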
|
| Max Unlimited (std) and 300 Msgs/sec; 3000 Msgs/sec with batching of 10 msgs for FIFO;
| Std or FIFO Max-Msg-Size:256KB
|
| ----> SQS -----> Trigger Lambda or
| 120K Max Inflight Msgs (std) Long Poll
| 20K Max Inflight Msgs (FIFO)
|
| DeadLetterQueue VisibilityTimer DelayQueue RetentionPeriod
|
| Delivery-Delay MaxReceive (For DLQ)
|
. Parameter Values Comments
.---------------------------------------------------------------------------------------------------------------------
. Queue Type Std/FIFO Standard Queue (unlimited Rate) or FIFO (limited Rate, exactly once)
.---------------------------------------------------------------------------------------------------------------------
. Visibility Timeout 0 to 12 hours Max Processing Time. Otherwise go back to queue. Consumer must delete.
.---------------------------------------------------------------------------------------------------------------------
. Message Retention
. Period 1 min - 14 days
.---------------------------------------------------------------------------------------------------------------------
. Max Msg Size 256KB
.---------------------------------------------------------------------------------------------------------------------
. Delivery Delay 0 to 15 mins Delay before becoming visible. DelaySeconds Attribute of Message.
.---------------------------------------------------------------------------------------------------------------------
. Receive Message
. Wait Time 0 to 20 secs For Long polling max time ReceiveMessage() will wait.
. Default can be configured at Queue level or at API call level.
.---------------------------------------------------------------------------------------------------------------------
. MaxReceive 10 or any Redrive Policy: Maximum times received before going to Dead-Letter-Queue
.---------------------------------------------------------------------------------------------------------------------
. Content-Based Deduplication For FIFO queues, enable content-based deduplication.
.---------------------------------------------------------------------------------------------------------------------
.
. +-------------------------------> Dead letter Queue
. | Too many tries
. |
. SQS Queue ---> Process -----------+---> Delete From Queue
. ^ |
. | Visibility Timeout |
. +---------------------------+
. Return to Queue
.
Supports an optional Dead Letter Queue. If the consumer fails to process a message within the visibility timeout, it goes back to the queue; after MaxReceiveCount attempts it goes to the DLQ. You have to create the dead letter queue (std or FIFO, matching the main queue type) and associate it with the main queue via a redrive policy.
Typically set the retention to 14 days in DLQ.
It is possible to redrive msgs from DLQ to source queue. (using policies)
Message processing should be idempotent (a message could be delivered more than once).
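A sketch of wiring a DLQ to a main queue via a redrive policy (queue names and limits are illustrative):
import json
import boto3

sqs = boto3.client('sqs')
dlq_url = sqs.create_queue(QueueName='orders-dlq',
                           Attributes={'MessageRetentionPeriod': '1209600'})['QueueUrl']  # 14 days
dlq_arn = sqs.get_queue_attributes(QueueUrl=dlq_url,
                                   AttributeNames=['QueueArn'])['Attributes']['QueueArn']

# Main queue: after 5 failed receives a message is moved to the DLQ.
sqs.create_queue(QueueName='orders', Attributes={
    'VisibilityTimeout': '60',
    'RedrivePolicy': json.dumps({'deadLetterTargetArn': dlq_arn, 'maxReceiveCount': '5'})})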
"Lambda Event Source Mapping" feature of Lambda allows triggering lambda on various conditions including when SQS queue fills up with N entries. This is useful for batch processing.
Lambda supports newer feature Lambda Destinations (for chain processing?) When event processing fails Lambda can either insert into DLQ or use this Lambda destination feature to send it to another Lambda or SNS etc.
Example pattern architecture for better decoupling and load balancing:
. SQS Request Queue
. Client <--> <--> Work Processor
. SQS Response Queue
Message timers can set a delay for individual messages of up to 15 mins.
Delay queues can delay message delivery by up to 15 mins; the delay parameter is set on the queue:
. Wait till message timer
. Message ----------------------------> Deliver (Individual message level wait)
.
. Wait On Delay Queue
. Message ----------------------------> Deliver after Delay queue wait
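A sketch showing both options with boto3; the queue URL and delay values are placeholders:
import boto3

sqs = boto3.client('sqs')

# (1) Message timer: delay just this one message by 120 seconds.
sqs.send_message(QueueUrl='https://sqs.us-west-2.amazonaws.com/123456789012/orders',
                 MessageBody='process-later', DelaySeconds=120)

# (2) Delay queue: every message is hidden for DelaySeconds after it is sent.
sqs.set_queue_attributes(
    QueueUrl='https://sqs.us-west-2.amazonaws.com/123456789012/orders',
    Attributes={'DelaySeconds': '300'})   # up to 900 seconds (15 minutes)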
SNS allows you to create a very large number of topics. Standard topics do not guarantee FIFO order; if you need FIFO ordering, the throughput limits are much lower.
.
. Topic Subscription Notification Email SMS
.
. EventBridge
. SDK ---> SNS Topic ----> Lambda | SMS | Email | HTTPS | FireHose | SQS
. GCM | APNS (Mobile Platform Endpoints)
.
.
. Max Publish Rate: 30K messages/second; FIFO: 3K messages or 20MB /second/topic
. Max-Topics: 100K Max-FIFO: 1K per account Subscriptions: 12.5M
. Max FIFO-Subscriptions: 100 per topic
. Max SMS: 20 /second for Promotional
. Max Msg Size: 256KB
.
. SNS is a lambda trigger
.
Subscription Workflow:
. Create Subscription
.
. using Email | | Confirm Email click
. -------------------------------->| |---------------------> Subscription Pending -> Confirmed
. Using https Endpoint | | Receive { SubscribeURL: "..." }
. | | at the https endpoint. Visit URL
. | |
. Using Firehose | | (Same account firehose -- No confirmation required)
. (Provide ARN, service role) | Topic |
. | |
. Using SQS | | Msg Format: { ..., TopicArn: <arn>, Subject: "...",
. | | Message: ".txt.or.json.." }
. | |
. Using Lambda | | From Lambda console, SNS is called a trigger.
. (SNS console or Lambda console) | | From SNS console, Lambda is called a subscription.
. | | Auto confirmed.
. | |
. Resource in other Account | | Confirm
.
.
. Many WCU Auto
. Producers ---------> Kinesis DataStream -----------> Consumers
. (PartitionKey, data) Shards-based-on-PartitionKey Firehose
. (One Writer Endpoint) Can Replay Apps (KCL, SDK)
. (VPC Endpoint) Data Analytics
.
. Max 1MB/s in or Max 2MB/s out
. 1000 records/s 2000 records/s
. -----------------> Shard --------------> Default: 4 Shards / Stream
. Dynamic Auto Scaled.
.
. Shards Auto Scaling supported. Provision: Max 500 Shards/account
.
. 2 Shards may handle 10 different PartitionKeys!
. Used as source of truth for input stream events and persisted.
. Best use for Homogenous input records.
. Imagine Kinesis datastream is for a single topic Kafka.
.
. Shard ---Handles--- Multiple Partition-Keys (the hash of the key selects the shard)
. Record ---Includes--- Partition-Key and Sequence Number (unique within the shard)
.
. Enhanced Fanout +--------> Consumer 1 (Dedicated)
. Kinesis DataStream ------------------> +--------> Consumer 2 (2MB/s Read Rate)
. +--------> Consumer 3 (Data Fanout)
.
. Note: Supports Multiple writers but single input stream endpoint only.
.
.
. +------------> Lambda --> Multi Destinations | SNS
. | (Transform Lambda)
. +--------------------+
. | [Parquet Convert] |
. Kinesis DataStream Records | [Lambda Transform] | Max 1 destination.
. Kinesis Agent ----------------> | FireHose |----> S3 | ElasticSearch |
. Cloudwatch Logs JSON | | | Custom HTTP | Redshift
. MSK Max One Stream +--------------------+ | Kinesis Data Analytics
. SDK, PutRecord ( No support for Reader Lambda )
.
. Max Delivery Streams: 5000 per account
. Max dynamic Partitions: 500
. Max Rate of Put: 2000 per second per stream
. Max Rate of Data 5 MB/sec per Stream
. Max Rate of Records 500K records/sec per Stream
.
.
. Multiple Sources -------------> FireHose -----------> Single Target
. (Subscribed/PutRecord) (32 MB Buffer, 10s Buffer Time)
.
.
. Records Dynamic Partitioning
. MSK (Kafka) -------------> FireHose ------------------------> S3
. (Partitions) (Using S3 Prefix)
.
.
. CloudWatch Log ---------------> FireHose Delivery Stream
. (Local/Remote) Subscription
.
.
. Note: There is Writer Endpoint to Write. But No Reader Endpoint to read.
. Target pre-configured and can not dynamically read output.
.
Mainly for routing data, does not persist.
Source could be applications, Kinesis DataStreams, SDK, Kinesis Agent, Client, Cloudwatch Logs and Events.
In order to get input, the Firehose delivery stream "subscribes" to CloudWatch Logs (via a subscription filter).
There can be only one active source stream bound to a Firehose delivery stream:
. 1:N 1:1
. Kinesis DataStream ------------ Kinesis Firehose ----------> Output Target
.
. 1 Datastream to multiple Firehose stream is shared (unless enhanced fanout enabled)
.
. Firehose supports single target only. Use Lambda or KCL lib for multi-target fanout.
.
Can configure to write to destination, without writing code!
`Supported Destinations`:
- S3 (Most Common)
- Redshift
- ElasticSearch (For Realtime Visualization)
- Custom HTTP
- Third Party: MongoDB, NewRelic, Datadog
- Note: DynamoDB, RDS etc not supported as target.
Destination could be S3, Redshift, OpenSearch, Custom HTTP endpoint, 3rd Party Partner destinations such as MongoDB, NewRelic, Datadog, etc possible.
Data manipulation using Lambda possible.
Batchwrites support
Firehose accumulates records in a buffer and flushes it on reaching the max size or the timeout:
Firehose latency time is high because of min buffering time required for some service integrations. e.g. S3 integration requires min 1 minute buffer time.
Note that there is no per-record Lambda invocation (it would be too costly). Instead, a Lambda can transform buffered data before delivery to the destination; buffering at Firehose must be enabled for this, and the transformation Lambda must finish within 5 mins.
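A sketch of writing to a delivery stream with boto3 (the stream name and record shape are placeholders); Firehose buffers the records and flushes them to the configured destination:
import json
import boto3

firehose = boto3.client('firehose')
firehose.put_record(
    DeliveryStreamName='clickstream-to-s3',
    Record={'Data': (json.dumps({'userId': 'u-1', 'action': 'click'}) + '\n').encode()})
# put_record_batch() can send up to 500 records per call for higher throughput.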
.
. Source SQL Lambda-Preprocess
. | | Flink Studio
. Kinesis DataStream V V
. ---------> KDA Application ------> Sinks (Firehose|Kafka|S3|Lambda, etc)
. Kinesis Firehose ^
. |
. S3 Reference Data
.
Use Cases:
Kinesis Analytics is a legacy service that is now part of Kinesis Data Analytics; Apache Flink (with SQL support) replaces it.
Input Stream could be Kinesis Datastream or Firehose.
Processes input Streams and optional Reference Table (may be in S3), Run SQL kind of query like below:
SELECT STREAM ItemID, COUNT(*) FROM SourceStream GROUP BY ItemID
The output Stream destination could be FireHose (to S3 and such), or another DataStream.
Implementation is serverless; scales automatically.
Pay for resources consumed -- but it is not cheap.
Use SQL or Flink to write the computation.
Use Case: Real-Time Clickstream Analytics :
# First, create a Kinesis Data Stream to ingest clickstream data.
aws kinesis create-stream --stream-name ClickStreamData --shard-count 1
# Create a Sample Producer to Send Click Data. Python Script.
import boto3
....
kinesis_client = boto3.client('kinesis')
while True:
...
# Send data to Kinesis
kinesis_client.put_record(StreamName='ClickStreamData',
Data=json.dumps(click_data),
PartitionKey='partitionkey'
)
....
sleep(1)
$ python send_click_data.py
# Create a Kinesis Data Analytics Application
aws kinesisanalytics create-application \
--application-name ClickStreamAnalytics \
--inputs '[{
"namePrefix": "clickstream",
"kinesisStreamsInput": {
"resourceArn": "arn:aws:kinesis:REGION:ACCOUNT_ID:stream/ClickStreamData",
"roleArn": "arn:..."
},
"inputSchema": { .... } " # { userId : <userId>, action: <action> }
}]'
# Define the SQL Query. Send Analytics output to FireHose Stream.
aws kinesisanalytics add-application-output \
--application-name ClickStreamAnalytics \
--output-configuration '{
"outputId": "clickstreamOutput",
"kinesisFirehoseOutput": {
"resourceArn": "arn:.../YourFirehoseDeliveryStream",
"roleArn": "arn:.../KinesisAnalyticsRole"
},
"sql": "SELECT userId, COUNT(action) as actionCount FROM clickstream GROUP BY userId"
}'
# Start the Application
aws kinesisanalytics start-application --application-name ClickStreamAnalytics
.
. MSK Apache Flink
. broker1 Glue ETL
. Producer ------------------> broker2 ------------------> Lambda
. Write to Topic broker3 Poll From Topic Applications
. [Partition Aware]
. [Can Reset Seek Point]
. Zoo Keeper
. Partitions Replication-Factor: 3 (2-4)
.
. Total Partitions < 2x to 3x total Brokers since concurrency is limited to brokers.
. Total Partitions ~= Max(total_producers, total_consumers)
. Partitions represents IO parallelism. Even single topic is spread over partitions.
. Total concurrent consumers is the primary factor.
.
.
. Kafka Broker = Kafka Server = Kafka Node.
. Kafka Cluster = Set of Kafka Nodes + One Zoo Keeper
.
. Kafka (Leader) Broker ----1:N------ Partitions
.
. Kafka Topic ----1:N------ Partitions (Total partitions vary by topic. You choose per Topic)
.
. 2 <= Topic Partition Replication Factor <= 4;
. Every broker owns some Partitions and replicate some.
.
| DataStreams | MSK
|----------------------------------------|----------------------------------------
| 1 MB msg size limit | 1 MB default, but upto 10 MB.
| Scales with Shards | Scales with Topics with Partitions
| Can do shard splitting & Merging | Add partitions to a topic
| TLS in-flight encryption | In-flight Encryption is optional
| Storage upto 1 year | Unlimited time. Leave in EBS.
- Kinesis Data Analytics (Managed Apache Flink - Streaming and Analytics)
- AWS Glue. Streaming ETL Jobs.
- Glue Streaming is Managed Apache Spark Streaming (Micro Batching and Spark RDDs).
- Lambda
- Any application on EC2 or ECS or EKS
Commands:
# Create the MSK Cluster
aws kafka create-cluster \
--cluster-name MyKafkaCluster \
--broker-node-group-info '{
"instanceType": "kafka.m5.large",
"clientSubnets": ["YOUR_SUBNET_ID_1", "YOUR_SUBNET_ID_2"],
"securityGroups": ["YOUR_SECURITY_GROUP_ID"],
"storageInfo": {
"ebsStorageInfo": {
"volumeSize": 100
}
}
}' \
--kafka-version "2.8.1" \
--number-of-broker-nodes 2
aws kafka describe-cluster --cluster-arn YOUR_CLUSTER_ARN
# Topics are created with Kafka client tools, not the AWS CLI:
kafka-topics.sh --create --bootstrap-server YOUR_BROKER_ENDPOINT --topic MyTopic \
    --partitions 1 --replication-factor 2
# You can use the kafka-console-producer command to send messages to the Kafka topic.
# First, install Kafka tools on your machine (or use a Docker container).
aws kafka get-bootstrap-brokers --cluster-arn YOUR_CLUSTER_ARN
kafka-console-producer --broker-list YOUR_BROKER_ENDPOINT --topic MyTopic --property "parse.key=true" \
    --property "key.separator=:"
# You can now type messages, and they will be sent to the MyTopic topic. For example:
key1: Hello, Kafka!
key2: Another message.
# Create an AWS Lambda Function to read.
# With an MSK event source mapping, Lambda receives batches of records in the
# event payload; record values arrive base64 encoded.
import base64
import json

def lambda_handler(event, context):
    for topic_partition, records in event['records'].items():
        for record in records:
            value = base64.b64decode(record['value']).decode('utf-8')
            print(f"Received message from {topic_partition}: {value}")
    return {
        'statusCode': 200,
        'body': json.dumps('Messages processed successfully!')
    }
# Set Up Event Source Mapping
aws lambda create-event-source-mapping \
--function-name MyKafkaLambda \
--event-source-arn YOUR_CLUSTER_ARN \
--topics MyTopic \
--starting-position LATEST \
--batch-size 100
.
. Run Batch Job using Docker Container.
. Cheaper since resources are released as soon as job is done.
.
. Batch Job ------> Fargate (Serverless) OR
. ECS OR
. EC2
.
. Fully Managed (Almost Serverless)
.
. Prioritized Job Queues ; Job Dependencies;
.
. Computing Environment -- Min and Max CPUs. Managed or Unmanaged.
.
Run batch jobs as Docker Images
You can use Fargate (managed) or ECS or EC2 or your own computing environment (unmanaged).
Options:
- Run on AWS Fargate (serverless)
- Dynamic provisioning of EC2 & spot instances in your VPC
- Run on your own EC2s
A Compute Environment abstracts the resources you already have (EC2 instances) or resources that can be created on demand. Say you can create at most 10 EC2 instances on demand but have 100 jobs to run; AWS Batch handles this using job queues and scheduling.
You can schedule using Amazon EventBridge
Orchestrate batch jobs using AWS Step functions.
If you have to invoke a batch job in response to S3 upload, you have two options:
- Trigger Lambda on S3 upload, and lambda invokes AWS Batch job. (Bit messy)
- Send S3 upload event to EventBridge, configure this to invoke AWS Batch Job (easier)
If you launch within VPC private subnet, make sure it has access to ECS Service. ie. Use NAT gateway or VPC endpoint for ECS.
You can also invoke job on your Own preconfigured running EC2.
SDK Application can enque your job into "AWS Batch Job Queue".
In multi-node mode, a job may use multiple EC2/ECS instances at the same time.
Use Cases: HPC, Machine Learning, ETL, Media Processing, etc.
Each job queue has a priority number attached. Higher number, higher priority.
Array jobs mechanism can be used to start identical jobs in parallel -- Each job inherits AWS_BATCH_JOB_ARRAY_INDEX environment variable that it can use to consume different inputs.
There are mechanisms like Job Dependency, Compute Environment Max vCPUs, Fair Share Scheduling that can be used to limit concurrency of batch jobs.
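A boto3 sketch of submitting an array job; the job name, queue, definition and environment variable are placeholders:
import boto3

batch = boto3.client('batch')
resp = batch.submit_job(
    jobName='nightly-etl',
    jobQueue='high-priority-queue',
    jobDefinition='etl-job-def:3',
    arrayProperties={'size': 100},     # children get AWS_BATCH_JOB_ARRAY_INDEX 0..99
    containerOverrides={'environment': [{'name': 'INPUT_PREFIX', 'value': 's3://my-bucket/in/'}]})
print(resp['jobId'])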
.
. EMR - Elastic Map Reduce
.
. Hadoop-Clusters Apache-Spark HBase Presto Flink
.
. Apache Hive (For SQL and meta-data)
.
. Master-Node ---- Core-Node (Run Tasks, Store Data)
. Task-Node (Run Temp Tasks using Spot instance)
.
.
. S3 (EMRFS) (Data lake)
. |
. EMR-Cluster ---- Hive (HiveQL) / Presto SQL (ANSI)
. |
. Spark-Jobs
. ML
.
. EMR on EKS:
. EKS ------- (Embedded Spark-Job No-EMR-Cluster)
.
- Provision EC2 instance and run CRON jobs.
- Amazon Event Bridge + Lambda (cron)
- Reactive Workflow: On EventBridge, S3, API Gateway, SQS, SNS, run lambda
- AWS Batch Job using Docker Image or scripts. (Triggered by EventBridge Schedule)
- AppRunner to Run docker Image in fargate. (Triggered by EventBridge Schedule)
- Use EMR for SPARK jobs.
.
. Glue --- Serverless Data Integration Service.
.
. Crawler - Crawl And Create Data Catalog
. Data Catalog - Hive Catalog Compatible
. ETL - Run Glue (ETL) Job in Spark environment.
. Streaming - Managed Spark Streaming
. DataBrew - Visual Data Preparation Tool to clean data.
. Studio - Create and Run Glue (ETL) jobs using Notebook
.
.
. +--------------+ Used By EMR
. S3/RDS/JDBC ----> Glue --->| Glue |------------> Athena
. DynamoDB Crawler | Data Catalog | Spectrum
. +--------------+ Glue ETL
.
.
. Extract Load
. S3 / RDS ---------> Glue ETL ----------> RedShift
. (Transform)
. (Batch Oriented)
.
You can use an AWS Glue crawler to populate the AWS Glue Data Catalog with databases and tables.
A visual data preparation tool to clean and normalize data without writing any code.
# Create glue job. Same commands for Glue ETL Job or Glue Streaming Job.
aws glue create-job --name my-glue-job --role my-glue-role
--command '{ ... "ScriptLocation": "s3://my-bucket/scripts/my_script.py", ... }'
--max-capacity 2.0
# Start a Glue Job
aws glue start-job-run --job-name my-glue-job \
--arguments '{"--input_path": "s3://...", "--output_path": "s3://..."}'
aws glue list-jobs
# To list all job runs (past executions) for a Glue job, use the get-job-runs command:
aws glue get-job-runs --job-name my-glue-job
aws glue delete-job --job-name my-glue-job
# Create a Glue Crawler
aws glue create-crawler --name my-glue-crawler --role my-glue-role \
--database-name my-glue-database \
--targets '{"S3Targets": [{"Path": "s3://my-bucket/data/"}]}' \
--table-prefix my_table_prefix_
# Start crawler.
aws glue start-crawler --name my-glue-crawler
aws glue list-crawlers
aws glue get-crawler --name my-glue-crawler --query 'Crawler.State'
# Listing All Tables Created by a Crawler
aws glue get-tables --database-name my-glue-database
# To update crawler to scan new S3 bucket ...
aws glue update-crawler --name my-glue-crawler \
--targets '{"S3Targets": [{"Path": "s3://new-bucket/data/"}]}' ...
Example Python Glue ETL Job Script:
#
# SparkContext: Automatically created when the job runs.
# GlueContext: Provides additional Glue-specific ETL functions.
# DynamicFrame: Glue’s custom data structure allowing for schema inference and flexibility.
#
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
# Initialize Spark and Glue contexts
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
# Example data source and target
datasource = glueContext.create_dynamic_frame.from_catalog(
database="my_database",
table_name="my_table"
)
# Transformation example
transformed_df = datasource.apply_mapping(
[("column1", "string", "new_column1", "string")]
)
# Save the transformed data back to an S3 bucket
sink = glueContext.getSink(
path="s3://my-target-bucket/transformed-data/",
connection_type="s3"
)
sink.write_dynamic_frame(transformed_df)
- Snapshots are point-in-time backups of a cluster, stored in S3.
- snapshots are incremental. Restore into new cluster.
- Automated, every 8 hours, every 5 GB or on a schedule. Set Retention.
- Can configure to copy snapshots of a cluster to another region. You need to enable "snapshot copy grant" for destination region to use appropriate destination KMS key while copying snapshot.
- It is common to enable automatic copy to another destination region. You can also make a copy from automatic snapshot to create manual snapshot.
- To copy snapshots for AWS KMS–encrypted clusters to another AWS Region,
- create a grant for Amazon Redshift to use a customer managed key in the destination AWS Region.
- Then choose that grant when you enable copying of snapshots in the source AWS Region
Commands:
aws redshift enable-snapshot-copy \
    --region us-east-1 \
    --cluster-identifier cc-web-data-cluster \
    --destination-region us-west-1 \
    --retention-period 7 \
    --manual-snapshot-retention-period 14
aws redshift create-cluster-snapshot --cluster-identifier mycluster --snapshot-identifier my-snapshot-id
# Prepare auto snapshot copy across regions when encryption enabled ...
# Execute the following in destination region:
aws redshift create-snapshot-copy-grant --snapshot-copy-grant-name my_copy_grant
# Enable auto snapshot copy from source to another region. Execute the following in source region:
aws redshift enable-snapshot-copy --cluster-identifier mycluster --destination-region us-west-1 \
    --snapshot-copy-grant-name my_copy_grant
.
. SQL
. Client -------> Redshift-Cluster ----> Redshift-Spectrum <------ S3
.
- Query S3 data along with Relational tables.
- `Redshift Cluster must be running` to use Spectrum.
- Serverless and auto allocated resources. Bigger your redshift, bigger allocation.
Pricing is based only on the amount of data scanned (e.g. $5 per terabyte scanned).
- Query looks like: Select * from S3.ext_table ...
- Existing redshift processes the query using Redshift Spectrum nodes.
- Store S3 objects in Apache Parquet format for better columnar performance.
- You need to associate IAM role with Redshift cluster to access S3 files.
- You need to create external schema to specify external table on S3 location.
In addition you need data catalog which can be Hive Catalog in EMR or Athena (simpler
and can be managed by Redshift).
- To prevent short-running queries from getting stuck behind long-running queries
- Define multiple query queues, route queries to appropriate queues.
- Internally, there are superuser queue, short-running queue, long-running queue
- Automatic WLM queues and resources managed by Redshift.
- Manual WLM - queues managed by user.
- When enabled, adds automatic additional cluster capacity (i.e. Concurrency-scaling cluster)
- uses WLM (workload management feature) to decide which queries sent to additional cluster
- Charged per second.
Fully managed Graph, Analytics, Serverless Database.
|
| HA ; 3 AZ; 15 Replicas. Scales to billions of relationships.
|
| R1 R2
| A ----> B <------ C
|
|
.
. Keyspaces === Managed Apache Cassandra
.
.
. Serverless HA 1000RPS 3x Replication
.
. Peer-to-Peer cluster Write-Intensive
.
. Netflix (For Logging), Instagram
.
.
.
. S3 with Glue Catalog ODBC
. S3 with Hive Metastore ----> Athena ---------> QuickSight
. CloudWatch Logs [Glue Crawler] JDBC SQL Editor Results
.
.
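A boto3 sketch of running a query against a Glue Catalog table and reading the results; the database, table and output bucket are placeholders:
import time
import boto3

athena = boto3.client('athena')
qid = athena.start_query_execution(
    QueryString='SELECT status, COUNT(*) FROM access_logs GROUP BY status',
    QueryExecutionContext={'Database': 'weblogs'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/'})['QueryExecutionId']

# Poll until the query leaves the QUEUED/RUNNING states, then fetch the rows.
while athena.get_query_execution(QueryExecutionId=qid)['QueryExecution']['Status']['State'] \
        in ('QUEUED', 'RUNNING'):
    time.sleep(1)
print(athena.get_query_results(QueryExecutionId=qid)['ResultSet']['Rows'])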
AWS Lake Formation is a service that is an authorization layer that provides fine-grained access control to resources in the AWS Glue Data Catalog.
. AWS Lake Formation
.
. Source Crawlers
. Data Catalog (Glue) RedShift ----> QuickSight
. Source ingest Security Settings ---> Athena
. S3 RDS --------> ETL Data Prepare EMR (Hadoop and Spark)
. On-Premise (Import by Blueprint) Apache Spark
. |
. |
. V
. DataLake (Stored in S3)
.
.
. Row-Level-Security
.
.
. DataSources: S3 (With Manifest file), Athena (coupled with Glue Catalog), RDS, JDBC, ...
.
. SPICE Engine
.
. Column-Level-Security
.
.
. EMR (Hadoop/Presto/Spark/Hive ... )
.
. S3 -----> Redshift/Spectrum --> Quicksight
.
. Amazon Athena (serverless)
.
.
. IOT Devices ------> Kinesis --> Firehose --> S3
. DataStreams ETL Lambda
.
. Athena -----> S3 Reporting Bucket --> Quicksight
. (Periodic or --> Redshift
. on ingestion event)
.
Realtime monitoring and centralized log management.
.
. Log-Groups Subscriptions (Get Log Events)
. Alarms (Send notification on Metrics threshold breach)
. Metrics (CPUUtilization from AWS/EC2, etc standard metrics available + Custom metrics)
.
. CloudWatch Events = Events Bridge: Events Rules are used for Event Driven Automation.
.
. Insights Dashboard
.
. Notification
. Alarm ----------------------> SNS --> Lambda
. EventBridge
.
. EventsBridge Events ---------> lambda | Almost all Services
.
.
. Metric Dimension Statistics
.
. Alarms Custom-Metrics
.
. Subscriptions Streaming Retention-Time Detailed-Monitoring (1min)
.
. Unified-Cloudwatch-Agent: (For Memory, etc additional-Metrics)
.
. Synthetic-Canary KMS-Encryption-at-rest
.
Service        Dimensions                        Metrics                                  Description
---------------------------------------------------------------------------------------------------------------------
EC2            InstanceId,                       CPUUtilization,                          Average CPU utilization of the instance;
               AutoScalingGroup                  NetworkIn/Out,                           bytes of incoming/outgoing network traffic;
                                                 DiskReadOps/WriteOps                     number of disk read/write operations.
---------------------------------------------------------------------------------------------------------------------
ECS            ClusterName, ServiceName          CPUUtilization, MemoryUtilization        Across ECS, per task per service.
---------------------------------------------------------------------------------------------------------------------
S3             BucketName, StorageType           BucketSizeBytes, NumberOfObjects         Total bytes stored; object count.
---------------------------------------------------------------------------------------------------------------------
DynamoDB       TableName,                        ConsumedReadCapacityUnits                Read (and write) capacity units used.
               GlobalSecondaryIndex              (also Write)
---------------------------------------------------------------------------------------------------------------------
API Gateway    ApiName, Stage, Method            Count, 4XXError, 5XXError,               Total number of API requests; number of
                                                 Latency                                  4XX/5XX errors; average response time.
---------------------------------------------------------------------------------------------------------------------
Lambda         FunctionName,                     Invocations, Duration, Errors            Total invocations; average execution time
               Resource (Version, Alias)                                                  of the function; number of invocation errors.
---------------------------------------------------------------------------------------------------------------------
RDS            DBInstanceIdentifier,             DatabaseConnections,                     Number of active database connections;
               DBClusterIdentifier               FreeStorageSpace,                        available storage space in the instance;
                                                 ReadIOPS, WriteIOPS                      disk I/O operations per second.
---------------------------------------------------------------------------------------------------------------------
ELB            LoadBalancerName,                 RequestCount, HealthyHostCount,          Total load balancer requests; number of
               AvailabilityZone                  UnHealthyHostCount, Latency              healthy/unhealthy targets; average
                                                                                          response time of requests.
---------------------------------------------------------------------------------------------------------------------
Redshift       ClusterIdentifier, NodeID         CPUUtilization, DatabaseConnections,     CPU utilization of the cluster; active
                                                 ReadIOPS, WriteIOPS                      database connections; disk read/write
                                                                                          operations per second.
---------------------------------------------------------------------------------------------------------------------
Kinesis        StreamName, ShardId               IncomingBytes, IncomingRecords,          Volume of data ingested into the stream;
                                                 ReadProvisionedThroughputExceeded        total incoming records; read operations
                                                                                          exceeding provisioned throughput.
---------------------------------------------------------------------------------------------------------------------
CloudFront     DistributionId, Region            Requests, BytesDownloaded,               Total requests served; data transferred
                                                 BytesUploaded, 4xxErrorRate,             (downloaded/uploaded); rate of
                                                 5xxErrorRate                             client/server errors.
---------------------------------------------------------------------------------------------------------------------
EBS            VolumeId                          VolumeReadBytes, VolumeReadOps           Data read in bytes; number of read/write
                                                 (also Write), BurstBalance               operations; remaining burst credits for
                                                                                          burstable volumes.
.
. Max Rate Of Logging: 5000 TPS per account per region. (Quota can be increased)
. Max Log event Size: 256 KB. (fixed)
.
. Log-Group Metric-Filter Custom-Metrics Rules Alarm
.
Sources:
Format: Log Groups: usually represents application; Log stream: instances within app
Log expiration policies: never expire, 30 days, etc
Optional KMS encryption
Can send logs to S3 (exports), Kinesis Data Streams, Firehose, Lambda, Elastic Search. Exported log data can take up to 12 hours to become available, so it is not realtime!
For realtime analysis of logs, use Logs Subscriptions
Logs can use filter expressions to generate Alarm, for example. e.g. ERROR keyword in logs can generate alarm.
Cloudwatch Logs Insights can be used to query logs and add queries to dashboard
Cloudwatch Logs Subscriptions: Allows (custom) Lambda function using subscription filter. This is realtime. It can send output to Elastic Search etc. :
.
. Lambda (custom) Realtime
. Logs --> Subscription Filter ---> Firehose (Near Realtime) --> Elastic Search
. Write to S3
.
Note: Lambda is for realtime and Firehose is near-realtime (1 minute or more).
For logs aggregation of multi-account, multi-region, you can define single Kinesis data streams and use subscription filters in all accounts to send logs to that single Data Stream.
By default, you don't get memory usage metrics from EC2. If you install (Unified) Cloudwatch Agent on EC2 and On-premises machines, then you get those metrics as well.
You can install cloudwatch agent using SSM (AWS Systems Manager) run command. Note: SSM agent is available by default on EC2 instance.
See cloudwatch agent source code:
https://github.com/aws/amazon-cloudwatch-agent/
https://github.com/docker/buildx/ #building-multi-platform-images
.
. On-Premise Send Metric-Filter Alarm
. Cloudwatch Agent -------> Log-Group --------------> Custom Metrics -------> SNS
. Logs Logs
.
aws logs put-metric-filter --log-group-name my-log-group --filter-name ErrorCountMetricFilter --filter-pattern "ERROR" \
--metric-transformations metricName=ErrorCount,metricNamespace=MyApplication/Metrics,metricValue=1
aws logs describe-metric-filters --log-group-name my-log-group
aws cloudwatch put-metric-alarm --alarm-name "HighErrorCountAlarm" --metric-name "ErrorCount" \
    --namespace "MyApplication/Metrics" --statistic "Sum" --period 300 --threshold 10 \
    --comparison-operator "GreaterThanOrEqualToThreshold" --evaluation-periods 1 \
    --alarm-actions arn:...::MySNSTopic --actions-enabled
Prometheus is an open-source monitoring and alerting toolkit for cloud-native environments.
Prometheus is known for its powerful querying language (PromQL)
Time-Series Database: Prometheus stores all data as time-series.
Service Discovery: Prometheus can automatically discover targets based on labels, has tight integration with Kubernetes.
Amazon Managed Service for Prometheus (AMP) is available. AMP integrates with AWS services like Amazon CloudWatch for alerts, IAM for access control, and Grafana for dashboards.
The CloudWatch Agent can scrape Prometheus metrics from your workloads.
Use specialized metric exporters like RDS exporter, S3 exporter, etc if needed.
For kubernetes integration:
helm install prometheus prometheus-community/prometheus
# Configure Prometheus with Kubernetes service discovery
# Scrapes metrics from all pods labeled with Prometheus metrics.
Integrate with Grafana (optionally, Amazon Managed Grafana) for visualizations
.
. CloudMap Route-53 ECS-Built-in Kubernetes-Built-In
.
. AppMesh (Uses Envoy Proxy for Service-Service communication)
.
Fully managed resource (such as microservices) discovery service.
.
. AWS CloudMap -- Microservices and others lookup by name. - Resource Discovery Service.
.
. Health Checks: Look up service and also locate healthy one.
.
. ECS and Fargate etc - Enable Service Discovery ===> Uses AWS CloudMap
.
. EKS can publish external IPs to CloudMap.
.
. Use Custom names for your application resources and endpoints.
.
. Resources Examples: DynamoDB, SQS, RDS, etc
.
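A boto3 sketch of looking up healthy instances registered in Cloud Map; the namespace and service names are placeholders (ECS Service Discovery registers these entries for you):
import boto3

sd = boto3.client('servicediscovery')
resp = sd.discover_instances(
    NamespaceName='internal.local',
    ServiceName='orders-api',
    HealthStatus='HEALTHY')
for inst in resp['Instances']:
    print(inst['Attributes'].get('AWS_INSTANCE_IPV4'), inst['Attributes'].get('AWS_INSTANCE_PORT'))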
.
. SSM
.
. Patch Manager
. State Manager
. Session Manager
. etc.
.
AWS Systems Manager is the operations hub for your AWS applications and resources. Secure end-to-end management solution for hybrid and multicloud environments.
It provides following broad capabilities :
------------------------------------------------------------------------------------------------------------
Application management - Group some resources together and name it as an application.
Dynamic property lookup, e.g. version = AppConfigLookup('version'), etc.
Deployment validators and UI as well. A/B Testing.
------------------------------------------------------------------------------------------------------------
Change management Includes Change Manager (Request/Approve changes in Application), Automation,
Change Calendar, Maintenance Windows.
------------------------------------------------------------------------------------------------------------
Node management Includes Patch Manager, Fleet Manager, Session Manager, State Manager, Compliance,
Run Command, Distributor, Inventory.
------------------------------------------------------------------------------------------------------------
Operations management Incident Manager, Explorer, OpsCenter
------------------------------------------------------------------------------------------------------------
Quick Setup Manage service configurations and deploy it in single or multiple accounts.
e.g. Setup default host management configuration (Create necessary EC2 roles, etc)
Enable periodic updates of SSM and cloudwatch agents, etc.
e.g. Create an AWS Config configuration recorder.
e.g. Patch Manager configuration in quick setup is called "Patch Policy".
------------------------------------------------------------------------------------------------------------
Shared resources Create SSM Document to share across organization. 100 Predefined.
SSM Document could be classified as:
Automation Runbook, CloudFormation Template, Command Document, App Configuration,
AWS Config Conformance Pack, Change Calendar Document, Package Doc(Distributor)
------------------------------------------------------------------------------------------------------------
It provides different features such as:
------------------------------------------------------------------------------------------------------
Node Management:
------------------------------------------------------------------------------------------------------
Patch Manager # Bulk Patch Management. Patch baseline.
Fleet Manager # View and Manage group of EC2 (and on-premise) nodes in single UI.
Session Manager # Run ssh shell
Run Command # Run SSM Command Document on selected nodes. e.g. "AWS-RunShellScript"
State Manager # Associate SSM documents with selected nodes to run once or periodically.
# Associations e.g. Patch Commands or Collect Inventory (Meta data) etc.
Distributor # Create zip as installable package. A package is a kind of SSM Document.
# Use State manager to run on schedule or use SSM Run command to install once.
Inventory # Collect Inventory meta data into S3 file (periodically).
# Integrates with Compliance reports, AWS Config, State Manager etc.
# Uses AWS:GatherSoftwareInventory SSM (Policy) document.
Compliance # View mainly Patching compliance (as per Patch Manager)
# View also State Manager association compliance.
------------------------------------------------------------------------------------------------------
Operations Management
------------------------------------------------------------------------------------------------------
OpsCenter # Track Ops Issues called OpsItems. Can use Automation Runbooks to solve issues.
# OpsItems can be auto-created from cloud watch alarms.
# Event bridge Rule can create OpsItem on Security Hub alert issued.
Explorer # Configure source for OpsData. e.g. Security Hub, Trusted Advisor, regions, etc.
# View consolidated reports of OpsItems from different sources.
------------------------------------------------------------------------------------------------------
Applications Management
------------------------------------------------------------------------------------------------------
Application Manager # Your application as logical group of resources. UI View.
# Provides AppConfig and Parameter Store.
# A/B Testing. Dynamic Props Lookup. Deployment and Validators.
App Config # Application related dynamic properties. e.g. AppConfig Lookup('enable_debug')
# A/B Testing. Consolidated UI View.
Parameter Store # Application parameters and secure strings.
# Note: Secrets Manager (vs PM) supports versions and auto rotation.
------------------------------------------------------------------------------------------------------
Change Management
------------------------------------------------------------------------------------------------------
Change Manager # Advanced Change Request Management with Approvals.
SSM Automation # Run Automation Runbook using Automation Service.
# Runbook includes Tasks aka Actions.
# Also supports: aws:executeScript action for Python or shell scripts.
# Common simple or complicated bulk IT Tasks across accounts.
Change Calendar # Restrict actions that can be performed during specific time interval.
# e.g. Do not run some automation runbooks on business hours, etc.
Maintenance Windows # Maintenance window has a schedule, registered targets and registered tasks.
------------------------------------------------------------------------------------------------------
Shared Resources
------------------------------------------------------------------------------------------------------
SSM Document # SSM Document represents actions or configurations or template or such.
# More than 100 predefined documents to share across organization.
# SSM Document could be classified as:
# Automation Runbook, CloudFormation Template, Command Document,
# App Configuration, AWS Config Conformance Pack,
# Change Calendar Document, Package Doc(Distributor)
------------------------------------------------------------------------------------------------------
SSM Document is used to specify actions or configurations or template or such. These are the categories of SSM Document:
------------------------------------------------------------------------------------------------------
Category Examples
------------------------------------------------------------------------------------------------------
Command Document AWS-RunPatchBaseline, AWS-ConfigureAWSPackage, AWS-RunShellScript
AWS-InstallApplication, AWSSSO-CreateSSOUser, AWSFleetManager-CreateUser
Automation Runbook AWS-CreateImage, AWS-CreateSnapShot, AWS-ECSRunTask,
AWSConfigRemediation-DeleteIAMUser, AWSDocs-Configure-SSL-TLS-AL2,
AWSSupport-ExecuteEC2Rescue, AWSSupport-CollectECSInstanceLogs,
AWSSupport-ResetAccess, AWSSupport-TroubleshootEC2InstanceConnect
Change Calendar Define a schedule to restrict actions. No predefined AWS document.
Application Config Application related dynamic properties. No predefined AWS documents.
Cloudformation Cloudformation template that creates or updates resources.
"AWSQuickStarts-AWS-VPC,
"AWSQuickSetupType-SSMChangeMgr-CFN-DA,
"AWSQuickSetupType-SSMHostMgmt-CFN-TA, etc.
AWS Config AWS Config managed rules and remediations.
Conformance Pack AWSConformancePacks-OperationalBestPracticesForNIST800181
AWSConformancePacks-OperationalBestPracticesforAIandML
AWSConformancePacks-OperationalBestPracticesforAPIGateway
Package Document Document used to define package. Used by SSM distributor.
AWSCodeDeployAgent, AWSEC2Launch-Agent, AWSNVMe,
                      AWSSupport-EC2Rescue, (Note: EC2Rescue is a package; ExecuteEC2Rescue is a runbook)
AmazonCloudWatchAgent
Policy Document AWS-GatherSoftwareInventory Policy document used by Inventory and State Manager
to track resources and associations (for desired state of patch compliance).
Session Document For use with SSH session from Session Manager.
AWS-PasswordReset, AWS-StartSSHSession, etc.
Systems Manager Run Command can run commands across multiple instances. No need for SSH. Results appear in the console. For example:
aws ssm send-command \
--document-name document-name \
--targets Key=tag:tag-name,Values=tag-value \
[...] # Run command on all EC2 instances with specific tag.
# Reboot EC2
aws ssm start-automation-execution --document-name "AWS-RestartEC2Instance" --parameters "InstanceId=i-xxx"
# Dedicated account for SSM Admin. (Similarly for security, you can use an Account, etc.)
# The management account for organization always has super power.
aws organizations register-delegated-administrator --account-id <delegated-admin-account-ID> \
--service-principal ssm.amazonaws.com
#
Set of rules for auto-approving which patches need to be applied. Example rules:
All "CriticalUpdates" and "SecurityUpdates" released until 1 week back.
Specific Patch to be applied or skipped. (whitelist or blacklist)
For Windows rule1 and For Mac Rule2
Default patch baseline is the default set of rules to apply.
The patch operations are: `Scan`, and `Scan and Install`. The following methods are available:
SSM document can be runbook or cloudformation template or a command document etc.
The SSM Command document can be used for patching OS and applications. Uses the default patch baseline if no patchgroup specified.
e.g. You can use the document AWS-RunPatchBaseline to apply patches for both OS and Applications. On Windows only Microsoft Applications are supported for patching.
There are only 5 recommended SSM command documents for patch management:
AWS-RunPatchBaseline # For all OS and Apps
AWS-ConfigureWindowsUpdate
AWS-InstallWindowsUpdates
AWS-RunPatchBaselineAssociation # Often used to Scan only. Can select Baseline on Tags.
AWS-RunPatchBaselineWithHooks # Supports pre-Install, post-Install, post-Reboot hooks!
You can create custom SSM command documents (JSON File) for your own operations which looks like:
{
...
action: "aws:runDocument", // or aws:runShellScript, etc.
...
documentType: "LocalPath", // Or SSMDocument for composite document!
documentPath: "bootstrap" //
}
- Allows you to start secure shell on your EC2 or on-premises servers.
- Access through AWS Console, AWS CLI, or Session Manager SDK
- Does not need SSH access already setup.
- Every command is logged. Better for security and tracing.
- Manage OpsItems - issues, events and alerts
- Provides Automation Runbooks that you can use to resolve issues.
- EventBridge or CloudWatch Alarms can create OpsItems
- Aggregates information about AWS Config changes, Cloudwatch Alarms, etc.
- Reduces mean time to resolution.
- Can be integrated with JIRA, ServiceNow
See https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-hybrid-multicloud.html
Follow these steps:
Create a new service role which uses AmazonSSMManagedInstanceCore policy. Add optional other policies if you need additional permissions. Select Trusted Entity as Systems Manager. :
aws iam create-role --role-name SSMServiceRole
--assume-role-policy-document file://SSMService-Trust.json
aws iam attach-role-policy --role-name SSMServiceRole
--policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
Create and apply a hybrid activation. The following command gives you an activation code and ID. You need these later to install and activate the SSM agent on the on-premise node.
aws ssm create-activation \
--default-instance-name MyWebServers \
--description "Activation for Finance department webservers" \
--iam-role service-role/SSMServiceRole --region us-east-2 \
--tags "Key=Department,Value=Finance"
Run the following on the on-premises node:
mkdir /tmp/ssm
curl https://amazon-ssm-region.s3.region.amazonaws.com/latest/debian_amd64/ssm-setup-cli \
    -o /tmp/ssm/ssm-setup-cli
sudo chmod +x /tmp/ssm/ssm-setup-cli
sudo /tmp/ssm/ssm-setup-cli -register -activation-code "activation-code" \
    -activation-id "activation-id" -region "region"
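To verify the registration, a hedged sketch (the node should appear with an mi-* ID and PingStatus Online):
aws ssm describe-instance-information \
    --filters "Key=ActivationIds,Values=<activation-id>"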
For classifying your expenses you have 2 ways:
Resource Groups could be created using Tags. You can also create static Resource Groups without using tags (by specifying certain EC2 resources, etc).
Management account owners can activate the AWS-generated tags in the Billing and Cost Management console. When a management account owner activates the tag, it's also activated for all member accounts. This tag is visible only in the Billing and Cost Management console and reports. e.g. aws:createdBy
User tags are defined by the user and start with the prefix "user:".
Example Tags:
aws:createdBy = Root:123456789
user:Cost Center = 56789
user:Stack = Test | Production
user:Owner = DbAdmin
user:app = myPortal1
Total cost report can group by desired Tags. e.g. Per owner and CostCenter, etc.
You can see "Your Cost Explorer trends", monthly costs, Chart style
You can choose filters such as Service, Region, Tag, Instance type, etc.
You can get various granular data:
Resource-level data at daily granularity;
Cost and usage data for all AWS services at hourly granularity;
EC2 instance resource-level data at hourly granularity.
You can have an IAM or SCP policy to enforce Tags on creation of resources.
An example is given below:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": "*",
"Condition": {
/* Multi context condition. Usually there is only one condition */
"ForAllValues:StringEquals": {
"aws:TagKeys": ["Department", "Project"]
},
/* There is implied AND here. If you need OR, then add another statement element! */
"Null": {
"aws:RequestTag/Department": "false", /* Key exists */
},
"Null": {
"aws:RequestTag/Project": "false" /* Key exists */
}
}
}
]
}
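With a policy like the one above in effect, a launch succeeds only when the required tag keys are supplied in the request; a minimal sketch (AMI ID and tag values are hypothetical):
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t3.micro \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Department,Value=Finance},{Key=Project,Value=Portal}]'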
Some of the common operators are:
While creating an EC2 instance, you will have the option to select a launch type. Billing is based on that.
. Launch Type : Comments
........................................................................................
. On-Demand : Short Workload
. Spot-Instances : Cheap, not reliable. Up to 90% Savings
. Reserved Instances : Reserve for 1-3 Years. Up to 72% savings.
. Dedicated Instances: May share hardware only with the same account's resources.
. On reboot hardware may change.
. Dedicated Hosts : Dedicated physical server. On host affinity, reboots to same.
. Available only large config. On-demand price.
.
- Reserve On-Demand instance capacity in a specific AZ for any duration.
- No discounts, just the on-demand rate. But combined with a Savings Plan you gain some discount.
- Use Case: Short-term uninterrupted workloads that need to be in specific AZ.
.
. Create +-----------> Run Instances (Using the PG)
. Placement Group --+
. +---> Tie with ASG ---> Tie with TargetGroup/ELB --> Tie with ECS
.
.
Use Case: Create multiple EC2 instances in the same AZ with lowest latency for a cluster workload:
aws ec2 create-placement-group --group-name HDFS-GROUP-A --strategy partition \
    --partition-count 3
aws ec2 run-instances --placement "GroupName=HDFS-GROUP-A,PartitionNumber=1" --count 100
Placement group strategies:
- "cluster": Packs instances close together with lowest latency in the same AZ for HPC apps.
- "partition": Means a partition of racks (not a partition within a rack). A partition number within a placement group maps to a collection of racks. When one rack fails, only the few nodes in that partition fail, reducing the damage to a Hadoop cluster. i.e. You want low latency but do not want correlated cluster failures when a single rack fails.
- "spread": Spreads a small workload across distinct hardware, mostly in the same AZ, but it can span AZs. Rack-level spread placement groups can have at most 7 instances per AZ.
. N:1
. ENI ----------- AZ ENI is bound to AZ, but can be reassigned to any subnet in same AZ.
.
. N:1
. ENI ----------- EC2 EC2 has one Primary ENI and optional multiple secondary ENIs.
. Secondary ENI can even be in another VPC, but must be in same AZ!
. N:1
. ENI ----------- subnet ENI is bound to single subnet.
.
. N:M
. EC2 ----------- subnet EC2 belongs to multiple subnets if multiple ENI's attached.
. However, primary ENI determines primary subnet of EC2.
.
. Max ENI per c5.large is 3.
.
. Use floating ENI and disable delete on terminate option to persist ENI.
.
. Elastic Fabric Adapter == EFA == 100 Gbps For HPC computing.
. Elastic Network Adapter == ENA == Enhanced Networking == up to 100 Gbps
. Intel 82599 VF (Virtual Func) == Supports up to 10 Gbps.
.
. Max speed is capped by EC2 instance type. More ENIs does not increase throughput.
.
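To persist a floating ENI as noted above, a minimal sketch of creating one and attaching it as a secondary interface (subnet/SG/instance IDs are hypothetical):
aws ec2 create-network-interface --subnet-id subnet-0123456789abcdef0 \
    --groups sg-0123456789abcdef0 --description "floating ENI"
aws ec2 attach-network-interface --network-interface-id eni-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device-index 1
# Verify/adjust delete-on-termination via: aws ec2 modify-network-interface-attribute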
. Internet
. -------------->
.
.
. NACL Implicit inbound/outbound - All Allow; Stateless; All inbound/outbound rules are independent.
.
. NACL Outbound Rules (Match on first Rule and exit)
. Rule Destination
. 1 All high outbound ports TCP 1024-64K 0.0.0.0/0 Allow (Most important Rule)
. 2 Allow SSH TCP 22 0.0.0.0/0 Allow
. 3 Allow All to EC2 TCP 22 0.0.0.0/0 Allow
. 100 Deny if no match ALL ALL 0.0.0.0/0 Deny (Least important Rule)
.
. (Security Group) SG allows Only Allow Rules; No Deny Rules;
. TCP 22 0.0.0.0
. EC2 - ENI-1 <-- SG1 TCP 80 0.0.0.0
. ENI-2 <-- SG2 (Implicit Deny; Default SG has outbound rule to allow all)
.
. SG Implicit inbound/outbound - All Deny; Stateful; Outbound auto allowed if inbound allowed.
. A new SG has an allow-all outbound rule added by default.
. (Network ACL)
.
. Note: Outbound Rules refer to Port at destination. Inbound Rules refer to Port at local.
. This is true for SG and NACL.
.
Security groups are attached to EC2 instances (specifically to their ENIs) and network ACLs are attached to subnets. Using an SG, you control which inbound and outbound requests an ENI serves. SGs can be attached to other resources whenever an ENI is involved, for example interface VPC endpoints, RDS, etc.
By default a security group implicitly denies all inbound traffic. By default there is also an "allow all outbound" rule attached to the security group, which can be deleted if desired.
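A minimal sketch of adding an inbound rule to a security group (the SG ID and CIDR are hypothetical):
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 --cidr 203.0.113.0/24
# Return traffic for this connection is allowed automatically (stateful).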
.
. Instance Profile: Defines Instance IAM Role.
. Supplies temp credentials to Applications running on that instance.
.
aws iam create-instance-profile --instance-profile-name Webserver
aws iam add-role-to-instance-profile --role-name S3Access --instance-profile-name Webserver
aws iam tag-instance-profile --instance-profile-name Webserver \
    --tags '[{"Key": "Department", "Value": "Engineering"}]'
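To use the profile, attach it to an instance; a hedged sketch (instance ID is hypothetical):
aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 \
    --iam-instance-profile Name=Webserver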
Supported services using the instance identity role:
The instance identity role looks like, for example:
arn:aws:iam::0123456789012:assumed-role/aws:ec2-instance/i-0123456789example
IMDS (Instance Metadata Service) is available on every EC2 instance.
It provides the instance's network settings, attached role credentials, and other metadata.
Applications can use it to retrieve this information without any extra configuration.
The instance identity role credentials are accessible from the Instance Metadata Service (IMDS) at:
/identity-credentials/ec2/security-credentials/ec2-instance.
The credentials consist of an AWS temporary access key pair and a session token. They are used to sign AWS Sigv4 requests to the AWS services that use the instance identity role.
Instance identity roles are automatically created when an instance is launched, have no role-trust policy document, and are not subject to any identity or resource policy.
# Access temporary credentials using the Instance Metadata Service ...
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
# Output:
{
  "AccessKeyId": "...", "SecretAccessKey": "...", "Token": "...", "Expiration": "..."
}
aws ec2 stop-instances --instance-ids i-1234567890abcdef0 [--hibernate]
. Auto Scaling Group
.
. instance1 min <= N <= max instances.
. instance2
. instance3 Auto Register
. instance4 ---------------> Load-Balancer
.
.
.
. ASG Health Check ---> EC2-Built-In-Health-Checks | ELB-TargetGroup-HealthCheck
.
. Launch Template (Preferred) vs Launch Config
.
. CloudWatch-Based-TargetTracking
.
Automatically add/remove EC2 instances depending on load.
Dynamic Scaling:
Scheduled Scaling: e.g. Increase the min capacity to 10 at 5 PM on Fridays (see the CLI sketch after this list).
Predictive Scaling: Scale based on time-series analysis => schedule capacity based on the forecast.
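A minimal sketch of the scheduled-scaling example above (the ASG name is hypothetical; the recurrence is a UTC cron expression):
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name MyGroup \
    --scheduled-action-name friday-5pm-scale-up \
    --recurrence "0 17 * * 5" \
    --min-size 10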
LoadBalancer can do healthcheck on ASG instances!
Scaling Cooldown period (by default 300 seconds) is the period ASG will not add or remove instances.
You can temporarily suspend auto-scaling activities (adding or deleting instances), so that you can ssh into machines and do some investigations:
aws autoscaling suspend-processes --auto-scaling-group-name MyGroup
aws autoscaling resume-processes --auto-scaling-group-name MyGroup
Configure Healthcheck:
aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg \
    --health-check-type ELB \
    --health-check-grace-period 300
# Use --health-check-type EC2 for the built-in EC2 health check.
Use a launch template (instead of a launch configuration), which defines things including:
If your EC2 instance is stuck because of network misconfiguration or kernel corruption / blue screen, you can use the Systems Manager AWSSupport-ExecuteEC2Rescue automation document, or run EC2Rescue manually.
It works for both Linux and Windows.
First you install EC2Rescue on a working (good) machine and then run the Systems Manager document.
Metadata is divided into two categories.
Instance metadata:
Includes metadata about the instance such as:
It can be accessed from http://169.254.169.254/latest/meta-data/
Dynamic data:
Security credentials (from the instance role) are automatically available to the AWS SDK and other applications running on the EC2 instance, similar to credentials read from ~/.aws/credentials.
If you need to directly get the keys, you can also do this:
http://169.254.169.254/latest/meta-data/iam/security-credentials/s3access
With IMDSv2, you must first get a token and pass it in an HTTP header, as sketched below. See the docs for more details.
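A minimal sketch using IMDSv2 (token-based access):
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
    http://169.254.169.254/latest/meta-data/iam/security-credentials/s3access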
# If you want to convert from your own Ubuntu machine without any hypervisor ...
sudo dd if=/dev/sda of=/path/to/disk-image.img bs=4M status=progress
# Convert disk image to .vmdk format.
qemu-img convert -O vmdk /path/to/disk-image.img /path/to/output-image.vmdk
# Run from on-premises. You can directly use .VMDK file or create one first. Copy over to S3.
aws s3 cp <vmdk> s3://my-bucket/myfile.vmdk
# Run from EC2. Converts .vmdk to .ami
aws ec2 import-image --description "My server image" --disk-containers \
Format=VMDK,UserBucket={S3Bucket=your-bucket-name,S3Key=your-image-file.vmdk}
# Run from EC2. Can export running or stopped instance. This creates .vmdk file.
aws ec2 create-instance-export-task --instance-id i-instanceid --target-environment vmware \
--export-to-s3-task S3Bucket=mybucket,S3Prefix=exports/
.
. Spot-Fleet EC2-Fleet ASG
. (Spot+On-Demand) (Spot+On-Demand+RI) (Dynamic Metrics)
.
. Target-Capacity Batch-Job Web-Servers
.
.
. 1st Million calls free. Next $0.20 for 1 million. Very Cheap.
. Max Memory: 128 MB - 10GB !
. Max Execution Time: 900 Seconds (15 mins) (API Gateway limit is 29 seconds)
. Max Deployment size: 50 MB (compressed zip)
. Concurrency executions: 1000 (can be increased)
.
. Attach to VPC
. Lambda ---------------> Use ENI in your subnet
.
. CanaryRelease (Functional Alias and route-config) (AWS SAM + Code Deploy)
.
. Reserved Concurrency == Max Concurrency Per Function.
. Provisioned Concurrency == Pre-warmed Available Lambda Instances. (For Application AutoScaling)
.
See Also: Lambda@Edge in CloudFront section.
.
. Event Source Mapping Supported Sources
.
. SQS
. Kinesis DataStream
. DynamoDB
. MSK
. MQ
.
. Note: Kinesis Firehose supports only Lambda Triggers (For Data Transformation).
.
...........................................................................................................
.
. Feature Event Source Mapping Event Source Trigger
...........................................................................................................
.
. Polling Mechanism Lambda polls source Source service invokes Lambda
...........................................................................................................
.
. Batching Support Yes (configurable batch size) No (invocation per event)
...........................................................................................................
.
. Supported Services Queues and streams Most services (S3 SNS EventBridge API GW, etc)
. SQS Kinesis DynamoDB MQ MSK
...........................................................................................................
.
. Invocation Timing Polling interval and batching Immediate upon event occurrence
...........................................................................................................
.
. Error Handling Retry policies, DLQ Limited to retries within source service
...........................................................................................................
.
. Use Case High-throughput stream process Real-time response to specific events
...........................................................................................................
Lambda invocation modes:
Synchronous Invocation: e.g. CLI, API Gateway, Data Firehose: the caller may time out sooner than the Lambda timeout.
Asynchronous Invocation: e.g. S3, SNS, CodePipeline, etc. The caller does not wait. Lambda may retry.
Polling Invocation: e.g. SQS, Kinesis Streams, MSK, DynamoDB. Invoked by the Lambda event source mapping framework.
Retries as long as the event is still visible in the input stream/queue.
Triggers are best suited for discrete independent events and invoked by the calling framework. Examples:
SNS Triggers (Async)
S3 Triggers (Async)
API Gateway Triggers (Sync)
.
. Long Poll Get Batch
. Event -----------------------------> Lambda --> Success ---> Remove messages
. Max wait (20 secs) visibility Timeout 30 secs --> Fail ---> Reinvoke
. Max Batch Count (10) Messages not visible in Queue
. Max Payload (6MB)
.
Example SQS message event:
{
  "Records": [
    {
      "body": "....",
      "eventSourceARN": "<arn-queue>",
      ...
    }
  ]
}
Lambda invocations batch events where possible (see the event source mapping example below).
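A minimal sketch of creating an event source mapping for the SQS case above (queue ARN and function name are hypothetical):
aws lambda create-event-source-mapping --function-name my-func \
    --event-source-arn arn:aws:sqs:us-east-1:123456789012:my-queue \
    --batch-size 10 --maximum-batching-window-in-seconds 5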
.
. Default Limits Per Account Per Region
. Max RPS Max Integration Timeout Max Concurrency
. API GW REST API 10K 29 Seconds 29*10K; 290K
. API GW Websocket 500 2 hours 3600*2*500=3.6Million
.
. Lambda 10K 15 mins 1K (can increase to 10K+)
.
. ALB 100K-million 4000s(1hr+) No limit.
. (Default:60s)
.
. ALB-Lambda (Throughput limited by Lambda concurrency)
.
.
See https://medium.com/hackernoon/how-to-solve-the-api-gw-30-seconds-limitation-using-alb-700bf3b1bd0e
.
. Authenticate
. Client ----------------------> API Gateway
. Rate Limit using (Lookup API Key and Throttle and Bill it)
. API Key
.
You can throttle clients depending on usage plans using API Keys to identify your clients.
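A hedged sketch of wiring an API key to a usage plan with throttling (the IDs are hypothetical):
aws apigateway create-usage-plan --name Basic \
    --throttle burstLimit=200,rateLimit=100 \
    --quota limit=100000,period=MONTH \
    --api-stages apiId=<rest-api-id>,stage=prod
aws apigateway create-usage-plan-key --usage-plan-id <usage-plan-id> \
    --key-id <api-key-id> --key-type API_KEY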
.
.                     Apply Budget
. Management Account --------------> Member Accounts (Optional)
.
. SNS Topic ---> Lambda --> Fix Policies
.
. Budget Action ----> Lambda | Stop EC2, RDS Instances
.
Create budgets and send alarms when budgeted costs are exceeded.
4 types of budgets: Usage, Cost, Reservation, Savings Plans
For Reserved Instances (RI):
Track utilization:
Supports EC2, ElastiCache, RDS, Redshift
Can filter by Service, Tag, etc.
Same options as AWS Cost Explorer!
2 budgets are free, then 2 cents/day/budget
Up to 5 SNS notifications per budget.
Budget Actions could be created to:
- Apply IAM policy to user/group/role
- Apply SCP to an OU
- stop EC2 or RDS instances
- Action can be auto or require workflow approval.
- Can configure Lambda to trigger
Centralized budget management can be done from the management account of the organization. Each budget can apply a filter for one member account, so that each budget refers to a different member account (see the CLI sketch below).
For decentralized budget management, you can apply a CloudFormation template from the management account to each member account. This auto-creates budgets in all member accounts.
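A hedged sketch of creating a cost budget with an 80% alert via the CLI (account ID and e-mail address are hypothetical):
aws budgets create-budget --account-id 123456789012 \
    --budget '{"BudgetName":"monthly-cost","BudgetType":"COST","TimeUnit":"MONTHLY",
               "BudgetLimit":{"Amount":"1000","Unit":"USD"}}' \
    --notifications-with-subscribers '[{"Notification":{"NotificationType":"ACTUAL",
               "ComparisonOperator":"GREATER_THAN","Threshold":80,"ThresholdType":"PERCENTAGE"},
               "Subscribers":[{"SubscriptionType":"EMAIL","Address":"admin@example.com"}]}]'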
.
.
. Forecast-Usage Visualize-Cost-History Custom-Reports
.
. Savings-Plans-Selection Cost-Breakup-By-OU-Accts-Tags-Service
.
. RI-Utilization RI-Coverage Custom-Usage-Report
.
RPO : Recovery Point Objective: the most recent point in time to which data can be restored. e.g. The last backup was taken 1 hour ago, so up to 1 hour of data is lost.
RTO : Recovery Time Objective: the time from the disaster until recovery, i.e. the downtime:
. {Data Lost } {Down Time}
. ----RPO-------------Disaster-----------RTO--------
.
Disaster Recovery strategies:
- Backup and Restore
- Pilot Light (Continuous backup of data. Small downtime okay.)
- Warm Standby (not hot standby. may operate minimally)
- Hot Site / Multi Site Approach
Backup and
Perform analytics using data from supported SaaS applications:
. AppFlow:
.
. Glue Catalog S3
. Salesforce Transform RedShift
. Google Analytics ====> Mask Fields ===> Salesforce
. Facebook Ads Snowflake (DataCloud AWS Partner)
.
.
. Templates: Json or YAML file template.
. Stacks: Create/update/delete Stacks using Template
. Change Sets: Review changes then update Stack
. Stack Sets: Multiple stacks across regions using single template.
.
. Template-Config-File: Defines Parameters, Tags, Stack Policy
.
. StackPolicy: Permissions policy. Defines Who can update.
.
.
. StackSets
. Region1 Region2
. Delegated-Admin-Account
. Target Accounts
.
. Resource Type: e.g. EC2::Instance, EC2::EIP, ECS::Cluster, etc.
. Resource Properties: ImageId, SubnetId, etc.
. Resource Attributes: Special Attributes to control behaviour and relationships.
. e.g. CreationPolicy, MetaData, DependsOn (controls creation order)
.
{
"AWSTemplateFormatVersion" : "version date",
"Description" : "JSON string",
"Metadata" : { template metadata },
"Parameters" : { set of parameters }, # Define parameters. If you specify AllowedValues, it becomes menu.
# Specify Default/max/min constraints.
"Rules" : { set of rules },
"Mappings" : { set of mappings }, # Reuse parameters e.g.
# !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]
"Conditions" : { set of conditions },
"Transform" : { set of transforms },
"Resources" : {
set of resources
},
"Outputs" : { set of outputs }
}
Note: The YAML format of CloudFormation templates supports !Ref (reference), !Sub (substitute) and similar short-form functions. This is specific to CloudFormation. To convert such YAML to JSON, use the cfn-flip command.
# --stack-policy-body: permission policy for updates.
# --tags: tags applied to the CloudFormation stack and all its resources.
aws cloudformation create-stack --stack-name myteststack --template-body file://sampletemplate.json \
    --parameters ParameterKey=KeyPairName,ParameterValue=TestKey \
    --stack-policy-body file://my-policy.json \
    --tags Key=dept,Value=dev
aws cloudformation update-stack --stack-name <stack-name> ...
A template configuration file is a JSON file that defines: Parameters, Tags, and the Stack Policy.
Useful when you deploy a single template into different environments with different configuration files. The stack policy can also live in a separate file and be referenced as such. A hedged sketch follows:
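A sketch of such a configuration file, assuming the usual Parameters / Tags / StackPolicy sections (values are hypothetical):
{
  "Parameters": { "KeyPairName": "TestKey", "InstanceType": "t3.micro" },
  "Tags": { "dept": "dev" },
  "StackPolicy": {
    "Statement": [ { "Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*" } ]
  }
}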
Mappings:
RegionMap:
us-east-1:
AMIID: ami-0abcdef1234567890
us-west-1:
AMIID: ami-0abcdef1234567891
Resources:
MyInstance:
Type: "AWS::EC2::Instance"
Properties:
ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", AMIID]
Typically there is a Delegated Administrator member Account for a service (say, CloudFormation or EC2 etc).
Users in that member account can administer other accounts in AWS Organization.
aws organizations register-delegated-administrator \
--service-principal=member.org.stacksets.cloudformation.amazonaws.com \
--account-id="memberAccountId"
It is a permissions policy to prevent accidental update of the stack.
Defines who is permitted to update which resources.
You can protect all resources or only some resources from being updated.
At most one stack policy can be attached to a stack.
If you attach an empty policy, all updates are denied by default.
Example policy:
{
"Statement" : [
{ "Effect" : "Allow", "Action" : "Update:*", "Principal": "*", "Resource" : "*" },
{ "Effect" : "Deny", "Action" : "Update:*", "Principal": "*", "Resource" : "my/db" }
]
}
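To attach or replace the policy on an existing stack, a minimal sketch:
aws cloudformation set-stack-policy --stack-name myteststack \
    --stack-policy-body file://my-policy.json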
You can first create a secret using Secrets Manager. Later you can reference it:
Resources:
mySecret:
Type: AWS::SecretsManager::Secret
Properties:
.....
myRDSDBInstance:
Type: AWS::RDS::DBInstance
Properties:
....
..... Refer to the secret with a dynamic reference, e.g. '{{resolve:secretsmanager:mySecret:SecretString:password}}' ...
SSM Parameter Store values can be referenced similarly. Another example of using it dynamically:
UserData:
Fn::Base64: !Sub | <===== Note the !Sub (Substitute Function)
#!/bin/bash
echo "API_KEY=${ssm:/myApp/apiKey}" >> /etc/environment
# echo "API_KEY=$(aws ssm get-parameter --name /myApp/apiKey ... )" >> /etc/environment
The DeletionPolicy attribute controls what happens to a resource when the stack is deleted; it can be one of the following:
Note: For an RDS cluster, the default is to snapshot. Note: For S3, you need to empty the bucket first.
Custom Resources can be defined using Lambda:
- Resource not yet supported. New service for example.
- Empty S3 bucket before deletion.
- Fetch an AMI Id.
CloudFormation Drift:
- A single resource may have been manually changed after the stack was created.
- Detect drift of an entire stack (or stack set) using the CloudFormation drift detection feature from the console or CLI, as sketched below!
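A minimal sketch of drift detection from the CLI:
aws cloudformation detect-stack-drift --stack-name myteststack
aws cloudformation describe-stack-resource-drifts --stack-name myteststack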
Resource Import allows importing existing resources into the template:
- No need to delete and re-create resources. You can keep existing ones and import them (see the sketch below).
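A hedged sketch of importing an existing bucket via an IMPORT change set (names are hypothetical; the template must declare the resource with a DeletionPolicy):
aws cloudformation create-change-set --stack-name myteststack \
    --change-set-name import-existing-bucket --change-set-type IMPORT \
    --resources-to-import '[{"ResourceType":"AWS::S3::Bucket","LogicalResourceId":"MyBucket",
                             "ResourceIdentifier":{"BucketName":"my-existing-bucket"}}]' \
    --template-body file://template-with-bucket.json
aws cloudformation execute-change-set --change-set-name import-existing-bucket --stack-name myteststack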
.
. Access Control - URL based restriction possible (unlike NACL or SG)
. Load Balancing
. Caching - Similar to browser caching, local common caching.
. SSL Termination
. Rate limiting
.
. VmWare-Cloud
.
. vSphere ESXi-Hypervisor vCenter
.
. Backup-Gateway
.
Lets you reserve dedicated physical hosts on AWS and run the familiar vSphere suite of products that you use on-premises in the cloud as well.
. IPv6 - Dual Stack - CIDR - Subnet - NACL - SG
.
. All public. No NATGW. DNS AAAA.
.
Trusted Advisor is a service with a programmatic API (a Business or Enterprise Support plan is required for the full set of checks and API access). You can get recommendations in the following categories:
Cost optimization
Performance
Security
Fault tolerance
Service limits – Checks if usage approaches or exceeds the limit (also known as quotas)
Operational Excellence – Recommendations as per standards.
aws support describe-trusted-advisor-checks --language en
Check Check-Id
.....
Service Limits eW7HH0l7J9
....
aws support describe-trusted-advisor-check-result --check-id eW7HH0l7J9 --language en
.... (Json output regarding service limits)
{
"checkId": "eW7HH0l7J9",
"result": {
...
"flaggedResources": [
{
"region": "us-west-2",
"service": "Amazon EC2",
"limit": "On-Demand Instances",
"currentUsage": "110",
"maxLimit": "120",
"status": "WARN"
}, ...
]
}
}
.
. CloudWatch
. Spot Fleet
. |
. ECS ------ Scaling-Plan ---- ASG
. |
. Aurora, DynamoDB
.
.
. Free-Service CloudWatch-Based
.
. Discover-Scalable-Resources
.
. Built-In-Recommendations Predictive-Scaling-Support
.
Manage scaling for multiple scalable AWS resources through a scaling plan.
Discover Scalable Resources.
Choose scaling strategies - Built-in Scaling Recommendations.
Free Service. CloudWatch Based. Basic CloudWatch Charges Applicable.
Supported Resources:
Scaling strategies available:
In addition to Scaling strategies, Enable/Disable Following:
Other tuning parameters:
Synopsis :
aws autoscaling-plans create-scaling-plan \
--scaling-plan-name MyScalingPlan \
--application-source "ResourceType=AutoScalingGroup,TagFilters=[{Key=Environment,Values=[Production]}]" \
--scaling-instructions '[
{
"ServiceNamespace": "autoscaling",
"ResourceId": "autoScalingGroup/my-asg",
"ScalableDimension": "autoscaling:autoScalingGroup:DesiredCapacity",
"MinCapacity": 1,
"MaxCapacity": 10,
"TargetTrackingConfiguration": {
"PredefinedScalingMetricSpecification": {
"PredefinedScalingMetricType": "ASGAverageCPUUtilization"
},
"TargetValue": 70.0,
"ScaleOutCooldown": 300,
"ScaleInCooldown": 300
}
}
]'
aws autoscaling-plans describe-scaling-plans --scaling-plan-names MyScalingPlan
Application Auto Scaling vs Scaling Plan based Auto Scaling:
. Feature/Aspect        AWS Auto Scaling (Scaling Plans)               Application Auto Scaling
...........................................................................................................
. Primary Purpose       Centralized scaling for ASG and core services  Supports more resources
. Supported Resources   EC2 ASG, ECS, DynamoDB, Aurora, Spot Fleet     Plus Lambda, EMR, etc.
. Scaling Plan          Yes, defines a central scaling strategy        No single plan
. Scaling Policies      Target tracking, step scaling, and scheduled   Similar
. Predictive Scaling    Yes, forecasts patterns and scales             No predictive scaling
. Ease of Management    CLI, UI Console                                CLI only; service UI mostly
. Built-in Strategy     Built-in support for balancing (40,50,70%)     No such built-in strategy
. Custom Resource       No support for custom resources                Yes, with custom metrics
Application Auto Scaling Supported AWS Services:
. Service                     What it scales
...........................................................................................................
. Amazon ECS                  ECS service tasks
. EC2 Spot Fleet              Spot Fleet target capacity, i.e. total instances (within min-max)
. DynamoDB                    WCU/RCU capacity (On-Demand) for tables and indexes
. RDS Aurora                  Read replicas
. Kinesis Data Streams        The number of shards (On-Demand)
. Amazon EMR                  Instance count in EMR managed instance groups (total master, core group EC2 instances)
. AppStream 2.0               Backing fleets
. Amazon Comprehend           Document classifier / entity recognizer inference endpoints
. Amazon ElastiCache Redis    Replication node groups and replicas
. Amazon Keyspaces            WCU and RCU
. Lambda                      Per-function provisioned concurrency
. Amazon MSK                  Storage volume size
. Amazon Neptune              Read replica count
. SageMaker                   Desired instances (within min-max range), etc.
. WorkSpaces                  Desired user sessions (within min-max range)
. Custom resources            Custom
ECS Example:
aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id service/cluster-name/service-name \
    --min-capacity 1 --max-capacity 10
aws application-autoscaling put-scaling-policy \
    --policy-name TargetTrackingCPUUtilizationPolicy \
    --service-namespace ecs \
    --resource-id service/cluster-name/service-name \
    --scalable-dimension ecs:service:DesiredCount \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration file://config.json
Example config.json for maintaining CPU utilization at 70%:
{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleInCooldown": 300,
"ScaleOutCooldown": 300
}
Client Request:
GET /socket HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
# Server Response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: hSmTq3NQZLHT4VkZJ1xGjbkA0V8=
"Custom Full text search" "Your Pages" "Your documents" "Auto Complete"
"Term Boosting" "Faceting - Classifying search results"
Amazon CloudSearch is a fully managed search service in the cloud for your website, to search your collection of web pages, document files, forum posts or product information.
.
. Factory | AWS
. | Deploy
. Nucleous MQTT-Broker |<-------> GreenGrass Cloud Service
. X.509 Cert MQTT Components |
. IOT Device ---------- GreenGrass Core Device |
. FreeRTOS Linux Machine |<-------> IOT-Core Analytics S3
. Device SDK Auth CLI |
. (Sensors) JVM |
. (Thing) | Sitewise
. |
. MQTT
. IOT Device -------> Data-ATS --> IOT Core --> Rule --> Timestream DB | DynamoDB
. Endpoint
. (ATS - Amazon Trust Services for IOT Core)
AWS CodeCommit: Primarily used for version control of source code and other assets. AWS CodeArtifact: Designed for managing and sharing software packages and dependencies.
Amazon WorkDocs is a fully managed platform for creating, sharing, and enriching digital content. Supports client managed encryptions, versions and sharing of documents within and across teams in secure manner.
The entire user base uses the existing version until the seamless switch to the new version.
Green: Running new version; Blue: Running existing version (e.g. production)
Elastic Beanstalk supports it using the environment URL swap (CNAME swap); a CLI sketch follows the diagram:
.
. Route-53 Dynamic Mapping to Blue/Green Load Balancer of Beanstalk
.
. DNS-Web-URL example.com --------------> blue.example.com | green.example.com
. Blue BeanStalk Green Beanstalk
.
. Note: ELB is optional in Beanstalk.
.
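A minimal sketch of the Beanstalk CNAME swap (environment names are hypothetical):
aws elasticbeanstalk swap-environment-cnames \
    --source-environment-name blue-env --destination-environment-name green-env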
CodeDeploy supports it for ECS deployments and Lambda functions.
Route 53 supports (using weighted route policy) switching CNAME to new env.
API Gateway Stages can be used as Blue/Green strategy.
.
. Listener Rule +---- Blue-Target-Group
. Load-Balancer ---------------->|
. Host SNI Rule +---- Green-Target-Group
. 20% - 80% Rule
. (Traffic Splitting)
. Canary Deployment
.
. Deployment Type: Blue/Green
. Traffic Shifting Strategy: Linear | Canary | All-at-once-after-testing
.
Commands:
#
# You can have Host based routing, even when the traffic shifting is in progress.
# The higher priority rule, will enable manual testing of Blue/Green environment.
#
# Priority 10: a higher-priority rule used for manual testing of the Blue/Green environment.
aws elbv2 create-rule \
    --listener-arn <your-listener-arn> \
    --priority 10 \
    --conditions Field=host-header,Values=blue.example.com \
    --actions Type=forward,TargetGroupArn=<blue-target-group-arn>
#
# For traffic Splitting ....
aws elbv2 create-rule \
    --listener-arn <your-listener-arn> \
    --priority 20 \
    --conditions Field=path-pattern,Values='/app/*' \
    --actions Type=forward,ForwardConfig="
        {TargetGroups=[{TargetGroupArn=<blue-target-group-arn>,Weight=80},
                       {TargetGroupArn=<green-target-group-arn>,Weight=20}]}"
# Note: each rule on a listener needs a distinct priority (20 here, since 10 is used above).
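To shift more traffic to Green later, a hedged sketch of updating the weights in place (rule ARN is hypothetical):
aws elbv2 modify-rule --rule-arn <your-rule-arn> \
    --actions Type=forward,ForwardConfig="
        {TargetGroups=[{TargetGroupArn=<blue-target-group-arn>,Weight=20},
                       {TargetGroupArn=<green-target-group-arn>,Weight=80}]}"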
The user base is divided to test different versions.
Use ML to analyze images or videos.
For Images:
- Detect objects, scenes, and concepts in images. Augment with inline labels.
- Recognize celebrities
- Detect text in a variety of languages
- Detect explicit, inappropriate, or violent content or images - Content Moderation.
- Detect, analyze, and compare faces and facial attributes like age and emotions
- Detect the presence of PPE
For Videos do above and in addition:
- Track people and objects across video frames
- Search video for persons of interest
- Analyze faces for attributes like age and emotions
- Aggregate and sort analysis results by timestamps and segments
Used in social media, broadcast, advertising and e-commerce to create safer user experience.
Set a minimum confidence threshold for an item to be flagged.
Flag sensitive content for manual review in Amazon Augmented AI (A2I) (with augmented labeling).
If your goal is only to extract text, use Textract instead, since Rekognition is a heavyweight solution for that.
.
. Voice Call
. User -------------> Lex (Convert to Text) ------> Connect
. (Understand Intent) (Connect to Backend Workflow)
. (Uses Lambda)
.
Amazon Comprehend Medical Service does NLP analysis on unstructured clinical text:
- Physician's notes, Discharge summaries, Test results, Case notes.
- Use NLP to detect Protected Health Information and process accordingly.
- Store documents in S3 and analyze text, media accordingly.
| Image ----Amazon-Textract--> { DocumentId: "12345", "Name": "",
| Sex: "F", DOB: "23/05/1980" }
.
. Elemental MediaConvert  - Video transcoding - one format to another.
. Elemental MediaLive     - Live video processing.
. Elemental MediaPackage  - Media preparation and packaging. (Scalable streaming?)
. Elemental MediaStore    - Low-latency media storage.
. Elemental MediaTailor   - Personalized ad insertion.
. Amazon Interactive
.   Video Service (IVS)   - Like a Zoom or Google Meet application.
.
.
. CodeCommit / CodeBuild / CodeDeploy
.
. Caching-Artifacts: S3, EFS, CodeArtifact
.
.
CodeBuild with EFS Caching across stages, yaml config file:
environment:
computeType: BUILD_GENERAL1_LARGE
image: aws/codebuild/standard:4.0
type: LINUX_CONTAINER
fileSystemLocations:
- location: fs-12345:/build-cache
mountPoint: /mnt/efs
type: EFS
.
. AWS Workspace === Windows (and Linux) Remote Desktop
. Persistent Disk
. Microsoft AD FS
.
.
. AppStream === Access Windows Remote Applications using Browser or Client.
. Not Persistent.
.
. Hint: Remember as "AppStream is a Stream of Pixels in Windows".
Access any applications and/or non-persistent desktops in your HTML5 browser or Windows Client!
AWS compute and data is co-located and only encrypted pixels are streamed to client! It is like a remote desktop.
.
. Macie: Analyze S3 using ML for security and Personal Data.
. Notify Security Hub.
.
. Hint: Imagine Macie Cleaning (S3) Bucket full of documents/letters.
Amazon Macie is a data security service that uses ML to protect your sensitive data.
Continuously assesses your S3 buckets for security and access controls.
Looks for personally identifiable information (PII).
Full discovery scans can be driven from the interactive data map that Macie builds of your S3 data.
You can run a scan job daily/weekly or just one time from the console. You can also specify the export-findings frequency.
Generate findings and send to AWS Security Hub or EventBridge.
e.g. A PDF or xlsx file in S3 bucket contains names and credit card numbers.
. Push
. Mobile <--------> Pinpoint ----------> Email | SMS | Notification | In-App Messaging
.
| create create
| EC2 Image ------> Builder --------> New-AMI ---> Test --> Distribute
| Builder EC2 EC2 AMI
|
|
|
| Actions
| MQTT --> IoT Topic ------> IoT Rules -------> SQS | SNS | S3 | FireHose etc
|
|
npm install -g aws-cdk
Following patterns were taken from AWS Solutions Constructs.
AWS Solutions Constructs is an open-source extension of the CDK (Cloud Development Kit).
You can generate CloudFormation templates from these constructs.
.
. Run Docker Image with Ease - Serverless Solution.
.
. AppRunner is independent of ECS, Fargate, Elastic Beanstalk
.
. Github checkin ==> Can Auto Trigger Build & Running Job
.
. Built-in Scaling, Built-in Load Balancer, Built-in Health Checks and replace.
.
.
. On-Premise OutPosts
. Network <----------> Subnet ------> AWS
. Local Gateway
.
An EC2 boot-time (user data) script can be used to install certificates on EC2, retrieving them from SSM Parameter Store. The EC2 IAM role must have permission to read from SSM Parameter Store.
There are AWS managed Logs --- Load Balancer Access Logs, CloudTrail Logs, VPC Flow Logs, Route 53 Access Logs, S3 Access Logs, etc.
aws:PrincipalOrgID is the AWS Organization ID. You can use this condition key in a policy to restrict access to principals from any account in the same organization (see the sketch below).
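A hedged sketch of a bucket policy restricted to the organization (bucket name and org ID are hypothetical):
aws s3api put-bucket-policy --bucket my-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-bucket/*",
    "Condition": { "StringEquals": { "aws:PrincipalOrgID": "o-a1b2c3d4e5" } }
  }]
}'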
You can use the EC2 Instance Connect feature to use IAM permissions to SSH into EC2. The AMI should have EC2 Instance Connect pre-installed. It pushes a temporary public key to the EC2 instance metadata, which the instance-side agent adds to the specified OS user's authorized keys for about 60 seconds. The client then connects using the matching private key, as sketched below.
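A minimal sketch of pushing a one-time key and then connecting (instance ID, AZ and key file are hypothetical):
aws ec2-instance-connect send-ssh-public-key \
    --instance-id i-0123456789abcdef0 --availability-zone us-east-1a \
    --instance-os-user ec2-user --ssh-public-key file://my_key.pub
ssh -i my_key ec2-user@<instance-public-dns>   # Connect within ~60 seconds of the push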
AWS Batch supports multi-node parallel jobs. Schedule jobs that automatically launch EC2 instances.
Auto Scaling Groups are usually used with an ALB: set up rules like "if CPU > 40% then scale out". You can also use Network In/Out, RequestCountPerTarget (from ALB to EC2), or custom metrics. Predictive scaling analyzes historical load and scales in advance.
Spot Fleet support can mix Spot and On-Demand EC2 instances to keep costs to a minimum.
The AWS App Runner service allows you to deploy Docker-image-based applications. It provides auto scaling, HA, and a load balancer. Rapid production deployment! It supports a custom domain for the application. It uses Fargate underneath but provides a simpler workflow.
Amazon EKS Anywhere makes EKS available on-premises as well. Optionally you can use the EKS Connector to connect to AWS; otherwise you can just run in disconnected mode.
AWS Lambda coupled with EventBridge can run a serverless cron job, as sketched below.
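A minimal sketch of the EventBridge + Lambda cron (names and ARNs are hypothetical):
aws events put-rule --name nightly-job --schedule-expression "cron(0 2 * * ? *)"
aws lambda add-permission --function-name my-func --statement-id allow-eventbridge \
    --action lambda:InvokeFunction --principal events.amazonaws.com \
    --source-arn arn:aws:events:us-east-1:123456789012:rule/nightly-job
aws events put-targets --rule nightly-job \
    --targets 'Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:my-func'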
Lambda supports Node.js, Python, Java, Ruby, etc. Lambda also supports running Docker image but it must implement Lambda Runtime API.
Lambda limits:
- RAM: 128 MB to 10GB memory
- CPU: Linked with RAM. 2 to 6 vCPUs
- Timeout - Up to 15 mins.
- /tmp storage - 10GB
- Deployment Package - 50 MB (zipped), 250MB (unzipped) including layers.
- Concurrent executions - 1000 (soft limit that can be increased)
- Container Image Size - 10 GB
- Invocation Payload (Request/response) - 6 MB (sync), 256 KB (async)
- Async invocations (invoked from SDK) typically raise SNS notifications and such.
Lambda can be invoked from an ALB, API Gateway, or directly from the SDK/CLI. Limit concurrency at API Gateway to prevent too many concurrent Lambdas. The account-level Lambda concurrency limit also applies.
AWS Wavelength brings AWS services to the edge of telecom providers' 5G networks; compute runs in the carrier's datacenter and traffic does not leave the 5G network. A high-bandwidth connection to the parent AWS Region is optional and possible. Use cases include real-time gaming, smart cities, AR/VR, connected vehicles, etc.
Some AWS Wavelength Zones are made available by AWS along with availability zones.
AWS Local Zones are additional availability zones mainly targeting cities. These zones are available alongside the standard availability zones of their parent region. Local Zones support the most popular services, but some services may not be available. You can create an EC2 instance in a specific Local Zone (say Boston): first create a subnet in that Local Zone, then use that subnet for the EC2 instance.
To check whether a certain key exists or not:
| "Condition":{"Null":{"aws:TokenIssueTime":"true"}}  # Null = true means the key does not exist.
|
| "Condition":{"Null":{"aws:TokenIssueTime":"false"}} # Null = false means the key exists;
|                                                     # its value is immaterial.
AWS Data Pipeline is deprecated in favor of Glue for ETL workflows.
See following references.
Opensource extension of AWS Cloud Development Kit (AWS CDK).