Notes on AWS Certifications
Customer Engagement Services
=============================
* Amazon Connect: Create process flows for customers. Self service ???
* Amazon Pinpoint: Send email, SMS, etc. for targeted marketing.
* SES - Simple Email Service. Bulk email.
DeepLens - Deep Learning enabled Video camera. Vision Systems Applications.
SageMaker - Build and train models and deploy them into AWS.
Rekognition - Deep Learning analysis for images and videos.
Given image, gives you labels with confidence probabilities.
Amazon Lex - Useful to build Conversational Chat Bots.
Amazon Polly - Text to Speech service.
Comprehend - Deep Learning of Text and relationships.
Translate - Machine learning aided translation.
Transcribe - Convert audio to transcribed text.
Artifact - Online portal that provides access to security and compliance documentation.
Certificate Manager - SSL certificates for free.
Cloud Directory Service - Can support multi-dimensional hierarchies.
LDAP supports only a single hierarchy.
Directory Service - Managed Microsoft Active Directory service.
CloudHSM - Dedicated hardware security module (HSM) in the cloud (an
alternative to a physical, pen-drive-like key device).
Cognito - Sign-in and sign-up for applications. Integrates sign-up with Google,
Facebook, or SAML providers.
IAM - Granular security - Role based access controls.
AWS Organizations - Manage multiple AWS accounts from one login.
Amazon Inspector - Analyze vulnerabilities.
KMS - Key Management Service - Creates and manages encryption keys and
uses hardware security modules to secure them. Integrated with S3,
Redshift, EBS.
WAF - Web Application Firewall - Protects against SQL injection, cross-site scripting, etc. WAF is integrated with CloudFront, ALB, and API Gateway.
Shield - Protects against DDoS attacks at Layer 3/4. Integrated with EC2 apps, ELB, CloudFront, AWS Global Accelerator, and Route 53.
Amazon GuardDuty - Analyzes AWS CloudTrail events, VPC Flow Logs, and DNS logs. Uses lists of known malicious IP addresses, anomaly detection, and machine learning to identify threats more accurately.
AWS KMS - Can be used for client-side encryption of data before it is sent to S3, using a customer master key (CMK) stored in AWS Key Management Service (AWS KMS).
Symmetric and asymmetric encryption: an asymmetric key pair is used to verify the server's identity; the server then creates a symmetric key for bidirectional encryption during the session. AWS KMS is used to keep the symmetric key on AWS. The IAM user must have permission to access that key.
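The split between asymmetric and symmetric keys can be sketched with the JDK's own crypto classes. This is a generic hybrid-encryption illustration under stated assumptions, not the TLS handshake and not the AWS KMS API: the class name `HybridEncryptionSketch`, the key sizes, and the cipher choices (RSA-OAEP to wrap the key, AES-GCM for the data) are picked only for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class HybridEncryptionSketch {

    /** Encrypts with a fresh AES key, wraps that key with RSA, then reverses both steps. */
    static String roundTrip(String message) {
        try {
            // Asymmetric pair: stands in for the long-lived identity key pair.
            KeyPairGenerator rsaGen = KeyPairGenerator.getInstance("RSA");
            rsaGen.initialize(2048);
            KeyPair identityKeys = rsaGen.generateKeyPair();

            // Symmetric session key: used for the bulk data in both directions.
            KeyGenerator aesGen = KeyGenerator.getInstance("AES");
            aesGen.init(128);
            SecretKey sessionKey = aesGen.generateKey();

            // Sender side: wrap the session key with the public key.
            Cipher wrap = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            wrap.init(Cipher.WRAP_MODE, identityKeys.getPublic());
            byte[] wrappedKey = wrap.wrap(sessionKey);

            // Sender side: encrypt the payload with the session key.
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, sessionKey, new GCMParameterSpec(128, iv));
            byte[] cipherText = enc.doFinal(message.getBytes(StandardCharsets.UTF_8));

            // Receiver side: only the private-key holder can unwrap the session key.
            Cipher unwrap = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            unwrap.init(Cipher.UNWRAP_MODE, identityKeys.getPrivate());
            SecretKey recovered = (SecretKey) unwrap.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);

            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, recovered, new GCMParameterSpec(128, iv));
            return new String(dec.doFinal(cipherText), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("crypto round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("data bound for S3"));
    }
}
```

The slow asymmetric operation touches only the small session key, while the fast symmetric cipher handles the payload.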
Cloud9 - Online IDE
CodeStar - Project management dashboard. Integrates with Jira.
Automates the CI/CD pipeline.
X-Ray - Diagnostics for performance analysis.
CodeCommit - Hosted git
CodePipeline - CI/CD pipeline.
CodeBuild - Automated build.
CodeDeploy - Deploy into EC2 or Lambda etc.
AWS IoT - Managed service that enables embedded devices to interact with cloud apps.
FreeRTOS - OS for microcontrollers. Lets devices connect to AWS IoT.
Greengrass - Enables Lambda functions to run locally on IoT devices and
interact with the cloud.
AWS Elastic Beanstalk is an orchestration service for EC2, Auto Scaling, Elastic Load Balancing, and related resources. For multicontainer Docker environments it internally uses ECS and task definitions to deploy the application.
Supported applications and software stacks include:
- Apache Tomcat for Java applications
- Apache HTTP Server for PHP applications
- Apache HTTP Server for Python applications
- Nginx or Apache HTTP Server for Node.js applications
- Passenger or Puma for Ruby applications
- Microsoft IIS 7.5, 8.0, and 8.5 for .NET applications
- Java SE
- Docker
- Go
For a Node.js application, the Nginx or Apache server acts as a reverse proxy that forwards requests to the Express server.
The default Auto Scaling trigger is not good enough: it only watches outbound network traffic and triggers scaling when that exceeds 6 MB in 5 minutes. You can configure it to use other EC2 instance metrics instead, such as CPU usage, memory usage, %user time, or %system time.
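The threshold logic behind such a trigger can be sketched as a small decision function. The upper threshold of 6,000,000 bytes per 5-minute period comes from the note above; the lower threshold of 2,000,000 bytes and the class name `ScalingTriggerSketch` are assumptions made for illustration, so check the environment's actual trigger settings before relying on them.

```java
final class ScalingTriggerSketch {

    // From the note above: roughly 6 MB of outbound traffic in one 5-minute period.
    static final long UPPER_BYTES = 6_000_000L;
    // Assumption for this sketch: scale in when traffic drops below this value.
    static final long LOWER_BYTES = 2_000_000L;

    /** Returns +1 to add an instance, -1 to remove one, 0 to leave capacity alone. */
    static int decide(long networkOutBytesInPeriod) {
        if (networkOutBytesInPeriod > UPPER_BYTES) {
            return 1;
        }
        if (networkOutBytesInPeriod < LOWER_BYTES) {
            return -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(decide(7_500_000L)); // above the upper threshold
        System.out.println(decide(4_000_000L)); // between the thresholds
        System.out.println(decide(1_000_000L)); // below the lower threshold
    }
}
```

A CPU-bound application can serve little traffic and still be overloaded, which is why a network-only trigger is often a poor fit.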
For session stickiness, there is a configuration to use cookies.
The environment can be a web server environment or a worker environment. In a worker environment, each instance runs a daemon that reads messages from an SQS queue and sends each one as a POST request to a configured path.
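The worker daemon's loop can be sketched without any AWS dependency. In this simplified model an in-memory `Deque` stands in for the SQS queue and a function returning an HTTP status code stands in for the POST to the application; both, along with the name `WorkerDaemonSketch`, are assumptions for illustration. It relies on the daemon's documented behaviour of treating a 200 response as success and leaving anything else for retry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.ToIntFunction;

final class WorkerDaemonSketch {

    /**
     * Takes each message off the queue, hands it to the application, and
     * returns the messages that were not acknowledged with HTTP 200.
     */
    static List<String> drain(Deque<String> queue, ToIntFunction<String> postToApp) {
        List<String> leftForRetry = new ArrayList<>();
        while (!queue.isEmpty()) {
            String body = queue.poll();
            int status = postToApp.applyAsInt(body); // stands in for POST <configured path>
            if (status != 200) {
                // A real queue would make the message visible again later.
                leftForRetry.add(body);
            }
        }
        return leftForRetry;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(List.of("job-1", "bad-job", "job-3"));
        List<String> retry = drain(queue, body -> body.startsWith("bad") ? 500 : 200);
        System.out.println("left for retry: " + retry);
    }
}
```

The application itself never talks to SQS; it only needs an HTTP endpoint that returns 200 when the work is done.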
You can choose a classic, network, or application load balancer.
Dockerrun.aws.json file: Amazon EC2 instances running Multicontainer Docker in an Elastic Beanstalk environment require a configuration file named Dockerrun.aws.json. This file is used to generate the ECS task definitions that run tasks in the generated ECS cluster.
Application Vs Network Load balancers.
ALB can support:
NLB can:
Classic Load Balancer is the legacy (previous-generation) option. It is similar to ALB, but it is required if your EC2 instances are in the EC2-Classic network (not in a VPC).
ALB makes use of subnets that can be configured as:
What is TLS offloading? A TLS (or SSL) termination proxy is a proxy server used to terminate and/or establish TLS (or DTLS) tunnels. NLB can act as such a termination proxy. Note that the proxy cannot decrypt traffic unless it negotiated and established the TLS session itself using the relevant certificates.
When you create your load balancer, you configure listeners (e.g. https), configure health checks, and register back-end instances.
When you create an ELB, its DNS names look like this:
name-123456789.region.elb.amazonaws.com
ipv6.name-123456789.region.elb.amazonaws.com
dualstack.name-123456789.region.elb.amazonaws.com
See https://www.bluematador.com/blog/static-ips-for-aws-application-load-balancer
Availability Zone
Local Zone - Like an Availability Zone, but a smaller one established in
popular cities. If your service is mostly consumed in the same city or
local region, you may prefer the Local Zone (in Los Angeles, for
example). Local Zones offer limited services and usually cost more than
regular zones. Each is part of a parent region.
Outpost - On-premise AWS hardware like EC2 racks. On-premise VPC.
No AWS direct link. Connects to AWS services using internet. If you
want to replicate AWS zone on-premise, this is for you.
Wavelength Zone - Wavelength embeds storage and compute inside telecom
providers' 5G networks, e.g. EC2 and EBS installed inside JIO 5G
networks to provide ultra-low (millisecond-level) latency to IoT devices. 5G
devices can reach apps running in Wavelength Zones without ever
leaving the 5G network.
Edge locations exist only to serve CloudFront, Route 53, WAF, Shield, and Global Accelerator (static IP). Data travels from the edge location to the AWS Region over the AWS backbone (much faster than the public internet). All regions are connected by the AWS backbone. A Local Zone is different because you can run EC2 and similar services there, but not at an edge location.
CloudFront actually has two tiers of edge location: edge points of presence (POPs) and regional edge caches (RECs), which are larger and more powerful.
Lambda@Edge does not actually run at the edge locations but at the regional edge caches, i.e. in the region nearest to the client. A separate feature, CloudFront Functions, does run at the edge locations, but it is meant only for limited purposes such as header rewrites and has many restrictions.
As of 2022 Jan, Amazon CloudFront uses a global network of:
- 310+ Points of Presence (300+ Edge locations and 13 regional mid-tier caches)
- in 90+ cities
- across 47 countries
In addition to the existing region in Mumbai and the new one planned for Hyderabad, there are currently seventeen CloudFront edge locations in India: four in Hyderabad, another four in New Delhi, three in Bangalore, three in Mumbai, two in Chennai, and one in Kolkata.
Kafka brokers are clustered.
Each Topic contains multiple partitions (based on key of the message). Each partition contains a list of segments.
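Key-based partition selection can be sketched as a hash of the key modulo the partition count. Kafka's default partitioner hashes the serialized key bytes with murmur2; the sketch below swaps in `String.hashCode()` purely to stay dependency-free, so the partition numbers it yields will not match a real cluster. The class and method names are hypothetical.

```java
final class KeyPartitionerSketch {

    /** Maps a key to a partition index in [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 10;
        for (int i = 1; i <= 5; i++) {
            String key = "key-" + i;
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Because the mapping is deterministic, all records with the same key land in the same partition, which is what preserves per-key ordering.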
Each broker is responsible for maintaining a list of partitions.
A broker is a leader for some partitions. Other brokers are followers. Typical replication factor is 3.
Partition leader handles reads and writes. (Followers do not serve even reads)
Total replicas can never be more than total brokers.
Total partitions could be more than total brokers for same topic. E.g. Single broker configuration is possible with 10 partitions per topic. But total replicas will be only 1.
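The relationship between brokers, partitions, and replicas can be sketched with a simple round-robin placement. This is a simplification and not Kafka's actual assignment algorithm (which also randomizes the starting broker and can take racks into account); `ReplicaAssignmentSketch` and its method are hypothetical names. The first broker in each list plays the leader.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplicaAssignmentSketch {

    /** Returns, per partition, the broker ids holding its replicas (leader first). */
    static List<List<Integer>> assign(int brokers, int partitions, int replicationFactor) {
        if (replicationFactor > brokers) {
            // Two replicas of one partition on the same broker would add no safety.
            throw new IllegalArgumentException(
                "replication factor " + replicationFactor + " exceeds broker count " + brokers);
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                replicas.add((p + r) % brokers);
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(3, 4, 3));  // [[0, 1, 2], [1, 2, 0], [2, 0, 1], [0, 1, 2]]
        System.out.println(assign(1, 10, 1)); // ten partitions, all on broker 0
    }
}
```

Partitions may outnumber brokers freely, but the replication factor is capped by the broker count.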
Rule of thumb is 10 partitions per topic and 10K partitions per cluster.
Followers that have caught up with the leader's partitions form the ISR (in-sync replica) set. A follower may lag by at most 10 seconds (configurable) while catching up.
Example :
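import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties settings = new Properties();
// bootstrap.servers lists the brokers used for initial cluster discovery.
settings.put("bootstrap.servers", "kafka-1:9992,kafka-2:9992");
settings.put("client.id", "basic-producer-v1.0");
// Keys and values are plain strings, so use the String serializer for both.
settings.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
settings.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
final KafkaProducer<String, String> producer = new KafkaProducer<>(settings);
// Shutdown hook: close the producer so buffered records are flushed.
Runtime.getRuntime().addShutdownHook(new Thread(producer::close));
final String topic = "Hello_World_Topic";
for (int i = 1; i <= 5; i++) {
    final String key = "key-" + i;
    final String value = "value-" + i;
    final ProducerRecord<String, String> record =
        new ProducerRecord<>(topic, key, value);
    producer.send(record);
}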
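// Consumer example:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties config = new Properties();
config.put("group.id", "my-consumer-group");
config.put("bootstrap.servers", "kafka-1:9992");
config.put("auto.commit.interval.ms", "5000");
config.put("auto.offset.reset", "earliest");
config.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
config.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(config);
consumer.subscribe(Collections.singletonList("Hello_World_Topic"));
while (true) {
    // Wait up to 100 ms for new records, then print whatever arrived.
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    records.forEach(r -> System.out.println(r.key() + " -> " + r.value()));
}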
Default data retention policy is 1 week.
The producer is aware of the partitioning logic and knows the exact destination partition (and the leader broker's endpoint). It sends records in small batches.
The producer can request no acknowledgement (acks=0), leader acknowledgement (acks=1), or full acknowledgement (acks=all, i.e. the ack is received only after the leader and all in-sync replicas have written the record).
Delivery guarantees: at-most-once, at-least-once, or exactly-once. Idempotent operations are fine with at-least-once. At-most-once: probably for less critical operations such as notifications and recommendations. Exactly-once: for operations such as financial transactions.
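Why idempotent handling makes at-least-once delivery safe can be sketched with a handler that remembers which record ids it has already applied. The record id, the in-memory `Set`, and the class name `IdempotentHandlerSketch` are assumptions for illustration; a real system would persist the seen ids together with the state change.

```java
import java.util.HashSet;
import java.util.Set;

final class IdempotentHandlerSketch {

    private final Set<String> seenIds = new HashSet<>();
    private long balance = 0;

    /** Applies the record once; returns false when it is a redelivered duplicate. */
    boolean apply(String recordId, long amount) {
        if (!seenIds.add(recordId)) {
            return false; // already processed, ignore the redelivery
        }
        balance += amount;
        return true;
    }

    long balance() {
        return balance;
    }

    public static void main(String[] args) {
        IdempotentHandlerSketch handler = new IdempotentHandlerSketch();
        handler.apply("tx-1", 100);
        handler.apply("tx-1", 100); // redelivered under at-least-once
        handler.apply("tx-2", 50);
        System.out.println(handler.balance()); // 150
    }
}
```

With this kind of deduplication the consumer observes each effect once even though the broker may deliver a record more than once.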
Do tools such as ksqlDB and Kafka Streams provide stronger delivery guarantees?
Consumer group: typically represents a cluster of consumers for one application. Load is balanced across the group, and more consumers may be added to a group automatically.
Single consumer can subscribe to multiple topics and be inside a consumer group.
When a consumer is added, the consumer group is rebalanced: the subscribed partitions are automatically reassigned across the consumer instances, all handled by the consumer API. However, complex application-specific state management is not handled by the API.
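The effect of a rebalance can be sketched as recomputing a partition-to-consumer assignment whenever group membership changes. The round-robin scheme below is a simplified stand-in and not one of Kafka's actual assignor implementations; `RebalanceSketch` and the consumer names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RebalanceSketch {

    /** Spreads partitions 0..partitions-1 across the consumers in round-robin order. */
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Before: two consumers share six partitions.
        System.out.println(assign(List.of("c1", "c2"), 6));       // {c1=[0, 2, 4], c2=[1, 3, 5]}
        // After a third consumer joins, ownership is recomputed.
        System.out.println(assign(List.of("c1", "c2", "c3"), 6)); // {c1=[0, 3], c2=[1, 4], c3=[2, 5]}
    }
}
```

Partitions change owners during the rebalance, which is why any per-partition state kept by the application has to be handed over or rebuilt by the application itself.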
Cruise-control (2.1K github stars) is a tool to automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
etcd vs zookeeper
==================
* Kafka and Hadoop use ZooKeeper for distributed consensus.
* Kubernetes uses etcd, which stores distributed key-value pairs in a flat keyspace (unlike ZooKeeper's tree).
* etcd primarily uses a replicated state machine, whereas ZooKeeper uses a coordination kernel.
* The ZooKeeper protocol differs slightly from the literature.
* etcd provides total ordering of events, whereas ZooKeeper provides only partial ordering.
* Nowadays etcd is preferred over ZooKeeper.
* etcd3 provides the zetcd daemon, a proxy for etcd that offers API compatibility with ZooKeeper clients.
* Consul is another alternative to etcd and ZooKeeper; it provides a better service discovery mechanism.