Cloud Fundamentals (AWS / Azure / GCP)
Shared responsibility model, IAM, EC2, VPC, S3, RDS, ECR, and cost awareness
Shared Responsibility Model
Cloud doesn't mean "no security concerns." It means security responsibilities are split.
| Layer | AWS Responsible | You Responsible |
|---|---|---|
| Physical | Data centers, hardware | N/A |
| Network | Backbone network | VPC config, security groups |
| Compute | Hypervisor | OS patching (EC2), code |
| Storage | Physical disks, S3 durability | Bucket policies, encryption |
| Database | RDS infrastructure | DB user permissions, data |
| IAM | IAM service | Who has what permissions |
The mistake most teams make: assuming "it's in the cloud, so it's secure." S3 bucket misconfigurations and overly permissive IAM policies are among the most common causes of cloud breaches.
IAM: Identity and Access Management
Users vs Roles
| Aspect | IAM User | IAM Role |
|---|---|---|
| Who uses it | Humans with long-term credentials | Services, EC2 instances, Lambda |
| Credentials | Access key + secret (long-lived) | Temporary tokens (STS) |
| Use case | Developers, CI/CD service accounts | EC2 to access S3, Lambda execution |
| Best practice | Avoid long-lived keys, use SSO | Prefer over users for services |

    # Check your current identity
    aws sts get-caller-identity

    # List IAM users
    aws iam list-users

    # List IAM roles
    aws iam list-roles

    # Who has access to what (for a user)
    aws iam list-attached-user-policies --user-name alice
    aws iam list-user-policies --user-name alice

Policies
IAM policies are JSON documents defining Allow/Deny for actions on resources.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject"
          ],
          "Resource": "arn:aws:s3:::my-bucket/*"
        },
        {
          "Effect": "Deny",
          "Action": "s3:DeleteObject",
          "Resource": "*"
        }
      ]
    }

Policy types:
| Type | Scope | Example |
|---|---|---|
| Managed (AWS) | AWS-maintained | AmazonS3ReadOnlyAccess |
| Managed (Customer) | Your account | Your custom policies |
| Inline | Attached directly to user/role | Specific one-off permissions |
| Resource-based | On the resource itself | S3 bucket policy |
| Permission boundary | Max permissions a role can have | Limiting what devs can grant |
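
To make the Allow/Deny semantics concrete, the following is a minimal Python sketch of how a single identity policy like the one above might be evaluated: an explicit Deny overrides any Allow, and anything not explicitly allowed is implicitly denied. The `evaluate` function and its `fnmatch`-based wildcard matching are illustrative assumptions, not AWS code. Real IAM evaluation also takes conditions, resource-based policies, permission boundaries, and organization-level policies into account, and matches action names case-insensitively.

```python
from fnmatch import fnmatchcase

# Same statements as the example policy above.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::my-bucket/*"},
        {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"},
    ],
}


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def _matches(patterns, value):
    """Simplified wildcard match: '*' matches any run of characters."""
    return any(fnmatchcase(value, pattern) for pattern in _as_list(patterns))


def evaluate(policy, action, resource):
    """Return 'Allow', 'ExplicitDeny', or 'ImplicitDeny' (hypothetical helper)."""
    decision = "ImplicitDeny"  # default when no statement matches
    for stmt in policy["Statement"]:
        if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return "ExplicitDeny"  # an explicit Deny always wins
            decision = "Allow"
    return decision


print(evaluate(POLICY, "s3:GetObject", "arn:aws:s3:::my-bucket/file.txt"))
print(evaluate(POLICY, "s3:DeleteObject", "arn:aws:s3:::my-bucket/file.txt"))
print(evaluate(POLICY, "s3:ListBucket", "arn:aws:s3:::my-bucket"))
```

The three calls cover the three possible outcomes: a matching Allow, a matching Deny, and a request that no statement mentions.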
Access Boundaries & Least Privilege

    # Check effective permissions (IAM Policy Simulator via CLI)
    aws iam simulate-principal-policy \
      --policy-source-arn arn:aws:iam::123456789012:user/alice \
      --action-names s3:GetObject \
      --resource-arns arn:aws:s3:::my-bucket/file.txt

    # Generate least-privilege policy from CloudTrail
    # Use IAM Access Analyzer to see unused permissions

Least privilege checklist:

- No `*` actions unless absolutely necessary
- No `*` resources unless justified
- Separate roles per service (EC2 role ≠ Lambda role)
- Rotate access keys regularly or switch to roles/SSO
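
The first two checklist items can be checked mechanically. The sketch below is one possible way to flag wildcard actions and resources in a policy document before it is attached. The `find_wildcards` helper and the sample policy are hypothetical, and the check is deliberately coarse: it only looks at `Allow` statements and treats `*` and `service:*` as wildcards.

```python
import json


def find_wildcards(policy):
    """Return (statement_index, field, value) for wildcards in Allow statements."""
    findings = []
    for index, stmt in enumerate(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements restrict access, so they are not flagged
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if value == "*" or (field == "Action" and value.endswith(":*")):
                    findings.append((index, field, value))
    return findings


# Hypothetical policy used only to exercise the check.
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::my-bucket/*"},
    {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"}
  ]
}
""")

for index, field, value in find_wildcards(policy):
    print(f"Statement {index}: {field} uses wildcard {value!r}")
```

A check like this could run in CI against policy files; it does not replace IAM Access Analyzer, which works from actual usage data.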
EC2: Elastic Compute Cloud
Instance Lifecycle

    stopped → pending → running → stopping → stopped → shutting-down → terminated

    # Launch instance (aws CLI)
    aws ec2 run-instances \
      --image-id ami-0abcdef1234567890 \
      --instance-type t3.medium \
      --key-name my-key-pair \
      --security-group-ids sg-12345 \
      --subnet-id subnet-12345 \
      --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=my-server}]'

    # List instances
    aws ec2 describe-instances \
      --query 'Reservations[].Instances[].[Tags[?Key==`Name`].Value|[0],InstanceId,State.Name,PublicIpAddress]' \
      --output table

    # Stop / start / terminate
    aws ec2 stop-instances --instance-ids i-1234567890abcdef0
    aws ec2 start-instances --instance-ids i-1234567890abcdef0
    aws ec2 terminate-instances --instance-ids i-1234567890abcdef0

Security Groups
Security groups are stateful firewalls at the instance/ENI level.

    # Create security group
    aws ec2 create-security-group \
      --group-name web-sg \
      --description "Web server security group" \
      --vpc-id vpc-12345

    # Allow inbound HTTPS from anywhere
    aws ec2 authorize-security-group-ingress \
      --group-id sg-12345 \
      --protocol tcp \
      --port 443 \
      --cidr 0.0.0.0/0

    # Allow SSH from specific IP only
    aws ec2 authorize-security-group-ingress \
      --group-id sg-12345 \
      --protocol tcp \
      --port 22 \
      --cidr 203.0.113.10/32

    # Allow traffic from another security group (e.g., from load balancer SG)
    aws ec2 authorize-security-group-ingress \
      --group-id sg-app \
      --protocol tcp \
      --port 8080 \
      --source-group sg-alb

Security group rules:

- Inbound: controls what traffic can reach your instance
- Outbound: controls what traffic your instance can send (default: all allowed)
- Stateful: if inbound is allowed, the response is automatically allowed
- No explicit deny: only allow rules exist; everything else is denied
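
The allow-only model can be illustrated with a small Python sketch. It mirrors the two CIDR-based ingress rules from the CLI examples above and checks whether an incoming connection matches any of them; with no match, the traffic is dropped. The `inbound_allowed` function is a hypothetical helper, and the sketch leaves out statefulness, port ranges, and security-group-to-security-group references.

```python
import ipaddress

# Inbound allow rules as (protocol, port, source CIDR), taken from the
# HTTPS and SSH examples above.
INBOUND_RULES = [
    ("tcp", 443, "0.0.0.0/0"),       # HTTPS from anywhere
    ("tcp", 22, "203.0.113.10/32"),  # SSH from one address only
]


def inbound_allowed(rules, protocol, port, source_ip):
    """True if any allow rule matches; there is no way to express a deny."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        protocol == rule_protocol
        and port == rule_port
        and ip in ipaddress.ip_network(cidr)
        for rule_protocol, rule_port, cidr in rules
    )


print(inbound_allowed(INBOUND_RULES, "tcp", 443, "198.51.100.7"))  # HTTPS from any address
print(inbound_allowed(INBOUND_RULES, "tcp", 22, "203.0.113.11"))   # SSH from the wrong address
```

Because rules can only allow, tightening access means removing or narrowing a rule rather than adding a blocking one.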
VPC: Virtual Private Cloud
CIDR Planning

    VPC CIDR: 10.0.0.0/16 (65,536 addresses)
    ├── AZ-1a: 10.0.1.0/24 (256 addresses)
    │   ├── Public subnet:  10.0.1.0/25   (web servers, load balancers)
    │   └── Private subnet: 10.0.1.128/25 (app servers)
    ├── AZ-1b: 10.0.2.0/24
    │   ├── Public subnet:  10.0.2.0/25
    │   └── Private subnet: 10.0.2.128/25
    └── AZ-1c: 10.0.3.0/24
        ├── Public subnet:  10.0.3.0/25
        └── Private subnet: 10.0.3.128/25

AWS reserves 5 addresses in every subnet, so each /25 above (128 addresses) offers 123 usable addresses.

Planning rules:

- Reserve enough space for growth (use /16 for VPC)
- Don't overlap with on-prem networks (VPN/Direct Connect routing needs non-overlapping ranges)
- Separate AZs for high availability
- Separate subnets for different tiers (public/private)
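
The layout above can be reproduced and sanity-checked with Python's standard `ipaddress` module. This is a sketch under the same assumptions as the diagram (one /24 per AZ, split into a public and a private /25). The `plan` function and the on-prem range `192.168.0.0/16` are hypothetical and only serve the example.

```python
import ipaddress

VPC = ipaddress.ip_network("10.0.0.0/16")
ON_PREM = ipaddress.ip_network("192.168.0.0/16")  # hypothetical on-prem range
AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four and the last address


def plan(az_blocks):
    """Split each AZ's block into a public and a private half."""
    layout = {}
    for az, cidr in az_blocks.items():
        block = ipaddress.ip_network(cidr)
        if not block.subnet_of(VPC):
            raise ValueError(f"{cidr} is outside {VPC}")
        public, private = block.subnets(prefixlen_diff=1)
        layout[az] = {"public": public, "private": private}
    return layout


layout = plan({
    "AZ-1a": "10.0.1.0/24",
    "AZ-1b": "10.0.2.0/24",
    "AZ-1c": "10.0.3.0/24",
})

# The VPC range must not collide with networks reached over VPN/Direct Connect.
print("Overlaps on-prem:", VPC.overlaps(ON_PREM))

for az, subnets in layout.items():
    for tier, net in subnets.items():
        usable = net.num_addresses - AWS_RESERVED_PER_SUBNET
        print(f"{az} {tier:<7} {net} ({usable} usable addresses)")
```

Running the same check against a candidate CIDR before creating the VPC is cheaper than renumbering later, since a VPC's primary CIDR cannot be changed after creation.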
Public vs Private Subnets
| Aspect | Public Subnet | Private Subnet |
|---|---|---|
| Has | Internet Gateway route | NAT Gateway route (optional) |
| Instances get | Public IP possible | Private IP only |
| Accessible from internet | Yes (with security group) | No |
| Examples | Load balancers, bastion hosts | App servers, databases |
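
What makes a subnet public or private is its route table, not a flag on the subnet. As a rough sketch, the Python function below classifies a subnet from its routes, using the field names that `aws ec2 describe-route-tables` returns (`DestinationCidrBlock`, `GatewayId`, `NatGatewayId`). The `classify_subnet` helper and the sample route lists are hypothetical, and the logic only considers the IPv4 default route.

```python
def classify_subnet(routes):
    """Classify a subnet by where its IPv4 default route (0.0.0.0/0) points."""
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "public"
        if route.get("NatGatewayId", "").startswith("nat-"):
            return "private (outbound internet via NAT)"
    return "private (no internet route)"


# Hypothetical route tables shaped like entries in the CLI's "Routes" list.
PUBLIC_ROUTES = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-12345"},
]
NAT_ROUTES = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-12345"},
]
ISOLATED_ROUTES = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
]

for name, routes in [("web", PUBLIC_ROUTES), ("app", NAT_ROUTES), ("db", ISOLATED_ROUTES)]:
    print(name, "->", classify_subnet(routes))
```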

    # Key components:
    # Internet Gateway (IGW): allows public subnet traffic to/from the internet
    # NAT Gateway: allows private subnet instances to reach the internet (outbound only)
    # Route Table: rules for where to send traffic

    # Check route tables
    aws ec2 describe-route-tables --filters "Name=vpc-id,Values=vpc-12345"

S3: Simple Storage Service

    # Create bucket
    aws s3 mb s3://my-unique-bucket-name --region us-east-1

    # Upload file
    aws s3 cp localfile.txt s3://my-bucket/prefix/file.txt

    # Upload directory
    aws s3 sync ./local-dir s3://my-bucket/prefix/

    # Download
    aws s3 cp s3://my-bucket/file.txt ./

    # List contents
    aws s3 ls s3://my-bucket/
    aws s3 ls s3://my-bucket/ --recursive

    # Delete
    aws s3 rm s3://my-bucket/file.txt
    aws s3 rm s3://my-bucket/prefix/ --recursive

S3 Bucket Policies

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowReadFromCDN",
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity <OAI-ID>"
          },
          "Action": "s3:GetObject",
          "Resource": "arn:aws:s3:::my-bucket/*"
        },
        {
          "Sid": "DenyUnencryptedUploads",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::my-bucket/*",
          "Condition": {
            "StringNotEquals": {
              "s3:x-amz-server-side-encryption": "AES256"
            }
          }
        }
      ]
    }

Replace `<OAI-ID>` with the ID of your own CloudFront origin access identity; the ARN is not valid without it.

Security checklist:

- Block public access unless intentionally hosting public content
- Enable versioning for important buckets
- Enable access logging
- Use bucket policies to limit who can read/write
- Never enable `ListBucket` permission for public access
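
The `DenyUnencryptedUploads` statement in the bucket policy above depends on how a negated condition operator treats a missing key: `StringNotEquals` is also true when the key is absent from the request, so an upload that sends no encryption header is denied as well. The Python function below is a simplified, hypothetical model of just that one condition, not of S3's full policy evaluation.

```python
SSE_HEADER = "x-amz-server-side-encryption"


def deny_unencrypted_upload(request_headers):
    """Model the StringNotEquals check: deny unless the header equals AES256."""
    value = request_headers.get(SSE_HEADER)
    # A missing header (None) does not equal "AES256" either, so it is denied.
    return value != "AES256"


print(deny_unencrypted_upload({}))                       # no header sent
print(deny_unencrypted_upload({SSE_HEADER: "AES256"}))   # SSE-S3 requested
print(deny_unencrypted_upload({SSE_HEADER: "aws:kms"}))  # SSE-KMS requested
```

One consequence worth noticing is that the policy as written also rejects uploads that request `aws:kms`; a bucket that should accept SSE-KMS would need a different condition.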
RDS: Relational Database Service

    # List RDS instances
    aws rds describe-db-instances \
      --query 'DBInstances[].[DBInstanceIdentifier,DBInstanceStatus,Engine,DBInstanceClass]' \
      --output table

    # Create a manual DB snapshot (e.g., before changing DB parameters)
    aws rds create-db-snapshot \
      --db-instance-identifier my-db \
      --db-snapshot-identifier pre-migration-snapshot-$(date +%Y%m%d)

RDS awareness checklist:

- Multi-AZ for production (automatic failover ~1-2 min)
- Read replicas for read scaling
- Automated backups with appropriate retention period
- Parameter groups for DB tuning (some parameters require a reboot to apply)
- Security groups: only allow from app server security group, not `0.0.0.0/0`
- Storage auto-scaling to prevent disk full
- Monitoring: FreeStorageSpace, DatabaseConnections, CPUUtilization
ECR: Elastic Container Registry

    # Authenticate Docker to ECR
    aws ecr get-login-password --region us-east-1 | \
      docker login --username AWS --password-stdin \
      123456789012.dkr.ecr.us-east-1.amazonaws.com

    # Create repository
    aws ecr create-repository \
      --repository-name myapp \
      --image-scanning-configuration scanOnPush=true \
      --encryption-configuration encryptionType=AES256

    # Tag and push image
    docker build -t myapp:latest .
    docker tag myapp:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0

    # List images
    aws ecr list-images --repository-name myapp

    # Lifecycle policy (auto-delete old images)
    aws ecr put-lifecycle-policy \
      --repository-name myapp \
      --lifecycle-policy-text '{
        "rules": [
          {
            "rulePriority": 1,
            "description": "Keep last 10 images",
            "selection": {
              "tagStatus": "any",
              "countType": "imageCountMoreThan",
              "countNumber": 10
            },
            "action": { "type": "expire" }
          }
        ]
      }'

Cost Awareness: Common Mistakes
The Expensive Surprises
| Mistake | Cost Impact | Fix |
|---|---|---|
| Leaving EC2 instances running (even when idle) | Full hourly cost 24/7 | Stop dev instances at night, use auto-scaling |
| NAT Gateway data processing charges | About $0.045/GB processed (us-east-1; varies by region) | Minimize traffic through NAT, use VPC endpoints for S3 |
| Forgotten snapshots/AMIs | Accumulates over time | Tag + lifecycle policies |
| S3 data transfer out | About $0.09/GB out to the internet (us-east-1, first pricing tier) | Use CloudFront, minimize cross-region transfers |
| RDS Multi-AZ in dev | 2x base cost | Single-AZ for dev, Multi-AZ for prod only |
| Oversized instances | Full cost of unused capacity | Right-size with CloudWatch metrics |
| Unattached EBS volumes | Continuous cost | Cleanup after instance termination |
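
A quick back-of-the-envelope calculation shows how these line items add up. The Python sketch below uses the per-GB rates quoted in the table; the traffic volumes, the $0.05 hourly instance rate, and the helper names are hypothetical, and actual prices depend on region and pricing tier.

```python
NAT_PROCESSING_PER_GB = 0.045  # USD per GB, rate quoted in the table above
EGRESS_PER_GB = 0.09           # USD per GB, rate quoted in the table above
HOURS_PER_MONTH = 730          # common approximation for an always-on month


def monthly_data_cost(nat_gb, egress_gb):
    """Data charges for one month, in USD."""
    return round(nat_gb * NAT_PROCESSING_PER_GB + egress_gb * EGRESS_PER_GB, 2)


def instance_cost(hourly_rate, hours):
    """Compute charges for an instance that runs for the given hours, in USD."""
    return round(hourly_rate * hours, 2)


# Hypothetical workload: 2 TB through the NAT gateway, 500 GB out to the internet.
print("Data charges:", monthly_data_cost(nat_gb=2000, egress_gb=500))

# Hypothetical dev instance at $0.05/hour: always on vs. 12 hours on 22 workdays.
print("Always on:   ", instance_cost(0.05, HOURS_PER_MONTH))
print("Work hours:  ", instance_cost(0.05, 12 * 22))
```

Even with a small instance, stopping it outside working hours cuts the compute line by roughly two thirds in this example, which is why scheduled stops are usually the first fix to try.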

    # Find unattached EBS volumes
    aws ec2 describe-volumes \
      --filters "Name=status,Values=available" \
      --query 'Volumes[].[VolumeId,Size,CreateTime]' \
      --output table

    # Find old snapshots
    aws ec2 describe-snapshots --owner-ids self \
      --query 'Snapshots[].[SnapshotId,VolumeSize,StartTime,Description]' \
      --output table

    # Check for idle EC2 instances (CPU < 5% over 2 weeks)
    # Use AWS Cost Explorer and Compute Optimizer
    aws compute-optimizer get-ec2-instance-recommendations

Cost hygiene habits:

- Enable billing alerts at $10, $50, $100
- Tag every resource with `Project`, `Environment`, `Owner`
- Use AWS Cost Explorer weekly
- Enable S3 Intelligent-Tiering for infrequently accessed data
- Use Reserved Instances or Savings Plans for stable workloads
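
The tagging habit is easy to enforce with a small check. As a sketch, the Python function below reports which of the three required tags a resource is missing, taking tags in the `[{"Key": ..., "Value": ...}]` shape that AWS APIs and the CLI return. The `missing_tags` helper and the sample tag lists are hypothetical.

```python
REQUIRED_TAGS = {"Project", "Environment", "Owner"}


def missing_tags(resource_tags):
    """Return the required tag keys that are absent, in alphabetical order."""
    present = {tag["Key"] for tag in resource_tags}
    return sorted(REQUIRED_TAGS - present)


# Hypothetical tag lists, shaped like the "Tags" field in describe-* output.
partially_tagged = [
    {"Key": "Name", "Value": "my-server"},
    {"Key": "Project", "Value": "web"},
]
fully_tagged = [
    {"Key": "Project", "Value": "web"},
    {"Key": "Environment", "Value": "dev"},
    {"Key": "Owner", "Value": "alice"},
]

print(missing_tags(partially_tagged))
print(missing_tags(fully_tagged))
```

Untagged resources are the ones that tend to show up in Cost Explorer with no clear owner, so a report like this pairs well with the weekly review.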