Job Readiness

Resume with engineering impact, GitHub portfolio, interview story building, debugging interviews, and system design basics


Resume with Engineering Impact

The biggest mistake engineers make on resumes: listing tools instead of impact.

Wrong vs Right

WRONG:
• Used Docker, Kubernetes, Terraform, Jenkins, AWS
RIGHT:
• Reduced deployment time from 45 minutes to 8 minutes by migrating from
manual deployments to a containerized CI/CD pipeline (Docker, GitHub Actions, EKS)
• Decreased infrastructure costs by 35% ($12k/month) by right-sizing EC2 instances
using CloudWatch metrics and implementing auto-scaling policies
• Improved mean time to recovery from 2 hours to 18 minutes by building
centralized observability with Prometheus, Grafana, and structured logging

The Formula

[Action verb] + [what you did] + [how you did it] + [measurable result]
"Automated database backup verification, eliminating 3 hours/week of manual work
and catching 2 silent backup failures before they became incidents"

Impact numbers to track:

  • Deployment frequency before/after
  • MTTR before/after
  • Cost savings ($)
  • Time saved (hours/week)
  • Reliability improvement (99.x% uptime)
  • Performance improvement (latency reduction, throughput increase)
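To turn before/after numbers into the percentages a resume bullet needs, a small helper can do the arithmetic. This is a minimal Python sketch: the `Metric` type and the function names are made up for illustration, the first two examples reuse the deployment-time and MTTR figures from the bullets above, and the deploy-frequency numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """One before/after measurement behind a resume bullet."""
    name: str
    before: float
    after: float
    unit: str
    lower_is_better: bool = True  # True for deploy time, MTTR, cost; False for throughput


def percent_improvement(m: Metric) -> float:
    """Improvement as a percentage of the 'before' value."""
    if m.before == 0:
        raise ValueError(f"{m.name}: 'before' must be non-zero")
    change = m.before - m.after if m.lower_is_better else m.after - m.before
    return 100.0 * change / m.before


def describe(m: Metric) -> str:
    """Format the pair as a phrase you can adapt into a bullet."""
    return (f"{m.name}: {m.before:g} -> {m.after:g} {m.unit} "
            f"({percent_improvement(m):.0f}% improvement)")


if __name__ == "__main__":
    for metric in [
        Metric("Deployment time", 45, 8, "min"),
        Metric("MTTR", 120, 18, "min"),
        Metric("Deploys per week", 2, 10, "deploys", lower_is_better=False),  # hypothetical
    ]:
        print(describe(metric))
```

For the figures above this gives 82% for deployment time (45 to 8 minutes) and 85% for MTTR (120 to 18 minutes). For higher-is-better metrics such as deploy frequency, "5x more deploys per week" often reads better on a resume than "400%".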

Structure

[Name] | [LinkedIn] | [GitHub] | [Location]
SUMMARY (2-3 lines)
DevOps/Platform engineer with X years of experience building CI/CD pipelines
and cloud infrastructure. Focus on reliability, automation, and developer experience.
EXPERIENCE
Company Name - Job Title (dates)
• Impact bullet 1
• Impact bullet 2
• Impact bullet 3
SKILLS
Cloud: AWS (EC2, EKS, RDS, S3, IAM), Azure
Containers: Docker, Kubernetes, Helm
IaC: Terraform, CloudFormation
CI/CD: GitHub Actions, Azure Pipelines, ArgoCD
Observability: Prometheus, Grafana, OpenTelemetry
Languages: Bash, Python, Go (basic)

GitHub as Portfolio

Your GitHub profile is your portfolio, and recruiters and hiring managers often check it.

What Makes a Good Portfolio

  1. Real projects with READMEs, not tutorial repos
  2. Consistent commits: they show you actually build things, not just complete courses
  3. DevOps-specific repos (examples below)
  4. Pinned repos: pin your 6 best repos

Project Ideas for DevOps Portfolio

1. Home Lab Infra (Terraform + Kubernetes)
- Terraform modules for AWS infrastructure
- K8s cluster with Helm charts for your apps
- CI/CD pipeline to deploy on git push
- README explaining architecture
2. Automated Blog/Portfolio CI/CD
- Your own blog (this one!) deployed with a pipeline
- Show: GitHub Actions, Docker, deployment
3. Observability Stack
- docker-compose with Prometheus + Grafana + sample app
- Pre-built dashboards with useful alerts
4. Incident Response Toolkit
- Bash/Python scripts for common diagnostics
- Document your runbook template

GitHub Profile README

Create a public repository named after your username ([yourusername]/[yourusername]) and add a README.md to it; GitHub shows that README on your profile page.

# Hi, I'm [Name]
DevOps engineer focused on reliability and automation.
## What I'm working on
- Building a multi-region Kubernetes setup with GitOps (Argo CD)
- Learning SRE practices through hands-on labs
## Tech Stack
Docker · Kubernetes · Terraform · AWS · GitHub Actions · Prometheus
## Recent Projects
- [infra-homelab](link): Full AWS infrastructure with Terraform
- [devops-notes](link): My learning notes (this site!)

Interview Story Building

Interviewers ask behavioral questions such as "Tell me about a time when...". Answer them with the STAR method.

STAR Method

Component    Question it answers
Situation    What was the context?
Task         What were you responsible for?
Action       What did you specifically do?
Result       What was the measurable outcome?

Build Your Story Bank

Prepare stories for these common categories:

Incident/Problem Solving:

"Tell me about a time you resolved a production incident."

S: Our checkout service was throwing 500 errors at 2am, affecting 20% of users.
T: I was on-call and responsible for restoring service.
A: I checked the deployment history (a deploy had gone out 30 minutes earlier), reviewed the error logs
(a null pointer exception while reading the payment config), identified a missing environment variable, and rolled back the deployment.
R: Service restored in 18 minutes. I wrote a postmortem and added environment variable validation to the CI/CD pipeline.
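The "env var validation" fix in this story could be as small as a pipeline step that fails when required variables are missing. A minimal Python sketch, assuming the CI job runs with the target environment's variables loaded; the script name `check_env.py` and the variable names in the comment are hypothetical.

```python
import os
import sys
from typing import Iterable, List, Mapping, Optional


def missing_vars(required: Iterable[str],
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Names from `required` that are unset or blank in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Hypothetical pipeline step: python check_env.py PAYMENT_API_URL PAYMENT_API_KEY
    names = sys.argv[1:]
    missing = missing_vars(names)
    if missing:
        print("missing required environment variables: " + ", ".join(missing),
              file=sys.stderr)
        sys.exit(1)  # non-zero exit fails the job before the deploy step runs
    print(f"all {len(names)} required environment variables are set")
```

Treating blank values as missing catches the common case where a variable is defined in the pipeline but never populated.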

Automation:

"Tell me about a time you improved a process."

Collaboration/Communication:

"Tell me about a time you had a conflict with a teammate."

Failure/Learning:

"Tell me about a mistake you made and what you learned."

Impact:

"Tell me about your biggest technical accomplishment."


Debugging Interviews

Technical interviews for DevOps often include debugging scenarios. These are less about knowing the answer and more about demonstrating your thought process.

The Framework

1. CLARIFY: "Before I dive in, can I ask a few questions?"
- What does the user/customer actually experience?
- When did this start? Any recent changes?
- What environment (dev/staging/prod)?
2. GATHER DATA: don't guess; look at the evidence
- Metrics (is there an alert? what does the graph show?)
- Logs (recent errors in logs?)
- Recent changes (deployment, config change, infra change?)
3. HYPOTHESIZE: form a hypothesis before testing
- "Based on this error message, I think it could be X because..."
- "I'd first check Y because it's the most likely cause"
4. TEST ONE THING AT A TIME: be systematic, not random
- If you change two things at once, you don't know which fixed it
5. COMMUNICATE: narrate what you're doing and why
- "I'm checking the service logs first because the error suggests..."

Common Interview Scenarios

"A service is returning 503 errors. What do you do?"

1. Check if the service is running (kubectl get pods, ps aux)
2. Check if the port is listening (ss -tlnp | grep :8080)
3. Check the service logs (kubectl logs, journalctl)
4. Check upstream dependencies (database, cache, downstream services)
5. Check load balancer health checks
6. Check recent deployments
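The first checks in this list can be scripted so they give the same answers every time. A minimal Python sketch using only the standard library: it checks whether anything accepts TCP connections on the port, then which HTTP status a health endpoint returns. The address, port 8080 (taken from the `ss` example), and the `/healthz` path are assumptions.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds, i.e. something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """HTTP status code returned by `url`, or None if no HTTP response came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx raise HTTPError but still carry the status code
    except OSError:      # URLError (refused, DNS failure) and timeouts
        return None


if __name__ == "__main__":
    # Assumed address and health path; adjust for your service.
    host, port, path = "127.0.0.1", 8080, "/healthz"
    if not port_open(host, port):
        print(f"nothing listening on {host}:{port}: check the process/pod and its logs")
    else:
        print(f"GET {path} -> {http_status(f'http://{host}:{port}{path}')}")
```

If the port is closed, the problem is the process or pod itself; if the port is open but the health endpoint returns 503, move on to the service logs and its upstream dependencies.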

"Disk is full on a server. What do you do?"

1. df -h: which filesystem is full?
2. df -i: could it be inode exhaustion rather than space?
3. du -sh /* 2>/dev/null | sort -h: find large directories
4. Find large files: find / -xdev -type f -size +500M
5. Common culprits: /var/log, Docker images, core dumps, tmp files
6. Quick fix: truncate large logs (deleting a log that a process still holds open does not free the space), clean up Docker (docker system prune), clear tmp
7. Long-term: set up log rotation, add a disk usage alert, increase disk size
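Steps 1 to 4 can be bundled into a small diagnostic script for an incident toolkit. A minimal Python sketch (POSIX only, because of `os.statvfs`): `usage_report` approximates `df -h` and `df -i`, and `largest_files` approximates `find -xdev -type f -size +500M`. The function names and the 500 MiB default are illustrative, and the percentages can differ slightly from `df`, which accounts for root-reserved blocks differently.

```python
import os
import shutil
import stat
import sys


def usage_report(path: str) -> dict:
    """Percent of space and inodes used on the filesystem holding `path`."""
    du = shutil.disk_usage(path)
    vfs = os.statvfs(path)  # POSIX only
    inodes_used = vfs.f_files - vfs.f_ffree
    return {
        "space_used_pct": 100.0 * du.used / du.total if du.total else 0.0,
        # Some filesystems (e.g. btrfs) report 0 total inodes; treat that as 0%.
        "inodes_used_pct": 100.0 * inodes_used / vfs.f_files if vfs.f_files else 0.0,
    }


def _on_device(path: str, dev: int) -> bool:
    """True if `path` lives on the filesystem with device id `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def largest_files(root: str, min_bytes: int = 500 * 1024**2, limit: int = 10):
    """Largest regular files under `root`, without crossing into other mounts."""
    root_dev = os.lstat(root).st_dev
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories on other filesystems (/proc, /sys, other mounts).
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # deleted or unreadable since os.walk listed it
            if stat.S_ISREG(st.st_mode) and st.st_size >= min_bytes:
                found.append((st.st_size, full))
    found.sort(reverse=True)
    return found[:limit]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."  # pass the full mount point
    report = usage_report(target)
    print(f"{target}: {report['space_used_pct']:.1f}% space used, "
          f"{report['inodes_used_pct']:.1f}% inodes used")
    for size, path in largest_files(target):
        print(f"{size / 1024**3:7.2f} GiB  {path}")
```

Run it against the mount point that `df -h` flagged (for example `python disk_report.py /var`; the script name is hypothetical). It cannot see deleted files that a process still holds open, since those no longer have a path; tools such as `lsof` can list them.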

System Design Basics (Entry Level)

You’ll get basic system design questions even as a junior/mid DevOps engineer. Know the fundamentals.

The Framework

1. Clarify requirements
- Scale: how many users? requests/sec? data size?
- Availability requirements? (99.9%? 99.99%?)
- Read-heavy or write-heavy?
2. Estimate scale
- 1M users, 10% daily active, 10 requests each = 1M req/day = ~12 req/sec
- 1TB of data, growing 1% daily
3. High-level design
- Draw the components (LB, app servers, DB, cache, CDN)
- Show data flow
4. Deep dive
- Interviewer will pick a component to go deeper on
5. Identify failure points and mitigations
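The estimation step is plain arithmetic, and it helps to have the conversions ready. A minimal Python sketch with illustrative function names; it computes averages only, so for peak traffic you would multiply by a peak factor that you state as an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_second(users: float, daily_active_ratio: float,
                        requests_per_user: float) -> float:
    """Average request rate from daily active users (ignores traffic peaks)."""
    return users * daily_active_ratio * requests_per_user / SECONDS_PER_DAY


def size_after(days: int, start_size: float, daily_growth: float) -> float:
    """Dataset size after `days` of compounding daily growth (same unit as start)."""
    return start_size * (1 + daily_growth) ** days


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for an availability target such as 0.999."""
    return (1 - availability) * 365 * SECONDS_PER_DAY / 60


if __name__ == "__main__":
    # 1M users, 10% daily active, 10 requests each.
    print(f"{requests_per_second(1_000_000, 0.10, 10):.1f} req/sec on average")
    # 1 TB growing 1% per day, compounded.
    print(f"{size_after(365, 1.0, 0.01):.1f} TB after one year")
    for target in (0.999, 0.9999):
        print(f"{target:.2%} availability allows "
              f"{downtime_minutes_per_year(target):.0f} min of downtime per year")
```

With these inputs, the request estimate averages about 11.6 req/sec, 1 TB growing 1% per day compounds to roughly 38 TB after a year, and 99.9% availability allows about 8.8 hours of downtime per year versus about 53 minutes for 99.99%.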

Building Blocks

Load Balancer → App Servers → Cache → Database
                     ↓
               Message Queue → Worker Servers
                                     ↓
                            Object Storage (S3)

Component               When to use
Load balancer           Multiple app servers, horizontal scaling
Cache (Redis)           Repeated reads, session storage, rate limiting
CDN                     Static assets, geographically distributed users
Message queue           Async tasks, decoupling producers and consumers
Database read replica   Read-heavy workloads
Database sharding       Write-heavy workloads, very large datasets
Object storage          Images, videos, backups, logs
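The message queue row describes decoupling: the app tier hands work off and responds immediately, while workers drain the queue at their own pace. A minimal in-process Python sketch, with `queue.Queue` and threads standing in for a real broker (such as SQS or RabbitMQ) and separate worker servers; the file names and the "thumbnail" task are hypothetical.

```python
import queue
import threading


def worker(jobs: queue.Queue, results: list, lock: threading.Lock) -> None:
    """'Worker server': pull jobs until it receives a None sentinel."""
    while True:
        item = jobs.get()
        try:
            if item is None:
                return
            processed = f"thumbnail-{item}"  # stand-in for the slow task
            with lock:
                results.append(processed)
        finally:
            jobs.task_done()


def run_demo(filenames, n_workers: int = 2) -> list:
    jobs = queue.Queue()  # stand-in for a message broker
    results, lock = [], threading.Lock()
    workers = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    # 'App server': enqueue and move on instead of doing the work inline.
    for name in filenames:
        jobs.put(name)
    jobs.join()  # block until every queued job is marked done
    for _ in workers:
        jobs.put(None)  # one sentinel per worker to shut it down
    for w in workers:
        w.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_demo(["a.png", "b.png", "c.png"]))
```

A real broker adds what this sketch lacks: durability across restarts, retries, and consumers running on separate machines.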

Example: "Design a URL shortener"

Requirements:
- 100M URLs shortened per day
- Redirect latency < 10ms
- Links don't expire (or have a configurable TTL)
Scale:
- 100M writes/day = ~1,160 writes/sec
- Reads likely 10-100x writes = ~11,600-116,000 reads/sec
Design:
[Client] → [CDN] → [Load Balancer] → [App Servers] → [Redis Cache]
                                           ↓ (cache miss)
                                      [Database]
Short URL generation: base62 encode a counter or random ID
Redirect: look up the code in Redis first → fall back to the DB on a cache miss
Database: simple key-value store (short_code → original_url, metadata)
Cache: hot URLs cached in Redis with 24h TTL
Scale considerations:
- App servers: stateless, scale horizontally
- Redis: cluster mode for high availability
- Database: read replicas for redirect lookups
- Global: deploy in multiple regions, use Route 53 latency-based routing
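The two core mechanics of this design, base62 short codes and a cache-aside redirect lookup, fit in a short sketch. A minimal Python version, assuming plain dictionaries stand in for Redis and the database and a single in-process counter stands in for a distributed ID generator; the class and method names are illustrative.

```python
import string
import time

# 0-9, a-z, A-Z: 62 characters.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


class Shortener:
    """Counter-based short codes with a cache-aside read path.

    Plain dicts stand in for Redis (with TTL) and the database."""

    def __init__(self, cache_ttl_seconds: float = 24 * 3600):
        self.db = {}     # short_code -> original_url
        self.cache = {}  # short_code -> (original_url, expires_at)
        self.ttl = cache_ttl_seconds
        self.counter = 0  # real deployments need a shared, distributed ID source
        self.db_reads = 0

    def shorten(self, url: str) -> str:
        self.counter += 1
        code = base62_encode(self.counter)
        self.db[code] = url
        return code

    def resolve(self, code: str):
        now = time.monotonic()
        hit = self.cache.get(code)
        if hit and hit[1] > now:
            return hit[0]  # cache hit: no database read
        self.db_reads += 1
        url = self.db.get(code)  # cache miss: fall back to the database
        if url is not None:
            self.cache[code] = (url, now + self.ttl)  # populate for next time
        return url


if __name__ == "__main__":
    s = Shortener()
    code = s.shorten("https://example.com/some/long/path")
    print(code, s.resolve(code), s.resolve(code), "db reads:", s.db_reads)
```

Seven base62 characters give 62^7 ≈ 3.5 trillion codes, which at 100M new URLs per day lasts roughly 96 years.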

The STAR Summary

Everything comes back to this: engineers who get hired are the ones who can demonstrate impact, not just tool knowledge.

Before any interview:

  1. List 5-7 technical accomplishments with measurable results
  2. Prepare STAR stories for each
  3. Practice explaining your architecture decisions out loud
  4. Be ready to say "I don't know, but here's how I'd find out"

Interviewers hire people they’d want to debug a production incident with at 3am. Be that person β€” calm, systematic, communicative, and honest about what you don’t know.