CI/CD: Delivery Engineering
Pipeline design, build-test-scan-artifact, secrets management, deployment strategies, and failure handling
CI vs CD
| | Continuous Integration (CI) | Continuous Delivery (CD) | Continuous Deployment |
|---|---|---|---|
| What | Merge + build + test automatically | Also packages + deploys to staging | Also deploys to production automatically |
| Goal | Catch integration bugs fast | Always have a deployable artifact | Ship to users on every commit |
| Human gate | After tests | Before production | None |
| Required for | All teams | Most product teams | High-maturity teams with great tests |
The confusion: "CI/CD" usually means CI + Continuous Delivery. True Continuous Deployment to production is rare and requires very mature test coverage.
Pipeline Design Principles
- Fast feedback: developers should know if something broke in minutes, not hours
- Fail fast: run the fastest checks first (linting before integration tests)
- Reproducible: same input always produces same output
- Idempotent: running the pipeline twice doesn't cause problems
- Observable: logs, artifacts, and metrics at every stage
- Secure: secrets never in logs, minimal permissions per stage
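The fail-fast ordering above can be sketched as a small POSIX shell helper. This is an illustration under assumptions, not part of any CI system: the name `run_stages` is hypothetical, and the stage commands in the usage comment stand in for a project's real lint, test, and build commands.

```shell
# Run stages in the order given (cheapest first) and stop at the first failure.
# Each argument is one shell command line.
run_stages() {
  for stage in "$@"; do
    echo "==> $stage"
    # sh -c lets a stage be an arbitrary command line, not just a single word.
    if ! sh -c "$stage"; then
      echo "FAILED: $stage (skipping remaining stages)" >&2
      return 1
    fi
  done
  echo "all stages passed"
}

# Usage (placeholder commands):
#   run_stages "npm run lint" "npm test" "docker build -t myapp ."
```

Because the loop returns on the first non-zero exit status, a lint error costs seconds rather than the full duration of the build and integration stages.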
    Commit → [Lint] → [Unit Test] → [Build] → [Security Scan] → [Integration Test] → [Push Artifact] → [Deploy Staging] → [Deploy Production]
    ↑ fast   ↑ parallel possible   ↑ gate here   ↑   ↑ manual approval?

Pipeline as Code
Everything in version control. No clicking in UIs to configure pipelines.
    # GitHub Actions example
    name: CI/CD Pipeline

    on:
      push:
        branches: [main]
      pull_request:
        branches: [main]

    jobs:
      lint:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: '20'
              cache: 'npm'
          - run: npm ci
          - run: npm run lint
          - run: npm run typecheck

      test:
        runs-on: ubuntu-latest
        needs: lint  # only run if lint passes
        services:
          postgres:
            image: postgres:16
            env:
              POSTGRES_PASSWORD: test
              POSTGRES_DB: test  # create the database named in DATABASE_URL
            ports:
              - 5432:5432  # expose the service on localhost for the job
            options: >-
              --health-cmd pg_isready
              --health-interval 10s
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: '20'
              cache: 'npm'
          - run: npm ci
          - run: npm test
            env:
              DATABASE_URL: postgres://postgres:test@localhost:5432/test

      build-and-push:
        runs-on: ubuntu-latest
        needs: test
        if: github.ref == 'refs/heads/main'  # only on main branch
        permissions:
          id-token: write  # for OIDC auth to AWS
          contents: read
        steps:
          - uses: actions/checkout@v4
          - name: Configure AWS credentials
            uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: arn:aws:iam::123456789012:role/github-actions-role
              aws-region: us-east-1
          - name: Log in to Amazon ECR  # docker push fails without a registry login
            id: login-ecr
            uses: aws-actions/amazon-ecr-login@v2
          - name: Build and push to ECR
            env:
              REGISTRY: ${{ steps.login-ecr.outputs.registry }}
              IMAGE_TAG: ${{ github.sha }}
            run: |
              # Build with the full registry name so the pushed tag actually exists locally
              docker build -t "$REGISTRY/myapp:$IMAGE_TAG" .
              docker push "$REGISTRY/myapp:$IMAGE_TAG"

Build → Test → Scan → Artifact
Build
    # Reproducible builds: always specify exact versions
    - name: Build Docker image
      run: |
        docker build \
          --build-arg BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ) \
          --build-arg GIT_SHA=${{ github.sha }} \
          --build-arg VERSION=${{ github.ref_name }} \
          -t myapp:${{ github.sha }} \
          .

Test
    # Run tests with coverage
    - name: Run tests
      run: npm test -- --coverage --ci

    - name: Upload coverage
      uses: codecov/codecov-action@v4

Test pyramid: More unit tests (fast, cheap) → fewer integration tests → even fewer E2E tests (slow, expensive).
Scan
    # Security scanning: multiple layers
    # Note: @master and @main are moving refs; pin third-party actions to a
    # release tag or commit SHA in a real pipeline.
    - name: Scan dependencies for vulnerabilities
      run: npm audit --audit-level=high

    - name: Scan Docker image
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: myapp:${{ github.sha }}
        severity: 'CRITICAL,HIGH'
        exit-code: '1'  # fail pipeline if found

    - name: Scan for secrets in code
      uses: trufflesecurity/trufflehog@main
      with:
        path: ./

Artifact
    # Tag with both git SHA (immutable) and version tag
    - name: Tag and push artifact
      run: |
        GIT_SHA="${{ github.sha }}"
        VERSION="1.2.3"

        docker tag myapp:$GIT_SHA myrepo/myapp:$GIT_SHA
        docker tag myapp:$GIT_SHA myrepo/myapp:$VERSION
        docker tag myapp:$GIT_SHA myrepo/myapp:latest

        docker push myrepo/myapp:$GIT_SHA  # immutable reference
        docker push myrepo/myapp:$VERSION
        docker push myrepo/myapp:latest
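The tagging scheme above repeats the same three references for both `docker tag` and `docker push`. One possible way to keep them in sync is to derive the list once. The sketch below assumes hypothetical helper names (`image_tags`, `push_all`) and a `DOCKER` variable that exists only so the commands can be dry-run; none of these come from Docker itself.

```shell
# Print every reference for one build: the immutable SHA tag first,
# then the mutable convenience tags.
image_tags() (
  repo=$1; sha=$2; version=${3:-}
  if [ -z "$repo" ] || [ -z "$sha" ]; then
    echo "usage: image_tags REPO SHA [VERSION]" >&2
    exit 2
  fi
  echo "$repo:$sha"
  # Only emit a version tag when one was supplied.
  if [ -n "$version" ]; then echo "$repo:$version"; fi
  echo "$repo:latest"
)

# Tag and push each reference derived from a single local image.
# Set DOCKER="echo docker" to print the commands instead of running them.
push_all() (
  local_image=$1; shift
  for ref in $(image_tags "$@"); do
    ${DOCKER:-docker} tag "$local_image" "$ref" || exit 1
    ${DOCKER:-docker} push "$ref" || exit 1
  done
)

# Usage: push_all "myapp:$GIT_SHA" myrepo/myapp "$GIT_SHA" 1.2.3
```

Deployments should still reference the SHA tag; the version and `latest` tags are for humans and can move.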
    # Store build artifacts
    - name: Upload build artifacts
      uses: actions/upload-artifact@v4
      with:
        name: build-artifacts
        path: ./dist/
        retention-days: 30

Secrets Management
Never Do This
    # WRONG: secrets in code, in logs, in history
    - run: docker login -u myuser -p mysecretpassword123

    env:
      API_KEY: sk-live-abc123  # visible in git history forever

Do This Instead
    # Use GitHub Secrets
    - run: echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin

    # Use OIDC (no long-lived credentials at all)
    # The job must declare `permissions: id-token: write` to request the OIDC token.
    - uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::123456789012:role/github-actions
        aws-region: us-east-1
    # The job exchanges a short-lived GitHub OIDC token for temporary AWS
    # credentials; no long-lived secret is stored anywhere

Secrets management tools:
| Tool | Use case |
|---|---|
| GitHub/GitLab Secrets | CI/CD pipeline secrets |
| AWS Secrets Manager | Runtime secrets, auto-rotation |
| HashiCorp Vault | Multi-cloud, complex secret workflows |
| AWS Parameter Store | Config + simple secrets (cheaper) |
| SOPS | Encrypted secrets in Git |
| External Secrets Operator | Sync secrets into K8s |
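Whichever tool supplies the secrets, a deploy script can check that they arrived without echoing them. The following is a minimal sketch, assuming secrets are exposed as environment variables; `require_env` is a hypothetical helper name and the variable names in the usage comment are examples.

```shell
# Fail early when a required secret is missing, without printing any value.
require_env() {
  missing=0
  for name in "$@"; do
    # Reject anything that is not a plain variable name before using eval.
    case $name in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "invalid variable name: $name" >&2
        return 2 ;;
    esac
    # POSIX sh has no indirect expansion, so eval reads the variable named by $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required secret: $name" >&2
      missing=1
    fi
  done
  unset value
  return "$missing"
}

# Usage: require_env DOCKER_USERNAME DOCKER_PASSWORD || exit 1
```

Only variable names ever reach the log, so a failed run reports what is missing without leaking what is present.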
Deployment Strategies
Rolling Deployment
Replace instances one at a time. Zero downtime, slow rollback.
    Before: [v1] [v1] [v1] [v1]
    Step 1: [v2] [v1] [v1] [v1]
    Step 2: [v2] [v2] [v1] [v1]
    Step 3: [v2] [v2] [v2] [v1]
    After:  [v2] [v2] [v2] [v2]

    # Kubernetes rolling update (default)
    spec:
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1        # create 1 extra pod before removing old
          maxUnavailable: 0  # never reduce capacity during update

Best for: Standard deployments where you can tolerate both versions running simultaneously.
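To see why maxSurge: 1 with maxUnavailable: 0 never reduces capacity, the rollout can be stepped through by hand. The simulation below is a deliberately simplified model, not the actual Kubernetes controller logic: it assumes new pods become ready immediately and that old pods are removed only after their replacements are up. The function name `rolling_update` is hypothetical.

```shell
# Simulate a rolling update with maxUnavailable: 0.
# rolling_update REPLICAS MAX_SURGE
rolling_update() (
  replicas=$1; surge=$2
  if [ "$replicas" -lt 1 ] || [ "$surge" -lt 1 ]; then
    echo "replicas and surge must be >= 1" >&2
    exit 2
  fi
  old=$replicas; new=0
  while [ "$old" -gt 0 ]; do
    # Surge: start up to $surge new pods on top of the current set.
    step=$surge
    if [ "$step" -gt "$old" ]; then step=$old; fi
    new=$((new + step))
    echo "surge:  old=$old new=$new total=$((old + new))"
    # Once the new pods are ready, remove the same number of old pods,
    # so the total never drops below $replicas.
    old=$((old - step))
    echo "settle: old=$old new=$new total=$((old + new))"
  done
)

# Usage: rolling_update 4 1
```

With 4 replicas and a surge of 1, the total alternates between 5 and 4 pods and both versions serve traffic until the final step, which is the mixed-version window mentioned above.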
Blue-Green Deployment
Run two identical environments (blue = current, green = new). Switch traffic instantly.
    Blue (v1):  [LB] → [v1] [v1] [v1]   ← receives traffic
    Green (v2): [  ]   [v2] [v2] [v2]   ← idle, being deployed

    Switch:
    Blue (v1):  [  ]   [v1] [v1] [v1]   ← idle (kept for rollback)
    Green (v2): [LB] → [v2] [v2] [v2]   ← now receives traffic

Pros: Instant rollback (just switch LB back), no mixed versions in production

Cons: Double the infrastructure cost during deployment, DB migration complexity
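On Kubernetes, one common way to implement the switch is to repoint a Service selector from one color to the other. The sketch below assumes both Deployments carry a `color` label and that a Service named by the caller selects on it; the label name, the helper names, and the `KUBECTL` override (used only for dry runs) are all assumptions for illustration.

```shell
# Return the environment that is NOT the given one.
other_color() {
  case $1 in
    blue)  echo green ;;
    green) echo blue ;;
    *) echo "unknown color: $1" >&2; return 2 ;;
  esac
}

# Point the Service at the given color by patching its selector.
# Set KUBECTL=echo to print the command instead of running it.
switch_traffic() (
  service=$1; target=$2
  other_color "$target" >/dev/null || exit 2  # validates the color name
  patch="{\"spec\":{\"selector\":{\"color\":\"$target\"}}}"
  ${KUBECTL:-kubectl} patch service "$service" -p "$patch"
)

# Usage:
#   switch_traffic myapp green                    # cut over
#   switch_traffic myapp "$(other_color green)"   # roll back
```

Rollback is the same operation with the other color, which is why the idle environment is kept running until the new one has proven itself.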
Canary Deployment
Send a small percentage of traffic to the new version, gradually increase.
    v1: ████████████████████ 95% traffic
    v2: █                     5% traffic ← canary

    Monitor metrics... if healthy:

    v1: ████████████ 50% traffic
    v2: ████████████ 50% traffic

    Continue until...

    v1:                      0%
    v2: ████████████████████ 100%

Best for: High-risk changes where you want to validate with real traffic before full rollout. Requires good observability to detect regressions.
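The "monitor metrics, then widen" step needs an explicit rule for what counts as healthy. A minimal sketch of one such rule follows: compare the canary's error rate to the baseline's and allow a small margin. The function name, the 100-request minimum, and the default margin of 50 basis points (0.5 percentage points) are illustrative assumptions, and the inputs are assumed to be non-negative integers taken from your metrics system.

```shell
# canary_verdict BASE_ERRORS BASE_REQUESTS CANARY_ERRORS CANARY_REQUESTS [MAX_EXTRA_BP]
# Prints one of: wait, promote, rollback.
canary_verdict() (
  base_err=$1; base_req=$2; can_err=$3; can_req=$4
  max_extra_bp=${5:-50}   # allowed excess error rate, in basis points
  min_requests=100        # below this the canary sample is too small to judge
  if [ "$can_req" -lt "$min_requests" ] || [ "$base_req" -le 0 ]; then
    echo wait
    exit 0
  fi
  # Integer error rates in basis points (1 bp = 0.01%).
  base_bp=$((base_err * 10000 / base_req))
  can_bp=$((can_err * 10000 / can_req))
  if [ "$can_bp" -le $((base_bp + max_extra_bp)) ]; then
    echo promote
  else
    echo rollback
  fi
)

# Usage: canary_verdict 95 95000 5 5000
```

Real canary analysis usually looks at latency and saturation as well as errors; the point here is only that the promotion decision is a comparison against the baseline, not against zero.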
Rollback Mechanisms
    # Kubernetes: roll back a deployment
    kubectl rollout undo deployment/myapp
    kubectl rollout undo deployment/myapp --to-revision=3

    # Check rollout history
    kubectl rollout history deployment/myapp

    # Docker Swarm
    docker service update --rollback myapp

    # ECS
    aws ecs update-service \
      --cluster production \
      --service myapp \
      --task-definition myapp:42  # previous task definition

    # General principle: tag images with git SHA, deploy by SHA, rollback = redeploy old SHA

Failure Handling
In the Pipeline
    # Allow a step to fail without failing the whole pipeline
    - name: Optional security scan
      run: trivy image myapp:latest
      continue-on-error: true

    # Cap slow or flaky steps with a timeout (a timeout alone does not retry the step)
    - name: Integration tests
      run: npm run test:integration
      timeout-minutes: 10
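A timeout bounds a flaky step but does not rerun it. If a step really does need retries, one option is a small wrapper inside the `run:` script. The helper below is a sketch with a hypothetical name (`retry`) and a fixed delay between attempts; it is not a GitHub Actions feature.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while :; do
    if "$@"; then return 0; fi
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Usage: retry 3 5 npm run test:integration
```

Retries hide flakiness rather than fix it, so it helps to keep the attempt count low and to record which runs needed more than one attempt.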
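    # Notifications on failure
    - name: Notify on failure
      if: failure()
      uses: slackapi/slack-github-action@v1
      with:
        payload: |
          {
            "text": "Pipeline failed on ${{ github.repository }} (${{ github.sha }})"
          }
      env:
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}  # assumed secret name
        SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK

In Deployment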
    # Always verify deployment after rolling out
    # Health check endpoint pattern:
    GET /health → 200 OK
    {
      "status": "ok",
      "version": "1.2.3",
      "git_sha": "abc1234"
    }
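Because the health response carries `git_sha`, a post-deploy check can confirm not only that something is serving traffic but that it is the build that was just deployed. The sketch below assumes the response shape shown above and extracts the field with `sed` to avoid depending on a JSON tool; `health_sha` and `verify_release` are hypothetical helper names.

```shell
# Read a /health JSON body on stdin and print the value of "git_sha".
health_sha() {
  sed -n 's/.*"git_sha"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# verify_release EXPECTED_SHA HEALTH_BODY
# Succeeds only when the running service reports the expected SHA.
verify_release() (
  expected=$1; body=$2
  actual=$(printf '%s\n' "$body" | health_sha)
  if [ "$actual" = "$expected" ]; then
    echo "release verified: $actual"
  else
    echo "expected $expected but service reports '${actual:-unknown}'" >&2
    exit 1
  fi
)

# Usage:
#   verify_release "$GIT_SHA" "$(curl -fsS https://myapp.example.com/health)"
```

A mismatch usually means the rollout has not finished or traffic is still reaching old instances, which a plain 200 OK would not reveal.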
    # Automated smoke tests after deploy
    curl -f https://myapp.example.com/health || {
      echo "Health check failed, rolling back"
      kubectl rollout undo deployment/myapp
      exit 1  # keep the pipeline red even when the rollback itself succeeds
    }

Pipeline Observability (Logs & Artifacts)
    # Always upload test results
    - name: Upload test results
      uses: actions/upload-artifact@v4
      if: always()  # upload even on failure
      with:
        name: test-results
        path: |
          ./test-results/
          ./coverage/

    # Annotate failures with test details
    - uses: dorny/test-reporter@v1
      if: always()
      with:
        name: Test Results
        path: test-results/*.xml
        reporter: java-junit

    # Track deployment in external system
    - name: Create deployment record
      uses: chrnorm/deployment-action@v2
      with:
        token: ${{ secrets.GITHUB_TOKEN }}
        environment: production
        ref: ${{ github.sha }}

What to track per pipeline run:
- Git SHA being built/deployed
- Start time, end time, duration
- Test results (pass/fail counts, flaky tests)
- Artifact versions and locations
- Deploy target and version before/after
- Who triggered the pipeline (human or automated)
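Several of the fields listed above can be captured as one JSON line per run, which is easy to ship to a log store. This is a sketch only: the field names, the argument order, and the helpers `json_escape` and `run_record` are assumptions, timestamps are taken as epoch seconds, and the escaping covers only backslashes and double quotes.

```shell
# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# run_record SHA START_EPOCH END_EPOCH TESTS_PASSED TESTS_FAILED ARTIFACT ACTOR
run_record() (
  sha=$1; started=$2; finished=$3; passed=$4; failed=$5; artifact=$6; actor=$7
  printf '{"git_sha":"%s","started_at":%s,"finished_at":%s,"duration_s":%s,' \
    "$(json_escape "$sha")" "$started" "$finished" "$((finished - started))"
  printf '"tests":{"passed":%s,"failed":%s},"artifact":"%s","triggered_by":"%s"}\n' \
    "$passed" "$failed" "$(json_escape "$artifact")" "$(json_escape "$actor")"
)

# Usage:
#   run_record "$GIT_SHA" "$START" "$(date +%s)" 120 2 "myrepo/myapp:$GIT_SHA" "$ACTOR"
```

Keeping the record to one line per run makes it straightforward to append to a file or send to whatever log pipeline is already in place.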