
DevOps Resources: CI/CD, Infrastructure as Code, Observability & Automation

Tags: devops, cicd, infrastructure-as-code, kubernetes, observability


This tutorial is a practical, command-heavy guide to core DevOps capabilities: CI/CD, Infrastructure as Code (IaC), observability, and automation. It’s written to be used as a reference you can copy from while building real pipelines and systems.


Table of Contents

  1. What “DevOps” Means in Practice
  2. CI/CD: Build, Test, Package, Release
  3. Infrastructure as Code (IaC)
  4. Observability: Metrics, Logs, Traces, and SLOs
  5. Automation: Repeatability at Scale
  6. Security Essentials: Supply Chain, Secrets, and Least Privilege
  7. A Practical End-to-End Example (Local)
  8. Curated Resource List

1. What “DevOps” Means in Practice

DevOps is less a job title and more a set of operational outcomes:

  - Shorter lead time from commit to production
  - Higher deployment frequency
  - Lower change failure rate
  - Faster recovery when something breaks

A useful mental model is a feedback loop:

  1. Code changes are proposed (PR).
  2. CI runs tests, security checks, and builds artifacts.
  3. CD deploys to environments using consistent mechanisms.
  4. Observability detects regressions quickly.
  5. Automation accelerates response and prevents repeated manual work.

The goal is not “deploy more” at any cost; it’s “deploy more safely” with measurable reliability.


2. CI/CD: Build, Test, Package, Release

2.1 CI/CD design principles

A robust pipeline usually follows these principles:

  - Every change flows through the same pipeline; there are no side doors
  - Fast feedback first: lint and unit tests run before slow integration suites
  - Reproducible builds: dependency and tool versions are pinned
  - Build once, then promote the same artifact through every environment
  - A red pipeline blocks merging

A common anti-pattern is “deploy from a developer machine.” Instead, the pipeline should be the only path to production.


2.2 A minimal CI pipeline (GitHub Actions)

Most CI systems are themselves configured in YAML. The following is a minimal GitHub Actions workflow that:

  - Runs on every pull request and on pushes to main
  - Installs dependencies with npm ci (using a cached npm store)
  - Runs the test suite and builds the project

Create .github/workflows/ci.yml:

name: ci

on:
  pull_request:
  push:
    branches: [ "main" ]

jobs:
  test-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"

      - name: Install
        run: npm ci

      - name: Test
        run: npm test -- --ci

      - name: Build
        run: npm run build

Key details:

  - actions/checkout@v4 fetches the repository at the triggering commit
  - actions/setup-node@v4 with cache: "npm" caches dependencies between runs
  - npm ci installs exactly what the lockfile specifies and fails on drift
  - The job runs on both pull requests and pushes to main, so main stays green

To run the same steps locally (a best practice for developer experience):

npm ci
npm test -- --ci
npm run build

2.3 Build artifacts, versioning, and SBOM

Artifacts are outputs of CI that you can deploy: a container image, a zip, a binary, etc. A key DevOps rule:

Build once; deploy many times.

Versioning

A practical approach is Semantic Versioning plus build metadata: a release tag like 1.4.2 combined with the short commit SHA, e.g. 1.4.2+ab12cd3.

In CI, you can generate a version string:

git rev-parse --short HEAD
git describe --tags --always
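
The two commands above can be combined into one version string. A minimal sketch, where compose_version is a hypothetical helper and the tag/SHA values are examples of what `git describe` and `git rev-parse` would return:

```shell
#!/usr/bin/env bash
# Sketch: join a semver tag and a short commit SHA into a build version.
# In CI you would feed in real `git describe` / `git rev-parse` output.
compose_version() {
  local semver="$1" sha="$2"
  printf '%s+%s\n' "$semver" "$sha"
}

compose_version "1.4.2" "ab12cd3"   # prints 1.4.2+ab12cd3
```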

SBOM (Software Bill of Materials)

An SBOM lists components included in your build. Many organizations require it for supply chain security.

Example using syft (works well for containers and directories):

# Install (macOS)
brew install syft

# Generate SBOM for a container image
syft your-image:tag -o spdx-json > sbom.spdx.json

# Or for a local directory
syft dir:. -o cyclonedx-json > sbom.cdx.json

You can store SBOMs as build artifacts and attach them to releases.


2.4 Container image build & push (Docker)

A typical pipeline builds an image and pushes it to a registry.

Build locally

docker build -t myapp:dev .
docker run --rm -p 8080:8080 myapp:dev

Tag with commit SHA

SHA="$(git rev-parse --short HEAD)"
docker tag myapp:dev "ghcr.io/yourorg/myapp:${SHA}"

Login and push (GitHub Container Registry example)

echo "$GITHUB_TOKEN" | docker login ghcr.io -u youruser --password-stdin
docker push "ghcr.io/yourorg/myapp:${SHA}"

Best practices:

  - Tag images with the immutable commit SHA, not just latest
  - Use multi-stage builds to keep runtime images small
  - Pin base images and rebuild regularly to pick up security patches
  - Run the process as a non-root user inside the container
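
A multi-stage Dockerfile reflecting common practices might look like this; the base image, build paths, and entrypoint are illustrative and should be adjusted to your app:

```dockerfile
# Build stage: full toolchain, discarded from the final image
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: only what the app needs, running as a non-root user
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER node
EXPOSE 8080
CMD ["node", "dist/server.js"]
```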


2.5 Deployment strategies: rolling, blue/green, canary

How you deploy matters as much as what you deploy.

Rolling deployment

Replace instances gradually. Pros: simple; Cons: mixed versions during rollout.

Blue/green

Two environments (blue=live, green=next). Switch traffic after validation. Pros: fast rollback; Cons: higher cost.

Canary

Release to a small percentage of traffic, observe, then expand. Pros: safest at scale; Cons: requires routing/metrics maturity.

A canary mindset depends on observability: you must measure errors/latency and compare canary vs baseline.
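
The canary-vs-baseline comparison can be sketched as a simple gate. In a real pipeline the error rates would come from your metrics backend; here the function name, values, and threshold are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: decide whether to promote or roll back a canary based on its
# error rate relative to the baseline. canary_verdict is hypothetical.
canary_verdict() {
  local canary_err="$1" baseline_err="$2" max_ratio="${3:-2}"
  # awk does the floating-point comparison; exit 0 means "too many errors"
  if awk -v c="$canary_err" -v b="$baseline_err" -v r="$max_ratio" \
       'BEGIN { exit !(c > b * r) }'; then
    echo "rollback"
  else
    echo "promote"
  fi
}

canary_verdict 0.8 0.5   # within 2x of baseline -> promote
canary_verdict 2.5 0.5   # exceeds 2x baseline   -> rollback
```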


3. Infrastructure as Code (IaC)

IaC is about managing infrastructure with the same discipline as software: version control, code review, automated testing, and repeatable change.

Two broad categories:

  - Provisioning tools (Terraform, Pulumi, CloudFormation) create and manage cloud resources
  - Configuration management tools (Ansible, Chef, Puppet) configure software on existing machines

In modern setups, Kubernetes and managed services reduce the need for heavy configuration management, but it still matters for VMs, edge cases, and bootstrapping.


3.1 Terraform fundamentals

Terraform describes desired infrastructure in code and reconciles it via a plan/apply loop: plan computes the difference between the desired state (your code) and the recorded state, and apply executes that difference.

Basic workflow:

terraform fmt -recursive
terraform validate
terraform init
terraform plan -out tfplan
terraform apply tfplan

Important concepts:

  - Providers: plugins that talk to a specific platform (AWS, GCP, GitHub, ...)
  - Resources: the infrastructure objects Terraform creates and manages
  - State: Terraform's record of what it has created and how it maps to real resources
  - Plan and apply: the computed change set, and its execution

State is critical: losing it can cause drift and accidental recreation.


3.2 Remote state, locking, and environments

Why remote state?

Local state (terraform.tfstate on a laptop) is dangerous:

  - It is easy to lose or corrupt
  - Teammates cannot see or lock it, so concurrent applies conflict
  - It often contains sensitive values in plaintext

Use a remote backend (S3 + DynamoDB locking on AWS, GCS on GCP, Terraform Cloud, etc.).

Even without showing provider-specific backend config, the operational commands look the same:

terraform init -reconfigure
terraform plan
terraform apply

Environments: dev/staging/prod

Avoid copy-pasting entire Terraform directories. Prefer:

  - Shared modules with per-environment variable files (env/dev.tfvars, env/prod.tfvars)
  - Terraform workspaces, or one small root directory per environment with its own state

Example usage:

terraform workspace new dev
terraform workspace select dev
terraform plan -var-file=env/dev.tfvars
terraform apply -var-file=env/dev.tfvars

Note: many teams prefer separate state per environment directory rather than workspaces, because it’s harder to accidentally apply to the wrong workspace when you’re tired.
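
If you do use workspaces, the wrong-workspace pitfall can be mitigated with a guard in your wrapper script. require_workspace is a hypothetical helper; in real use you would pass it the output of `terraform workspace show`:

```shell
#!/usr/bin/env bash
# Sketch: refuse to run unless the active Terraform workspace matches the
# one you intend to target. Workspace names are examples.
require_workspace() {
  local expected="$1" current="$2"
  if [[ "$current" != "$expected" ]]; then
    echo "refusing to apply: workspace is '$current', expected '$expected'" >&2
    return 1
  fi
}

# Real use: require_workspace "dev" "$(terraform workspace show)"
require_workspace "dev" "dev" && echo "workspace check passed"
```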


3.3 Example: provisioning a VM (conceptual) + best practices

Terraform code varies by cloud, but the structure is consistent: provider configuration, input variables, resource blocks, and outputs.
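
As a shape only, a small root module might look like this; example_vm is a made-up resource type, since real resource names and attributes vary by provider:

```hcl
# Hypothetical shape of a small Terraform root module.
variable "environment" {
  type        = string
  description = "Deployment environment (dev, staging, prod)"
}

resource "example_vm" "app" {
  name = "app-${var.environment}"
  size = "small"

  tags = {
    environment = var.environment
    owner       = "platform-team"
  }
}

output "vm_name" {
  value = example_vm.app.name
}
```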

Best practices you can apply everywhere:

  1. Small modules with clear inputs/outputs
  2. No secrets in state
  3. Use terraform plan in CI and require approval for apply
  4. Tag resources (owner, cost center, environment)
  5. Policy checks (OPA/Conftest, Sentinel, or cloud-native policies)

A common CI pattern:

terraform fmt -check -recursive
terraform validate
terraform plan -no-color -out tfplan

Then, in a protected environment step (manual approval):

terraform apply -no-color tfplan

3.4 Configuration management: Ansible basics

Ansible is useful for:

  - Configuring VMs and bare-metal hosts (packages, users, services)
  - Bootstrapping machines before they join a cluster
  - Running ad-hoc operational tasks across many hosts at once

Install:

python3 -m pip install --user ansible
ansible --version

Inventory example (inventory.ini):

[web]
10.0.0.10
10.0.0.11

Ping hosts:

ansible -i inventory.ini web -m ping

Run a command:

ansible -i inventory.ini web -a "uname -a"

Run a playbook:

ansible-playbook -i inventory.ini site.yml
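
A minimal site.yml for the inventory above might look like the following; installing nginx is just an example task:

```yaml
# Hypothetical site.yml: install and start nginx on the [web] group.
- hosts: web
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Ensure nginx is running and enabled
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```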

Operational best practices:

  - Keep playbooks idempotent: a second run should report no changes
  - Structure reusable configuration as roles
  - Store secrets with ansible-vault, never in plain inventory or vars files
  - Preview changes with --check and --diff before applying for real


4. Observability: Metrics, Logs, Traces, and SLOs

Observability answers: “What’s happening inside the system?”—not just “Is it up?”

Three pillars:

  - Metrics: numeric measurements over time (request rate, latency, CPU)
  - Logs: discrete, timestamped events with context
  - Traces: the path of a single request across services

A fourth pillar often included in practice is continuous profiling: always-on CPU and memory profiles that explain why a service is slow, not just that it is.


4.1 What to measure and why

Start with the Golden Signals (common SRE practice):

  1. Latency: how long requests take
  2. Traffic: request rate, throughput
  3. Errors: error rate, failed requests
  4. Saturation: resource utilization (CPU, memory, queue depth)

For APIs, also track:

  - Per-endpoint availability and error rates (4xx vs 5xx)
  - Latency percentiles (p50/p95/p99), not just averages

A good metric is cheap to collect, stable in meaning over time, and tied to a decision: when it moves, you know what to do next.


4.2 Prometheus + Grafana quickstart (local)

You can run Prometheus and Grafana locally using Docker. This section uses real commands and focuses on the operational flow.

Start Grafana quickly

docker run -d --name grafana -p 3000:3000 grafana/grafana:latest

Open http://localhost:3000 (default login is admin / admin, then change it).

Run a node exporter (host metrics)

docker run -d --name node-exporter -p 9100:9100 prom/node-exporter:latest
curl -s http://localhost:9100/metrics | head

Run Prometheus

Prometheus needs a config file. Create prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["host.docker.internal:9100"]

Run Prometheus (on Docker Desktop, host.docker.internal resolves to the host; on Linux, add --add-host=host.docker.internal:host-gateway):

docker run -d --name prometheus \
  -p 9090:9090 \
  -v "$PWD/prometheus.yml:/etc/prometheus/prometheus.yml:ro" \
  prom/prometheus:latest

Open http://localhost:9090.

Try a query:

node_cpu_seconds_total
rate(node_cpu_seconds_total{mode!="idle"}[5m])

What you just built:

  - A metrics source (node exporter) exposing /metrics
  - A scrape-and-store layer (Prometheus) collecting those metrics
  - A visualization layer (Grafana) that can use Prometheus as a data source

This is the same pattern you’ll use in Kubernetes and production, just with service discovery and more robust storage.


4.3 Logging with structured JSON and correlation IDs

Logs become dramatically more useful when they are:

  - Structured (JSON rather than free-form text)
  - Consistently tagged (service, env, level, UTC timestamps)
  - Correlated via a request ID that flows across services

A simple example of emitting JSON logs from a shell script:

REQUEST_ID="$(uuidgen | tr '[:upper:]' '[:lower:]')"
echo "{\"level\":\"info\",\"msg\":\"request started\",\"request_id\":\"$REQUEST_ID\",\"service\":\"payments\",\"env\":\"dev\"}"

In application code, you typically:

  - Generate a request ID at the edge, or accept an incoming X-Request-Id header
  - Attach it to a request-scoped logger so every log line carries it
  - Propagate it in headers to downstream services

When logs are centralized (ELK/OpenSearch, Loki, Cloud Logging), you can search by request_id to reconstruct user journeys.
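
The shell snippet above can be wrapped in a small reusable helper. This is a sketch: log_json is a hypothetical function, the "payments" service name is an example, and it assumes field values need no JSON escaping (use jq for robust quoting):

```shell
#!/usr/bin/env bash
# Sketch: minimal structured JSON logger for shell scripts.
log_json() {
  local level="$1" msg="$2" request_id="$3"
  printf '{"ts":"%s","level":"%s","msg":"%s","request_id":"%s","service":"payments"}\n' \
    "$(date -u +%FT%TZ)" "$level" "$msg" "$request_id"
}

log_json info "request started" "test-123"
```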


4.4 Distributed tracing with OpenTelemetry

Distributed tracing is essential once you have multiple services. OpenTelemetry (OTel) is the industry standard for instrumentation.

Concepts:

  - Trace: the end-to-end journey of a single request
  - Span: one timed operation within a trace (an RPC, a DB query)
  - Context propagation: passing trace/span IDs across service boundaries, e.g. via the W3C traceparent header

A practical approach:

  1. instrument services with OpenTelemetry SDK
  2. export traces to a collector
  3. send to a backend (Jaeger, Tempo, Honeycomb, etc.)

Run Jaeger locally:

docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

If your app exports OTLP to http://localhost:4318, you can view traces in Jaeger.

Why tracing matters operationally:

  - It pinpoints which service or hop contributes latency
  - It reveals retry storms, fan-out, and N+1 call patterns
  - It turns "the site is slow" into a specific slow span you can fix


4.5 SLOs, error budgets, and alerting

SLIs are measurements (e.g., “% of requests under 300ms”). SLOs are targets (e.g., “99.9% under 300ms over 30 days”). SLAs are contracts with users/customers.

Example SLI/SLO:

  - SLI: fraction of HTTP requests that succeed (non-5xx) and complete under 300ms
  - SLO: 99.9% of requests meet that SLI over a rolling 30 days

Error budget: the unreliability the SLO allows. A 99.9% SLO over 30 days leaves a 0.1% budget, roughly 43 minutes of full downtime per month; spend it deliberately on risky changes, and slow down when it is nearly exhausted.

Alerting guidance:

  - Alert on symptoms (SLO burn rate), not causes (CPU percentage)
  - Page only for what needs a human now; file tickets for the rest
  - Every alert needs an owner and a runbook

A simple Prometheus-style alert query conceptually looks like this (assuming a counter named http_requests_total with a status label):

sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) > 0.001

Even if your tooling differs, the principle is the same: alerts should be actionable and tied to SLOs.
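
The error-budget arithmetic is worth making concrete. A quick sketch, where the 30-day window and SLO targets are examples:

```shell
#!/usr/bin/env bash
# Sketch: minutes of full downtime a given SLO allows over a 30-day window.
error_budget_minutes() {
  local slo="$1"                    # e.g. 99.9
  local minutes=$((30 * 24 * 60))   # 43200 minutes in 30 days
  awk -v slo="$slo" -v m="$minutes" \
    'BEGIN { printf "%.1f\n", (100 - slo) / 100 * m }'
}

error_budget_minutes 99.9    # about 43 minutes per 30 days
error_budget_minutes 99.99   # about 4 minutes per 30 days
```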


5. Automation: Repeatability at Scale

Automation is how you remove manual, error-prone steps. It’s also how you scale operations without scaling headcount linearly.

Targets for automation:

  - Environment provisioning and teardown
  - Routine deploys and rollbacks
  - Certificate rotation, backups, and other scheduled toil
  - Incident diagnostics (scripts that collect logs and metrics on demand)


5.1 Makefiles and task runners

A Makefile is a simple, effective way to standardize local workflows.

Example Makefile:

SHELL := /bin/bash

.PHONY: test build run docker-build docker-run fmt

fmt:
	npm run fmt

test:
	npm test

build:
	npm run build

run:
	npm start

docker-build:
	docker build -t myapp:local .

docker-run:
	docker run --rm -p 8080:8080 myapp:local

Now developers can run:

make test
make docker-build
make docker-run

This reduces “works on my machine” problems by making the happy path consistent.


5.2 Shell scripting patterns for safe automation

Shell scripts are powerful but can be dangerous without guardrails.

Use strict mode:

set -euo pipefail
IFS=$'\n\t'

Add logging and validation:

#!/usr/bin/env bash
set -euo pipefail

log() { printf '%s %s\n' "$(date -u +%FT%TZ)" "$*"; }

: "${ENVIRONMENT:?ENVIRONMENT is required}"
: "${IMAGE_TAG:?IMAGE_TAG is required}"

log "Deploying ${IMAGE_TAG} to ${ENVIRONMENT}"

Dry-run patterns:

DRY_RUN="${DRY_RUN:-0}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "[dry-run] $*"
  else
    "$@"   # execute the command directly; avoids eval's quoting pitfalls
  fi
}

run echo "Deploy step here"

Idempotency matters: scripts should be safe to re-run after partial failure.
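
One way to approximate idempotency is a completion marker per step. step_once is a hypothetical helper; a temp directory stands in for durable state, which a real script would keep at a fixed path:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: run each named step at most once by recording completion markers.
STATE_DIR="$(mktemp -d)"

step_once() {
  local name="$1"; shift
  if [[ -f "$STATE_DIR/$name.done" ]]; then
    echo "skip: $name already done"
    return 0
  fi
  "$@" && touch "$STATE_DIR/$name.done"
}

step_once migrate echo "running migrations"
step_once migrate echo "running migrations"   # no-op on the second call
```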


5.3 GitOps workflows

GitOps is an operational model where:

  - Git is the single source of truth for desired system state
  - Changes are proposed and reviewed as pull requests
  - An automated controller continuously reconciles the live system toward what Git declares

Benefits:

  - A full audit trail: every change is a commit
  - Rollback is a git revert
  - Drift is detected and corrected automatically

Typical flow:

  1. CI builds and pushes image myapp:<sha>
  2. CI updates deployment config repo to reference <sha>
  3. GitOps controller applies change to cluster
  4. Observability confirms health

Even outside Kubernetes, the model applies: treat operational state as code, reconcile continuously.
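
Step 2 in the flow above is often just a templated edit plus a commit. A sketch, where the file name is made up, the image path reuses the earlier example, and the commit/push is omitted:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: CI rewrites the image tag in a GitOps config repo.
cd "$(mktemp -d)"   # stand-in for a checkout of the config repo

cat > deployment.yaml <<'EOF'
image: ghcr.io/yourorg/myapp:old-sha
EOF

SHA="ab12cd3"
sed -i.bak "s|\(image: ghcr.io/yourorg/myapp:\).*|\1${SHA}|" deployment.yaml
cat deployment.yaml
```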


6. Security Essentials: Supply Chain, Secrets, and Least Privilege

DevOps without security just ships mistakes faster. Modern DevOps integrates security into pipelines and daily workflows.

6.1 Secrets management

Rules:

  - Never commit secrets to Git, even in private repos
  - Inject secrets at runtime; don't bake them into images or artifacts
  - Rotate secrets regularly and immediately on suspected exposure
  - Grant the narrowest scope that works (least privilege)

Practical local check: scan for accidental secrets before pushing:

git diff --cached | grep -Ei "api_key|secret|password|token" || true

Better: use dedicated scanners (e.g., gitleaks):

brew install gitleaks
gitleaks detect --source . --no-git

At runtime, use:

  - A dedicated secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, or similar)
  - Short-lived, automatically rotated credentials where possible
  - Platform-injected environment variables or mounted secrets, never files in the repo


6.2 Container scanning and signing

Scan images for vulnerabilities:

brew install trivy
trivy image myapp:local

Sign images (conceptually) with Sigstore Cosign:

brew install cosign
cosign version

In real pipelines, you’d sign the pushed image and verify signatures during deployment admission.


7. A Practical End-to-End Example (Local)

This section ties together CI-like steps, containerization, and basic observability locally.

Step 1: Build and test

npm ci
npm test
npm run build

Step 2: Build a container image

docker build -t myapp:local .
docker run --rm -p 8080:8080 myapp:local

Step 3: Add a basic health check endpoint

If your app supports it, expose:

  - /healthz for liveness ("the process is up")
  - /readyz for readiness ("the process can serve traffic; dependencies are reachable")

Then you can validate:

curl -i http://localhost:8080/healthz
curl -i http://localhost:8080/readyz
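
Health endpoints are most useful when deploy scripts actually gate on them. A sketch, where wait_ready is a hypothetical helper and the URL and retry budget are examples:

```shell
#!/usr/bin/env bash
# Sketch: block until a health endpoint answers, then proceed to smoke tests.
wait_ready() {
  local url="$1" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "not ready after ${attempts} attempt(s)" >&2
  return 1
}

# Real use: wait_ready "http://localhost:8080/healthz" 30
```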

Step 4: Emit metrics (conceptually) and scrape them

If your app exposes /metrics in Prometheus format:

curl -s http://localhost:8080/metrics | head

Then configure Prometheus to scrape it (add a job in prometheus.yml) and query it in Prometheus, e.g. rate(http_requests_total[5m]) if your app exports a request counter by that name.

Step 5: Add request correlation in logs

Have your reverse proxy or app add a request ID header, then log it. Validate by making a request and checking logs:

curl -H "X-Request-Id: test-123" http://localhost:8080/
docker logs <container_id> | tail -n 50

This is the smallest “full loop” that resembles production: build → run → observe.


8. Curated Resource List

Below is a focused list of high-value resources by category.

CI/CD

  - GitHub Actions documentation (workflow syntax, reusable workflows)
  - Continuous Delivery (Humble & Farley) for pipeline design principles

Infrastructure as Code

  - Terraform documentation (state, backends, workspaces, modules)
  - Ansible documentation (playbooks, roles, ansible-vault)

Observability

  - Prometheus documentation (PromQL, alerting rules)
  - Grafana documentation (dashboards, data sources)
  - OpenTelemetry documentation (SDKs, the Collector, OTLP)
  - The Google SRE Book (SLOs, error budgets, alerting philosophy)

Security / Supply Chain

  - Sigstore/Cosign documentation (image signing and verification)
  - Trivy and Syft documentation (vulnerability scanning and SBOMs)
  - The SLSA framework (supply chain security levels)

Automation & Operations

  - The GNU Make manual
  - The SRE Workbook (implementing SLOs, reducing toil)


Closing Notes

A mature DevOps practice is built from small, repeatable building blocks:

  - One pipeline as the only path to production
  - Versioned, reviewed infrastructure changes
  - Observability tied to SLOs rather than vanity dashboards
  - Automation with guardrails: dry runs, idempotency, and approvals

From here, adapt these building blocks to your own stack (cloud provider, language, container/Kubernetes or VM-based): pick a repo structure, define your pipeline stages, and stand up an observability setup tailored to your environment.