Table of Contents
- General
- Infrastructure as Code
- Logging
- Tracing
- Monitoring & Observability
- Networking
- PaaS: Platform as a Service
- Data visualization and dashboards
- Querying
- Data Analytics
- Data Science and Machine Learning
- Security
- Testing
- Feature toggles/flags
- Development and ephemeral environments
- Internal Developer Platform
- Serverless
- Databases
- Backoffice
- REST APIs
- Marketing
- Tools for online events
- Social networks
- Hotspots and code analysis
- Playgrounds
- Books and other resources
- LLMs
- DevOps Lifecycle Mesh
- Kubernetes (k8s)
- Tips, tricks, tools, etc: https://twitter.com/patrickdebois/status/1221114733186682885
- Flux
- k8s deployment tool: https://github.com/slok/kahoy
- Kustomize
- Popeye: cluster resource sanitizer
- Skaffold
- Cilium: Network policies
- https://github.com/cilium/hubble
- Local k8s:
- minikube
- https://kind.sigs.k8s.io/
- NATS (Cloud messaging)
- AWS
- https://awsclibuilder.com/home
- https://github.com/open-guides/og-aws
- AWS Security Best Practices Assessment, Auditing, Hardening and Forensics Readiness Tool
- Certifications recommended: CSAA (Certified Solutions Architect - Associate) and CSysOpA (Certified SysOps Administrator - Associate)
- AWS mock for local testing:
- https://github.com/localstack/localstack
- https://github.com/p4tin/goaws (faster for SQS/SNS)
- Moto
- Write blog post for terratest + localstack: https://github.com/gruntwork-io/terratest/blob/master/test-docker-images/moto/README.md#localstack
- Codebase with localstack examples
- Data centers
- 12factor
- Consul
- Consul/etcd registrator: https://github.com/gliderlabs/registrator
- Consul snapshots: https://github.com/pshima/consul-snapshot
- Linkerd
- Service Mesh
- Load balancing
- Service discovery: we used Consul
- Circuit breaking: detect and remove unhealthy nodes.
- Retries and timeouts
- Traffic split: e.g. for canary releases and blue/green deployment
- No ingress controller
- It offers all the info about what is happening (observability).
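The circuit-breaking idea listed above (detect and remove unhealthy nodes) can be sketched in a few lines. This is only an illustration of the concept, not how Linkerd or any other mesh implements it: the `CircuitBreaker` class, its `failure_threshold` and `reset_timeout` parameters, and the node names are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CircuitBreaker:
    """Counts consecutive failures per node and skips nodes whose circuit is open."""

    failure_threshold: int = 3   # hypothetical default: failures before opening
    reset_timeout: float = 30.0  # hypothetical default: seconds before a retry is allowed
    _failures: dict = field(default_factory=dict)
    _opened_at: dict = field(default_factory=dict)

    def record_success(self, node):
        # A successful call closes the circuit and resets the failure count.
        self._failures[node] = 0
        self._opened_at.pop(node, None)

    def record_failure(self, node, now=None):
        now = time.monotonic() if now is None else now
        self._failures[node] = self._failures.get(node, 0) + 1
        if self._failures[node] >= self.failure_threshold:
            # Open (or re-open) the circuit: stop routing traffic to this node.
            self._opened_at[node] = now

    def is_available(self, node, now=None):
        now = time.monotonic() if now is None else now
        opened = self._opened_at.get(node)
        if opened is None:
            return True
        # After the timeout, let a trial request through ("half-open" state).
        return now - opened >= self.reset_timeout

    def healthy_nodes(self, nodes, now=None):
        # Nodes a load balancer could still pick from.
        return [n for n in nodes if self.is_available(n, now)]
```

A load balancer would call `healthy_nodes(...)` before picking a target and report each outcome with `record_success` or `record_failure`. In a service mesh this logic lives in the sidecar proxy, so application code does not need it.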
- Kong:
- Open Source API gateway
- Authentication
- Traffic control: restrict inbound/outbound traffic
- Load balancing
- Healthchecks and circuit breakers
- Squid (HTTP proxy)
- Varnish: HTTP cache for performance, e.g. behind nginx.
- HAProxy
- PagerDuty
- Distributed tracing
- Zipkin
- Jaeger
- NATS: Cloud-native messaging system.
- OpenEBS: open source, cloud-native storage solution.
- DNS:
- VPN
- https://docs.google.com/presentation/d/1TUz8TtLu6Y-UdOsXgZwanjqcIMPu-LRqyoMyEikTuvc/edit?ts=5ef10fa0#slide=id.p
- https://github.com/sshuttle/sshuttle
- https://www.wireguard.com/
- https://openvpn.net/
- Openswan
- ProtonVPN
- Windscribe
- https://tunnelblick.net/ (OpenVPN client for Mac)
- Git
- Bash scripting
- Terraform
- Atlantis
- Alternative to Terraform Cloud
- Pulumi
- Packer: immutability
- Docker
- lazydocker
- Docker platforms
- Ansible
- Chef
- Puppet
- fluentd
- systemd-journald
- Logz.io
- Logstash
- Splunk
- ELK
- DataDog
- https://grafana.com/oss/loki/
- Zipkin
- Jaeger
- OpenTelemetry
- Sentry
- DataDog
- Comparisons
- OpenTelemetry
- Prometheus
- https://prometheus.io/docs/prometheus/latest/getting_started/
- Cortex: horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus
- promtool: unit testing for rules
- https://chrome.google.com/webstore/detail/prometheus-formatter/jhfbpphccndhifmpfbnpobpclhedckbb?hl=en
- Grafana
- ELK
- Logz.io
- Statuspage
- Pingdom
- Runscope: healthcheck for external customers, connected to PagerDuty, Slack...
- Consul dashboards: internal healthcheck status
- Grafana + Prometheus
- AWS Cloudwatch + Lambda
- Postman
- Lightweight monitoring (uptime, healthcheck): https://jvns.ca/blog/2022/07/09/monitoring-small-web-services/
- Uptime
- Traffic interceptor:
- Wireshark
- https://httptoolkit.com/
- https://proxyman.io/
- Heroku
- Render
- https://fly.io/ (used by jvns)
- act: Run your GitHub Actions locally!
- SQL for the cloud: https://steampipe.io/
- Tools: https://twitter.com/episuarez/status/1338035772608360451
- Amplitude
- Mixpanel
- https://posthog.com/
- Debezium: Change Data Capture (CDC)
- Streamlit: Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front‑end experience required.
- Hugging Face Spaces: Hugging Face Spaces offer a simple way to host ML demo apps directly on your profile or your organization's profile. [Backend]
- Gradio: Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere! [Frontend]
- MLflow: MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
- MLOps: end-to-end machine learning development process to design, build and manage reproducible, testable, and evolvable ML-powered software.
- Kubeflow: Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes.
- Vertex AI (GCP): machine learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications.
- Model Cards (Google Cloud): the README.md for AI models. Model cards aim to provide a concise, holistic picture of a machine learning model and a shared understanding of it.
- Amazon SageMaker: cloud machine learning platform to build, train, and deploy ML models for any use case with fully managed infrastructure, tools, and workflows. It also enables developers to deploy ML models on embedded systems and edge devices.
- Auth0
- JWT
- OAuth
- Cloudflare
- Fail2ban
- Hashicorp Vault: secrets management
- https://www.doppler.com/
- AWS Secrets Manager
- AWS Systems Manager Parameter Store
- Snyk
- ZAProxy: Automate network vulnerability scans (Internet facing networks and systems)
- Amazon GuardDuty: Automate network intrusion detection
- https://github.com/zricethezav/gitleaks
- https://snyk.io/product/container-vulnerability-management/
- https://www.trendmicro.com/en_us/business/products/hybrid-cloud/deep-security.html
- https://www.hackthebox.com/
- https://www.iriusrisk.com/
- Postman
- Localstack
- TestContainers
- MockAPI
- e2e tests
- Cypress
- Playwright
- https://www.qawolf.com/ (SaaS)
- Load testing
- Pact
- Hey: HTTP load generator
- ToxiProxy: to simulate network and system conditions for chaos and resiliency testing.
- Mail SMTP servers: https://github.com/mailhog/MailHog
- Visual testing:
- https://applitools.com/
- https://percy.io/
- Oculow
- BrowserStack
- https://twitter.com/morvader/status/1452584938482634757
- https://github.com/Netflix/SimianArmy
- https://github.com/netflix/chaosmonkey
- https://medium.com/@adhorn/chaos-engineering-part-3-61579e41edd8
- https://netflix.github.io/chaosmonkey/
- https://chaostoolkit.org/
- https://github.com/KTH/royal-chaos
- https://docs.litmuschaos.io/
- OpenFeature: CNCF initiative
- https://twitter.com/mpjme/status/1301127511967961089
- GitLab Feature Flags
- https://flagger.app/
- https://www.split.io/
- Togglz
- https://launchdarkly.com/
- Unleash
- GitLab/Azure feature flags
- https://learn.hashicorp.com/tutorials/terraform/blue-green-canary-tests-deployments
- https://groundcontrol.sh/
- Flipper cloud (used in Devengo, for example)
- Flagsmith
- Tools
- https://code.visualstudio.com/docs/remote/containers
- https://github.com/features/codespaces
- https://www.gitpod.io/: Spin up fresh, automated dev environments for each task, in the cloud, in seconds (IDE as a Service).
- https://www.bunnyshell.com/: Full-stack production-like replicas on any cloud.
- https://www.okteto.com/: Instantly spin up pre-configured environments in the cloud and start developing within seconds
- https://localstack.cloud/
- Readings
- Backstage
- Roadie (based on Backstage)
- Humanitec
- https://www.cortex.io/
- Other alternatives: https://internaldeveloperplatform.org/developer-portals/
- AppSmith: internal custom application development, LowCode
- https://www.serverless.com/
- https://tech.genial.ly/en-cualquier-aplicaci%C3%B3n-o-sistema-existen-una-serie-de-acciones-que-t%C3%ADpicamente-nos-permiten-797615db17b6
- https://homeschool.dev/class/production-ready-serverless
- "How we used serverless to speed up our servers" by Jessica Kerr and Ian Wilkes
- https://dbeaver.io/
- https://www.beekeeperstudio.io/
- PostgreSQL execution plan visualizer
- https://planetscale.com/
- MySQL
- https://www.forestadmin.com/
- https://www.glideapps.com/
- https://www.jetadmin.io/
- https://retool.com/
- Search for more no-code tools
- https://mockoon.com/
- OpenAPI
- Liquibase
- Flyway
- https://dev.to/juanvegadev/language-and-framework-agnostic-database-migrations-56bj
- HotJar: Website Heatmaps & Behavior Analytics Tools
- Clarity (from Microsoft): free user behavior analytics tool with heatmaps and session recordings
- LogRocket: session replay, product analytics and error tracking. Identify technical and UX issues with its AI, quantify impact with analytics, and then watch session replays to see exactly what went wrong
- Google Tag Manager: measure your advertising ROI
- Product updates announcement: https://announcekit.app/
- https://www.hoppier.com/blog/best-virtual-event-platforms-and-tools-for-2021
- https://streamyard.com/
- https://vidiv.com/ (used in Tarugoconf)
- Fishbowl: https://www.stooa.com/es
- SpatialChat
- Survey: https://www.mentimeter.com/enterprise
- GetStream: Build In-App Chat. Video & Audio + Feeds
- Codescene
- Glean: system for collecting, deriving and querying facts about source code
- https://next.github.com/projects/repo-visualization
- https://understandlegacycode.com/blog/focus-refactoring-with-hotspots-analysis/
- https://github.com/smontanari/code-forensics
- https://github.com/pbmiguel/behavioural-code-analyser
- nginx playground
- SQL playground
- https://jvns.ca/blog/2023/04/17/a-list-of-programming-playgrounds/
- https://github.com/marcosnils/awesome-playgrounds
- Books
- https://www.goodreads.com/review/list/6102002-isidro-l-pez?shelf=systems
- "Infrastructure as Code"
- "Release it!"
- Devops Handbook
- SRE
- https://gumroad.com/l/aws-good-parts/released
- https://www.amazon.com/Designing-Distributed-Systems-Patterns-Paradigms
- Posts
- Training
- Workshops, exercises and examples:
- ChatGPT
- Gemini
- Perplexity
- DeepSeek
- Llama
- Accelerate Your Learning with ChatGPT
- IDEs and AI assistants
- Rules for AI from Edu Ferro
- My LLM codegen workflow atm
- GitHub Copilot
- Cursor
- JetBrains AI Assistant
- Aider: AI pair programming
- Claude Engineer
- Claude AI
- Cline, for example with Claude as a provider (looks like only for VSCode?)
- Sweep AI: AI coding assistant for JetBrains
- Repomix: a powerful tool that packs your entire repository into a single, AI-friendly file. It is perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like Claude, ChatGPT, DeepSeek, Perplexity, Gemini, Gemma, Llama, Grok, and more.