DevOps & Platform Engineer

Designing, automating, and operating production infrastructure - from serverless AWS to national-scale government platforms.

Recently

Built a self-hosted multi-agent AI code reviewer (39s per 64-file MR, ~$50/mo vs $5K–15K SaaS), a patent-pending distributed AI platform, and national-scale government infrastructure across three Ethiopian ministries.

Highlights

Co-validated 23 of 28 claims on a non-provisional utility patent. AWS architecture with zero long-lived credentials. Self-hosted multi-agent AI code review at ~$50/month vs $5K–15K for SaaS. GitLab CI/CD pipelines with Trivy scanning and supply-chain auditing.

Background

4+ years of production operations across banking, government, and cloud-native platforms, with patent-validated work in distributed AI infrastructure. BSc in Computer Science, Admas University.

Selected Work

Case Studies

Seven projects across distributed AI, government infrastructure, serverless logistics, banking, self-hosted AI tooling, and CI/CD architecture.

Nov 2025 – Present · Platform Engineering

NetrionX DDEF Platform

Distributed AI-driven infrastructure orchestration with patent-validated architecture

5 nodes - AWS + GCP + Edge
11.6ms avg RTT, 0.35ms jitter
23/28 patent claims validated
20,820+ AI decision cycles

Architected and deployed a 5-node distributed platform spanning AWS, GCP, and physical edge (Jetson Nano), connected via WireGuard/VXLAN mesh overlay. Built SR-MPLS transport fabric using FRRouting with IS-IS adjacency and BGP AS65001, assigning segment routing labels for deterministic path selection.

Configured ONOS SDN controller with OpenFlow 1.3 managing 4 virtual devices and 12 bidirectional links. Integrated autonomous DQN + LSTM decision engine executing 20,820+ decision cycles with 1,015+ autonomous placement actions at 2ms latency and 84.5% average confidence.

Built heterogeneous inference pipeline across three GPU tiers - T4 (PyTorch, 150ms), L4 (ONNX-GPU, 90ms), Jetson Nano (TensorRT FP16, 24ms). Built Grafana observability stack with 29 real-time panels covering system metrics, network health, AI engine telemetry, and ONOS topology.
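As a rough illustration of latency-aware placement across these tiers, the sketch below picks the slowest tier that still meets a request's latency budget, which keeps the faster tiers free for tighter deadlines. The latency figures are the ones quoted above; the `Tier` type, the `pick_tier` function, and the selection policy itself are illustrative assumptions and do not describe the platform's DQN + LSTM decision engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    runtime: str
    latency_ms: float


# Latencies as quoted in the case study text.
TIERS: List[Tier] = [
    Tier("T4", "PyTorch", 150.0),
    Tier("L4", "ONNX-GPU", 90.0),
    Tier("Jetson Nano", "TensorRT FP16", 24.0),
]


def pick_tier(budget_ms: float, tiers: List[Tier] = TIERS) -> Optional[Tier]:
    """Return the slowest tier whose latency fits the budget (hypothetical policy)."""
    fitting = [t for t in tiers if t.latency_ms <= budget_ms]
    # No tier fits: the caller has to relax the budget or reject the request.
    return max(fitting, key=lambda t: t.latency_ms, default=None)
```

With these numbers, a 100ms budget lands on the L4 tier, while a 10ms budget cannot be met by any tier.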

[Architecture diagram: ONOS · AWS T4 · GCP L4 · Edge K3s · WireGuard / VXLAN mesh · SR-MPLS fabric · DQN+LSTM decision engine (84.5% confidence, 2ms latency)]

Python · ONOS SDN · OpenFlow 1.3 · FRRouting · SR-MPLS · BGP · IS-IS · WireGuard · VXLAN · K3s · NVIDIA RuntimeClass · PyTorch · ONNX · TensorRT · Prometheus · Grafana

Mid-2025 – Present · Government Infrastructure

OpenG2P Ethiopia

National-scale digital transformation for Ethiopia's Fayda ID-enabled welfare systems

3 ministries - ATI, MoWSA, EDRMC
Dual infra - Ethio Telecom + On-prem
RAID 1+0 - XCP-ng HA infra
National - Fayda ID scale, millions

Leading DevOps implementation across three Ethiopian government entities - the Agricultural Transformation Institute, Ministry of Women and Social Affairs, and Ethiopian Disaster Risk Management Commission. Deployed production-grade RKE2 clusters with Rancher management, Istio service mesh with mTLS, and Keycloak SSO federation across all entities.

Architected dual-infrastructure deployments spanning Ethio Telecom cloud and on-premises environments, with HA infrastructure built on hardware RAID 1+0, the XCP-ng hypervisor, and WireGuard VPN tunnels. Currently implementing large-scale beneficiary data migration from ODK to OpenG2P for national social protection programs.

[Architecture diagram: Ethio Telecom Cloud · RKE2 + Rancher · Istio Mesh · Keycloak · OpenG2P · WireGuard · On-Premises (RAID 1+0) · XCP-ng · RKE2 · ATI · MoWSA · EDRMC]

RKE2 · Rancher · Istio · Keycloak · XCP-ng · WireGuard · OpenG2P · Helm

Jan 2025 – Apr 2025 · Serverless Architecture

WeTruck Serverless Logistics

Cloud-native serverless platform for shipment lifecycle, real-time GPS, and automated payments

Zero long-lived credentials
11+ async SQS job types
100% Terraform IaC
OTel distributed tracing

Designed AWS Lambda (Python 3.12) architecture serving REST APIs via FastAPI + Mangum, handling synchronous HTTP traffic and asynchronous job processing across dev/prod environments. Built containerized Lambda deployments using ECR with automated Docker builds and semantic version tagging.

Built an end-to-end GitHub Actions pipeline using OIDC federation to AWS (zero long-lived credentials), covering automated testing, Docker build/push to ECR, Lambda deployment, and post-deploy migration triggers. Managed PostgreSQL (RDS) with PostGIS for geospatial queries, and ran Alembic migrations under advisory locks.
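One way to keep two deploys from running migrations at the same time is a PostgreSQL session-level advisory lock held for the duration of the upgrade. The sketch below is a minimal illustration under stated assumptions: `migration_lock`, `lock_key`, and the lock name are hypothetical, the `%s` placeholder assumes a psycopg-style DB-API driver, and none of this is taken from the project's code.

```python
import zlib
from contextlib import contextmanager


def lock_key(name: str) -> int:
    """Derive a stable integer key from a name (hypothetical helper).

    CRC32 yields an unsigned 32-bit value, which fits the bigint argument
    that pg_advisory_lock expects.
    """
    return zlib.crc32(name.encode("utf-8"))


@contextmanager
def migration_lock(conn, name: str = "alembic-migrations"):
    """Hold a session-level advisory lock on `conn` while the body runs."""
    key = lock_key(name)
    cur = conn.cursor()
    # Blocks until any other session holding the same key releases it.
    cur.execute("SELECT pg_advisory_lock(%s)", (key,))
    try:
        yield key
    finally:
        # Release on the same session that acquired the lock.
        cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
        cur.close()
```

In a deploy script, the Alembic upgrade call (for example `alembic.command.upgrade(config, "head")`) would sit inside `with migration_lock(conn):`, so a second deploy waits rather than racing the first.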

Implemented SQS-driven async job processing with batch item failure reporting. Integrated OpenTelemetry distributed tracing with OTLP HTTP exporter and AWS X-Ray propagation. Implemented JWT authentication, role-based access control, and security headers middleware.
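Batch item failure reporting lets a Lambda consumer tell SQS which messages in a batch failed, so only those are retried. The handler below is a minimal sketch of that response shape; `process_job` and its validation rule are hypothetical stand-ins for the real job types, and the pattern assumes the event source mapping has `ReportBatchItemFailures` enabled.

```python
import json
from typing import Any, Callable, Dict, List


def process_job(job: Dict[str, Any]) -> None:
    """Hypothetical job processor: raises if the job cannot be handled."""
    if "type" not in job:
        raise ValueError("job is missing a 'type' field")


def handler(event: Dict[str, Any], context: Any,
            process: Callable[[Dict[str, Any]], None] = process_job) -> Dict[str, Any]:
    """Process an SQS batch and report only the failed message IDs."""
    failures: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message returns to the queue; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list marks the whole batch as successful.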

Python · FastAPI · AWS Lambda · ECR · RDS PostgreSQL · PostGIS · SQS · S3 · GitHub Actions · OIDC · OpenTelemetry · Docker

Feb 2023 – Jun 2025 · Financial Infrastructure

Multi-Bank Mobile Banking Platform

Production banking infrastructure for Siinqee, Hijra, and Wegagen banks with HA failover

3 banks - Siinqee, Hijra, Wegagen
HA - Heartbeat failover
Keycloak - production SSO
4h → 15m deploy reduction

Deployed Docker and Kubernetes clusters with resource limits, HPA, and load balancing for mobile banking, USSD, and REST microservices on WildFly and Nginx. Implemented Jenkins pipelines automating the full lifecycle for Maven-based Java applications.

Managed Proxmox clusters for HA virtualization and AWS infrastructure. Configured ActiveMQ message brokering and MySQL tuning for banking workloads. Integrated ELK stack for centralized logging and Prometheus/Grafana for resource monitoring. Implemented Keycloak SSO, Heartbeat HA failover, and comprehensive security policies across production banking environments.

Docker · Kubernetes · Jenkins · Proxmox · AWS · Nginx · WildFly · ActiveMQ · MySQL · ELK · Keycloak

NDIT Solutions · AI Infrastructure

BugBot - Self-Hosted Multi-Agent AI Code Reviewer

Replaces SaaS code review tools for regulated and cost-sensitive engineering teams

39 seconds - 64-file MR review
~$50/mo vs $5K-15K SaaS
5 agents - parallel review
Zero code leaves network

Architected and deployed for NDIT Solutions (a US IT consulting firm) as an internal code review system replacing SaaS alternatives such as CodeRabbit, Greptile, and Cursor. The system cuts external SaaS spend by roughly two orders of magnitude (about $50/month against $5,000-$15,000/month) while keeping all source code inside NDIT's network, which is critical for regulated clients across cybersecurity, government, and financial services engagements where data sovereignty is a hard requirement.

Five specialized review agents - security, performance, code quality, architecture, and testing - run in parallel through an n8n orchestration layer. The system pulls merge requests from GitLab via webhook, splits diffs by file, dispatches them to the agents simultaneously, and aggregates findings into a single ranked review comment posted back to the MR with file-line-level annotations. Inference is served via Groq with Kimi K2 and Llama 3.3 70B models for sub-second per-agent latency.
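The fan-out and aggregation step can be sketched in plain Python with `asyncio`. This is an illustration only: the production system orchestrates agents through n8n, and the `review_mr` function, the `review` callable, the finding fields, and the severity ranking below are all assumptions made for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

AGENTS = ["security", "performance", "code quality", "architecture", "testing"]
# Hypothetical severity scale used to rank the aggregated findings.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

Finding = Dict[str, Any]
ReviewFn = Callable[[str, str, str], Awaitable[List[Finding]]]


async def review_mr(file_diffs: Dict[str, str], review: ReviewFn) -> List[Finding]:
    """Send every per-file diff to every agent concurrently, then rank findings."""
    tasks = [
        review(agent, path, diff)
        for path, diff in file_diffs.items()
        for agent in AGENTS
    ]
    # gather() runs all agent calls concurrently and preserves task order.
    batches = await asyncio.gather(*tasks)
    findings = [finding for batch in batches for finding in batch]
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_RANK.get(f["severity"], len(SEVERITY_RANK)),
            f["path"],
            f["line"],
        ),
    )
```

Here `review` stands in for whatever calls the inference provider; because it is passed in, the provider can be swapped without touching the dispatch logic.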

End-to-end latency for a 64-file merge request is 39 seconds from webhook to posted review. Total infrastructure cost is approximately $50/month for 1,000+ reviews, compared to $5,000-$15,000/month for equivalent SaaS tooling at the same volume. Architecture is model-agnostic, allowing NDIT to swap inference providers (Groq, OpenAI, Anthropic, or fully self-hosted Llama/Mistral) without rewriting the orchestration layer.
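The cost comparison reduces to simple arithmetic, sketched below using only the figures quoted above. The function and constant names are illustrative, and 1,000 reviews per month is treated as a floor because the text says "1,000+".

```python
REVIEWS_PER_MONTH = 1_000            # stated as "1,000+", so this is a lower bound
SELF_HOSTED_MONTHLY = 50.0           # approximate infrastructure cost in USD
SAAS_MONTHLY = (5_000.0, 15_000.0)   # quoted SaaS range in USD


def cost_per_review(monthly_cost: float, reviews: int = REVIEWS_PER_MONTH) -> float:
    """Monthly cost spread evenly over the number of reviews."""
    return monthly_cost / reviews


def savings_factor(saas_monthly: float, self_hosted: float = SELF_HOSTED_MONTHLY) -> float:
    """How many times more expensive the SaaS figure is."""
    return saas_monthly / self_hosted


if __name__ == "__main__":
    print("self-hosted per review:", cost_per_review(SELF_HOSTED_MONTHLY))
    print("SaaS per review:", [cost_per_review(c) for c in SAAS_MONTHLY])
    print("savings factor:", [savings_factor(c) for c in SAAS_MONTHLY])
```

At that volume the self-hosted figure works out to about $0.05 per review against $5 to $15 for SaaS, a factor of 100 to 300.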

GitLab CI · GitLab Webhook API · n8n workflow orchestration · Groq inference · Kimi K2 · Meta Llama 3.3 70B · Python · Docker · Self-hosted compute · Prometheus

NDIT Solutions · DevOps Infrastructure

Umoja Booking - GitLab CI/CD Pipeline Architecture

Production CI/CD pipelines for a 10-service Docker Compose platform with full security scanning and supply-chain auditing

10 services orchestrated
6 + 4 stages - main + AI pipeline
5 app builds - Trivy scanned
Self-hosted GitLab runners

Designed and deployed dual GitLab CI/CD pipelines for the Umoja Booking platform - a 10-service Docker Compose application requiring distinct deployment workflows for application services and AI services. Each pipeline runs on self-hosted GitLab runners with isolated execution environments and full audit logging.

The main 6-stage pipeline runs, in order: cleanup, supply-chain audit, parallel Docker builds across 5 services, Trivy vulnerability scanning at the container layer, staging deploy, and production deploy. The AI service pipeline runs a 4-stage variant tailored to AI-specific build artifacts and inference dependencies.

Built-in disk cleanup automation prevents runner disk exhaustion. Dependency auditing catches transitive vulnerabilities before they reach production. Trivy scans every container image against known CVE databases with configurable severity thresholds. The result is a pipeline that developers actually trust - verified, audited, and reproducible from a clean state every time.
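A severity threshold can also be enforced after the scan by reading Trivy's JSON report, which is useful when the gate needs custom logic. The sketch below assumes a report produced with `trivy image --format json`; the `blocking_vulns` function and the default threshold are illustrative and not taken from the Umoja pipeline.

```python
from typing import Any, Dict, List

# Ordered from least to most severe, matching Trivy's severity names.
SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}


def blocking_vulns(report: Dict[str, Any], threshold: str = "HIGH") -> List[str]:
    """Return IDs of vulnerabilities at or above the threshold severity."""
    floor = RANK[threshold]
    ids: List[str] = []
    for result in report.get("Results", []):
        # "Vulnerabilities" may be missing or null for clean targets.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            if RANK.get(severity, 0) >= floor:
                ids.append(vuln["VulnerabilityID"])
    return ids
```

A CI job would load the report with `json.load` and exit non-zero when the returned list is non-empty. For simple cases, Trivy's own `--severity` and `--exit-code` flags give the same effect without extra code.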

GitLab CI · Self-hosted GitLab Runners · Docker Compose · Trivy · Supply-chain auditing · Parallel build orchestration

Atlas Computer Technology · Government Systems

Ministry of Revenue ITAS - National Tax Payment Infrastructure

Clustered tax payment platform for Ethiopia's Integrated Tax Administration System

National tax revenue scale
Clustered HA deployment
Vault managed secrets
Keycloak SSO

Deployed and operated the Kubernetes-orchestrated infrastructure for Ethiopia's Integrated Tax Administration System (ITAS) at the Ministry of Revenue. The platform processes national tax payment workflows with strict regulatory and uptime requirements typical of government revenue systems.

The deployment uses Ansible-driven configuration management for reproducible cluster provisioning, GitLab CI for pipeline automation across staging and production environments, and HashiCorp Vault for secrets management - keeping API keys, database credentials, and Keycloak client secrets out of code. Keycloak provides SSO across MoR services with fine-grained RBAC for tax officers, auditors, and administrators.

Observability is built on the ELK stack for centralized log aggregation, with Apache Superset and JasperReports providing structured business reporting on payment flows, audit trails, and reconciliation data. End-to-end health checks and resilience policies ensure the platform remains available during tax filing peak windows.

Kubernetes · Ansible · GitLab CI · HashiCorp Vault · PostgreSQL · Keycloak · ELK · Apache Superset · JasperReports

Additional Work

Exponent.ch

NVIDIA GPU-enabled RKE2 clusters for AI/ML workloads - n8n, Langfuse, LibreChat, ClickHouse with Terraform/Terragrunt and Teleport access.

ODK → OpenG2P Migration

Large-scale beneficiary data migration for national social protection programs across Ethiopian government entities.

Centriweb (NZ)

Backend and DevOps for international clients - Terraform IaC, CI/CD pipeline optimization, cloud resource monitoring for performance and cost.

Capabilities

Architectural Expertise

Container Platforms

Production RKE2 clusters managed through Rancher at government and enterprise scale. K3s for edge and lightweight deployments, EKS for AWS-native workloads. Istio service mesh with mTLS for zero-trust networking. Helm umbrella charts for repeatable multi-service deployments.

RKE2 · Rancher · K3s · EKS · Istio · Helm · Docker · OpenShift

Infrastructure as Code

Terraform and Terragrunt with strict module discipline, DRY principles, and multi-environment support. Ansible for configuration management across heterogeneous infrastructure. Packer for reproducible machine image builds. Every environment reproducible from version-controlled definitions.

Terraform · Terragrunt · Ansible · Packer · OpenTofu

CI/CD Architecture

GitHub Actions with OIDC federation for zero-credential deployments to AWS. GitLab CI with protected environments and approval gates. Jenkins for legacy Maven pipelines in banking environments. Every pipeline tested, versioned, and auditable.

GitHub Actions · GitLab CI · Jenkins · OIDC · ArgoCD

Cloud & Hybrid

AWS-native architectures (Lambda, ECR, RDS, SQS, S3, EventBridge, Secrets Manager, IAM) alongside GCP compute and Azure evaluation. On-premises Proxmox and XCP-ng virtualization. Dual-infrastructure government deployments bridging public cloud and sovereign on-premises environments.

AWS · GCP · Azure · Proxmox · XCP-ng

Networking & SDN

SR-MPLS transport fabrics using FRRouting with IS-IS and BGP for deterministic path selection. ONOS SDN controller with OpenFlow 1.3 for programmatic flow rule installation. WireGuard mesh overlays with VXLAN for multi-cloud connectivity. Nginx and HAProxy reverse proxies at production scale.

SR-MPLS · FRRouting · ONOS · BGP · IS-IS · WireGuard · VXLAN · Nginx

Observability

Prometheus with Grafana and Mimir for metrics at scale. ELK stack for centralized logging in banking and government environments. OpenTelemetry distributed tracing with X-Ray propagation. Zabbix for legacy infrastructure monitoring. 29-panel observability dashboards covering system, network, and AI telemetry.

Prometheus · Grafana · Mimir · ELK · OpenTelemetry · Zabbix · SigNoz

Identity & Zero-Trust

Keycloak federation across multi-entity government deployments. Teleport for identity-aware Kubernetes access with session recording. IAM OIDC federation eliminating long-lived credentials in CI/CD pipelines. Fine-grained RBAC and JWT authentication in production APIs.

Keycloak · Teleport · Vault · IAM OIDC · JWT · RBAC

GPU & AI Infrastructure

NVIDIA RuntimeClass on Kubernetes with GPU-aware scheduling across heterogeneous compute - T4 16GB, L4 24GB, Jetson Nano. Inference pipelines spanning PyTorch, ONNX Runtime, and TensorRT FP16. Dynamic workload placement and live migration across GPU tiers.

NVIDIA RuntimeClass · PyTorch · ONNX · TensorRT · vLLM · Jetson Nano

Experience

Timeline

  1. Mar 2026 - Present
    DevOps Engineer · NDIT Solutions
    Remote contractor for 25-year US IT consulting firm. Architected BugBot (self-hosted multi-agent AI code review, ~1/100th SaaS cost). Designed dual GitLab CI/CD pipelines for 10-service Umoja Booking platform with Trivy scanning and supply-chain auditing. Standardizing DevOps across US client engagements.
  2. Nov 2025 - Present
    Platform Engineer · NetrionX Inc.
    Distributed AI infrastructure orchestration - patent-pending DDEF platform
  3. Mid-2025 - Present
    DevOps Engineer · OpenG2P Ethiopia
    National digital transformation - ATI, MoWSA, EDRMC ministries
  4. Jun - Dec 2025
    Backend / DevOps Engineer · Centriweb
    International client engagements - Auckland, New Zealand
  5. Apr - Jun 2025
    Senior DevOps Engineer · Exponent.ch
    GPU-enabled RKE2 clusters, Terraform/Terragrunt, Teleport - Switzerland
  6. Jan - Apr 2025
    Backend & DevOps Engineer · WeTruck
    Serverless AWS, OIDC federation, OpenTelemetry distributed tracing
  7. Feb 2023 - Jun 2025
    DevOps Engineer · Atlas Computer Technology
    Banking infrastructure - Siinqee, Hijra, Wegagen - Kubernetes, Jenkins, Proxmox, ELK
  8. Sep 2021 - Jan 2023
    Junior DevOps Engineer · Atlas Computer Technology
    On-premise Proxmox, USSD/mobile banking UAT, monitoring with Zabbix and ELK

About

Background

BSc in Computer Science from Admas University, Addis Ababa (2018–2022). Based in Addis Ababa (UTC+3), with full overlap with US and European business hours when required.

Languages: English (fluent business and technical), Amharic (native), French (A1), Spanish (A1).

Certifications

  • Advanced Kubernetes - LinkedIn Learning, Apr 2025
  • Advanced Terraform - LinkedIn Learning, Mar 2025
  • Build a CI/CD Pipeline - LinkedIn Learning, Mar 2025
  • DevOps Foundations: Site Reliability Engineering - LinkedIn Learning, Mar 2025
  • Running Kubernetes on AWS / EKS - LinkedIn Learning, Mar 2025
  • Linux Fundamentals Bootcamp
  • NGINX Web Server from Scratch

References available on request

Anteneh Temesgen (Senior Software Engineer), Biruk Tesfaw (Senior Software Developer), Muluken Solomon (Senior Software Engineer)

Contact

Let's work together

Available for senior platform / SRE roles, regulated infrastructure consulting, and technical advisory engagements. Full overlap with EU, UK, and US business hours from Addis Ababa (UTC+3). Open to remote full-time or contract.

Most good fits start in writing. Send me a paragraph about what you're building.