Cloud Infrastructure / MLOps

Cloud-Native ML Inference Platform

Enterprise AI Company

Timeline
12 weeks
Team Size
1 platform engineer, 1 DevOps engineer, 1 SDK developer
Technologies
Kubernetes, AWS EKS, Terraform

The Problem

The client needed a production-ready ML serving platform supporting multiple model types (LLMs and general inference), with enterprise authentication, auto-scaling, and full automation. The existing setup was manual, lacked security and observability, and couldn't handle variable workloads efficiently; teams spent weeks setting up infrastructure and managing deployments.

What We Built

We built a comprehensive Kubernetes-based ML inference platform on AWS EKS, combining industry-leading open-source components and automating the entire stack:

  • Infrastructure as code: Terraform provisions the AWS resources (VPC, EKS, S3, IAM), and Helmfile installs the platform components (Istio, Knative, KServe, Karpenter, monitoring).
  • Enterprise authentication: OAuth2/OIDC via Keycloak, with JWT validation enforced at the Istio ingress (see the second sketch below).
  • Two-level auto-scaling: Karpenter provisions CPU and GPU nodes dynamically, while Knative scales pods with request load, down to zero when idle.
  • Platform services: cert-manager for TLS automation, external-dns for Route53 records, and Prometheus/Grafana for monitoring.
  • Model serving: a Python SDK for model deployment and inference, supporting both the vLLM runtime (LLM-optimized, with PagedAttention) and the Triton runtime (multi-framework); a deployment sketch follows this list.
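
To make the deployment model concrete, here is a minimal sketch of how a model lands on a platform like this, using the official `kubernetes` Python client to create a KServe InferenceService. The namespace, model name, and S3 path are hypothetical placeholders, not the client's actual resources.

```python
# Minimal sketch: deploy an ONNX model as a KServe InferenceService.
# Assumes a cluster with KServe installed and kubeconfig access; the
# namespace, model name, and S3 path are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod
api = client.CustomObjectsApi()

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "demo-onnx", "namespace": "models"},
    "spec": {
        "predictor": {
            "minReplicas": 0,  # Knative scale-to-zero when idle
            "model": {
                "modelFormat": {"name": "onnx"},  # resolved to the Triton runtime
                "storageUri": "s3://example-bucket/models/demo-onnx",
            },
        }
    },
}

api.create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="models",
    plural="inferenceservices",
    body=inference_service,
)
```

KServe reconciles this object into a Knative Service routed through the Istio gateway, which is where the request-based pod scaling and scale-to-zero come from.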
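
The ingress-level JWT enforcement can be sketched the same way. Istio's standard pattern is a RequestAuthentication (validate Keycloak-issued tokens) paired with an AuthorizationPolicy (reject requests without a valid principal); the realm URLs below are placeholders.

```python
# Sketch: require a valid Keycloak-issued JWT at the Istio ingress gateway.
# The issuer and JWKS URLs are placeholders for a real Keycloak realm.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

request_auth = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "RequestAuthentication",
    "metadata": {"name": "keycloak-jwt", "namespace": "istio-system"},
    "spec": {
        "selector": {"matchLabels": {"istio": "ingressgateway"}},
        "jwtRules": [{
            "issuer": "https://auth.example.com/realms/ml",
            "jwksUri": "https://auth.example.com/realms/ml/protocol/openid-connect/certs",
        }],
    },
}

authz_policy = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "AuthorizationPolicy",
    "metadata": {"name": "require-jwt", "namespace": "istio-system"},
    "spec": {
        "selector": {"matchLabels": {"istio": "ingressgateway"}},
        "action": "ALLOW",
        # Only allow requests that carry a validated request principal.
        "rules": [{"from": [{"source": {"requestPrincipals": ["*"]}}]}],
    },
}

for plural, body in [("requestauthentications", request_auth),
                     ("authorizationpolicies", authz_policy)]:
    api.create_namespaced_custom_object(
        group="security.istio.io", version="v1beta1",
        namespace="istio-system", plural=plural, body=body,
    )
```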

Tech Stack

Kubernetes, AWS EKS, Terraform, Helmfile, Istio, KServe, Knative, Karpenter, Keycloak, Python, vLLM, Triton, Prometheus, Grafana

Results

  • Infrastructure provisioning: days → <2 hours (one-command deployment)
  • Platform includes enterprise auth, TLS automation, DNS management out-of-the-box
  • Auto-scaling at both node (Karpenter) and pod (Knative) levels with scale-to-zero
  • Cold start optimized to <30 seconds for model serving
  • Complete observability: Grafana dashboards, Prometheus metrics, distributed tracing
  • Python SDK with OAuth2, retry logic, and Kubernetes integration (the underlying pattern is sketched after the metrics below)
Setup Time: <2 hrs · Cold Start: <30s · Automation: 100%
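
The SDK itself is internal to the client, but the pattern behind it is standard and worth sketching: obtain a token from Keycloak via the OAuth2 client-credentials grant, retry transient failures, and call the model's KServe v1 predict endpoint. All hostnames, client IDs, and the model name below are placeholders, not the SDK's real API.

```python
# Sketch of the SDK's underlying pattern: OAuth2 client-credentials token
# from Keycloak plus retried HTTPS calls. Endpoints and IDs are placeholders.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# urllib3 does not retry POST by default; opt in explicitly.
retry = Retry(
    total=3,
    backoff_factor=0.5,
    status_forcelist=(429, 502, 503, 504),
    allowed_methods=frozenset({"POST"}),
)
session.mount("https://", HTTPAdapter(max_retries=retry))

# Client-credentials grant against a Keycloak realm (placeholder URL).
token = session.post(
    "https://auth.example.com/realms/ml/protocol/openid-connect/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "ml-sdk",
        "client_secret": "<client-secret>",
    },
).json()["access_token"]

# KServe v1 predict protocol: POST /v1/models/<name>:predict
response = session.post(
    "https://demo-onnx.models.example.com/v1/models/demo-onnx:predict",
    headers={"Authorization": f"Bearer {token}"},
    json={"instances": [[1.0, 2.0, 3.0]]},
)
response.raise_for_status()
print(response.json())
```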

Client Feedback

"Went from manually configuring infrastructure for weeks to one-command deployment. The platform handles authentication, scaling, and monitoring automatically. Game-changer for our ML teams."

Head of ML Infrastructure

Need Similar Work?

Tell us what you're building and we'll let you know if we can help.