Senior AI Engineer
Building production-grade RAG, GenAI and Agentic AI platforms with latency, cost and scale in mind.
I bring 8+ years of software engineering experience, including 6+ years focused on LLM, IR, and GenAI platforms. I specialize in multi-agent workflows, RAG pipelines, vector/hybrid retrieval, observability, and cost and latency guardrails. I’m passionate about developer-productivity tooling and scaling inference platforms for real-world impact.
I’m a Senior Machine Learning / LLM Platform Engineer with a background in backend and distributed systems. Over the past few years I’ve focused on building LLM-powered platforms, RAG systems, and agentic workflows that are reliable enough for production, not just demos.
I like working at the intersection of models, infrastructure, and product:
designing retrieval pipelines over vector databases, orchestrating multi-step workflows with tools like LangChain / LangGraph, and deploying everything on Kubernetes with proper observability and cost/latency guardrails.
I’ve built and led platforms that power internal AI assistants, “ask-my-docs” style search over large corpora, and multi-agent tools that can call APIs, query knowledge bases, and automate workflows—while staying secure, traceable, and debuggable.
Right now I’m especially interested in:
Agentic workflows that can diagnose and remediate real-world problems (not just chat)
RAG done properly – good retrieval, evaluation, and guardrails instead of blindly stuffing more context
LLM infra – vLLM/TGI, vector stores, tracing, metrics, and model routing on top of Kubernetes
LLM & RAG Platforms
Designing end-to-end pipelines: ingestion, chunking, embeddings, vector search (Weaviate/FAISS/Pinecone, hybrid BM25 + dense), reranking, and grounded generation APIs.
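To make the hybrid-retrieval idea concrete, here is a minimal, self-contained sketch of BM25 + dense score fusion, using toy token lists and vectors rather than a real vector store; the weighting scheme and `alpha` parameter are illustrative assumptions, not the exact fusion any specific platform uses:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with BM25 (sparse lexical retrieval)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query_terms, query_vec, docs, doc_vecs, alpha=0.5, top_k=3):
    """Min-max normalize sparse and dense scores, then fuse; alpha weights the dense side."""
    sparse = bm25_scores(query_terms, docs)
    dense = [cosine(query_vec, v) for v in doc_vecs]
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    sparse, dense = norm(sparse), norm(dense)
    fused = [(alpha * d + (1 - alpha) * s, i)
             for i, (s, d) in enumerate(zip(sparse, dense))]
    return [i for _, i in sorted(fused, reverse=True)[:top_k]]
```

In production the sparse and dense legs come from the vector store (Weaviate's hybrid search, or BM25 alongside FAISS/Pinecone), and the fused top-k is passed to a reranker before generation.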
Agentic Workflows & Tools
Building multi-step, tool-using workflows with LangChain / LangGraph and similar frameworks—planner/worker patterns, tool schemas, and safe integrations with APIs and internal systems.
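As a rough illustration of the planner/worker pattern, the sketch below runs a planner-produced list of tool calls through a registry with per-tool argument schemas and a trace log; the tool names, schemas, and validation rule are hypothetical stand-ins for what a framework like LangGraph would manage:

```python
# Hypothetical tool registry: name -> (argument schema, callable).
TOOLS = {
    "calendar.create_event": (
        {"title": "str", "when": "str"},
        lambda args: {"status": "created", **args},
    ),
    "rag.search": (
        {"query": "str"},
        lambda args: {"hits": [f"doc about {args['query']}"]},
    ),
}

def validate(args, schema):
    """Reject calls whose arguments don't match the tool schema (a basic guardrail)."""
    missing = set(schema) - set(args)
    if missing:
        raise ValueError(f"missing args: {sorted(missing)}")

def run_plan(plan, trace):
    """Execute a planner-produced list of tool calls, appending each step to a trace."""
    results = []
    for step in plan:
        schema, fn = TOOLS[step["tool"]]
        validate(step["args"], schema)
        out = fn(step["args"])
        trace.append({"tool": step["tool"], "args": step["args"], "out": out})
        results.append(out)
    return results
```

The trace list is what makes multi-step runs debuggable: every tool call, its arguments, and its output are recorded in order.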
Production Infra & Observability
Deploying LLM services on Kubernetes/EKS with Docker, CI/CD, and telemetry (OpenTelemetry, Prometheus/Grafana) including latency, tokens, cost, and per-request traces.
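A minimal sketch of the per-request telemetry idea: a context manager that times one LLM call and derives a cost figure from token counts. The price table is purely illustrative, and a real deployment would export these fields through OpenTelemetry/Prometheus rather than a Python list:

```python
import time
from contextlib import contextmanager

# Illustrative per-1K-token prices; not any provider's real pricing.
PRICE_PER_1K_TOKENS = {"prompt": 0.003, "completion": 0.006}

@contextmanager
def llm_request_span(metrics, request_id):
    """Record latency and cost for one LLM call; the caller fills in token counts."""
    record = {"request_id": request_id, "prompt_tokens": 0, "completion_tokens": 0}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["latency_s"] = time.perf_counter() - start
        record["cost_usd"] = (
            record["prompt_tokens"] / 1000 * PRICE_PER_1K_TOKENS["prompt"]
            + record["completion_tokens"] / 1000 * PRICE_PER_1K_TOKENS["completion"]
        )
        metrics.append(record)
```

Because the `finally` block always runs, latency and cost get recorded even when the wrapped call raises, which keeps failure traces as complete as success traces.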
Technical Leadership
Defining platform standards, reviewing designs, mentoring engineers on LLM/RAG/infra, and partnering with product and security to ship AI features that are actually maintainable.
Built a RAG-based document assistant using OpenAI embeddings and Pinecone/Weaviate for hybrid semantic + BM25 retrieval with reranking.
Exposed as a FastAPI + Streamlit service, containerized with Docker and deployed with CI/CD.
Implemented latency/token/cost observability and offline IR evaluation (MRR, Recall@K) to compare retriever and prompt variants and enable safe changes.
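The two offline IR metrics mentioned above are simple to state in code; this is a generic sketch of their standard definitions, not the project's exact evaluation harness:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant docs that appear in the top-k retrieved results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

def mrr(queries):
    """Mean reciprocal rank over (ranked_ids, relevant_ids) query pairs."""
    total = 0.0
    for ranked, relevant in queries:
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries) if queries else 0.0
```

Running both metrics over a fixed query set is what lets retriever or prompt variants be compared before anything ships.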
Designed and developed a LangChain-based multi-agent assistant with a planner and tool-using agents (Gmail, Calendar, RAG search, external APIs).
Implemented structured tool calls, trace logging, and guardrails; used the project to experiment with caching, model routing, and open-source LLM serving via vLLM/TGI.
Used this as a sandbox for agentic patterns: task decomposition, tool selection, and cross-agent coordination—directly relevant to agentic security operations workflows.
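One of the caching/routing experiments can be sketched as follows; the model names, length-based routing heuristic, and exact-match cache are all hypothetical simplifications of what a real router (with semantic caching and quality-based tiering) would do:

```python
import hashlib

# Exact-match response cache keyed by prompt hash.
CACHE = {}

def route_model(prompt, threshold=200):
    """Pick a model tier by prompt length (a stand-in for a real complexity signal)."""
    return "small-fast-model" if len(prompt) < threshold else "large-capable-model"

def cached_generate(prompt, generate):
    """Serve repeated prompts from cache; otherwise call the routed model."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in CACHE:
        return CACHE[key], True  # (response, cache_hit)
    model = route_model(prompt)
    response = generate(model, prompt)
    CACHE[key] = response
    return response, False
```

Even this naive version cuts cost and latency on repeated prompts to zero model calls, which is why caching and routing sit in front of serving layers like vLLM/TGI.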
Retrieval-Augmented Generation (RAG) – DeepLearning.AI, 2025
Generative AI with Large Language Models – Amazon Web Services, 2025
Deep Learning Specialization – DeepLearning.AI, 2025
Machine Learning Specialization – DeepLearning.AI, 2025
Supervised ML: Regression & Classification
Advanced Learning Algorithms
Unsupervised Learning, Recommenders & Reinforcement Learning
Machine Learning – Stanford University (Coursera), 2019
Full credential details and links are on my LinkedIn.
© 2025 Robin Sobti. All rights reserved.