Senior AI Engineer
Building production-grade RAG, GenAI and Agentic AI platforms with latency, cost and scale in mind.
I bring 8+ years of software engineering experience, including 6+ years focused on LLM, IR, and GenAI platforms. I specialize in multi-agent workflows, RAG pipelines, vector/hybrid retrieval, observability, and cost and latency guardrails. I’m passionate about developer-productivity tooling and scaling inference platforms for real-world impact.
I’m a Senior Machine Learning / LLM Platform Engineer with a background in backend and distributed systems. Over the past few years I’ve focused on building LLM-powered platforms, RAG systems, and agentic workflows that are reliable enough for production, not just demos.
I like working at the intersection of models, infrastructure, and product:
designing retrieval pipelines over vector databases, orchestrating multi-step workflows with tools like LangChain / LangGraph, and deploying everything on Kubernetes with proper observability and cost/latency guardrails.
I’ve built and led platforms that power internal AI assistants, “ask-my-docs” style search over large corpora, and multi-agent tools that can call APIs, query knowledge bases, and automate workflows—while staying secure, traceable, and debuggable.
Right now I’m especially interested in:
Agentic workflows that can diagnose and remediate real-world problems (not just chat)
RAG done properly – good retrieval, evaluation, and guardrails instead of blindly stuffing more context
LLM infra – vLLM/TGI, vector stores, tracing, metrics, and model routing on top of Kubernetes
LLM & RAG Platforms
Designing end-to-end pipelines: ingestion, chunking, embeddings, vector search (Weaviate/FAISS/Pinecone, hybrid BM25 + dense), reranking, and grounded generation APIs.
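To make the hybrid-retrieval idea concrete, here is a minimal, self-contained sketch of BM25 + dense score fusion, using toy token lists and vectors rather than a real vector store; the weighting scheme and `alpha` parameter are illustrative assumptions, not the exact fusion any specific platform uses:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with BM25 (sparse lexical retrieval)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query_terms, query_vec, docs, doc_vecs, alpha=0.5, top_k=3):
    """Min-max normalize sparse and dense scores, then fuse; alpha weights the dense side."""
    sparse = bm25_scores(query_terms, docs)
    dense = [cosine(query_vec, v) for v in doc_vecs]
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    sparse, dense = norm(sparse), norm(dense)
    fused = [(alpha * d + (1 - alpha) * s, i)
             for i, (s, d) in enumerate(zip(sparse, dense))]
    return [i for _, i in sorted(fused, reverse=True)[:top_k]]
```

In production the sparse and dense legs come from the vector store (Weaviate's hybrid search, or BM25 alongside FAISS/Pinecone), and the fused top-k is passed to a reranker before generation.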
Agentic Workflows & Tools
Building multi-step, tool-using workflows with LangChain / LangGraph and similar frameworks—planner/worker patterns, tool schemas, and safe integrations with APIs and internal systems.
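As a rough illustration of the planner/worker pattern, the sketch below runs a planner-produced list of tool calls through a registry with per-tool argument schemas and a trace log; the tool names, schemas, and validation rule are hypothetical stand-ins for what a framework like LangGraph would manage:

```python
# Hypothetical tool registry: name -> (argument schema, callable).
TOOLS = {
    "calendar.create_event": (
        {"title": "str", "when": "str"},
        lambda args: {"status": "created", **args},
    ),
    "rag.search": (
        {"query": "str"},
        lambda args: {"hits": [f"doc about {args['query']}"]},
    ),
}

def validate(args, schema):
    """Reject calls whose arguments don't match the tool schema (a basic guardrail)."""
    missing = set(schema) - set(args)
    if missing:
        raise ValueError(f"missing args: {sorted(missing)}")

def run_plan(plan, trace):
    """Execute a planner-produced list of tool calls, appending each step to a trace."""
    results = []
    for step in plan:
        schema, fn = TOOLS[step["tool"]]
        validate(step["args"], schema)
        out = fn(step["args"])
        trace.append({"tool": step["tool"], "args": step["args"], "out": out})
        results.append(out)
    return results
```

The trace list is what makes multi-step runs debuggable: every tool call, its arguments, and its output are recorded in order.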
Production Infra & Observability
Deploying LLM services on Kubernetes/EKS with Docker, CI/CD, and telemetry (OpenTelemetry, Prometheus/Grafana) including latency, tokens, cost, and per-request traces.
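A minimal sketch of the per-request telemetry idea: a context manager that times one LLM call and derives a cost figure from token counts. The price table is purely illustrative, and a real deployment would export these fields through OpenTelemetry/Prometheus rather than a Python list:

```python
import time
from contextlib import contextmanager

# Illustrative per-1K-token prices; not any provider's real pricing.
PRICE_PER_1K_TOKENS = {"prompt": 0.003, "completion": 0.006}

@contextmanager
def llm_request_span(metrics, request_id):
    """Record latency and cost for one LLM call; the caller fills in token counts."""
    record = {"request_id": request_id, "prompt_tokens": 0, "completion_tokens": 0}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["latency_s"] = time.perf_counter() - start
        record["cost_usd"] = (
            record["prompt_tokens"] / 1000 * PRICE_PER_1K_TOKENS["prompt"]
            + record["completion_tokens"] / 1000 * PRICE_PER_1K_TOKENS["completion"]
        )
        metrics.append(record)
```

Because the `finally` block always runs, latency and cost get recorded even when the wrapped call raises, which keeps failure traces as complete as success traces.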
Technical Leadership
Defining platform standards, reviewing designs, mentoring engineers on LLM/RAG/infra, and partnering with product and security to ship AI features that are actually maintainable.
Built a RAG-based document assistant using OpenAI embeddings and Pinecone/Weaviate for hybrid semantic + BM25 retrieval with reranking.
Exposed as a FastAPI + Streamlit service, containerized with Docker and deployed with CI/CD.
Implemented latency/token/cost observability and offline IR evaluation (MRR, Recall@K) to compare retriever and prompt variants and enable safe changes.
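The two offline IR metrics mentioned above are simple to state in code; this is a generic sketch of their standard definitions, not the project's exact evaluation harness:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant docs that appear in the top-k retrieved results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

def mrr(queries):
    """Mean reciprocal rank over (ranked_ids, relevant_ids) query pairs."""
    total = 0.0
    for ranked, relevant in queries:
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries) if queries else 0.0
```

Running both metrics over a fixed query set is what lets retriever or prompt variants be compared before anything ships.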
Designed and developed a LangChain-based multi-agent assistant with a planner and tool-using agents (Gmail, Calendar, RAG search, external APIs).
Implemented structured tool calls, trace logging, and guardrails; used the project to experiment with caching, model routing, and open-source LLM serving via vLLM/TGI.
Used this as a sandbox for agentic patterns: task decomposition, tool selection, and cross-agent coordination—directly relevant to agentic security operations workflows.
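One of the caching/routing experiments can be sketched as follows; the model names, length-based routing heuristic, and exact-match cache are all hypothetical simplifications of what a real router (with semantic caching and quality-based tiering) would do:

```python
import hashlib

# Exact-match response cache keyed by prompt hash.
CACHE = {}

def route_model(prompt, threshold=200):
    """Pick a model tier by prompt length (a stand-in for a real complexity signal)."""
    return "small-fast-model" if len(prompt) < threshold else "large-capable-model"

def cached_generate(prompt, generate):
    """Serve repeated prompts from cache; otherwise call the routed model."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in CACHE:
        return CACHE[key], True  # (response, cache_hit)
    model = route_model(prompt)
    response = generate(model, prompt)
    CACHE[key] = response
    return response, False
```

Even this naive version cuts cost and latency on repeated prompts to zero model calls, which is why caching and routing sit in front of serving layers like vLLM/TGI.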
Retrieval-Augmented Generation (RAG) – DeepLearning.AI, 2025
Generative AI with Large Language Models – Amazon Web Services, 2025
Deep Learning Specialization – DeepLearning.AI, 2025
Machine Learning Specialization – DeepLearning.AI, 2025
Supervised ML: Regression & Classification
Advanced Learning Algorithms
Unsupervised Learning, Recommenders & Reinforcement Learning
Machine Learning – Stanford University (Coursera), 2019
Full credential details and links are on my LinkedIn.
© 2025 Robin Sobti. All rights reserved.