Available for consulting

AI engineer.
Agentic AI.

Building production AI agents and document intelligence systems — LLMs, VLMs, MCP servers, and multi-agent workflows. 3+ years shipping AI to production at YC-backed startups.

3+ · Years Experience
10+ · Models in Production
Agents / LLMs · Specialization
Bengaluru · Based in India
Experience
Selected work.
Production ML shipped to real customers across two product companies.
Parspec.io · Oct 2024 – Present
Machine Learning Engineer 2 · Bengaluru
  • Built a multi-agent parsing system combining OCR, BERT classification, and LLM-based parsing to extract structured data from complex tables in construction specification documents — 95% row-level accuracy across six complex table formats.
  • Fine-tuned a Table-Transformer for table detection, replacing AWS Textract and cutting latency and cost by 67%, with ~86% mAP (0.5–0.95) in production.
  • Ran root-cause analysis (RCA) on the document pipeline; replaced AWS Textract with open-source OCR models, reducing Textract costs.
  • Deployed and optimized VLM Dots.OCR via vLLM — parsing accuracy 50% → 75%, throughput 60 → 150 tok/s.
  • Leveraged DSPy for automated prompt optimization — iterative prompt engineering effort down 80%.
  • Built LLM Council pipeline (adapted from Karpathy's) to auto-generate a table parsing dataset — annotation costs down 50%, effort down 60%.
  • Improved BERT column header/type detection via a [SEP] token strategy — accuracy 89% → 95% across multiple domains.
  • Built a domain identification model (BERT + Textract responses) — 92% recall.
  • Built a table complexity classifier (Gemini-2.5-Pro + BERT fine-tuning) — 95% accuracy at 0.5s latency.
  • Designed variation identification with Gemini-2.5-Flash; applied LLM-as-Judge for pre-annotation data quality.
  • Evaluated VLMs (LightonOCR, DeepSeekOCR, PaddleOCR) and LLM parsers (GPT-5, Claude, Gemini, Kimi) on document benchmarks.
  • Led 2 engineers end-to-end on RFQ extraction; designed CI/CD via GitHub Actions, ArgoCD, OpenSearch. Deployed on AWS EKS.
Pibit.ai (YC W21) · Aug 2022 – Sept 2024
Machine Learning Engineer · Bengaluru
  • Led the Loss Run product (YOLO + OCR) — processing speed up 40%, manual review down 50%.
  • Automated data prep and training pipelines on Azure Machine Learning — data prep 3–4 hrs → 30–45 min.
  • Optimized YOLO performance by 35% through dataset refinement, architectural tweaks, and hyperparameter tuning.
  • Deployed object detection on AWS Lambda + EKS via CodePipeline — 300ms latency.
  • Built identity services (YOLO + GPT-3.5) — 0.85 mAP, 0.95 accuracy at document level.
  • Trained an OCR-free Donut model, reaching a tree-edit-distance (TED) accuracy of 0.81.
  • Integrated Azure MLflow for experiment tracking; built an auto-annotation pipeline with CVAT.
  • Orchestrated GPT-4V experimentation and led 4 interns, improving turnaround by 25%.
  • Applied advanced prompting (zero-shot, few-shot, chain-of-thought, sequential).
Projects
Side work.
Weekend builds and experiments to stay sharp.
RunHermes.sh
Full-stack SaaS built in under 6 hours — provisions Hetzner VPS per user running Hermes AI Agent in Docker, accessible via Telegram & Discord. Next.js 16, Supabase, LemonSqueezy, Hetzner Cloud API with cloud-init auto-provisioning. Security-hardened.
ToolkitAI
AI-powered multi-tool web platform — virtual try-on, background removal, face-swap, podcast generator. Integrated Gemini 2.5 Flash, Gemini 3 Pro, and Gemini TTS for multimodal features across image, video, and voice.
GrabMart MCP
MCP server integrating real GrabMart APIs for nearby stores, inventories, discounts, and delivery times. Natural language → API calls for ordering and recipe-based shopping lists. Claude 3.5 Sonnet on AWS Bedrock with ReAct prompting.
DogLLaMA
Fine-tuned LLaMA2-7b on an English dataset translated into Dog Style Messages via GPT-3.5-turbo. Dataset published on HuggingFace. PEFT with LoRA and BitsAndBytes quantization.
BudgetGPT
Agentic RAG system for Indian budget Q&A using GPT-4o-mini. In-memory vectorstore with per-chunk OpenAI Embeddings, multi-index tool routing orchestrated with LlamaIndex, deployed on Streamlit.
Stack
Tools I work with.
Frameworks: HuggingFace, LangChain, LlamaIndex, DSPy, PyTorch, TensorFlow, NumPy, Pandas, vLLM, MCP
Tools: Git, GitHub, VSCode / PyCharm, Cursor, Ollama, Google Colab, Claude Code, OpenAI Codex
Infrastructure: Kubernetes, Helm, Docker
Cloud: AWS, Azure, GCP, Runpod
Databases: MySQL, PostgreSQL, DynamoDB
Education
Formal training.
B.Tech, Computer Engineering
K.J. Somaiya College of Engineering, Mumbai
Aug 2019 — May 2023 · Data Mining · AI · Operating Systems · Software Engineering
CGPA: 9.35
Let's talk
Available for consulting.

AI agents. LLM systems. Agentic workflows. MCP servers. Document intelligence. MLOps. Reach out if you're shipping AI and want a second pair of hands.

pathik.consult@gmail.com