Available for consulting

AI engineer.
Agentic AI.

Building production AI agents and document intelligence systems — LLMs, VLMs, MCP servers, and multi-agent workflows. 3+ years shipping AI to production at YC-backed startups.

3+ Years Experience

10+ Models in Production

Agents / LLMs Specialization

Bengaluru · Based in India


Experience

Selected work.

Production ML shipped to real customers across two product companies.

Parspec.io Oct 2024 — Present

Machine Learning Engineer 2 · Bengaluru

Built a multi-agent parsing system combining OCR, BERT classification, and LLM-based parsing to extract structured data from complex tables in construction specification documents — 95% row-level accuracy across six complex table formats.
Fine-tuned a Table-Transformer for table detection, replacing AWS Textract and cutting latency and cost by 67%, with ~86% mAP (0.5–0.95) in production.
Ran root-cause analysis (RCA) on the document pipeline; replaced AWS Textract with open-source OCR models, cutting Textract costs 5×.
Deployed and optimized VLM Dots.OCR via vLLM — parsing accuracy 50% → 75%, throughput 60 → 150 tok/s.
Leveraged DSPy for automated prompt optimization — iterative prompt engineering effort down 80%.
Built LLM Council pipeline (adapted from Karpathy's) to auto-generate a table parsing dataset — annotation costs down 50%, effort down 60%.
Improved BERT column header/type detection via a [SEP] token strategy — accuracy 89% → 95% across multiple domains.
Built a domain identification model (BERT + Textract responses) — 92% recall.
Built a table complexity classifier (Gemini-2.5-Pro + BERT fine-tuning) — 95% accuracy at 0.5s latency.
Designed variation identification with Gemini-2.5-Flash; applied LLM-as-Judge for pre-annotation data quality.
Evaluated VLMs (LightonOCR, DeepSeekOCR, PaddleOCR) and LLM parsers (GPT-5, Claude, Gemini, Kimi) on document benchmarks.
Led 2 engineers end-to-end on RFQ extraction; designed CI/CD via GitHub Actions, ArgoCD, OpenSearch. Deployed on AWS EKS.

Pibit.ai YC W21 Aug 2022 — Sept 2024

Machine Learning Engineer · Bengaluru

Led the Loss Run product (YOLO + OCR) — processing speed up 40%, manual review down 50%.
Automated data prep and training pipelines on Azure Machine Learning — data prep 3–4 hrs → 30–45 min.
Optimized YOLO performance by 35% through dataset refinement, architectural tweaks, and hyperparameter tuning.
Deployed object detection on AWS Lambda + EKS via CodePipeline — 300ms latency.
Built identity services (YOLO + GPT-3.5) — 0.85 mAP, 0.95 accuracy at document level.
Trained an OCR-free Donut model, reaching a tree-edit-distance (TED) accuracy of 0.81.
Integrated MLflow on Azure for experiment tracking; built an auto-annotation pipeline with CVAT.
Orchestrated GPT-4V experimentation, led 4 interns — turnaround up 25%.
Applied advanced prompting (zero-shot, few-shot, chain-of-thought, sequential).

Projects

Side work.

Weekend builds and experiments to stay sharp.

01 Visit ↗

RunHermes.sh

Full-stack SaaS built in under 6 hours — provisions Hetzner VPS per user running Hermes AI Agent in Docker, accessible via Telegram & Discord. Next.js 16, Supabase, LemonSqueezy, Hetzner Cloud API with cloud-init auto-provisioning. Security-hardened.

02 Visit ↗

ToolkitAI

AI-powered multi-tool web platform — virtual try-on, background removal, face-swap, podcast generator. Integrated Gemini 2.5 Flash, Gemini 3 Pro, and Gemini TTS for multimodal features across image, video, and voice.

03 GitHub ↗

GrabMart MCP

MCP server integrating real GrabMart APIs for nearby stores, inventories, discounts, delivery times. Natural language → API calls for ordering and recipe-based shopping lists. Claude 3.5 Sonnet on AWS Bedrock with ReAct prompting.

04 GitHub ↗

DogLLaMA

Fine-tuned LLaMA2-7b on an English dataset translated into Dog Style Messages via GPT-3.5-turbo. Dataset published on HuggingFace. PEFT with LoRA and BitsAndBytes quantization.

05 GitHub ↗

BudgetGPT

Agentic RAG system for Indian budget Q&A using GPT-4o-mini. In-memory vectorstore with per-chunk OpenAI Embeddings, multi-index tool routing orchestrated with LlamaIndex, deployed on Streamlit.

Stack

Tools I work with.

Frameworks: HuggingFace, LangChain, LlamaIndex, DSPy, PyTorch, TensorFlow, NumPy, Pandas, vLLM, MCP

Tools: Git, GitHub, VSCode / PyCharm, Cursor, Ollama, Google Colab, Claude Code, OpenAI Codex

Infrastructure: Kubernetes, Helm, Docker

Cloud: AWS, Azure, GCP, Runpod

Databases: MySQL, PostgreSQL, DynamoDB

Education

Formal training.

B.Tech, Computer Engineering

K.J. Somaiya College of Engineering, Mumbai

Aug 2019 — May 2023 · Data Mining · AI · Operating Systems · Software Engineering

CGPA: 9.35

Let's talk

Available for consulting.

AI agents. LLM systems. Agentic workflows. MCP servers. Document intelligence. MLOps. Reach out if you're shipping AI and want a second pair of hands.


pathik.consult@gmail.com
