Aspiring AI Engineer specializing in MLOps, LLMOps & Agentic Systems
CS Graduate @ DSU Bangalore
Building Production-Ready AI Systems
6+ AI Systems Built | 1 Patent + 2 Publications
Solving real AI problems with production-ready solutions
Citation-backed, hallucination-reduced RAG pipelines
Compliance-GPT, Audio-RAG Enterprise
Prompt engineering, multi-model orchestration
GPT-4o, Claude, Llama, Gemini projects
FastAPI + Docker + Cloud deployments
6+ systems with Docker, CI/CD, monitoring
Vision + Audio + Text fusion systems
Medical Chatbot, Audio-RAG
Hindi/Telugu/Kannada NLP for Indic market
Pratilipi Comics localization
Academic rigor meets industry speed
1 Patent + 2 Publications
Production-ready systems built with modern AI architecture
Multi-Agent AI System for Autonomous Inventory Management. Traditional inventory systems trigger false alarms by not distinguishing between genuine supply crises and natural demand fluctuations, leading to unnecessary restocking and inventory bloat.
Enterprise RAG System with Zero-Hallucination Citations. Compliance professionals spend 200+ hours per quarter manually searching GDPR, CCPA, and PCI-DSS regulations, costing organizations $300K+ annually. Hallucinations from general-purpose tools like ChatGPT create legal liability.
AI Audio Analytics with Multi-Tenant Security & Domain Expertise. Organizations accumulate massive audio archives (meetings, calls, interviews) but lack tools to efficiently query insights at scale.
Multi-modal AI system for detecting deepfakes and misinformation using EfficientNet-B4 for image forensics paired with NLP-based fake news analysis.
Automated property search and investment insights using multi-agent AI with web scraping, market analysis, and personalized recommendations.
Detailed technical deep dives for the top 3 featured projects below — including architecture diagrams, tech stack analysis, and key technical decisions.
Architecture diagrams, tech stack analysis, and key technical decisions
Multi-Agent AI System for Autonomous Inventory Management
Inventory Trigger (CSV/MongoDB)
Historical demand (6-12 months), current stock levels, lead times & reorder points (ROP), safety stock calculations
LangGraph Agentic Workflow → Data Loader
Loads historical demand, current stock, lead times, reorder points, safety stock calculations
AI Reasoning Engine
Gemini 2.0 (with Groq fallback): analyze demand trends, detect anomalies & demand spikes, crisis vs. natural decline logic, confidence scoring via prompting
Action Generator
Structured JSON output: Purchase Orders (external suppliers), Warehouse Transfer Orders (internal), confidence & reasoning trail
Multi-Channel Notifications
Telegram Bot (inline approve/reject), Slack Webhooks (team channels), Web Dashboard (real-time monitoring)
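The staged workflow above can be sketched as a plain-Python state pipeline. This is a minimal, dependency-free stand-in (not the actual LangGraph implementation): the state class, field names, and the rule-based `reason` step are illustrative placeholders for the Gemini 2.0 reasoning call.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InventoryState:
    """State carried between pipeline stages (simplified stand-in
    for a LangGraph state object)."""
    sku: str
    stock: int
    reorder_point: int
    avg_daily_demand: float
    recent_daily_demand: float
    action: Optional[dict] = None
    notification: Optional[str] = None

def load_data(state: InventoryState) -> InventoryState:
    # Real system: pulls 6-12 months of demand history from CSV/MongoDB.
    # Here the state arrives pre-filled.
    return state

def reason(state: InventoryState) -> InventoryState:
    """Crisis vs. natural-decline logic: a rule-based stand-in for the
    LLM reasoning step with confidence scoring."""
    below_rop = state.stock < state.reorder_point
    demand_spike = state.recent_daily_demand > 1.5 * state.avg_daily_demand
    if below_rop and demand_spike:
        state.action = {"type": "purchase_order", "confidence": 0.9,
                        "reason": "stock below ROP during demand spike"}
    elif below_rop:
        state.action = {"type": "warehouse_transfer", "confidence": 0.7,
                        "reason": "stock below ROP, demand stable"}
    else:
        state.action = {"type": "no_action", "confidence": 0.95,
                        "reason": "stock healthy"}
    return state

def notify(state: InventoryState) -> InventoryState:
    # Real system: Telegram inline approve/reject + Slack webhook.
    state.notification = f"[{state.sku}] {state.action['type']}: {state.action['reason']}"
    return state

def run_pipeline(state: InventoryState) -> InventoryState:
    for stage in (load_data, reason, notify):
        state = stage(state)
    return state

result = run_pipeline(InventoryState(
    sku="SKU-42", stock=40, reorder_point=60,
    avg_daily_demand=10.0, recent_daily_demand=22.0))
```

In the deployed system each stage is a LangGraph node, so the graph can branch dynamically (e.g. skip notification when confidence is high and the action is `no_action`).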
| Component | Purpose | Why It's Critical |
|---|---|---|
| LangGraph | Agentic orchestration framework | Enables autonomous multi-step decision workflows; state management across agent steps |
| Gemini 2.0 + Groq Fallback | LLM backbone for reasoning | Dual-model approach ensures 99.9% availability; Gemini for complex analysis, Groq for cost efficiency |
| MongoDB Atlas | Document-oriented database | Flexible schema for inventory items; auto-scaling handled by Atlas |
| Safety Stock Calculations | Demand variance quantification | Distinguishes between expected fluctuations and true shortages (confidence scoring) |
| FastAPI + SlowAPI | Production backend + rate limiting | Sub-second response times; DDoS protection; typed Python models via Pydantic |
| Redis Cache | High-speed order lookup | Sub-millisecond reads for recent orders vs. full MongoDB round trips; reduces database load |
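The safety-stock row above refers to the standard textbook formulas: safety stock buffers demand variability over the replenishment lead time, and the reorder point adds that buffer to expected lead-time demand. A small sketch (the z = 1.65 value corresponds to roughly a 95% service level; all input numbers are illustrative):

```python
import math

def safety_stock(z: float, demand_std: float, lead_time_days: float) -> float:
    """Safety stock = z * sigma_d * sqrt(L): buffer sized to demand
    variability over the lead time."""
    return z * demand_std * math.sqrt(lead_time_days)

def reorder_point(avg_daily_demand: float, lead_time_days: float,
                  z: float, demand_std: float) -> float:
    """ROP = expected demand during lead time + safety stock."""
    return (avg_daily_demand * lead_time_days
            + safety_stock(z, demand_std, lead_time_days))

ss = safety_stock(z=1.65, demand_std=4.0, lead_time_days=9.0)
rop = reorder_point(10.0, 9.0, z=1.65, demand_std=4.0)
```

Stock dipping below `rop` during a demand spike is what the reasoning engine treats as a genuine shortage signal rather than a false alarm.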
Agents can make dynamic decisions about next steps, enabling adaptive workflows. Built-in memory/state management prevents information loss across steps. Substantially reduces boilerplate compared to manual orchestration.
MongoDB: flexible schema for heterogeneous inventory items, automatic scaling. Redis: sub-millisecond lookups for recent orders, human-in-loop approvals. Better than a single-database approach for latency-sensitive operations.
Gemini: superior reasoning for demand pattern analysis ($0.075/1M tokens input). Groq: 10x faster inference for simple calculations ($0.10/1M tokens). Failover strategy ensures uptime even during API disruptions.
AI uncertainty manifests as edge cases; humans catch outliers the LLM can't. The Telegram approval system reduces friction with instant mobile notifications. The audit trail supports compliance and continuous model improvement.
User Query → FastAPI Endpoint
Rate Limited @ 30 req/min
Query Expansion
"breach" → ["personal data breach" + "Article 33 notification" + "72 hours" + "supervisory authority"]
Weaviate Vector Search
BM25 Keyword Index (exact term matching) + Semantic Vectors (cross-lingual understanding) → Returns top-5 relevant chunks with source metadata
Prompt Engineering (Citation-Aware)
"Use ONLY the provided context. If not found, say so. Include [Page X, File Y] inline citations."
Groq LLM Generation
Mixtral 8x7B → Llama 70B for complex queries. Latency: 200-400ms (vs. 2-5s for GPT-4). Cost: $0.10/1M tokens (vs. $15/1M for GPT-4)
Citation Formatting & Response
"Article 33 GDPR requires notification within 72 hours [GDPR-EN.pdf, Page 34, Chunk 2]" → Response Cache (5min TTL, Redis) → JSON with Citations + Metadata
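The citation-aware prompting and formatting steps above can be sketched as follows. The chunk dictionaries and field names are illustrative, not the actual Weaviate response schema; the point is that provenance (file, page, chunk) travels inline with the text so the LLM can cite it verbatim.

```python
# Hypothetical chunk records, as retrieval might return them.
chunks = [
    {"text": "The controller shall notify the supervisory authority "
             "within 72 hours.", "file": "GDPR-EN.pdf", "page": 34, "chunk": 2},
    {"text": "Notification shall describe the nature of the breach.",
     "file": "GDPR-EN.pdf", "page": 35, "chunk": 1},
]

def build_context(chunks: list) -> str:
    """Prefix each chunk with its [File, Page, Chunk] provenance tag."""
    return "\n\n".join(
        f"[{c['file']}, Page {c['page']}, Chunk {c['chunk']}]\n{c['text']}"
        for c in chunks)

def build_prompt(question: str, chunks: list) -> str:
    """Citation-aware prompt: grounds the answer in retrieved context only."""
    return (
        "Use ONLY the provided context. If the answer is not in the "
        "context, say so. Include [File, Page, Chunk] inline citations.\n\n"
        f"Context:\n{build_context(chunks)}\n\nQuestion: {question}")

prompt = build_prompt("How fast must a breach be reported?", chunks)
```

Because every chunk carries its tag, any claim in the generated answer can be traced back to a specific page of the source PDF, which is what makes the output auditable.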
| Component | Purpose | Why It's Critical |
|---|---|---|
| Weaviate | Vector + keyword search | BM25 algorithm for exact matches + BERT embeddings for semantics |
| Query Expansion | Multi-term semantic understanding | LLM generates 5-10 synonym/related-term variants per query |
| Groq LLM | Fast, cost-effective generation | Mixtral 8x7B for simple queries, Llama 70B for complex regulatory parsing |
| Citation Engine | Source metadata preservation | Chunk-level provenance: filename + page number + character offsets |
| Security Layer | Enterprise hardening | Rate limiting (SlowAPI) + HTTPS enforcement + CORS (no wildcard) + admin auth |
| Prompt Injection Defense | Input sanitization | Pydantic validation + regex filtering for SQL/prompt attack patterns |
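Weaviate performs the BM25 + semantic fusion internally, but the idea behind hybrid ranking is easy to show in isolation. Below is a minimal reciprocal rank fusion (RRF) sketch, one common fusion strategy; the document IDs are made up, and `k=60` is the conventional RRF smoothing constant.

```python
def reciprocal_rank_fusion(bm25_ranking: list, vector_ranking: list,
                           k: int = 60) -> list:
    """Merge two ranked lists of doc IDs: each list contributes
    1 / (k + rank) per document, rewarding docs ranked high by both."""
    scores = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Toy rankings: keyword search and vector search mostly agree.
bm25 = ["art33", "art34", "recital85"]
vector = ["recital85", "art33", "art55"]
fused = reciprocal_rank_fusion(bm25, vector)
```

Documents that appear near the top of both lists ("art33", "recital85") dominate the fused ranking, which is why hybrid search is more robust than either signal alone for legal text.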
Built-in BM25 eliminates need for separate keyword search infrastructure. Hybrid search (BM25 + semantic) reduces hallucinations in legal domain. No vendor lock-in; can self-host for on-premise compliance.
10x faster inference (200-400ms vs. 2-5s). 150x cheaper ($0.10 vs. $15 per 1M input tokens). Sufficient reasoning capability for regulation parsing. Free tier allows bootstrapping without large budgets.
Legal liability: every claim must be traceable to official document. Audit trail: regulators require evidence of due diligence. User trust: transparent sourcing enables verification.
Regulations evolve; new amendments are often missing from the indexed documents yet remain critical. Query expansion catches synonyms humans might use ("unauthorized access" → "breach"). Web search (EDPB official guidance) fills gaps in the local knowledge base.
AI Audio Analytics with Multi-Tenant Security & Domain Expertise
Audio Upload (MP3/WAV/OGG)
Raw Bytes → S3 / Local Storage
AssemblyAI Async Job
Speech-to-Text (99% accuracy), Speaker Diarization (who spoke when), PII Redaction (HIPAA/GDPR compliance option)
Transcription Split into Chunks
Preserve speaker identity: "[Speaker A]: ...", timestamp metadata for seeking, overlap chunks (256 tokens, overlap: 32)
Embedding Generation (Batch)
Model: BGE-Large (1024-dim, superior for domain docs), batch size: 100 (GPU optimized), Qdrant indexing async
Store in Qdrant Vector DB
Payload: metadata (speaker, timestamp, domain), Index type: HNSW (fast nearest-neighbor search), 3 replicas for HA
User Query (Multi-Tenant Isolation)
JWT decode → tenant_id extraction, query vector embedding (real-time), metadata filter: WHERE tenant_id = {authenticated_tenant}, Qdrant similarity search (top-20 results)
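The tenant-isolation step above filters before ranking: only the authenticated tenant's vectors are ever scored. A dependency-free sketch of that pattern (toy 3-dim vectors and payload fields stand in for Qdrant's 1024-dim BGE-Large index and server-side metadata filter):

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy index; the real system stores these points in Qdrant with
# tenant_id in the payload.
index = [
    {"tenant_id": "acme",   "text": "Q3 revenue call", "vec": [1.0, 0.0, 0.2]},
    {"tenant_id": "acme",   "text": "hiring sync",     "vec": [0.1, 1.0, 0.0]},
    {"tenant_id": "globex", "text": "legal review",    "vec": [1.0, 0.1, 0.1]},
]

def tenant_search(query_vec: list, tenant_id: str, top_k: int = 20) -> list:
    """Filter-THEN-rank: vectors from other tenants are excluded before
    any similarity is computed, mirroring Qdrant's metadata filter."""
    candidates = [p for p in index if p["tenant_id"] == tenant_id]
    candidates.sort(key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    return candidates[:top_k]

hits = tenant_search([1.0, 0.0, 0.0], tenant_id="acme")
```

In production the `tenant_id` comes only from the decoded JWT, never from user input, so a query can never leak another tenant's transcripts.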
| Component | Purpose | Why It's Critical |
|---|---|---|
| AssemblyAI | Speech-to-Text engine | 99% accuracy, speaker diarization, PII redaction for HIPAA/GDPR |
| Qdrant | Vector database with metadata filtering | HNSW indexing for fast nearest-neighbor search, 3 replicas for HA |
| SambaNova | Fast LLM inference | Domain-specific vocabulary injection for Healthcare, Legal, Finance |
| BGE-Large | Embedding model (1024-dim) | Superior performance for domain documents, GPU-optimized batch processing |
| JWT + RBAC | Multi-tenant security | Tenant isolation via JWT token, role-based access control |
| Redis | Session + cache layer | Fast session management and query result caching |
Built-in speaker diarization saves weeks of integration. PII redaction out-of-the-box for compliance. 99% accuracy with async webhook callbacks for scale.
Superior metadata filtering (critical for multi-tenant isolation). HNSW indexing optimized for high-dimensional BGE-Large vectors. Native tenant_id filtering in query API.
1024-dim vs 384-dim captures more semantic nuance for domain-specific audio content. Superior performance on domain documents (legal, medical, financial terminology). Worth the compute cost for enterprise accuracy requirements.
Enterprise customers require data isolation. RBAC enables team hierarchies (admin, analyst, viewer). Audit trails satisfy compliance requirements for regulated industries.
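The transcript chunking described earlier (256-token windows, 32-token overlap, speaker tags preserved) follows the standard sliding-window pattern. A word-level sketch, using words as a stand-in for real tokenization:

```python
def chunk_transcript(words: list, chunk_size: int = 256,
                     overlap: int = 32, speaker: str = "Speaker A") -> list:
    """Sliding-window chunking: consecutive chunks share `overlap`
    tokens so a sentence cut at a boundary still appears whole in at
    least one chunk. The speaker tag is re-attached to every chunk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(f"[{speaker}]: " + " ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks

# 600 placeholder tokens -> 3 overlapping chunks.
words = [f"w{i}" for i in range(600)]
chunks = chunk_transcript(words)
```

Keeping the speaker tag on every chunk means a retrieved passage always answers "who said this", even when the original utterance was split across windows.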
Tools and technologies I work with daily
Stable, deployment-ready systems with Docker, CI/CD, automated testing, and monitoring for real-world impact
Define core business problem and quantifiable ROI metrics. Research existing solutions and identify competitive advantages. Establish success criteria (latency SLAs, accuracy thresholds, cost per prediction).
Implement hallucination reduction mechanisms (RAG, fine-tuning, prompt engineering, validation layers). Design robustness strategies (error handling, fallback mechanisms, circuit breakers). Define evaluation metrics aligned with business KPIs.
Optimize for latency, throughput, and scalability requirements. Evaluate model selection vs. API trade-offs (local deployment vs. cloud APIs). Design infrastructure (compute resources, caching, database schema). Plan cost optimization and resource utilization.
Establish baseline metrics (accuracy, precision, recall, F1-score, latency). Perform comparative analysis against baselines and competitors. Conduct A/B testing and validation on hold-out datasets. Use production-representative data for realistic assessment.
Design clean APIs with comprehensive documentation. Ensure backward compatibility and semantic versioning. Implement CI/CD pipelines for automated testing and deployment. Plan rollout strategy (blue-green or canary deployments).
Monitor real-time metrics (latency, error rates, token usage, cost). Implement comprehensive logging and distributed tracing. Set up alerting for SLA violations and anomalies. Handle concurrent users with rate limiting and graceful degradation.
Analyze production data and user feedback. Iterate on model performance with real-world insights. Optimize costs and performance based on deployment data. Maintain feedback loops for model retraining and feature development.
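The monitoring step in this lifecycle tracks tail latency against an SLA. A minimal sketch of a rolling p95 tracker (window size, SLA threshold, and the sample values are illustrative; the percentile uses the nearest-rank method):

```python
import math
from collections import deque

class LatencyMonitor:
    """Rolling window of recent request latencies with a p95 SLA check."""

    def __init__(self, window: int = 1000, p95_sla_ms: float = 800.0):
        self.samples = deque(maxlen=window)  # oldest samples evicted
        self.p95_sla_ms = p95_sla_ms

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p95(self) -> float:
        """Nearest-rank 95th percentile of the current window."""
        ordered = sorted(self.samples)
        idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
        return ordered[idx]

    def sla_violated(self) -> bool:
        return self.p95() > self.p95_sla_ms

mon = LatencyMonitor(p95_sla_ms=500.0)
for ms in [120, 180, 150, 90, 2000]:  # one slow outlier
    mon.record(ms)
```

Averages hide outliers like the 2000ms request above; alerting on p95 (or p99) is what surfaces the slow tail that users actually feel.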
Anyone can run notebooks. Few can build stable, deployment-ready systems.
Every project I build has deployment-ready architecture (Docker, FastAPI, testing) from day one.
70-85% of AI initiatives fail to meet expected outcomes due to trust issues.
My RAG systems use citation-backed answers — every claim is traceable to a source.
600M+ potential Indic language market, <0.1% AI coverage.
Building localization pipelines at Pratilipi for underserved language users.
Users don't wait. If it's slow, it's broken.
Hybrid search (BM25 + semantic), caching, and optimized inference pipelines to ensure sub-second response times.
Can't improve what you can't measure.
RAGAS evaluation, latency monitoring, accuracy benchmarks in every project.
5-Year Plan: Evolving into a full-fledged AI/ML Architect
GOAL
Evolve into a full-fledged AI/ML Architect specializing in AI Ops, MLOps, and LLMOps — building the stable, scalable backends that power the next generation of AI systems.
I am not just building models — I am building the infrastructure that makes them reliable.
Mastering the art of monitoring, logging, and debugging complex AI pipelines in production.
Building robust evaluation frameworks and deployment pipelines for Large Language Models.
Architecting distributed systems that can handle millions of inferences with high availability.
Actively seeking AI/ML roles — let's build something amazing together
Or a dynamic startup that needs a reliable engineer who combines tech, AI operations, and business goals with the ambition to grow alongside it. If you're building the next tech giant, I'm the right person to build it with you.