AI Operations & Localization Consultant @ Pratilipi Comics

Hemant Sudarshan

Aspiring AI Engineer specializing in MLOps, LLMOps & Agentic Systems

CS Graduate @ DSU Bangalore

Building Production-Ready AI Systems

6+ AI Systems Built | 1 Patent + 2 Publications

6+ Stable Deployment-Ready AI Systems1 Patent + 2 PublicationsLLM & RAG ExpertAI Operations & Localization

Role: AI Localization ConsultantAI Operations & LocalizationSystems: 6+ Stable Deployment ReadyResearch: 1 Patent + 2 Publications

Actively Seeking AI/ML Roles View Projects View Resume

Email LinkedIn Hugging Face GitHub

What I Bring to Your Table

Solving real AI problems with production-ready solutions

Build RAG Systems

Citation-backed, hallucination-reduced RAG pipelines

Compliance-GPT, AudioRaG Enterprise

Integrate LLMs

Prompt engineering, multi-model orchestration

GPT-4o, Claude, Llama, Gemini projects

Ship to Production

FastAPI + Docker + Cloud deployments

6+ systems with Docker, CI/CD, monitoring

Multimodal AI

Vision + Audio + Text fusion systems

Medical Chatbot, Audio-RAG

Indic Language AI

Hindi/Telugu/Kannada NLP for Indic market

Pratilipi Comics localization

Research + Execution

Academic rigor meets industry speed

1 Patent + 2 Publications

Featured AI Projects & Systems

Production-ready systems built with modern AI architecture

PythonFastAPIDockerMongoDBRedisReactPyTorchTensorFlow

Multi-Agent AI

Agentic Inventory Restocking Service

Repo Live Demo

Multi-Agent AI System for Autonomous Inventory Management. Traditional inventory systems trigger false alarms by not distinguishing between genuine supply crises and natural demand fluctuations, leading to unnecessary restocking and inventory bloat.

System Objectives

Autonomously analyze demand patterns using time-series forecasting
Differentiate between crisis situations and declining demand trends
Generate purchase/transfer orders with confidence scoring (0-100%)
Reduce manual overhead by 95% through AI-driven decision-making

LangGraphMongoDBFastAPIGemini/GroqRedisSlowAPIPydanticDocker

Dual-model (Gemini 2.0 + Groq) for 99.9% availability with automatic failover
Human-in-loop for <95% confidence decisions via Telegram inline approve/reject
Multi-channel notifications (Telegram Bot, Slack Webhooks, Web Dashboard)
Confidence scoring (0-100%) via LLM reasoning with softmax outputs
Safety stock calculations to distinguish expected fluctuations from true shortages
Rate limiting via SlowAPI (30 req/min per IP) for DDoS protection
LangSmith tracing for all AI calls; Prometheus metrics for infra

RAG System

Compliance-GPT

Repo Live Demo Tests Docker

Enterprise RAG System with Zero-Hallucination Citations. Compliance professionals spend 200+ hours per quarter manually searching GDPR, CCPA, PCI-DSS regulations, costing organizations $300K+ annually. Hallucination in general-purpose ChatGPT creates legal liability.

System Objectives

Provide citation-backed compliance answers in <2 seconds (vs. 20+ minutes manual)
Achieve 100% accuracy through retrieval-backed generation (eliminate hallucinations)
Support multi-regulation queries (GDPR, CCPA, PCI-DSS, HIPAA, SOX)
Enable audit trails for compliance documentation

WeaviateFastAPIGroqDockerLangChainHuggingFaceSlowAPIPydantic

Hybrid search (BM25 + semantic) via Weaviate's built-in fusion algorithm
1,987+ regulation chunks from official EDPB, ICO, NIST PDFs with citation provenance
80+ automated tests with CI/CD pipeline via GitHub Actions
Query expansion: LLM generates 5-10 synonym/related-term variants per query
Groq LLM: Mixtral-7B for simple queries, Llama-70B for complex regulatory parsing
Citation engine with chunk-level provenance: filename + page number + character offsets
Prompt injection defense: Pydantic validation + regex filtering for SQL/prompt attack patterns
Fallback web search via DuckDuckGo API when confidence <70%

Audio + RAG

AudioRAG Enterprise

Repo

AI Audio Analytics with Multi-Tenant Security & Domain Expertise. Organizations accumulate massive audio archives (meetings, calls, interviews) but lack tools to efficiently query insights at scale.

System Objectives

Transcribe audio (with speaker diarization) at scale
Enable semantic search over audio content in 2-3 seconds
Support domain-specific vocabularies (Healthcare, Legal, Finance)
Multi-tenant architecture with RBAC and audit logging

AssemblyAIQdrantSambaNovaRedisFastAPIBGE-LargeJWTDocker

Speaker diarization (who spoke when) with timestamp metadata for seeking
Multi-tenant RBAC architecture with JWT-based tenant isolation
Healthcare, Legal, Finance domain-specific vocabulary support
BGE-Large embeddings (1024-dim) superior for domain docs, batch size 100 GPU optimized
Qdrant HNSW indexing with 3 replicas for high availability
PII Redaction option for HIPAA/GDPR compliance
Overlapping chunks (256 tokens, overlap: 32) preserving speaker identity
Audit logging for every query with user ID, timestamp, result quality score

Deepfake + Fake News Detection

TruthTracker (AntiAi)

Repo

Multi-modal AI system for detecting deepfakes and misinformation using EfficientNet-B4 for image forensics paired with NLP-based fake news analysis.

EfficientNet-B4FastAPIReactPyTorch

EfficientNet-B4 for image forensic analysis
NLP-based fake news detection pipeline
Real-time processing with FastAPI backend

Multi-Agent AI

AI Real Estate Agent

Repo

Automated property search and investment insights using multi-agent AI with web scraping, market analysis, and personalized recommendations.

Gemini AIFirecrawlRedisPython

Autonomous property search and filtering
Investment ROI analysis with market data
Multi-agent coordination for complex queries

Detailed technical deep dives for the top 3 featured projects below — including architecture diagrams, tech stack analysis, and key technical decisions.

Deep Dive: Featured Projects

Architecture diagrams, tech stack analysis, and key technical decisions

Agentic Inventory Restocking Service

Multi-Agent AI System for Autonomous Inventory Management

Repo Live Demo

Architecture & Data Flow

Inventory Trigger (CSV/MongoDB)

Historical demand (6-12 months), current stock levels, lead times & reorder points (ROP), safety stock calculations

LangGraph Agentic Workflow → Data Loader

Loads historical demand, current stock, lead times, reorder points, safety stock calculations

AI Reasoning Engine

Gemini 2.0 (with Groq fallback): analyze demand trends, detect anomalies & demand spikes, crisis vs. natural decline logic, confidence scoring via prompting

Action Generator

Structured JSON output: Purchase Orders (external suppliers), Warehouse Transfer Orders (internal), confidence & reasoning trail

Multi-Channel Notifications

Telegram Bot (inline approve/reject), Slack Webhooks (team channels), Web Dashboard (real-time monitoring)

Critical Technical Components

Component	Purpose	Why It's Critical
LangGraph	Agentic orchestration framework	Enables autonomous multi-step decision workflows; state management across agent steps
Gemini 2.0 + Groq Fallback	LLM backbone for reasoning	Dual-model approach ensures 99.9% availability; Gemini for complex analysis, Groq for cost efficiency
MongoDB Atlas	Document-oriented database	Flexible schema for inventory items; auto-scaling handled by Atlas
Safety Stock Calculations	Demand variance quantification	Distinguishes between expected fluctuations and true shortages (confidence scoring)
FastAPI + SlowAPI	Production backend + rate limiting	Sub-second response times; DDoS protection; typed Python models via Pydantic
Redis Cache	High-speed order lookup	1000x faster than MongoDB for recent orders; reduces database load

Technology Stack (Deep Dive)

Backend Infrastructure

FastAPI with async/await for handling 1000+ concurrent requests
Session-based auth for dashboard, API-key based for external integrations
SlowAPI middleware (30 req/min per IP) to prevent abuse

AI/ML Engine

Primary Model: Google Gemini 2.0 (Reasoning mode enabled for complex inventory analysis)
Fallback Model: Groq (faster, cost-effective backup)
Prompt Engineering: Few-shot learning with historical order examples
Confidence Calibration: Softmax outputs from LLM reasoning to produce 0-100% confidence scores

Data Infrastructure

Production Database: MongoDB Atlas with multi-region replication
Caching Layer: Redis Cluster for session state + recent orders
Time-Series Analysis: Python's statsmodels for ARIMA forecasting
Data Validation: Pydantic models ensuring data integrity

Deployment & Monitoring

Containerization: Docker with multi-stage builds for minimal image size
Orchestration: Railway.app for automatic scaling based on CPU/memory
Observability: LangSmith tracing for all AI calls; Prometheus metrics for infra
CI/CD: GitHub Actions with automated testing on every push

Key Technical Decisions & Rationale

Why LangGraph over traditional state machines?

Agents can make dynamic decisions about next steps, enabling adaptive workflows. Built-in memory/state management prevents information loss across steps. Reduces boilerplate code by 60% compared to manual orchestration.

Why MongoDB + Redis hybrid?

MongoDB: flexible schema for heterogeneous inventory items, automatic scaling. Redis: sub-millisecond lookups for recent orders, human-in-loop approvals. Better than single DB approach for latency-sensitive operations.

Why Gemini + Groq dual-model?

Gemini: superior reasoning for demand pattern analysis ($0.075/1M tokens input). Groq: 10x faster inference for simple calculations ($0.10/1M tokens). Failover strategy ensures uptime even during API disruptions.

Why human-in-loop for <95% confidence?

AI uncertainty manifests as edge cases; humans catch outliers LLM model can't. Telegram approval system reduces friction — instant mobile notifications. Audit trail for compliance and continuous model improvement.

Compliance-GPT

Enterprise RAG System with Zero-Hallucination Citations

Repo Live Demo

Architecture & Data Flow

User Query → FastAPI Endpoint

Rate Limited @ 30 req/min

Query Expansion

"breach" → ["personal data breach" + "Article 33 notification" + "72 hours" + "supervisory authority"]

Weaviate Vector Search

BM25 Keyword Index (exact term matching) + Semantic Vectors (cross-lingual understanding) → Returns top-5 relevant chunks with source metadata

Prompt Engineering (Citation-Aware)

"Use ONLY the provided context. If not found, say so. Include [Page X, File Y] inline citations."

Groq LLM Generation

7B Mixtral → 70B Llama for complex queries. Latency: 200-400ms (vs. 2-5s for GPT-4). Cost: $0.10/1M tokens (vs. $15/1M for GPT-4)

Citation Formatting & Response

"Article 33 GDPR requires notification within 72 hours [GDPR-EN.pdf, Page 34, Chunk 2]" → Response Cache (5min TTL, Redis) → JSON with Citations + Metadata

Critical Technical Components

Component	Purpose	Why It's Critical
Weaviate	Vector + keyword search	BM25 algorithm for exact matches + BERT embeddings for semantics
Query Expansion	Multi-term semantic understanding	LLM generates 5-10 synonym/related-term variants per query
Groq LLM	Fast, cost-effective generation	Mixtral-7B for simple queries, Llama-70B for complex regulatory parsing
Citation Engine	Source metadata preservation	Chunk-level provenance: filename + page number + character offsets
Security Layer	Enterprise hardening	Rate limiting (SlowAPI) + HTTPS enforcement + CORS (no wildcard) + admin auth
Prompt Injection Defense	Input sanitization	Pydantic validation + regex filtering for SQL/prompt attack patterns

Technology Stack (Deep Dive)

Knowledge Base Preparation

Document Ingestion: 1,987+ chunks from official regulation PDFs (EDPB, ICO, NIST)
Chunking Strategy: Overlapping chunks (size: 512 tokens, overlap: 64 tokens)
Metadata preservation: source filename, page numbers, regulation type, section headers
Embedding Model: HuggingFace sentence-transformers/all-MiniLM-L6-v2 (384-dim, compatible with Weaviate)

Retrieval-Generation Pipeline

Vector Database: Weaviate Cloud (managed service, auto-scaling)
Hybrid Search: Weaviate's built-in fusion algorithm (BM25 + semantic score combination)
LLM Orchestration: LangChain → Groq API
Fallback Strategy: If confidence <70%, trigger web search via DuckDuckGo API for newest regulations

Production Hardening

Rate Limiting: SlowAPI (30 req/min/IP), with exponential backoff
HTTPS Enforcement: Production environments block HTTP, cert auto-renewal via Certbot
CORS Protection: Whitelist specific origins (no * wildcard)
Admin Dashboard: Protected by token-based auth (FastAPI Security dependencies)
Audit Logging: Every query logged with user ID, timestamp, result quality score

Deployment & Observability

Containerization: Docker Compose for local dev (includes Weaviate + Groq proxy)
Live Environment: HuggingFace Spaces (free tier) with auto-redeployment on git push
Monitoring: Prometheus metrics (query latency P50/P95/P99, cache hit rate, hallucination detection)
CI/CD: GitHub Actions runs 80+ tests before deployment

Key Technical Decisions & Rationale

Why Weaviate over Pinecone/Qdrant?

Built-in BM25 eliminates need for separate keyword search infrastructure. Hybrid search (BM25 + semantic) reduces hallucinations in legal domain. No vendor lock-in; can self-host for on-premise compliance.

Why Groq instead of GPT-4 or Claude?

10x faster inference (200ms vs. 2-5s). 67x cheaper ($0.10 vs. $15 per 1M input tokens). Sufficient reasoning capability for regulation parsing. Free tier allows bootstrapping without large budgets.

Why citation-level provenance matters?

Legal liability: every claim must be traceable to official document. Audit trail: regulators require evidence of due diligence. User trust: transparent sourcing enables verification.

Why query expansion + fallback web search?

Regulations evolve; new amendments rare in documents but critical. Query expansion catches synonyms humans might use ("unauthorized access" → "breach"). Web search (EDPB official guidance) fills gaps in local knowledge base.

AudioRAG Enterprise

AI Audio Analytics with Multi-Tenant Security & Domain Expertise

Repo

Architecture & Data Flow

Audio Upload (MP3/WAV/OGG)

Raw Bytes → S3 / Local Storage

AssemblyAI Async Job

Speech-to-Text (99% accuracy), Speaker Diarization (who spoke when), PII Redaction (HIPAA/GDPR compliance option)

Transcription Split into Chunks

Preserve speaker identity: "[Speaker A]: ...", timestamp metadata for seeking, overlap chunks (256 tokens, overlap: 32)

Embedding Generation (Batch)

Model: BGE-Large (1024-dim, superior for domain docs), batch size: 100 (GPU optimized), Qdrant indexing async

Store in Qdrant Vector DB

Payload: metadata (speaker, timestamp, domain), Index type: HNSW (fast nearest-neighbor search), 3 replicas for HA

User Query (Multi-Tenant Isolation)

JWT decode → tenant_id extraction, query vector embedding (real-time), metadata filter: WHERE tenant_id = {authenticated_tenant}, Qdrant similarity search (top-20 results)

Critical Technical Components

Component	Purpose	Why It's Critical
AssemblyAI	Speech-to-Text engine	99% accuracy, speaker diarization, PII redaction for HIPAA/GDPR
Qdrant	Vector database with metadata filtering	HNSW indexing for fast nearest-neighbor search, 3 replicas for HA
SambaNova	Fast LLM inference	Domain-specific vocabulary injection for Healthcare, Legal, Finance
BGE-Large	Embedding model (1024-dim)	Superior performance for domain documents, GPU-optimized batch processing
JWT + RBAC	Multi-tenant security	Tenant isolation via JWT token, role-based access control
Redis	Session + cache layer	Fast session management and query result caching

Technology Stack (Deep Dive)

Audio Processing

AssemblyAI async transcription with webhook callbacks
Speaker diarization preserving 'who said what' context
PII redaction for sensitive data (HIPAA/GDPR)
Support for MP3, WAV, OGG formats

Search & Retrieval

BGE-Large embeddings (1024-dim) for domain-specific accuracy
Qdrant vector DB with HNSW indexing and metadata filtering
Multi-tenant isolation: queries filtered by authenticated tenant_id
Domain-specific vocabulary injection (Healthcare, Legal, Finance)

Security & Multi-Tenancy

JWT-based authentication with tenant_id extraction
Role-Based Access Control (RBAC) for team management
Audit logging: every query logged with user ID, timestamp, quality score
Data isolation: tenants can never access each other's audio data

Infrastructure

Docker containerization for consistent deployments
Redis for session management and query caching
S3-compatible storage for raw audio files
Async processing pipeline for non-blocking uploads

Key Technical Decisions & Rationale

Why AssemblyAI over Whisper?

Built-in speaker diarization saves weeks of integration. PII redaction out-of-the-box for compliance. 99% accuracy with async webhook callbacks for scale.

Why Qdrant over Weaviate for this project?

Superior metadata filtering (critical for multi-tenant isolation). HNSW indexing optimized for high-dimensional BGE-Large vectors. Native tenant_id filtering in query API.

Why BGE-Large over MiniLM?

1024-dim vs 384-dim captures more semantic nuance for domain-specific audio content. Superior performance on domain documents (legal, medical, financial terminology). Worth the compute cost for enterprise accuracy requirements.

Why multi-tenant architecture?

Enterprise customers require data isolation. RBAC enables team hierarchies (admin, analyst, viewer). Audit trails satisfy compliance requirements for regulated industries.

Technical Skills

Tools and technologies I work with daily

AI/ML & LLMs

LangChainLangGraphRAG PipelinesPrompt EngineeringFine-tuningPyTorchTensorFlowHugging FaceRAGAS EvaluationFew-shot Learning

Backend & APIs

FastAPIPythonREST APIsPydanticRedisSlowAPIAsync/AwaitRate Limiting

Databases

MongoDB AtlasWeaviateQdrantPostgreSQLPineconeRedis Cluster

DevOps & Deployment

DockerDocker ComposeCI/CDGitHub ActionsRailwayHuggingFace SpacesPrometheusLangSmith

LLM Providers

OpenAI GPT-4oGoogle Gemini 2.0Groq (Mixtral/Llama)SambaNovaClaudeLlama

Security & Monitoring

JWT AuthRBACCORS ProtectionPrompt Injection DefenseAudit LoggingDistributed Tracing

My Approach to Building AI Systems

Stable, deployment-ready systems with Docker, CI/CD, automated testing, and monitoring for real-world impact

"Stable, deployment-ready AI systems with Docker, CI/CD, automated testing, and monitoring for real-world impact"

Business Requirements & Problem Validation

Define core business problem and quantifiable ROI metrics. Research existing solutions and identify competitive advantages. Establish success criteria (latency SLAs, accuracy thresholds, cost per prediction).

Reliability & Quality Strategy

Implement hallucination reduction mechanisms (RAG, fine-tuning, prompt engineering, validation layers). Design robustness strategies (error handling, fallback mechanisms, circuit breakers). Define evaluation metrics aligned with business KPIs.

System Design & Architecture

Optimize for latency, throughput, and scalability requirements. Evaluate model selection vs. API trade-offs (local deployment vs. cloud APIs). Design infrastructure (compute resources, caching, database schema). Plan cost optimization and resource utilization.

Evaluation & Benchmarking

Establish baseline metrics (accuracy, precision, recall, F1-score, latency). Perform comparative analysis against baselines and competitors. Conduct A/B testing and validation on hold-out datasets. Use production-representative data for realistic assessment.

Integration & Deployment

Design clean APIs with comprehensive documentation. Ensure backward compatibility and semantic versioning. Implement CI/CD pipelines for automated testing and deployment. Plan rollout strategy (blue-green or canary deployments).

Production Monitoring & Observability

Monitor real-time metrics (latency, error rates, token usage, cost). Implement comprehensive logging and distributed tracing. Set up alerting for SLA violations and anomalies. Handle concurrent users with rate limiting and graceful degradation.

Continuous Improvement

Analyze production data and user feedback. Iterate on model performance with real-world insights. Optimize costs and performance based on deployment data. Maintain feedback loops for model retraining and feature development.

Core Beliefs

PrincipleWhy It MattersHow I Apply It

Stability + Deployment

Anyone can run notebooks. Few can build stable, deployment-ready systems.

Every project I build has deployment-ready architecture (Docker, FastAPI, testing) from day one.

Hallucination Reduction

70-85% of AI initiatives fail to meet expected outcomes due to trust issues.

My RAG systems use citation-backed answers — every claim is traceable to a source.

Indic AI First

600M+ potential Indic language market, <0.1% AI coverage.

Building localization pipelines at Pratilipi for underserved language users.

Latency is UX

Users don't wait. If it's slow, it's broken.

Hybrid search (BM25 + semantic), caching, and optimized inference pipelines to ensure sub-second response times.

Measure Everything

Can't improve what you can't measure.

RAGAS evaluation, latency monitoring, accuracy benchmarks in every project.

Vision & Future Goals

5-Year Plan: Evolving into a full-fledged AI/ML Architect

GOAL

Evolve into a full-fledged AI/ML Architect specializing in AI Ops, MLOps, and LLMOps — building the stable, scalable backends that power the next generation of AI systems.

I am not just building models — I am building the infrastructure that makes them reliable.

AI Ops & Observability

Mastering the art of monitoring, logging, and debugging complex AI pipelines in production.

LLM Ops

Building robust evaluation frameworks and deployment pipelines for Large Language Models.

Scalable Backends

Architecting distributed systems that can handle millions of inferences with high availability.

Let's Connect

Actively seeking AI/ML roles — let's build something amazing together

LOCATION: Bengaluru, India (Open to Remote)

Or a dynamic startup who needs a reliable person to grow your startup with Tech, AI Operations and business goals and brings in ambitions to grow with your startup. I am the right guy for you to build the next Tech Giant.

collabwithhemantgenai@gmail.com LinkedIn GitHub Hugging Face

View Resume on Google Drive