AI Operations & Localization Consultant @ Pratilipi Comics

Hemant Sudarshan

Aspiring AI Engineer specializing in MLOps, LLMOps & Agentic Systems

CS Graduate @ DSU Bangalore

Building Production-Ready AI Systems

6+ AI Systems Built | 1 Patent + 2 Publications

6+ Stable Deployment-Ready AI Systems1 Patent + 2 PublicationsLLM & RAG ExpertAI Operations & Localization
Role: AI Localization ConsultantAI Operations & LocalizationSystems: 6+ Stable Deployment ReadyResearch: 1 Patent + 2 Publications

What I Bring to Your Table

Solving real AI problems with production-ready solutions

Build RAG Systems

Citation-backed, hallucination-reduced RAG pipelines

Compliance-GPT, AudioRaG Enterprise

Integrate LLMs

Prompt engineering, multi-model orchestration

GPT-4o, Claude, Llama, Gemini projects

Ship to Production

FastAPI + Docker + Cloud deployments

6+ systems with Docker, CI/CD, monitoring

Multimodal AI

Vision + Audio + Text fusion systems

Medical Chatbot, Audio-RAG

Indic Language AI

Hindi/Telugu/Kannada NLP for Indic market

Pratilipi Comics localization

Research + Execution

Academic rigor meets industry speed

1 Patent + 2 Publications

Featured AI Projects & Systems

Production-ready systems built with modern AI architecture

PythonFastAPIDockerMongoDBRedisReactPyTorchTensorFlow
Multi-Agent AI

Agentic Inventory Restocking Service

Multi-Agent AI System for Autonomous Inventory Management. Traditional inventory systems trigger false alarms by not distinguishing between genuine supply crises and natural demand fluctuations, leading to unnecessary restocking and inventory bloat.

System Objectives

  • Autonomously analyze demand patterns using time-series forecasting
  • Differentiate between crisis situations and declining demand trends
  • Generate purchase/transfer orders with confidence scoring (0-100%)
  • Reduce manual overhead by 95% through AI-driven decision-making
LangGraphMongoDBFastAPIGemini/GroqRedisSlowAPIPydanticDocker
  • Dual-model (Gemini 2.0 + Groq) for 99.9% availability with automatic failover
  • Human-in-loop for <95% confidence decisions via Telegram inline approve/reject
  • Multi-channel notifications (Telegram Bot, Slack Webhooks, Web Dashboard)
  • Confidence scoring (0-100%) via LLM reasoning with softmax outputs
  • Safety stock calculations to distinguish expected fluctuations from true shortages
  • Rate limiting via SlowAPI (30 req/min per IP) for DDoS protection
  • LangSmith tracing for all AI calls; Prometheus metrics for infra
RAG System

Compliance-GPT

Enterprise RAG System with Zero-Hallucination Citations. Compliance professionals spend 200+ hours per quarter manually searching GDPR, CCPA, PCI-DSS regulations, costing organizations $300K+ annually. Hallucination in general-purpose ChatGPT creates legal liability.

System Objectives

  • Provide citation-backed compliance answers in <2 seconds (vs. 20+ minutes manual)
  • Achieve 100% accuracy through retrieval-backed generation (eliminate hallucinations)
  • Support multi-regulation queries (GDPR, CCPA, PCI-DSS, HIPAA, SOX)
  • Enable audit trails for compliance documentation
WeaviateFastAPIGroqDockerLangChainHuggingFaceSlowAPIPydantic
  • Hybrid search (BM25 + semantic) via Weaviate's built-in fusion algorithm
  • 1,987+ regulation chunks from official EDPB, ICO, NIST PDFs with citation provenance
  • 80+ automated tests with CI/CD pipeline via GitHub Actions
  • Query expansion: LLM generates 5-10 synonym/related-term variants per query
  • Groq LLM: Mixtral-7B for simple queries, Llama-70B for complex regulatory parsing
  • Citation engine with chunk-level provenance: filename + page number + character offsets
  • Prompt injection defense: Pydantic validation + regex filtering for SQL/prompt attack patterns
  • Fallback web search via DuckDuckGo API when confidence <70%
Audio + RAG

AudioRAG Enterprise

AI Audio Analytics with Multi-Tenant Security & Domain Expertise. Organizations accumulate massive audio archives (meetings, calls, interviews) but lack tools to efficiently query insights at scale.

System Objectives

  • Transcribe audio (with speaker diarization) at scale
  • Enable semantic search over audio content in 2-3 seconds
  • Support domain-specific vocabularies (Healthcare, Legal, Finance)
  • Multi-tenant architecture with RBAC and audit logging
AssemblyAIQdrantSambaNovaRedisFastAPIBGE-LargeJWTDocker
  • Speaker diarization (who spoke when) with timestamp metadata for seeking
  • Multi-tenant RBAC architecture with JWT-based tenant isolation
  • Healthcare, Legal, Finance domain-specific vocabulary support
  • BGE-Large embeddings (1024-dim) superior for domain docs, batch size 100 GPU optimized
  • Qdrant HNSW indexing with 3 replicas for high availability
  • PII Redaction option for HIPAA/GDPR compliance
  • Overlapping chunks (256 tokens, overlap: 32) preserving speaker identity
  • Audit logging for every query with user ID, timestamp, result quality score
Deepfake + Fake News Detection

TruthTracker (AntiAi)

Multi-modal AI system for detecting deepfakes and misinformation using EfficientNet-B4 for image forensics paired with NLP-based fake news analysis.

EfficientNet-B4FastAPIReactPyTorch
  • EfficientNet-B4 for image forensic analysis
  • NLP-based fake news detection pipeline
  • Real-time processing with FastAPI backend
Multi-Agent AI

AI Real Estate Agent

Automated property search and investment insights using multi-agent AI with web scraping, market analysis, and personalized recommendations.

Gemini AIFirecrawlRedisPython
  • Autonomous property search and filtering
  • Investment ROI analysis with market data
  • Multi-agent coordination for complex queries

Detailed technical deep dives for the top 3 featured projects below — including architecture diagrams, tech stack analysis, and key technical decisions.

Deep Dive: Featured Projects

Architecture diagrams, tech stack analysis, and key technical decisions

Agentic Inventory Restocking Service

Multi-Agent AI System for Autonomous Inventory Management

Architecture & Data Flow

A

Inventory Trigger (CSV/MongoDB)

Historical demand (6-12 months), current stock levels, lead times & reorder points (ROP), safety stock calculations

B

LangGraph Agentic Workflow → Data Loader

Loads historical demand, current stock, lead times, reorder points, safety stock calculations

C

AI Reasoning Engine

Gemini 2.0 (with Groq fallback): analyze demand trends, detect anomalies & demand spikes, crisis vs. natural decline logic, confidence scoring via prompting

D

Action Generator

Structured JSON output: Purchase Orders (external suppliers), Warehouse Transfer Orders (internal), confidence & reasoning trail

E

Multi-Channel Notifications

Telegram Bot (inline approve/reject), Slack Webhooks (team channels), Web Dashboard (real-time monitoring)

Critical Technical Components

ComponentPurposeWhy It's Critical
LangGraphAgentic orchestration frameworkEnables autonomous multi-step decision workflows; state management across agent steps
Gemini 2.0 + Groq FallbackLLM backbone for reasoningDual-model approach ensures 99.9% availability; Gemini for complex analysis, Groq for cost efficiency
MongoDB AtlasDocument-oriented databaseFlexible schema for inventory items; auto-scaling handled by Atlas
Safety Stock CalculationsDemand variance quantificationDistinguishes between expected fluctuations and true shortages (confidence scoring)
FastAPI + SlowAPIProduction backend + rate limitingSub-second response times; DDoS protection; typed Python models via Pydantic
Redis CacheHigh-speed order lookup1000x faster than MongoDB for recent orders; reduces database load

Technology Stack (Deep Dive)

Backend Infrastructure
  • FastAPI with async/await for handling 1000+ concurrent requests
  • Session-based auth for dashboard, API-key based for external integrations
  • SlowAPI middleware (30 req/min per IP) to prevent abuse
AI/ML Engine
  • Primary Model: Google Gemini 2.0 (Reasoning mode enabled for complex inventory analysis)
  • Fallback Model: Groq (faster, cost-effective backup)
  • Prompt Engineering: Few-shot learning with historical order examples
  • Confidence Calibration: Softmax outputs from LLM reasoning to produce 0-100% confidence scores
Data Infrastructure
  • Production Database: MongoDB Atlas with multi-region replication
  • Caching Layer: Redis Cluster for session state + recent orders
  • Time-Series Analysis: Python's statsmodels for ARIMA forecasting
  • Data Validation: Pydantic models ensuring data integrity
Deployment & Monitoring
  • Containerization: Docker with multi-stage builds for minimal image size
  • Orchestration: Railway.app for automatic scaling based on CPU/memory
  • Observability: LangSmith tracing for all AI calls; Prometheus metrics for infra
  • CI/CD: GitHub Actions with automated testing on every push

Key Technical Decisions & Rationale

Why LangGraph over traditional state machines?

Agents can make dynamic decisions about next steps, enabling adaptive workflows. Built-in memory/state management prevents information loss across steps. Reduces boilerplate code by 60% compared to manual orchestration.

Why MongoDB + Redis hybrid?

MongoDB: flexible schema for heterogeneous inventory items, automatic scaling. Redis: sub-millisecond lookups for recent orders, human-in-loop approvals. Better than single DB approach for latency-sensitive operations.

Why Gemini + Groq dual-model?

Gemini: superior reasoning for demand pattern analysis ($0.075/1M tokens input). Groq: 10x faster inference for simple calculations ($0.10/1M tokens). Failover strategy ensures uptime even during API disruptions.

Why human-in-loop for <95% confidence?

AI uncertainty manifests as edge cases; humans catch outliers LLM model can't. Telegram approval system reduces friction — instant mobile notifications. Audit trail for compliance and continuous model improvement.

Compliance-GPT

Enterprise RAG System with Zero-Hallucination Citations

Architecture & Data Flow

1

User Query → FastAPI Endpoint

Rate Limited @ 30 req/min

2

Query Expansion

"breach" → ["personal data breach" + "Article 33 notification" + "72 hours" + "supervisory authority"]

3

Weaviate Vector Search

BM25 Keyword Index (exact term matching) + Semantic Vectors (cross-lingual understanding) → Returns top-5 relevant chunks with source metadata

4

Prompt Engineering (Citation-Aware)

"Use ONLY the provided context. If not found, say so. Include [Page X, File Y] inline citations."

5

Groq LLM Generation

7B Mixtral → 70B Llama for complex queries. Latency: 200-400ms (vs. 2-5s for GPT-4). Cost: $0.10/1M tokens (vs. $15/1M for GPT-4)

6

Citation Formatting & Response

"Article 33 GDPR requires notification within 72 hours [GDPR-EN.pdf, Page 34, Chunk 2]" → Response Cache (5min TTL, Redis) → JSON with Citations + Metadata

Critical Technical Components

ComponentPurposeWhy It's Critical
WeaviateVector + keyword searchBM25 algorithm for exact matches + BERT embeddings for semantics
Query ExpansionMulti-term semantic understandingLLM generates 5-10 synonym/related-term variants per query
Groq LLMFast, cost-effective generationMixtral-7B for simple queries, Llama-70B for complex regulatory parsing
Citation EngineSource metadata preservationChunk-level provenance: filename + page number + character offsets
Security LayerEnterprise hardeningRate limiting (SlowAPI) + HTTPS enforcement + CORS (no wildcard) + admin auth
Prompt Injection DefenseInput sanitizationPydantic validation + regex filtering for SQL/prompt attack patterns

Technology Stack (Deep Dive)

Knowledge Base Preparation
  • Document Ingestion: 1,987+ chunks from official regulation PDFs (EDPB, ICO, NIST)
  • Chunking Strategy: Overlapping chunks (size: 512 tokens, overlap: 64 tokens)
  • Metadata preservation: source filename, page numbers, regulation type, section headers
  • Embedding Model: HuggingFace sentence-transformers/all-MiniLM-L6-v2 (384-dim, compatible with Weaviate)
Retrieval-Generation Pipeline
  • Vector Database: Weaviate Cloud (managed service, auto-scaling)
  • Hybrid Search: Weaviate's built-in fusion algorithm (BM25 + semantic score combination)
  • LLM Orchestration: LangChain → Groq API
  • Fallback Strategy: If confidence <70%, trigger web search via DuckDuckGo API for newest regulations
Production Hardening
  • Rate Limiting: SlowAPI (30 req/min/IP), with exponential backoff
  • HTTPS Enforcement: Production environments block HTTP, cert auto-renewal via Certbot
  • CORS Protection: Whitelist specific origins (no * wildcard)
  • Admin Dashboard: Protected by token-based auth (FastAPI Security dependencies)
  • Audit Logging: Every query logged with user ID, timestamp, result quality score
Deployment & Observability
  • Containerization: Docker Compose for local dev (includes Weaviate + Groq proxy)
  • Live Environment: HuggingFace Spaces (free tier) with auto-redeployment on git push
  • Monitoring: Prometheus metrics (query latency P50/P95/P99, cache hit rate, hallucination detection)
  • CI/CD: GitHub Actions runs 80+ tests before deployment

Key Technical Decisions & Rationale

Why Weaviate over Pinecone/Qdrant?

Built-in BM25 eliminates need for separate keyword search infrastructure. Hybrid search (BM25 + semantic) reduces hallucinations in legal domain. No vendor lock-in; can self-host for on-premise compliance.

Why Groq instead of GPT-4 or Claude?

10x faster inference (200ms vs. 2-5s). 67x cheaper ($0.10 vs. $15 per 1M input tokens). Sufficient reasoning capability for regulation parsing. Free tier allows bootstrapping without large budgets.

Why citation-level provenance matters?

Legal liability: every claim must be traceable to official document. Audit trail: regulators require evidence of due diligence. User trust: transparent sourcing enables verification.

Why query expansion + fallback web search?

Regulations evolve; new amendments rare in documents but critical. Query expansion catches synonyms humans might use ("unauthorized access" → "breach"). Web search (EDPB official guidance) fills gaps in local knowledge base.

AudioRAG Enterprise

AI Audio Analytics with Multi-Tenant Security & Domain Expertise

Architecture & Data Flow

1

Audio Upload (MP3/WAV/OGG)

Raw Bytes → S3 / Local Storage

2

AssemblyAI Async Job

Speech-to-Text (99% accuracy), Speaker Diarization (who spoke when), PII Redaction (HIPAA/GDPR compliance option)

3

Transcription Split into Chunks

Preserve speaker identity: "[Speaker A]: ...", timestamp metadata for seeking, overlap chunks (256 tokens, overlap: 32)

4

Embedding Generation (Batch)

Model: BGE-Large (1024-dim, superior for domain docs), batch size: 100 (GPU optimized), Qdrant indexing async

5

Store in Qdrant Vector DB

Payload: metadata (speaker, timestamp, domain), Index type: HNSW (fast nearest-neighbor search), 3 replicas for HA

6

User Query (Multi-Tenant Isolation)

JWT decode → tenant_id extraction, query vector embedding (real-time), metadata filter: WHERE tenant_id = {authenticated_tenant}, Qdrant similarity search (top-20 results)

Critical Technical Components

ComponentPurposeWhy It's Critical
AssemblyAISpeech-to-Text engine99% accuracy, speaker diarization, PII redaction for HIPAA/GDPR
QdrantVector database with metadata filteringHNSW indexing for fast nearest-neighbor search, 3 replicas for HA
SambaNovaFast LLM inferenceDomain-specific vocabulary injection for Healthcare, Legal, Finance
BGE-LargeEmbedding model (1024-dim)Superior performance for domain documents, GPU-optimized batch processing
JWT + RBACMulti-tenant securityTenant isolation via JWT token, role-based access control
RedisSession + cache layerFast session management and query result caching

Technology Stack (Deep Dive)

Audio Processing
  • AssemblyAI async transcription with webhook callbacks
  • Speaker diarization preserving 'who said what' context
  • PII redaction for sensitive data (HIPAA/GDPR)
  • Support for MP3, WAV, OGG formats
Search & Retrieval
  • BGE-Large embeddings (1024-dim) for domain-specific accuracy
  • Qdrant vector DB with HNSW indexing and metadata filtering
  • Multi-tenant isolation: queries filtered by authenticated tenant_id
  • Domain-specific vocabulary injection (Healthcare, Legal, Finance)
Security & Multi-Tenancy
  • JWT-based authentication with tenant_id extraction
  • Role-Based Access Control (RBAC) for team management
  • Audit logging: every query logged with user ID, timestamp, quality score
  • Data isolation: tenants can never access each other's audio data
Infrastructure
  • Docker containerization for consistent deployments
  • Redis for session management and query caching
  • S3-compatible storage for raw audio files
  • Async processing pipeline for non-blocking uploads

Key Technical Decisions & Rationale

Why AssemblyAI over Whisper?

Built-in speaker diarization saves weeks of integration. PII redaction out-of-the-box for compliance. 99% accuracy with async webhook callbacks for scale.

Why Qdrant over Weaviate for this project?

Superior metadata filtering (critical for multi-tenant isolation). HNSW indexing optimized for high-dimensional BGE-Large vectors. Native tenant_id filtering in query API.

Why BGE-Large over MiniLM?

1024-dim vs 384-dim captures more semantic nuance for domain-specific audio content. Superior performance on domain documents (legal, medical, financial terminology). Worth the compute cost for enterprise accuracy requirements.

Why multi-tenant architecture?

Enterprise customers require data isolation. RBAC enables team hierarchies (admin, analyst, viewer). Audit trails satisfy compliance requirements for regulated industries.

Technical Skills

Tools and technologies I work with daily

AI/ML & LLMs

LangChainLangGraphRAG PipelinesPrompt EngineeringFine-tuningPyTorchTensorFlowHugging FaceRAGAS EvaluationFew-shot Learning

Backend & APIs

FastAPIPythonREST APIsPydanticRedisSlowAPIAsync/AwaitRate Limiting

Databases

MongoDB AtlasWeaviateQdrantPostgreSQLPineconeRedis Cluster

DevOps & Deployment

DockerDocker ComposeCI/CDGitHub ActionsRailwayHuggingFace SpacesPrometheusLangSmith

LLM Providers

OpenAI GPT-4oGoogle Gemini 2.0Groq (Mixtral/Llama)SambaNovaClaudeLlama

Security & Monitoring

JWT AuthRBACCORS ProtectionPrompt Injection DefenseAudit LoggingDistributed Tracing

My Approach to Building AI Systems

Stable, deployment-ready systems with Docker, CI/CD, automated testing, and monitoring for real-world impact

"Stable, deployment-ready AI systems with Docker, CI/CD, automated testing, and monitoring for real-world impact"

01

Business Requirements & Problem Validation

Define core business problem and quantifiable ROI metrics. Research existing solutions and identify competitive advantages. Establish success criteria (latency SLAs, accuracy thresholds, cost per prediction).

02

Reliability & Quality Strategy

Implement hallucination reduction mechanisms (RAG, fine-tuning, prompt engineering, validation layers). Design robustness strategies (error handling, fallback mechanisms, circuit breakers). Define evaluation metrics aligned with business KPIs.

03

System Design & Architecture

Optimize for latency, throughput, and scalability requirements. Evaluate model selection vs. API trade-offs (local deployment vs. cloud APIs). Design infrastructure (compute resources, caching, database schema). Plan cost optimization and resource utilization.

04

Evaluation & Benchmarking

Establish baseline metrics (accuracy, precision, recall, F1-score, latency). Perform comparative analysis against baselines and competitors. Conduct A/B testing and validation on hold-out datasets. Use production-representative data for realistic assessment.

05

Integration & Deployment

Design clean APIs with comprehensive documentation. Ensure backward compatibility and semantic versioning. Implement CI/CD pipelines for automated testing and deployment. Plan rollout strategy (blue-green or canary deployments).

06

Production Monitoring & Observability

Monitor real-time metrics (latency, error rates, token usage, cost). Implement comprehensive logging and distributed tracing. Set up alerting for SLA violations and anomalies. Handle concurrent users with rate limiting and graceful degradation.

07

Continuous Improvement

Analyze production data and user feedback. Iterate on model performance with real-world insights. Optimize costs and performance based on deployment data. Maintain feedback loops for model retraining and feature development.

Core Beliefs

Stability + Deployment

Anyone can run notebooks. Few can build stable, deployment-ready systems.

Every project I build has deployment-ready architecture (Docker, FastAPI, testing) from day one.

Hallucination Reduction

70-85% of AI initiatives fail to meet expected outcomes due to trust issues.

My RAG systems use citation-backed answers — every claim is traceable to a source.

Indic AI First

600M+ potential Indic language market, <0.1% AI coverage.

Building localization pipelines at Pratilipi for underserved language users.

Latency is UX

Users don't wait. If it's slow, it's broken.

Hybrid search (BM25 + semantic), caching, and optimized inference pipelines to ensure sub-second response times.

Measure Everything

Can't improve what you can't measure.

RAGAS evaluation, latency monitoring, accuracy benchmarks in every project.

Vision & Future Goals

5-Year Plan: Evolving into a full-fledged AI/ML Architect

GOAL

Evolve into a full-fledged AI/ML Architect specializing in AI Ops, MLOps, and LLMOps — building the stable, scalable backends that power the next generation of AI systems.

I am not just building models — I am building the infrastructure that makes them reliable.

AI Ops & Observability

Mastering the art of monitoring, logging, and debugging complex AI pipelines in production.

LLM Ops

Building robust evaluation frameworks and deployment pipelines for Large Language Models.

Scalable Backends

Architecting distributed systems that can handle millions of inferences with high availability.

Let's Connect

Actively seeking AI/ML roles — let's build something amazing together

OPEN TO: AI/ML Intern | Entry-Level AI Engineer | LLMOps Entry Level | Entry Level RAG Systems Developer | Founder Office - AI Engineer | Product Development Associate or Intern
LOCATION: Bengaluru, India (Open to Remote)

Or a dynamic startup who needs a reliable person to grow your startup with Tech, AI Operations and business goals and brings in ambitions to grow with your startup. I am the right guy for you to build the next Tech Giant.