Kshitij Mishra
AI Quality Analyst | LLM Evaluation | Data Analyst | Prompt Engineer
AI Quality Analyst at Tavus AI, evaluating production LLM and voice AI systems. Reduced hallucination rates by ~30%, optimized 10+ retrieval workflows, and cut review turnaround by 20%.
I make AI systems more reliable.
I'm an AI Quality Analyst based in Noida, India. My day-to-day at Tavus AI involves evaluating production LLM and voice AI outputs, building structured feedback loops, and designing test frameworks that catch problems before they reach users.
Proficient in Python and SQL for data analysis, fluent in LLM evaluation methodology, and experienced with voice AI tooling including Whisper, Google STT, and ElevenLabs. Outside core work, I run a parallel track in quantitative crypto market analysis, studying volatility regimes and market microstructure.
I also maintain kshitij.info, a personal portfolio and engineering blog where I regularly publish technical articles on topics ranging from Kalman filters to crypto microstructure and data pipeline design.
Seeking roles in AI/ML engineering, data analysis, QA testing, or prompt engineering where structured evaluation thinking matters.
Work history
- Evaluated 500+ AI-generated voice and video outputs per month, maintaining production quality standards across live workflows.
- Reduced model hallucination rate by ~30% by designing structured feedback reports delivered to engineering and research teams.
- Built and optimized LLM prompt test suites, improving output consistency and correctness across 10+ retrieval workflows.
- Developed scoring rubrics and benchmarking criteria to compare prompt variants and track output degradation over time.
- Supported deployment readiness of 5+ AI features by validating performance across real-world multi-scenario test cases.
- Streamlined evaluation pipelines alongside cross-functional teams, cutting review turnaround time by 20%.
- Analyzed customer behavior and partner performance data to identify service gaps, improving efficiency by 15%.
- Maintained structured weekly performance reports used by operations leadership for data-driven decision-making.
- Optimized the partner onboarding workflow using data insights, reducing average onboarding time by 10%.
Things I've built
AI Voice & Output Evaluation Framework
Designed a production-style evaluation system for LLM and voice AI outputs with structured scoring for hallucination detection, logical consistency, and response quality. Built reusable evaluation pipelines and scoring frameworks to improve output reliability and consistency across testing workflows.
Crypto Market Analysis System
Built a Streamlit-based crypto analytics system to track volatility regimes, liquidity conditions, and trend structure using time-series analysis and Python data pipelines. Added statistical indicators and visualization layers to turn raw market data into usable insights.
User Failure & Retention Dashboard
Built a data-driven dashboard to analyze user failure patterns and their impact on retention in AI-based workflows. Tracked retry cycles, drop-off points, and cohort retention across 500+ users to identify the key bottlenecks affecting user success and experience.
Engineering log
Open to new opportunities.
Looking for roles in AI/ML engineering, data analysis, QA testing, or prompt engineering. Based in Noida, India. I respond to every message.