Hi, I'm Advaith.
Data Scientist & ML Engineer with 3+ years of experience building AI-driven solutions — from predictive models and RAG pipelines to scalable data platforms. Currently pursuing my M.S. in Computer Science at Arizona State University with a perfect 4.0 GPA.
I'm a Master's student in Computer Science at Arizona State University, graduating in May 2026. Before grad school, I spent 3+ years at Wabtec Corporation as a Data Scientist, where I built ML models for locomotive pricing and automated OCR pipelines, and helped design operational chatbots.
Currently, I'm working as a Software Developer at the ASU Office of University Affairs, where I engineer RAG-based AI assistants, build NLP-driven pipelines, create Tableau visualizations, and ensure WCAG 2.x and ADA compliance. I'm passionate about the intersection of AI, data engineering, and applied machine learning.
I thrive in fast-paced environments where I can combine analytical thinking with full-stack engineering to deliver solutions that make a real difference.
Professional Journey
From engineering intern to data scientist — building impactful solutions across industries.
- Engineered a RAG-based AI assistant using large language models, vector search, and Python to automate activity creation workflows, cutting manual content entry time by 60%.
- Contributed bug fixes and UI enhancements to the Collaboratory web platform using JavaScript, React, Golang, Docker, and PostgreSQL, improving usability and overall application stability.
- Built an LLM-powered onboarding chatbot with LangChain, GPT-4, ChromaDB, and Hugging Face Sentence Transformers, implementing a RAG pipeline over internal PDF and Markdown docs that answers process questions and auto-generates personalized onboarding schedules, reducing agenda creation time by 30%.
- Developed 9 Tableau visualizations for St. Mary's Food Bank analyzing donation patterns, volume trends, and regional distribution, surfacing insights that informed donor outreach strategy.
- Built a resume screening pipeline in Python using PyPDF2 and regex-based keyword extraction to parse candidate PDFs, score applicants against job description criteria, and auto-populate Excel trackers via openpyxl — reducing manual screening time by 75% for student worker hiring.
- Achieved WCAG 2.x and ADA compliance across the Collaboratory platform by implementing ARIA labeling, keyboard navigation, and semantic HTML, making the platform accessible to users with diverse needs.
- Designed and deployed an ML pricing model using Random Forest to forecast locomotive part costs, achieving 87% prediction accuracy and enabling data-driven procurement decisions across the supply chain.
- Created a production NLP classification pipeline using TF-IDF and SVM to automatically extract root causes from unstructured engineering logs, improving diagnostic accuracy by 32% and accelerating troubleshooting workflows.
- Implemented an ETL pipeline using fuzzy matching and cosine similarity to deduplicate 120K customer records in a large-scale master data management (MDM) dataset, reducing data redundancy by 40% and cutting downstream processing time by 25%.
- Designed an OCR document processing service using Amazon Textract, S3, and EC2 to extract and structure data from unstructured documents at scale, reducing manual data entry time by 30%.
- Owned Jenkins and Chef CI/CD workflows across multiple services and conducted systematic code reviews, maintaining production stability and ensuring zero-downtime releases.
Things I've Built
A selection of projects combining AI, data engineering, and software development.
Developed an end-to-end AI assistant for medical Q&A on diabetes and cardiovascular disease using a retrieval-augmented generation architecture with vector search, achieving an 85% relevance score with sub-8-second response times.
Built a production-grade intrusion detection system on the NSL-KDD dataset (125K+ records), implementing automated retraining pipelines, model lifecycle management, real-time inference APIs, and drift detection.
Forecast dengue case counts with ARIMA-based time series models built on multi-region historical epidemiological data, supporting public health resource planning.
Built an end-to-end ELT pipeline that ingests and transforms Airbnb listings data with built-in data quality tests, and surfaced the resulting analytics in a Streamlit dashboard.
Technical Arsenal
Technologies and tools I use to build AI-powered, data-driven applications.
Academic Background
Let's Connect
I'm always open to discussing new opportunities, interesting projects, or just saying hi.