Open to Opportunities

Hi, I'm Advaith.

Data Scientist & ML Engineer with 3+ years of experience building AI-driven solutions — from predictive models and RAG pipelines to scalable data platforms. Currently pursuing my M.S. in Computer Science at Arizona State University with a perfect 4.0 GPA.

3+ Years Experience
4.0 GPA at ASU

I'm a Master's student in Computer Science at Arizona State University, graduating in May 2026. Before grad school, I spent 3+ years at Wabtec Corporation as a Data Scientist, where I built ML models for locomotive pricing, automated OCR pipelines, and helped design operational chatbots.

Currently, I'm working as a Software Developer at the ASU Office of University Affairs, where I engineer RAG-based AI assistants, build NLP-driven pipelines, create Tableau visualizations, and ensure WCAG 2.x and ADA compliance. I'm passionate about the intersection of AI, data engineering, and applied machine learning.

I thrive in fast-paced environments where I can combine analytical thinking with full-stack engineering to deliver solutions that make a real difference.

🤖 AI & Machine Learning: RAG, LLMs, NLP, Deep Learning
📊 Data Science: Predictive Modeling, Analytics
🔧 Data Engineering: ETL/ELT, Pipelines, Cloud
💻 Full-Stack Dev: React, Go, Flask, Docker

Professional Journey

From engineering intern to data scientist — building impactful solutions across industries.

Software Developer
ASU Office of University Affairs
Mar 2025 – Present
  • Engineered a RAG-based AI assistant using large language models, vector search, and Python to automate activity creation workflows, cutting manual content entry time by 60%.
  • Contributed to the Collaboratory web platform by implementing bug fixes and UI enhancements using JavaScript, React, Golang, Docker, and PostgreSQL, improving system usability and overall application stability.
  • Built an LLM-powered onboarding chatbot using LangChain, GPT-4, ChromaDB, and Hugging Face Sentence Transformers. It implements a RAG pipeline over internal PDF/Markdown docs that answers process queries and auto-generates personalized onboarding schedules, reducing agenda creation time by 30%.
  • Developed 9 Tableau visualizations for St. Mary's Food Bank analyzing donation patterns, volume trends, and regional distribution, surfacing insights that informed donor outreach strategy.
  • Built a resume screening pipeline in Python using PyPDF2 and regex-based keyword extraction to parse candidate PDFs, score applicants against job description criteria, and auto-populate Excel trackers via openpyxl — reducing manual screening time by 75% for student worker hiring.
  • Achieved WCAG 2.x and ADA compliance across the Collaboratory platform by implementing ARIA labels, keyboard navigation, and semantic HTML, making the platform accessible to users with diverse needs.
Python JavaScript/React LLMs RAG PostgreSQL Tableau Docker WCAG/ADA
Data Scientist
Wabtec Corporation
Jul 2021 – Aug 2024
  • Designed and deployed an ML pricing model using Random Forest to forecast locomotive part costs, achieving 87% prediction accuracy and enabling data-driven procurement decisions across the supply chain.
  • Created a production NLP classification pipeline using TF-IDF and SVM to automatically classify root causes in unstructured engineering logs, improving diagnostic accuracy by 32% and accelerating troubleshooting workflows.
  • Implemented an ETL pipeline using fuzzy matching and cosine similarity to deduplicate 120K customer records in a large-scale MDM dataset, reducing data redundancy by 40% and cutting downstream processing time by 25%.
  • Designed an OCR document processing service using AWS Textract, S3, and EC2 to extract and structure data from unstructured documents at scale, reducing manual data entry time by 30%.
  • Owned Jenkins and Chef CI/CD workflows across multiple services and conducted systematic code reviews, maintaining production stability and ensuring zero-downtime releases.
Python Random Forest TF-IDF SVM AWS Textract S3 EC2 CI/CD

Things I've Built

A selection of projects combining AI, data engineering, and software development.

🏥 TrustMed AI
85% relevance · <8s response

Developed an end-to-end AI assistant for medical Q&A on diabetes and cardiovascular diseases using a retrieval-augmented generation architecture with vector search, achieving an 85% relevance score with sub-8-second response times.

RAG AWS Python Pinecone LangGraph LLaMA
🛡️ Intrusion Detection System
>0.95 ROC-AUC

Built a production-grade intrusion detection system on the NSL-KDD dataset (125K+ records), implementing automated retraining pipelines, model lifecycle management, real-time inference APIs, and drift detection.

MLflow Airflow FastAPI Docker Evidently
🦟 Dengue Outbreak Forecasting

Applied ARIMA-based time series modeling to multi-region historical epidemiological data to forecast dengue case counts and support health resource planning.

Python ARIMA Time Series
🏠 Airbnb Listings ELT Pipeline

Built an end-to-end ELT pipeline that ingests and transforms Airbnb listings data, runs data quality tests, and feeds analytics into a Streamlit dashboard.

Snowflake dbt SQL Streamlit

Technical Arsenal

Technologies and tools I use to build AI-powered, data-driven applications.

Languages
Python SQL R JavaScript/TypeScript Go Shell Scripting
Machine Learning
Scikit-learn TensorFlow PyTorch Random Forest SVM TF-IDF ARIMA Apache Spark
Generative AI & NLP
RAG LLaMA LangGraph Prompt Engineering Vector Search Named Entity Recognition
Databases & Data Engineering
PostgreSQL Snowflake Pinecone dbt ETL/ELT Pipelines Fuzzy Matching MDM
Cloud & Infrastructure
AWS (S3, EC2, Textract) Azure Docker Jenkins Git CI/CD
Data Visualization & Analytics
Tableau Matplotlib Microsoft Excel Statistical Modeling A/B Testing
Frameworks & APIs
Flask FastAPI React REST APIs

Academic Background

Master of Science, Computer Science
Arizona State University
📍 Tempe, AZ, USA 📅 Aug 2024 – May 2026
CGPA: 4.00 / 4.00
Coursework: Statistical Machine Learning, Data Mining, Data Visualization, Data Processing, Data Structures, Software Testing
B.Tech, Computer Science
Manipal Academy of Higher Education
📍 Manipal, India 📅 Jul 2017 – Jul 2021
CGPA: 8.08 / 10.00
Coursework: Machine Learning, Pattern Recognition, Artificial Intelligence, Design of Algorithms, Distributed Database Systems

Let's Connect

I'm always open to discussing new opportunities, interesting projects, or just saying hi.