Hello, my name is

Yashaswini Madineni

I build at the intersection of AI and software engineering, creating solutions that scale and think. I'm a Master's student in Applied Data Science at San Jose State University, and I bring 2 years of experience as a Software Engineer working on ETL pipelines, distributed systems, and cloud-native applications. My projects focus on Generative AI, Agentic AI, LLMs, deep learning, scalable ML systems, and Big Data workflows, where I've combined AI and data engineering to solve real-world challenges.

I'm passionate about building solutions that bring together AI and software systems, and I'm actively seeking full-time opportunities in Software Engineering (AI/ML), AI Engineering, and Data Engineering.

When I'm not building AI agents, I'm acting like one by automating job applications at scale!

Computer Science
Bachelor of Engineering

2018–2022

Data Science Intern
Tequed Labs

Jan 2022 – May 2022

Software Engineer
Tata Consultancy Services

2022–2023

Master's in Applied Data Science
San Jose State University

2023–2025

Things I've Built

LLM-Powered Text-to-SQL with AI Agents

A GenAI pipeline that translates natural language into SQL queries using LLaMA-3 and Mistral models fine-tuned on the Spider dataset. Integrated LangChain agents for generation, clarification (Gemini), and optimization (OpenAI GPT-3.5), and deployed the system with a Flask backend and an HTML/CSS/JS frontend. Model hosted on Hugging Face with Docker & CI/CD support.

LangChain LLaMA-3 Mistral Gemini OpenAI GPT-3.5 Flask SQLite HTML CSS Docker CI/CD
GitHub ↗

Agentic AI Systems: Multi-Modal RAG, Stable Diffusion Fine-Tuning & Travel Assistant

Developed an end-to-end AI system combining multi-modal RAG, generative model fine-tuning, and an agentic AI travel assistant. The RAG system retrieved text & images from PDFs and used Gemini for grounded Q&A; Stable Diffusion was fine-tuned with LoRA for custom image generation; and a multi-agent assistant (Flight, Weather, Hotel, Itinerary) orchestrated APIs to build dynamic travel plans.

Python PyMuPDF Sentence Transformers FAISS Gemini Stable Diffusion v1.5 LoRA PEFT CLIP Inception Score Requests Agentic AI
GitHub ↗

Substance Use Prediction & Pattern Analysis using Machine Learning

Designed an end-to-end machine learning pipeline for analyzing and predicting substance use patterns. Conducted extensive data preprocessing and exploratory data analysis (EDA), then applied PCA for dimensionality reduction, KMeans clustering for unsupervised insights, and multiple classifiers (Random Forest, Logistic Regression, Decision Trees, KNN) for supervised prediction. Addressed class imbalance with ADASYN to improve minority-class recall.

Python Pandas NumPy Matplotlib Seaborn PCA KMeans Random Forest Logistic Regression KNN Imbalanced-learn (ADASYN)
GitHub ↗

Crypto Market Analytics with Big Data

Analyzed 48 GB of Binance cryptocurrency trading data (2020–2024) using AWS EC2, Databricks, Apache Spark, Pandas, and PyArrow. Built scalable pipelines for data cleaning, preprocessing, and transformation, followed by analysis of volatility, trading patterns, and liquidity. Developed interactive Tableau dashboards for visualization. The project showcases distributed systems, cloud deployment, and big data engineering for financial analytics.

AWS EC2 Databricks Apache Spark Pandas PyArrow SQL Tableau Python
GitHub ↗

MediPredict — AI Full-Stack Healthcare Prediction System

Built a full-stack distributed healthcare application for symptom-based disease prediction. Integrated a React frontend with a Flask backend, multiple ML models, and secure cloud deployment on GCP. The system supports authentication, interactive consultations, and automated health report generation.

React Flask REST APIs Scikit-learn LogReg / SVM / RF / KNN Pandas NumPy Google Cloud App Engine Cloud Storage IAM
GitHub ↗

Deep Learning for Skin Cancer Segmentation & Classification (ViT-UNet)

Implemented a hybrid Vision Transformer (ViT) + U-Net model for automated skin cancer analysis. The system performs both segmentation (pixel-level lesion masks) and classification (benign vs. malignant), trained on ISIC dermoscopic datasets. Combined BCE + Dice loss for segmentation and weighted cross-entropy for classification to handle class imbalance, achieving strong performance on ISIC 2016 & 2017 benchmarks.
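
For readers curious how a combined BCE + Dice segmentation objective fits together, here is a minimal sketch in plain Python. It is an illustration only, not the project's PyTorch code: the function names, the `bce_weight` and `smooth` parameters, and the equal 0.5/0.5 weighting are assumptions made for this example.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum of both masks + smooth)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def combined_loss(probs, targets, bce_weight=0.5):
    """Weighted sum of BCE and Dice; bce_weight is an assumed hyperparameter."""
    return (bce_weight * bce_loss(probs, targets)
            + (1.0 - bce_weight) * dice_loss(probs, targets))


if __name__ == "__main__":
    mask = [1, 0, 1, 0]                  # hypothetical ground-truth lesion mask
    good = [0.9, 0.1, 0.8, 0.2]          # prediction close to the mask
    bad = [0.2, 0.8, 0.3, 0.7]           # prediction far from the mask
    print(combined_loss(good, mask), combined_loss(bad, mask))
```

BCE scores each pixel independently, while Dice measures overlap between the predicted and true masks, which is why the two are often paired when lesions occupy only a small part of the image.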

Deep Learning PyTorch Vision Transformer (ViT) U-Net ISIC Dataset Scikit-learn NumPy Pandas Matplotlib
GitHub ↗

Get In Touch

If you would like to work together or discuss an opportunity, feel free to reach out or connect with me on LinkedIn.