AI E-Mail Assistant
An AI-powered email assistant was designed to automate drafting, summarizing, and responding to emails using natural language processing techniques. The system integrates transformer-based models to streamline communication workflows and enhance productivity.
Cancer Diagnosis: Malignant vs. Benign Tumors
A machine learning pipeline was developed to classify tumors as malignant or benign using the Breast Cancer Wisconsin dataset. The models achieved over 98% accuracy, demonstrating the effectiveness of predictive algorithms in supporting oncology diagnostics.
Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines
A machine learning model was developed to predict H1N1 and seasonal flu vaccination uptake using survey data from the CDC. The solution achieved high predictive accuracy by analyzing behavioral, demographic, and health-related features to identify key drivers of vaccine hesitancy.
Heart Disease Prediction Using Neural Networks and Deep Learning
A deep learning model was built to predict the presence of heart disease using clinical data from the UCI Heart Disease dataset. The neural network achieved strong predictive performance, demonstrating the potential of advanced algorithms in supporting cardiovascular risk assessment.
ADIA Lab Market Prediction
A machine learning model was developed to predict stock market movements using high-frequency trading data. The approach leveraged feature engineering and a Siamese Neural Network to capture complex patterns in market behavior for improved forecasting accuracy.
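A Siamese network can be read as a single encoder with shared weights applied to two inputs, whose embeddings are then compared. The sketch below illustrates only that idea. The layer size, the tanh activation, the untrained random weights, the feature vectors, and the names make_encoder and siamese_distance are all assumptions for illustration, not the project's actual architecture.

```python
import math
import random


def make_encoder(n_inputs, n_outputs, seed=0):
    """Return an encoder with fixed random weights (untrained, illustration only)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_outputs)]

    def encode(features):
        # One dense layer followed by tanh.
        return [math.tanh(sum(w * x for w, x in zip(row, features)))
                for row in weights]

    return encode


def siamese_distance(encode, left, right):
    """Embed both inputs with the SAME encoder and return their Euclidean distance."""
    emb_left, emb_right = encode(left), encode(right)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_left, emb_right)))


# Hypothetical feature vectors for three market windows.
encoder = make_encoder(n_inputs=3, n_outputs=4)
window_a = [0.10, -0.20, 0.05]
window_b = [0.12, -0.18, 0.04]
window_c = [-0.90, 0.80, 0.70]

print(siamese_distance(encoder, window_a, window_b))
print(siamese_distance(encoder, window_a, window_c))
```

Because both inputs pass through the same weights, the distance is symmetric and is zero for identical inputs. In a trained model the weights would be learned so that the distance reflects a meaningful notion of similarity.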
Swiss Re: Predict Accident Risk Score for Unique Postcode
A predictive model was developed to estimate accident risk scores for unique postcodes using demographic and environmental data. The approach enables identification of high-risk areas, supporting targeted interventions and resource allocation.
Google Cloud Data Engineering Summit: Data Engineering Championship
A scalable data processing pipeline was built for the Data Engineering Championship, focusing on transforming airline and weather datasets into features for predictive modeling. The solution applied advanced ETL techniques, feature engineering, and temporal data alignment, achieving a top-10 leaderboard finish (7th out of 86) with a mean absolute error of 1.539.
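Temporal alignment of this kind is often done as an "as-of" join: each flight is paired with the most recent weather observation at or before its departure time. The sketch below shows that idea together with the mean absolute error metric. The function names, the minute-based timestamps, and all values are hypothetical and are not taken from the competition data or the original pipeline.

```python
from bisect import bisect_right


def latest_observation(obs_times, obs_values, query_time):
    """Return the most recent observation at or before query_time, else None.

    obs_times must be sorted in ascending order.
    """
    idx = bisect_right(obs_times, query_time)
    return obs_values[idx - 1] if idx > 0 else None


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: times are minutes since midnight.
weather_times = [0, 60, 120, 180]
wind_speed = [5.0, 7.5, 6.0, 9.0]
departures = [45, 120, 200]

aligned = [latest_observation(weather_times, wind_speed, t) for t in departures]
print(aligned)  # [5.0, 6.0, 9.0]

print(mean_absolute_error([10, 12, 15], [11, 12, 13]))  # 1.0
```

Looking only backward in time keeps future weather readings out of each flight's features, which matters when the features feed a predictive model.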
Deloitte Presents Machine Learning Challenge: Predict Loan Defaulters
A deep neural network was developed to predict loan defaults using a dataset of over 67,000 records and 35 features. The model incorporated advanced feature engineering, normalization, and a multi-layer architecture with SELU activations.
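For reference, the SELU activation scales the positive part of its input linearly and maps the negative part through a scaled exponential, and it is usually paired with standardized inputs. The sketch below shows both steps in plain Python. The choice of z-score standardization is an assumption (the summary above says only "normalization"), and the helper names and sample values are hypothetical.

```python
import math

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit for a single value."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def standardize(column):
    """Z-score a feature column: subtract the mean, divide by the standard deviation."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero for constant columns
    return [(v - mean) / std for v in column]


# Hypothetical feature column.
loan_amounts = [5000.0, 12000.0, 20000.0, 35000.0]
scaled = standardize(loan_amounts)
print([round(selu(v), 4) for v in scaled])
```

For large negative inputs the activation saturates near -SELU_SCALE * SELU_ALPHA (about -1.758), while positive inputs are simply multiplied by SELU_SCALE.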
Dare in Reality Hackathon 2021: Predict Lap Timings for Qualifying Session
An ensemble machine learning solution was developed to predict Formula E qualifying lap times using driver, track, and weather data. Combining Random Forest, Gradient Boosting, and a deep neural network, the approach achieved a top-15% leaderboard finish (46th out of 346) with an RMSLE of 0.479.
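The RMSLE metric compares predictions and targets on a log scale, so it penalizes relative rather than absolute errors. The sketch below computes it and combines three sets of model outputs by simple averaging. Equal-weight averaging is an assumption, since the summary above does not say how the models were combined, and the lap times and predictions are made up for illustration.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error (inputs must be greater than -1)."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2
               for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_ensemble(*model_predictions):
    """Equal-weight average of per-sample predictions from several models."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Hypothetical lap times in seconds and hypothetical model outputs.
actual_laps = [68.2, 71.5, 90.1]
random_forest = [67.0, 73.0, 88.0]
gradient_boosting = [69.5, 70.0, 93.0]
neural_network = [68.0, 72.5, 91.0]

blended = average_ensemble(random_forest, gradient_boosting, neural_network)
print(rmsle(actual_laps, blended))
```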
NHANES Part 1: Supplements
An exploratory analysis of the NHANES dataset was conducted to evaluate whether supplement use correlates with reduced healthcare visits. Results showed no significant decrease; in fact, individuals taking supplements reported slightly more healthcare visits annually, even after outlier trimming for robustness.
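A robustness check of this kind can be sketched as follows: trim the largest values from each group, then compare the group means again. The trimming rule (dropping the top fraction of values), the function name, and the visit counts are all assumptions made for illustration. They are not NHANES data and do not reproduce the analysis.

```python
from statistics import mean


def trim_upper(values, fraction=0.05):
    """Drop the largest `fraction` of values and return the rest, sorted."""
    keep = len(values) - int(len(values) * fraction)
    return sorted(values)[:keep]


# Made-up annual healthcare visit counts, each with one extreme value.
supplement_users = [2, 3, 3, 4, 4, 5, 5, 6, 6, 40]
non_users = [1, 2, 2, 3, 3, 4, 4, 5, 5, 30]

for label, visits in (("users", supplement_users), ("non-users", non_users)):
    trimmed = trim_upper(visits, fraction=0.1)
    print(label, round(mean(visits), 2), round(mean(trimmed), 2))
```

Comparing the raw and trimmed means side by side shows whether a group difference is driven by a handful of extreme respondents.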
Mars Datascrape
A Python-based web scraper was developed to collect Mars-related news and data from multiple websites, storing the results in a MongoDB database. A Flask application serves this data dynamically, allowing users to fetch and display the latest updates with a single click.
National Crime Rates
This analysis explored correlations between violent crime rates and three socioeconomic factors (median income, graduation rates, and poverty levels) across five major U.S. cities and national averages. Strong relationships were found among the socioeconomic factors themselves, notably graduation and poverty rates, but no consistent correlation emerged between these factors and violent crime, highlighting the complexity of urban crime dynamics.
Weather Analysis
A global weather analysis was performed by generating random geographic coordinates, mapping them to the nearest cities using CitiPy, and retrieving weather data via API. Findings showed a clear relationship between latitude and temperature, while humidity, wind speed, and cloudiness exhibited no significant latitudinal trends.
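The sampling and correlation steps can be sketched in plain Python. The example below draws random coordinates and computes a Pearson correlation coefficient. It leaves out the CitiPy city lookup and the weather API call, and it substitutes a synthetic temperature formula for real observations, so the function names, the seed, and the formula are assumptions for illustration only.

```python
import math
import random


def random_coordinates(n, seed=42):
    """Draw n (latitude, longitude) pairs uniformly over the valid ranges."""
    rng = random.Random(seed)
    return [(rng.uniform(-90.0, 90.0), rng.uniform(-180.0, 180.0))
            for _ in range(n)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


coords = random_coordinates(200)
distance_from_equator = [abs(lat) for lat, _ in coords]

# Synthetic stand-in for API temperatures: cooler away from the equator, plus noise.
noise = random.Random(7)
temperatures = [30.0 - 0.4 * d + noise.gauss(0.0, 3.0) for d in distance_from_equator]

print(round(pearson_r(distance_from_equator, temperatures), 3))
```

Sampling latitude uniformly over-represents polar regions relative to their surface area, and many random points fall over open ocean, so the set of cities actually retrieved is typically smaller than the number of coordinates drawn.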
Pharmaceutical Analysis
An analysis of a 45-day drug trial in mice evaluated the effectiveness of four treatments on tumor growth and metastasis. Capomulin demonstrated the highest efficacy, significantly reducing tumor volume, slowing metastatic spread, and improving survival rates compared to other drugs and placebo.