5 years as a business/data analyst using SQL, Tableau, and SAS
2 years as a data scientist using SQL, Tableau, Python, Git, and Jupyter Notebook
I love using data to answer business questions and deliver effective, impactful solutions. Highly analytical and detail-oriented. Skilled in data analysis and visualization, and proficient in machine learning, both supervised and unsupervised. Excellent written and oral communication skills built from years of presenting analysis results to senior management.
My LinkedIn
Forecast Monthly Vehicle Sales (Click me –>)
Toolkit: Time series forecasting, Python, pandas, numpy, matplotlib, seaborn, scipy, statsmodels, sklearn
Using time series forecasting, I analyzed monthly vehicle sales in the United States between January 1976 and November 2019, identified the overall trend over time, and forecast vehicle sales for December 2019 and May 2020.

Predict Sparkify User Churn (Click me –>)
Toolkit: Supervised learning (classification), Python, pyspark, pandas, numpy, matplotlib, seaborn, AWS EMR
Preventing churn is key to improving revenue for Sparkify, a fictitious subscription-based music streaming company. Using PySpark on AWS, I analyzed 12 GB of Sparkify data and built a machine learning model that predicts user churn with an F score of 0.56.

Recommendation Engine with IBM (Click me –>)
Toolkit: Machine learning (collaborative filtering), Python, pandas, numpy, matplotlib
Recommending the articles most pertinent to each user benefits both service providers and users. Using three mainstream approaches, (1) rank-based, (2) user-user collaborative filtering, and (3) matrix factorization, I developed an article recommendation engine for the IBM Watson Studio platform.
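To illustrate the first of these approaches, here is a minimal sketch of a rank-based recommender, which simply suggests the articles with the most interactions overall. The `top_articles` function, the `(user_id, article_id)` pair format, and the sample data are all hypothetical and are not taken from the project itself.

```python
from collections import Counter


def top_articles(interactions, n=3):
    """Return the ids of the n articles with the most interactions.

    `interactions` is an assumed format: a list of (user_id, article_id)
    pairs, one pair per user-article interaction.
    """
    # Count how many interactions each article received.
    counts = Counter(article_id for _, article_id in interactions)
    # most_common sorts by count, highest first.
    return [article_id for article_id, _ in counts.most_common(n)]


# Made-up sample data for illustration only.
interactions = [
    ("u1", "a1"), ("u2", "a1"), ("u3", "a1"),
    ("u1", "a2"), ("u2", "a2"),
    ("u3", "a3"),
]
print(top_articles(interactions, n=2))
```

A rank-based list like this is the same for every user, which makes it a reasonable fallback for new users who have no interaction history yet.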

Classify Disaster Response (Click me –>)
Toolkit: Supervised learning (classification), Python, scikit-learn, pandas, numpy, plotly, NLP, SQL, SQLAlchemy, HTML, Flask
During disaster events, sending messages to the appropriate disaster relief agencies in a timely manner is critical. Using NLP and a machine learning pipeline, I built a model that classifies disaster messages into 36 categories with an F score of 0.65. I also developed a web app.

Airbnb List Price Analysis (Click me –>)
Toolkit: Supervised learning (regression), Python, scikit-learn, pandas, numpy, matplotlib, seaborn
I analyzed Airbnb list prices in Boston and Seattle to answer three questions:
Q1 - Is there any trend in the Airbnb list price? How do prices compare between Boston and Seattle?
Q2 - Should you become a Superhost?
Q3 - What factors affect the Airbnb list price?

Identify Customer Segmentation (Click me –>)
Toolkit: Unsupervised learning (PCA, clustering), Python, scikit-learn, pandas, numpy, matplotlib, seaborn
The goal of this project is to help a mail-order sales company in Germany identify the segments of the population that form its core customer base. I applied unsupervised learning techniques (PCA, clustering) to demographic and spending data on 0.89 million German households. I identified two segments that can be targeted with direct marketing campaigns expected to bring the highest returns.

Flower Image Classification (Click me –>)
Toolkit: Deep learning, Python, PyTorch, numpy, matplotlib, seaborn, GPU
Using a pre-trained deep learning model (densenet121), I built an image classification application and used the trained model to recognize 112 species of flowers with 89% accuracy.

Finding Donors for CharityML (Click me –>)
Toolkit: Supervised learning (classification), Python, scikit-learn, pandas, numpy
Using supervised learning algorithms, I built a machine learning model for a fictitious charity organization (CharityML) that predicts potential charity donors (individuals who make more than $50,000 annually) with 86% accuracy and an F-score of 0.73.
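Because accuracy and F-score are reported side by side here, a short sketch of how an F-score is computed from prediction counts may help. The `f_beta` function and the counts below are illustrative assumptions. The value of beta used in these projects is not stated, so beta is left as a parameter.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """Compute the F-beta score from confusion-matrix counts.

    tp, fp, fn are true positives, false positives, and false negatives.
    beta < 1 weights precision more heavily; beta > 1 weights recall more.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up counts for illustration: precision is 0.8 and recall is 0.5.
print(round(f_beta(tp=8, fp=2, fn=8), 3))            # F1
print(round(f_beta(tp=8, fp=2, fn=8, beta=0.5), 3))  # F0.5
```

Unlike accuracy, the F-score ignores true negatives, so it can be much lower than accuracy when the positive class (here, donors) is a minority.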
