Fall 2022
STAT 430
Fundamentals of Deep Learning (Python)
Lyrics Generator
(Report) (Presentation) (GitHub)
-
• Built an LSTM neural network in PyTorch to generate
song lyrics, using a Kaggle dataset containing 6
million songs
• Trained and tested the model for each genre with
different hyperparameters (sequence length, batch
size) and found that a longer sequence length does
not necessarily guarantee better model performance
• Generated sample lyrics and observed that Rap lyrics
tended to be relatively short and negative, whereas
Country and R&B lyrics tended to be long and romantic
STAT 432
Basics of Statistical Learning (R)
Beijing Housing Prices
(Report) (Presentation)
-
• Conducted supervised and unsupervised analyses
of a Beijing housing price dataset from Kaggle to
understand price trends in Beijing, China
• Applied regression algorithms such as LASSO and
XGBoost to predict 'price' and identified number of
rooms, age and condition, and distance from the
epicenter as the most significant predictors
• Clustered the data using K-Means, labeled the
observations as 'over'- or 'under'-priced within
their clusters, and found that 'overpriced'
properties tended to be relatively old and close to
the epicenter
STAT 448
Advanced Data Analysis (SAS)
LING 402
Tools & Technologies for Speech & Language Processing
(Python, Linux (Bash))