Spring 2023
CS 277 - Algorithms & Data Structures for Data Science
Introduction to elementary concepts in algorithms and classical data structures with a focus on their applications in Data Science. Topics include algorithm analysis (e.g., Big-O notation), elementary data structures (e.g., lists, stacks, queues, trees, and graphs), basics of discrete algorithm design principles (e.g., greedy, divide and conquer, dynamic programming), and discussion of discrete and continuous optimization.
Textbook: Introduction to Algorithms, Third Edition, by Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein
Programming Language: Python
STAT 430 - Baseball Analytics
This is a reading-, seminar-, and project-based course on the intersection of baseball, statistics, and data science. In this course you will learn how to conduct relevant data analyses with a focus on how to quantify and visualize aspects of baseball play associated with winning games. You will also learn about the statistical history of baseball with an emphasis on comparing players across eras. Founding principles, intensive data analysis, and advanced statistical methods will be discussed for both topics. The analyses that you conduct will also develop your coding ability and critical thinking skills as a statistician and data scientist. Furthermore, practical advantages, limitations, and comparisons of methods will be discussed. If you are interested in quantifying how good Mike Trout is or in ranking the careers of Barry Bonds, Willie Mays, and Babe Ruth, then this is the course for you.
Textbook: Analyzing Baseball Data with R, Second Edition, by Max Marchi and Jim Albert
Programming Language: R
STAT 440 - Statistical Data Management
The critical elements of data storage, data cleaning, and data extractions that ultimately lead to data analysis are presented. Includes basic theory and methods of databases, auditing and querying databases, as well as data management and data preparation using standard large-scale statistical software. Students will gain competency in the skills required in storing, cleaning, and managing data, all of which are required prior to data analysis.
Textbook: Data Wrangling with R, by BC Boehmke / R for Data Science, by Hadley Wickham and Garrett Grolemund
Programming Language: R
Fall 2022
STAT 430 - Fundamentals of Deep Learning
Deep learning methods are rapidly becoming ingrained in everyday life. These methods strive to reveal patterns within data. This course provides a foundation for developing and applying deep learning models through a study of their theory and application using a leading modeling framework. This course will primarily use Python. Students should understand key programming concepts such as loops, if-else statements, and functions. The course uses a blended learning environment that pairs hands-on labs with short video segments introducing each idea.
Textbook: Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville / Deep Learning with PyTorch, by Eli Stevens, Luca Antiga, Thomas Viehmann
Programming Language: Python
STAT 432 - Basics of Statistical Learning
Topics in supervised and unsupervised learning are covered, including logistic regression, support vector machines, classification trees, and nonparametric regression. Model building and feature selection are discussed for these techniques, with a focus on regularization methods, such as lasso and ridge regression, as well as methods for model selection and assessment using cross-validation. Cluster analysis and principal components analysis are introduced as examples of unsupervised learning.
Textbook: An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani / Statistical Learning and Machine Learning with R, by Ruoqing Zhu
Programming Language: R
STAT 448 - Advanced Data Analysis
Several of the most widely used techniques of data analysis are discussed with an emphasis on statistical computing. Topics include linear regression, analysis of variance, generalized linear models, and analysis of categorical data. In addition, an introduction to data mining is provided considering classification, model building, decision trees, and cluster analysis.
Textbook: A Handbook of Statistical Analyses using SAS, Third Edition, by Der and Everitt
Programming Language: SAS
LING 402 - Tools & Tech Speech & Language Process
Introduction to aspects of the tools and methods of studies in speech and natural language processing (NLP), with a focus on programming for NLP and speech applications, statistical methods for data analysis, and tools for displaying and manipulating speech data.
Textbook: Natural Language Processing with Python, by Steven Bird, Ewan Klein, and Edward Loper / The Linux Command Line: A Complete Introduction, by William E. Shotts, Jr. / How to Think Like a Computer Scientist, by Elkner et al.
Programming Language: Bash (Linux), Python
Spring 2022
CS 307 - Modeling & Learning in Data Science
Introduction to the use of classical approaches in data modeling and machine learning in the context of solving data-centric problems. A broad coverage of fundamental models is presented, including linear models, unsupervised learning, supervised learning, and deep learning. A significant emphasis is placed on the application of the models in Python and the interpretability of the results.
Textbook: Probability and Statistics for Computer Science & Applied Machine Learning, by David Forsyth
Programming Language: Python
STAT 385 - Statistical Programming Methods
Statisticians must be savvy in programming methods useful to the wide variety of analyses that they will be expected to perform. This course provides the foundation for writing and packaging statistical algorithms through the creation of functions and object-oriented programming. Fundamental programming techniques and considerations will be emphasized. Students will also create dynamic reports that encapsulate their implemented algorithms. Students must have access to a computer on which they can install software.
Textbook: Hands-On Programming with R, by Garrett Grolemund / R for Data Science, by Hadley Wickham & Garrett Grolemund / Mastering Shiny, by Hadley Wickham
Programming Language: R
STAT 443 - Professional Statistics
This project-based course emphasizes written, visual, and oral communication of statistical results and conclusions. An introduction to statistical consulting is also provided. Additional topics include introductions to statistical methodologies in industry and aspects of careers in statistics.
Textbook: A Career in Statistics: Beyond the Numbers, by Gerald J. Hahn & Necip Doganaksoy / Statistical Consulting: A Guide to Effective Communication, by Janice Derr
Programming Language: R
Fall 2021
STAT 207 - Data Science Exploration
Explores the data science pipeline from hypothesis formulation, to data collection and management, to analysis and reporting. Topics include data collection, preprocessing and checking for missing data, data summary and visualization, random sampling and probability models, estimating parameters, uncertainty quantification, hypothesis testing, multiple linear and logistic regression modeling, classification, and machine learning approaches for high-dimensional data analysis. Students will learn how to implement the methods using Python programming and Git version control.
Textbook: Python Data Science Handbook, by Jake VanderPlas
Programming Language: Python
STAT 425 - Statistical Modeling I
This is the foundation for advanced statistical modeling with a focus on multiple strategies for analyzing data. The course explores linear regression, least squares estimates, F-tests, analysis of residuals, regression diagnostics, transformations, model building, generalized and weighted least squares, PCA, A/B testing, randomization tests, ANOVA, random effects, mixed effects, and longitudinal data. Statistical computing is an integral part of the course.
Textbook: Linear Models with R & Extending the Linear Model with R, Second Edition, by Julian Faraway
Programming Language: R
STAT 430 - Unsupervised Learning
Unsupervised learning is a type of machine learning that deals with finding patterns in data without the use of labeled examples. Two major unsupervised learning techniques, clustering and dimensionality reduction, will be covered with a focus on methods, evaluation metrics, and interpretation of results. The methodologies enable discovery of and inference about hidden insights contained in high-dimensional unlabeled data. Applications on real and artificial datasets are emphasized using programming languages such as Python.
Textbook: Introduction to Data Mining, Second Edition, by P. Tan, M. Steinbach, A. Karpatne, and V. Kumar
Programming Language: Python
Spring 2021
STAT 212 - Biostatistics
Application of statistical reasoning and statistical methodology to biology. Topics include descriptive statistics, graphical methods, experimental design, probability, statistical inference and regression. In addition, techniques of statistical computing are covered.
Textbook: Course Notes for STAT212: Biostatistics, by Kelly Findley
Programming Language: R
STAT 410 - Statistics & Probability II
Continuation of STAT 400. Includes moment-generating functions, transformations of random variables, normal sampling theory, sufficiency, best estimators, maximum likelihood estimators, confidence intervals, most powerful tests, unbiased tests, and chi-square tests.
Textbook: Introduction to Mathematical Statistics, Seventh Edition, by Robert V. Hogg, Joseph W. McKean, and Allen T. Craig
STAT 420 - Methods of Applied Statistics
Systematic, calculus-based coverage of the more widely used methods of applied statistics, including simple and multiple regression, correlation, analysis of variance and covariance, multiple comparisons, goodness of fit tests, contingency tables, nonparametric procedures, and power of tests; emphasizes when and why various tests are appropriate and how they are used.
Textbook: Applied Statistics with R, by David Dalpiaz
Programming Language: R
Fall 2020
MATH 415 - Applied Linear Algebra
Introductory course emphasizing techniques of linear algebra with applications to engineering; topics include matrix operations, determinants, linear equations, vector spaces, linear transformations, eigenvalues and eigenvectors, inner products and norms, orthogonality, equilibrium, and linear dynamical systems.
Textbook: Linear Algebra and its Applications, Fourth Edition, by Gilbert Strang
STAT 107 - Data Science Discovery
Data Science Discovery is the intersection of statistics, computation, and real-world relevance. As a project-driven course, students perform hands-on analysis of real-world datasets to analyze and discover the impact of the data. Throughout each experience, students reflect on the social issues surrounding data analysis such as privacy and design.
Programming Language: Python
STAT 400 - Statistics & Probability I
Introduction to mathematical statistics that develops probability as needed; includes the calculus of probability, random variables, expectation, distribution functions, central limit theorem, point estimation, confidence intervals, and hypothesis testing. Offers a basic one-term introduction to statistics and also prepares students for STAT 410.
Textbook: Probability and Statistical Inference, Ninth Edition, by Robert V. Hogg, Elliot A. Tanis, and Dale L. Zimmerman