Spring 2025

LING 413 - Corpus Linguistics

Introduction to computational methods for analyzing large natural language corpora. Students will learn the computational skills necessary to build, validate, and analyze corpora with the goal of exploring linguistic phenomena and testing linguistic theories. Corpus linguistics as a field undertakes natural experiments to learn about language using the unelicited production of speakers. This course focuses specifically on computational corpus linguistics, which uses methods from natural language processing to expand the scale of corpus-based experiments.

Textbook:

Programming Language: Python

PHIL 460 - Philosophy of Statistics

Introduction to philosophical issues related to statistics. Topics include the interpretation of probability, the difference between description and inference, the notion of evidential support, the relationship between statistics and inductive logic, the use and abuse of mathematical models, the nature of randomness and chance, the role of values in statistical modeling and decision, and the ethical practice of statistics.

Textbook:

Fall 2024

STAT 430 - Practice of Applied Statistics

It can be difficult to recognize how to properly apply statistical methods to answer complex research questions found "in the wild". This process can be very different from answering the relatively well-defined and straightforward questions encountered when first learning statistical methods. This course teaches students how to translate complex research questions into precise statistical ones, and how to choose, learn, and implement appropriate statistical procedures for answering those questions. Topics include the core framework of statistics, surveys of statistical methods, guided data analysis, and effective written and oral communication. The motivating idea is that the methods of statistics are constantly changing, and the ones that will dominate the landscape a decade from now likely haven't been invented yet. Therefore, this course teaches the principles of data analysis rather than the mathematics of specific methods, which can make it easier to adapt to a fast-moving field. This course can be thought of as a capstone to a typical undergraduate statistics curriculum, but may also be beneficial for master's students.

Textbook:

Programming Language: R

STAT 571 - Multivariate Analysis

Inference in multivariate statistical populations emphasizing the multivariate normal distribution; derivation of tests, estimates, and sampling distributions; and examples from the natural and social sciences.

Textbook: Multivariate Statistics Old School, 2015, John I. Marden

Programming Language: R

Spring 2024

STAT 431 - Applied Bayesian Analysis

Introduction to the concepts and methodology of Bayesian statistics, for students with fundamental knowledge of mathematical statistics. Topics include Bayes' rule, prior and posterior distributions, conjugacy, Bayesian point estimates and intervals, Bayesian hypothesis testing, noninformative priors, practical Markov chain Monte Carlo, hierarchical models and model graphs, and more advanced topics as time permits. Implementations in R and specialized simulation software.

Textbook: Bayesian Statistical Methods, by Brian J. Reich and Sujit K. Ghosh / Bayesian Data Analysis, 3rd Edition, by Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin

Programming Language: R

STAT 510 - Mathematical Statistics

Provides a graduate-level foundation in fundamental mathematical statistics topics including order statistics, exponential families, sufficiency, the Rao-Blackwell theorem, the Cramér-Rao lower bound, point estimation, hypothesis testing and interval estimation, likelihood and Bayesian methods, and large-sample asymptotics.

Textbook: Statistical Inference, 2nd Edition, by George Casella and Roger L. Berger / Mathematical Statistics Old School, by John I. Marden

STAT 530 - Bioinformatics

Introduction to statistical methods used in the analysis of genomic data. Methods are organized around data types commonly found in biological experiments, such as genotype data, gene expression levels, histone modifications, and microbiome data. Emphasis on statistical understanding. Practical implementation will be illustrated in R.

Relevant Webpages: (Spatial Transcriptomic) (Seurat)

Programming Language: R

Fall 2023

STAT 426 - Statistical Modeling II

A continuation of the study of advanced statistical modeling techniques, with a focus on categorical data. The course explores logistic regression, generalized linear models, goodness-of-fit, link functions, count regression, log-linear models, probability models for contingency tables, and ordinal response models. Statistical computing is an integral part of the course.

Textbook: Categorical Data Analysis, 3rd Edition, by Alan Agresti

Programming Language: R

STAT 430 - Time Series ML

This course helps students solve machine learning problems involving time series data. Students with little or no previous knowledge of time series analysis or machine learning will learn the main learning algorithms for time series, and will be able to use R and Python packages to design, test, and implement ML algorithms for time series data, with a main (but not exclusive) focus on finance.

Textbook: Time Series Analysis and Its Applications: With R Examples, 4th Edition, by Robert H. Shumway and David S. Stoffer / Machine Learning for Time-Series with Python: Forecast, Predict, and Detect Anomalies with State-of-the-Art Machine Learning Methods, by Ben Auffarth

Programming Language: R, Python

STAT 542 - Statistical Learning

Modern techniques of predictive modeling, classification, and clustering are discussed. Examples of these are linear regression, nonparametric regression, kernel methods, regularization, cluster analysis, classification trees, neural networks, boosting, discrimination, support vector machines, and model selection. Applications are discussed as well as computation and theory.

Textbook: The Elements of Statistical Learning, by Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie

Programming Language: R, Python