Text Analysis Course

This page hosts material for a mini-course in text analysis and machine learning, with a focus on legal and political applications.

Syllabus
Code Base (zip file, scripts numbered according to accompanying slides)
Course Assignment

Required Reading

Natural Language Processing with Python

Hands-on Machine Learning with Scikit-Learn and TensorFlow

Mostly Harmless Econometrics

Recommended Reading

A Primer on Neural Network Models for Natural Language Processing

Speech and Language Processing, Third Edition

Python Installation

Python Installation Instructions

Python Configuration Instructions

Acknowledgements

Thanks to Chris Bail, Brandon Stewart, Piero Molino, and Michael McMahon for their useful slide decks, on which some of these lectures are based.

Part 0 — Introduction

Slides – 0.1 – Introduction

Part 1 — From Documents to Features

Slides – 1.1 – Introducing Corpora

Slides – 1.2 – Features

Part 2 — Describing the Feature Matrix

Slides – 2.1 – Topic Models

Slides – 2.2 – Embeddings Models

Slides – 2.3 – Similarity and Clustering

Part 3 — Supervised Learning with Text Data

Slides – 3.1 – Regression

Slides – 3.2 – Classification

Part 4 — Neural Nets and Deep Learning

(coming soon)

Slides – 4.1 – Deep Learning with Keras

Slides – 4.2 – Autoencoders

Slides – 4.3 – Convolutional Neural Nets

Slides – 4.4 – Recurrent Neural Nets

Part 5 — Research Design with Text Data

Slides – 5.1 – Research Design

Slides – 5.2 – Course Recap