Text Analysis Course

This page hosts material for a mini-course in text analysis and machine learning, with a focus on legal and political applications.

Syllabus
Code Base (zip file, scripts numbered according to accompanying slides)
Course Assignment

Required Reading

Natural Language Processing with Python

Hands-on Machine Learning with Scikit-Learn and TensorFlow

Mostly Harmless Econometrics

Recommended Reading

Speech and Language Processing, Third Edition

A primer on neural network models for natural language processing

Python Installation

Windows Instructions

MacOS Instructions

Linux Instructions (Ubuntu VM Installation Tutorial)

conda config --add channels conda-forge
conda install nltk pandas scikit-learn gensim spacy tensorflow-gpu wordcloud seaborn unidecode
python -m spacy download en_core_web_sm
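
Once the installation commands have finished, one quick way to confirm that the environment is usable is to check that each package can be found by Python. The script below is a minimal sketch and is not part of the course code base: the function name check_packages is illustrative, and the import names listed (for example sklearn for the scikit-learn package and tensorflow for tensorflow-gpu) are assumed to be the usual ones for these packages.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
COURSE_PACKAGES = [
    "nltk", "pandas", "sklearn", "gensim", "spacy",
    "tensorflow", "wordcloud", "seaborn", "unidecode",
]


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(COURSE_PACKAGES).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed again with conda before running the course scripts. The check only locates the modules without importing them, so it will not catch a package that is present but broken.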

Acknowledgements

Thanks to Chris Bail, Brandon Stewart, Piero Molino, and Michael McMahon for useful slide decks, on which some of these lectures are based.

Part 0 — Introduction

Slides – 0.1 – Introduction

Part 1 — From Documents to Features

Slides – 1.1 – Introducing Corpora

Slides – 1.2 – Features

Part 2 — Describing the Feature Matrix

Slides – 2.1 – Topic Models

Slides – 2.2 – Embeddings Models

Slides – 2.3 – Similarity and Clustering

Part 3 — Supervised Learning with Text Data

Slides – 3.1 – Regression

Slides – 3.2 – Classification

Part 4 — Neural Nets and Deep Learning

(coming soon)

Slides – 4.1 – Deep Learning with Keras

Slides – 4.2 – Autoencoders

Slides – 4.3 – Convolutional Neural Nets

Slides – 4.4 – Recurrent Neural Nets

Part 5 — Research Design with Text Data

Slides – 5.1 – Research Design

Slides – 5.2 – Course Recap