Computational Text Analysis is the use of computational tools to analyze and discover insights from large amounts of text. This research-based course will introduce students to the methods and tools used in computational text analysis, also known as text as data. The course will focus on a subset of computational methods, derived from statistical modeling and computational linguistics, that are most commonly applied to analyze texts at scale.

Students will learn how to use quantitative methods to discover, measure, and infer concepts and phenomena from large amounts of text. The course will involve hands-on analysis of real-world textual datasets from social media (Twitter and Reddit), newswire (Wall Street Journal or NYTimes), and other corpora. This class is ideal for students who are interested in learning how to aggregate large amounts of text and analyze them with statistical methods. Some prior programming experience is expected, though all necessary skills, including an overview of Unix and Python, will be covered at the beginning of the course.

Course number
BC COMS 2710 - students from all majors are welcome!
This course likely does not count toward the CS major
Instructor
Adam Poliak
Teaching Assistants
Course Staff
Website
https://coms2710.barnard.edu/
Discussion Forum
Slack (requires signing up with a barnard.edu or columbia.edu email)
Time and place
Summer A May 3rd - June 18th
Time and Day: MTWR 10:45am-12:20pm
Zoom link: check Courseworks or Slack
Location: Milbank 207
Office Hours
Times
Prerequisites
Prior programming experience. Suitable prior programming courses include BC1016, BC3050, W1004, E1006, W1002, and STAT UN2102.
Modes of Thinking Requirement
Thinking Quantitatively and Empirically
Thinking Technologically and Digitally
Textbook & Course Readings
Each lecture has an accompanying reading that will be posted to the schedule. Some lectures will have accompanying optional reading related to the lecture’s topic.
Many of the accompanying readings will be from the following freely available textbooks:
  1. Text Analysis in Python for Social Scientists. Digital copies are available to download with UNI login.
  2. Jurafsky and Martin, Speech and Language Processing (3rd ed. draft) (online copy)
  3. Computational Text Analysis Textbook, an online textbook developed by Melanie Walsh at Cornell for her Intro to Cultural Analytics course.
  4. Dive Into Data Science, developed by Parker Addison and Justin Eldridge at UCSD for their Principles of Data Science course.
Grading
This is a project-based course. The majority of your grade will be based on assignments.
  • Final Project - 35%
  • Homeworks - 30%
  • Daily Tutorials - 20%
  • Weekly Reading Reviews - 15%
  • Participation - 5%

Because social science and humanities research relies heavily on textual data, many classes similar to this one exist. I will try to keep a running list of related courses here.