Computational Text Analysis is the method of using computational tools to analyse and discover insights from large amounts of text. This research-based course will introduce students to methods and tools used in computational text analysis, also known as "text as data".
The course will focus on the subset of computational methods, derived from statistical modeling and computational linguistics, that are most commonly applied to analyze texts at scale.
Students will learn how to use quantitative methods to discover, measure, and infer concepts and phenomena from large amounts of text.
The course will involve hands-on analysis of real-world textual datasets from social media (Twitter and Reddit), newswire (Wall Street Journal or NYTimes), and other corpora. This class is ideal for students who are interested in learning how to aggregate large amounts of text and apply statistical methods to discover, measure, and infer phenomena from text.
Some prior programming experience is expected, though all necessary skills, including an overview of Unix and Python, will be covered at the beginning of the course.
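To give a flavor of what treating "text as data" can look like, here is a minimal sketch that counts word frequencies across a small collection of documents. The toy corpus and the helper names `tokenize` and `term_frequencies` are made up for illustration; they are not course materials, and the tokenizer is deliberately simplistic.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(documents):
    """Count how often each token appears across a list of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


# Hypothetical toy corpus standing in for real newswire or social media text.
docs = [
    "The senate passed the bill.",
    "The bill now heads to the house.",
]

freqs = term_frequencies(docs)
print(freqs.most_common(2))
```

Even on two sentences, the counts show which terms dominate the collection. Working at scale follows the same pattern, with more careful tokenization and statistical models layered on top of counts like these.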
- Course number
- BC COMS 2710 - students from all majors are welcome!
- This course likely does not count for the CS Major
- Instructor
- Adam Poliak
- Teaching Assistants
- Course Staff
- Website
- https://coms2710.barnard.edu/
- Discussion Forum
- Slack (requires signing up with a barnard.edu or columbia.edu email)
- Time and place
- Summer A: May 3rd - June 18th
- Time and Day: MTWR 10:45am-12:20pm
- Zoom link: Check courseworks or Slack for Zoom link
- Location: Milbank 207
- Office Hours
- Times
- Prerequisites
- Prior programming experience. Suitable prior programming courses include BC1016, BC3050, W1004, E1006, W1002, and STAT UN2102.
- Modes of Thinking Requirement
- Thinking Quantitatively and Empirically
- Thinking Technologically and Digitally
- Textbook & Course Readings
- Each lecture has an accompanying reading that will be posted to the schedule. Some lectures will have accompanying optional reading related to the lecture’s topic.
- Many of the accompanying readings will be from the following freely available textbooks:
- Text Analysis in Python for Social Scientists. Digital copies are available to download with UNI login.
- Jurafsky and Martin, Speech and Language Processing (3rd ed. draft) (online copy)
- Computational Text Analysis Textbook is an online textbook developed by Melanie Walsh at Cornell for her Intro to Cultural Analytics course.
- Dive Into Data Science developed by Parker Addison and Justin Eldridge at UCSD for their Principles of Data Science course.
- Grading
- This is a project-based course. The majority of your grade will be based on assignments.
- Final Project - 35%
- Homeworks - 30%
- Daily Tutorials - 20%
- Weekly Reading Reviews - 15%
- Participation - 5%
Because social science and humanities research relies heavily on textual data, numerous similar classes exist.
Here I will try to keep a running list of related courses.
- Classes that require prior statistical and programming knowledge:
- Text as Data by Chris Bail (Sociology, Public Policy and Data Science) at Duke
- Text as Data by Justin Grimmer (Political Science) at Stanford
- Text as Data by Tamar Mitts (Political Science) at Columbia SIPA
- Classes that do not require statistical and programming knowledge:
- Classes that focus on Social Media:
- Misc: