twarc

twarc is a command line tool and Python library for collecting and archiving Twitter JSON data via the Twitter API. It provides two commands: twarc for the older v1.1 API, and twarc2 for the newer v2 API and Academic Access. It also has an ecosystem of plugins for working with the collected data.

See the twarc documentation for running commands: twarc2 for the v2 API and twarc1 for the v1.1 API. If you aren't sure which one to use, start with twarc2, since the v1.1 API is scheduled to be retired.

Install

If you have Python installed, you can install twarc using:

pip3 install twarc

Once installed, you should be able to use the twarc and twarc2 command line utilities, or use twarc as a Python library (check the examples here for that).
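As a rough illustration of the library route, here is a minimal sketch that pages through recent search results with the v2 client. It assumes the `Twarc2` class and its `search_recent` generator from twarc; `collect_pages`, `main`, the query string, and the `YOUR_BEARER_TOKEN` placeholder are hypothetical names used only for this example.

```python
from itertools import islice


def collect_pages(client, query, max_pages=2):
    """Return at most max_pages result pages for query.

    `client` is assumed to expose search_recent(query) as a generator of
    response pages, as twarc's Twarc2 client does.
    """
    return list(islice(client.search_recent(query), max_pages))


def main():
    # Imported here so the helper above can be tried without twarc installed.
    from twarc import Twarc2

    # Placeholder credential: replace with your own bearer token.
    client = Twarc2(bearer_token="YOUR_BEARER_TOKEN")
    for page in collect_pages(client, "twarc"):
        # Each page is a JSON response; the tweets sit under its "data" key.
        for tweet in page.get("data", []):
            print(tweet["id"], tweet["text"])
```

Calling `main()` needs network access and valid API credentials. Capping the number of pages with `islice` keeps a first experiment from consuming much of your API quota.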

Other Tools

twarc is purpose-built for working with the Twitter API to archive and study digital trace data. It is not built as a general-purpose API library for Twitter. While its primary use is academic, it works just as well with the "Standard" v2 API and "Premium" v1.1 APIs.

For a list of general-purpose Twitter libraries in different languages, see the Twitter documentation. For Python, TwitterAPI and tweepy are both up to date and maintained. They also support the v2 API, but their data format with expansions may differ from twarc's. There is also a reference implementation of the v2 Academic Access Search and v1.1 Premium Search from Twitter here. The v2 version of this script is compatible with twarc.

For R, there is academictwitteR. Unlike twarc, it focuses solely on querying the Twitter Academic Research Product Track v2 API endpoint. Data gathered with twarc can be imported into R for analysis as a dataframe if you export the data to CSV using twarc-csv.
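To give a feel for what a JSON-to-CSV export involves, the sketch below flattens collected tweets into a small CSV file using only the Python standard library. It is a simplified stand-in, not how twarc-csv is implemented. It assumes the input file holds one JSON response per line with tweets under a `data` key; the function name `pages_to_csv` and the default column list are hypothetical.

```python
import csv
import json


def pages_to_csv(jsonl_path, csv_path, fields=("id", "created_at", "text")):
    """Write one CSV row per tweet found in a file of JSON response pages.

    Assumes one JSON object per line, with tweets under the "data" key
    (either a list of tweets or a single tweet object).
    Returns the number of rows written.
    """
    count = 0
    with open(jsonl_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            page = json.loads(line)
            data = page.get("data", [])
            tweets = data if isinstance(data, list) else [data]
            for tweet in tweets:
                # Missing fields become empty cells rather than errors.
                writer.writerow({f: tweet.get(f, "") for f in fields})
                count += 1
    return count
```

For real analysis, twarc-csv is the better choice, since it handles the full tweet structure. The point here is only that the resulting CSV is an ordinary table that R or any other tool can load as a dataframe.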

Getting Help

Check the tutorials to get started, or follow along with this recorded stream introducing twarc. If you run into trouble, feel free to post on the twarc repository or on the Twitter Developer Forums.