
twarc2

twarc2 is a command line tool and Python library for archiving Twitter JSON data. Each tweet is represented as a JSON object that was returned from the Twitter API. Since Twitter's introduction of their v2 API the JSON representation of a tweet is conditional on the types of fields and expansions that are requested. twarc2 does the work of requesting the highest fidelity representation of a tweet by requesting all the available data for tweets.

Tweets are streamed or stored as line-oriented JSON. twarc2 will handle Twitter API's rate limits for you. In addition to letting you collect tweets twarc can also help you collect users and hydrate tweet ids. It also has a collection of plugins you can use to do things with the collected JSON data (such as converting it to CSV).
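Because the output is line-oriented JSON, you can process it with a few lines of Python. The following is a minimal sketch, not part of twarc itself. It assumes each line is either an API response object holding a list of tweets under a "data" key or a single tweet object; the function names read_jsonl and count_tweets are made up for illustration.

```python
import json


def read_jsonl(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_tweets(records):
    """Count tweets across records.

    Assumption: a record is either a response page with a "data" list
    or a single tweet object. Check your own files before relying on this.
    """
    total = 0
    for record in records:
        data = record.get("data", record)
        total += len(data) if isinstance(data, list) else 1
    return total


if __name__ == "__main__":
    # In practice you would pass an open file, e.g. open("results.jsonl").
    sample = [
        '{"data": [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}]}',
        "",
        '{"id": "3", "text": "c"}',
    ]
    print(count_tweets(read_jsonl(sample)))
```

Reading one line at a time keeps memory use flat, which matters once a collection grows to millions of tweets.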

twarc2 was developed as part of the Documenting the Now project which was funded by the Mellon Foundation.

Install

Before using twarc you will need to create an application and attach it to a project on your Twitter Developer Portal. A "Project" is like a container for an "Application" with a specific purpose.

If you have Academic Access you should see an "Academic Research" Project; if not, you should see only a "Standard" Project. Academic Access uses a separate endpoint; see the separate notes on this.

Once you've created your application, note down the Bearer token and/or the consumer key and consumer secret (which may also be called the API Key and API Secret), and then optionally click to generate an access token and access token secret. With these credentials in hand you are ready to start using twarc.

  1. install Python 3
  2. pip install twarc:
pip install --upgrade twarc

Homebrew (macOS only)

For macOS users, you can also install twarc via Homebrew:

brew install twarc

Windows

If you installed with pip and see a "failed to create process" error when running twarc, try reinstalling like this:

python -m pip install --upgrade --force-reinstall twarc

Quickstart:

First you're going to need to tell twarc about your application API keys and grant access to one or more Twitter accounts:

twarc2 configure

Then try out a search:

twarc2 search "blacklivesmatter" results.jsonl

Or maybe you'd like to collect tweets as they happen?

twarc2 stream-rules add blacklivesmatter
twarc2 stream results.jsonl

See below for the details about these commands and more.

Configure

Once you've got your Twitter developer access set up you can tell twarc what your credentials are with the configure command.

twarc2 configure

This will store your credentials in your home directory so you don't have to keep entering them. You can use most of twarc's functionality by simply configuring the bearer token, but if you want it to be complete you can also enter the API key and API secret.

You can also set the keys in the system environment (CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET) or pass them using command line options (--consumer-key, --consumer-secret, --access-token, --access-token-secret).
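If you rely on environment variables, it can help to check that they are all present before starting a long collection. This is a small sketch using only the variable names listed above; the helper missing_credentials is hypothetical and is not something twarc provides.

```python
import os

# Environment variable names as listed in this documentation.
CREDENTIAL_VARS = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def missing_credentials(env=None):
    """Return the names of credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in CREDENTIAL_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All four credential variables are set.")
```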

Search

This uses Twitter's tweets/search/recent and tweets/search/all endpoints to download pre-existing tweets matching a given query. This command will search for any tweets mentioning blacklivesmatter from the last 7 days.

twarc2 search "blacklivesmatter" results.jsonl

If you have access to the Academic Research Product Track you can search the full archive of tweets by using the --archive option.

twarc2 search --archive "blacklivesmatter" results.jsonl

The queries can be a lot more expressive than matching a single term. For example, this query will search for tweets containing either blacklivesmatter or blm that were sent to the user @deray.

twarc2 search "(blacklivesmatter OR blm) to:deray" results.jsonl

The best way to get familiar with Twitter's search syntax is to consult Twitter's Building queries for Search Tweets documentation.

You also should definitely check out Igor Brigadir's excellent reference guide to the Twitter Search syntax: Advanced Search on Twitter. There are lots of hidden gems in there that the advanced search form doesn't make readily apparent.

Limit

Because there is a monthly limit of 500,000 tweets (5 million, or sometimes 10 million, for the Academic Research Track) you may want to limit the number of tweets you retrieve by using --limit:

twarc2 search --limit 5000 "blacklivesmatter" results.jsonl

Time

You can also limit to a particular time range using --start-time and/or --end-time, which can be especially useful in conjunction with --archive when you are searching for historical tweets.

twarc2 search --start-time 2014-07-17 --end-time 2014-07-24 '"eric garner"' tweets.jsonl

If you leave off --start-time or --end-time it will be open on that side. So for example to get all "eric garner" tweets before 2014-07-24 you would just leave off the --start-time:

twarc2 search --end-time 2014-07-24 '"eric garner"' tweets.jsonl
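If you want to split a long time range into one search per day, the date arithmetic can be sketched in Python. This is only an illustration: daily_windows and search_command are made-up helper names, and the sketch builds an argument list from the flags shown above rather than calling twarc.

```python
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield (start_time, end_time) date strings covering [start, end) day by day."""
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        yield day.isoformat(), next_day.isoformat()
        day = next_day


def search_command(query, start_time, end_time, outfile):
    """Build a twarc2 search argument list using the documented time options."""
    return [
        "twarc2", "search",
        "--start-time", start_time,
        "--end-time", end_time,
        query, outfile,
    ]


if __name__ == "__main__":
    for start_time, end_time in daily_windows(date(2014, 7, 17), date(2014, 7, 24)):
        outfile = f"tweets-{start_time}.jsonl"  # hypothetical naming scheme
        print(" ".join(search_command('"eric garner"', start_time, end_time, outfile)))
```

The list returned by search_command could be handed to subprocess.run; add --archive to the list when you are searching historical tweets.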

Searches

Searches works like the search command, but instead of taking a single query, it reads from a file containing many queries. You can use the same limit and time options as with a single search command, but they will be applied to every query.

The input file for this command needs to be a plain text file, with one line for each query you want to run, for example you might have a file called animals.txt with the following lines:

cat
dog
mouse OR mice

Note that each line will be passed through directly to the Twitter API - if you have quoted strings, they will be treated as a phrase search by the Twitter API, which might not be what you intended.

If you run the following searches command, animals.json will contain up to 100 tweets for each query in the input file:

twarc2 searches --limit 100 animals.txt animals.json

You can use the --archive and --start-time flags just like a regular search command too, in this case to search the full archive of all tweets for the first day of 2020:

twarc2 searches --archive --start-time 2020-01-01 --end-time 2020-01-02 animals.txt animals.json

You can also use the --counts-only flag to check volumes first. This produces a CSV file in the same format as the counts command with the --csv flag, with the addition of a column containing the query for that row.

twarc2 searches --counts-only animals.txt animals_counts.csv

One more thing - if you have a lot of searches you want to run, you might want to consider using the --combine-queries flag. This combines consecutive queries in the file into a single longer query, meaning you issue fewer API calls and potentially collect fewer duplicate tweets that match more than one query. Using this on the animals.txt file as input will combine the three queries into the single longer query (cat) OR (dog) OR (mouse OR mice), and only issue one logical query.

twarc2 searches --combine-queries animals.txt animals_combined.json
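To see what the combined query looks like for your own file, the joining step described above can be sketched as follows. This is an illustration only, not twarc's implementation: combine_queries is a made-up name, and the sketch ignores any query length limits the API may enforce.

```python
def combine_queries(queries):
    """Wrap each non-empty query in parentheses and join them with OR."""
    cleaned = [query.strip() for query in queries if query.strip()]
    return " OR ".join(f"({query})" for query in cleaned)


if __name__ == "__main__":
    # Same three lines as the animals.txt example.
    print(combine_queries(["cat", "dog", "mouse OR mice"]))
```

Wrapping each line in parentheses keeps a line such as mouse OR mice grouped as one unit inside the longer query.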

Stream

The stream command will use Twitter's API tweets/search/stream endpoint to collect tweets as they happen. In order to use it you first need to create one or more rules. For example:

twarc2 stream-rules add blacklivesmatter

You can list your active stream rules:

twarc2 stream-rules list

And you can collect the data from the stream, which will bring down any tweets that match your rules:

twarc2 stream stream.jsonl

When you want to stop you use ctrl-c. This only stops the stream but doesn't delete your stream rule. To remove a rule you can:

twarc2 stream-rules delete blacklivesmatter

Sample

Use the sample command to listen to Twitter's tweets/sample/stream API for a "random" sample of recent public statuses. The sampling is based on the millisecond part of the tweet timestamp.

twarc2 sample sample.jsonl

Users

If you have a file of user ids you can fetch the user metadata for them with the users command:

twarc2 users users.txt users.jsonl

If the file contains usernames instead of user ids you can use the --usernames option:

twarc2 users --usernames users.txt users.jsonl

Followers

You can fetch the followers of an account using the followers command:

twarc2 followers deray users.jsonl

Following

To get the users that a user is following you can use following:

twarc2 following deray users.jsonl

The result will include exactly one user id per line. The response order is reverse chronological, or most recent followers first.

Timeline

The timeline command will use Twitter's user timeline API to collect the most recent tweets posted by the user indicated by screen_name.

twarc2 timeline deray tweets.jsonl

Conversation

You can retrieve a conversation thread using the tweet ID at the head of the conversation:

twarc2 conversation 266031293945503744 > conversation.jsonl

Dehydrate

The dehydrate command generates an id list from a file of tweets:

twarc2 dehydrate tweets.jsonl tweet-ids.txt

Hydrate

twarc's hydrate command will read a file of tweet identifiers and write out the tweet JSON for them using Twitter's tweets API endpoint:

twarc2 hydrate ids.txt tweets.jsonl

The input file, ids.txt is expected to be a file that contains a tweet identifier on each line, without quotes or a header:

919505987303886849
919505982882844672
919505982602039297
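Before hydrating a large file it can be worth checking that every line really is a bare identifier. The following sketch flags lines that contain quotes, a header, or anything else that is not purely digits; invalid_id_lines is a hypothetical helper, not a twarc function.

```python
def invalid_id_lines(lines):
    """Return (line_number, text) pairs for lines that are not bare numeric ids.

    Blank lines are skipped rather than reported.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if text and not (text.isascii() and text.isdigit()):
            problems.append((number, text))
    return problems


if __name__ == "__main__":
    # In practice you would pass an open file, e.g. open("ids.txt").
    sample = ["id", '"919505987303886849"', "919505982882844672"]
    for number, text in invalid_id_lines(sample):
        print(f"line {number}: unexpected content {text!r}")
```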

Twitter API's Terms of Service discourage people from making large amounts of raw Twitter data available on the Web. The data can be used for research and archived for local use, but not shared with the world. Twitter does allow files of tweet identifiers to be shared, which can be useful when you would like to make a dataset of tweets available. You can then use Twitter's API to hydrate the data, or to retrieve the full JSON for each identifier. This is particularly important for verification of social media research.

Places

The search and stream APIs allow you to search by places. But in order to use them you need to know the identifier for a specific place. twarc's places command will let you search by the place name, geo coordinates, or IP address. For example:

twarc2 places Ferguson

Which will output something like:

$ twarc2 places Ferguson
Ferguson, MO, United States [id=0a62ce0f6aa37536]
Ruisseau-Ferguson, Québec, Canada [id=25283a1f59449e8f]
Ferguson, Victoria, Australia [id=2538e66b7e5c082c]
Ferguson Road Initiative, Dallas, United States [id=368aad647311292a]
Ferguson, Western Australia, Australia [id=45f20c78d803ad84]
Ferguson, PA, United States [id=00c92e14361c9674]
Ferguson, KY, United States [id=0190ea5612aaae32]

You can then use one of the ids in a search:

twarc2 search "place:0a62ce0f6aa37536" tweets.jsonl

You can also search by geo-coordinates (lat,lon) and IP address. If you would prefer to see the full JSON response with the bounding boxes use the --json option.
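If you want to pick place ids out of the text output programmatically, a small parser is enough. This sketch assumes the "Name [id=...]" line format shown in the example output above; parse_places and place_query are made-up names, and for anything serious the --json option is probably the more robust route.

```python
import re

# Matches lines such as: Ferguson, MO, United States [id=0a62ce0f6aa37536]
PLACE_LINE = re.compile(r"^(?P<name>.+?)\s+\[id=(?P<id>[0-9a-f]+)\]\s*$")


def parse_places(lines):
    """Return (name, place_id) pairs for lines in the 'Name [id=...]' format."""
    places = []
    for line in lines:
        match = PLACE_LINE.match(line)
        if match:
            places.append((match.group("name"), match.group("id")))
    return places


def place_query(place_id):
    """Build a search query for a place id, as in the example above."""
    return f"place:{place_id}"


if __name__ == "__main__":
    output = [
        "Ferguson, MO, United States [id=0a62ce0f6aa37536]",
        "Ferguson, PA, United States [id=00c92e14361c9674]",
    ]
    for name, place_id in parse_places(output):
        print(name, "->", place_query(place_id))
```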

Command Line Usage

Below is what you see when you run twarc2 --help.

twarc2

Collect data from the Twitter V2 API.

Usage:

twarc2 [OPTIONS] COMMAND [ARGS]...

Options:

  --consumer-key TEXT         Twitter app consumer key (aka "App Key")
  --consumer-secret TEXT      Twitter app consumer secret (aka "App Secret")
  --access-token TEXT         Twitter app access token for user
                              authentication.
  --access-token-secret TEXT  Twitter app access token secret for user
                              authentication.
  --bearer-token TEXT         Twitter app access bearer token.
  --app-auth / --user-auth    Use application authentication or user
                              authentication. Some rate limits are higher with
                              user authentication, but not all endpoints are
                              supported.  [default: app-auth]
  -l, --log TEXT
  --verbose
  --metadata / --no-metadata  Include/don't include metadata about when and
                              how data was collected.  [default: metadata]
  --config FILE               Read configuration from FILE.
  --help                      Show this message and exit.

compliance-job

Create, retrieve and list batch compliance jobs for Tweets and Users.

Usage:

twarc2 compliance-job [OPTIONS] COMMAND [ARGS]...

Options:

  --help  Show this message and exit.

create

Create a new compliance job and upload tweet IDs.

Usage:

twarc2 compliance-job create [OPTIONS] {tweets|users} INFILE [OUTFILE]

Options:

  --job-name TEXT     A name or tag to help identify the job.
  --wait / --no-wait  Wait for the job to finish and download the results.
                      Wait by default.
  --hide-progress     Hide the Progress bar. Default: show progress, unless
                      using pipes.
  --help              Show this message and exit.

download

Download the compliance job with the specified ID.

Usage:

twarc2 compliance-job download [OPTIONS] JOB [OUTFILE]

Options:

  --wait / --no-wait  Wait for the job to finish and download the results.
                      Wait by default.
  --hide-progress     Hide the Progress bar. Default: show progress, unless
                      using pipes.
  --help              Show this message and exit.

get

Returns status and download information about the job ID.

Usage:

twarc2 compliance-job get [OPTIONS] JOB

Options:

  --json-output  Return the raw json content from the API.
  --verbose      Show all URLs and metadata.
  --help         Show this message and exit.

list

Returns a list of compliance jobs by job type and status.

Usage:

twarc2 compliance-job list [OPTIONS] [[tweets|users]]

Options:

  --status [created|in_progress|complete|failed]
                                  Filter by job status. Only one of 'created',
                                  'in_progress', 'complete', 'failed' can be
                                  specified. If not set, returns all.
  --json-output                   Return the raw json content from the API.
  --verbose                       Show all URLs and metadata.
  --help                          Show this message and exit.

configure

Set up your Twitter app keys.

Usage:

twarc2 configure [OPTIONS]

Options:

  --help  Show this message and exit.

conversation

Retrieve a conversation thread using the tweet id.

Usage:

twarc2 conversation [OPTIONS] TWEET_ID [OUTFILE]

Options:

  --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets created after UTC time (ISO
                                  8601/RFC 3339),  e.g.  --start-time
                                  "2021-01-01T12:31:04"
  --end-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets sent before UTC time (ISO
                                  8601/RFC 3339),  e.g.  --end-time
                                  "2021-01-01T12:31:04"
  --since-id INTEGER              Match tweets sent after tweet id
  --until-id INTEGER              Match tweets sent prior to tweet id
  --archive                       Use the full archive (requires Academic
                                  Research track)
  --max-results INTEGER           Maximum number of tweets per API response
  --limit INTEGER                 Maximum number of tweets to save
  --no-context-annotations        By default twarc gets all available data.
                                  This leaves out context annotations (Twitter
                                  API limits --max-results to 100 if these are
                                  requested). Setting this makes --max-results
                                  500 the default. NOTE: This argument is
                                  mutually exclusive with  arguments:
                                  [--expansions, --tweet-fields, --media-
                                  fields, --place-fields, --minimal-fields,
                                  --counts-only, --user-fields, --poll-
                                  fields].
  --minimal-fields                By default twarc gets all available data.
                                  This option requests the minimal retrievable
                                  amount of data - only IDs and object
                                  references are retrieved. Setting this makes
                                  --max-results 500 the default. NOTE: This
                                  argument is mutually exclusive with
                                  arguments: [--no-context-annotations,
                                  --expansions, --tweet-fields, --media-
                                  fields, --place-fields, --counts-only,
                                  --user-fields, --poll-fields].
  --expansions TEXT               Comma separated list of expansions to
                                  retrieve. Default is all available.
  --tweet-fields TEXT             Comma separated list of tweet fields to
                                  retrieve. Default is all available.
  --user-fields TEXT              Comma separated list of user fields to
                                  retrieve. Default is all available.
  --media-fields TEXT             Comma separated list of media fields to
                                  retrieve. Default is all available.
  --place-fields TEXT             Comma separated list of place fields to
                                  retrieve. Default is all available.
  --poll-fields TEXT              Comma separated list of poll fields to
                                  retrieve. Default is all available.
  --hide-progress                 Hide the Progress bar. Default: show
                                  progress, unless using pipes.
  --help                          Show this message and exit.

conversations

Fetch the full conversation threads that the input tweets are a part of. Alternatively the input can be a line oriented file of conversation ids.

Usage:

twarc2 conversations [OPTIONS] [INFILE] [OUTFILE]

Options:

  --conversation-limit INTEGER    Maximum number of tweets to return per-
                                  conversation
  --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets created after UTC time (ISO
                                  8601/RFC 3339),  e.g.  --start-time
                                  "2021-01-01T12:31:04"
  --end-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets sent before UTC time (ISO
                                  8601/RFC 3339),  e.g.  --end-time
                                  "2021-01-01T12:31:04"
  --since-id INTEGER              Match tweets sent after tweet id
  --until-id INTEGER              Match tweets sent prior to tweet id
  --archive                       Use the full archive (requires Academic
                                  Research track)
  --max-results INTEGER           Maximum number of tweets per API response
  --limit INTEGER                 Maximum number of tweets to save
  --no-context-annotations        By default twarc gets all available data.
                                  This leaves out context annotations (Twitter
                                  API limits --max-results to 100 if these are
                                  requested). Setting this makes --max-results
                                  500 the default. NOTE: This argument is
                                  mutually exclusive with  arguments:
                                  [--expansions, --tweet-fields, --media-
                                  fields, --place-fields, --minimal-fields,
                                  --counts-only, --user-fields, --poll-
                                  fields].
  --minimal-fields                By default twarc gets all available data.
                                  This option requests the minimal retrievable
                                  amount of data - only IDs and object
                                  references are retrieved. Setting this makes
                                  --max-results 500 the default. NOTE: This
                                  argument is mutually exclusive with
                                  arguments: [--no-context-annotations,
                                  --expansions, --tweet-fields, --media-
                                  fields, --place-fields, --counts-only,
                                  --user-fields, --poll-fields].
  --expansions TEXT               Comma separated list of expansions to
                                  retrieve. Default is all available.
  --tweet-fields TEXT             Comma separated list of tweet fields to
                                  retrieve. Default is all available.
  --user-fields TEXT              Comma separated list of user fields to
                                  retrieve. Default is all available.
  --media-fields TEXT             Comma separated list of media fields to
                                  retrieve. Default is all available.
  --place-fields TEXT             Comma separated list of place fields to
                                  retrieve. Default is all available.
  --poll-fields TEXT              Comma separated list of poll fields to
                                  retrieve. Default is all available.
  --hide-progress                 Hide the Progress bar. Default: show
                                  progress, unless using pipes.
  --help                          Show this message and exit.

counts

Return counts of tweets matching a query.

Usage:

twarc2 counts [OPTIONS] QUERY [OUTFILE]

Options:

  --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets created after UTC time (ISO
                                  8601/RFC 3339),  e.g.  --start-time
                                  "2021-01-01T12:31:04"
  --end-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets sent before UTC time (ISO
                                  8601/RFC 3339),  e.g.  --end-time
                                  "2021-01-01T12:31:04"
  --since-id INTEGER              Match tweets sent after tweet id
  --until-id INTEGER              Match tweets sent prior to tweet id
  --archive                       Count using the full archive (requires
                                  Academic Research track)
  --granularity [day|hour|minute]
                                  Aggregation level for counts. Can be one of:
                                  day, hour, minute. Default is hour.
  --limit INTEGER                 Maximum number of days of results to save
                                  (minimum is 30 days)
  --text                          Output the counts as human readable text
  --csv                           Output counts as CSV
  --hide-progress                 Hide the Progress bar. Default: show
                                  progress, unless using pipes.
  --help                          Show this message and exit.

dehydrate

Extract tweet or user IDs from a dataset.

Usage:

twarc2 dehydrate [OPTIONS] [INFILE] [OUTFILE]

Options:

  --id-type [tweets|users]  IDs to extract - either 'tweets' or 'users'.
  --hide-progress           Hide the Progress bar. Default: show progress,
                            unless using pipes.
  --help                    Show this message and exit.

flatten

"Flatten" tweets, or move expansions inline with tweet objects and ensure that each line of output is a single tweet.

Usage:

twarc2 flatten [OPTIONS] [INFILE] [OUTFILE]

Options:

  --hide-progress  Hide the Progress bar. Default: show progress, unless using
                   pipes.
  --help           Show this message and exit.

followers

Get the followers for a given user.

Usage:

twarc2 followers [OPTIONS] USER [OUTFILE]

Options:

  --limit INTEGER        Maximum number of followers to save. Increments of
                         1000 or --max-results if set.
  --max-results INTEGER  Maximum number of users per page. Default is 1000.
  --hide-progress        Hide the Progress bar. Default: show progress, unless
                         using pipes.
  --help                 Show this message and exit.

following

Get the users that a given user is following.

Usage:

twarc2 following [OPTIONS] USER [OUTFILE]

Options:

  --limit INTEGER        Maximum number of friends to save. Increments of 1000
                         or --max-results if set.
  --max-results INTEGER  Maximum number of users per page. Default is 1000.
  --hide-progress        Hide the Progress bar. Default: show progress, unless
                         using pipes.
  --help                 Show this message and exit.

hydrate

Hydrate tweet ids.

Usage:

twarc2 hydrate [OPTIONS] [INFILE] [OUTFILE]

Options:

  --no-context-annotations  By default twarc gets all available data. This
                            leaves out context annotations (Twitter API limits
                            --max-results to 100 if these are requested).
                            Setting this makes --max-results 500 the default.
                            NOTE: This argument is mutually exclusive with
                            arguments: [--expansions, --tweet-fields, --media-
                            fields, --place-fields, --minimal-fields,
                            --counts-only, --user-fields, --poll-fields].
  --minimal-fields          By default twarc gets all available data. This
                            option requests the minimal retrievable amount of
                            data - only IDs and object references are
                            retrieved. Setting this makes --max-results 500
                            the default. NOTE: This argument is mutually
                            exclusive with  arguments: [--no-context-
                            annotations, --expansions, --tweet-fields,
                            --media-fields, --place-fields, --counts-only,
                            --user-fields, --poll-fields].
  --expansions TEXT         Comma separated list of expansions to retrieve.
                            Default is all available.
  --tweet-fields TEXT       Comma separated list of tweet fields to retrieve.
                            Default is all available.
  --user-fields TEXT        Comma separated list of user fields to retrieve.
                            Default is all available.
  --media-fields TEXT       Comma separated list of media fields to retrieve.
                            Default is all available.
  --place-fields TEXT       Comma separated list of place fields to retrieve.
                            Default is all available.
  --poll-fields TEXT        Comma separated list of poll fields to retrieve.
                            Default is all available.
  --hide-progress           Hide the Progress bar. Default: show progress,
                            unless using pipes.
  --help                    Show this message and exit.

mentions

Retrieve max of 800 of the most recent tweets mentioning the given user.

Usage:

twarc2 mentions [OPTIONS] USER_ID [OUTFILE]

Options:

  --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets created after UTC time (ISO
                                  8601/RFC 3339),  e.g.  --start-time
                                  "2021-01-01T12:31:04"
  --end-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets sent before UTC time (ISO
                                  8601/RFC 3339),  e.g.  --end-time
                                  "2021-01-01T12:31:04"
  --since-id INTEGER              Match tweets sent after tweet id
  --until-id INTEGER              Match tweets sent prior to tweet id
  --no-context-annotations        By default twarc gets all available data.
                                  This leaves out context annotations (Twitter
                                  API limits --max-results to 100 if these are
                                  requested). Setting this makes --max-results
                                  500 the default. NOTE: This argument is
                                  mutually exclusive with  arguments:
                                  [--expansions, --tweet-fields, --media-
                                  fields, --place-fields, --minimal-fields,
                                  --counts-only, --user-fields, --poll-
                                  fields].
  --minimal-fields                By default twarc gets all available data.
                                  This option requests the minimal retrievable
                                  amount of data - only IDs and object
                                  references are retrieved. Setting this makes
                                  --max-results 500 the default. NOTE: This
                                  argument is mutually exclusive with
                                  arguments: [--no-context-annotations,
                                  --expansions, --tweet-fields, --media-
                                  fields, --place-fields, --counts-only,
                                  --user-fields, --poll-fields].
  --expansions TEXT               Comma separated list of expansions to
                                  retrieve. Default is all available.
  --tweet-fields TEXT             Comma separated list of tweet fields to
                                  retrieve. Default is all available.
  --user-fields TEXT              Comma separated list of user fields to
                                  retrieve. Default is all available.
  --media-fields TEXT             Comma separated list of media fields to
                                  retrieve. Default is all available.
  --place-fields TEXT             Comma separated list of place fields to
                                  retrieve. Default is all available.
  --poll-fields TEXT              Comma separated list of poll fields to
                                  retrieve. Default is all available.
  --hide-progress                 Hide the Progress bar. Default: show
                                  progress, unless using pipes.
  --help                          Show this message and exit.

places

Search for places by place name, geo coordinates or ip address.

Usage:

twarc2 places [OPTIONS] VALUE [OUTFILE]

Options:

  --type [name|geo|ip]            How to search for places (defaults to name)
  --granularity [neighborhood|city|admin|country]
                                  What type of places to search for (defaults
                                  to neighborhood)
  --max-results INTEGER           Maximum results to return
  --json                          Output raw JSON response
  --help                          Show this message and exit.

sample

Fetch tweets from the sample stream.

Usage:

twarc2 sample [OPTIONS] [OUTFILE]

Options:

  --no-context-annotations  By default twarc gets all available data. This
                            leaves out context annotations (Twitter API limits
                            --max-results to 100 if these are requested).
                            Setting this makes --max-results 500 the default.
                            NOTE: This argument is mutually exclusive with
                            arguments: [--expansions, --tweet-fields, --media-
                            fields, --place-fields, --minimal-fields,
                            --counts-only, --user-fields, --poll-fields].
  --minimal-fields          By default twarc gets all available data. This
                            option requests the minimal retrievable amount of
                            data - only IDs and object references are
                            retrieved. Setting this makes --max-results 500
                            the default. NOTE: This argument is mutually
                            exclusive with  arguments: [--no-context-
                            annotations, --expansions, --tweet-fields,
                            --media-fields, --place-fields, --counts-only,
                            --user-fields, --poll-fields].
  --expansions TEXT         Comma separated list of expansions to retrieve.
                            Default is all available.
  --tweet-fields TEXT       Comma separated list of tweet fields to retrieve.
                            Default is all available.
  --user-fields TEXT        Comma separated list of user fields to retrieve.
                            Default is all available.
  --media-fields TEXT       Comma separated list of media fields to retrieve.
                            Default is all available.
  --place-fields TEXT       Comma separated list of place fields to retrieve.
                            Default is all available.
  --poll-fields TEXT        Comma separated list of poll fields to retrieve.
                            Default is all available.
  --limit INTEGER           Maximum number of tweets to save
  --help                    Show this message and exit.

search

Search for tweets. For help on how to write a query see https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query

Usage:

twarc2 search [OPTIONS] QUERY [OUTFILE]

Options:

  --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets created after UTC time (ISO
                                  8601/RFC 3339),  e.g.  --start-time
                                  "2021-01-01T12:31:04"
  --end-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets sent before UTC time (ISO
                                  8601/RFC 3339),  e.g.  --end-time
                                  "2021-01-01T12:31:04"
  --since-id INTEGER              Match tweets sent after tweet id
  --until-id INTEGER              Match tweets sent prior to tweet id
  --archive                       Use the full archive (requires Academic
                                  Research track)
  --max-results INTEGER           Maximum number of tweets per API response
  --limit INTEGER                 Maximum number of tweets to save
  --no-context-annotations        By default twarc gets all available data.
                                  This leaves out context annotations (Twitter
                                  API limits --max-results to 100 if these are
                                  requested). Setting this makes --max-results
                                  500 the default. NOTE: This argument is
                                  mutually exclusive with  arguments:
                                  [--expansions, --tweet-fields, --media-
                                  fields, --place-fields, --minimal-fields,
                                  --counts-only, --user-fields, --poll-
                                  fields].
  --minimal-fields                By default twarc gets all available data.
                                  This option requests the minimal retrievable
                                  amount of data - only IDs and object
                                  references are retrieved. Setting this makes
                                  --max-results 500 the default. NOTE: This
                                  argument is mutually exclusive with
                                  arguments: [--no-context-annotations,
                                  --expansions, --tweet-fields, --media-
                                  fields, --place-fields, --counts-only,
                                  --user-fields, --poll-fields].
  --expansions TEXT               Comma separated list of expansions to
                                  retrieve. Default is all available.
  --tweet-fields TEXT             Comma separated list of tweet fields to
                                  retrieve. Default is all available.
  --user-fields TEXT              Comma separated list of user fields to
                                  retrieve. Default is all available.
  --media-fields TEXT             Comma separated list of media fields to
                                  retrieve. Default is all available.
  --place-fields TEXT             Comma separated list of place fields to
                                  retrieve. Default is all available.
  --poll-fields TEXT              Comma separated list of poll fields to
                                  retrieve. Default is all available.
  --hide-progress                 Hide the Progress bar. Default: show
                                  progress, unless using pipes.
  --help                          Show this message and exit.
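
The two formats accepted by --start-time and --end-time are standard strptime patterns, and the help text says the values are read as UTC. As a minimal sketch of how such a value could be checked before building a long-running command (the helper name parse_utc is hypothetical and is not part of twarc):

```python
from datetime import datetime, timezone

# The two formats listed in the --start-time / --end-time help text.
FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def parse_utc(value):
    """Hypothetical helper: parse a date or datetime string as a UTC datetime."""
    for fmt in FORMATS:
        try:
            # strptime returns a naive datetime; tag it as UTC explicitly.
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"{value!r} does not match any of {FORMATS}")


print(parse_utc("2021-01-01").isoformat())           # 2021-01-01T00:00:00+00:00
print(parse_utc("2021-01-01T12:31:04").isoformat())  # 2021-01-01T12:31:04+00:00
```

A date without a time component resolves to midnight UTC in this sketch, so a local-time boundary needs to be converted to UTC first.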

searches

Execute each search in the input file, one at a time.

The infile must contain one query per line. Each line is passed directly to the Twitter API; unlike the timelines command, quotes are not removed.

Input queries are deduplicated: if the same literal query appears more than once in the file, it is run only once.

It is recommended that this command first be run with --counts-only, to check that each of the queries is retrieving the volume of tweets expected, and to avoid consuming quota unnecessarily.

Usage:

twarc2 searches [OPTIONS] [INFILE] [OUTFILE]

Options:

  --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets created after UTC time (ISO
                                  8601/RFC 3339),  e.g.  --start-time
                                  "2021-01-01T12:31:04"
  --end-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets sent before UTC time (ISO
                                  8601/RFC 3339),  e.g.  --end-time
                                  "2021-01-01T12:31:04"
  --since-id INTEGER              Match tweets sent after tweet id
  --until-id INTEGER              Match tweets sent prior to tweet id
  --archive                       Use the full archive (requires Academic
                                  Research track)
  --max-results INTEGER           Maximum number of tweets per API response
  --limit INTEGER                 Maximum number of tweets to save
  --counts-only                   Only retrieve counts of tweets matching the
                                  search, not the tweets themselves. outfile
                                  will be a CSV containing the counts for all
                                  of the queries in the input file.
  --combine-queries               Merge consecutive queries into a single OR
                                  query. For example, if the three rows in
                                  your file are: banana, apple, pear then a
                                  single query ((banana) OR (apple) OR (pear))
                                  will be issued.
  --granularity [day|hour|minute]
                                  Aggregation level for counts (only used when
                                  --counts-only is used). Can be one of: day,
                                  hour, minute. Default is day.
  --no-context-annotations        By default twarc gets all available data.
                                  This leaves out context annotations (Twitter
                                  API limits --max-results to 100 if these are
                                  requested). Setting this makes --max-results
                                  500 the default. NOTE: This argument is
                                  mutually exclusive with  arguments:
                                  [--expansions, --tweet-fields, --media-
                                  fields, --place-fields, --minimal-fields,
                                  --counts-only, --user-fields, --poll-
                                  fields].
  --minimal-fields                By default twarc gets all available data.
                                  This option requests the minimal retrievable
                                  amount of data - only IDs and object
                                  references are retrieved. Setting this makes
                                  --max-results 500 the default. NOTE: This
                                  argument is mutually exclusive with
                                  arguments: [--no-context-annotations,
                                  --expansions, --tweet-fields, --media-
                                  fields, --place-fields, --counts-only,
                                  --user-fields, --poll-fields].
  --expansions TEXT               Comma separated list of expansions to
                                  retrieve. Default is all available.
  --tweet-fields TEXT             Comma separated list of tweet fields to
                                  retrieve. Default is all available.
  --user-fields TEXT              Comma separated list of user fields to
                                  retrieve. Default is all available.
  --media-fields TEXT             Comma separated list of media fields to
                                  retrieve. Default is all available.
  --place-fields TEXT             Comma separated list of place fields to
                                  retrieve. Default is all available.
  --poll-fields TEXT              Comma separated list of poll fields to
                                  retrieve. Default is all available.
  --hide-progress                 Hide the Progress bar. Default: show
                                  progress, unless using pipes.
  --help                          Show this message and exit.
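
To illustrate what deduplication and --combine-queries do to an input file, here is a minimal sketch. The function names dedupe and combine are hypothetical, and the sketch ignores any query length limit the API imposes, so twarc may in practice split a long file into several OR queries rather than one.

```python
def dedupe(lines):
    """Drop repeated literal queries, keeping the first occurrence of each."""
    queries = (line.rstrip("\n") for line in lines)
    # dict.fromkeys preserves insertion order; blank lines are skipped
    # (skipping blanks is an assumption of this sketch).
    return list(dict.fromkeys(q for q in queries if q))


def combine(queries):
    """Wrap each query in parentheses and join them into a single OR query."""
    return "(" + " OR ".join(f"({q})" for q in queries) + ")"


rows = ["banana\n", "apple\n", "banana\n", "pear\n"]
print(combine(dedupe(rows)))  # ((banana) OR (apple) OR (pear))
```

Wrapping each row in its own parentheses keeps operators inside one row from binding to a neighbouring row once they are joined.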

stream

Fetch tweets from the live stream.

Usage:

twarc2 stream [OPTIONS] [OUTFILE]

Options:

  --limit INTEGER           Maximum number of tweets to return
  --no-context-annotations  By default twarc gets all available data. This
                            leaves out context annotations (Twitter API limits
                            --max-results to 100 if these are requested).
                            Setting this makes --max-results 500 the default.
                            NOTE: This argument is mutually exclusive with
                            arguments: [--expansions, --tweet-fields, --media-
                            fields, --place-fields, --minimal-fields,
                            --counts-only, --user-fields, --poll-fields].
  --minimal-fields          By default twarc gets all available data. This
                            option requests the minimal retrievable amount of
                            data - only IDs and object references are
                            retrieved. Setting this makes --max-results 500
                            the default. NOTE: This argument is mutually
                            exclusive with  arguments: [--no-context-
                            annotations, --expansions, --tweet-fields,
                            --media-fields, --place-fields, --counts-only,
                            --user-fields, --poll-fields].
  --expansions TEXT         Comma separated list of expansions to retrieve.
                            Default is all available.
  --tweet-fields TEXT       Comma separated list of tweet fields to retrieve.
                            Default is all available.
  --user-fields TEXT        Comma separated list of user fields to retrieve.
                            Default is all available.
  --media-fields TEXT       Comma separated list of media fields to retrieve.
                            Default is all available.
  --place-fields TEXT       Comma separated list of place fields to retrieve.
                            Default is all available.
  --poll-fields TEXT        Comma separated list of poll fields to retrieve.
                            Default is all available.
  --help                    Show this message and exit.
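
Because the output is line-oriented JSON, an OUTFILE can be processed one line at a time. The sketch below assumes each line is a JSON object whose data key holds either a single tweet object or a list of tweet objects; that layout is an assumption about the API response format, and iter_tweets is a hypothetical helper, not a twarc function.

```python
import io
import json


def iter_tweets(lines):
    """Yield tweet objects from line-oriented JSON (assumed layout, see above)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        data = json.loads(line).get("data", [])
        if isinstance(data, dict):
            # A single tweet under "data".
            yield data
        else:
            # A list of tweets under "data".
            yield from data


# An in-memory stand-in for an OUTFILE; a real file object works the same way.
sample = io.StringIO(
    '{"data": {"id": "1", "text": "a"}}\n'
    '{"data": [{"id": "2"}, {"id": "3"}]}\n'
)
print([tweet["id"] for tweet in iter_tweets(sample)])  # ['1', '2', '3']
```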

stream-rules

List, add and delete rules for your stream.

Usage:

twarc2 stream-rules [OPTIONS] COMMAND [ARGS]...

Options:

  --help  Show this message and exit.

add

Create a new stream rule to match a value. Rules can be grouped with optional tags.

Usage:

twarc2 stream-rules add [OPTIONS] VALUE

Options:

  --tag TEXT  a tag to help identify the rule
  --help      Show this message and exit.

delete

Delete the stream rule that matches a given value.

Usage:

twarc2 stream-rules delete [OPTIONS] VALUE

Options:

  --help  Show this message and exit.

delete-all

Delete all stream rules!

Usage:

twarc2 stream-rules delete-all [OPTIONS]

Options:

  --help  Show this message and exit.

list

List all the active stream rules.

Usage:

twarc2 stream-rules list [OPTIONS]

Options:

  --display-ids  display the rule ids
  --help         Show this message and exit.

timeline

Retrieve recent tweets for the given user.

Usage:

twarc2 timeline [OPTIONS] USER_ID [OUTFILE]

Options:

  --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets created after UTC time (ISO
                                  8601/RFC 3339),  e.g.  --start-time
                                  "2021-01-01T12:31:04"
  --end-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets sent before UTC time (ISO
                                  8601/RFC 3339),  e.g.  --end-time
                                  "2021-01-01T12:31:04"
  --since-id INTEGER              Match tweets sent after tweet id
  --until-id INTEGER              Match tweets sent prior to tweet id
  --use-search                    Use the search/all API endpoint which is not
                                  limited to the last 3200 tweets, but
                                  requires Academic Product Track access.
  --exclude-retweets              Exclude retweets from timeline
  --exclude-replies               Exclude replies from timeline
  --no-context-annotations        By default twarc gets all available data.
                                  This leaves out context annotations (Twitter
                                  API limits --max-results to 100 if these are
                                  requested). Setting this makes --max-results
                                  500 the default. NOTE: This argument is
                                  mutually exclusive with  arguments:
                                  [--expansions, --tweet-fields, --media-
                                  fields, --place-fields, --minimal-fields,
                                  --counts-only, --user-fields, --poll-
                                  fields].
  --minimal-fields                By default twarc gets all available data.
                                  This option requests the minimal retrievable
                                  amount of data - only IDs and object
                                  references are retrieved. Setting this makes
                                  --max-results 500 the default. NOTE: This
                                  argument is mutually exclusive with
                                  arguments: [--no-context-annotations,
                                  --expansions, --tweet-fields, --media-
                                  fields, --place-fields, --counts-only,
                                  --user-fields, --poll-fields].
  --expansions TEXT               Comma separated list of expansions to
                                  retrieve. Default is all available.
  --tweet-fields TEXT             Comma separated list of tweet fields to
                                  retrieve. Default is all available.
  --user-fields TEXT              Comma separated list of user fields to
                                  retrieve. Default is all available.
  --media-fields TEXT             Comma separated list of media fields to
                                  retrieve. Default is all available.
  --place-fields TEXT             Comma separated list of place fields to
                                  retrieve. Default is all available.
  --poll-fields TEXT              Comma separated list of poll fields to
                                  retrieve. Default is all available.
  --hide-progress                 Hide the Progress bar. Default: show
                                  progress, unless using pipes.
  --limit INTEGER                 Maximum number of tweets to return
  --help                          Show this message and exit.

timelines

Fetch the timelines of every user in an input source of tweets. If the input is a line-oriented text file of user ids or usernames, those will be used instead.

The infile can be:

- A file containing one user id per line (either quoted or unquoted)
- A JSONL file containing tweets collected in the Twitter API V2 format

Usage:

twarc2 timelines [OPTIONS] [INFILE] [OUTFILE]

Options:

  --limit INTEGER                 Maximum number of tweets to return
  --timeline-limit INTEGER        Maximum number of tweets to return per-
                                  timeline
  --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets created after UTC time (ISO
                                  8601/RFC 3339),  e.g.  --start-time
                                  "2021-01-01T12:31:04"
  --end-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S]
                                  Match tweets sent before UTC time (ISO
                                  8601/RFC 3339),  e.g.  --end-time
                                  "2021-01-01T12:31:04"
  --since-id INTEGER              Match tweets sent after tweet id
  --until-id INTEGER              Match tweets sent prior to tweet id
  --use-search                    Use the search/all API endpoint which is not
                                  limited to the last 3200 tweets, but
                                  requires Academic Product Track access.
  --exclude-retweets              Exclude retweets from timeline
  --exclude-replies               Exclude replies from timeline
  --no-context-annotations        By default twarc gets all available data.
                                  This leaves out context annotations (Twitter
                                  API limits --max-results to 100 if these are
                                  requested). Setting this makes --max-results
                                  500 the default. NOTE: This argument is
                                  mutually exclusive with  arguments:
                                  [--expansions, --tweet-fields, --media-
                                  fields, --place-fields, --minimal-fields,
                                  --counts-only, --user-fields, --poll-
                                  fields].
  --minimal-fields                By default twarc gets all available data.
                                  This option requests the minimal retrievable
                                  amount of data - only IDs and object
                                  references are retrieved. Setting this makes
                                  --max-results 500 the default. NOTE: This
                                  argument is mutually exclusive with
                                  arguments: [--no-context-annotations,
                                  --expansions, --tweet-fields, --media-
                                  fields, --place-fields, --counts-only,
                                  --user-fields, --poll-fields].
  --expansions TEXT               Comma separated list of expansions to
                                  retrieve. Default is all available.
  --tweet-fields TEXT             Comma separated list of tweet fields to
                                  retrieve. Default is all available.
  --user-fields TEXT              Comma separated list of user fields to
                                  retrieve. Default is all available.
  --media-fields TEXT             Comma separated list of media fields to
                                  retrieve. Default is all available.
  --place-fields TEXT             Comma separated list of place fields to
                                  retrieve. Default is all available.
  --poll-fields TEXT              Comma separated list of poll fields to
                                  retrieve. Default is all available.
  --hide-progress                 Hide the Progress bar. Default: show
                                  progress, unless using pipes.
  --help                          Show this message and exit.
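
The two infile shapes described above can be told apart line by line. The following is a rough sketch, not twarc's own parsing code: it assumes JSONL lines start with "{" and carry tweets (with an author_id field) either directly or under a data key, and that anything else is a plain id or username that may be wrapped in quotes. The helper name user_ids_from_lines is hypothetical, and the deduplication is a choice of this sketch.

```python
import json


def user_ids_from_lines(lines):
    """Collect unique user identifiers from either supported infile shape."""
    seen = set()
    ordered = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("{"):
            # Assumed JSONL shape: a tweet, or an object with tweets under "data".
            obj = json.loads(line)
            data = obj.get("data", obj)
            tweets = data if isinstance(data, list) else [data]
            ids = [t["author_id"] for t in tweets if "author_id" in t]
        else:
            # Plain text line: strip optional surrounding quotes.
            ids = [line.strip("\"'")]
        for uid in ids:
            if uid not in seen:
                seen.add(uid)
                ordered.append(uid)
    return ordered


lines = [
    '"12"',
    "34",
    '{"data": [{"id": "1", "author_id": "34"}, {"id": "2", "author_id": "56"}]}',
]
print(user_ids_from_lines(lines))  # ['12', '34', '56']
```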

tweet

Look up a tweet using its tweet id or URL.

Usage:

twarc2 tweet [OPTIONS] TWEET_ID [OUTFILE]

Options:

  --no-context-annotations  By default twarc gets all available data. This
                            leaves out context annotations (Twitter API limits
                            --max-results to 100 if these are requested).
                            Setting this makes --max-results 500 the default.
                            NOTE: This argument is mutually exclusive with
                            arguments: [--expansions, --tweet-fields, --media-
                            fields, --place-fields, --minimal-fields,
                            --counts-only, --user-fields, --poll-fields].
  --minimal-fields          By default twarc gets all available data. This
                            option requests the minimal retrievable amount of
                            data - only IDs and object references are
                            retrieved. Setting this makes --max-results 500
                            the default. NOTE: This argument is mutually
                            exclusive with  arguments: [--no-context-
                            annotations, --expansions, --tweet-fields,
                            --media-fields, --place-fields, --counts-only,
                            --user-fields, --poll-fields].
  --expansions TEXT         Comma separated list of expansions to retrieve.
                            Default is all available.
  --tweet-fields TEXT       Comma separated list of tweet fields to retrieve.
                            Default is all available.
  --user-fields TEXT        Comma separated list of user fields to retrieve.
                            Default is all available.
  --media-fields TEXT       Comma separated list of media fields to retrieve.
                            Default is all available.
  --place-fields TEXT       Comma separated list of place fields to retrieve.
                            Default is all available.
  --poll-fields TEXT        Comma separated list of poll fields to retrieve.
                            Default is all available.
  --pretty                  Pretty print the JSON
  --help                    Show this message and exit.

users

Get data for user ids or usernames.

Usage:

twarc2 users [OPTIONS] [INFILE] [OUTFILE]

Options:

  --no-context-annotations  By default twarc gets all available data. This
                            leaves out context annotations (Twitter API limits
                            --max-results to 100 if these are requested).
                            Setting this makes --max-results 500 the default.
                            NOTE: This argument is mutually exclusive with
                            arguments: [--expansions, --tweet-fields, --media-
                            fields, --place-fields, --minimal-fields,
                            --counts-only, --user-fields, --poll-fields].
  --minimal-fields          By default twarc gets all available data. This
                            option requests the minimal retrievable amount of
                            data - only IDs and object references are
                            retrieved. Setting this makes --max-results 500
                            the default. NOTE: This argument is mutually
                            exclusive with  arguments: [--no-context-
                            annotations, --expansions, --tweet-fields,
                            --media-fields, --place-fields, --counts-only,
                            --user-fields, --poll-fields].
  --expansions TEXT         Comma separated list of expansions to retrieve.
                            Default is all available.
  --tweet-fields TEXT       Comma separated list of tweet fields to retrieve.
                            Default is all available.
  --user-fields TEXT        Comma separated list of user fields to retrieve.
                            Default is all available.
  --media-fields TEXT       Comma separated list of media fields to retrieve.
                            Default is all available.
  --place-fields TEXT       Comma separated list of place fields to retrieve.
                            Default is all available.
  --poll-fields TEXT        Comma separated list of poll fields to retrieve.
                            Default is all available.
  --usernames               Treat the input values as usernames instead of
                            user ids.
  --hide-progress           Hide the Progress bar. Default: show progress,
                            unless using pipes.
  --help                    Show this message and exit.

version

Return the version of twarc that is installed.

Usage:

twarc2 version [OPTIONS]

Options:

  --help  Show this message and exit.