Plugins

twarc v1 collected a set of utilities for working with tweet JSON in the utils directory of the git repository. This was a handy way to develop and share snippets of code, but it had drawbacks. The utilities had different dependencies that weren't managed in a uniform way, and some had slightly different interfaces. They also had to be downloaded from GitHub manually, so they were only accessible at the command line if you remembered where you put them.

With twarc2 these utilities are now installable as plugins, which are made available as subcommands using the same twarc2 command line. Plugins are published separately from twarc on PyPI and are installed with pip. Here is a list of some known plugins (if you write one please let us know so we can add it to this list):

twarc-ids: a simple example of printing the ids for tweets to use as a reference for creating plugins
twarc-csv: export tweets to CSV, which is probably the first thing a researcher will want to do
twarc-videos: extract videos from tweets
twarc-network: visualize tweets and users as a network graph
twarc-timeline-archive: routinely download tweet timelines for a list of users
twarc-hashtags: create a report of hashtags that are used in collected tweet data
Write your own, and let us know so we can add it here!

Writing a Plugin

The twarc-ids plugin provides an example of how to write plugins. This reference plugin simply reads collected tweet JSON data and writes out the tweet identifiers. First you install the plugin:

pip install twarc-ids

and then you use it:

twarc2 ids tweets.json > ids.txt

Internally, twarc's command line is implemented using the click library, and the click-plugins module manages twarc2 plugins. To write a plugin, import click and implement it as you would any other click utility, for example:

import json
import click

@click.command()
@click.argument('infile', type=click.File('r'), default='-')
@click.argument('outfile', type=click.File('w'), default='-')
def ids(infile, outfile):
    """
    Extract tweet ids from tweet JSON.
    """
    for line in infile:
        tweet = json.loads(line)
        click.echo(tweet['data']['id'], file=outfile)

Note that the plugin reads from an input file, infile, and writes to an output file, outfile, which default to stdin and stdout respectively. This allows plugin utilities to be used as part of pipelines. If your plugin needs options, you can add them using the standard facilities that click provides.
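As a rough illustration of adding an option, the sketch below extends the example with a --unique flag. The flag is hypothetical and not part of twarc-ids. The sketch also assumes that the "data" key of each line may hold either a single tweet or a list of tweets, and handles both.

```python
import json
import click


@click.command()
@click.option('--unique', is_flag=True, help='Only write each id once.')
@click.argument('infile', type=click.File('r'), default='-')
@click.argument('outfile', type=click.File('w'), default='-')
def ids(unique, infile, outfile):
    """
    Extract tweet ids from tweet JSON, optionally skipping repeated ids.
    """
    seen = set()
    for line in infile:
        line = line.strip()
        if not line:
            # skip blank lines rather than failing on them
            continue
        data = json.loads(line)['data']
        # Assumption: "data" is either one tweet or a list of tweets.
        tweets = data if isinstance(data, list) else [data]
        for tweet in tweets:
            tweet_id = tweet['id']
            if unique and tweet_id in seen:
                continue
            seen.add(tweet_id)
            click.echo(tweet_id, file=outfile)
```

Because both arguments still default to stdin and stdout, a command like this could sit in the middle of a shell pipeline just as the original does.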

If your plugin needs to talk to the Twitter API, add the @click.pass_obj decorator, which ensures that the first parameter of your function is a Twarc2 client configured with the user's API keys.

@click.command()
@click.argument('infile', type=click.File('r'), default='-')
@click.argument('outfile', type=click.File('w'), default='-')
@click.pass_obj
def ids(twarc_client, infile, outfile):
    # do something with the twarc client here
    pass
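To make the skeleton above a little more concrete, here is a minimal sketch of a plugin that uses the client. The command name hydrate_ids is hypothetical, and the sketch assumes the client offers a tweet_lookup() method that accepts an iterable of tweet ids and yields pages of results. Check the twarc client documentation before relying on that call.

```python
import json
import click


@click.command()
@click.argument('infile', type=click.File('r'), default='-')
@click.argument('outfile', type=click.File('w'), default='-')
@click.pass_obj
def hydrate_ids(twarc_client, infile, outfile):
    """
    Look up the tweet ids listed in INFILE and write each response as a JSON line.
    """
    # one id per line, ignoring blank lines
    tweet_ids = (line.strip() for line in infile if line.strip())
    # Assumption: tweet_lookup() yields one dict per page of results.
    for page in twarc_client.tweet_lookup(tweet_ids):
        click.echo(json.dumps(page), file=outfile)
```

Since the client arrives as an ordinary function parameter, a plugin written this way can be exercised in tests with a stand-in object instead of a real API connection.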

Finally you just need to create a setup.py file for your project that looks something like this:

import setuptools

setuptools.setup(
    name='twarc-ids',
    version='0.0.1',
    url='https://github.com/docnow/twarc-ids',
    author='Ed Summers',
    author_email='ehs@pobox.com',
    py_modules=['twarc_ids'],
    description='A twarc plugin to read Twitter data and output the tweet ids',
    install_requires=['twarc'],
    setup_requires=['pytest-runner'],
    tests_require=['pytest'],
    entry_points='''
        [twarc.plugins]
        ids=twarc_ids:ids
    '''
)

The key part here is the entry_points section which is what allows twarc2 to discover twarc.plugins dynamically at runtime, and also defines how the subcommand maps to the plugin's function.
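To see what an entry point group looks like from Python, the following sketch lists whatever is registered under twarc.plugins in the current environment using only the standard library. It is meant to illustrate the packaging mechanism, not to show how twarc2 itself loads plugins, and find_plugins is a hypothetical helper name.

```python
from importlib.metadata import entry_points


def find_plugins(group='twarc.plugins'):
    """Return {subcommand name: 'module:attribute'} for a group of entry points."""
    eps = entry_points()
    if hasattr(eps, 'select'):
        # Python 3.10 and later: filter the entry points by group
        found = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: a plain dict keyed by group name
        found = eps.get(group, [])
    return {ep.name: ep.value for ep in found}


if __name__ == '__main__':
    for name, target in sorted(find_plugins().items()):
        print(f'{name} -> {target}')
```

With twarc-ids installed, the listing would be expected to include an ids entry pointing at twarc_ids:ids, matching the entry_points section of the setup.py above.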

It's good practice to include a test or two for your plugin to ensure it keeps working over time. The twarc-ids plugin includes an example of how to test command line utilities easily with click.
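A test along these lines can be sketched with click's CliRunner, which invokes a command in-process and captures its output. The command is repeated here only so the example is self-contained. In a real plugin you would import it from your module instead, and the file name tweets.jsonl is just an illustration.

```python
import json
import click
from click.testing import CliRunner


# Stand-in for "from twarc_ids import ids" so this sketch runs on its own.
@click.command()
@click.argument('infile', type=click.File('r'), default='-')
@click.argument('outfile', type=click.File('w'), default='-')
def ids(infile, outfile):
    for line in infile:
        tweet = json.loads(line)
        click.echo(tweet['data']['id'], file=outfile)


def test_ids():
    runner = CliRunner()
    # work in a temporary directory so the test leaves no files behind
    with runner.isolated_filesystem():
        with open('tweets.jsonl', 'w') as fh:
            fh.write(json.dumps({'data': {'id': '123'}}) + '\n')
            fh.write(json.dumps({'data': {'id': '456'}}) + '\n')
        result = runner.invoke(ids, ['tweets.jsonl'])
    assert result.exit_code == 0
    assert result.output.splitlines() == ['123', '456']
```

A function named test_ids in a file such as test_twarc_ids.py would be picked up by pytest, which the setup.py above already lists under tests_require.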

To publish your plugin on PyPI:

pip install twine
python setup.py sdist
twine upload dist/*
# enter pypi login details