Open and public archive of political social media talk

@tim and I were discussing how it would be useful to follow a political figure’s social media, without actually connecting to them on said network. It is important to know what they say, without signaling agreement or having to use a proprietary system.

In that vein, we are brainstorming methods to extract Twitter feeds into a usable format.

Here is an initial scan of possible tools to extract user feeds:

  • Archive My Tweets - Archive your tweets to easily browse and search them - all on your own website and in your control.
  • Codebird - Easy access to the Twitter REST API, Collections API, Streaming API, TON (Object Nest) API and Twitter Ads API, all from one PHP library.
  • Ozh’ Tweet Archiver - Import and archive your tweets with WordPress
  • Phirehose - PHP interface to Twitter Streaming API
  • Python Twitter - A Python wrapper around the Twitter API.
  • Tweepy - An easy-to-use Python library for accessing the Twitter API.
  • - A script to download all of a user’s tweets into a CSV file

Ideally we could use the streaming API for users to collect the messages and save them for publishing in a different system like WordPress, where we can leverage search and other tools for discovery.

The hosting for WordPress doesn’t really fit with the needs of a streaming capture service, and if we use WordPress to poll we run the risk of hitting a rate limit fairly quickly.

My initial thought for the pipeline is to set up something that saves the user stream into some storage, then periodically run a process that turns that data into a feed we can slurp up from WordPress.
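To make the "turn stored data into a feed" step concrete, here is a minimal sketch in Python using only the standard library. It assumes the stored tweets are dicts with `id`, `text`, and a timezone-aware `created_at`; those field names, the `build_rss` helper, and the choice of RSS 2.0 are assumptions for illustration, not something we have settled on.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(tweets, title, link):
    """Render stored tweets (a list of dicts) as an RSS 2.0 string.

    Hypothetical helper: the dict keys "id", "text" and "created_at"
    are assumed, not taken from any real storage schema.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Archived tweets"

    # Newest first, which is what most feed consumers expect.
    for tweet in sorted(tweets, key=lambda t: t["id"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "guid", isPermaLink="false").text = str(tweet["id"])
        ET.SubElement(item, "description").text = tweet["text"]
        # RSS wants RFC 822 style dates; format_datetime produces them.
        ET.SubElement(item, "pubDate").text = format_datetime(tweet["created_at"])

    # ElementTree escapes characters like "<" and "&" in the text for us.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {"id": 1, "text": "first", "created_at": datetime(2017, 1, 1, tzinfo=timezone.utc)},
        {"id": 2, "text": "second", "created_at": datetime(2017, 1, 2, tzinfo=timezone.utc)},
    ]
    print(build_rss(sample, "Archive", "https://example.org/"))
```

A periodic job could write this string to a static file, so WordPress only ever polls our own server and never touches the Twitter rate limit.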

Adding to this (because I’ve used this several times before):

I strongly agree. I could build everything we need for the Twitter API connection, fetching, and storage in a couple of hours. Then it’s just a matter of what format we want the output to be.

I’m thinking it’ll be a really simple service that will look into stored queries like:

  • Type: user, val: @timotheus, Fetch Interval: 30 min
  • Type: tag, val: #bluebeanie, Fetch Interval: 60 min

See if any are due to be fetched, and if so, fetch em! Do pagination to grab all results of each type back to the last ID we have recorded for that type (to get all tweets since last fetch), then store them.
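Roughly, the due check and the "paginate back to the last ID" loop could look like the sketch below. It is Python rather than PHP, and everything named here (`StoredQuery`, `fetch_new`, the `fetch_page` callback) is hypothetical. The callback stands in for whatever Twitter client we end up using; I'm assuming it behaves like since_id/max_id cursoring, returning tweets newest first with IDs greater than `since_id` and no greater than `max_id`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class StoredQuery:
    kind: str                  # "user" or "tag"
    value: str                 # e.g. "@timotheus" or "#bluebeanie"
    interval: timedelta        # how often to fetch
    last_fetched: Optional[datetime] = None
    last_id: int = 0           # highest tweet ID stored so far

    def is_due(self, now: datetime) -> bool:
        """A query is due if it has never run or its interval has elapsed."""
        return self.last_fetched is None or now - self.last_fetched >= self.interval


# Hypothetical signature: fetch_page(query, since_id, max_id) -> tweets, newest first.
FetchPage = Callable[[StoredQuery, int, Optional[int]], List[Dict]]


def fetch_new(query: StoredQuery, fetch_page: FetchPage, now: datetime) -> List[Dict]:
    """Page backwards until we reach the last ID recorded for this query."""
    collected: List[Dict] = []
    max_id: Optional[int] = None
    while True:
        page = fetch_page(query, query.last_id, max_id)
        if not page:
            break
        collected.extend(page)
        # Ask for strictly older tweets on the next page.
        max_id = min(t["id"] for t in page) - 1
    if collected:
        query.last_id = max(t["id"] for t in collected)
    query.last_fetched = now
    return collected


def run_due(queries: List[StoredQuery], fetch_page: FetchPage, now: datetime) -> List[Dict]:
    """Fetch every query that is due and return all newly seen tweets."""
    new_tweets: List[Dict] = []
    for query in queries:
        if query.is_due(now):
            new_tweets.extend(fetch_new(query, fetch_page, now))
    return new_tweets
```

A cron job calling `run_due` every few minutes would be enough; storing `last_id` per query is what keeps each run limited to tweets we have not seen yet.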

Then have an API interface where you pass GET parameters to get the stored tweet data out.
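As a sketch of what the GET interface might do with its parameters (Python, standard library only): `type` and `val` mirror the stored query fields above, while `since_id`, `limit`, and the `query_tweets` helper are assumptions I'm adding for illustration. The store here is just a list of dicts standing in for whatever storage we pick.

```python
from urllib.parse import parse_qs


def query_tweets(store, query_string):
    """Filter stored tweets using a raw GET query string.

    Hypothetical parameters: type, val, since_id, limit.
    """
    params = parse_qs(query_string)
    kind = params.get("type", [None])[0]
    value = params.get("val", [None])[0]
    since_id = int(params.get("since_id", ["0"])[0])
    limit = int(params.get("limit", ["50"])[0])

    matches = [
        t for t in store
        if (kind is None or t["type"] == kind)
        and (value is None or t["val"] == value)
        and t["id"] > since_id
    ]
    # Newest first, capped at the requested limit.
    matches.sort(key=lambda t: t["id"], reverse=True)
    return matches[:limit]


if __name__ == "__main__":
    store = [
        {"id": 1, "type": "user", "val": "@timotheus", "text": "a"},
        {"id": 2, "type": "tag", "val": "#bluebeanie", "text": "b"},
    ]
    # "#" must be URL-encoded as %23, or it is treated as a fragment by clients.
    print(query_tweets(store, "type=tag&val=%23bluebeanie"))
```

Passing `since_id` lets a consumer such as WordPress ask only for tweets newer than the last one it imported.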

A simple PHP cURL snippet I just wrote to get the final URL after redirects. It works!


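$d = [];
$d['initial_url'] = '';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $d['initial_url']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);    // set the Referer header automatically on redirects
curl_setopt($ch, CURLOPT_HEADER, 0);         // keep response headers out of the returned body
curl_setopt($ch, CURLOPT_TIMEOUT, 50);       // give up after 50 seconds
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);     // follow at most 10 redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow Location headers
curl_exec($ch); // the request has to run before the effective URL reflects any redirects
$d['final_url'] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
print_r($d);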

This outputs:

    [initial_url] =>
    [final_url] =>