Open and public archive of political social media talk

maiki · November 13, 2016, 10:30pm

@tim and I were discussing how it would be useful to follow a political figure’s social media, without actually connecting to them on said network. It is important to know what they say, without signaling agreement or having to use a proprietary system.

In that vein, we are brainstorming methods to extract Twitter feeds into a usable format.

Here is an initial scan of possible tools to extract user feeds:

Archive My Tweets - Archive your tweets to easily browse and search them - all on your own website and in your control.
Codebird - Easy access to the Twitter REST API, Collections API, Streaming API, TON (Object Nest) API and Twitter Ads API, all from one PHP library.
Ozh’ Tweet Archiver - Import and archive your tweets with WordPress
Phirehose - PHP interface to Twitter Streaming API
Python Twitter - A Python wrapper around the Twitter API.
Tweepy - An easy-to-use Python library for accessing the Twitter API.
tweet_dumper.py - A script to download all of a user’s tweets into a csv

Ideally we could use the streaming API for users to collect the messages and save them for publishing in a different system like WordPress, where we can leverage search and other tools for discovery.

The hosting for WordPress doesn’t really fit with the needs of a streaming capture service, and if we use WordPress to poll we run the risk of hitting a rate limit fairly quickly.

My initial thought for the pipeline is to set up something that saves the user stream into some storage, then periodically run a process that turns that data into a feed we can slurp up from WordPress.
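To make the last step of that pipeline concrete, here is a minimal sketch of turning stored tweets into an RSS 2.0 feed that a WordPress feed importer could poll. The function name `tweets_to_rss` and the stored-tweet shape (`id`, `user`, `text`, `created_at`) are assumptions for illustration, not something we have settled on.

```php
<?php
// Sketch only: the tweet array shape below is hypothetical.
// Each tweet: ['id' => string, 'user' => string, 'text' => string, 'created_at' => int (Unix time)]

function tweets_to_rss(array $tweets, string $channelTitle, string $channelLink): string
{
    // Escape text for safe inclusion in XML.
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    $items = '';
    foreach ($tweets as $t) {
        $url = 'https://twitter.com/' . rawurlencode($t['user']) . '/status/' . $t['id'];
        $items .= "    <item>\n"
            . '      <title>' . $esc($t['text']) . "</title>\n"
            . '      <link>' . $esc($url) . "</link>\n"
            . '      <guid isPermaLink="true">' . $esc($url) . "</guid>\n"
            . '      <pubDate>' . gmdate(DATE_RSS, $t['created_at']) . "</pubDate>\n"
            . "    </item>\n";
    }

    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        . "<rss version=\"2.0\">\n"
        . "  <channel>\n"
        . '    <title>' . $esc($channelTitle) . "</title>\n"
        . '    <link>' . $esc($channelLink) . "</link>\n"
        . '    <description>' . $esc($channelTitle) . "</description>\n"
        . $items
        . "  </channel>\n"
        . "</rss>\n";
}
```

The periodic process would read tweets from storage, call this, and write the result to a static file, so WordPress only ever polls our own server and never touches the Twitter rate limit.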

tim · November 14, 2016, 6:13pm

Adding to this (because I’ve used this several times before):

Twitter API 1.1 PHP wrapper - simple PHP wrapper around the Twitter API

I strongly agree. I could build everything we need for the Twitter API connection, fetching, and storage in a couple of hours. Then it’s just a matter of what format we want the output of that to be.

I’m thinking it’ll be a really simple service that will look into stored queries like:

Type: user, val: @timotheus, Fetch Interval: 30 min
Type: tag, val: #bluebeanie, Fetch Interval: 60 min

See if any are due to be fetched, and if so, fetch em! Do pagination to grab all results of each type back to the last ID we have recorded for that type (to get all tweets since last fetch), then store them.
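A rough sketch of those two steps, the "is it due?" check and the "page back to the last ID we have" loop. The array keys (`interval`, `last_fetched`) and the `$fetchPage` callable are placeholders I made up for illustration; `$fetchPage` stands in for whatever wrapper call actually hits the Twitter API.

```php
<?php
// Hypothetical stored-query shape, mirroring the examples above:
// ['type' => 'user', 'val' => '@timotheus', 'interval' => 30, 'last_fetched' => 0]
// 'interval' is in minutes; 'last_fetched' is a Unix timestamp (0 = never fetched).

/** Return the queries whose fetch interval has elapsed. */
function due_queries(array $queries, int $now): array
{
    return array_values(array_filter(
        $queries,
        fn(array $q): bool => $q['last_fetched'] + $q['interval'] * 60 <= $now
    ));
}

/**
 * Page backwards through results until we reach $sinceId.
 * $fetchPage takes a max ID (or null for the newest page) and
 * returns an array of tweets, newest first, each with an 'id' key.
 */
function fetch_since(callable $fetchPage, int $sinceId): array
{
    $collected = [];
    $maxId = null;

    while (true) {
        $page = $fetchPage($maxId);
        if ($page === []) {
            break; // nothing older left
        }
        foreach ($page as $tweet) {
            if ($tweet['id'] <= $sinceId) {
                return $collected; // reached tweets we already have
            }
            $collected[] = $tweet;
        }
        // Ask for strictly older tweets on the next page.
        $maxId = end($page)['id'] - 1;
    }

    return $collected;
}
```

After a successful fetch, the service would store the tweets, record the highest ID as the new "last ID" for that query, and update `last_fetched`.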

Then have an API interface where you pass GET parameters to get the stored tweet data out.
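One possible shape for that interface, as a sketch: a function that filters the stored tweets by whatever comes in on the query string. The parameter names (`user`, `since_id`, `limit`) and the tweet array shape are assumptions, nothing here is decided.

```php
<?php
// Hypothetical GET parameters: user, since_id, limit.
// Hypothetical tweet shape: ['id' => int, 'user' => string, 'text' => string]

function query_tweets(array $tweets, array $params): array
{
    $user    = isset($params['user']) ? ltrim((string) $params['user'], '@') : null;
    $sinceId = isset($params['since_id']) ? (int) $params['since_id'] : 0;
    // Clamp the page size so one request cannot dump the whole archive.
    $limit   = isset($params['limit']) ? max(1, min(200, (int) $params['limit'])) : 50;

    $matches = array_filter($tweets, function (array $t) use ($user, $sinceId): bool {
        if ($user !== null && strcasecmp($t['user'], $user) !== 0) {
            return false;
        }
        return $t['id'] > $sinceId;
    });

    // Newest first, then cap the result size.
    usort($matches, fn(array $a, array $b): int => $b['id'] <=> $a['id']);

    return array_slice($matches, 0, $limit);
}

// In the endpoint script this might be used as:
//   header('Content-Type: application/json');
//   echo json_encode(query_tweets($storedTweets, $_GET));
```

A request like `?user=@timotheus&since_id=100&limit=20` would then return up to 20 of that user's stored tweets newer than ID 100.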

tim · November 14, 2016, 6:38pm

A simple PHP cURL script I just wrote to get the final URL after redirects. It works!

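<?php

// Resolve a shortened URL to its final destination by following redirects.
$d = ['initial_url' => 'https://t.co/NOKpf0iHpR'];

$ch = curl_init();
curl_setopt_array($ch, [
    CURLOPT_URL            => $d['initial_url'],
    CURLOPT_RETURNTRANSFER => true,  // return the body instead of echoing it
    CURLOPT_FOLLOWLOCATION => true,  // follow Location: redirects
    CURLOPT_MAXREDIRS      => 10,    // give up after 10 hops
    CURLOPT_AUTOREFERER    => true,  // set Referer automatically on each hop
    CURLOPT_HEADER         => false, // keep headers out of the returned body
    CURLOPT_TIMEOUT        => 50,    // seconds
    // TLS certificate verification is left at its default (enabled).
]);

if (curl_exec($ch) === false) {
    exit('cURL error: ' . curl_error($ch) . PHP_EOL);
}

// The effective URL is the last URL cURL requested after all redirects.
$d['final_url'] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
print_r($d);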

This outputs:

Array
(
    [initial_url] => https://t.co/NOKpf0iHpR
    [final_url] => https://tim.hithlonde.com/2016/announcing-js-space/
)
