Who is talking about pizza in London?

Some of my students are interested in using Twitter data for their projects, and some others are interested in developing network applications. In this post I show how to retrieve data from the Twitter stream and how to manipulate it with server-side JavaScript using node.js. In particular, I will build a stream of all the geo-tagged tweets from London talking about pizza.

DISCLAIMER: I am not an advocate of node.js. The same thing could be done in Python, Ruby, you-name-it. I am using node.js because 1) my students are familiar with Java and 2) the code is very short.

OK, let’s start with a quick introduction to the Twitter stream API: there is a sample stream available from https://stream.twitter.com/1.1/statuses/sample.json. This stream returns a random sample of tweets (notice: this is a lot of data!). You can connect to this stream in a number of ways; the easiest is from the command line using curl (this works on Mac and Linux, I don’t know about Windows. You obviously need to have a Twitter account for this):

$ curl --user YOURUSERNAME:YOURPASSWORD https://stream.twitter.com/1.1/statuses/sample.json

This will display a long list of tweets in JSON format. You can even point your browser to this address and see what happens. From the command line you could redirect the output to a file to be processed at a later stage, but it is probably a better idea to retrieve only the tweets that are relevant to you.
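If you do redirect the curl output to a file, a few lines of node.js are enough to read it back. The sketch below assumes that the stream delivers one JSON object per line, with blank lines sent now and then to keep the connection open, and that anything without a text field (a delete notice, for example) is not a tweet. The function name parseTweetLines and the sample data are made up for illustration.

```javascript
// Turns raw streaming output (one JSON object per line) into an array of tweets.
function parseTweetLines(raw) {
  var tweets = [];
  raw.split('\n').forEach(function (line) {
    line = line.trim();            // also drops a trailing '\r'
    if (line === '') { return; }   // blank keep-alive line
    var obj;
    try {
      obj = JSON.parse(line);
    } catch (e) {
      return;                      // e.g. a line cut off when curl was stopped
    }
    if (obj && typeof obj.text === 'string') {
      tweets.push(obj);            // skips non-tweet messages such as delete notices
    }
  });
  return tweets;
}

// Hand-written stand-in for a few lines of saved output.
var saved = [
  '{"text":"Pizza tonight?","user":{"name":"Ann"}}',
  '',
  '{"delete":{"status":{"id":1}}}',
  '{"text":"Stuck on the Tube","user":{"name":"Bob"}}',
  '{"text":"cut off in the mid'
].join('\r\n');

var tweets = parseTweetLines(saved);
tweets.forEach(function (t) {
  console.log(t.user.name + ': ' + t.text);
});
```

With a real file you would pass in its contents instead of the sample string, for instance the result of require('fs').readFileSync('tweets.json', 'utf8'), where tweets.json is whatever name you redirected the output to.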

In my case, I am interested in tweets originating from London and containing certain keywords. Twitter provides an end-point for this, see https://dev.twitter.com/docs/api/1.1/post/statuses/filter. In particular, one can retrieve all the tweets containing specific keywords, or originating from a certain location (assuming the user provides this information). Again, there are a number of ways to connect to this end-point. If you want to use Python, there is a very simple class called tweetstream that can help you with this; otherwise, you can still use curl passing the parameters in the appropriate way:

$ curl --user YOURUSERNAME:YOURPASSWORD -d "track=pizza" https://stream.twitter.com/1.1/statuses/filter.json
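The string passed with -d is an ordinary form-encoded body, so other parameters of the filter end-point are added in the same way. As a small illustration, the sketch below uses the querystring module that ships with node.js to build a body containing both a keyword and the London bounding box used later in this post. It only prints the string and does not contact Twitter.

```javascript
var querystring = require('querystring');

// Bounding box used later in the post (longitude, latitude pairs).
var london = ['-0.489', '51.28', '0.236', '51.686'];

// Build the form-encoded body for the filter end-point.
var body = querystring.stringify({
  track: 'pizza',
  locations: london.join(',')
});

console.log(body);
```

The commas in the bounding box are percent-encoded, which is why it is easier to let a library build the string than to type it by hand.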

Irrespective of the technology you use to connect, there are two key things to keep in mind:

  1. The filtered stream is not a sample, it’s the full set of tweets satisfying your query (subject to certain limits).
  2. If you provide more than one condition (e.g. a location and a list of keywords), the API will return all the tweets satisfying either of the two. If you want the tweets that satisfy both conditions (e.g., all the tweets talking about pizza in London) you need to retrieve all the tweets from London, and then filter them locally. This is what I am going to do below.
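To make the second point concrete, here is a minimal sketch of a local filter that keeps a tweet only if both conditions hold. It assumes the usual layout of a tweet in the 1.1 API, where an exact position is stored in tweet.coordinates.coordinates as [longitude, latitude]; tweets that are tagged only with a place and have no exact point are simply rejected here. The function names and the sample objects are invented for the example.

```javascript
// Bounding box: south-west corner first, then north-east,
// each given as longitude, latitude.
var london = ['-0.489', '51.28', '0.236', '51.686'];

// True if the tweet carries an exact point that falls inside the box.
function isInsideBox(tweet, box) {
  if (!tweet.coordinates || !tweet.coordinates.coordinates) {
    return false; // no exact point (this sketch ignores place-only tweets)
  }
  var lon = tweet.coordinates.coordinates[0];
  var lat = tweet.coordinates.coordinates[1];
  return lon >= Number(box[0]) && lat >= Number(box[1]) &&
         lon <= Number(box[2]) && lat <= Number(box[3]);
}

// True if the text of the tweet contains the keyword, ignoring case.
function mentions(tweet, keyword) {
  return typeof tweet.text === 'string' &&
         tweet.text.toLowerCase().indexOf(keyword.toLowerCase()) > -1;
}

// Both conditions at once: this is the "and" that the API does not offer.
function matchesBoth(tweet, box, keyword) {
  return isInsideBox(tweet, box) && mentions(tweet, keyword);
}

// Hand-made objects standing in for real tweets.
var sample = [
  { text: 'Best PIZZA in Soho', coordinates: { type: 'Point', coordinates: [-0.13, 51.51] } },
  { text: 'Pizza in Rome', coordinates: { type: 'Point', coordinates: [12.49, 41.9] } },
  { text: 'Rainy day in London', coordinates: { type: 'Point', coordinates: [-0.1, 51.5] } },
  { text: 'pizza, somewhere', coordinates: null }
];

var hits = sample.filter(function (t) {
  return matchesBoth(t, london, 'pizza');
});
console.log(hits.length + ' of ' + sample.length + ' sample tweets match both conditions');
```

When the stream is already restricted to London by the API, only the keyword check is needed on your side.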

Let’s now move to retrieving the tweets and displaying them in real time. Download node.js for your platform from this link: http://nodejs.org/download/ (I am using the binary version 0.10.1)

Extract the archive and try to write a simple http server. This is taken from http://nodejs.org/ and is a simple http server running on port 8080 and returning the string Hello World to all requests:

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8080, 'YOUR IP ADDRESS HERE');
console.log('Server running at http://YOUR IP ADDRESS HERE:8080/');

(Obviously, put your IP address in the right place and make sure no firewall is blocking port 8080.) Save this code in a file, for instance test.js, and then run it with:

$ /path/to/node/bin/node test.js

Point your browser to the IP address on port 8080 and you should see Hello World!

The next step is to connect to the Twitter streaming API using node.js. To this end, I use this client: https://github.com/ttezel/twit. Just get it and install it (follow the on-line instructions, it should be very easy).

Before connecting to the Twitter API, go to https://dev.twitter.com/apps/new and create an application. You need to get a consumer_key, a consumer_secret, an access_token, and an access_token_secret. At this point you can connect to the Twitter streaming filter with the following code:

var Twit = require('twit')
var T = new Twit({
    consumer_key:         '...'
  , consumer_secret:      '...'
  , access_token:         '...'
  , access_token_secret:  '...'
})
var london =  [ '-0.489','51.28','0.236','51.686']
var stream = T.stream('statuses/filter', { locations: london })
stream.on('tweet', function (tweet) {
  console.log(tweet)
})

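Fill in your keys, save the file and try to run it: you should see a stream of tweets from London. The bounding box lists the longitude and latitude of the south-west corner followed by those of the north-east corner, and is derived from OpenStreetMap.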

I now put the two things together: I will stream the tweets using an http server. The code is the following:

var Twit = require('twit')
var http = require('http')
var url = require("url")
 
var T = new Twit({
    consumer_key:         '...'
  , consumer_secret:      '...'
  , access_token:         '...'
  , access_token_secret:  '...'
})
 
var london = [ '-0.489','51.28','0.236','51.686']
var stream = T.stream('statuses/filter', { locations: london })
 
var server = http.createServer(function (req, res) {
  var uri = url.parse(req.url).pathname;
  if(uri === "/pizzastream") {
    console.log("Request received");
    stream.on('tweet', function(tweet) {
      if(tweet.text.toLowerCase().indexOf("pizza") > -1) {
        res.write(tweet.user.name+" says: "+tweet.text);
      }
    });
  }
});
server.listen(8080);

The code works as follows: line 13 creates a variable called stream that holds the stream originating from the Twitter API. Line 15 creates a variable called server for the http server. In lines 16 and 17 we check whether the requested URL is /pizzastream. If this is the case, every time there is new data on the Twitter stream (line 19), if the text of the tweet contains the word “pizza” (line 20), I write the name of the user and the text of the tweet to the http response (line 21). Notice how the values of the various fields are accessed using tweet.text and tweet.user.name (other fields are available, check the full JSON output). You can check the actual output by pointing your browser to the address of the machine running this code, for instance http://192.168.0.10:8080/pizzastream.

The code should be improved to include an “else” statement for the if condition in line 17. Also, the stream should be kept “alive” when there is no data: have a look at the documentation available at this link: http://nodejs.org/api/stream.html
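One possible way to make these improvements is sketched below; treat it as a starting point rather than a finished solution. It assumes that the object returned by T.stream behaves like a standard node.js EventEmitter, so that a listener can be removed again with removeListener. The function name makePizzaHandler, the 404 reply and the 30 second keep-alive interval are my own choices. Unlike the code above, it also ends each tweet with a newline. The last part of the block replaces the Twitter stream and the http response with simple stand-ins, so that it can be run without keys or a network connection.

```javascript
var url = require('url');
var EventEmitter = require('events').EventEmitter;

// Builds a request handler around any object that emits 'tweet' events.
function makePizzaHandler(stream, keepAliveMs) {
  return function (req, res) {
    var uri = url.parse(req.url).pathname;
    if (uri !== '/pizzastream') {
      // The missing "else" branch: answer and close instead of hanging.
      res.writeHead(404, {'Content-Type': 'text/plain'});
      res.end('Not found\n');
      return;
    }
    res.writeHead(200, {'Content-Type': 'text/plain; charset=utf-8'});

    function onTweet(tweet) {
      if (tweet.text.toLowerCase().indexOf('pizza') > -1) {
        res.write(tweet.user.name + ' says: ' + tweet.text + '\n');
      }
    }
    stream.on('tweet', onTweet);

    // Write a newline from time to time so the connection is never idle for long.
    var timer = setInterval(function () { res.write('\n'); }, keepAliveMs || 30000);

    // Once the connection is gone, stop the timer and detach from the stream.
    res.on('close', function () {
      clearInterval(timer);
      stream.removeListener('tweet', onTweet);
    });
  };
}

// Offline check with stand-ins for the Twitter stream and the http response.
var fakeStream = new EventEmitter();
var fakeRes = new EventEmitter();
var written = [];
fakeRes.writeHead = function () {};
fakeRes.write = function (chunk) { written.push(chunk); };
fakeRes.end = function (chunk) { if (chunk) { written.push(chunk); } };

makePizzaHandler(fakeStream, 60000)({ url: '/pizzastream' }, fakeRes);
fakeStream.emit('tweet', { text: 'Pizza time', user: { name: 'Ann' } });
fakeStream.emit('tweet', { text: 'Tea time', user: { name: 'Bob' } });
fakeRes.emit('close'); // simulates the browser disconnecting
fakeStream.emit('tweet', { text: 'More pizza', user: { name: 'Cat' } });
console.log(written);
```

With the stream and http variables from the code above, the server would then be started with http.createServer(makePizzaHandler(stream)).listen(8080). Removing the listener when the connection closes matters because otherwise every request adds one more listener to the stream, and they are never released.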
