Category Archives: csd2220

Some experiments with PostgreSQL and simple Twitter analysis

A few months ago I prepared a bit of  material for the second year course “Software Development” in which I used Java and MongoDB to perform some simple Twitter data analysis. The material introduced MapReduce, JSON and other simple analysis tasks (hourly activity, geo location, most retweeted tweet, etc.). Recently, we installed a new virtual machine with PostgreSQL and Python. In parallel, some colleagues asked me to collect tweets about:

  1. The India’s Daughter documentary
  2. Elections in Nigeria.

The first dataset is approximately 1 GB of JSON data (250K tweets), while the second is approximately 11 GB of JSON data (2 million tweets). Is this something that can analysed using a standard SQL-based approach, on a default installation of PostgreSQL? How long will it take? The hardware is a virtual machine with 16 GB of RAM, 2 processors @2.2MHz, 120 GB of disk space. The operating system is Ubuntu 14.04, with PostgreSQL 9.3.6 and Python 2.7.

The starting point is a set of (gzipped) JSON files, each one containing 20,000 tweets. Instead of defining a PostgreSQL table with all the possible fields, I decided to follow the approach described here https://bibhas.in/blog/postgresql-swag-json-data-type-and-working-with-twitter-data/ creating exactly the same table with just two columns: the tweet ID (primay key), and the whole JSON tweet:

CREATE TABLE tweet
( tid CHARACTER VARYING NOT NULL, DATA json,
  CONSTRAINT tid_pkey PRIMARY KEY (tid) )

If you knew in advance the kind of analysis required it would be more efficient to import only those JSON entries that are required, say the field “created_at” if you were only interested in analysing traffic rates. After creating this table in PostgreSQL, let’s import the tweets. I do this using a simple Python script:

import json
import psycopg2
 
conn = psycopg2.connect("YOUR-CONNECTION-STRING")
cur = conn.cursor()
 
with open("YOURJSONFILE") as f:
  for tweetline in f:
    try:
      tweet = json.loads(tweetline)
    except ValueError, e:
      pass
    else:
      cur.execute("INSERT INTO tweet (tid, data) VALUES (%s, %s)", (tweet['id'], json.dumps(tweet), ))
 
conn.commit()
cur.close()
conn.close()

Exercise: handle the possible exceptions when executing the insert (non-unique key, for instance). Extend the script to read the file name from the command line and wrap everything in a bash script to iterate over the compressed JSON files.

I didn’t time this script, but it took a couple of minutes on the small dataset and probably around 10 minutes on the larger dataset. This could be improved by using a COPY instead of doing an INSERT, but I found this acceptable as it has to be done only once. Now that we have the data, let’s  try to stress a bit the machine. Let’s start with extracting the hourly activity in what is probably the most inefficient way one could think of: group by a substring of a JSON object converted to string, as follows:

SELECT SUBSTRING(data->> 'created_at'::text FROM 1 FOR 13), 
       COUNT(*) 
 FROM tweet GROUP BY 
   SUBSTRING(data->> 'created_at'::text FROM 1 FOR 13);

This extracts the “created_at” field from the JSON object, converts it to a string, and then it takes the substring from position 1 for 13 characters. The full “created_at” field is something like Tue Apr 07 17:21:04 +0000 2015, and the substring is therefore Tue Apr 07 17. Let’s call this on the small dataset (250K tweets) and ask psql to EXPLAIN ANALYZE the query:

explain ANALYZE select substring(data->> 'created_at'::text 
[...]
 Total runtime: 14602.845 ms

Not bad, just a bit more than 14 seconds! What about the large dataset? How long will it take on 2 million tweets?

explain ANALYZE select substring(data->> 'created_at'::text 
[...]
 Total runtime: 96553.320 ms

1 minute and a half, again pretty decent if you only run it once. I exported the result as a csv file and plotted, this is the result for the Nigeria elections:

nigeria-hourlyactivity

If you need to run multiple queries involving date and times it may be a bit boring to wait nearly 2 minutes each time; can the situation be improved? Yes, as I said above, if you know what you are looking for, then you can optimise for it. Let’s focus on the “created_at” field: instead of digging it out from the JSON object, we could alter the table and add a new column “created_at” of type timestamp and then populate it, as follows:

ALTER TABLE tweet ADD COLUMN created_at TIMESTAMP;
UPDATE tweet SET 
  created_at = to_timestamp(SUBSTRING(data->> 'created_at'::text FROM 5 FOR 9)||' 2015','
               Mon DD HH24 YYYY');

The update step will require a bit of time (some minutes, unfortunately I didn’t time it). But now the “group by” query above is executed in 458 ms (less than half a second) on 250K tweets and in 2.365 seconds on 2 million tweets.

Let’s try to extract the location of geo-tagged tweets. First of all, I count all of them on 250K tweets:

explain analyze select count(*) from tweet where data->>'geo' <> ''; 
[...] Total runtime: 14408.711 ms

For the large dataset the execution time is 147 seconds, a bit long but still acceptable for 2M tweets.

We can then extract the actual coordinates as follows:

SELECT data->'geo'->'coordinates'->0 AS lat, 
       data->'geo'->'coordinates'->1 AS lon 
  FROM tweet 
  WHERE data->>'geo' <> '';

You can run this query from the command line and generate a CSV file, as follows:

psql -t -A -F"," -c "select data->'geo'->'coordinates'->0 as lat, \
  data->'geo'->'coordinates'->1 as lon from tweet \
  where data->>'geo' <> '';" 

Exercise: redirect the output to a file, massage it a little bit to incorporate the coordinates in an HTML file using heatmap.js.

As usual, only approximately 1% of the tweets are geo-tagged. This means more or less 2,500 locations for the India’s Daughter dataset, and approximately 20K locations for the Nigerian elections. These are some plots obtained with these results.

indiasdaughter-allgeo

India’s Daughter: heatmap of geo-tagged tweets

nigeria-geotagged

Nigeria elections: heatmap (Nigeria scale)

world-nigeria-geotagged

Nigeria elections: heatmap (World view)

PostgreSQL can also be used to do text analysis. Again, we can use the approach described at https://bibhas.in/blog/postgresql-swag-part-2-indexing-json-data-type-and-full-text-search/. You can find more details about the queries used below at this other link: http://shisaa.jp/postset/postgresql-full-text-search-part-1.html. Let’s start by creating an index on the text of each tweet:

CREATE INDEX "idx_text" ON tweet USING gin(to_tsvector('english', data->>'text'));

The magic keywords here are gin (Generalized Inverted Index, which is a type of PostgreSQL index) and to_tsvector. This last function is a tokenizer for a string, and performs all the stemming using an English dictionary in this case (use the second link above if you want to know the details of all this). The index can be used to find tweets containing specific keywords, in the following way:

SELECT COUNT(tid) FROM tweet WHERE 
  to_tsvector('english', data->>'text') @@ to_tsquery('Buhari');

Notice the special operator @@ used to match a vector of tokens with a specific keyword (you could also use logical operators here!). More interestingly, we can use this approach to compute the most used words in the tweets, a typical MapReduce job:

SELECT * FROM ts_stat('SELECT to_tsvector(''english'',data->>''text'') from tweet') 
  ORDER BY nentry DESC;

This is a nested query: first we extract all the tokens, then we use the ts_stat function on all the tokens (see http://www.postgresql.org/docs/9.3/static/textsearch-features.html). The execution time is pretty reasonable: 28 seconds for 250K tweets and 168 seconds for 2M tweets. These are the top 10 words (with number of occurrences) for the India’s Daughter dataset (stemmed, with number of tweets in which they occur):

  • indiasdaught, 209773
  • india, 55879
  • ban, 49851
  • documentari, 40541
  • rape, 32269
  • bbc, 31186
  • watch, 25388
  • daughter, 19968
  • rapist, 20269
  • indian, 19417

These are the top 10 words for Nigerian elections dataset:

  • buhari, 894709
  • gej, 495743
  • jonathan, 337640
  • presid, 320757
  • gmb, 285770
  • vote, 273892
  • nigeria, 216294
  • elect, 221955
  • win, 182845
  • apc, 161968

How many tweets are retweets? We could take a shortcut and count the number of tweets starting with RT (we could also use the token RT, which is more efficient). Instead, let’s see what happens if we take the long way: we check whether the “id_str” of the “retweeted_status” field is not empty:

SELECT COUNT(*) FROM tweet 
   WHERE CAST(data->'retweeted_status'->'id_str' AS text) <> '';

The query takes a bit less than 20 seconds on 250K tweets and 124 seconds on 2M tweets. More than 162,000 tweets are retweets (64%) for the India’s Daughter dataset, and a bit more than 1,1M in the Nigerian elections dataset (56.5%).

We can also find the most retweeted tweets in each dataset. In the following inefficient query I group tweets by the id of the person being retweeted and I count the number of rows:

SELECT COUNT(*) AS nretweets, 
     MAX(CAST(data->'retweeted_status'->'user'->'screen_name' AS text)) 
        AS username, 
     CAST(data->'retweeted_status'->'id_str' AS text) AS retid, 
     MAX(CAST(data->'retweeted_status'->'text' AS text)) AS text 
  FROM tweet 
  GROUP BY retid 
  HAVING CAST(data->'retweeted_status'->'id_str' AS text) <> '' 
  ORDER BY nretweets DESC;

This is the most inefficient query so far: it takes 68 seconds on 250K tweets and 417 seconds (nearly 7 minutes) on 2M tweets.

These are the 2 most retweeted tweets among the 250K tweets on India’s Daughter:

  •  “Check out @virsanghvi’s take on #IndiasDaughter – https://t.co/BsxTkLK9Yy @adityan”, by mooziek, retweeted 1579 times
  • “Forget ban, #IndiasDaughter is must watch. Anyone who watches will understand devastation caused by regressive attitudes. Face it. Fix it.”, by chetan_bhagat, retweeted 1399 times

These are the 2 most retweeted tweets among the 2M tweets on Nigerian elections:

  • “‘Buhari’ is the fastest growing brand in Africa. RT if you agree”, by APCNigeria, retweeted 3460 time.
  • “Professor Joseph Chikelue Obi : ” The 2015 Nigerian Elections are Over. President Buhari must now quickly move into Aso Rock & Start Work “.”, by DrJosephObi, retweeted 3050 times.

Finally, let’s do a simple sentiment analysis using the vaderSentiment tool. This “is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media”. It is very easy to install using pip and very easy to use, as shown in the following piece of code:

from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
[...]
conn=psycopg2.connect("dbname='indiasdaughter'")
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur.execute("""SELECT (data->'text') as message from tweet""")
[...]
rows = cur.fetchall()
for row in rows:
    vs = vaderSentiment(row['message'])
    print row['message'],",",vs['compound']
    print vs['pos'],",",vs['neu'],",",vs['neg']

You can compute the average sentiment by taking the average of vs['compound']. It takes 1 minute and 30 seconds to run this task on 250K tweets, whose average sentiment is -0.17 (slightly negative). It takes 11 minutes and 32 seconds to run the same code on 2M tweets about the Nigerian elections; for this set the average sentiment is (slightly positive).

In conclusion:

  • You can use PostgreSQL to perform simple twitter analysis  on a dataset of 2 million tweets.
  • I have shown some query patterns. These are not optimised, I’m sure performance can be improved in a number of ways.
  • Should you use a NoSQL database, Hadoop, MapReduce etc.? Definitely yes, so that you can learn another approach. However, if you are only interested in the results of an off-line analysis, you are already familiar with standard databases and your dataset is of a million of tweets or thereabout, then a standard PostgreSQL installation would work just fine.
  • What about MySQL? I don’t know, it would be interesting to compare the results, especially with version 5.6 and full text search.

Calling Java from C++ on Mac OSX Mavericks with JNI and Java 8.

For a small research project I need to invoke Java from C++. More precisely, in the C++ code I need to create an object by calling a constructor and then invoke some methods. Surprisingly, I was not able to find a lot of information on-line. There are a number of tutorials to call C/C++ from Java, but not much for the other direction and very little for Java 8 on Mac. This is a very quick summary of what I’ve done, taken mainly from http://www.ibm.com/developerworks/java/tutorials/j-jni/j-jni.html and modified appropriately. Let’s start from a simple Java class, something like this:

package com.example.simple;
 
import java.util.List;
import java.util.ArrayList;
 
public class SimpleJNITest {
  List values;
 
  public SimpleJNITest() {
    values = new ArrayList();
  }
 
  public void addValue(String v) {
    values.add(v);
  }
 
  public int getSize() {
    return values.size();
  }
 
  public void printValues() {
    for (String v: values) {
      System.out.println("Value: " + v);
    }
}

This is nothing special: a simple Java class with a constructors and a few methods to add values to a list, get the list size, and print the values of the list to standard output. Save this file to SimpleJNITest.java in the directory com/example/simple (I assume you are working at the root of this directory). Now, let’s suppose you need to do the following from a C++ file:

  • Create a new object of type SimpleJNITest.
  • Add some values to this object.
  • Retrieve the number of values currently stored.
  • Invoke the printValues method to print values to screen.

The solution is the following (comments in the code. Make sure you read all the comments carefully before asking questions!).

/* Remember to include this */
#include <jni.h>
#include <cstring>
 
/* This is a simple main file to show how to call Java from C++ */
 
int main()
{
  /* The following is a list of objects provided by JNI.
     The names should be self-explanatory...
   */
  JavaVMOption options[1]; // A list of options to build a JVM from C++
  JNIEnv *env;
  JavaVM *jvm;
  JavaVMInitArgs vm_args; // Arguments for the JVM (see below)
 
  // This will be used to reference the Java class SimpleJNITest
  jclass cls;
 
  // This will be used to reference the constructor of the class
  jmethodID constructor;
 
  // This will be used to reference the object created by calling
  // the constructor of the class above:
  jobject simpleJNITestInstance;
 
  // You may need to change this. This is relative to the location
  // of the C++ executable
  options[0].optionString = "-Djava.class.path=.:./";
 
  // Setting the arguments to create a JVM...
  memset(&vm_args, 0, sizeof(vm_args));
  vm_args.version = JNI_VERSION_1_6;
  vm_args.nOptions = 1;
  vm_args.options = options;
 
 
  long status = JNI_CreateJavaVM(&jvm, (void**)&env, &vm_args);
 
  if (status != JNI_ERR) {
    // If there was no error, let's create a reference to the class.
    // Make sure that this is in the class path specified above in the
    // options array
    cls = env->FindClass("com/example/simple/SimpleJNITest");
 
    if(cls !=0) {
      printf("Class found \n");
      // As above, if there was no error...
 
      /* Let's build a reference to the constructor.
	 This is done using the GetMethodID of env.
	 This method takes 
	 (1) a reference to the class (cls)
	 (2) the name of the method. For a constructor, this is 
             <init>; for standard methods this would be the actual
             name of the method (see below).
	 (3) the signature of the method (its internal representation, to be precise). 
	     To get the signature the quickest way is to use javap. Just type:
             javap -s -p com/example/simple/SimpleJNITest.class
             And you will get the signatures of all the methods,
             including the constructor (check the "descriptor" field).
             In the case of the constructor, the signature is "()V", which
             means that the constructor does not take arguments and has no
             return value. See below for other examples.        
       */
      constructor = env->GetMethodID(cls, "<init>", "()V");
 
      if(constructor !=0 ) {
 
	printf("Constructor found \n");
	/* If there was no error, let's create an instance of the SimpleTestJNI by
	   calling its constructor
	*/
	jobject simpleJNITestInstance = env->NewObject(cls, constructor);
 
	if ( simpleJNITestInstance != 0 ) {
	  /* If there was no error, let's call some methods of the 
	     newly created instance
	  */
 
	  printf("Instance created \n");
 
	  // First of all, we create two Strings to be passed
	  // as argument to the addValue method.
	  jstring jstr1 = env->NewStringUTF("First string");
	  jstring jstr2 = env->NewStringUTF("Second string");
 
	  /* Then, we create a reference to the addValue method.
	     This is very similar to the reference to the constructor above.
	     As above, you can get the signature of the addValue method with javap.
	     In this case, the method takes a String as input and does not return
	     a value (V is for void)
	   */
 
	  jmethodID addValue = env->GetMethodID(cls, "addValue", "(Ljava/lang/String;)V");
 
	  /* Finally, let's call the method twice, with two different arguments.
	     (it would probably be a good idea to check for errors here... This is
	      left as a simple exercise to the student ;-).
	   */
	  env->CallObjectMethod(simpleJNITestInstance , addValue, jstr1);
	  env->CallObjectMethod(simpleJNITestInstance , addValue, jstr1);
 
	  /* Let's now call another Java method: printValues should print on screen
	     the content of the List of values in the object. The pattern is identical
	     to the one above, but no arguments are passed.
	  */
	  jmethodID printValues = env->GetMethodID(cls, "printValues", "()V");
	  env->CallObjectMethod(simpleJNITestInstance , printValues);
 
	  /* Finally, let's extract an int value from a method call. We need to
	     invoke CallIntMethod and we are going to use getSize of 
	     simpleJNITestInstance. Notice the signature: the 
	     method returns an int and it does not take arguments
	  */
	  jint listSize;
	  jmethodID getSize = env->GetMethodID(cls, "getSize", "()I");
	  listSize = env->CallIntMethod(simpleJNITestInstance , getSize);
 
	  printf("The size of the Java list is: %d\n", listSize);
 
        }
      }
      else {
	printf("I could not create constructor\n");
      }
    }
    jvm->DestroyJavaVM();
    printf("All done, bye bye!\n");
    return 0;
  }
  else
    return -1;
}

The code in itself is not too difficult. The main problem is getting the correct location of the various header files and libraries… To make the whole thing run, save the code above in a file (call it test.cpp) and do the following:

  • Compile the Java class: this is easy, just type
    javac com/example/simple/SimpleJNITest.java
  • Compile the C++ file: this is slightly more complicated because you need to specify various paths. The command on my machine (Mac OSX Mavericks, Java 1.8, gcc) is the following:
    g++  -o test \
     -I/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include \
     -I/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/darwin \
     -L/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/server/ \
     test.cpp \
     -ljvm
    

    You can remove the backslash at the end of each line if you write the command on a single line. The two -I directives tell where to find jni.h and jni_md.h; the -L (and -l) directive are used by the linker to find libjvm.dylib. If the compilation is successful, you should get an executable called test.

  • Before running test, you need to set LD_LIBRARY_PATH with the following instruction:
    export LD_LIBRARY_PATH=/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/server/
  • From the same terminal where you have executed the command above, just type ./test.

Et voila, job done, if everything is OK you should see something like this:

$ ./test 
Class found 
Constructor found 
Instance created 
Value: First string
Value: First string
The size of the Java list is: 2
All done, bye bye!

Building an action camera using a Raspberry Pi and Java

20140625_205058I’m in charge of preparing some material for the new second year “Software Development” course here at Middlesex. As part of the new Java course I thought it would be a good idea to start exploring the Raspberry Pi GPIO (General Purpose I/O) pins with Java. These pins are very easy to use in Python, but with Java they require a bit more work (not much, don’t worry and keep reading).

Instead of just doing the usual exercises with traffic lights and digital inputs, I thought that it would be a nice idea to build a more “concrete” application. As a result, I decided to build an action camera that could be mounted on my bike helmet (by pure chance soon after after GoPro IPO 30% increase…). My plan is to have:

  1. A digital input switch (to set the camera on/off)
  2. A couple of LEDs to show the status of the application
  3. The standard Raspberry camera (the quality is excellent!)
  4. A USB WiFi dongle to make the Raspberry Pi an access point. Well, I’m not planning to use the Raspberry Pi as a router, I just would like it to set up a wireless network to which one could connect with a phone or a laptop to download the videos that are captured (TODO: I would like to implement a Racket-based web server to view the videos, delete them, etc.).

The final result (camera mounted on helmet) is shown above. This is a video made with the above set-up and with an appropriate “Twinkle twinkle Little Star” tune, given the time at which it was taken:

https://www.youtube.com/watch?v=aFguIT2qkTs

This is a picture of the wiring, see below for details:

JvPiwiring

OK, let’s start. I assume you have a working Raspbian image, a wireless dongle that works with hostapd (see http://raspberry-at-home.com/hotspot-wifi-access-point/), an input switch, a couple of LEDs and some experience with Linux and, more importantly, with Java. I’m using Java 8, but version 7 should work fine as well, see http://www.rpiblog.com/2014/03/installing-oracle-jdk-8-on-raspberry-pi.html for installation instructions. First of all you need to familiarise with the GPIO pins. This is a close-up picture of the GPIO pins (with some pins connected):

JvPi-wiring-closeup

Forget about the clarity, simplicity and engineering beauty of Arduino pins…

  • First of all, there are no numbers on the pins. Check carefully the picture above and you should see “P1″ on one of them: this is the only number you’ll get on the board.
  • There are three ways (that I know) to number the GPIO pins, and in most cases numbering is not sequential (see below).
  • The numbering has changed between Revision 1 and Revision 2
  • Revision 2 has an additional set of pins (but these are only accessible on the P5 header: turn the Raspberry Pi upside down and look for small holes: this is the P5 header). In the picture above you can see two holes to the right of R2: this is the beginning of the P5 header.

Keeping all this in mind, have a look at the table available at this link: http://wiringpi.com/wp-content/uploads/2013/03/gpio1.png. The two central columns (header) provide a progressive numbering. The columns “Name” contain the labels that are called “Board” in Python GPIO (for instance: the fourth pin on the left column from the top is called GPIO-zero-seven and 0V means “ground”). The column BCM GPIO contains another numbering (this numbering has changed between revision 1 and revision 2; for instance, BCM pin 21 in revision 1 is BCM pin 27 in revision 2). Finally, there is a “WiringPi Pin” numbering and this is the one that we are going to use with Java below. If you carefully check the picture above you’ll see, from left to right that:

  • There is one green wire connected to 0 V (ground) on header 9 and a red wire connected to WiringPi pin 1 (corresponding to GPIO 01): these will  control the red LED.
  • There is a red wire connected to WiringPi pin 2 (corresponding to GPIO 02) and a yellow one to 0 V (ground) on header 14: these will control the green LED.
  • There is a white wire connected to WiringPi pin 14 (header 23) and a purple one to 0 V (ground) on header 25. These will be connected to the on/off switch using a PULL-UP resistor (more on this later).

If you didn’t give up reading and you reached this point: congratulations, we are nearly there :-). It is now time to go back to Java and the first thing you need to do is to download Pi4J (http://pi4j.com/), a  library to “provide a bridge between the native libraries and Java for full access to the Raspberry Pi“. Get the 1.0 snapshot available at https://code.google.com/p/pi4j/downloads/list and extract it somewhere. Add this location to your Java classpath when you compile and run the code below.

You are now ready to write your first Java application to control GPIO pins. Let’s start with a very simple loop to turn a LED on and off (the famous Blink example in Arduino):

import com.pi4j.io.gpio.GpioController;
import com.pi4j.io.gpio.GpioFactory;
import com.pi4j.io.gpio.GpioPinDigitalInput;
import com.pi4j.io.gpio.GpioPinDigitalOutput;
import com.pi4j.io.gpio.Pin;
import com.pi4j.io.gpio.PinPullResistance;
import com.pi4j.io.gpio.RaspiPin;
 
//[...] add your methods here, then:
  GpioController gpio = GpioFactory.getInstance();
  GpioPinDigitalOutput redLED = gpio.provisionDigitalOutputPin(RaspiPin.GPIO_01);
  while (true) {
    // Add a try/catch block around the following:
    redLED.high();
    Thread.sleep(1000);
    redLED.low();
    Thread.sleep(1000);
  }

In the code above, you first need to import a number of packages; then, you set a GPIO controller and define an output pin attached to WiringPi pin 1 (with RaspiPin.GPIO_01). Then, the infinite loop keeps turning the LED on and off. Have a look at the documentation available online for additional examples: Pi4J is really well realised and there are plenty of examples available. For our action camera we are going to connect a red LED to GPIO_01 and a green LED to GPIO_02. These are configured as output pins. We then need an input pin and, more importantly, we need to start (or stop) recording when the state of this input pin changes. Pi4J provides a very convenient interface to detect pin changes. In the following example, we first define an implementation of this interface in the OnOffListener class:

import com.pi4j.io.gpio.event.GpioPinListenerDigital;
import com.pi4j.io.gpio.event.GpioPinDigitalStateChangeEvent;
 
public class OnOffStateListener implements GpioPinListenerDigital {
 
        @Override
	public void handleGpioPinDigitalStateChangeEvent(GpioPinDigitalStateChangeEvent event) {
            // Just print on screen for the moment
            System.out.println("State has changed");
        }
}

We then attach this listener to an input pin, as follows:

  GpioPinDigitalInput onOffSwitch = gpio.provisionDigitalInputPin(RaspiPin.GPIO_14, PinPullResistance.PULL_UP);	
  onOffSwitch.addListener(new OnOffStateListener());

Here we first define an input pin for WiringPi pin 14 and then we attach the listener defined above to this pin. Note that I define the input with a PULL_UP resistor (if you don’t know what this means, have a look at the Arduino documentation before moving to the next step!). If you try this code, you should get a message every time the input pin changes its state.

Building the full application is now a matter of gluing together these pieces and some instructions to turn the video recording on or off at each state change of the input pin. This is the full code for the main application:

package uk.ac.mdx.cs.jvpi;
 
import com.pi4j.io.gpio.GpioController;
import com.pi4j.io.gpio.GpioFactory;
import com.pi4j.io.gpio.GpioPinDigitalInput;
import com.pi4j.io.gpio.GpioPinDigitalOutput;
import com.pi4j.io.gpio.Pin;
import com.pi4j.io.gpio.PinPullResistance;
import com.pi4j.io.gpio.RaspiPin;
import com.pi4j.io.gpio.event.GpioPinDigitalStateChangeEvent;
import com.pi4j.io.gpio.event.GpioPinListenerDigital;
 
public class JvPi {
 
	// This is the controller.
	private GpioController gpio;
 
	// The current pin mapping
	private static final Pin redPin =  RaspiPin.GPIO_01;
	private static final Pin greenPin = RaspiPin.GPIO_02;
	private static final Pin switchPin = RaspiPin.GPIO_14;
 
	// The pins to which we attach LEDs
	private GpioPinDigitalOutput red,green;
 
	// this is going to be an input PULL_UP, see below.
	private GpioPinDigitalInput onOffSwitch;
 
	// set to true when capturing
	private boolean capturing;
 
	// Main method
	public JvPi() {
		this.gpio = GpioFactory.getInstance();
		this.red = gpio.provisionDigitalOutputPin(redPin);
		this.green = gpio.provisionDigitalOutputPin(greenPin);
		this.onOffSwitch = gpio.provisionDigitalInputPin(switchPin, PinPullResistance.PULL_UP);
 
		// The listener takes care of turning on and off the camera and the red LED	
		onOffSwitch.addListener(new OnOffStateListener(this));
	}
 
	public boolean isCapturing() {
		return this.capturing;
	}
 
	public void toggleCapture() {
		this.capturing = !this.capturing;
	}
 
	public GpioPinDigitalOutput getRed() {
		return this.red;
	}
 
	public GpioPinDigitalOutput getGreen() {
		return this.green;
	}
 
	public static void main(String[] args) {
		JvPi jvpi = new JvPi();
		jvpi.getGreen().high();
		System.out.println("System started");
		while (true) {
			try {
				Thread.sleep(1000);
			} catch (InterruptedException e) {
				// TODO Auto-generated catch block
				e.printStackTrace();
			}
		}
	}
 
}

The main method simply creates a new instance of JvPi that, in turn, attaches a listener to the input WiringPi pin 14. This is the code for the listener:

package uk.ac.mdx.cs.jvpi;
 
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
 
import com.pi4j.io.gpio.event.GpioPinListenerDigital;
import com.pi4j.io.gpio.event.GpioPinDigitalStateChangeEvent;
 
public class OnOffStateListener implements GpioPinListenerDigital {
 
	private JvPi jvpi;
 
	private final String height = "720";
	private final String width = "960";
	private final String fps = "15";
	private final String destDir = "/home/pi/capture/";
 
	// Remember to add filename and extension!
	private final String startInstruction = "/usr/bin/raspivid -t 0 -h "+height+ " -w "+width+
			" -o "+destDir;
 
	private final String killInstruction = "killall raspivid";
 
	public OnOffStateListener(JvPi j) {
		this.jvpi = j;
	}
 
	@Override
	public void handleGpioPinDigitalStateChangeEvent(GpioPinDigitalStateChangeEvent event) {
        // display pin state on console
        if (this.jvpi.isCapturing()) {
          System.out.println("Killing raspivid");
          this.jvpi.getRed().low();
          killCapture();
        } else {
          System.out.println("Starting raspivid");
          this.jvpi.getRed().high();
          startCapture();
        }
        this.jvpi.toggleCapture();
 
    }
 
	private void killCapture() {
		executeCommand(this.killInstruction);
	}
 
	private void startCapture() {
		Date date = new Date() ;
		SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss");
		String filename = this.startInstruction + "vid-"+dateFormat.format(date) + ".h264";
		executeCommand(filename);
	}
 
	private void executeCommand(String cmd) {
		Runtime r = Runtime.getRuntime();
		try {
			r.exec(cmd);
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}
 
 
}

As you can see, the code invokes raspivid if it was not capturing and it kills the raspivid process if it was running (TODO: improve error checking :-)! A number of default options, such as resolution and frame rate, can be configured here. The video is recorded to a file whose name is obtained from the current system date and time.

I have used a very basic box to store everything and I have attached the box to the helmet using electric tape: this is definitely not the ideal solution, but it is good enough for a proof of concept.

As usual, drop me an email (or leave a comment) if you have questions!