Category Archives: software

Exploring the Bitcoin blockchain using Java

[This is a short summary of material that I prepared for final year project students]

I assume that you already have a vague idea of what a bitcoin is and you have a simple understanding of the mechanisms behind transactions: payments are made to addresses (that are anonymous, in the sense that they cannot be directly linked to a specific individual), and all transactions are public. Transactions are collected in blocks, and blocks are chained together in the blockchain.

You can think of the blockchain as a big database that is continuously updated and is accessible to everyone. You can download the full blockchain using a software like Bitcoin Core. After installing the software, it will take a couple of weeks for your installation to synchronise. Notice that, at the time of writing, the blockchain has a size of over 130 Gb, take this into consideration…

If you have blockchain data available (not necessarily the whole blockchain, you can also work on subsets of it), it can be analysed using Java. You could do all the work from scratch and read raw data from the files etc. Let’s skip this step and use a library instead. There are several options available in most programming languages. I’m going to use Java and the bitcoinj library. This is a big library that can be used to build applications like wallets, on-line payments, etc. I am going to use just its parsing features for this post.

First of all download the jar file for the library at https://bitcoinj.github.io/ (I’m using https://search.maven.org/remotecontent?filepath=org/bitcoinj/bitcoinj-core/0.14.4/bitcoinj-core-0.14.4-bundled.jar). Then, download SLF4J (https://www.slf4j.org/download.html), extract it, and get the file called slf4j-simple-x.y.z.jar (in my case: slf4j-simple-1.7.25.jar). Add these two jar files to your classpath and you are ready to go.

Let’s start from a simple example: compute (and then plot) the number of transactions per day. This is the code, heavily commented, just go through it.

import java.io.File;
import java.text.SimpleDateFormat;
import java.util.LinkedList;
import java.util.List;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
 
import org.bitcoinj.core.Block;
import org.bitcoinj.core.Context;
import org.bitcoinj.core.NetworkParameters;
import org.bitcoinj.core.Transaction;
import org.bitcoinj.params.MainNetParams;
import org.bitcoinj.utils.BlockFileLoader;
 
 
public class SimpleDailyTxCount {
 
	// Location of block files. This is where your blocks are located.
	// Check the documentation of Bitcoin Core if you are using
        // it, or use any other directory with blk*dat files. 
	static String PREFIX = "/path/to/your/bitcoin/blocks/";
 
        // A simple method with everything in it
	public void doSomething() {
 
		// Just some initial setup
		NetworkParameters np = new MainNetParams();
		Context.getOrCreate(MainNetParams.get());
 
		// We create a BlockFileLoader object by passing a list of files.
		// The list of files is built with the method buildList(), see
		// below for its definition.
		BlockFileLoader loader = new BlockFileLoader(np,buildList());
 
		// We are going to store the results in a map of the form 
                // day -> n. of transactions
		Map<String, Integer> dailyTotTxs = new HashMap<>();
 
		// A simple counter to have an idea of the progress
		int blockCounter = 0;
 
		// bitcoinj does all the magic: from the list of files in the loader
		// it builds a list of blocks. We iterate over it using the following
		// for loop
		for (Block block : loader) {
 
			blockCounter++;
			// This gives you an idea of the progress
			System.out.println("Analysing block "+blockCounter);
 
			// Extract the day from the block: we are only interested 
                        // in the day, not in the time. Block.getTime() returns 
                        // a Date, which is here converted to a string.
			String day = new SimpleDateFormat("yyyy-MM-dd").format(block.getTime());
 
			// Now we start populating the map day -> number of transactions.
			// Is this the first time we see the date? If yes, create an entry
			if (!dailyTotTxs.containsKey(day)) {
				dailyTotTxs.put(day, 0);
			}
 
			// The following is highly inefficient: we could simply do
			// block.getTransactions().size(), but is shows you
			// how to iterate over transactions in a block
			// So, we simply iterate over all transactions in the
			// block and for each of them we add 1 to the corresponding
			// entry in the map
			for ( Transaction tx: block.getTransactions() ) {		    	
				dailyTotTxs.put(day,dailyTotTxs.get(day)+1);
			}
		} // End of iteration over blocks
 
		// Finally, let's print the results
		for ( String d: dailyTotTxs.keySet()) {
			System.out.println(d+","+dailyTotTxs.get(d));
		}
	}  // end of doSomething() method.
 
 
	// The method returns a list of files in a directory according to a certain
	// pattern (block files have name blkNNNNN.dat)
	private List<File> buildList() {
            List<File> list = new LinkedList<File>();
            for (int i = 0; true; i++) {
                File file = new File(PREFIX + String.format(Locale.US, "blk%05d.dat", i));
                if (!file.exists())
                    break;
                list.add(file);
            }
	    return list;
	}
 
 
	// Main method: simply invoke everything
	public static void main(String[] args) {
		SimpleDailyTxCount tb = new SimpleDailyTxCount();
		tb.doSomething();
	}
 
}

This code will print on screen a list of values of the form “date, number of transactions”. Just redirect the output to a file and plot it. You should get something like this (notice the nearly exponential growth in the number of daily transactions):

dailyTxsLog

Number of transactions per day (log scale)

I am very impressed by the performance of the library: scanning the whole blockchain with the code above took approximately 35 minutes on my laptop (2014 MacBook Pro), with the blockchain stored on an external HD connected using a USB2 port. It took approximately 100% of one processor and 1 Gb of RAM at most.

A slightly more complicated example took 55 minutes: computing the daily distribution of transaction size. This requires adding a further loop in the code above to explore all the transaction outputs (and a few counters along the way). The buckets are 0-10 USD, 10-50 USD, 50-200 USD, 200-500 USD, 500-2000 USD, 2000+ USD (BTC/USD exchange rate computed by taking the average of opening and close value for the day).

TxSize

Tx size distribution (in USD), October 2011 – July 2017

BBC micro:bit: a very short tutorial

[DISCLAIMER: this is an overly simplified summary, partially taken from a presentation that I gave to a PGCE course. Please refer to https://www.microbit.co.uk/ for proper documentation!]

1071px-bbc_microbit

(source: Wikipedia)

The BBC micro:bit is a microcontroller. You can think of a microcontroller as a simple device that has a set of inputs, a set of outputs, and a piece of code in the middle (called the firmware) that can read the input and write to output. The micro:bit has several input and output options, including: two buttons labelled A and B, 25 LEDs organised in a 5×5 matrix, accelerometer, compass, etc. Writing a program for the micro:bit means creating a firmware to control the output, usually as a reaction to certain input.

You can write programs for the micro:bit using various approaches, some of them are visual (similar to Scratch), some others are textual. This may be a bit confusing at first, but it gives you the opportunity to work with different age groups and to achieve several of the points in the new National Curriculum for Computing, such as “use 2 or more programming languages, at least one of which is textual” in Key Stage 3.

Remember that “writing a program” here means creating a new firmware for the microcontroller. This is done in two steps: first, the code is written and compiled on a standard computer. In fact, you write the code in a browser and you compile it by pressing a button. The result of this operation is a file with .hex extension that you download on your computer. Then, the file needs to be copied on the micro:bit. This last step is very simple: it is enough to connect the micro:bit with its USB cable and copy the file on it.

Suppose that you want to build an application that turns ON the top left LED  when the A button is pressed, and it turns it off when button B is pressed. If you browse to https://www.microbit.co.uk/create-code you will find four different ways to write this program. Let’s start with what I think is the simplest approach: the Microsoft Block Editor. Click on  New Project and you will get an environment that is very similar to Scratch.

The difficult point here is to “design, write and debug [a program] that accomplish[es] specific goals, including controlling or simulating physical systems”, as mentioned in the national curriculum for Key Stage 2. The student should realise that to obtain the behaviour described above some form of loop is needed, together with conditionals (or a mechanism to handle events, which is another option here). The following figure shows a possible solution using a forever loop and two if blocks.

msblockeditorIf you press run you can “test” the program in the browser. If you press compile, you will be able to download a file with .hex extension. Connect the micro:bit to your computer using the USB cable, copy the .hex file that you have downloaded to the micro:bit et voila, the microcontroller is now running the code that you wrote! It will run it forever, until you replace it with another .hex file. You can now disconnect the micro:bit from the computer. Try to power it with the battery pack and press the A button: you will see that the code is running.

You can achieve exactly the same result using MicroPython. Go back to https://www.microbit.co.uk/create-code and click on New Project under Python. You will be presented with a more-or-less plain text editor. You can write Python code, as shown in the following figure:

micropythonThe logic in the figure above is exactly the same: an infinite loop with two if conditions (in this case I have swapped button A and B so that you can verify that the new code is running). Again, as above, pressing the Download button will download a .hex file that you then need to copy to the micro:bit. This approach is less intuitive, as you don’t have a simulator and you can only find out bugs when you upload the code to the device. However, it will only tell you “invalid input” and it’s then up to you to find the problem :-)! You can find additional details about MicroPython here: https://www.microbit.co.uk/python-guide.

If you go back to https://www.microbit.co.uk/create-code you will find two other languages: CodeKingdom JavaScript and Microsoft Touch Develop. The former is essentially JavaScript but, differently from MicroPython, it provides good support with visual blocks that are translated to actual code, in addition to a simulator that can be used to test the code before uploading it to the device. Microsoft Touch Develop is similar in that it provides visual blocks that are then translated into textual code; the syntax of this language is simple enough to be considered nearly pseudo-code and a simulator is available as well.

Hopefully this introduction will allow you to get started. There are many possible extensions: there are built-in functions to create scrolling text on the display and to detect the position of the micro:bit; there is a compass and an accelerometer and a Bluetooth connection is available as well. Additional components can be attached using alligator clips or connected directly with the board, which supports analog input and output (through PWM pins, so that servo motors can be connected as well). You will find plenty of examples on the micro:bit website and online.

If you prefer, you can also program the micro:bit using a tablet or mobile phone: in this case the firmware is transferred using Bluetooth and you need to download an app to enable the transfer (available for Android and iOS).

You could also build applications on a mobile device that access the data on the micro:bit, but this requires programming both the micro:bit and the phone, something that is probably too advanced for this simple tutorial.

Asip now available through Arduino Library Manager

As you probably know if you have read this blog before, Asip is a protocol to control an Arduino board from a computer. The protocol is intended to be a more extensible replacement for Firmata. The idea is that you can control the board directly from a USB or network connection by means of simple plain text messages, such as I,p,13,1 to set pin 13 to high.

From today, installing Asip on an Arduino board is much simpler:

  • Make sure that the version of your Arduino IDE is at least 1.6.2 and open it.
  • Click on Sketch -> Include Libraries -> Manage Libraries to open the library manager.
  • Type asip in the search box and you should get something similar to the following imageArduino library manager for Asip
  • Click on the first link for simple I/O services (you can add support for additional services later with the second link).
  • Click on File -> Examples -> asip and open AsipIO.
  • Connect your board, upload the sketch and you are ready to go.

If you want to use servo motors, sonar distance sensors and piezo tones, in addition to the plain asip library install also the library asip-additional-services and check the examples.

We have developed client libraries in Racket, Java and Python. Contact me if you need additional details, we will try to add tutorials and further documentation as soon as we have time.

Using the Maplin Robotic Arm with Scratch

robotarm-loadtest

Lifting some heavy weights

Maplin is currently selling a robotic arm for £29.99. The arm comes in a box and it needs to be assembled, something that takes approx 2.5 hours. The arm can then be connected to a computer using a USB cable (don’t forget that it also needs 4 D-batteries!). The arm is pretty basic: it has 4 joints and a clamp to hold up to 130 g, but there are no position sensors. This means that you can only move the arm by telling a specific joint to move in a certain direction for a certain amount of time. Nevertheless, I think it is an interesting learning experience and a starting point for other projects (see this video to add position sensors using potentiometers: https://www.youtube.com/watch?v=HDy9jEGMk_k)

In terms of software, in spite of what the box says, if you are happy to run a small Python program, the arm is (unofficially) compatible with more or less any operating system that can talk to a USB port, given that the protocol has been reverse engineered. There are examples online using an Xcode project on Mac and Python on a Raspberry Pi with Linux.

I thought that the arm could provide an interesting example for kids using the off-line version of Scratch 2.0 (and for me to learn a bit about writing Scratch extensions). The extension mechanism for the off-line version seems simple enough: you only need to build a basic HTTP server that acts and replies to messages sent by a Scratch sketch, as described in this document. Again, there are several examples online using a range of programming languages, including Python, as exemplified by this Scratch extension to control Arduino boards.

By taking inspiration (i.e., by reusing) parts of code from the links above I managed to put together the extension in an hour or so. The code is available here: https://github.com/fraimondi/pymarash

This is what you need to do to run the extension on Mac (but I suspect the steps are very similar in Linux. Let me know if you have it running on a Windows machine):

  • Download the off-line version of Scratch 2.0 (obviously…)
  • Make sure that you have Python on your machine (I’ve tested Python 2.7).
  • Make sure that you have pip (the package manager for Python). Then, execute pip install --pre pyusb to install pyUSB.
  • Make sure that libusb is installed. I’ve installed it using homebrew on Mac, on a Ubuntu machine you probably want to use apt-get.
  • Download the code from https://github.com/fraimondi/pymarash. You only need two files: pymarash_http_server.py and MaplinArm.json.
  • Connect the arm and make sure that it is turned on. Then, open a terminal and launch the server with python pymarash_http_server.py. You should get a message confirming that the server is running. Leave it there running.
  • Open Scratch and, while pressing down the SHIFT key, click on File. Select the last item “Import experimental HTTP extension” and select the file MaplinArm.json.

If everything went well, when you click on “More Blocks” you should see something like this:

Scratch-pymarashNotice that the dot next to “Maplin Arm” needs to be green: this means that Scratch can talk to the extension. As usual, the proof is in the pudding: https://www.youtube.com/watch?v=tbl3G_gvfjE

 

Install and use OpenCV 3.0 on Mac OS X with Eclipse (Java)

A final year student is currently working on a Java project in Eclipse using OpenCV . As this is something that other students have asked me, this is a summary of what we have done by putting together a few tutorials available online:

Prerequisites: Mac OS X 10.10 and XCode 6. Before starting the installation, make sure you have:

  1. Apache Ant installed. You can install Ant using Homebrew. If you don’t have Homebrew, install it using the following command:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Then, update brew with brew update and finally install Ant with brew install ant

  1. Make sure you have CMake installed. You can download a binary file for Mac here: http://www.cmake.org/download/ After extracting the .dmg file, copy it to the /Applications/ folder.

If you have Ant and CMake installed, download OpenCV 3.0 for Mac from this link: http://opencv.org/downloads.html. Extract the file and this will create a new directory called
opencv-3.0.0/ (or something similar if you use a more recent version). Open a terminal and navigate to this directory. You can now start the compilation process:

  • Run cmake with the following command:
    /Applications/CMake.app/Contents/bin/cmake CMakeLists.txt. It shouldn’t take long. Check the output and make sure that java is listed as one of the modules to be installed.
  • Type make and go for a cup of tea, the compilation process will require a few minutes…

If everything goes well you should be able to compile everything and you can now start Eclipse. I’m using Eclipse Luna but I guess the process is very similar for other versions. Following the instructions available at http://docs.opencv.org/doc/tutorials/introduction/java_eclipse/java_eclipse.html, let’s create a user library and add it to a project that will make use of OpenCV:

  • In Eclipse, open the menu Eclipse -> Preferences -> Java -> Build Path -> User Libraries.  Click “New” and enter a name, I’m using opencv-3.0.0.

Screen Shot 2015-09-04 at 20.44.11

  • Click on the name of the library so that it becomes blue, then click on the right on “Add external JARs”. Browse to the directory where you have compiled OpenCV, open the bin/ directory and select “opencv-300.jar”. The screen should look more or less like this:

Screen Shot 2015-09-04 at 20.47.15

  • Now click on Native library location (None) so that it becomes blue, then click on Edit and you should get something similar to this:

Screen Shot 2015-09-04 at 20.48.32

  • Click on “External Folder…”, and again select the directory where you have compiled OpenCV and click on the lib/ directory.  Confirm and press OK (3 times).

If you want to use the OpenCV Java API you need to create an Eclipse (Java) project and add the library created above:

Select File -> New -> Java Project. You can use any name you want, say opencv-test.

Right-click on the newly created project, select properties, then Library -> Add Library… -> User Library. Tick “opencv-3.0.0″, press finish and then OK.

You are now ready to test that everything went well. You can start with the following simple code taken from http://docs.opencv.org/doc/tutorials/introduction/java_eclipse/java_eclipse.html:

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
 
public class Hello
{
   public static void main( String[] args )
   {
      System.loadLibrary( Core.NATIVE_LIBRARY_NAME );
      Mat mat = Mat.eye( 3, 3, CvType.CV_8UC1 );
      System.out.println( "mat = " + mat.dump() );
   }
}

If this code works it should print a simple matrix. If you want to try something slightly more interesting, you could try to detect a face with the following code, taken from https://blog.openshift.com/day-12-opencv-face-detection-for-java-developers/ and adapted for OpenCV 3 (make sure to change the string constants in the source code below!)

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Point;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.CascadeClassifier;
 
public class FaceDetector {
 
    public void run() {
 
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        System.out.println("Starting...");
 
        // Change this path as appropriate for your configuration.
        CascadeClassifier faceDetector = new CascadeClassifier("/PATH/TO/opencv-3.0.0/data/haarcascades/haarcascade_frontalface_alt.xml");
 
        // Change this path as appropriate, pointing it to an image with at least a face...
        Mat image = 
                Imgcodecs.imread("/Users/franco/franco.jpg");
 
        MatOfRect faceDetections = new MatOfRect();
        faceDetector.detectMultiScale(image, faceDetections);
 
        System.out.println(String.format("Detected %s faces", faceDetections.toArray().length));
 
        for (Rect rect : faceDetections.toArray()) {
            Imgproc.rectangle(image, new Point(rect.x, rect.y), new Point(rect.x + rect.width, rect.y + rect.height),
                    new Scalar(0, 255, 0));
        }
 
        // Change this path as appropriate for your system. 
        String filename = "/Users/franco/ouput.png";
        System.out.println(String.format("Done. Writing %s", filename));
        Imgcodecs.imwrite(filename, image);
    }
 
    public static void main (String[] args) {
    	FaceDetector fd = new FaceDetector();
    	fd.run();
    }
}

Some experiments with PostgreSQL and simple Twitter analysis

A few months ago I prepared a bit of  material for the second year course “Software Development” in which I used Java and MongoDB to perform some simple Twitter data analysis. The material introduced MapReduce, JSON and other simple analysis tasks (hourly activity, geo location, most retweeted tweet, etc.). Recently, we installed a new virtual machine with PostgreSQL and Python. In parallel, some colleagues asked me to collect tweets about:

  1. The India’s Daughter documentary
  2. Elections in Nigeria.

The first dataset is approximately 1 GB of JSON data (250K tweets), while the second is approximately 11 GB of JSON data (2 million tweets). Is this something that can analysed using a standard SQL-based approach, on a default installation of PostgreSQL? How long will it take? The hardware is a virtual machine with 16 GB of RAM, 2 processors @2.2MHz, 120 GB of disk space. The operating system is Ubuntu 14.04, with PostgreSQL 9.3.6 and Python 2.7.

The starting point is a set of (gzipped) JSON files, each one containing 20,000 tweets. Instead of defining a PostgreSQL table with all the possible fields, I decided to follow the approach described here https://bibhas.in/blog/postgresql-swag-json-data-type-and-working-with-twitter-data/ creating exactly the same table with just two columns: the tweet ID (primay key), and the whole JSON tweet:

CREATE TABLE tweet
( tid CHARACTER VARYING NOT NULL, DATA json,
  CONSTRAINT tid_pkey PRIMARY KEY (tid) )

If you knew in advance the kind of analysis required it would be more efficient to import only those JSON entries that are required, say the field “created_at” if you were only interested in analysing traffic rates. After creating this table in PostgreSQL, let’s import the tweets. I do this using a simple Python script:

import json
import psycopg2
 
conn = psycopg2.connect("YOUR-CONNECTION-STRING")
cur = conn.cursor()
 
with open("YOURJSONFILE") as f:
  for tweetline in f:
    try:
      tweet = json.loads(tweetline)
    except ValueError, e:
      pass
    else:
      cur.execute("INSERT INTO tweet (tid, data) VALUES (%s, %s)", (tweet['id'], json.dumps(tweet), ))
 
conn.commit()
cur.close()
conn.close()

Exercise: handle the possible exceptions when executing the insert (non-unique key, for instance). Extend the script to read the file name from the command line and wrap everything in a bash script to iterate over the compressed JSON files.

I didn’t time this script, but it took a couple of minutes on the small dataset and probably around 10 minutes on the larger dataset. This could be improved by using a COPY instead of doing an INSERT, but I found this acceptable as it has to be done only once. Now that we have the data, let’s  try to stress a bit the machine. Let’s start with extracting the hourly activity in what is probably the most inefficient way one could think of: group by a substring of a JSON object converted to string, as follows:

SELECT SUBSTRING(data->> 'created_at'::text FROM 1 FOR 13), 
       COUNT(*) 
 FROM tweet GROUP BY 
   SUBSTRING(data->> 'created_at'::text FROM 1 FOR 13);

This extracts the “created_at” field from the JSON object, converts it to a string, and then it takes the substring from position 1 for 13 characters. The full “created_at” field is something like Tue Apr 07 17:21:04 +0000 2015, and the substring is therefore Tue Apr 07 17. Let’s call this on the small dataset (250K tweets) and ask psql to EXPLAIN ANALYZE the query:

explain ANALYZE select substring(data->> 'created_at'::text 
[...]
 Total runtime: 14602.845 ms

Not bad, just a bit more than 14 seconds! What about the large dataset? How long will it take on 2 million tweets?

explain ANALYZE select substring(data->> 'created_at'::text 
[...]
 Total runtime: 96553.320 ms

1 minute and a half, again pretty decent if you only run it once. I exported the result as a csv file and plotted, this is the result for the Nigeria elections:

nigeria-hourlyactivity

If you need to run multiple queries involving date and times it may be a bit boring to wait nearly 2 minutes each time; can the situation be improved? Yes, as I said above, if you know what you are looking for, then you can optimise for it. Let’s focus on the “created_at” field: instead of digging it out from the JSON object, we could alter the table and add a new column “created_at” of type timestamp and then populate it, as follows:

ALTER TABLE tweet ADD COLUMN created_at TIMESTAMP;
UPDATE tweet SET 
  created_at = to_timestamp(SUBSTRING(data->> 'created_at'::text FROM 5 FOR 9)||' 2015','
               Mon DD HH24 YYYY');

The update step will require a bit of time (some minutes, unfortunately I didn’t time it). But now the “group by” query above is executed in 458 ms (less than half a second) on 250K tweets and in 2.365 seconds on 2 million tweets.

Let’s try to extract the location of geo-tagged tweets. First of all, I count all of them on 250K tweets:

explain analyze select count(*) from tweet where data->>'geo' <> ''; 
[...] Total runtime: 14408.711 ms

For the large dataset the execution time is 147 seconds, a bit long but still acceptable for 2M tweets.

We can then extract the actual coordinates as follows:

SELECT data->'geo'->'coordinates'->0 AS lat, 
       data->'geo'->'coordinates'->1 AS lon 
  FROM tweet 
  WHERE data->>'geo' <> '';

You can run this query from the command line and generate a CSV file, as follows:

psql -t -A -F"," -c "select data->'geo'->'coordinates'->0 as lat, \
  data->'geo'->'coordinates'->1 as lon from tweet \
  where data->>'geo' <> '';" 

Exercise: redirect the output to a file, massage it a little bit to incorporate the coordinates in an HTML file using heatmap.js.

As usual, only approximately 1% of the tweets are geo-tagged. This means more or less 2,500 locations for the India’s Daughter dataset, and approximately 20K locations for the Nigerian elections. These are some plots obtained with these results.

indiasdaughter-allgeo

India’s Daughter: heatmap of geo-tagged tweets

nigeria-geotagged

Nigeria elections: heatmap (Nigeria scale)

world-nigeria-geotagged

Nigeria elections: heatmap (World view)

PostgreSQL can also be used to do text analysis. Again, we can use the approach described at https://bibhas.in/blog/postgresql-swag-part-2-indexing-json-data-type-and-full-text-search/. You can find more details about the queries used below at this other link: http://shisaa.jp/postset/postgresql-full-text-search-part-1.html. Let’s start by creating an index on the text of each tweet:

CREATE INDEX "idx_text" ON tweet USING gin(to_tsvector('english', data->>'text'));

The magic keywords here are gin (Generalized Inverted Index, which is a type of PostgreSQL index) and to_tsvector. This last function is a tokenizer for a string, and performs all the stemming using an English dictionary in this case (use the second link above if you want to know the details of all this). The index can be used to find tweets containing specific keywords, in the following way:

SELECT COUNT(tid) FROM tweet WHERE 
  to_tsvector('english', data->>'text') @@ to_tsquery('Buhari');

Notice the special operator @@ used to match a vector of tokens with a specific keyword (you could also use logical operators here!). More interestingly, we can use this approach to compute the most used words in the tweets, a typical MapReduce job:

SELECT * FROM ts_stat('SELECT to_tsvector(''english'',data->>''text'') from tweet') 
  ORDER BY nentry DESC;

This is a nested query: first we extract all the tokens, then we use the ts_stat function on all the tokens (see http://www.postgresql.org/docs/9.3/static/textsearch-features.html). The execution time is pretty reasonable: 28 seconds for 250K tweets and 168 seconds for 2M tweets. These are the top 10 words (with number of occurrences) for the India’s Daughter dataset (stemmed, with number of tweets in which they occur):

  • indiasdaught, 209773
  • india, 55879
  • ban, 49851
  • documentari, 40541
  • rape, 32269
  • bbc, 31186
  • watch, 25388
  • daughter, 19968
  • rapist, 20269
  • indian, 19417

These are the top 10 words for Nigerian elections dataset:

  • buhari, 894709
  • gej, 495743
  • jonathan, 337640
  • presid, 320757
  • gmb, 285770
  • vote, 273892
  • nigeria, 216294
  • elect, 221955
  • win, 182845
  • apc, 161968

How many tweets are retweets? We could take a shortcut and count the number of tweets starting with RT (we could also use the token RT, which is more efficient). Instead, let’s see what happens if we take the long way: we check whether the “id_str” of the “retweeted_status” field is not empty:

SELECT COUNT(*) FROM tweet 
   WHERE CAST(data->'retweeted_status'->'id_str' AS text) <> '';

The query takes a bit less than 20 seconds on 250K tweets and 124 seconds on 2M tweets. More than 162,000 tweets are retweets (64%) for the India’s Daughter dataset, and a bit more than 1,1M in the Nigerian elections dataset (56.5%).

We can also find the most retweeted tweets in each dataset. In the following inefficient query I group tweets by the id of the person being retweeted and I count the number of rows:

SELECT COUNT(*) AS nretweets, 
     MAX(CAST(data->'retweeted_status'->'user'->'screen_name' AS text)) 
        AS username, 
     CAST(data->'retweeted_status'->'id_str' AS text) AS retid, 
     MAX(CAST(data->'retweeted_status'->'text' AS text)) AS text 
  FROM tweet 
  GROUP BY retid 
  HAVING CAST(data->'retweeted_status'->'id_str' AS text) <> '' 
  ORDER BY nretweets DESC;

This is the most inefficient query so far: it takes 68 seconds on 250K tweets and 417 seconds (nearly 7 minutes) on 2M tweets.

These are the 2 most retweeted tweets among the 250K tweets on India’s Daughter:

  •  “Check out @virsanghvi’s take on #IndiasDaughter – https://t.co/BsxTkLK9Yy @adityan”, by mooziek, retweeted 1579 times
  • “Forget ban, #IndiasDaughter is must watch. Anyone who watches will understand devastation caused by regressive attitudes. Face it. Fix it.”, by chetan_bhagat, retweeted 1399 times

These are the 2 most retweeted tweets among the 2M tweets on Nigerian elections:

  • “‘Buhari’ is the fastest growing brand in Africa. RT if you agree”, by APCNigeria, retweeted 3460 time.
  • “Professor Joseph Chikelue Obi : ” The 2015 Nigerian Elections are Over. President Buhari must now quickly move into Aso Rock & Start Work “.”, by DrJosephObi, retweeted 3050 times.

Finally, let’s do a simple sentiment analysis using the vaderSentiment tool. This “is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media”. It is very easy to install using pip and very easy to use, as shown in the following piece of code:

from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
[...]
conn=psycopg2.connect("dbname='indiasdaughter'")
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur.execute("""SELECT (data->'text') as message from tweet""")
[...]
rows = cur.fetchall()
for row in rows:
    vs = vaderSentiment(row['message'])
    print row['message'],",",vs['compound']
    print vs['pos'],",",vs['neu'],",",vs['neg']

You can compute the average sentiment by taking the average of vs['compound']. It takes 1 minute and 30 seconds to run this task on 250K tweets, whose average sentiment is -0.17 (slightly negative). It takes 11 minutes and 32 seconds to run the same code on 2M tweets about the Nigerian elections; for this set the average sentiment is (slightly positive).

In conclusion:

  • You can use PostgreSQL to perform simple twitter analysis  on a dataset of 2 million tweets.
  • I have shown some query patterns. These are not optimised, I’m sure performance can be improved in a number of ways.
  • Should you use a NoSQL database, Hadoop, MapReduce etc.? Definitely yes, so that you can learn another approach. However, if you are only interested in the results of an off-line analysis, you are already familiar with standard databases and your dataset is of a million of tweets or thereabout, then a standard PostgreSQL installation would work just fine.
  • What about MySQL? I don’t know, it would be interesting to compare the results, especially with version 5.6 and full text search.

Choose a random function from a list in Racket

185924A few days ago I asked students to use the Racket library for ASIP to control the movement of a MIRTO robot to explore an unknown area, as follows:

  • The robot should start by moving forward.
  • When the robot hits an obstacle, it should stop immediately and move backward for 0.5 seconds
  • At this point, the robot should perform a left or a right rotation (randomly), and then restart and move forward until the next obstacle.
  • As an additional feature, the amount of the rotation should be random as well, say between 0.3 and 1.5 seconds (not needed for this discussion).

A group of students wrote two functions, one to rotate left and one to rotate right, something similar to the following:

#lang racket
 
(define moveLeft
  (λ ()
    ;; code here to move left, using the 
    ;; racket-asip library
    (printf "The robot moves left\n")
    )
  )
 
(define moveRight
  (λ ()
    ;; code here to move right, using the 
    ;; racket-asip library
    (printf "The robot moves right\n")
    )
  )
 
(list-ref (list (moveLeft) (moveRight)) (random 2))

The key part here is the very last line: the students defined a list of two functions with (list (moveLeft) (moveRight)) and then used list-ref to get one element from this list at a position which is randomly 0 or 1, depending on the result of (random 2). I really liked their idea and I was very happy to see that they were thinking “in a functional style“.

There is, however, a problem with the code above: if you try to run it, the robot will move both left and right. If you copy-paste the code above in DrRacket, you will see that, when you run it, both statements “The robot moves left” and “The robot moves right” will be printed. The students called me and I suggested a solution that – I now realise – was affected by my many years of imperative-style programming… Instead of going through the solution I suggested in the lab, let’s see how we can fix the code above in a more appropriate way. Let’s start from the original problem: try the following code in DrRacket

(list (+ 1 2) (+ 2 3))

You should obtain a list with values 3 and 5. That is to say: the list contains the results of the operation. The same thing happens in the code above: writing (list (moveLeft) (moveRight)) produces a list that contains the result of invoking moveLeft and moveRight. Effectively, this means printing something on screen, because the functions have a side effect. But what is the content of the list? It is just the value returned by the two functions: in this case, it is void, because printf returns void, and as a result choosing a random value from a list of voids returns nothing!

What should been done is to build a list of references to the functions, and this is very easy: just remove the brackets around them, and the references will not be executed! Try the following in DrRacket:

(list moveLeft moveRight)

Now you should see ''(#<procedure:moveLeft> #<procedure:moveRight>): that’s correct! This is a list of two procedures! What happens if you try now to get a random value? Try the following code a few times:

(list-ref (list moveLeft moveRight) (random 2))

Sometimes you will get a reference to moveLeft, and sometimes you will get a reference to moveRight. Hey, it works! But wait, we are not done yet: we have a reference to a function, but how do we call it now? This is the easy part… If you have a function, how do you ask Racket to “execute” it? You  put it in brackets, so overall you just need to surround the list-ref command with another pair of brackets, and the reference to the function selected randomly from a list will be executed:

( (list-ref (list moveLeft moveRight) (random 2)) )

If you had an actual robot, you would see that the robot now moves sometimes left, and sometimes right. If you don’t have robot, try the code above a few times and you will see that sometimes it prints “The robot moves left”, and sometimes it prints “The robot moves right”. You can obviously extend the pattern to a list of functions of arbitrary length and obtain a much shorter code compared to using a switch statement.

Controlling an Arduino board via USB with the ASIP protocol

mirtoLast year we built the Middlesex Robotic Platform (MIRTO) using a Raspberry Pi and an Arduino Uno board. In our configuration the Arduino board is essentially a “slave” of the Raspberry Pi. A pair of HUB-ee wheel, infra-red sensors and bumper sensors are attached to the Arduino board, but all the “logic” is on the Raspberry Pi. It is the Raspberry Pi that sends the instructions to read the IR values and to set the speed of the wheels. In our configuration, the Arduino board is attached to the Raspberry Pi using a serial connection (GPIO pins on the Raspberry Pi and pin 0 and 1 on the Arduino). In MIRTO we installed Firmata on the Arduino board and we extended Firmata clients available on the Raspberry Pi. More precisely, we had to extend Firmata with specific messages for wheels (both directions: to set the speed and to read quadrature encoders) and for all the other extensions we wanted to use, such as sonar distance sensors. I think that protocols of this kind can open a lot of opportunities, by allowing the integration of platforms such as Raspberry Pi and possible multiple Arduino boards.

However, we soon realised that Firmata is not as easy to extend as we wanted, in particular it may be difficult to add one of the many available Arduino libraries for things such as NeoPixels etc.. We also found the 7-bit message encoding of Firmata messages error prone (in the sense that our code had a lot of bugs :-). For these reasons, Michael Margolis suggested the implementation of a new, simpler protocol. We wanted the protocol to be text-based and to allow easy integration of new services. We have now a working implementation of this protocol that we called the Arduino Service Interface Protocol (ASIP). Michael has developed the Arduino code, which is available at this link:

We have then written client libraries for:

(and we are working at a Python implementation).

The current implementation uses serial communication, but Michael Margolis has designed the protocol so that it can be very easily adapted to any form of streaming communication.

OK, let’s see some practical details. First of all, a quick overview of the installation process:

  1. Download the code available at https://github.com/michaelmargolis/asip; you will see that there are two directories: documents/ and asip/
  2. Copy the directory asip/ (not documents) to the library folder of your Arduino IDE. If you don’t know where this folder is, check Manual Installation at this link http://arduino.cc/en/Guide/Libraries
  3. Restart your Arduino IDE. At this point, if you click File -> Examples, you should see an asip menu. Select AsipIO to install a simple ASIP Input/Output configuration. This will allow you control digital and analog pins.
  4. Connect an Arduino board, upload the code, open the serial monitor and check that you are receiving regular message of the form
    @I,A,{...}

    (these are the analog values of analog pins).

If everything is OK on the Arduino side, you can now play with one of the libraries available. As an example, I’m going to show you how to use the Java library available at https://github.com/fraimondi/java-asip. Clone the repository and open one of the examples, for instance src/uk/ac/mdx/cs/asip/examples/SimpleBlink.java. The code is the following:

package uk.ac.mdx.cs.asip.examples;
 
import uk.ac.mdx.cs.asip.AsipClient;
import uk.ac.mdx.cs.asip.SimpleSerialBoard;
 
/* 
 * @author Franco Raimondi
 * 
 * A simple board with just the I/O services.
 * The main method does a standard blink test.
 * We extend SimpleSerialBoard but this is not
 * strictly required for this example.
 */
public class SimpleBlink extends SimpleSerialBoard {
 
	public SimpleBlink(String port) {
		super(port);
	}
 
	public static void main(String[] args) {
 
                // You need to provide the serial port to which Arduino
                // is connected. Under Win this could be COM3 or COM4,
                // check the Arduino IDE to find out.
		SimpleBlink testBoard = new SimpleBlink("/dev/tty.usbmodem1411");
 
                // We need a try/catch block to sleep the thread.
		try {
                        // This is a simple setu-up: we request
                        // port mapping for digital ports (not strictly
                        // necessary).
			Thread.sleep(1000);
			testBoard.requestPortMapping();
			Thread.sleep(500);
                        // We then set pin 13 to OUTPUT mode and pin 2 
                        // to input (pull-up) mode, even if pin 2 will
                        // not be used and therefore we could skip this.
			testBoard.setPinMode(13, AsipClient.OUTPUT);
			Thread.sleep(500);
			testBoard.setPinMode(2, AsipClient.INPUT_PULLUP);
			Thread.sleep(500);
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
		while(true) {
                        // As above, we need a try/catch block to sleep.
                        // Then, we just loop forever, turning on and off
                        // a LED attached to pin 13.
			try {
				testBoard.digitalWrite(13, AsipClient.HIGH);
				Thread.sleep(2000);
				testBoard.digitalWrite(13, AsipClient.LOW);
				Thread.sleep(500);
			} catch (InterruptedException e) {
				e.printStackTrace();
			}
		}
	}
}

The code should be pretty self-explanatory. It subclasses a class called SimpleSerialBoard (but this is not strictly necessary), then sets up the pin modes and loops forever writing HIGH and LOW to pin 13. You can explore other methods by opening the files LightSwitch.java and Potentiometer.java, showing how to read digital and analog input pins.

Before discussing how to add additional services, let’s have a look at the format of ASIP messages. Messages are ASCII messages, terminated by an end-of-line character. An example message to the Arduino board is I,P,13,1 where I is a character identifying a service (in this case, the Input/Output service), followed by an instruction to that service (in this case, P, meaning setting the value of a digital pin), followed by the pin number and by the number to be written.

Messages from the Arduino board are initiated by a special character: @ starts a standard message, ~ starts an error message, and ! starts debug/reporting messages. After this character, the message has a structure similar to the one above: a service ID, followed by a service-specific character, and then a list of parameters.

The Java library simply abstracts from these messages and provides easy to remember method names. The library creates a reading thread to manage incoming messages. This structure is the same in the Racket and Erlang clients. The key notion of ASIP is the one of a service. Example of services are a distance service, a motor service, a NeoPixel LED strip service, etc. Each service is identified by a single-character ID. For instance, a motor service has ID ‘M’, while a distance sensor has ID ‘D’, etc. If you want to implement a new service, you need to do two things:

  1. Implement code for the Arduino board.
  2. Implement code for a client library.

As an example of how to implement a new service, switch to branch neopixels on https://github.com/michaelmargolis/asip/ and open the file asip/utility/asipNeoPixels.h. This is the header file to define a new service to control a strip of NeoPixels LEDs. As you can see from the header file, we are defining a new class asipNeoPixelsClass that is a subclass of asipServiceClass. The new class needs to implement some abstract method of the superclass. In particular, it should define how to process messages directed to this class. Open the implementation asipNeoPixelsClass.cpp and check the method processRequestMsg: it shows how to process messages to set brightness, change the colour of a pixel, and show the LED strip. You decide what to implement here, and how. So far, we have IDs for specific operations that we want the service to perform, followed by optional parameters, all in the form of comma-separated values.

After defining this new service class, you should write Arduino code to instantiate this new service. For an example, open the file examples/AsipNeoPixels/AsipNeoPixels.ino. This file creates a firmware with support for standard I/O service, for a distance service attached to pin 4, and for two NeoPixels strips attached to pins 6 and 9. Essentially, this code creates the appropriate services, adds them to array of services to be added to the main loop, sets up the pins appropriately, and then loops by invoking the method service of AsipClass. This is the method that keeps reading incoming messages and dispatches them to the registered services, and also sends data from services to the serial port.

Upload the code above to the board and at this point the board should be ready to use: just send messages to the serial port with the appropriate ID and requests, and you will see pixels turning on and off. You can do this from the serial monitor, if you just want to test things. If you want to define a corresponding Java service for java-asip, have a look at the file src/uk/ac/mdx/cs/asip/services/NeoPixelService.java at https://github.com/fraimondi/java-asip/. This class extends a generic AsipService class and provides methods to read/write messages for NeoPixels services. As an example of application, check the file src/uk/ac/mdx/cs/asip/examples/SimpleNeoPixelWithDistance.java: the file shows how to initialise a board with all the required services and how to use the methods provided by NeoPixelService.java.

We have published a tool paper about ASIP, you can find it here:

As usual, drop me an email of leave a comment if you have questions..

Calling Java from C++ on Mac OSX Mavericks with JNI and Java 8.

For a small research project I need to invoke Java from C++. More precisely, in the C++ code I need to create an object by calling a constructor and then invoke some methods. Surprisingly, I was not able to find a lot of information on-line. There are a number of tutorials to call C/C++ from Java, but not much for the other direction and very little for Java 8 on Mac. This is a very quick summary of what I’ve done, taken mainly from http://www.ibm.com/developerworks/java/tutorials/j-jni/j-jni.html and modified appropriately. Let’s start from a simple Java class, something like this:

package com.example.simple;
 
import java.util.List;
import java.util.ArrayList;
 
public class SimpleJNITest {
  List values;
 
  public SimpleJNITest() {
    values = new ArrayList();
  }
 
  public void addValue(String v) {
    values.add(v);
  }
 
  public int getSize() {
    return values.size();
  }
 
  public void printValues() {
    for (String v: values) {
      System.out.println("Value: " + v);
    }
}

This is nothing special: a simple Java class with a constructors and a few methods to add values to a list, get the list size, and print the values of the list to standard output. Save this file to SimpleJNITest.java in the directory com/example/simple (I assume you are working at the root of this directory). Now, let’s suppose you need to do the following from a C++ file:

  • Create a new object of type SimpleJNITest.
  • Add some values to this object.
  • Retrieve the number of values currently stored.
  • Invoke the printValues method to print values to screen.

The solution is the following (comments in the code. Make sure you read all the comments carefully before asking questions!).

/* Remember to include this */
#include <jni.h>
#include <cstring>
 
/* This is a simple main file to show how to call Java from C++ */
 
int main()
{
  /* The following is a list of objects provided by JNI.
     The names should be self-explanatory...
   */
  JavaVMOption options[1]; // A list of options to build a JVM from C++
  JNIEnv *env;
  JavaVM *jvm;
  JavaVMInitArgs vm_args; // Arguments for the JVM (see below)
 
  // This will be used to reference the Java class SimpleJNITest
  jclass cls;
 
  // This will be used to reference the constructor of the class
  jmethodID constructor;
 
  // This will be used to reference the object created by calling
  // the constructor of the class above:
  jobject simpleJNITestInstance;
 
  // You may need to change this. This is relative to the location
  // of the C++ executable
  options[0].optionString = "-Djava.class.path=.:./";
 
  // Setting the arguments to create a JVM...
  memset(&vm_args, 0, sizeof(vm_args));
  vm_args.version = JNI_VERSION_1_6;
  vm_args.nOptions = 1;
  vm_args.options = options;
 
 
  long status = JNI_CreateJavaVM(&jvm, (void**)&env, &vm_args);
 
  if (status != JNI_ERR) {
    // If there was no error, let's create a reference to the class.
    // Make sure that this is in the class path specified above in the
    // options array
    cls = env->FindClass("com/example/simple/SimpleJNITest");
 
    if(cls !=0) {
      printf("Class found \n");
      // As above, if there was no error...
 
      /* Let's build a reference to the constructor.
	 This is done using the GetMethodID of env.
	 This method takes 
	 (1) a reference to the class (cls)
	 (2) the name of the method. For a constructor, this is 
             <init>; for standard methods this would be the actual
             name of the method (see below).
	 (3) the signature of the method (its internal representation, to be precise). 
	     To get the signature the quickest way is to use javap. Just type:
             javap -s -p com/example/simple/SimpleJNITest.class
             And you will get the signatures of all the methods,
             including the constructor (check the "descriptor" field).
             In the case of the constructor, the signature is "()V", which
             means that the constructor does not take arguments and has no
             return value. See below for other examples.        
       */
      constructor = env->GetMethodID(cls, "<init>", "()V");
 
      if(constructor !=0 ) {
 
	printf("Constructor found \n");
	/* If there was no error, let's create an instance of the SimpleTestJNI by
	   calling its constructor
	*/
	jobject simpleJNITestInstance = env->NewObject(cls, constructor);
 
	if ( simpleJNITestInstance != 0 ) {
	  /* If there was no error, let's call some methods of the 
	     newly created instance
	  */
 
	  printf("Instance created \n");
 
	  // First of all, we create two Strings to be passed
	  // as argument to the addValue method.
	  jstring jstr1 = env->NewStringUTF("First string");
	  jstring jstr2 = env->NewStringUTF("Second string");
 
	  /* Then, we create a reference to the addValue method.
	     This is very similar to the reference to the constructor above.
	     As above, you can get the signature of the addValue method with javap.
	     In this case, the method takes a String as input and does not return
	     a value (V is for void)
	   */
 
	  jmethodID addValue = env->GetMethodID(cls, "addValue", "(Ljava/lang/String;)V");
 
	  /* Finally, let's call the method twice, with two different arguments.
	     (it would probably be a good idea to check for errors here... This is
	      left as a simple exercise to the student ;-).
	   */
	  env->CallObjectMethod(simpleJNITestInstance , addValue, jstr1);
	  env->CallObjectMethod(simpleJNITestInstance , addValue, jstr1);
 
	  /* Let's now call another Java method: printValues should print on screen
	     the content of the List of values in the object. The pattern is identical
	     to the one above, but no arguments are passed.
	  */
	  jmethodID printValues = env->GetMethodID(cls, "printValues", "()V");
	  env->CallObjectMethod(simpleJNITestInstance , printValues);
 
	  /* Finally, let's extract an int value from a method call. We need to
	     invoke CallIntMethod and we are going to use getSize of 
	     simpleJNITestInstance. Notice the signature: the 
	     method returns an int and it does not take arguments
	  */
	  jint listSize;
	  jmethodID getSize = env->GetMethodID(cls, "getSize", "()I");
	  listSize = env->CallIntMethod(simpleJNITestInstance , getSize);
 
	  printf("The size of the Java list is: %d\n", listSize);
 
        }
      }
      else {
	printf("I could not create constructor\n");
      }
    }
    jvm->DestroyJavaVM();
    printf("All done, bye bye!\n");
    return 0;
  }
  else
    return -1;
}

The code in itself is not too difficult. The main problem is getting the correct location of the various header files and libraries… To make the whole thing run, save the code above in a file (call it test.cpp) and do the following:

  • Compile the Java class: this is easy, just type
    javac com/example/simple/SimpleJNITest.java
  • Compile the C++ file: this is slightly more complicated because you need to specify various paths. The command on my machine (Mac OSX Mavericks, Java 1.8, gcc) is the following:
    g++  -o test \
     -I/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include \
     -I/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/darwin \
     -L/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/server/ \
     test.cpp \
     -ljvm
    

    You can remove the backslash at the end of each line if you write the command on a single line. The two -I directives tell where to find jni.h and jni_md.h; the -L (and -l) directive are used by the linker to find libjvm.dylib. If the compilation is successful, you should get an executable called test.

  • Before running test, you need to set LD_LIBRARY_PATH with the following instruction:
    export LD_LIBRARY_PATH=/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/server/
  • From the same terminal where you have executed the command above, just type ./test.

Et voila, job done, if everything is OK you should see something like this:

$ ./test 
Class found 
Constructor found 
Instance created 
Value: First string
Value: First string
The size of the Java list is: 2
All done, bye bye!

Building an action camera using a Raspberry Pi and Java

20140625_205058I’m in charge of preparing some material for the new second year “Software Development” course here at Middlesex. As part of the new Java course I thought it would be a good idea to start exploring the Raspberry Pi GPIO (General Purpose I/O) pins with Java. These pins are very easy to use in Python, but with Java they require a bit more work (not much, don’t worry and keep reading).

Instead of just doing the usual exercises with traffic lights and digital inputs, I thought that it would be a nice idea to build a more “concrete” application. As a result, I decided to build an action camera that could be mounted on my bike helmet (by pure chance soon after after GoPro IPO 30% increase…). My plan is to have:

  1. A digital input switch (to set the camera on/off)
  2. A couple of LEDs to show the status of the application
  3. The standard Raspberry camera (the quality is excellent!)
  4. A USB WiFi dongle to make the Raspberry Pi an access point. Well, I’m not planning to use the Raspberry Pi as a router, I just would like it to set up a wireless network to which one could connect with a phone or a laptop to download the videos that are captured (TODO: I would like to implement a Racket-based web server to view the videos, delete them, etc.).

The final result (camera mounted on helmet) is shown above. This is a video made with the above set-up and with an appropriate “Twinkle twinkle Little Star” tune, given the time at which it was taken:

https://www.youtube.com/watch?v=aFguIT2qkTs

This is a picture of the wiring, see below for details:

JvPiwiring

OK, let’s start. I assume you have a working Raspbian image, a wireless dongle that works with hostapd (see http://raspberry-at-home.com/hotspot-wifi-access-point/), an input switch, a couple of LEDs and some experience with Linux and, more importantly, with Java. I’m using Java 8, but version 7 should work fine as well, see http://www.rpiblog.com/2014/03/installing-oracle-jdk-8-on-raspberry-pi.html for installation instructions. First of all you need to familiarise with the GPIO pins. This is a close-up picture of the GPIO pins (with some pins connected):

JvPi-wiring-closeup

Forget about the clarity, simplicity and engineering beauty of Arduino pins…

  • First of all, there are no numbers on the pins. Check carefully the picture above and you should see “P1″ on one of them: this is the only number you’ll get on the board.
  • There are three ways (that I know) to number the GPIO pins, and in most cases numbering is not sequential (see below).
  • The numbering has changed between Revision 1 and Revision 2
  • Revision 2 has an additional set of pins (but these are only accessible on the P5 header: turn the Raspberry Pi upside down and look for small holes: this is the P5 header). In the picture above you can see two holes to the right of R2: this is the beginning of the P5 header.

Keeping all this in mind, have a look at the table available at this link: http://wiringpi.com/wp-content/uploads/2013/03/gpio1.png. The two central columns (header) provide a progressive numbering. The columns “Name” contain the labels that are called “Board” in Python GPIO (for instance: the fourth pin on the left column from the top is called GPIO-zero-seven and 0V means “ground”). The column BCM GPIO contains another numbering (this numbering has changed between revision 1 and revision 2; for instance, BCM pin 21 in revision 1 is BCM pin 27 in revision 2). Finally, there is a “WiringPi Pin” numbering and this is the one that we are going to use with Java below. If you carefully check the picture above you’ll see, from left to right that:

  • There is one green wire connected to 0 V (ground) on header 9 and a red wire connected to WiringPi pin 1 (corresponding to GPIO 01): these will  control the red LED.
  • There is a red wire connected to WiringPi pin 2 (corresponding to GPIO 02) and a yellow one to 0 V (ground) on header 14: these will control the green LED.
  • There is a white wire connected to WiringPi pin 14 (header 23) and a purple one to 0 V (ground) on header 25. These will be connected to the on/off switch using a PULL-UP resistor (more on this later).

If you didn’t give up reading and you reached this point: congratulations, we are nearly there :-). It is now time to go back to Java and the first thing you need to do is to download Pi4J (http://pi4j.com/), a  library to “provide a bridge between the native libraries and Java for full access to the Raspberry Pi“. Get the 1.0 snapshot available at https://code.google.com/p/pi4j/downloads/list and extract it somewhere. Add this location to your Java classpath when you compile and run the code below.

You are now ready to write your first Java application to control GPIO pins. Let’s start with a very simple loop to turn a LED on and off (the famous Blink example in Arduino):

import com.pi4j.io.gpio.GpioController;
import com.pi4j.io.gpio.GpioFactory;
import com.pi4j.io.gpio.GpioPinDigitalInput;
import com.pi4j.io.gpio.GpioPinDigitalOutput;
import com.pi4j.io.gpio.Pin;
import com.pi4j.io.gpio.PinPullResistance;
import com.pi4j.io.gpio.RaspiPin;
 
//[...] add your methods here, then:
  GpioController gpio = GpioFactory.getInstance();
  GpioPinDigitalOutput redLED = gpio.provisionDigitalOutputPin(RaspiPin.GPIO_01);
  while (true) {
    // Add a try/catch block around the following:
    redLED.high();
    Thread.sleep(1000);
    redLED.low();
    Thread.sleep(1000);
  }

In the code above, you first need to import a number of packages; then, you set a GPIO controller and define an output pin attached to WiringPi pin 1 (with RaspiPin.GPIO_01). Then, the infinite loop keeps turning the LED on and off. Have a look at the documentation available online for additional examples: Pi4J is really well realised and there are plenty of examples available. For our action camera we are going to connect a red LED to GPIO_01 and a green LED to GPIO_02. These are configured as output pins. We then need an input pin and, more importantly, we need to start (or stop) recording when the state of this input pin changes. Pi4J provides a very convenient interface to detect pin changes. In the following example, we first define an implementation of this interface in the OnOffListener class:

import com.pi4j.io.gpio.event.GpioPinListenerDigital;
import com.pi4j.io.gpio.event.GpioPinDigitalStateChangeEvent;
 
public class OnOffStateListener implements GpioPinListenerDigital {
 
        @Override
	public void handleGpioPinDigitalStateChangeEvent(GpioPinDigitalStateChangeEvent event) {
            // Just print on screen for the moment
            System.out.println("State has changed");
        }
}

We then attach this listener to an input pin, as follows:

  GpioPinDigitalInput onOffSwitch = gpio.provisionDigitalInputPin(RaspiPin.GPIO_14, PinPullResistance.PULL_UP);	
  onOffSwitch.addListener(new OnOffStateListener());

Here we first define an input pin for WiringPi pin 14 and then we attach the listener defined above to this pin. Note that I define the input with a PULL_UP resistor (if you don’t know what this means, have a look at the Arduino documentation before moving to the next step!). If you try this code, you should get a message every time the input pin changes its state.

Building the full application is now a matter of gluing together these pieces and some instructions to turn the video recording on or off at each state change of the input pin. This is the full code for the main application:

package uk.ac.mdx.cs.jvpi;
 
import com.pi4j.io.gpio.GpioController;
import com.pi4j.io.gpio.GpioFactory;
import com.pi4j.io.gpio.GpioPinDigitalInput;
import com.pi4j.io.gpio.GpioPinDigitalOutput;
import com.pi4j.io.gpio.Pin;
import com.pi4j.io.gpio.PinPullResistance;
import com.pi4j.io.gpio.RaspiPin;
import com.pi4j.io.gpio.event.GpioPinDigitalStateChangeEvent;
import com.pi4j.io.gpio.event.GpioPinListenerDigital;
 
public class JvPi {
 
	// This is the controller.
	private GpioController gpio;
 
	// The current pin mapping
	private static final Pin redPin =  RaspiPin.GPIO_01;
	private static final Pin greenPin = RaspiPin.GPIO_02;
	private static final Pin switchPin = RaspiPin.GPIO_14;
 
	// The pins to which we attach LEDs
	private GpioPinDigitalOutput red,green;
 
	// this is going to be an input PULL_UP, see below.
	private GpioPinDigitalInput onOffSwitch;
 
	// set to true when capturing
	private boolean capturing;
 
	// Main method
	public JvPi() {
		this.gpio = GpioFactory.getInstance();
		this.red = gpio.provisionDigitalOutputPin(redPin);
		this.green = gpio.provisionDigitalOutputPin(greenPin);
		this.onOffSwitch = gpio.provisionDigitalInputPin(switchPin, PinPullResistance.PULL_UP);
 
		// The listener takes care of turning on and off the camera and the red LED	
		onOffSwitch.addListener(new OnOffStateListener(this));
	}
 
	public boolean isCapturing() {
		return this.capturing;
	}
 
	public void toggleCapture() {
		this.capturing = !this.capturing;
	}
 
	public GpioPinDigitalOutput getRed() {
		return this.red;
	}
 
	public GpioPinDigitalOutput getGreen() {
		return this.green;
	}
 
	public static void main(String[] args) {
		JvPi jvpi = new JvPi();
		jvpi.getGreen().high();
		System.out.println("System started");
		while (true) {
			try {
				Thread.sleep(1000);
			} catch (InterruptedException e) {
				// TODO Auto-generated catch block
				e.printStackTrace();
			}
		}
	}
 
}

The main method simply creates a new instance of JvPi that, in turn, attaches a listener to the input WiringPi pin 14. This is the code for the listener:

package uk.ac.mdx.cs.jvpi;
 
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
 
import com.pi4j.io.gpio.event.GpioPinListenerDigital;
import com.pi4j.io.gpio.event.GpioPinDigitalStateChangeEvent;
 
public class OnOffStateListener implements GpioPinListenerDigital {
 
	private JvPi jvpi;
 
	private final String height = "720";
	private final String width = "960";
	private final String fps = "15";
	private final String destDir = "/home/pi/capture/";
 
	// Remember to add filename and extension!
	private final String startInstruction = "/usr/bin/raspivid -t 0 -h "+height+ " -w "+width+
			" -o "+destDir;
 
	private final String killInstruction = "killall raspivid";
 
	public OnOffStateListener(JvPi j) {
		this.jvpi = j;
	}
 
	@Override
	public void handleGpioPinDigitalStateChangeEvent(GpioPinDigitalStateChangeEvent event) {
        // display pin state on console
        if (this.jvpi.isCapturing()) {
          System.out.println("Killing raspivid");
          this.jvpi.getRed().low();
          killCapture();
        } else {
          System.out.println("Starting raspivid");
          this.jvpi.getRed().high();
          startCapture();
        }
        this.jvpi.toggleCapture();
 
    }
 
	private void killCapture() {
		executeCommand(this.killInstruction);
	}
 
	private void startCapture() {
		Date date = new Date() ;
		SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss");
		String filename = this.startInstruction + "vid-"+dateFormat.format(date) + ".h264";
		executeCommand(filename);
	}
 
	private void executeCommand(String cmd) {
		Runtime r = Runtime.getRuntime();
		try {
			r.exec(cmd);
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}
 
 
}

As you can see, the code invokes raspivid if it was not capturing and it kills the raspivid process if it was running (TODO: improve error checking :-)! A number of default options, such as resolution and frame rate, can be configured here. The video is recorded to a file whose name is obtained from the current system date and time.

I have used a very basic box to store everything and I have attached the box to the helmet using electric tape: this is definitely not the ideal solution, but it is good enough for a proof of concept.

As usual, drop me an email (or leave a comment) if you have questions!