News Air: 2010-05-09

Viewing a BioStor reference in Cooliris

Cooliris is a web browser plugin that can display a large number of images as a moving "infinite" wall. It's Friday, so for fun I added a media RSS feed to BioStor to make the BHL page scans available to Cooliris. The result is easier to show than describe, so take a peek at the video I made of A review of the Centrolenid frogs of Ecuador, with descriptions of new species (http://biostor.org/reference/20844):

Cooliris view of BioStor from Roderic Page on Vimeo.

Cooliris is a little flaky under Snow Leopard, but still works (the plug-in is cross platform). It is also available for the iPhone (and I'm assuming the iPad), which means you can get the experience on a mobile device.

Linking biodiversity data

Time for a Friday folly. I've made a clunky screencast showing an example of linking biodiversity data together, using bioGUID as the universal wrapper around various data sources. I started with GenBank sequence EF013683, added another, EF013555, then explored some links (specimen, publication, taxon, journal), using the OpenLink RDF Browser:

Linking biodiversity data from Roderic Page on Vimeo.

You can try the URIs I used in the linked data browser of your choice:

The demo is a bit clunky, partly because the linked data browser is generic. What we really need is a browser that is tailored to displaying the kind of data we're interested, and hides the gory details under the hood. But the goal is to show that, once everything we care about has a resolvable URI that provides data in a consistent form, and we re-use identifiers, then we can glue stuff together with relative ease. In principle we can simply crawl this web of data (you can append other DOIs, ISSNs, and Genbank accession numbers to http://bioguid.info and get RDF to your heart's content).

None of this is particularly new, we've had RDF in biodiversity informatics for at least five years, there are various linked data-style projects, such as GeoSpecies and the first iteration of bioGUID, and some people (such as Roger Hyam) have been pushing HTTP URIs + RDF for a while, but we seem remarkably unable to get traction on this. Notably, no major biodiversity provider provides RDF (by major I mean GenBank or GBIF size). We make diagrams like the one I drew for GBIF last year, we make the case that linking is a Good Thing™, and yet nothing much happens. This suggests that the idea is still not be presented in a compelling enough fashion. Certainly, clunky demos like the one above probably won't help much. Linked Data clients are generally pretty awful things to use. I think we're going to need some compelling applications that really grab people's attention.

Drawing a phylogeny in a web browser using the canvas element

Some serious displacement activity. I'm toying with adding phylogenies to iSpecies, probably sourced from the PhyLoTA browser. This raises the issue of how to display trees on a web page. PhyLoTA itself uses bitmap images, such as this one:

but I'd like to avoid bitmaps. I toyed with using SVG, but that has it's own series of issues (it basically has to be served as a separate file). So, I've spent a couple of hours playing with the <canvas> element. This enables some quite nice drawing to be down in a browser window, without plugins, SVG, or Flash. I wrote a quick PHP script to parse a Newick tree and draw it using <canvas>. It's really pretty simple, and the results are quite nice:

One minor gotcha is interacting with the diagram (this is one advantage of SVG). Turns out we need a hack, so I've used the trick of a blank, transparent GIF and a usemap (see Greg Houston's Canvas Pie Chart with Tooltips). The picture above is a screen shot, you can see a live example here.

Why we need wikis

I've just spent a frustrating few minutes trying to find a reference in BioStor. The reference in question is

Heller, Edmund 1901. Papers from the Hopkins Stanford Galapagos Expedition, 1898-1899. WIV. Reptiles. Proc. Biol. Soc. Washington 14: 39-98

and comes from the Reptile Database page for the gecko Phyllodactylus gilberti HELLER, 1903 . This is primary database for reptile taxonomy, and supplies the Catalogue of Life, which repeats this reference verbatim.
Thing is, this reference doesn't exist! Page 39 of Proc. Biol. Soc. Washington volume 14 is the start of Gerrit S Miller (1901) A new dormouse from Italy. Proc Biol Soc Washington 14: 39-40.
After much fussing with trying diferent volumes and dates for Proc. Biol. Soc. Washington, I searched BHL for Phyllodactylus gilberti, and discovered that this name was published in Proceedings of the Washington Academy of Sciences:

Edmund Heller (1903) Papers from the Hopkins Stanford Galapagos Expedition, 1898-1899. XIV. Reptiles. Proceedings of the Washington Academy of Sciences 5: 39-98

(see http://biostor.org/reference/20322). Three errors (wrong journal, wrong date, minor typo in title), but enough to break the link between a name and the primary source for that name.

Anybody who demands authoritative, expert-vetted resources, and thinks the Catalgoue of Life is a shining example of this needs to think again. Our databases are riddled with errors, which are repackaged over and over again, yet these would be so easy to fix if they were opened up and made easy to edit. It's time to get serious about wikis.

Referring to a one-degree square in RDF using c-squares

I'm in the midst of rebuilding iSpecies (my mash-up of Wikipedia, NCBI, GBIF, Yahoo, and Google search results) with the aim of outputting the results in RDF. The goal is to convert iSpecies from a pretty crude "on-the-fly" mash-up to a triple store where results are cached and can be queried in interesting ways. Why? Partly because I think such a triple store is an obvious way to underpin a "biodiversity hub" of the kind envisaged by PLoS (see my earlier post).

As ever, once one embarks down the RDF route (and I've been here before), one hits all the classic stumbling blocks, such as "what URI do I use for a thing?", and "what vocabulary do I use to express relationships between things?". For example, I'd like to represent the geographic distribution of a taxon as depicted on a GBIF map. How do I describe this in a RDF document?

To make this concrete, take one of my favourite animals, the New Zealand mud crab Helice crassa. Here's the GBIF map for this taxon:

This map has the URL (I kid you not):


http://ogc.gbif.org/wms?request=GetMap
&bgcolor=0x666698
&styles=,,,
&layers=gbif:country_fill,gbif:tabDensityLayer,gbif:country_borders,gbif:country_names
&srs=EPSG:4326
&filter=()(
%3CFilter%3E
%3CPropertyIsEqualTo%3E
%3CPropertyName%3Eurl
%3C/PropertyName%3E
%3CLiteral%3E
%3C![CDATA[http%3A%2F%2Fdata.gbif.org%2Fmaplayer%2Ftaxon%2F17462693%2FtenDeg%2F-45%2F160%2F]]%3E
%3C/Literal%3E
%3C/PropertyIsEqualTo%3E
%3C/Filter%3E)()()
&width=721
&height=362
&Format=image/png
&bbox=160,-45,180,-35

(or http://bit.ly/cuTFW9, if you prefer). Now, there's no way I'm using this URL! Plus, the URL identifies an image, not the distribution.

But, if we look at the map we see that it is made of 1° × 1° squares. If each of those had a URI then I could simply list those squares as the distribution of the crab. This seems straightforward as GBIF has a service that provides these squares. For example, the URL http://data.gbif.org/species/17462693 (where 17462693 corresponds to Helice crassa) returns:


MINX	MINY	MAXX	MAXY	DENSITY
167.0	-45.0	168.0	-44.0	5
174.0	-42.0	175.0	-41.0	20
174.0	-38.0	175.0	-37.0	17
174.0	-37.0	175.0	-36.0	4

These are the 1° × 1° squares for which there are records of Helice crassa. Now, what I'd like to do is have a URI for each square, and I'd like to do this without reinventing the wheel. I've come across a URI space for points of the globe (the WGS 84 Geographic Point URI Space"), but not one for polygons. Then it dawned on me that perhaps c-squares, developed by Tony Rees at the CSIRO in Australia, would do the trick¹. To quote Tony:

C-squares is a system for storage, querying, display, and exchange of "spatial data" locations and extents in a simple, text-based, human- and machine- readable format. It uses numbered (coded) squares on the earth's surface measured in degrees (or fractions of degrees) of latitude and longitude as fundamental units of spatial information, which can then be quoted as single squares (similar to a "global postcode") in which one or more data points are located, or be built up into strings of codes to represent a wide variety of shapes and sizes of spatial data "footprints".

C-squares appeal partly (and this says nothing good about me) because they have a slightly Byzantine syntax. However, they are short, and quite easy to calculate. I'll let the reader find out the gory details. To give an example, my home town, Auckland, has latitude -36.84, longitude 174.74, which corresponds to the 1° × 1° c-square with the code 3317:364.

Now, all I need to do is convert c-squares into URIs. If you append the c-square to http://bioguid.info/csquare:, like this, http://bioguid.info/csquare:3317:364, you get a linked data-friendly URI for the c-square. In a web browser you get a simple web page like this:

A linked data client will get RDF, like this:


<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
   xmlns:dcterms="http://purl.org/dc/terms/" 
   xmlns:dwc="http://rs.tdwg.org/dwc/terms/" 
   xmlns:geom="http://fabl.net/vocabularies/geometry/1.1/">
   <dcterms:Location rdf:about="http://bioguid.info/csquare:3307:364">
      <rdfs:label>3307:364</rdfs:label>
      <geom:xmin>74</geom:xmin>
      <geom:ymin>-37</geom:ymin>
      <geom:xmax>75</geom:xmax>
      <geom:ymax>-36</geom:ymax>
      <dwc:footprintWKT>POLYGON((-37 75,-37 74,-36 74,-36 75,-37 75))</dwc:footprintWKT>
   </dcterms:Location>
</rdf:RDF>

Now, I can refer to each square by it's own URI. This will also enable me to query a triple store by c-square (e.g., what other taxa occur within this 1° × 1° square?).

Tony Rees had emailed me about this in response to a tweet about URIs for co-ordinates, but it took me a while to realise how useful c-square notation could be.

News Air

Viewing a BioStor reference in Cooliris

Linking biodiversity data

Drawing a phylogeny in a web browser using the canvas element

Why we need wikis

Referring to a one-degree square in RDF using c-squares

Feedjit

My Blog List