Wikipedia paper out

My short note on "Wikipedia as an Encyclopaedia of Life" has appeared in Organisms Diversity & Evolution (doi:10.1007/s13127-010-0028-9) (yes, I do occasionally write papers). A preprint of this paper is available on Nature Precedings (hdl:10101/npre.2010.4242.1).

My presentation at iEvoBio covers much the same ground, and is included below, although the paper was written before I made the mapping from NCBI taxa to Wikipedia pages.


Show me the trees! Playing with the TreeBASE API

Being in an unusually constructive mood, I've spent the last couple of days playing with the TreeBASE II API, in an effort to find out how hard it would be to replace TreeBASE's frankly ghastly interface.

After some hair pulling and bad language I've got something to work. It's very crude, but gives a glimpse at what can be done. If you visit http://iphylo.org/~rpage/mytreebase/ and enter a taxon name, my code paddles off and queries TreeBASE to see if it has any phylogenies for that taxon. Gears grind, RSS feeds are crunched, a triple store is populated, NEXUS files are grabbed and Newick trees extracted, small creatures are needlessly harmed, and at last some phylogeny thumbnails are rendered in SVG (based on code I mentioned earlier), grouped by study. Functionality is limited (you can't click on the trees to make them bigger, for example), and the bibliographic information TreeBASE stores for studies is a bit ropey, but you get the idea.
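
For the curious, here is a minimal sketch of the "NEXUS files are grabbed and Newick trees extracted" step. This is not the code behind the site, just an illustration in Python of one way it could be done. The function name extract_newick_trees is made up, and the sketch makes simplifying assumptions: it does not apply TRANSLATE tables, and it will trip over quoted labels that contain semicolons.

```python
import re

# Matches a whole TREES block: BEGIN TREES; ... END; (or ENDBLOCK;)
TREES_BLOCK = re.compile(
    r"begin\s+trees\s*;(.*?)\bend(?:block)?\s*;",
    re.IGNORECASE | re.DOTALL,
)

# Matches one statement such as: TREE name = [&R] ((A,B),(C,D));
# The name may be quoted; bracketed comments after '=' are skipped.
TREE_STATEMENT = re.compile(
    r"\bu?tree\s+\*?\s*('[^']*'|[^\s=]+)\s*=\s*(?:\[[^\]]*\]\s*)*([^;]+;)",
    re.IGNORECASE,
)


def extract_newick_trees(nexus_text):
    """Return a dict mapping tree names to Newick strings.

    Simplified: TRANSLATE tables are ignored, so trees that use
    numeric taxon labels keep those numbers.
    """
    trees = {}
    for block in TREES_BLOCK.finditer(nexus_text):
        for statement in TREE_STATEMENT.finditer(block.group(1)):
            name = statement.group(1).strip("'")
            trees[name] = statement.group(2).strip()
    return trees


# A tiny hand-written NEXUS file (not from TreeBASE) to try it on.
example = """#NEXUS
BEGIN TREES;
    TREE tree1 = [&R] ((A,B),(C,D));
    TREE 'second tree' = (A,(B,(C,D)));
END;
"""

for name, newick in extract_newick_trees(example).items():
    print(name, newick)
```

Regular expressions are fine for a quick look, but for anything serious a proper NEXUS parser is the safer route.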


What I'm looking for at this stage is a very simple interface that answers the question "show me the trees", which I think is the most basic question you can ask of TreeBASE (and one its own web interface makes unnecessarily hard). I've also gained some inspiration from the BioText search engine.

If you want to give it a try, here are some examples. These examples should be fairly responsive as the data is cached, but if you try searching for other taxa you may have a bit of a wait while my code talks to TreeBASE.



ZooKeys publishes articles of the future

The open access taxonomic journal ZooKeys has published a special issue with four papers, each available in HTML, PDF, and XML, the last being extensively marked up. Penev et al. ("Semantic tagging of and semantic enhancements to systematics papers: ZooKeys working examples", doi:10.3897/zookeys.50.538) describe the process involved in creating these XML files. Two papers (doi:10.3897/zookeys.50.506 and doi:10.3897/zookeys.50.505) were created using authoring tools available in Scratchpads, as outlined by Blagoderov et al. ("Streamlining taxonomic publication: a working example with Scratchpads and ZooKeys", doi:10.3897/zookeys.50.539). When you view the HTML for these articles you can toggle on or off the highlighting of citations, taxonomic names, and geographic co-ordinates. When you mouse over a taxonomic name, for example, a popup appears with links to GBIF, NCBI, EOL, BHL, Wikipedia, etc.

I think these papers represent one view of the future of scientific publishing ("article 2.0"), and I'm flattered that Penev et al. cite my Elsevier challenge work (doi:10.1016/j.websem.2010.03.004, preprint at hdl:10101/npre.2009.3173.1) as one of the sources of inspiration (along with the landmark Shotton et al. "Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article" doi:10.1371/journal.pcbi.1000361, which I've discussed previously). It is also good to see the TaxPub XML schema used by a publisher, and Scratchpads being a part of the process of publishing taxonomic information.

Deep linking

My initial impression is that there is huge potential here, although I think there is still lots to do. I'm not totally convinced that popups are the way to go (although I've dabbled with them as well), and we need to move beyond simply linking to other sites to a deeper form of integration. A ZooKeys article might link to BHL via a taxonomic name, but how about deeper linking? For example, the paper by Brake and von Tschirnhaus (doi:10.3897/zookeys.50.505) contains the following citations:

Biró L (1899) Commensalismus bei Fliegen. Természetrajzi füzetek 22: 198–204.

Kertész K (1899) Verzeichnis einiger, von L. Biró in Neu-Guinea und am Malayischen Archipel gesammelten Dipteren. Természetrajzi füzetek 22: 173–19

Neither reference has any links in the HTML, so the user is under the impression that they aren't available online, but both references have been scanned by BHL. You can see full text for these articles in BioStor (references 52005 and 52004, respectively -- note that the pagination for Biró 1899 is given incorrectly in the paper). This is one area where BHL has a lot to offer publishers, and it would be great to see BHL provide the services publishers need to add these links to their articles.

This integration should go both ways. It's odd that the paper by Brake and von Tschirnhaus contains the LSID used by ZooBank for this paper (urn:lsid:zoobank.org:pub:DABB03F4-A128-43BB-990C-02F25D656B00, see the <self-uri> tag in the XML), but ZooBank doesn't know about the DOI for the paper, hence the ZooBank page for this article has no link to the article itself. It's time to join this stuff together.
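
As a rough illustration of how little is needed to join the two identifiers, here is a Python sketch that pulls the DOI and the ZooBank LSID out of an article's XML. The XML fragment is a hand-made stand-in, not the real ZooKeys file; only the DOI and the LSID are taken from the paper. The function name doi_and_zoobank_lsid is hypothetical, and because I'm not assuming where the LSID sits inside <self-uri>, the code checks both the element text and an xlink:href attribute. It also assumes the elements are not in a default XML namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree spells a namespaced attribute as {namespace}name.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
ZOOBANK_PREFIX = "urn:lsid:zoobank.org:"


def doi_and_zoobank_lsid(xml_text):
    """Return (doi, lsid) from article XML; either may be None."""
    root = ET.fromstring(xml_text)

    doi_element = root.find(".//article-id[@pub-id-type='doi']")
    doi = None
    if doi_element is not None and doi_element.text:
        doi = doi_element.text.strip()

    lsid = None
    for element in root.iter("self-uri"):
        # Look in the attribute first, then in the element text.
        for candidate in (element.get(XLINK_HREF), element.text):
            if candidate and candidate.strip().lower().startswith(ZOOBANK_PREFIX):
                lsid = candidate.strip()
                break
        if lsid:
            break
    return doi, lsid


# Hand-made fragment (an assumption about layout, not the real file).
fragment = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.3897/zookeys.50.505</article-id>
      <self-uri>urn:lsid:zoobank.org:pub:DABB03F4-A128-43BB-990C-02F25D656B00</self-uri>
    </article-meta>
  </front>
</article>"""

print(doi_and_zoobank_lsid(fragment))
```

A pair like this is exactly what a registry would need in order to link its record back to the published article.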

What's next?

What I'd really like to see is article XML repurposed as, say, RDF, and used to populate a database so that we can query it. In this way we can start to atomise the article into useful parts, and recombine them in new and interesting ways. Might be something to play with over the summer.
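
To make "article XML repurposed as RDF" a little more concrete, here is a small Python sketch that turns a simplified NLM-style article into N-Triples lines. Everything specific here is an assumption made for illustration: the function names, the choice of Dublin Core terms (title and references) as predicates, the use of https://doi.org/ URIs for articles, and the sample XML with its made-up DOIs. A real conversion would cover far more of the article (taxon names, localities, authors) and would have to cope with the different DTD versions.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"


def doi_uri(doi):
    """Turn a bare DOI into a URI (an illustrative choice)."""
    return "https://doi.org/" + doi.strip()


def literal(text):
    """Format a string as an N-Triples literal, collapsing whitespace."""
    collapsed = " ".join(text.split())
    escaped = collapsed.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def article_to_ntriples(xml_text):
    """Return a list of N-Triples lines describing one article."""
    root = ET.fromstring(xml_text)
    doi = root.findtext(".//article-meta/article-id[@pub-id-type='doi']")
    if not doi:
        raise ValueError("no DOI found in article metadata")
    subject = "<" + doi_uri(doi) + ">"

    triples = []
    title_element = root.find(".//article-meta//article-title")
    if title_element is not None:
        # itertext() keeps text inside child markup such as <italic>.
        title = "".join(title_element.itertext())
        triples.append(subject + " <" + DCTERMS + "title> " + literal(title) + " .")

    # One triple per cited work that carries a DOI.
    for cited in root.findall(".//ref-list//pub-id[@pub-id-type='doi']"):
        if cited.text:
            triples.append(
                subject + " <" + DCTERMS + "references> <" + doi_uri(cited.text) + "> ."
            )
    return triples


# Made-up article with made-up DOIs, purely for demonstration.
sample = """<article>
  <front><article-meta>
    <article-id pub-id-type="doi">10.9999/example.1</article-id>
    <title-group>
      <article-title>A made-up <italic>Drosophila</italic> paper</article-title>
    </title-group>
  </article-meta></front>
  <back><ref-list>
    <ref id="B1"><mixed-citation>Someone (2009) Something.
      <pub-id pub-id-type="doi">10.9999/example.2</pub-id></mixed-citation></ref>
  </ref-list></back>
</article>"""

for line in article_to_ntriples(sample):
    print(line)
```

Once the triples are loaded into a triple store, a question like "which articles cite this one?" becomes a single query on the references predicate.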

On a practical level, I'm somewhat bemused by the variety of XML formats being used by open access publishers. PLoS use version 2.0 of the NLM Journal Archiving and Interchange Tag Suite, and I wrote an XSLT style sheet to transform PLoS articles for viewing on an iPad. TaxPub is based on version 3.0 of the NLM DTD, which breaks quite a bit of my code relating to citations, so I'll have to tweak this to get it to display ZooKeys articles correctly. Handling TaxPub itself will also require some additional work. Then there are the BMC journals, which have their own flavour of XML (based on something called the "KETON DTD"). It's all a bit messy. But I guess it'd be no fun if it was too easy...


iEvoBio: where to find out what went on

Now that I'm back in Glasgow, albeit rather jet-lagged, it's time for a quick summary of the first iEvoBio meeting, held at the Evolution meetings. I thought the meeting went very well, but perhaps I should leave that judgement to others. Meantime, if you want to see what the fuss was about, here are some ways to catch up.

Slides
Presentations from iEvoBio are going up at SlideShare, including the great keynotes by Jonathan Eisen and Rob Guralnick.

Abstracts
Abstracts and some presentations are going up at Nature Precedings, where you can add comments, and vote on abstracts you particularly like.

Challenge
We had five entries for the visualisation challenge. The audience voted for a clear winner, but second place was a dead heat, so we chose to split the second place prize money. Here are the entries:

Place        Entry
1st          PhyloBox
2nd equal    jsPhyloSVG
2nd equal    GenGIS
-            EOL tree viewer
-            Nexplorer

The entries gave live demos and participated in the software bazaar so people could get to play with them hands on. Doing live demos is brave, especially if you have a Twitter client on, as Andrew Hill discovered.

Photos
Photos tagged with "ievobio10" are going up at Flickr, including my photo of Jonathan Eisen's "what would Jesus sequence?" t-shirt.

Twitter
You can follow the iEvoBio tweet stream by searching for ievobio at Twitter. I've grabbed this tweet stream and hope to do something interesting with it when I get the chance. One message that seems clear is that having keynote speakers who have a big Twitter presence is a great help in generating buzz.



The organising committee has a lot to digest as we reflect on the meeting, but personally I really liked the variety of formats (keynotes, short talks, lightning talks, software bazaar, and birds of a feather), the shortness of the meeting (2 days), and the fact that everything was in one place (no jumping between concurrent sessions). Feel free to add your thoughts below.