News Air: 2010-12-05

Over the last few months I've been exploring different ways to view scientific articles on the iPad, summarised here. I've also made a few prototypes, either from scratch (such as my response to the PLoS iPad app) or using Sencha Touch (see Touching citations on the iPad).

Today, it's time for something a little different. The Sencha Touch framework I used earlier is huge and wasn't easy to get my head around. I was resigning myself to trying to get to grips with it when jQuery Mobile came along. Still in alpha, jQuery Mobile is very simple and elegant, and writing an app is basically a case of writing HTML (with a little Javascript here and there if needed). It has a few rough edges, but it's possible to create something usable very quickly. And, it's actually fun.

So, to learn a it more about how to use it, I decided to see if I could write a "clone" of Nature.com's iPhone app (which I reviewed earlier). Nature's app is in many ways the most interesting iOS app for articles because it doesn't treat the article as a monolithic PDF, but rather it uses the ePub format. As a result, you can view figures, tables, and references separately.

The cloneYou can see the clone here.

I've tried to mimic the basic functionality of the Nature.com app in terms of transitions between pages, display of figures, references, etc. In making this clone I've focussed on just the article display.

A web app is going to lack the speed and functionality of a native app, but is probably a lot faster to develop. It also works on a wider range of platforms. jQuery Mobile is committed to supporting a wide range of platforms, so this clone should work on platforms other than the iPad.

The Nature.com app has a lot of additional functionality apart from just displaying articles, such as list the latest articles from Nature.com journals, manage a user's bookmarks, and enable the user to buy subscriptions. Some of this functionality would be pretty easy to add to this clone, for example by consuming RSS feeds to get article lists. With a little effort one could have a simple, Web-based app to browse Nature content across a range of mobile devices.

Technical stuff

Nature's app uses the ePub format, but Nature's web site doesn't provide an option to download articles in ePub format. However, if you use a HTTP debugging proxy (such as Charles Proxy) when using Nature's app you can see the URLs needed to fetch the ePub file.

I grabbed a couple of ePub files for articles in Nature communications and unzipped them (.epub files are zip files). The iPad app is a single HTML file that uses some Ajax calls to populate the different views. One Ajax call takes the index.html that has the article text and replaces the internal and external links with calls to Javascript functions. An article's references, figure captions, and tables are stored in separate XML files, so I have some simple PHP scripts that read the XML and extract the relevant bits. Internal links (such as to figures and references) are handled by jQuery Mobile. External links are displayed within an iFrame.

There are some intellectual property issues to address. Nature isn't an Open Access journal, but some articles in Nature Communications are (under the Commons Attribution-NonCommercial-Share Alike 3.0 Unported License), so I've used two of these as examples. When it displays an article, Nature's app uses Droid fonts for the article heading. These fonts are supplied as an SVG file contained within the ePub file. Droid fonts are available under an Apache License as TrueType fonts as part of the Android SDK. I couldn't find SVG versions of the fonts in the Android SDK, so I use the TrueType fonts (see Jeffrey Zeldman's Web type news: iPhone and iPad now support TrueType font embedding. This is huge.). Oh, and I "borrowed" some of the CSS from the style.css file that comes with each ePub file.

This week saw the release of two tools from the Biodiversity Heritage Library, CiteBank and the BHL-Europe portal. Both have actually been quietly around for a while, but were only publicly announced last week.

In developing a new tool there are several questions to ask. Does something already exist that meets my needs? If it doesn't exist, can I build it using an existing framework, or do I need to start from scratch? As a developer it's awfully tempting sometimes to build something from scratch (I'm certainly guilty of this). Sometimes a more sensible approach is to build on something that already exists, particularly if what you are building upon is well supported. This is one of the attractions of Drupal, which underlies CiteBank and Scratchpads. In my own work I've used Semantic Mediawiki to support editable, versioned databases, rather than roll my own. Perhaps the more difficult question for a developer is whether they need to build anything at all. What if there are tools already out there that, if not exacty what you want, are close enough (or most likely will be by the time you finish your own tool).

CiteBank

CiteBank is an open access platform to aggregate citations for biodiversity publications and deliver access to biodiversity related articles. CiteBank aggregates links to content from digital libraries, publishers, and other bibliographic systems in order to provide a single point of access to the world’s biodiversity literature, including content created by its community of users. CiteBank is a project of the Biodiversity Heritage Library (BHL).

I have two reactions to CiteBank. Firstly, Drupal's bibliographic tools really suck, and secondly, why do we need this? As I've argued earlier (see Mendeley, BHL, and the "Bibliography of Life"), I can't see the rationale for having CiteBank separate from an existing bibliographic database such as Mendeley or Zotero. These tools are more mature, better supported, and address user needs beyond simply building lists of papers (e.g., citing papers when writing manuscripts).

For me, one of BHL's goals should be integrating the literature they have scanned into mainstream scientific literature, which means finding articles, assigning DOIs, and becoming in effect a digital publishing platform (like BioOne or JSTOR). Getting to this point will require managing and cleaning metadata for many thousands of articles and books. It seems to me that you want to gather this metadata from as many sources as possible, and expose it to as many eyes (and algorithms) as possible to help tidy it up. I think this is a clear case of it being better to use an existing tool (such as Mendeley), rather than build a new one. If a good fraction of the world's taxonomists shared their person bibliographies on Mendeley we'd pretty much have the world's taxonomic literature in one place, without really trying.

BHL-Europe

It's early days for BHL-Europe, and they've taken the "lets use an existing framework" approach, basing the BHL-Europe portal on DISMARC, the later being a EU-funded project to "encourage and support the interoperability of music related data".

BHL-Europe is the kind of web site only its developers could love. It's spectacularly ugly, and a classic example of what digital libraries came up with while Google was quietly eating their lunch. Here's the web site showing search results for "Zonosaurus":

Yuck! Why do these things have to be so ugly?. DISMARC was designed to store metadata about digital objects, specifically music. Look at commercial music interfaces such as iTunes, Spotify, and Last.fm. Or even academic projects such as mSpace.

To be useful BHL-Europe really needs to provide an interface that reflects what its users care about, for example taxonomic names, classification, and geography. It can't treat scientific literature as a bunch of lifeless metadata objects (but then again, DISMARC managed to do this for music).

Where next?
CiteBank and BHL-Europe seem further additions to the worthy but ultimately deeply unsatisfying attempts to improve access biodiversity literature. To date our field has failed to get to grips with aggregating metadata (outside of the library setting), creating social networks around that aggregation, and providing intuitive interfaces that enable users to search and browse productively. These are big challenges. I'd like to see the resources that we have put to better use, rather than being used to build tools where suitable alternatives already exist (CiteBank), or used to shoe horn data into generic tools that are unspeakably ugly (BHL-Europe portal) and not fit for purpose. Let's not reinvent the wheel, and let's not try and convince ourselves that squares make perfectly good wheels.

News Air

Viewing scientific articles on the iPad: cloning the Nature.com iPhone app using jQuery Mobile

First thoughts on CiteBank and BHL-Europe

Feedjit

My Blog List