On being open: Mendeley and open data versus open source

Paulo Nuin, not the biggest fan of Mendeley wrote a blog post entitled Mendeley is going to be open source, in which he wrote:

After extensively researching some material online, analysing many blog posts and statements made by people linked to Mendeley, checking my sources, I reached the conclusion that soon Mendeley is going to be open source.

Among the essays Paulo read is Jason Hoyt's post on the Mendeley blog: Dear researcher, which side of history will you be on?. In response to a question about open sourcing the Mendeley client, Jason replied:
I get asked a lot about open sourcing Mendeley when I go to speaking events. I always state that we are open to the possibility, but then ask how many people know how to type a URL verus how many know how to program in C++? That’s why we went with the Open API first instead of open sourcing the desktop software. If you can type a URL, which is what the API is based upon, then you can build on top of Mendeley. You don’t need to know how to program.

Despite the fact that open sourcing the desktop client is the second most requested feature for Mendeley, I think Jason is right. I also think Paulo's campaign to make Mendeley open source is misguided. The client doesn't matter. OK, yes, it's probably the reason most people use Mendeley, but there are lots of competing clients (EndNote, Zotero, Papers, etc.), and there are several bibliographic data formats (RIS, EndNote XML, BibTeX) and essentially one document format (PDF) that they support, so individual users don't have to worry about locking their individual bibliographies into a proprietary format. Couple this with the existence of an API (albeit a pretty crap one), and whether an individual software client is closed or open source doesn't matter much.

Will the data be open?

However, what makes Mendeley different is the aggregation of bibliographic data (35 million references and counting).

privacy_1264073462.jpg


I'd argue it's the fate of this aggregation that matters. In a comment on the Guardian's piece Mendeley 'most likely to change the world for the better', Jane Good wrote:
"World-changing potential"? This utopian fantasy stuff is a little much, no? After all, we're talking about a for-profit corporation using closed-source software to monitor private usage habits for monetary gain. And how exactly is this company meant to sustain its millions of dollars of annual burn on a few measly storage subscriptions? At some point the data will have to go up for sale to the highest bidder, plain and simple. The API, as it exists now, does not provide access to that data, and it probably never will, right, DrGinn[sic]?

Toning down the rhetoric, the question is Mendeley, Scopus, Talis – will you be making your data Open?:

But how can a company create an income stream from Open Scientific content? That’s the a question for me for this decade. If we can solve it we can transform the world. If however the linked Open data are all going to be through paywalls, portals, query engines then we regress into the feudal information possession of the past. I hope the companies present in this session can help solve this. It won’t be easy but it has to be done. So I now ask Mendeley, Elsevier/Scopus, Talis: Are your data Openly available for re-use?

For me the question of whether the source code for the Mendeley desktop will be made open source is a red herring, and ultimately a distraction from the real question — will the data be open?

Navigating the Encyclopedia of Life tree on the desktop and the iPhone

This week seems to be API week. The Encyclopedia of Life API Beta Test has been out since August 12th. By comparison with the Mendeley API that I've spent rather too much time trying to get to grips with, the EOL API release seems rather understated.

However, I've spent the last couple of days playing with it in order to build a simple tree navigating widget, which you can view at http://iphylo.org/~rpage/eoltree/.

The widget resembles Aaron Thompson's Taxonomy (formerly called KPCOFGS) iPhone app in that it uses the iPhone table view to list all the taxa at a given level in a taxonomic tree. Clicking on a row in this table takes you to the descendants of the corresponding taxon, clicking "Back" takes you back up the tree. if you've reached a leave node (typically a species) the widget displays a snippet of information about that taxon. It also resembles Javier de la Torre's taxonomic browser written in Flex.

Here's a screen shot of the widget running in a desktop web browser:

insecta.png

Here's the same widget in the iPhone web browser:

web.pngUsing the API
The EOL API is pretty straightforward. I call the http://www.eol.org/api/docs/hierarchy_entries API to get the tree rooted at a given node, then populate each child of that node using http://www.eol.org/api/docs/pages. The result is a simple JSON file that I cache locally to speed up performance and avoid hitting the EOL servers for the same information. because I'm locally caching the API calls I need a couple of PHP scripts to do this, but everything else is HTML and Javascript.

iPhone and iPad
I've not really developed this for the iPhone. I've cobbled together some crude Javascript to simulate some iPhone-like effects, but if I was serious about the phone I'd look into one of the Javascript kits available for iPhone development. However, I did want something that was similar in size to the iPhone screen. The reason is I'm looking at adding taxonomic browsing to the geographic browser I described in the post Browsing a digital library using a map, so I wanted something easy to use but which didn't take up too much space. In the same way that the Pygmybrowse tree viewer I played with in 2006 was a solution to viewing a tree on a small screen, I think developing for the iPhone forces you to strip things down to the bare essentials.

I'm also keeping the iPad in mind. In portrait mode some apps display lists in a popover like this:

popover_flatten.png

This popover takes up a similar amount of screen space to the entire iPhone screen, so if I was to have a web app (or native app) that had taxonomic navigation, I'd want it to be about the size of the iPhone.

Let me know what you think. Meantime I need to think about bolting this onto the map browser, and providing a combined taxonomic and geographic perspective on a set of documents,

Mendeley API PHP client

mendeley.pngFollowing on from my earlier post about the Mendeley API, I've bundled up my code for OAuth access to the Mendeley API for anyone who's interested in playing with the API using PHP. You can browse the code on Google Code, or grab a tarball here. You'll need a consumer key and a consumer secret from Mendeley for the demos to work, and if you're behind a HTTP proxy you'll have to tweak the code (this is explained in the ReadMe.txt file that comes with the code).

The code is pretty rough, and doesn't use all the Mendeley API calls, but I've other things to do, and it felt like a case of either bundle this up now, or it will get lost among a host of other projects. The Mendeley API still feels woefully under-developed. I'd be more interested in developing this client further if the API was powerful enough to do the kinds of things I'd like to do.

Browsing a digital library using a map

Every so often I revisit the idea of browsing a collection of documents (or specimens, or phylogenies) geographically. It's one thing to display a map of localities for single document (as I did most recently for Zootaxa), it's quite another to browse a large collection.

Today I finally bit the bullet and put something together, which you can see at http://biostor.org/maps/. The website comprises a Google Map showing localities extracted from papers in BioStor, and a list of the papers that have one or more points visible on the map.

mapbrowser.png


In building this I hit a few obstacles. The first is the number of localities involved. I've extracted several thousand point localities from articles in BioStor. Displaying all these on a Google Map is going to be tedious. Fortunately, there's a wonderful library called MarkerCluster, part of the google-maps-utility-library-v3 that handles this problem. MarkerCluster cluster together markers based on zoom level. If you zoom out the markers cluster together, as you zoom in these clusters will start to resolve into their component points. Very, very cool.

The second challenge was to have the list of references update automatically as we move around or zoom in and out on the map. To do this I need to know the bounding box currently being displayed in the map, I can then query the MySQL database underlying BioStor for the localities within the bounding box, using MySQL's spatial extensions. The query is easy enough to implement using ajax, but the trick was knowing when to call it. Initially, listening for the bounds_changed event seemed a good idea. However, this event is fired as the map is being moved (i.e., if the user is panning or dragging the map a whole series of bounds_changed events are fired), whereas what I want is something that signals that the user has stopped moving the map, at which point I can query the database for articles that correspond to the region that map is currently displaying. Turns out that the event I need to listen for is idle (see Issue 1371: map.bounds_changed event fires repeatedly when the map is moving), so I have a function that captures that event and loads the corresponding set of articles.

Another "gotcha" occurs when the region being viewed crosses longitude 180° (or -180°) (see diagram below from http://georss.org/Encodings).

179-rule.jpg


In this case the polygon used to query MySQL would be incorrectly interpreted, so I create two polygons, each with 180° or -180° as one of the boundaries, and merge the articles with points in either of those two polygons.

I've made a short video showing the map in action. Although I've implemented this for BioStor, the code is actually pretty generic, and could easily be adapted to other cases where we want to navigate through a set of objects geographically.


Viewing scientific articles on the iPad: the PLoS Reader

Continuing on from my previous post Viewing scientific articles on the iPad: towards a universal article reader, here are some brief notes on the PLoS iPad app that I've previously been critical of.

There are two key things to note about this app. The first is that it uses the page turning metaphor. The article is displayed as a PDF, a page at a time, and the user swipes the page to turn it over. Hence, the app is simulating paper on the iPad screen.

turn.jpg


But perhaps more interesting is that, unlike the Nature app discussed earlier, the PLoS app doesn't use a custom API to retrieve articles. Instead the app uses RSS feeds from the PLoS site. PLoS provides journal-specific RSS feeds, as well as subject-specific feeds within journals (see, for example, the PLoS ONE home page). The PLoS Reader app takes these feeds and uses them to create a list of articles the reader can choose from.

A nice feature of the PLoS ATOM feeds is the provision of links to alternative formats for the article (unlike many journal RSS feeds, which provide just a DOI or a URL). For example, the feed item for the article "Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing" doi:10.1371/journal.pone.0012303 contains links to the PDF and XML versions of the article:


<link rel="related"
type="application/pdf"
href="http://www.plosone.org/article/fetchObjectAttachment.action?uri=info:doi/10.1371/journal.pone.0012303&representation=PDF"
title="(PDF) Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing" />
<link rel="related"
type="text/xml"
href="http://www.plosone.org/article/fetchObjectAttachment.action?uri=info:doi/10.1371/journal.pone.0012303&representation=XML"
title="(XML) Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing" />


This makes the task of an article reader much easier. Rather than attempt to screen scrape the article web page, or rely on a rule for constructing the link to the desired file, the feed provides an explicit URL to the different available formats.

I've not seen this feature in other journal RSS feeds, although article web pages sometimes provide this information. BMC journals, for example, provide <link rel="alternate"> tags in the web page for each article, from which we can extract links to the XML and PDF versions, and some journals (BMC included) provide the Google Scholar metadata data tag <meta name="citation_pdf_url"> to link to the PDF. Hence, a generic article reader will need to be able to extract metadata tags from article web pages as it seeks formats suitable to display.

TreeView X now on Google Code

tv.pngTreeView X, the open source version of TreeView, has been slowly suffering bit rot as C++ compilers and operating systems change. Every so often I'd tweak the code to build on some Linux version or other, but this isn't something I've a lot of time for. Moreover, because of the hassle of rebuilding binaries and source tar balls the updated versions weren't uploaded to the TreeView X web site. Charles Plessy has been doing a nice job of keeping a Debian package working, but my own code wouldn't build on newer versions of Linux.

Today I managed to get my code to build on Fedora Core 13 using gcc 4.4.4 and the latest stable version of wxWidgets (2.8.11). The program runs OK, although there are some display glitches, and printing seems broken. However, my days of standalone application development are over, so I thought I should make the code accessible to anyone else who may find it of use. The code is now hosted on Google Code.

Viewing scientific articles on the iPad: towards a universal article reader

There are a growing number of applications for viewing scientific articles coming out for the iPhone and iPad. I'm toying with extending the experiments described in an earlier post when I took the PLoS iPad app to task for being essentially a PDF page-turner, so I thought I should take a more detailed look at the currently available apps. In particular, I'm interested in how the apps solve some basic tasks, and whether there is a consistent "vocabulary" for interacting with an article. Put less pretentiously, do the apps display things such as lists of articles, citations, references, figures, and bibliographic data in similar ways, or does the user have to learn new rules for each app? I'm also interested in how the apps treat the article (e.g., as a monolithic PDF, as a document with pages, or as a web document where pagination has no meaning), and how they get their content (from a publisher, from the user's social network, from the user's personal library).

Nature
In this post I'm going to look at Nature.com's app. Future posts will explore other apps. I'm interested in what people have done so far, and how we could improve the reading experience. Long term I'm interested in whether there's scope for a "universal article reader" that can take diverse formats (including XML, PDF, and page images) and display them in a consistent and useful way. In the diagrams below I'm using touch gesture symbols from Graffletopia (see Touch Gesture Reference Guide).



Contents
Nature's app is limited to articles published by Nature, and displays the available articles as a list with thumbnails of a figure from the article. The app fetches this list using Nature's mobile API. Up until April 30th the fulltext of an article was free, at at present you are limited to getting abstracts. It's interesting that the list of articles isn't retrieved using a RSS feed, I presume because Nature wanted to use some simple authentication to avoid users downloading all their closed-access content for free.

touch1.jpg

Article display
Nature's app, unlike all the others I've seen so far, doesn't use PDFs. Instead it uses ePub. Unlike many ePub book readers (including Apple's own iBooks), the Nature app doesn't render the article as a series of pages, but as one continuous document that you scroll down by dragging (it's essentially a web page). You can't zoom the text, but the text size is fine for reading.

touch2.jpg

Citations
Citations in the body of the article are links. If you tap them the full citation slides in from the right, with a link to the publisher's website. If you tap the link the app opens the website within the app. This can be a little jarring as you move from a customised view of an article to a web page designed for a desktop. In the case of a Nature article, it would be more elegant if the app recognised that the cited reference was a Nature article and rendered it natively in the app. More generally the transition between app and website might be less jarring if journal publishers developed mobile versions of their websites.

touch3.jpg


Figures
The figures aren't displayed directly in the body of the article, but each mention of a figure in the body of the text is a link. Tapping the link causes the figure to slide up from the bottom of the screen. A button in the top right hand corner enables you to toggle between displaying the figure and it's caption (shown as white text on a black background). You can use pinch and spread to zoom in and out of the figure, as well as save it to the photo library on your device.

touch4.jpg


Summary
I've started with the Nature app as I think it's the only one so far to seriously tackle the challenge of displaying an article on a mobile device. Instead of displaying PDFs it repackages the articles in ePub format and the result is much more interactive than a PDF.

I hope to explore other article viewing apps in later posts, but it's worth noting that we should also be looking at other apps for ideas. Personally I really like the Guardian's iPhone app, which I use as my main news reader. It has a nice gallery feature to display thumbnails of images (imagine a gallery of an article's figures), and uses tags effectively.