Linked data part 2

Continuing the Friday folly theme, below is a screencast that builds on the same ideas as last week's, but this time uses a custom linked data browser I've written to display the results in a more user-friendly way.

Linking the data together from Roderic Page on Vimeo.



The demo is live; you can view it at http://iphylo.org/~rpage/browser/www/uri/http://bioguid.info/doi:10.1371/journal.pone.0001787. Under the hood the browser uses bioGUID as the primary linked data provider (although it should consume any valid linked data source, for example DBpedia). The data is stored in a local triple store (ARC), and the web interface is created by transforming SPARQL query results into HTML using XSLT. You can add data to it by editing the URL in the browser location bar and reloading the page, or by entering a URL on the page. Linked data URLs can be entered next to the Browse button as is, e.g. http://dbpedia.org/resource/Euphausia, or appended to http://iphylo.org/~rpage/browser/www, e.g. http://iphylo.org/~rpage/browser/www/http://dbpedia.org/resource/Euphausi. Other identifiers, such as DOIs, PubMed ids, and specimens, need to be resolved via bioGUID, e.g. http://iphylo.org/~rpage/browser/www/uri/http://bioguid.info/gi:86161637.
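
To make the two URL patterns concrete, here is a minimal Python sketch that builds browser URLs the way the examples above are written. The function names and constants are my own for illustration, and the patterns are inferred only from the example URLs in this post, so treat it as a sketch rather than a description of how the browser parses its input.

```python
# Base addresses taken from the example URLs in this post.
BROWSER_BASE = "http://iphylo.org/~rpage/browser/www"
BIOGUID_BASE = "http://bioguid.info"


def browse_linked_data(uri: str) -> str:
    """Append a linked data URI directly to the browser base (hypothetical helper)."""
    return f"{BROWSER_BASE}/{uri}"


def browse_identifier(namespace: str, identifier: str) -> str:
    """Build a browser URL for an identifier resolved via bioGUID (hypothetical helper).

    Follows the /uri/http://bioguid.info/<namespace>:<identifier> shape
    seen in the DOI and gi examples.
    """
    return f"{BROWSER_BASE}/uri/{BIOGUID_BASE}/{namespace}:{identifier}"


if __name__ == "__main__":
    print(browse_linked_data("http://dbpedia.org/resource/Euphausia"))
    print(browse_identifier("doi", "10.1371/journal.pone.0001787"))
    print(browse_identifier("gi", "86161637"))
```

The namespaces "doi" and "gi" are the only ones that appear in the examples here; which other prefixes bioGUID accepts is not covered by this sketch.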

All still very crude, but I hope you get the idea.

NCBI Taxonomy IDs and Wikipedia

I've written a note on the Wikipedia Taxobox page making the case for adding NCBI taxonomy IDs to the standard Taxobox used to summarise information about a taxon. Here is what I wrote:

Wikipedia's taxon pages have a huge web presence (see my blog post Google and Wikipedia revisited and Page, R. D. M. (2010). "Wikipedia as an encyclopaedia of life". Nature Precedings hdl:10101/npre.2010.4242.1). If a taxon is in Wikipedia it is almost always the first search result in Google. Researchers in other areas of biology are making use of Wikipedia as a tool to annotate genes (Gene Wiki) and RNA families (Wikipedia:WikiProject_RNA). Pages for genes, such as Cytochrome_b, have numerous external identifiers in their equivalent of the Taxobox (the Pfam_box). I think we are missing a huge opportunity by not including NCBI taxonomy ids. The advantages would be:

  • It would provide a valuable service to Wikipedia readers by enabling them to go to NCBI to discover more about a taxon

  • It would help Wikipedia contributors by providing a standardised way to refer to NCBI (and enable bots to add missing NCBI taxonomy ids). Putting them in an External links section makes it harder to be consistent (there are various ways to write a URL linking to the NCBI taxonomy)

  • It would facilitate linking from NCBI to Wikipedia. A mapping of Wikipedia pages to NCBI taxonomy ids could be added to NCBI Linkout, generating more traffic to the Wikipedia pages

  • Projects that are trying to integrate information from different sources would be able to combine genomic information from NCBI with other information much more readily

Note that I am not arguing that Wikipedia should "follow" NCBI taxonomy, merely that where the potential to link exists, the links would create value, both within and outside the Wikipedia community.

Some discussion has ensued on the Taxobox page, all positive. I'm blogging this here to encourage anyone who has any more thoughts on the matter to contribute to the discussion.