Monday 3 June 2013

Badly-coded affiliations: an all too long-standing curse


A webinar on the Repository Junction Broker (RJB) Project, currently under way at the EDINA National Data Centre in Edinburgh, was delivered last week by Muriel Mewissen, RJ Broker Project manager. The RJ Broker is a SWORD-based tool for automated content delivery into institutional repositories (IRs). It identifies target IRs by matching the co-authors' affiliations to their institutions' repository platforms, where available.

In the course of this RSP-organised event, Muriel shared some slides with an analysis of the preliminary content transfers the RJ Broker has performed so far. The first RJB-mediated transfer test involved processing in excess of 60,000 Europe PubMed Central articles and delivering them into the (mock) worldwide repository network.

EuropePMC is a solid disciplinary platform for the biosciences, whose content is often delivered straight from publishers. As a result, the platform's contents usually feature good-quality metadata, and EuropePMC thus provides a good example for testing research article transfer. Moreover, the specific EuropePMC article set selected for this test was remarkably recent. However, the figures Muriel presented for the RJ Broker's ability to resolve authors' affiliations in EuropePMC articles were simply astonishing (see figure below): authors' affiliations were badly coded in the metadata of over half the transferred articles.



This is a well-known issue that the PEER project also had to deal with in its time. Institutions have been telling their authors for ages to harmonise their affiliations when signing their papers, but it is still very common to find affiliations such as "Department of Psychology, Compton Rd" or "Radiology Unit, Hearts Lane", which the RJ Broker cannot process because they lack their main affiliation node, that is, the name of the institution itself.
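To make the problem concrete, here is a minimal sketch of why such strings cannot be resolved. It is not the RJ Broker's actual matching logic, which the webinar did not detail. It simply assumes a lookup table of institution names (`KNOWN_INSTITUTIONS`, with invented entries and invented repository identifiers) and checks whether any comma-separated part of an affiliation names one of them.

```python
from typing import Iterable, Optional

# Hypothetical lookup table: institution name -> repository identifier.
# Both the names and the identifiers are invented for illustration.
KNOWN_INSTITUTIONS = {
    "example university": "repo-example",
    "sample institute of technology": "repo-sample",
}


def resolve_affiliation(affiliation: str) -> Optional[str]:
    """Return the repository of the first part that names a known institution."""
    for part in affiliation.split(","):
        # Normalise case and collapse whitespace before looking the part up.
        key = " ".join(part.lower().split())
        if key in KNOWN_INSTITUTIONS:
            return KNOWN_INSTITUTIONS[key]
    # No part names an institution: the main affiliation node is missing.
    return None


def unresolved_rate(affiliations: Iterable[str]) -> float:
    """Return the fraction of affiliations that cannot be resolved."""
    items = list(affiliations)
    if not items:
        return 0.0
    unresolved = sum(1 for a in items if resolve_affiliation(a) is None)
    return unresolved / len(items)
```

Under these assumptions, `resolve_affiliation("Department of Psychology, Compton Rd")` returns `None`: the string contains a department and a street, but nothing a lookup table could tie to an institution. Adding the institution name, as in "Department of Psychology, Example University, Compton Rd", is enough for the match to succeed.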

A large collective effort is needed to tackle this long-standing issue once and for all, and ORCID looks like a very promising initiative in this regard. If it were possible to have authors' affiliations coded into their ORCID iDs (something ORCID is actually aiming to do), the rate of miscoded affiliations could be expected to drop rapidly as a result.

Much like author identification, this is of course a huge challenge that no one has so far been able to solve, and ORCID faces a lot of hard work in finding a way to attack the miscoded affiliation issue. But there is currently much talk in the community about organisational IDs and about putting some system in place that will hopefully provide the means to start solving this seemingly unsolvable difficulty. The research information management community badly needs ORCID to succeed in this challenge if it is ever to start building the eagerly awaited service layer on top of the infrastructure layer.