GrandIR Blog: 2013

Sunday, 22 September 2013

An attempt to provide new services to the repository network in the UK: the UK RepositoryNet+ Project

A while ago I was asked to write a brief note on the RepNet project for the 'ThinkEPI Notes', a Spanish series of short updates on recent developments in the area of libraries and technology. Since it's a rather long text with a significant number of hyperlinks, I have chosen to offer it online from this blog as well, so that readers may find it easier to read than in a mailing list message. The note was originally written in Spanish for that reason.

A trial in developing services for repositories in the United Kingdom: the UK RepositoryNet+ project

The text of this note is also available on the ThinkEPI website.

Following the already belated Alhambra Declaration (May 2010), news keeps arriving from Spain these days about new institutional declarations in support of open access. Although not entirely devoid of usefulness, especially if they result in better technical and human resources for the teams trying to implement the objectives stated in those texts, such declarations are meaningless if they amount to mere expressions of support for an initiative that is about to turn ten years old [1]. Now that the work of many professionals in university and research centre libraries around the world has brought the open access repository network to a degree of consolidation that allows no turning back, the next step is to venture into developing services on top of that infrastructure layer that meet the needs of academics, researchers and their institutions. This is the spirit that has guided the UK RepositoryNet+ project in the United Kingdom [2], which defines itself as "an initiative to create a socio-technical infrastructure supporting the deposit, curation and exposure of open access research literature".

Much has been said over the past year about the "mistaken bet by the British Government on an unsustainable model of 'Gold' Open Access funded through article processing charges taken from the meagre budgets available for research". Without claiming that this statement is entirely wrong, one must also take into account the substantial investment (seven-figure sums in pounds sterling) made at the same time, through this RepNet project, in research unparalleled in Europe [3] into ways of consolidating the green route and open access repositories. By contrast, the project is barely mentioned in the heated "Gold vs Green" discussions that have been taking place for some time on the discipline's mailing lists.

This is mainly because, whereas a specific open access policy is simple and easy to judge and then approve or condemn, analysing a project as complex as RepNet requires in-depth knowledge of the technical challenges posed by the different repository services and of the approaches adopted to solve them by the teams in charge of their development. Thus, although practically absent from the (frequently byzantine) discussions among open access advocates, RepNet has been widely commented on and debated by the community of repository managers in the United Kingdom, which is the one in charge of implementing the often changing, if not contradictory, policies issued by the various administrative bodies at institutional, regional or national level.

As presented on the project's home page, the development of services on top of the repository layer rests on a prior analysis of the needs of the different stakeholders (institutions, funding agencies, researchers...) and on the definition of a series of work areas in which new functionality is urgently needed to guarantee the continuity of open access repositories. This comes at a time when the reporting requirements of the Research Excellence Framework (REF), the research assessment exercise to be carried out in the United Kingdom in 2014, have led many institutions to purchase and implement CRIS systems. These often threaten to replace open access repositories, even though they are based on an approach focused far more on research information management than on open access as such [4].

RepNet's areas of activity in identifying, designing, developing and deploying repository services are the following:

1. Content Aggregation. In this area, RepNet proposes building a content aggregator for the country's entire repository network. Unlike many other countries where this functionality has existed for some time, none of the various initiatives that developed content aggregation prototypes in the United Kingdom has become established. This infrastructure disadvantage has an upside: a platform built now can offer far more advanced functionality than earlier platforms, such as text mining of the full texts of archived documents with automatic assignment of subject descriptors, detection and merging of duplicates using a similar full-text analysis strategy, and detection of metadata-only records (with no associated full text) even when they contain a default PDF or 'default dummy file' indicating that the full text is not available. Given that adoption of the DRIVER guidelines has been very low in the United Kingdom (which in turn has led to unusually low levels of compliance with OpenAIRE standards), an aggregation can offer a novel metadata schema validation function, applying very advanced criteria such as detecting the versions of archived articles or aggregating funding information for the works.

ITIL workflow for service incubation in RepNet

2. Reporting and Benchmarking. In the reporting area, RepNet has been operating the IRUS-UK project [5] following a common model for incubating outsourced services in line with the ITIL methodology [6]. IRUS-UK is a project developed at the MIMAS Data Centre at the University of Manchester to collect usage statistics from multiple repositories, harmonised according to the COUNTER standard. As of mid-September 2013, IRUS-UK collects and aggregates data from 40 institutional repositories, roughly a third of the national network, and continues to extend its coverage, which is limited for now to EPrints (29) and DSpace (11) while the development team works on the data exchange module for Fedora and other platforms. Besides enabling comparison across platforms and document types, IRUS-UK aims to obtain an estimate of aggregate usage statistics for the whole network, in the expectation that overall usage levels will make a convincing case for authors and institutions to keep using it.

3. Automated Content Deposit. The Repository Junction Broker (RJB) project is an initiative developed at the EDINA National Data Centre for the automated transfer of content to the repository network via the SWORD protocol. After several years of work, the RJB project was included among the services to be provided by RepNet, and it is under this umbrella that it began operating as a pilot service in the middle of this year [7]. RJB aims to build up a base of content providers, mainly for journal articles, whose content can be distributed, either as metadata-only records or as metadata plus full text, to the institutional repositories corresponding to the affiliations of the authors of each article. Initially, the RJ Broker has signed agreements with the subject repository EuropePMC and with Nature Publishing Group to distribute content from both providers as a pilot (the former following the metadata-only model and the latter transferring metadata plus full text, which requires an explicit commitment from the receiving repositories not to release the full texts before the embargo date). A key aspect of this service is that it is international by default: since the authors of the articles are often international, institutional repositories only need to be registered with the service in order to receive content automatically (once SWORD has been installed), regardless of the country in which they are located.

RJB service for automated content distribution

4. Metadata Enhancement. The Metadata Enhancement area is possibly the broadest of those addressed by the RepNet project. Earlier research into the needs of the different stakeholder groups revealed that individual repositories were applying metadata assignment strategies (for example in the field of content preservation) that were not being shared with the rest of the network. Given the need for the whole network to develop in step, the RIOXX initiative [8] was launched to develop and implement an application profile allowing the joint incorporation of metadata on funding (something OpenAIRE was already addressing for FP7 projects), on specific access-related aspects, and on identifiers such as ORCID. The preliminary initiatives to incorporate these advanced metadata into repositories have recently begun to be disseminated [9] so that the whole network can gradually adopt them together.

5. Repository Registry. The two main repository directories currently in existence, OpenDOAR and ROAR, maintained by the universities of Nottingham and Southampton respectively, provide more than acceptable information on the worldwide repository network. However, neither provides complete coverage of the network. For this reason, and also to update the profile the directories provide of the platforms they index, the Open Access Repository Registry (OARR) project [10] has been launched as part of RepNet. This project aims to update the information in OpenDOAR by covering repository characteristics in greater detail, at a time when both the widespread implementation of CRIS systems and the growing number of research data repositories are bringing significant changes to the sector. The new directory, a project led by the CRC-SHERPA team at the University of Nottingham, will eventually be hosted on the RepNet servers alongside other services provided by SHERPA such as RoMEO, JULIET or, more recently, FACT. Indeed, one of the lines for designing new repository services involves exploiting the synergies among these applications when they are managed in an integrated way.

6. Discovery. One of the most problematic issues for repositories is the low visibility of their content on the web. Alongside the creation of sufficiently comprehensive metadata schemas that can serve the purposes of discoverability, the strand of work aimed at improving content visibility seeks above all to optimise the indexing ratios of materials archived in the UK repository network by search engines such as Google Scholar or Microsoft Academic Search. Whether through identifying good practice at the level of the individual repository or through the bulk indexing of a content aggregation [11], the visibility of repository content on the web needs to improve, and its provenance needs to be identified so that the end user of the information can recognise and value the work done by these platforms.

7. Preservation/Continuity of Access. Without going directly into the area of digital preservation, covered by other Jisc programmes and projects such as SPRUCE [12], the RepNet project did set out to offer some kind of service for the repository network to ensure continuity of access to the content archived in it. To this end, RepNet is working on extending to open access materials the LOCKSS model, already used successfully to manage continuity of access to materials that libraries obtain through subscription [13]. This model is based on periodically archiving content in a network of distributed servers (the 'LOCKSS Boxes') managed by the institutions.
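As an illustration of the automated deposit service described in item 3 above, the routing step might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the RJ Broker's actual implementation: the `Article` class, the `REGISTRY` mapping, the example endpoint URLs and both function names are hypothetical, matching is reduced to naive substring comparison, and no real SWORD request is made.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Article:
    title: str
    affiliations: list                     # free-text affiliation strings, one per author
    has_full_text: bool = False            # False models a metadata-only record
    embargo_until: Optional[date] = None   # None means no embargo applies


# Hypothetical registry: institution name (lower case) -> SWORD endpoint.
# The URLs are placeholders, not real services.
REGISTRY = {
    "university of st andrews": "https://repository.example.ac.uk/sword",
    "universidad de oviedo": "https://repositorio.example.es/sword",
}


def target_endpoints(article, registry=REGISTRY):
    """Return the endpoints of every registered repository matching an affiliation."""
    targets = set()
    for affiliation in article.affiliations:
        lowered = affiliation.lower()
        for institution, endpoint in registry.items():
            if institution in lowered:
                targets.add(endpoint)
    return sorted(targets)


def full_text_releasable(article, today):
    """Full text may be exposed only if present and the embargo date has been reached."""
    if not article.has_full_text:
        return False
    return article.embargo_until is None or today >= article.embargo_until
```

Because the lookup depends only on registration, a repository in any country receives the record as soon as one co-author's affiliation matches, while an affiliation that names no registered institution yields no target at all.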

Newly created services

Besides the emphasis on integrating and further developing existing repository services, the RepNet project also aims to design, develop and deploy a series of new services. To this end, RepNet adopts the model of building a data-driven (service) infrastructure [14], which makes it possible to launch new services long demanded by the community, such as tools for monitoring compliance with open access mandates. New services are created by establishing partnerships with specific institutions where pilot developments can be trialled and tested. Thus the STARS initiative [15], carried out in collaboration with the University of St Andrews and the Scottish Digital Library Consortium (SDLC), has been set up as a pilot for deploying the set of services that an initiative like RepNet can offer to a specific institution and repository.

References

[1] The Berlin Declaration, published by the Max Planck Society in October 2003, can reasonably be regarded as the starting gun for the open access movement, given the option it offered academic and research organisations to sign it institutionally. In fact, Open Access Week is held every October to commemorate the publication of this Declaration.

[2] UK RepositoryNet+ project (commonly known as "RepNet"), http://repositorynet.ac.uk/

[3] Only the European project OpenAIRE sets objectives of similar breadth and ambition to those of RepNet in terms of services to be developed on top of the existing open access repository network.

[4] On the impact on institutions of the research information gathering exercise for REF2014, see the excellent presentation 'I am turning enterprisey' given by Chris Keene (repository manager at the University of Sussex) at the recent Repository Fringe 2013 conference held in Edinburgh last August.

[5] Institutional Repository Usage Statistics (IRUS-UK), http://irus.mimas.ac.uk/

[6] See the reference to ITIL in the RepNet frequently asked questions section, http://www.repositorynet.ac.uk/?q=content/faq

[7] “RJ Broker delivers its first test transfers”, http://bit.ly/16dsJmq

[8] "RIOXX: Developing Repository Metadata Guidelines", http://bit.ly/18hlzQW

[9] Nixon, W.J., Ashworth, S., and McCutcheon, V. (2013) “Enlighten: Research and APC funding workflows at the University of Glasgow”. Insights: the UKSG journal, 26 (2). pp. 159-167. ISSN 2048-7754 (doi:10.1629/2048-7754.80), http://eprints.gla.ac.uk/83882/

[10] Open Access Repository Registry (OARR), http://bit.ly/LeXGjp

[11] Kenning Arlitsch, Patrick S. O'Brien, (2012) "Invisible institutional repositories: Addressing the low indexing ratios of IRs in Google Scholar", Library Hi Tech, Vol. 30 Iss: 1 pp. 60-81, DOI: 10.1108/07378831211213210

[12] Sustainable Preservation Using Community Engagement (SPRUCE), http://bit.ly/1aXY1Vd

[13] UK LOCKSS Alliance Case Studies Now Available, http://bit.ly/18Gkw9j

[14] Report “Preparing for Data-driven Infrastructure”, http://bit.ly/1a8SuXe

[15] Pablo de Castro, Jackie Proven, “The STARS Shared Initiative: Delivering Repository Services in an Advanced CRIS/IR Environment”. Presentation at Repository Fringe 2013, http://slidesha.re/1eXH2nI

Monday, 3 June 2013

Badly-coded affiliations: a too long-standing curse

A webinar on the Repository Junction Broker (RJB) Project currently under way at the EDINA National Data Centre in Edinburgh was delivered last week by Muriel Mewissen, RJ Broker Project manager. The RJ Broker is a SWORD-based tool for automated content delivery into institutional repositories, which identifies target IRs by matching the co-authors' affiliations to their institutions' platforms (where available).

In the course of this RSP-organised event, Muriel shared some slides with an analysis of the preliminary content transfers the RJ Broker has performed so far. The first RJB-mediated transfer test involved processing in excess of 60,000 Europe PubMed Central articles and delivering them into the (mock) worldwide repository network.

EuropePMC is a solid disciplinary platform for the biosciences, whose content is often delivered straight from publishers. As a result, the platform's contents usually feature good-quality metadata, and EuropePMC thus provides a good example for testing research article transfer. Moreover, the specific EuropePMC article set selected for this test was remarkably recent. However, the figures Muriel presented for the RJ Broker's ability to resolve authors' affiliations in EuropePMC articles were simply astonishing (see figure below): authors' affiliations were badly coded in the metadata of over half the transferred articles.

This is a well-known issue that the PEER project also had to deal with in its time. Institutions have been telling their authors for ages to harmonise their affiliations when signing their papers, but it is still very common to find affiliations such as Department of Psychology, Compton Rd or Radiology Unit, Hearts Lane, which the RJ Broker simply cannot process because they lack their main affiliation node, that is, the name of the parent institution.
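A rough way to picture the check involved is the sketch below, which flags affiliation strings that name no parent institution and reports the share of such strings in a batch. It is a minimal illustration under stated assumptions: the keyword list, `has_institution_node` and `unresolved_share` are hypothetical, and a real service would match against a registry of organisations, not a handful of keywords.

```python
import re

# Hypothetical hint list: words suggesting the string names a parent institution.
INSTITUTION_HINT = re.compile(
    r"\b(university|institute|college|hospital)\b", re.IGNORECASE
)


def has_institution_node(affiliation: str) -> bool:
    """True if the affiliation string appears to name its main institution."""
    return INSTITUTION_HINT.search(affiliation) is not None


def unresolved_share(affiliations) -> float:
    """Fraction of affiliation strings that cannot be tied to an institution."""
    if not affiliations:
        return 0.0
    unresolved = [a for a in affiliations if not has_institution_node(a)]
    return len(unresolved) / len(affiliations)
```

With the two examples quoted above plus two well-formed strings, half of the batch would be reported as unresolvable, which is the kind of ratio the webinar figures describe.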

A large collective effort needs to be made to tackle this long-standing issue once and for all, and ORCID looks like a very promising initiative in this regard. If it were possible to have authors' affiliations coded into their ORCID iDs (something ORCID is actually aiming to do), the rate of miscoded affiliations could be expected to drop rapidly as a result.

Much like author identification, this is of course a huge challenge no-one has so far been able to tackle, and ORCID faces a lot of hard work to find a way to attack the miscoded affiliation issue. But there is currently much talk in the community about organisational IDs and about putting some system in place that will hopefully provide the means to start solving this seemingly unsolvable difficulty. The research information management community badly needs ORCID to succeed in this challenge if it is ever to start building the eagerly awaited service layer on top of the infrastructure layer.

Friday, 24 May 2013

It takes two to tango: a few post-ORCID Outreach meeting reflections

After listening to the presentations delivered at the ORCID Outreach meeting held yesterday at St Anne's College in Oxford, the impression remains that this promising initiative resembles a ball game played by publishers (and the like) on one side of the pitch and researchers and institutions on the other one. In order to enjoy a reasonably amusing game, you need the two sides to be sufficiently balanced. But this is not the case for ORCID - or at least it's not the case so far.

The 'publisher side', so called for simplicity's sake, features not just publishers, but also large commercial stakeholders such as Thomson Reuters, CRIS vendors and a wide range of third-party companies. This side has delivered an excellent performance so far by solving all the (otherwise not too complicated) technical challenges posed by the use of ORCID for populating submission systems or CRISes. However, the other side is not doing so well at the moment. One could expect an ORCID deluge to arrive from researchers and institutions interested in becoming ORCID members in order to provide iDs to all their staff. But this is not happening, or at least not as quickly as the other side is progressing. This leaves us with fully prepared technical systems and no incoming stream of ORCID iDs to test them and prove their benefits to the research community.

It is true that there are over 140,000 registered authors in the ORCID database as of today. It is so far impossible to know how many of those registered themselves and went on to add their publications to their ORCID account. But after listening to Paul Peters's presentation on the huge advocacy campaign carried out by a 10-strong team at Hindawi HQ, it's easy to see that many ORCIDs out there are the result of the 'publisher side' work too (oh but wait, we may have some approximate stats on the provenance of ORCID accounts based on the highly-correlated number of visits to the ORCID website).

The argument held by patient observers (which one shares to some extent) says the game re-balancing will eventually happen, but institutions need more time to react - and once they start, the contribution from their side will become unstoppable. Institutions need to figure out their business models and their mechanisms for involving their researchers and all their relevant units in the process for creating and populating ORCID accounts.

Other, more critical observers (the institutions themselves) do not seem to share this view completely, however. In their opinion, ORCID should be made available to them as a free service, since they're the ones expected to do the hard work anyway. A significant number of stakeholders argue that ORCID needn't become an overcomplicated platform aiming to achieve too many goals at the same time, but should rather focus on the basic functionality, namely providing a unique identifier for researchers. Then again, it may not be that simple at all: features like researcher affiliation pose a huge challenge in themselves that must be dealt with in order to offer really useful information through ORCID iDs.

It becomes evident at some point that best practices are badly needed for ORCID implementation at institutional level, so that researchers can start to perceive the advantages of having their ORCID iDs institutionally created (and even maintained). So it's just about getting a critical mass of member institutions in different countries that will pioneer the adoption process, and that will hopefully receive some credit for it from the community, since as early adopters they are dealing with the issues in a much harder way than the institutions that will follow suit.

Thursday, 9 May 2013

Are publishers "the enemy"?

This interesting issue came up (again) at the Author ID Tutorial delivered within the 4th COAR Annual Meeting in Istanbul - and it might be useful to devote a couple of reflections to it here. This Author ID Tutorial was jointly delivered on May 8th by Titia van der Werf from OCLC and myself as part of an attractive set of four tutorials at the COAR event - with the selected topics for the tutorials being as good a hint on the way things are evolving around repositories as the workshops themselves.

ORCID featured heavily in the Author ID tutorial, to the extent that the schedule for the activity had to be updated on the spot in order to make room for the large number of questions and reflections prompted by the ORCID presentation. I'd like to address one of these questions more thoroughly here, namely the reluctance some very qualified colleagues show towards ORCID because the initiative seems very much publisher-driven, which would probably make it less interesting for the scholarly community.

This is again about the antagonism between publishers and academia, and about whether both communities may at some point overcome such antagonism (real or perceived, it does not make much difference) in order to work jointly for a common benefit. This discussion is certainly interesting since it goes to the heart of a critical issue that has traditionally prevented a deeper implementation of Open Access, namely the fact that publishers and the Open Access community see each other as "the enemy". Mike Taylor, to mention just one inspiring example, regularly writes in an eloquent fashion about the reasons why the scholarly community may consider publishers to be the enemy of knowledge dissemination. However, in the same way as a certain degree of (informal) agreement was reached at the COAR event that the fight between advocates of Green and Gold OA is a pointless diversion of energy and will only harm their common objective, it could very much be argued that emphasising the differences and the misbehaviours over the good practices in collaboration may end up blocking win-win cooperation opportunities.

It is true that publishers such as Elsevier and databases such as the TR Web of Science or Scopus are a big driver behind ORCID, although the fact that over 130,000 researchers worldwide had chosen to individually register their ORCIDs as of May 3rd should not be overlooked either. It is evident too that a widely implemented, successful persistent author identifier scheme will benefit publishers very much, but it will benefit institutions and especially authors even more. There was again agreement at the Author ID tutorial that this is something that needs to be done, and when examining the wide range of previous attempts at author and work identification and disambiguation, it becomes clear that having publishers involved in the initiative gives it a significantly better chance of succeeding.

I have repeatedly written here about the encouraging effort the EC-funded PEER project made in bringing together publishers and Open Access repositories, and about how advisable it would be to further explore opportunities for collaboration, some of which are indeed being exploited: see for instance Wiley's direct involvement in the JISC-funded PREPARDE project for research data publishing. ORCID is certainly one of these opportunities and, with all due respect to constructive dissent, it would be an exercise in shortsightedness to let it slip away.

Sunday, 31 March 2013

Could the so-called Gold Rush result in Green reinforcement? (II)

A post was published last December at the UKCoRR blog examining whether Green Open Access could become mainstream at Higher Education Institutions (HEIs) as a result of the policies that stem from the Finch report and aim to drive the scholarly communication model towards a Gold OA-based one. Building on the discussions held at the webinar "The Role of Institutional Repositories after the Finch Report" organised by the Repositories Support Project earlier that month, the post highlighted the role IR managers were to play in explaining the different options for policy compliance at HEIs, and the importance that deposit into institutional repositories would acquire given the economic impossibility of making the whole institutional research output available via Gold Open Access.

A few months later, at a time when the RCUK Open Access policy is about to come into effect, preliminary strategies for ensuring compliance are being designed at HEIs. Driven by the RCUK policy statement that "The RCUK OA Block Grant is principally to support the payment of APCs. However, Research Organisations have the flexibility to use the block grant in the manner they consider will best deliver the RCUK Policy on Open Access, as long as the primary purpose to support the payment of APCs is fulfilled", institutions are wisely investing part of the Block Grant funding in enhancing their Green Open Access infrastructure (including human resources) and making sure their institutional repository will be ready to support Open Access dissemination for all researchers whose publications are not awarded Gold OA funding.

In an even more inspiring realisation of this leveraging policy, the Spanish National Research Council (CSIC) released last week the requirements it will apply for authors to be eligible for Gold Open Access funding (Spanish only). With the caveat that "due to limited resources, just one article per author will be allowed per year", these include the need to deposit the author's research outputs published in the last three years into the Digital.CSIC institutional repository within three months of the funding for the payment of APCs being awarded.

Compliance monitoring is becoming a key concern at HEIs as a result of these policies, and pioneering institutions will shortly be piloting systems to make sure reporting tools for policy compliance are available. In the meantime the whole move towards Gold and Green Open Access remains a daring experiment whose outcome, including the way researchers in different domains are willing to follow the policy guidelines, will be very interesting to follow in the upcoming months. The Global Research Council meeting in Berlin in May 2013 will provide a good opportunity to agree on an international action plan for implementing Open Access to Publications: Open Access implementation is one of only two items on the agenda.

Tuesday, 26 March 2013

First automated creation of ORCIDs at institutional level

On 25 March the University of Oviedo (UniOvi) Library carried out the first successful trial of automatic creation of ORCIDs for institutional authors. The process consisted of ingesting a modest first batch of 10 XML files for UniOvi authors into the production ORCID API. As a result, 9 ORCID profiles were created, and a tenth was identified as a potential duplicate and reported as such to the administrator. A brief description of the process that led to this result follows.

The process

After an intense effort to promote ORCID in the country, the University of Oviedo became the first institutional member of ORCID in Spain last December. Once the agreement with ORCID had been signed, UniOvi decided that the Library and its Head of the Bibliographic Information Service, María Luisa Alvarez de Toledo, would be responsible for the institutional adoption of ORCID at the University. Besides relying on the ORCID technical support service (many thanks to Catalina Oyler in this regard), the UniOvi Library also decided to enlist the support of GrandIR for this purpose. GrandIR had organised the technical session on ORCID the previous September and was closely involved in promoting and adopting ORCID, so this collaboration looked like a good opportunity to put the knowledge gained in the process to work.

The first step on the road to institutional ORCID adoption by UniOvi was to define a strategy for the institutional creation of ORCIDs and their implementation in the University's research information management systems. The UniOvi Library keeps a registry of all institutional authors in a table that also lists their most frequent signatures and identifiers such as ScopusID or ResearcherID, which are often managed directly from the Library as well. This table was used to generate XML files for UniOvi authors ready to be fed into the ORCID API.

A testing stage followed: starting from the XML files, a series of test ORCID profiles was generated from the command line in the ORCID OAuth testing environment. These trials were successful and made it possible to test the particular configuration of the XML files in aspects such as character encoding or the use of special characters typical of the Spanish language. However, the need to operate from the command line made the creation of ORCID profiles a very slow process, which could serve to create ORCIDs for a few authors, but not to generate profiles for the institution's authors as a whole. It was then decided to develop an application that would allow ORCIDs to be created automatically for a large number of authors, a task that was entrusted to GrandIR. A few weeks later the first prototype was available for live testing on the ORCID production environment. These tests resulted in 10 XML files for UniOvi authors being fed into the ORCID API and 9 new ORCID profiles being created automatically. Most of these profiles are still waiting to be claimed by their authors, and in fact the rate at which profiles are claimed is one of the aspects the Library is monitoring before planning further internal ORCID outreach strategies.

The challenges

Throughout the process that led to automated ORCID creation a whole series of challenges has been resolved. The main one stems from Universidad de Oviedo being the first institution in the world to carry out most of the procedures, from requesting and using its user credentials to learning how to handle the ORCID APIs. The fact that ORCID is still at a relatively early stage of development was also a difficulty at times, since it occasionally meant working directly with ORCID to define how certain processes should be carried out. Finally, the need to rely on a single ORCID technical support service operating on its own specific schedule was likewise one of the consequences of the pioneering role taken on by the University.

One of the big challenges the UniOvi Library faced (one that will also be fairly common at other institutions) was the lack of specific internal technical support for the task. This difficulty could nevertheless be overcome thanks to the support provided by both ORCID and GrandIR.

There are two further areas where challenges remain to be resolved before a broad adoption of ORCID is achieved at the University. The first is cultural, and involves engaging authors in the process of claiming, populating and using their ORCIDs. This should rest largely on the definition and dissemination of best practices. The other area where challenges remain is technical: first, a procedure is needed to reliably identify potential duplicates and possibly to merge ORCID profiles created on different email addresses of the same author. In addition, the Library would like to have the permissions needed to maintain the newly created ORCID profiles and to be able, for example, to claim publications on behalf of authors. These are areas ORCID is currently working on, and in the medium term the functionality needed to address these challenges will be available.

The outcome

The main outcome of the process so far has been the attempt, as successful as it is modest, to automatically create ORCID profiles for a few UniOvi authors from the Library. However, once the first ORCIDs have been created, extending coverage to all UniOvi authors poses no major technical challenges. In addition, the preliminary success of the automated ORCID creation application in identifying duplicates is a first step towards defining criteria that ensure potential duplicates are detected as part of the automated ORCID creation process. The Library now has the opportunity to examine the process by which authors claim their ORCIDs, along with the chance to provide feedback during the process, for example by suggesting that the member institution be allowed to customise the welcome message through an options panel for selecting the language in which the message is received. Meanwhile the Library is already designing institutional outreach strategies, including a brief ORCID Claiming Guide for authors and an institutional website offering an introduction to ORCID and a summary of its main benefits for authors and for the University. All this content should be largely reusable by institutions joining ORCID from now on.

The way ahead

The next steps to complete the work are, first, to extend this pilot development to cover all UniOvi authors. Once this is achieved, a strategy must be developed for implementing the new ORCIDs in institutional systems, starting with the institutional repository RUO. Implementing ORCID in the repository should be a means of attracting authors to it and making sure that those who have not yet deposited any work in the repository become aware of the value-added services it can offer them.

Finally, a good part of the pending tasks belongs to the outreach domain: the Library intends to promote a set of best practices for the use of ORCIDs by institutional authors. In addition, once the process is complete, the Library is also interested in disseminating best practices for institutional ORCID adoption through a more rigorous channel than a mere blog post, which is otherwise the quickest way to publicise and share the progress made.

First successful automated ORCID creation at institutional level

On March 25th a first successful attempt at automated ORCID creation for institutional authors was made at Universidad de Oviedo (UniOvi). A modest first batch of 10 XML UniOvi author files was fed into the production ORCID API and 9 ORCID profiles were successfully created, with the 10th being identified as a potential duplicate and subsequently reported. A brief description of the process that led to this result is provided below.

The process

Following extensive ORCID outreach activities in the country, Universidad de Oviedo became the first institutional ORCID member in Spain last December. Once the membership was signed, the decision was made for UniOvi Library and its Bibliographic Information Service Manager Maria Luisa Alvarez de Toledo to become responsible for ORCID adoption at UniOvi. Besides relying on the ORCID technical support service -a big thanks to Catalina Oyler here- UniOvi decided to also contact GrandIR for the purpose. GrandIR had organised the ORCID technical session the previous September and was very much involved in ORCID dissemination and adoption, so it looked like a good opportunity to put this knowledge to use.

The first step towards ORCID adoption at UniOvi was to define a strategy for institutional ORCID creation and implementation into UniOvi research information management systems. The Library keeps a registry for all UniOvi authors, together with their most frequent signatures and identifiers such as ScopusID or ResearcherID - which are often managed from the Library too. The process involved XML UniOvi author file generation so that these could be fed into the ORCID API.
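To make the file generation step more concrete, here is a minimal Python sketch of how one row of the Library's author registry might be serialised as a UTF-8 XML file. The element names, the dictionary keys and the sample author are all hypothetical and chosen for illustration only: this is not the schema the ORCID API expects, nor the code actually used at UniOvi.

```python
import xml.etree.ElementTree as ET


def author_to_xml(author: dict) -> bytes:
    """Serialise one registry row as a UTF-8 encoded XML document.

    The element names below are hypothetical placeholders, not the
    real ORCID message schema.
    """
    root = ET.Element("author")
    ET.SubElement(root, "given-names").text = author["given_names"]
    ET.SubElement(root, "family-name").text = author["family_name"]
    ET.SubElement(root, "email").text = author["email"]
    ids = ET.SubElement(root, "external-identifiers")
    # Sort identifiers so the output is stable from one run to the next.
    for id_type, value in sorted(author.get("identifiers", {}).items()):
        ET.SubElement(ids, "identifier", {"type": id_type}).text = value
    # encoding="utf-8" returns bytes and keeps accented characters intact.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Made-up author, used only to exercise the function.
row = {
    "given_names": "Begoña",
    "family_name": "Núñez Peña",
    "email": "begona.nunez@example.org",
    "identifiers": {"ScopusID": "0000000000"},
}
xml_bytes = author_to_xml(row)
```

Writing the bytes straight to disk (rather than a decoded string) avoids the platform default encoding getting in the way of names with accents or the letter ñ.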

A testing stage followed: a number of mock ORCID profiles were generated on the ORCID OAuth Playground testing environment via the command line. These tests were successful and made it possible to check the specific XML configuration with regard to character encoding and the special characters often found in Spanish names. However, the need to operate from the command line made the ORCID generation process quite a slow one, which would suit the purpose of creating ORCIDs for a few authors, but certainly not for all UniOvi scholars. The decision was then made to develop an application that would allow automated ORCID creation for a large number of authors, and GrandIR took on the challenge. A few weeks later, a first prototype was available for live testing on the ORCID production environment. These first tests resulted in 10 XML author files fed to the ORCID API and 9 new ORCID profiles created. Most of these new ORCIDs are still pending claim by authors - and in fact the claiming rate by authors is one of the aspects the Library is looking at before planning further internal outreach strategies.
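The move from one-by-one command-line calls to a batch application can be sketched roughly as follows. This is only an illustration under stated assumptions: `submit` is a hypothetical callable standing in for whatever actually talks to the ORCID API, and the three outcome labels are made up for the example. Nothing here reflects the real prototype.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical outcome labels; the real application may report differently.
OUTCOMES = ("created", "duplicate", "failed")


def run_batch(
    files: Iterable[Tuple[str, bytes]],
    submit: Callable[[bytes], str],
) -> Dict[str, List[str]]:
    """Feed every (file name, XML payload) pair to `submit` and group
    the file names by outcome."""
    report: Dict[str, List[str]] = {label: [] for label in OUTCOMES}
    for name, payload in files:
        try:
            outcome = submit(payload)
        except Exception:
            # One bad file should not stop the rest of the batch.
            outcome = "failed"
        if outcome not in report:
            outcome = "failed"
        report[outcome].append(name)
    return report


def fake_submit(payload: bytes) -> str:
    """Stand-in for the real API call: flags one payload as a duplicate."""
    return "duplicate" if b"author-07" in payload else "created"


batch = [(f"author-{i:02d}.xml", f"<author-{i:02d}/>".encode("utf-8"))
         for i in range(1, 11)]
report = run_batch(batch, fake_submit)
```

Passing the submitting function in as a parameter keeps the loop testable without network access, which is handy when the only alternative is testing against a production service.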

The challenges

A number of challenges have already been tackled along the way to automated ORCID creation. The main one among these is a consequence of UniOvi being the very first institution to carry out most of the procedures, from requesting and using its credentials to learning how to operate the ORCID APIs. The fact that working together with ORCID was occasionally required to define how specific processes should be carried out was sometimes a bit challenging – but certainly fun as well. Finally, the need to rely on a single-point ORCID technical support service (running in a specific time zone) was also one of the consequences of the pioneering role UniOvi took, and one that will presumably improve in the future.

One of the big challenges that the UniOvi Library faced – and this will probably be quite frequent at other institutions – was the lack of specific internal technical support for the task. However, this issue could be overcome thanks to the support provided both by ORCID and GrandIR – and it should by no means discourage institutions interested in becoming ORCID adopters, since from now on there will be a growing network of supporting colleagues and institutions available to help.

There are two additional strands in which challenges remain before a far-reaching ORCID adoption is achieved at the University. The first one is cultural, and involves engaging authors in the process of claiming, completing and using their ORCIDs. This should very much be based on defining and disseminating best practices. The other domain where challenges are still to be tackled is the technical area: first, there is a need for a reliable identification of potential duplicates and possibly for merging ORCID profiles created on different author email addresses, and then, the Library would also wish to have privileges for maintaining the newly-created ORCID accounts and be able for instance to claim publications on behalf of the authors. These are areas where current ORCID work is taking place, and new features will be available in the mid-term that will enable this functionality.

The outcome

The main process outcome has so far been a successful (if humble) attempt at automatically creating ORCID profiles for a few UniOvi authors from the Library. However, once the first ORCIDs were created, extending the coverage to the remaining UniOvi authors poses no major technical challenge. Furthermore, the successful identification by the application for automated ORCID generation of a previously existing ORCID for one of these 10 authors was a first step in putting together a set of criteria that will ensure detection of candidates for duplicated entries at ORCID creation time.
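As a rough idea of what such criteria could look like, the Python sketch below flags an existing record as a duplicate candidate when it shares an email address with the new author or when the names match after removing accents, case and extra spaces. Both the criteria and the record layout are assumptions made for this example; the post does not describe the rules the application actually applies.

```python
import unicodedata
from typing import Dict, List


def normalise(name: str) -> str:
    """Strip accents, case and redundant whitespace from a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def find_duplicate_candidates(new_author: Dict, existing: List[Dict]) -> List[Dict]:
    """Return existing records that share an email or a normalised name
    with the new author (hypothetical criteria, for illustration only)."""
    key = normalise(new_author["name"])
    emails = {e.lower() for e in new_author.get("emails", [])}
    candidates = []
    for record in existing:
        record_emails = {e.lower() for e in record.get("emails", [])}
        same_email = bool(emails & record_emails)
        same_name = normalise(record["name"]) == key
        if same_email or same_name:
            candidates.append(record)
    return candidates


# Made-up records: the first one uses a different email address and an
# unaccented spelling of the same name.
existing = [
    {"name": "Begona Nunez Pena", "emails": ["bnp@example.com"]},
    {"name": "Iñaki Goñi", "emails": ["inaki@example.org"]},
]
new_author = {"name": "Begoña Núñez Peña",
              "emails": ["begona.nunez@example.org"]}
candidates = find_duplicate_candidates(new_author, existing)
```

Name matching alone will also flag genuine namesakes, so a result from a check like this is best treated as a candidate to be reported and reviewed, not as a confirmed duplicate.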

The Library now has the opportunity to test the process for ORCID claiming by authors - together with the opportunity to provide useful feedback along the way, for instance by suggesting that the member institution be allowed to customise the welcome message through an option panel for choosing things such as the language the message is written in. Institutional outreach strategies are already being designed, including a brief guide on ORCID claiming for authors and an institutional ORCID website providing an introduction to ORCID and explaining what its benefits are both for authors and the institution itself. All these contents should very much be re-usable by institutions which join ORCID from now on.

The way ahead

The next steps for completing the work are in the first place extending the pilot to cover the whole set of UniOvi scholars. Once this is achieved, strategies are to be designed for implementing the new ORCIDs on institutional systems, starting with the DSpace-based institutional repository RUO. Ideally, ORCID implementation on the repository will provide a means to engage authors with it and ensure that those authors who have not deposited anything yet in the repository will realise some of the value-added services it may provide them.

Finally, a great deal of the remaining tasks fall into the outreach domain: best practices for ORCID use by institutional authors are to be promoted from the Library as part of an awareness raising campaign about ORCID. Besides this, once the process has been completed, the Library has the intention to disseminate best practices in ORCID adoption at institutional level through a somewhat more rigorous channel than a blog post.

Friday, 22 March 2013

Preaching to the converted...

Sometimes when you try to argue on controversial issues - say for instance you consider that the coverage of sex abuse in the media is biased and lacks objectivity - you'll be accused of not being interested in fighting sex abuse. However common it may be these days, it's the sort of argumentation that will drive you mad, since its sole purpose seems to be to avoid discussing the issue itself.

Open Access is no exception in this regard. I do not think my personally being pro-Open Access or anti-Open Access should be part of the discussion here, but you may check the posts below should these paragraphs raise any doubt about it. Incidentally I happen to be a physicist and have had the opportunity during my extensive Open Access dissemination activities (and have taken great pleasure in it) to discuss it with dozens of researchers from all fields, possibly hundreds of them, many of them Open Access-friendly, some of them reluctant, nearly all of them interested in a sensible dialogue on the future of scholarly communication.

As a result of these experiences, plus again lots of work for implementing OA at institutional and cross-institutional level, there are two relevant points I would like to make in this post:

1) Librarians ('shambrarians' would probably be more accurate) should refrain from going too far in telling researchers how to perform their professional activity. Librarians/shambrarians usually do not know enough about research, and scholars can easily perceive that from a five-minute conversation. This is possibly the main reason for the huge divide between the two communities - and the one that actually explains why repositories are nearly empty. Underpopulated repositories are by no means just a 'keystroke issue' I'm afraid (although solving the 'keystroke issue' will help of course). Asking for strong mandates has lately become a mantra for the 'shambrarian' community, but when you listen to researchers' thoughts about this many of them are far from being convinced this is the right way to deal with Open Access implementation.

2) Open Access success is mainly a technology issue and not a policy one (or not to such an extent anyway). Ideally the technology and policy strands should work in parallel, but technology can do without policy, whereas the opposite is not true. If you try to sell researchers an Open Access mandate for depositing their papers on the equivalent of a shack in architectural terms, they will very likely laugh in your face (this is the story of the last 20 years Stevan Harnad often talks about). You need to have a solid technical foundation with solid added-value services for talking researchers into Green Open Access. And that is presently far from being a fact. Regrettably far, I feel obliged to add. Not only that, it is kept far from being a possibility by the regular advocates, who know little about technology and put the emphasis on what they do know about: dialectics. I will not mention particular examples here, since it's not the goal of this post to get personal, but if Open Access is to succeed, the debate should clearly become more technical and less political.

No offence meant in any of this, I should warn. I read as much as I can of what gets published on OA, especially blogposts, and there's not too much I like out there I must say. But I do usually keep my opinions to myself since I believe there may be many different ways for OA to succeed - and fights between passionate Green OA supporters and their critical friends are certainly not one of them.

... and preaching to the non converted

After quite a long time away from this blog due to various circumstances (with work overload probably being the most convincing one) I will try to catch up with various threads in the next day or two, and I shall start with an answer to this request to explain what Open Access is and what its aims are, which I received in the interesting comments section of this “Whoops! Are Some Current Open Access Mandates Backfiring on the Intended Beneficiaries?” post by Kent Anderson at The Scholarly Kitchen blog. This is my answer. I tried to keep it as concise as possible, so apologies if it is still a bit long.

I am probably too busy trying to overcome the numerous challenges that stand in the way of Open Access implementation myself to provide a very detailed and accurate description of what Open Access is and what its aims are, but I'll give it a go. Let me start by quoting the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003):

"The Internet has fundamentally changed the practical and economic realities of distributing scientific knowledge and cultural heritage. For the first time ever, the Internet now offers the chance to constitute a global and interactive representation of human knowledge, including cultural heritage and the guarantee of worldwide access".

According to this, Open Access means ensuring this possibility is realised, and worldwide dissemination of research outputs should indeed be a shared goal for institutions (and their libraries) and for publishers. It means that any researcher anywhere in the world may have the opportunity for the first time in history to freely share her research results (and this includes research data) with the whole research community and beyond. Whether this is achieved through the so-called Gold route (Open Access or hybrid journals) or via Open Access repositories (the Green route) is secondary to some extent - although not of course if business models are our sole concern here.

Open Access deals with the haves and the have-nots (which does not just mean developed vs developing countries, but rather privileged vs underprivileged researchers in terms of whether or not they have institutional coverage for accessing the research information they require for carrying out their own research). And Open Access deals with whether a freely available author's final peer-reviewed manuscript might provide a useful alternative to the much-preferable version of record for those underprivileged researchers who can't or won't pay the fees required to read the papers that will allow them to keep up-to-date with advances in their own research area.

Research funders are well aware of the challenge, especially those in the area of biomedical research, and Open Access mandates are their attempt to tackle the access issue in an area where many institutions both in rich and poor countries lack the (quite substantial) budgets required to provide their researchers with comprehensive access to publications in toll-access journals. What about publishers? They are indeed adapting their business models to fit the Gold route by taking Article Processing Charges from authors as a prerequisite to making research papers available Open Access so they can meet the funders' mandates – which is fine. But this adaptation to Open Access has not at all improved their image in the eyes of institutions (and many researchers in them), who suspect some not-so-subtle form of double-dipping is taking place since they still need to pay for their journal subscriptions on top of the APCs.

What could publishers then do to stop the fight?

The European PEER Project was a three-year STM Publisher Association-led attempt to assess the impact of Open Access repositories on the 'European Research ecosystem'. This was technically carried out by delivering a large number of final peer-reviewed author manuscripts into a cross-European institutional Open Access repository network.

Publisher participation ensured the right research article version was deposited, and the whole exercise was also useful for them: not only did they become aware of the relevance of sufficient metadata (a concept that CrossRef has later extended among the wider publisher community), but they were also able to harmonise their interoperability standards through the use of the NLM DTD. Furthermore, the conclusion of the PEER project assessment carried out by CIBER Research Ltd was that not only were publishers not harmed by Open Access repositories, but on the contrary the paper download figures from journal pages at publisher websites were much improved by their availability as final manuscripts at repositories (since it's the version of record any researcher will prefer to read and cite unless of course they have no means of accessing it).

PEER was a one-time exercise, but it also delivered a proof of concept for cooperation between publishers and institutions in order to provide researchers the service they require for meeting the funders' mandates they are subject to. And in fact some sensible publishers are still interested – and taking subsequent steps in this direction – in delivering their authors the deposit service they require to meet the mandates. The way these sensible publishers see it, this is a means to offer researchers competitive advantages at journal selection time and will ensure a steady number of submissions in an increasingly competitive market framework for journals.

In the meantime the institutional Open Access community (which reached a critical mass quite a long time ago) is taking steps to ensure the repository systems become fit for purpose in order to meet funder requirements in terms of offering OA to the outputs of research projects funded by them. There are indeed technical as well as cultural/political challenges, in fact quite a number of them, but there is also a sustained and persistent effort to figure out the best ways to gradually address them. Institutional Research Committees are suddenly becoming aware (and this is the concern comment #2 addresses) that institutional research publishing budgets won't stretch to providing Gold Open Access via payment of APCs for the whole institutional research output, so they're instead turning their eyes to their institutional Open Access repositories and wondering whether these could be the way of meeting funder mandates in a much cheaper fashion. At the same time, some funders are starting to rule hybrid journals out of their mandates for compliance purposes in order to avoid the abovementioned risk of double-dipping.

The landscape keeps evolving rapidly and it seems further adaptation will be required both from publishers and institutions. This could ideally happen through cooperation and not through struggle, but there seem to be too many prejudices and too little effort out there for a constructive dialogue to take place in a sustainable way.

Tuesday, 1 January 2013

ORCID in Spanish-speaking countries

The other day a tweet claimed that ORCID, as an author identification initiative, would mostly benefit women, in reference to the deep-rooted custom in many countries of women changing their surname upon marriage.

Since this custom is not so entrenched in the Spanish-speaking world, perhaps that is why, with the exception of Spain, no other Spanish-speaking country appears in the list of the 25 countries where ORCID is taking root most strongly. Bearing in mind that ORCID registration is free for authors, and considering also the large number of excellent communicators in the field of research information management who populate the LLAAR list on Open Access and Repositories, this nevertheless comes as a considerable surprise.

On the day when New Year's resolutions are traditionally made, making an effort to bring Spanish-speaking countries into future editions of this international list could be an interesting suggestion...

Of course, the real reason for the absence of Spanish-speaking countries from this list is not the one suggested above, but the one that can be inferred from this other tweet dated 28 December:

"ORCID now has 42,918 researchers registered, a third via manuscript systems or linking with other IDs"