Tuesday, 21 December 2010

OAI-PMH aggregation?

I wanted to explore the issue of OAI-PMH aggregation and to gauge UKCoRR opinion of its still-to-be-realised potential (or not). I've also been threatening to post to the blog for a while and this seemed like an ideal subject to explore in a more public forum than on the mailing list.

As I noted once in a post on my own blog, I have for some time been a little nonplussed by our collective, continued obsession with the woefully under-used OAI-PMH. Other than OAIster (an international service), the only services I'm currently aware of in the UK are the former Intute demo now maintained by Mimas (http://irs.mimas.ac.uk/demonstrator/) and a pilot OAI-PMH cross-search tool developed as part of ERIS (Enhancing Institutional Repository Services in Scotland).

The protocol dates back to the earliest days of the open access and institutional repository movements, when there was considerable investment by the community (in software specification, for example), and I don't think it has ever been as widely used as it could be. I can offer only anecdotal evidence, but I’m pretty sure that your average academic will tend towards Google/Google Scholar* - which withdrew support for OAI-PMH back in April 2008 - to source research on the open web. Google, however, arguably has inherent limitations for academic purposes, and I would argue that OAI-PMH still has considerable potential for (OA) research dissemination, though that potential is possibly watered down by so many repositories also carrying metadata-only records rather than exclusively full text. One of the drawbacks of OAI-PMH harvesting is that there is no easy way of filtering on full text from the major repository software.

* As an aside, I've had mixed results retrieving full text records from UK IRs using Google Scholar, with many not returning anything at all, though the IRs in question certainly contain full text content.

In Ireland, however, they have rian.ie - Pathways to Irish Research, which is a much more fully realised portal that aggregates 8 Irish IRs using OAI-PMH, enabling you to browse by author surname and offering an advanced search form to filter by keyword, title, author, subject, institution and (interestingly) funder. Aggregating just 8 repositories (5 DSpace, 2 EPrints, 1 Digital Commons) obviously makes it easier to standardise metadata and systems than it would be in the UK. The portal also returns full text only, which immediately makes it more useful from an OA perspective. I've been in touch with the chair of the RIAN project group, who has confirmed that "it was a policy decision to include only full text metadata in the RIAN harvest, even though some IRs might have some metadata only deposits. It was felt that a national portal of OA research material would be much more useful if it included only full text." This was achieved, however, by organising local IRs such that only full text content is exposed for harvest, which isn't really a practical solution across the much greater number of repositories in the UK.
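To make the alternative concrete, here is a rough sketch of what filtering on the harvester side (rather than at each local IR) might look like. This is not how RIAN does it. It assumes, hypothetically, that repositories expose either a dc:format of application/pdf or a dc:identifier ending in .pdf for records with full text attached, which many will not do consistently. The sample response, the repo.example.org URLs and the function names are all invented for illustration; only the OAI-PMH and Dublin Core namespaces come from the published specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumption: these values signal an attached full text file.
# Real repositories vary widely in what they expose.
FULL_TEXT_FORMATS = ("application/pdf",)
FULL_TEXT_SUFFIXES = (".pdf",)


def looks_like_full_text(record):
    """Heuristic guess at whether an oai_dc record has full text attached."""
    formats = [el.text or "" for el in record.iterfind(".//dc:format", NS)]
    identifiers = [el.text or "" for el in record.iterfind(".//dc:identifier", NS)]
    if any(f.strip().lower().startswith(FULL_TEXT_FORMATS) for f in formats):
        return True
    return any(i.strip().lower().endswith(FULL_TEXT_SUFFIXES) for i in identifiers)


def full_text_titles(list_records_xml):
    """Return titles of records in a ListRecords response that pass the heuristic."""
    root = ET.fromstring(list_records_xml)
    titles = []
    for record in root.iterfind(".//oai:record", NS):
        if looks_like_full_text(record):
            title = record.find(".//dc:title", NS)
            titles.append(title.text if title is not None else "")
    return titles


# Invented, heavily trimmed ListRecords response with one full text
# record and one metadata-only record.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Paper with a PDF</dc:title>
          <dc:format>application/pdf</dc:format>
          <dc:identifier>http://repo.example.org/1/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Metadata-only record</dc:title>
          <dc:identifier>http://repo.example.org/2/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(full_text_titles(SAMPLE))
```

Run against the sample, this keeps the first record and drops the second. A real harvester would also need to fetch the responses over HTTP and follow resumption tokens, and the heuristic would probably need tuning per repository platform, which is exactly the standardisation problem that is so much easier with 8 repositories than with the UK's many more.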

I've had some discussion with James Toon, the project manager for ERIS, who in his dealings with research groups in Scotland has found "no interest at all in just searching for data in a national aggregation". Nevertheless, I can't help but feel that there is still potential for an aggregation service with a high level of functionality, especially if we could figure out how to return full text only. Maybe it's just me?

James suggests that "the power of aggregations are on the subject level, when you can do things with the data - such as enhance it by linking common ontology, or providing subject specific services, such as topic mapping and so on"; he has also been working on CRISpool, which uses the CERIF standard to integrate heterogeneous research information from several institutions into a single portal. Perhaps OAI-PMH has had its day and CERIF-XML aggregation is the future. The current repository infrastructure across the UK does not (yet) widely support that format, though this may change if more institutions implement a CRIS. The older protocol, by contrast, is a standard output across all 183 institutional repositories in the UK currently listed on OpenDOAR and, for that reason if no other, I would argue that it could be used more effectively, following the rian.ie model.