Initiative for Open Abstracts

By COKI Project Co-lead Professor Cameron Neylon

One of the tactical questions that often comes up when moving towards more open practice in research is the value of taking small steps vs fighting the large battles. Sometimes big changes occur – and the shift towards open access, although slow, is an example of a big shift – but often a set of small steps can help to build towards progress. But there is a tension here as well. Small improvements relieve pressure on the system. How do we address the risk that they reduce progress overall? The key to this is in understanding what those small steps can achieve.

Improving the quality and openness of metadata about scholarly communications is an example where many small steps have been made. Because metadata is infrastructure, underpinning many other systems, it is almost entirely invisible. But the work to make it is not.

We make elements of progress, each of them seemingly quite small, but then in combination they suddenly enable significant change.

What we do within the Curtin Open Knowledge Initiative is possible in large part due to incremental improvements in the infrastructure of persistent identifiers and the quality of open metadata generally. The improvement in access to open citations data as a result of I4OC has been a major boost to our research, allowing us, for instance, to make a fair comparison of how a citation count index would perform if it used different bibliographic data sources to define the set of outputs to count citations for.

But where does metadata end and content begin? As a research project we also want to be able to do more granular analysis of the contents of research. Lots of data sources provide a classification of the topics of articles, either at the journal or article level. But mostly these are black boxes that tell us more about who made those classifications than about the things we’re interested in. For instance, in my work I’ve frequently been more interested in categorising articles by the technique that they use, rather than the topic being studied. Sometimes the region a study focuses on is more important than the discipline label. In a perfect world any researcher would be able to process the full text to create their own categorisations, but then we’re restricted to open access content, even assuming we can gather all the content together efficiently. Titles can tell us something, but certainly not enough.

What would make a huge difference is comprehensive and central access to abstracts.

Abstracts give enough detail to be useful: they often touch on the techniques used or the specific scope of a study. They’re small enough to be easy to work with at scale and rich enough to have valuable detail.
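To make that concrete, here is a minimal sketch of tagging abstracts by technique rather than topic. It is purely illustrative and not a description of any real pipeline: the `TECHNIQUE_KEYWORDS` vocabulary and the `tag_techniques` function are hypothetical names invented for this example, and simple keyword matching stands in for whatever classification a researcher would actually design.

```python
import re

# Hypothetical vocabulary: each technique maps to phrases that suggest it.
# A real study would curate and validate its own list.
TECHNIQUE_KEYWORDS = {
    "mass spectrometry": ["mass spectrometry", "lc-ms"],
    "survey": ["survey", "questionnaire"],
    "x-ray crystallography": ["crystallography", "x-ray diffraction"],
}


def tag_techniques(abstract, vocabulary=TECHNIQUE_KEYWORDS):
    """Return the techniques whose keywords appear as whole words in the abstract."""
    text = abstract.lower()
    tags = []
    for technique, keywords in vocabulary.items():
        # Word boundaries avoid matching a keyword inside a longer word.
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
            tags.append(technique)
    return tags
```

The point of the sketch is that the categories belong to the researcher asking the question, not to a third-party classifier, and that a few lines of code are enough once abstracts are available in bulk.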

Microsoft Academic goes a long way towards providing this, and we’re using this great openly licensed source at scale. But what would be even better is a central source of high-quality metadata provided directly by publishers. Having a single central source makes our work a lot easier. Having the data direct from publishers also means it is direct from the source and has been through fewer extra systems. It means we have the raw data. And finally, placing it in a system where publishers are providing the metadata means it can be comprehensive, not restricted to what a third-party filter decides to include.

That’s what the Initiative for Open Abstracts is advocating for. And why I am supporting it.

I4OA advocates for publishers that already provide metadata to Crossref to use an already available capacity to deposit abstracts. Many major publishers, but also many smaller publishers, are already on board. For our research work we could probably scrape all of their websites, and probably no-one would really object (or notice), but that’s not a polite way to get this metadata. Provide it once, provide it in a single place that is properly provisioned to support large-scale use. Our work at COKI is at a large scale and moves beyond research, so we want to apply best practice, ethically, technically and through financial support of the data holders where appropriate. COKI is a Crossref Metadata Plus member for precisely this reason (we also subscribe to the data feed from Unpaywall and plan to become an Open Citations member).
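As a rough sketch of what reading a deposited abstract looks like, the example below uses the public Crossref REST API, which returns a work's abstract (where the publisher has deposited one) as a string containing JATS markup. The helper names `fetch_work` and `abstract_text` are assumptions made for this illustration, and the tag stripping is deliberately crude. The `mailto` parameter identifies the caller to Crossref, which is the polite alternative to anonymous bulk requests.

```python
import html
import json
import re
import urllib.parse
import urllib.request


def fetch_work(doi, mailto):
    """Fetch one work record from the Crossref REST API (makes a network call)."""
    url = (
        "https://api.crossref.org/works/"
        + urllib.parse.quote(doi)
        + "?mailto="
        + urllib.parse.quote(mailto)
    )
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def abstract_text(work):
    """Return the abstract as plain text, or None if none was deposited."""
    raw = work.get("abstract")
    if not raw:
        return None
    # Replace JATS tags such as <jats:p> with spaces, then decode entities
    # and collapse the resulting whitespace.
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return " ".join(html.unescape(no_tags).split())
```

For analysis at the scale described here, a bulk route such as the Metadata Plus service would be more appropriate than one request per DOI; the per-record call is only meant to show the shape of the data.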

As a researcher I need reliability and ease of use to create transformed information resources. As builders of services we need stable, reliable and technically robust means of getting data at a large scale. For publishers it makes sense to drive discovery as efficiently as possible to the content they host and curate.

In our work, we want to use the work that publishers do best – curation, markup and metadata provision – to drive new discoveries, drive new ways of making discoveries and to help others to find the best copy of that content they have access to.

Making abstracts available through Crossref is one step towards improving that. Far from being a step backwards on a route towards fully funded systems for global open access, it will help to demonstrate what is possible, and to showcase what publishers do well.