Interview with Vincent Larivière

Date: Feb. 11, 2019
Categories: bibliometrics, information science

Lede: The project led by the Observatoire des sciences et des technologies aims to build a citation index based on the Érudit platform.

1. The Observatoire des sciences et des technologies (OST) is a world-renowned organization, known for its expertise in the field of research evaluation and metrics. Can you tell us more about its mission and activities?

The OST was created in 1997 in Montreal and is dedicated to the measurement of science, technology and innovation (STI). The members of the OST team have developed internationally recognized expertise in scientometrics, technometrics and research evaluation. They build, enhance and maintain several databases on R&D, research funding, patents and scholarly publications in order to offer various services related to the evaluation of scientific and technological activities. The OST has over 30 Canadian partners and has completed over 500 assignments for private, public, and parapublic organizations.

2. As part of the CO.SHS project, the OST team is working on building a citation index based on the scholarly and cultural journals disseminated on the Érudit platform. Can you tell us what exactly constitutes a citation index?

A citation index is a bibliographic database that, like any documentary database, records metadata such as author names, publication years, publication titles, journal titles, abstracts and keywords. In addition to this traditional information, however, citation indexes also record the references cited in these publications as well as the institutional address (or “affiliation”) of the authors.

This makes it possible to understand the formal ties between documents. Rather than searching exclusively by keyword, users of a citation index can browse documents that share bibliographic references or are connected by citing/cited links, even when those documents do not contain the same words or appear in the same journals.
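To make the two kinds of ties concrete, here is a minimal sketch in Python. It is not the OST's implementation: the document IDs and the `references` mapping are made-up toy data, and the function names are hypothetical. It shows how citing/cited links can be inverted to find who cites a document, and how two documents can be related through the references they share.

```python
from collections import defaultdict

# Hypothetical toy data: each document ID maps to the IDs it cites.
references = {
    "A": ["X", "Y"],
    "B": ["Y", "Z"],
    "C": ["A"],
}

def build_cited_by(references):
    """Invert citing -> cited links into cited -> citing links."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return cited_by

def shared_references(references, doc_a, doc_b):
    """Return the references that two documents have in common."""
    return set(references.get(doc_a, [])) & set(references.get(doc_b, []))

cited_by = build_cited_by(references)
print(sorted(cited_by["Y"]))                     # ['A', 'B']
print(shared_references(references, "A", "B"))   # {'Y'}
```

Documents A and B never cite each other and need not share a single keyword, yet the index still relates them because both cite Y.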

3. What is unique about the index you are developing?

In the 1960s and 1970s, citation indexes were considered revolutionary, but they are now very common. While the Web of Science (Clarivate Analytics) was for years the only citation index, a dozen citation indexes have been created worldwide since the beginning of the century.

Despite the large number of citation indexes existing today, their coverage still shows a strong bias in favour of international journals, and specifically Anglo-American scientific literature. The citation index developed by the OST will help counter this problem for French-language scholarly publications, and thus bridge the gap in our current understanding of knowledge production in French.

4. The citation index will make it possible, on the one hand, to browse from one article to another on the Érudit platform through the citation networks linking documents and, on the other hand, to better understand how knowledge is produced and disseminated in the humanities, social sciences, and arts and letters in Quebec and in Canada. How will the citation index fulfill both of these functions?

First, new features will be implemented on the Érudit platform to allow navigation between documents, starting with added links between citing and cited documents. Several documentary databases, such as Google Scholar and Web of Science, already allow navigation through citation networks.

The two other developments are related to data availability for research. A relational database (Microsoft SQL Server) will be created and maintained by the OST team and made available to researchers. Researchers wishing to build their own database will be invited to request a data dump, which will be updated yearly.
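As a rough illustration of what a relational citation database might look like to a researcher, the sketch below uses Python's built-in `sqlite3` module in place of Microsoft SQL Server so that it runs anywhere. The table names, column names and sample rows are all assumptions made for the example; the actual OST schema is not described in this interview.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE documents (
            doc_id TEXT PRIMARY KEY,
            title  TEXT NOT NULL,
            year   INTEGER
        );
        CREATE TABLE citations (
            citing_id TEXT NOT NULL REFERENCES documents(doc_id),
            cited_id  TEXT NOT NULL REFERENCES documents(doc_id),
            PRIMARY KEY (citing_id, cited_id)
        );
    """)
    # Placeholder rows, not real Érudit records.
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?)", [
        ("d1", "Placeholder article 1", 2015),
        ("d2", "Placeholder article 2", 2017),
        ("d3", "Placeholder article 3", 2018),
    ])
    conn.executemany("INSERT INTO citations VALUES (?, ?)", [
        ("d2", "d1"), ("d3", "d1"), ("d3", "d2"),
    ])
    conn.commit()
    return conn

def citation_counts(conn):
    """Return {doc_id: number of times cited}, including uncited documents."""
    rows = conn.execute("""
        SELECT d.doc_id, COUNT(c.citing_id) AS times_cited
        FROM documents AS d
        LEFT JOIN citations AS c ON c.cited_id = d.doc_id
        GROUP BY d.doc_id
        ORDER BY d.doc_id
    """)
    return dict(rows.fetchall())

print(citation_counts(build_demo_db()))  # {'d1': 2, 'd2': 1, 'd3': 0}
```

Storing each citing/cited pair as its own row is what lets a single join answer questions such as "how often is each document cited?" without parsing reference lists again.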

5. Concerning direct implementation into the Érudit platform, how do you foresee integration with the Érudit team? Will the citations be updated automatically and continuously?

The updates will need to happen in real time for current production (the journal issues for the ongoing year), as is the case for any documentary database identifying existing citations. This will eventually call for a collaboration with the Érudit team, as the XML production processes will need to be adapted. The technological developments that will allow the platform to become a true citation index are of course very important and will require recurrent funding beyond the scope of CO.SHS. It will therefore be necessary to plan ahead for the remainder of the project.

As a first step, it would be quite possible to add a button to each document's detail page on the platform, similar to the PLOS metrics, showing the number of citations found for the document and linking to the list of citing documents on the Crossref website.

Source: https://doi.org/10.1371/journal.pone.0011273
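One way such a button could obtain its number is sketched below, assuming the count is read from the public Crossref REST API, whose `works` records carry an `is-referenced-by-count` field. Whether Érudit would query this API is an assumption on our part; the DOI, the sample response and the helper names are hypothetical, and the network call itself is left out so that the example runs offline.

```python
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works/"

def crossref_work_url(doi):
    """Build the Crossref REST API URL for a DOI (slashes are kept as-is)."""
    return CROSSREF_WORKS + quote(doi, safe="/")

def citation_count(api_response):
    """Read the citation count from a decoded Crossref 'works' response."""
    return api_response["message"].get("is-referenced-by-count", 0)

def button_label(count):
    """Text for a hypothetical 'cited by' button on a document page."""
    return f"Cited by {count}"

# Made-up response trimmed to the one field used here; not real data.
sample_response = {
    "status": "ok",
    "message": {"DOI": "10.1234/example", "is-referenced-by-count": 7},
}

print(crossref_work_url("10.1234/example"))
print(button_label(citation_count(sample_response)))  # Cited by 7
```

In a live setting, the JSON fetched from the URL returned by `crossref_work_url` would be decoded and passed to `citation_count` in place of `sample_response`.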

6. How far along are your developments and what are the main challenges you have encountered?

We are currently working on identifying and standardizing author affiliations. The challenges facing us are related to the vast disparity of formats in which this data is recorded, given that the journals on Érudit are allowed to individually choose their presentation format.

Bibliographic references are sometimes located at the end of a document, in a bibliography, but in certain cases they appear in footnotes, which are extremely hard for a machine to extract. Both types of references can also coexist in a single document. The same goes for author affiliations: they can be mentioned at the beginning of a document, under the authors’ names, or in a biographical note at the end of an article. A great deal of manual labour is thus required to identify each piece of information precisely.
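The standardization step can be pictured with a small Python sketch. It assumes a simple approach, cleaning each affiliation string and looking it up in a hand-built alias table, and is not a description of the OST's actual process. The `ALIASES` table and the function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical alias table: normalized variant -> canonical institution name.
ALIASES = {
    "universite de montreal": "Université de Montréal",
    "univ de montreal": "Université de Montréal",
    "udem": "Université de Montréal",
}

def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def canonical_affiliation(raw):
    """Return the canonical name if the variant is known, else None."""
    return ALIASES.get(normalize(raw))

print(canonical_affiliation("Univ. de  Montréal"))   # Université de Montréal
print(canonical_affiliation("Unknown Institute"))    # None
```

Variants that return `None` are the ones that would still need a person to look at them, which is consistent with the manual labour described above.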

7. How will you ensure that the index doesn’t operate in a vacuum?

A citation index is only useful if it operates on a large scale. We must therefore go beyond the documents disseminated on the Érudit platform and connect the largest possible number of publications. This is why we are working with Crossref, a non-profit organization offering various referencing services for scholarly content and for sharing metadata such as citations. With our efforts, we aim to create better connections with Crossref and other services indexing this information (such as Microsoft Academic and Google Scholar). But there is no ideal solution as of yet.

8. Where do you stand with regard to other international initiatives around metadata and open citations?

A large portion of the data on knowledge production (usage, impact, etc.) is currently owned by private companies. This is why we are working in collaboration with non-profit organizations and initiatives to build a collective, open, and free research infrastructure for the academic community that will help promote French-language publications. For example, the Initiative for Open Citations (I4OC) plays a major role in this movement. It is a project bringing together scientific publishers, researchers and other stakeholders from academia whose common goal is to foster unrestricted access to citation data. Érudit has been part of this initiative since January 2018. Over half of current references worldwide are now covered by it.

To find out more about the project’s developments, follow @coshslab on Twitter!
