Update on the relationships endpoint

Our relationships endpoint is up and running and delivers information about how Crossref DOIs connect to each other and to other objects that have identifiers. This is all part of building the research nexus.

The endpoint is still at an early stage, though, and there are a few things to look out for in the results.

Available features

To see the available query filters, check the Swagger documentation (Swagger UI), which hasn’t yet been migrated to the new platform.

Almost all of the features we wanted at this stage are available. One we haven’t managed to include is a total count of results for each query, because computing it is an expensive call that was hurting performance.
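As an illustration of combining filters with paging parameters, here is a minimal sketch that builds a query URL. The base URL and the filter names (`from-updated-time`, `until-updated-time`) are assumptions for illustration only, and so is the comma-separated `name:value` filter syntax, which is borrowed from other Crossref REST API routes such as /works; check the Swagger documentation for the real names before relying on them.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# ASSUMPTION: this base URL and the filter names used with it are placeholders
# for illustration only; the Swagger documentation lists the real ones.
BASE_URL = "https://api.crossref.org/beta/relationships"


def build_query_url(filters: Dict[str, str], rows: int = 100,
                    cursor: Optional[str] = None) -> str:
    """Build a relationships query URL from filter name/value pairs."""
    params = {}
    if filters:
        # Join filters as comma-separated name:value pairs, the convention
        # used by other Crossref REST API routes such as /works (assumed here).
        params["filter"] = ",".join(f"{name}:{value}" for name, value in filters.items())
    # Fewer rows per page means less work for the server per request.
    params["rows"] = str(rows)
    if cursor is not None:
        params["cursor"] = cursor
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `build_query_url({"from-updated-time": "2024-01-01"}, rows=20, cursor="*")` produces a URL with URL-encoded `filter`, `rows` and `cursor` parameters. Note that no total-count parameter is included, since that isn't available.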

Query times

Performance hasn’t been optimised yet, so some queries may be slow or time out. If this happens, reduce the number of rows in your response, query over shorter time periods, and use cursors to scroll through multiple pages of results. If you still have problems, there may be a high volume of other queries running at the same time, so wait a while and try again.
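The advice above (fewer rows, shorter time periods, cursors, waiting before retrying) can be combined on the client side. This is a minimal sketch under several assumptions: `fetch_page` is a hypothetical function you supply that performs one request for a date window and returns `(items, next_cursor)`; a cursor value of `"*"` starts a new scroll, as it does on the /works route; and timeouts surface as Python's built-in `TimeoutError`, which you should swap for your HTTP client's own timeout exception.

```python
import time
from datetime import date, timedelta
from typing import Callable, Iterator, List, Optional, Tuple

# HYPOTHETICAL fetch signature: fetch_page(from_date, until_date, rows, cursor)
# returns (items, next_cursor). Plug in your own HTTP client here.
FetchPage = Callable[[date, date, int, str], Tuple[List[dict], Optional[str]]]


def date_windows(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Split [start, end] (both inclusive) into consecutive windows of at most `days` days."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def with_retries(fetch_page: FetchPage, attempts: int = 3,
                 wait_seconds: float = 60.0) -> FetchPage:
    """Retry a timed-out call after a pause, in case the service is busy."""
    def wrapped(from_date, until_date, rows, cursor):
        for attempt in range(attempts):
            try:
                return fetch_page(from_date, until_date, rows, cursor)
            except TimeoutError:
                # ASSUMPTION: substitute your HTTP client's timeout exception.
                if attempt == attempts - 1:
                    raise
                time.sleep(wait_seconds)
    return wrapped


def scroll(fetch_page: FetchPage, start: date, end: date,
           days: int = 7, rows: int = 100) -> Iterator[dict]:
    """Yield every item, querying one short window at a time and paging with cursors."""
    for window_start, window_end in date_windows(start, end, days):
        cursor = "*"  # ASSUMPTION: "*" starts a new cursor, as on /works
        while True:
            items, next_cursor = fetch_page(window_start, window_end, rows, cursor)
            yield from items
            # An empty page or a missing cursor means this window is exhausted.
            if not items or not next_cursor:
                break
            cursor = next_cursor
```

A typical call would be `scroll(with_retries(my_fetch), date(2024, 1, 1), date(2024, 1, 31), days=7, rows=50)`, where `my_fetch` is your own request function. Keeping each window short means a single slow query covers less data, and a timeout only costs one page rather than the whole scroll.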

Data ingestion

Data is still being ingested. We have four sources of data:

  • Live ingestion of Crossref member metadata records. This is working well and should be up to date.
  • Live ingestion of relationships from DataCite DOIs (via Event Data). This is also working and the data should be available.
  • Historical ingestion of Crossref metadata records. The data has been added to the database, but there is currently a backlog of over 150 million relationships, which will take most of January to clear at the current rate.
  • Historical ingestion of relationships from DataCite DOIs. These have also been added, although a small number are still in the backlog along with the Crossref metadata records.

Later this month we will start adding Event Data, and we expect that process to be finished by the end of February.

Timestamps

Dates are complicated! Each relationship has an updated timestamp, but its meaning differs between the historical DataCite and Crossref data. For DataCite, it is the date on which we originally received the data, so you will see DataCite events going back a number of years. For Crossref, it is the date on which the metadata was collected from our primary database, which means that most relationships have an updated time between 29 November and 7 December 2023. This graph shows how the deposited date in the works endpoint correlates with the updated time in the relationships endpoint for a sample of relationships:

Historical relationships are being ingested backwards in time, so most of the recent records are already in, while earlier records are still being added. However, the process hasn’t been strictly sequential, and some more recent records are still missing.
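Because the updated time of historical Crossref relationships mostly reflects when the metadata was collected, a client that needs a meaningful date may want to fall back to the deposited date from the works endpoint. Below is a minimal sketch of one such heuristic. The `source` labels and the `effective_date` helper are hypothetical, the deposited date is assumed to have been looked up separately, and treating the live-ingestion updated time as a usable proxy is also an assumption.

```python
from datetime import date
from typing import Optional

# Window during which historical Crossref relationships were collected
# from the primary database (29 November to 7 December 2023).
BULK_LOAD_START = date(2023, 11, 29)
BULK_LOAD_END = date(2023, 12, 7)


def effective_date(source: str, updated: date,
                   deposited: Optional[date] = None) -> Optional[date]:
    """Best-guess date for a relationship (hypothetical heuristic).

    source: hypothetical label, "crossref" or "datacite".
    deposited: deposited date looked up separately in the works endpoint.
    Returns None when the date can't be determined.
    """
    if source == "crossref" and BULK_LOAD_START <= updated <= BULK_LOAD_END:
        # The updated time reflects the historical collection, not the record
        # itself, so use the deposited date if we have one.
        return deposited
    # Otherwise (e.g. DataCite, where it is the date the data was received),
    # ASSUME the updated time is a reasonable proxy.
    return updated
```

The trade-off is that a Crossref relationship genuinely updated during that window will also fall back to its deposited date; if that matters for your analysis, check the deposited date against the updated time rather than relying on the window alone.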

Feedback

We’ll keep posting updates here. In the meantime, please let us know if you have any feedback or suggestions.

Tomorrow (7 February 2024) we will be carrying out maintenance on the relationships endpoint. To improve performance, we will remove the historical data and restart the service with data from 1 January 2024 onwards.
