OpenAlex: The Open Catalogue of the Global Research System
— by Florence Ng
What is OpenAlex?
OpenAlex is a free and open source index of hundreds of millions of interconnected entities across the global research system. The name “OpenAlex” is inspired by the ancient Library of Alexandria in Egypt. This project is run by OurResearch with a grant received from the Arcadia Fund.
In January 2022, OpenAlex was first announced as a replacement for the Microsoft Academic Graph (MAG), which was retired at the end of 2021. Although it started as a replacement, OpenAlex aims to reach beyond what the MAG originally provided by aggregating data from Crossref, Unpaywall, ORCID, institutional repositories (IR), and more.
What data are indexed in OpenAlex?
As mentioned in the previous paragraph, OpenAlex builds a standardised dataset from a wide range of sources and platforms. While the MAG and Crossref remain the most important data sources, OpenAlex also gathers the open access status of publications from Unpaywall, author clusters from ORCID, and persistent identifiers such as ISSNs for journals and ROR IDs for institutions.
The dataset in OpenAlex describes scholarly entities and how those entities are connected to each other. Five types of entities are available:
- Works (i.e. papers, books, datasets, etc; they cite other works)
- Authors (i.e. people who create works)
- Venues (i.e. journals and repositories that host works)
- Institutions (i.e. universities and other orgs that are affiliated with works via authors)
- Concepts (i.e. topics used to tag works)
These five entity types are interconnected, which makes it easier for you to retrieve selected data from OpenAlex.
How to retrieve data from OpenAlex?
OpenAlex has announced three methods for retrieving its data:
- The OpenAlex API
- Website (It is expected to be officially launched soon. The site is under construction: https://explore.openalex.org/)
- The database snapshot (The API and the forthcoming website are the recommended methods and should be sufficient for most needs, but you may still refer to this page on how to set up the snapshot.)
About the API
At the time of writing, the API is the primary way to retrieve information from OpenAlex. Data retrieved from the API are in JSON format. The results are easiest to read in the Firefox web browser, which has a built-in JSON viewer. In other web browsers, you may copy and paste the results into an online JSON formatter for a more readable view.
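If you would rather work outside the browser, a short script can fetch a result and print it in an indented form. The following is a minimal sketch in Python using only the standard library; the helper names `fetch_json` and `pretty` are made up for this illustration and are not part of OpenAlex.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def pretty(data: dict) -> str:
    """Return an indented, human-readable rendering of parsed JSON."""
    return json.dumps(data, indent=2, ensure_ascii=False)
```

Calling `print(pretty(fetch_json(url)))` with any of the API URLs in this post should give roughly the same readable view as the Firefox JSON viewer.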
Here are some examples:
#1
Get the number of scholarly works published by HKU authors, grouped by Open Access (OA) status (i.e. closed, bronze, green, gold, etc.):
https://api.openalex.org/works?filter=institutions.ror:https://ror.org/02zhqgq86&group_by=oa_status
The graph below visualises the data:
Percentage of Open Access Publications at HKU
*Data retrieved on 2 March 2022.
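The percentages in a chart like this can be worked out from the grouped counts. The sketch below assumes that the response contains a "group_by" list whose items each have a "key" and a "count"; please check this against the actual output before relying on it. The function name and the sample numbers are made up for illustration and are not real HKU figures.

```python
def oa_status_percentages(response: dict) -> dict:
    """Convert grouped counts into percentages of the total (one decimal place)."""
    groups = response.get("group_by", [])
    total = sum(group["count"] for group in groups)
    if total == 0:
        return {}
    return {group["key"]: round(100 * group["count"] / total, 1) for group in groups}


# Made-up numbers for illustration only; these are not real HKU figures.
sample_response = {
    "group_by": [
        {"key": "closed", "count": 50},
        {"key": "gold", "count": 30},
        {"key": "green", "count": 20},
    ]
}
print(oa_status_percentages(sample_response))
```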
#2
Get the number of OA scholarly works published by HKU authors, grouped by year:
https://api.openalex.org/works?filter=institutions.ror:https://ror.org/02zhqgq86,is_oa:true&group_by=publication_year
The graph below visualises the data:
Number of Open Access Publications at HKU (2001-2021)
*Data retrieved on 2 March 2022.
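Example #2 combines two filters by separating them with a comma. The sketch below shows one way to assemble such a URL and to pick out a range of years from the grouped result. It assumes the same "group_by" list of "key" and "count" items as before; the function names and the sample counts are hypothetical.

```python
BASE_URL = "https://api.openalex.org"


def build_works_url(filters: dict, group_by: str = "") -> str:
    """Join filters as comma-separated name:value pairs, as in the example URLs."""
    filter_part = ",".join(f"{name}:{value}" for name, value in filters.items())
    url = f"{BASE_URL}/works?filter={filter_part}"
    if group_by:
        url += f"&group_by={group_by}"
    return url


def yearly_counts(response: dict, first_year: int, last_year: int) -> list:
    """Return (year, count) pairs within the given range, sorted by year."""
    pairs = []
    for group in response.get("group_by", []):
        key = str(group["key"])
        if key.isdigit():  # skip any group whose key is not a year
            pairs.append((int(key), group["count"]))
    return sorted((y, c) for y, c in pairs if first_year <= y <= last_year)


url = build_works_url(
    {"institutions.ror": "https://ror.org/02zhqgq86", "is_oa": "true"},
    group_by="publication_year",
)
print(url)
```

The printed URL should match the one in example #2. Once the result has been downloaded, `yearly_counts(result, 2001, 2021)` would give the series used for a chart covering 2001 to 2021.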
#3
Get details of an author by ORCID iD, including counts of publications and citations by year:
http://api.openalex.org/authors/orcid:0000-0003-3408-2852
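To turn an author record into a short summary, something like the following sketch could be used. It assumes the record has a "display_name" field and a "counts_by_year" list whose items carry "year", "works_count" and "cited_by_count"; these field names should be checked against the actual response. The function name and the sample record are made up for illustration.

```python
def summarise_author(author: dict) -> dict:
    """Total the yearly publication and citation counts listed in an author record."""
    rows = sorted(author.get("counts_by_year", []), key=lambda row: row["year"])
    return {
        "name": author.get("display_name"),
        "years": [row["year"] for row in rows],
        "works": sum(row.get("works_count", 0) for row in rows),
        "citations": sum(row.get("cited_by_count", 0) for row in rows),
    }


# Made-up record for illustration only; not a real author.
sample_author = {
    "display_name": "Example Author",
    "counts_by_year": [
        {"year": 2021, "works_count": 4, "cited_by_count": 30},
        {"year": 2020, "works_count": 2, "cited_by_count": 12},
    ],
}
print(summarise_author(sample_author))
```

Note that the totals only cover the years listed in the record, which may not be the author's whole career.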
#4
Get the number of retracted papers by HKU authors:
https://api.openalex.org/works?filter=institutions.ror:https://ror.org/02zhqgq86&group_by=is_retracted
For more details, you may refer to the guidelines on using the OpenAlex API at https://docs.openalex.org/api. For any questions, you are welcome to contact us at scholarlycomm@hku.hk.