A Comparative Approach to Explore HKU Publication Trends Using Data from OpenAlex and Peer Databases

— Vivian Qiu, Florence Ng

The accelerating adoption of open science practices has driven the emergence of digital tools and platforms that enable openness in research. These tools have greatly enhanced the accessibility and transparency of scholarly publications and related data. In our previous blog post released in 2022, we introduced OpenAlex, an open catalog of the global research system that indexes publication data worldwide. To explore how competitive open data can be, this post takes a comparative approach to analyzing HKU scholarly publication trends using open data retrieved from OpenAlex, together with data retrieved from two mainstream subscription-based academic research databases, Web of Science and Scopus.

1. Comparison of Basic Features of OpenAlex, Web of Science and Scopus

Subscription-based scientific databases like Scopus and Web of Science are commonly used by researchers and institutions worldwide for research impact assessment. They apply strict indexing criteria with the aim of ensuring reliability and quality, yet providing sustained access to them for researchers and students is costly for institutions. Table 1 shows a brief comparison of the three platforms.

Table 1: Comparison of Basic Features of OpenAlex, Web of Science and Scopus

Feature | OpenAlex | Web of Science | Scopus
Accessibility | Open source | Subscription-based | Subscription-based
No. of Records | Over 250 million [2] | Over 92 million [3] | Over 97 million [4]
Are Indexed Journals Peer Reviewed? | Information not available | Peer reviewed* | Peer reviewed^
Data Export | Available | Available | Available
API Availability | Available with free usage | Available with subscription | Available with subscription

Note:

  1. Data retrieved in Aug 2024
  2. *Peer review is one of the journal evaluation and selection criteria for the Web of Science Core Collection. While most of the indexed journals are peer reviewed, the database does not curate the peer review status information. https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Core-Collection-Explanation-of-peer-reviewed-journals
  3. ^Peer review is one of the journal selection criteria for Scopus. https://www.elsevier.com/products/scopus/content/content-policy-and-selection

From the above table, we can see that OpenAlex has a larger volume of indexed data (over 250 million records) than the Web of Science Core Collection (over 92 million) and Scopus (over 97 million). This is due to its wider coverage of data sources, such as Microsoft Academic Graph (MAG), CrossRef, ORCID, ROR, DOAJ, PubMed Central, Unpaywall, and repositories like arXiv or Zenodo. It is important to highlight that this extensive coverage may include works which are not peer-reviewed; hence, users should be cautious when using these data and drawing interpretations from them.

2. Analysis of HKU Research Publications

To gain insights into HKU research publication trends using data from OpenAlex, we extracted data from its API and replicated the analyses we had previously carried out using Web of Science and Scopus data. The scope of the studies was aligned so that we could take a comparative approach to drawing insights from the data. This post presents a brief analysis of two areas:

  • Open Access publications
  • Publications in relation to the Sustainable Development Goals (SDG)

2.1 HKU Open Access Publications

In our previous study conducted in January 2024, we analyzed the involvement of HKU authors in open access publishing from 2003 to 2022 using data extracted from InCites. When replicating it with data from OpenAlex, we investigated the percentages of open access and non-open access publications each year, looked further into the percentages of two open access routes (Gold and Green), and explored the ratio of open access to non-open access publications by subject area. The data are presented in Figures 1, 2, and 3.

Figure 1: Percentage of Open Access and Non-Open Access HKU Publications (2003-2022)
using data from OpenAlex (left) and InCites (right)

Data source: OpenAlex
Data source: InCites

Note:

  1. Data extracted from OpenAlex on 5 March 2024, and InCites on 13 March 2024.
  2. Publications included documents indexed by OpenAlex, with filtering entities #institutions.ror (HKU), #publication_year (2003-2022), and #is_oa (True, False). For data from InCites, methodologies are included in this blog post.
  3. OpenAlex uses a broad definition for Open Access (OA). It defines OA publications as “(publication) having a URL where one can read the fulltext of this work without needing to pay money or log in.” https://docs.openalex.org/api-entities/works/work-object#is_oa

Despite the minor discrepancies in the exact annual percentage, both studies consistently indicate a general upward trend in the percentage of HKU open access publications from 2003 to 2022.
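The filters listed in the notes to Figure 1 could be turned into an OpenAlex API request along the following lines. This is a minimal sketch rather than the script used for this post: the ROR value is a placeholder to be replaced with the institution's real identifier, the helper names are hypothetical, and the filter and `group_by` names should be checked against the current OpenAlex documentation before use. No request is sent here; the code only builds the URL and computes yearly percentages from counts that are supplied by the caller.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"
# Placeholder only: substitute the institution's real ROR ID before use.
INSTITUTION_ROR = "https://ror.org/XXXXXXXXX"


def build_oa_query(ror: str, start_year: int, end_year: int, is_oa: bool) -> str:
    """Build a works URL that counts OA (or non-OA) publications per year."""
    filters = ",".join([
        f"institutions.ror:{ror}",
        f"publication_year:{start_year}-{end_year}",
        f"is_oa:{str(is_oa).lower()}",
    ])
    params = {"filter": filters, "group_by": "publication_year"}
    # Keep ':', ',' and '/' readable instead of percent-encoding them.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':,/')}"


def yearly_oa_share(open_by_year: dict, closed_by_year: dict) -> dict:
    """Return the percentage of open access publications for each year."""
    shares = {}
    for year in sorted(set(open_by_year) | set(closed_by_year)):
        n_open = open_by_year.get(year, 0)
        total = n_open + closed_by_year.get(year, 0)
        shares[year] = round(100 * n_open / total, 1) if total else 0.0
    return shares


if __name__ == "__main__":
    print(build_oa_query(INSTITUTION_ROR, 2003, 2022, is_oa=True))
    # Made-up counts, purely to illustrate the calculation.
    print(yearly_oa_share({2003: 30, 2004: 50}, {2003: 90, 2004: 50}))
```

Running the two queries (one with `is_oa` true, one with false) and feeding the per-year counts into `yearly_oa_share` would give the kind of series plotted in Figure 1.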

Figure 2: Percentage of Gold, Green, and Non-Open Access HKU Publications (2003-2022)
using data from OpenAlex (left) and InCites (right)

Data source: OpenAlex
Data source: InCites

Note:

  1. Data extracted from OpenAlex on 5 March 2024, and InCites on 13 March 2024.
  2. Publications included documents indexed by OpenAlex, with filtering entities #institutions.ror (HKU), #publication_year (2003-2022), and #oa_status (diamond, gold, green, hybrid, closed). For data from InCites, methodologies are included in this blog post.
  3. Please refer to the definitions for each type of open access route used by OpenAlex at: https://docs.openalex.org/api-entities/works/work-object#oa_status

In the analysis of open access publications by route, OpenAlex provides a more specific categorization, which enables users to draw more detailed insights about open access routes. Since the same level of detail is not available in InCites, for the purpose of comparison in this post we grouped publications classified as diamond, gold, or hybrid under “Gold” in the OpenAlex data analysis.
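The grouping described above can be sketched as a simple lookup. The mapping covers only the five `oa_status` values named in the notes to Figure 2; treating "closed" as non-open access is our reading of the figure, and any other value is reported as "Unmapped" rather than guessed at. The function names and the sample list are hypothetical.

```python
from collections import Counter

# Grouping used for comparison with InCites:
# diamond, gold and hybrid are all reported as "Gold".
ROUTE_GROUPS = {
    "diamond": "Gold",
    "gold": "Gold",
    "hybrid": "Gold",
    "green": "Green",
    "closed": "Non-OA",
}


def group_route(oa_status: str) -> str:
    """Map an OpenAlex oa_status value to the coarser label used in Figure 2."""
    return ROUTE_GROUPS.get(oa_status.lower(), "Unmapped")


def route_percentages(statuses: list) -> dict:
    """Return the percentage of works falling under each grouped label."""
    counts = Counter(group_route(status) for status in statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    # Made-up sample of oa_status values, purely for illustration.
    sample = ["gold", "diamond", "hybrid", "green",
              "closed", "closed", "green", "gold"]
    print(route_percentages(sample))
```

Keeping the mapping in one dictionary makes the grouping decision explicit and easy to revise if a finer comparison becomes possible.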

In Figure 2, the visualizations again reveal similar results, with an upward trend in the percentage of gold open access publications and a downward trend in the percentage of non-open access publications. A slight difference in the ratio of gold to green open access publications can be observed, which suggests that a higher number of green open access publications by HKU authors is indexed in OpenAlex than in the Web of Science databases.

Figure 3: Percentage of Open Access and Non-Open Access HKU Publications by Research Areas (2018-2022)
using data from OpenAlex (left) and InCites (right)

Data source: OpenAlex
Data source: InCites

Note:

  1. Data extracted from OpenAlex on 23 September 2024 and InCites on 16 January 2024.
  2. Publications included documents indexed by OpenAlex, with filtering entities #institutions.ror (HKU), #publication_year (2018-2022), and #is_oa (True, False). For data from InCites, methodologies are included in this blog post.
  3. Research areas were classified as “fields” in OpenAlex. The classification methodology is available on the OpenAlex help page.

From Figure 3, we observed that “Immunology and microbiology” and “Biochemistry, genetics and molecular biology” had the highest ratios of open access publications to all other documents in the OpenAlex data analysis, while the data from InCites showed that “Microbiology” and “Molecular Biology & Genetics” were two of the three research areas with the highest ratios.

Since the methodology for categorizing research subjects differs between OpenAlex (26 fields) and the Essential Science Indicators (ESI) schema (22 research areas), the exact figures cannot be compared directly beyond identifying the research areas with the highest open access publication ratios. However, to a certain extent, the analysis demonstrates that OpenAlex data can be used to reflect open access publication patterns by research area.

2.2 HKU Research Performance in Sustainable Development Goals (SDGs)

In our previous study released in March 2024, we examined the contribution of HKU to sustainable development by analyzing the SciVal data on SDGs. The replication of that study was limited to publication counts because of the data available in OpenAlex. Figure 4 shows the number of HKU publications in SDGs 1-16 from 2018 to 2023 using data extracted from OpenAlex and SciVal.

Figure 4: HKU Publications in Sustainable Development Goals (2018-2023)
using data from OpenAlex (left) and SciVal (right)

Note:

  1. Data extracted from OpenAlex on 5 March 2024, and SciVal on 5 March 2024.
  2. Publications included documents indexed by OpenAlex, with filtering entities #institutions.ror (HKU), #publication_year (2018-2023), and #sustainable_development_goals (1-16). For data from SciVal, methodologies are included in this blog post.

As shown in Figure 4, we observed similar patterns in the distribution of HKU publications across the 16 SDGs. Both studies revealed that HKU authors contributed most to SDG 3 (Good Health and Well-being), SDG 7 (Affordable and Clean Energy), and SDG 11 (Sustainable Cities and Communities), with publications in SDG 3 significantly outnumbering the others.
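A ranking like the one read off Figure 4 could be derived from grouped SDG counts roughly as follows. This is a sketch under stated assumptions: it assumes each group is a dictionary with a `key` ending in the SDG number (for example a URL ending in `/3`) and a `count`, which is how we understand OpenAlex `group_by` results to be shaped; the function names are hypothetical and the counts in the example are invented for illustration only, not the figures behind this post.

```python
def sdg_number(key: str) -> int:
    """Extract the goal number from an SDG identifier ending in '/<number>'."""
    return int(key.rstrip("/").rsplit("/", 1)[-1])


def top_sdgs(groups: list, n: int = 3, max_goal: int = 16) -> list:
    """Return the n goals (1..max_goal) with the most publications."""
    counts = {}
    for group in groups:
        goal = sdg_number(group["key"])
        # Restrict to SDGs 1-16, matching the scope of Figure 4.
        if 1 <= goal <= max_goal:
            counts[goal] = counts.get(goal, 0) + group["count"]
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Invented counts; the identifier prefix is an assumption about the format.
    prefix = "https://metadata.un.org/sdg/"
    example = [
        {"key": prefix + "3", "count": 500},
        {"key": prefix + "7", "count": 120},
        {"key": prefix + "11", "count": 110},
        {"key": prefix + "1", "count": 10},
        {"key": prefix + "17", "count": 900},  # outside the 1-16 scope
    ]
    print(top_sdgs(example))
```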

3. Summary

The analysis presented in this post shows that similar results concerning publication trends can be obtained with open data from OpenAlex. This suggests that the open database OpenAlex could serve as an alternative source for identifying patterns and trends in institutional research performance.

However, it is worth noting that the exact publication counts differed noticeably between OpenAlex and the subscription-based sources.

Restricted by the availability of common metadata fields between OpenAlex and subscription-based tools like InCites and SciVal, this post could only demonstrate four possible comparisons within a limited scope, and may not fully uncover the potential of OpenAlex data in research performance analysis. We hope that this post sheds light on the possibilities of open data and on the efforts made by the global community to enable transparency and affordability in understanding research performance at different levels.

References:

[1] “About us,” OpenAlex. Accessed: Sep. 09, 2024. [Online]. Available: https://help.openalex.org/hc/en-us/articles/24396686889751-About-us

[2] “OpenAlex: The open catalog to the global research ecosystem.” Accessed: Sep. 20, 2024. [Online]. Available: OpenAlex: The open catalog to the global research system

[3] “Web of Science Core Collection,” Clarivate. Accessed: Sep. 03, 2024. [Online]. Available: https://clarivate.com/products/scientific-and-academic-research/research-discovery-and-workflow-solutions/webofscience-platform/web-of-science-core-collection/

[4] “Scopus | Abstract and citation database | Elsevier,” www.elsevier.com. Accessed: Sep. 20, 2024. [Online]. Available: https://www.elsevier.com/products/scopus

[5] “What are the most used Subject Area categories and classifications in Scopus? – Scopus: Access and use Support Center.” Accessed: Sep. 03, 2024. [Online]. Available: https://service.elsevier.com/app/answers/detail/a_id/14882/supporthub/scopus/kw/+subject+areas/

[6] “Open Access in Progress: An Overview of Participation of HKU Authors in Open Access Publishing — Researcher Connect.” Accessed: Sep. 04, 2024. [Online]. Available: https://blog-sc.hku.hk/open-access-in-progress-an-overview-of-participation-of-hku-authors-in-open-access-publishing/

[7] “Where does your data come from?” Accessed: Sep. 19, 2024. [Online]. Available: https://docs.openalex.org/additional-help/faq#where-does-your-data-come-from

[8] “Work object | OpenAlex technical documentation.” Accessed: Sep. 05, 2024. [Online]. Available: https://docs.openalex.org/api-entities/works/work-object
