
Balancing Openness and Privacy in Research Data Sharing
— by Chloe Ng
To promote transparency and research integrity, researchers are encouraged to make their research processes accessible, including sharing research data. However, for researchers working with human participants, there are important ethical and legal considerations when handling personal data, such as interview transcripts or medical records. This blog post explores strategies for practicing open research data while protecting personal privacy.
Why should research data be made open access?
Open research data refers to making the data that support scientific findings publicly available. Open data provides a range of advantages:
- Enhancing credibility
When data supporting research findings is openly available, other researchers can replicate the studies to validate the results, strengthening the reliability of the work (Borgman, 2012).
- Enabling reuse of research data
“The value of data lies in their use” (National Research Council, 1997). Sharing data provides a foundation for others to build upon, fostering new insights and scientific progress.
- Increasing visibility and citations
Sharing data can raise the visibility of the research findings associated with it. One large-scale analysis found that journal articles linking to openly accessible data receive up to 25% more citations on average (Colavizza et al., 2020).
- Complying with journal and funder requirements
Many funding agencies and journal publishers now mandate data sharing as a condition of grant awards or publication.
Can all data be shared publicly?
While researchers are encouraged to share their data, the handling of human participant data requires extra care. Ethical guidelines and legal frameworks govern the collection, storage, and sharing of personal data to protect individuals’ privacy rights.
The Personal Data (Privacy) Ordinance in Hong Kong defines personal data as “any information relating directly or indirectly to a living individual, from which it is practicable for the identity of the individual to be directly or indirectly ascertained, and in a form in which access to or processing of the data is practicable” (Office of the Privacy Commissioner for Personal Data, 2022).
The University of Hong Kong (HKU)’s Policy on Research Integrity typically requires that research involving human participants obtain ethical approval before data collection begins and comply with relevant data protection principles throughout the research lifecycle. These safeguards mean that some data should not be shared publicly.
How do researchers share sensitive data responsibly?
Several measures can be implemented to ensure data is shared ethically and legally.
1. Obtain informed consent

Before a study begins, researchers should consider whether the project will involve sensitive or personal data and how that data might be shared. A data management plan (DMP) helps in planning how sensitive data will be handled during and after the project. If data is to be shared, researchers must obtain informed consent for sharing from human participants. Participants should be able to decline data sharing specifically or, if they are uncomfortable with their data being shared, to withdraw from the study altogether.
2. Anonymise research data

Anonymisation involves transforming data to prevent the identification of individuals, either directly or indirectly, while preserving its usefulness for research. The following anonymisation techniques can be applied:
- Direct identifiers such as names, national ID numbers, and addresses should be removed or pseudonymised by replacing them with fictitious names or codes.
- Indirect identifiers, such as age, sex, educational level and occupation, do not reveal identity on their own but can lead to identification when combined. Even genetic data can be revealing: researchers have inferred the surnames of study participants by analysing Y-chromosome sequences (Gymrek et al., 2013). To reduce identifiability, several techniques can be applied, such as banding and aggregation, which group continuous values into broader bands, and generalisation, which replaces specific details in text responses with general categories.
- Data-specific methods can also be employed, such as blurring features in visual data, applying voice distortion to audio recordings, or using statistical disclosure controls for quantitative datasets.
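As a rough illustration of the first two techniques, the Python sketch below pseudonymises a name with a keyed hash, drops the address, bands exact ages into ten-year groups, and then counts the rarest combination of indirect identifiers. The field names, sample records, secret key and helper functions are all hypothetical and chosen only for this example; a real project would follow its ethics approval and a guide such as the UK Data Service's.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key: it must be stored separately from the shared data,
# because anyone holding it can re-derive the codes from known names.
SECRET_KEY = b"replace-with-a-project-secret"


def pseudonymise(name, key=SECRET_KEY):
    """Replace a direct identifier with a stable code from a keyed hash."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:8]


def band_age(age, width=10):
    """Group an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(record):
    """Return a new record: name pseudonymised, age banded, address dropped."""
    return {
        "participant": pseudonymise(record["name"]),
        "age_band": band_age(record["age"]),
        "sex": record["sex"],
        "occupation": record["occupation"],
    }


def smallest_group(records, keys):
    """Size of the rarest combination of the given indirect identifiers."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Made-up sample records, for illustration only.
raw = [
    {"name": "Alice Chan", "age": 34, "sex": "F", "occupation": "nurse",
     "address": "1 Example Rd"},
    {"name": "Bob Lee", "age": 37, "sex": "M", "occupation": "teacher",
     "address": "2 Example Rd"},
    {"name": "Carol Wong", "age": 31, "sex": "F", "occupation": "nurse",
     "address": "3 Example Rd"},
]

shared = [anonymise(r) for r in raw]
k = smallest_group(shared, ("age_band", "sex", "occupation"))
print(shared)
print("smallest group size:", k)
```

A smallest group size of 1 means at least one participant is unique on the chosen indirect identifiers, which suggests that further banding or generalisation may be needed before sharing. Note also that a keyed hash yields pseudonymised rather than fully anonymised data, since the codes can be regenerated by anyone who holds the key.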
Extended reading: Anonymisation step-by-step — UK Data Service
3. Apply restricted access

Full anonymisation may not always be feasible, or it may make the dataset unusable if too much information is removed. In such cases, provided that researchers have obtained participants’ consent, the data can be deposited in repositories with restricted access. Rather than sharing sensitive data files openly, a metadata record describing the dataset is made publicly available. When the research is published in academic journals, the data availability statement should specify where the data is stored and the conditions under which it can be accessed.
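To make the idea of a public metadata record more concrete, here is a minimal Python sketch. The field names, the placeholder values and the `data_availability_statement` helper are assumptions made for illustration; they do not follow any particular repository's schema, and a real deposit would use the fields the chosen repository requires.

```python
import json

# Hypothetical public metadata record for a restricted dataset.
# It describes the data but contains no participant-level files.
metadata = {
    "title": "Example interview study dataset",
    "description": "De-identified interview transcripts (placeholder text).",
    "repository": "Example Institutional Repository",
    "identifier": "doi:10.0000/placeholder",  # placeholder, not a real DOI
    "access_level": "restricted",
    "access_conditions": "Requests are reviewed by the data custodian; "
                         "a signed data use agreement is required.",
}


def data_availability_statement(record):
    """Build a statement saying where the data is held and how to request it."""
    return (
        f"The data supporting this study are held in {record['repository']} "
        f"under {record['access_level']} access ({record['identifier']}). "
        f"{record['access_conditions']}"
    )


print(json.dumps(metadata, indent=2))
print(data_availability_statement(metadata))
```

Only the descriptive record is published; the sensitive files themselves stay behind the repository's access controls.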
The dilemma of open research data can be summarised by the principle:
“As open as possible, as closed as necessary”.
(European Commission, 2017)
While publishing data openly is encouraged to enhance transparency and facilitate reuse, it is not always appropriate due to ethical constraints. If researchers wish to share research data containing confidential information, they must take proper precautions such as obtaining informed consent, anonymising personal identifiers, or applying restricted access at data repositories to protect privacy and comply with regulations.
References
Borgman, C. L. (2012). The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6), 1059-1078. https://doi.org/10.1002/asi.22634
Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLoS ONE, 15(4), e0230416. https://doi.org/10.1371/journal.pone.0230416
European Commission. (2017). H2020 Programme Guidelines on FAIR Data Management in Horizon 2020.
Gymrek, M., McGuire, A. L., Golan, D., Halperin, E., & Erlich, Y. (2013). Identifying personal genomes by surname inference. Science, 339(6117), 321-324. https://doi.org/10.1126/science.1229566
National Research Council. (1997). Bits of power: Issues in global access to scientific data. National Academy Press.
Office of the Privacy Commissioner for Personal Data. (2022). Personal Data (Privacy) Ordinance (Cap. 486). Hong Kong: Office of the Privacy Commissioner for Personal Data (PCPD).