Selecting a Repository for Data Sharing    

— by Christina Wong

Sharing research data is an increasingly common practice in academia: it supports transparency and reproducibility, preserves data, increases research visibility, and meets the requirements of publishers and funders. Once you have a Data Management Plan in place for managing your research data, the next question is: in which repository should you share the data?

Functions and Features  

As suggested by Wilkinson et al. (2016), it is important to make research data Findable, Accessible, Interoperable, and Reusable (FAIR). Apart from guiding research data management, the FAIR Principles can also be used as a standard for assessing the quality and suitability of a repository for storing and sharing research data. Kim (2018) further elaborated that researchers should use repositories with essential features such as data security and access-level settings, edit log retrieval, automation support, and compatibility with various file formats and application languages. Research grant funders and publishers may also have their own requirements for the repositories used to store and share data.
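As a rough illustration, the feature list above can be treated as a checklist when comparing candidate repositories. The sketch below is only one way to do that: the feature keys paraphrase the list attributed to Kim (2018), and the function name and example repository are hypothetical.

```python
# Checklist keys paraphrase the features attributed to Kim (2018) above.
# The key names are illustrative assumptions, not an official vocabulary.
ESSENTIAL_FEATURES = (
    "data_security_and_access_levels",
    "edit_log_retrieval",
    "automation_support",
    "multiple_file_formats",
)


def missing_features(repository_features):
    """Return the essential features a repository lacks, in checklist order."""
    offered = set(repository_features)
    return [feature for feature in ESSENTIAL_FEATURES if feature not in offered]


# Hypothetical repository that offers only two of the four features.
example_repository = {"data_security_and_access_levels", "multiple_file_formats"}
print(missing_features(example_repository))
```

An empty result means the repository covers every item on this checklist; funder and publisher requirements would still need to be checked separately.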

Target Audience  

Once those requirements are met, you can consider your target audience:

  • To increase research visibility within your own subject area, a disciplinary repository could be the best choice.
  • To increase research visibility within your own organization and make good use of its resources, an institutional repository could be ideal.
  • To deposit multidisciplinary research data in different formats, you may select a generalist repository.
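The three options above amount to a simple lookup from your main goal to a repository type. The following is a minimal sketch of that mapping; the goal labels and the function name are made up for illustration, and real decisions will often weigh more than one goal.

```python
# Maps a researcher's main goal to the repository type suggested in the list above.
# The goal labels are hypothetical names chosen for this sketch.
SUGGESTIONS = {
    "visibility_in_subject_area": "disciplinary repository",
    "visibility_in_organization": "institutional repository",
    "multidisciplinary_data": "generalist repository",
}


def suggest_repository_type(goal, requirements_met=True):
    """Suggest a repository type once funder and publisher requirements are met."""
    if not requirements_met:
        raise ValueError("Check funder and publisher requirements first.")
    if goal not in SUGGESTIONS:
        raise ValueError(f"Unknown goal: {goal!r}")
    return SUGGESTIONS[goal]


print(suggest_repository_type("visibility_in_subject_area"))
```

The `requirements_met` flag reflects the ordering in the text: the audience question only comes after functional, funder, and publisher requirements have been satisfied.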

List of Data Repositories  

Re3data.org (the Registry of Research Data Repositories) is a curated directory that supports the selection of a data repository. Each entry summarizes key features, including the repository URL, subject areas, data management policies, scale, supported languages, and the responsible organization with contact details. It helps you assess which repository satisfies your needs and navigate to that repository for further details.

Disciplinary Repository

Disciplinary repositories focus on specific subject areas and are typically compatible with the data types used in the corresponding fields. They are therefore ideal places to publish your dataset and promote the visibility of your research within the subject. Below are some examples of disciplinary repositories and their respective subject areas.

Subject Area   Repository   Description
Clinical Science   Vivli   Clinical data
Biochemical Science   BioGRID   Protein and gene data
Engineering   The Materials Project   Data and models of inorganic and crystalline materials
Environmental Sciences   PANGAEA   Georeferenced observational and experimental data
Social Science   Inter-university Consortium for Political and Social Research   Quantitative statistics on social sciences
Mathematics   The SuiteSparse Matrix Collection   Sparse matrices

Institutional Repository: HKU DataHub   

Darragh et al. (2024) highlighted the benefits of institutional repositories, such as lower cost for data preservation, sustainable long-term storage, capacity for large datasets, and strong reputations with publishers. HKU DataHub is the institutional data repository at the University of Hong Kong (HKU). With features such as issuing a unique Digital Object Identifier (DOI) to each dataset and sharing via private links for confidential peer review, HKU DataHub is an option for HKU researchers to meet the requirements of funders and journal publishers for the review and sharing of data.

If you have deposited your research data in an external repository, you may wish to create an item record in HKU DataHub and link it to the URL of your external record. This may help promote the visibility of your research within the HKU community. For more details, see HKU DataHub: The Guide.

Generalist Repository

Generalist repositories accept multidisciplinary research data in a wide variety of file formats. Figshare, Dryad, and Zenodo are examples of generalist repositories.

To conclude, when choosing a data repository for your research data, first ensure that the repository's data management functions align with the FAIR Principles. Building on this foundation, you may then select the repository that best suits your specific needs. A well-chosen repository not only helps you comply with publisher requirements but also enhances the visibility, accessibility, and reusability of your research data, ultimately maximizing its impact and value.

Further Reading  

References

Darragh, J., Narlock, M. R., Burns, H., Cerda, P. A., Cowles, W., Delserone, L., Erickson, S., Herndon, J., Imker, H., Johnston, R., Lake, S., Lenard, M., Mohr, A. H., Moore, J., Petters, J., Pullen, B., Taylor, S., & Wham, B. (2024). Institutional data repositories are vital. Science, 385(6714), 1174. https://hdl.handle.net/11299/265639   

Kim, S. (2018). Functional Requirements for Research Data Repositories. International Journal of Knowledge Content Development & Technology, 8(1), 25-36. https://doi.org/10.5865/IJKCT.2018.8.1.025  

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., Bonino da Silva Santos, L., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Evelo, C. T. R., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., ’t Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., & Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
