Data management (archiving and sharing)
When your research is complete and you are publishing your research output, archiving your data also comes into play. Archiving ensures that your research can be verified and reproduced. You also archive data for future reuse, for example in further research or for educational purposes. In addition, the data cannot remain in Research Drive, the storage system used during the research. The Hague University of Applied Sciences advises researchers to archive their data in the DANS Easy data archive after completing their research. DANS is the national expertise centre for research data in the Netherlands.
As Open as Possible, as Closed as Necessary
The Hague University of Applied Sciences is a member of, and shares the ambitions of, the Netherlands Association of Universities of Applied Sciences, which has signed the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. In the field of open science, one of the aims is 'the careful preservation of research data and the accessibility of research data according to the FAIR principle' (see below). The research data will then be available to colleagues within their own university of applied sciences, to colleagues at other universities of applied sciences, to researchers affiliated with knowledge institutions (universities, institutes for applied research) and to partners from professional practice. The guiding principle is 'as open as possible, as closed as necessary'.
The Dutch code of conduct for scientific integrity also requires researchers to store both raw and processed data carefully for a period appropriate to the discipline and methodology, and to make the data as publicly available as possible. Subsidy providers and publishers follow suit. Subsidy providers ask you to make an effort to make the data freely available to other researchers after the research is completed. When you want to publish in a journal, the publisher may ask you, for example, to publish your data on the publisher's website, to store it in a data archive, or to make your contact details available so that the data can be requested.
Applying open science in data management means opening up data according to the FAIR principles (Findable, Accessible, Interoperable, Reusable).
Data are findable if:
- The data set is provided with a persistent identifier
- The data set has a recommended data citation
- The data set is also linked to the persistent identifier of the author(s) (ORCID, ISNI)
- The data set is provided with complete metadata according to the appropriate standards
- The data set is linked to the publication based on it
Data are accessible if:
- The data set has been included in and made available through a reliable data archive
- The metadata is publicly accessible, even if the data set itself is not
- Access conditions (access protocols, contact details, embargo) are clearly stated
Data are interoperable if:
- The (meta)data have been made accessible via an API (Application Programming Interface)
- The (meta)data contain correct terms and relevant vocabulary according to the standards of your profession
- The (meta)data are of the highest quality
- The data are available in open standardised data formats
If you have already applied good data management during your research, transferring your data to an archive need not be complicated. You make the final decisions about which data should or can be kept, and you check whether your data meet the requirements of the archive.
The file format in which data is stored is of great importance for long-term access to research data. DANS Easy, the preferred data archive for researchers of The Hague University of Applied Sciences, works with preferred formats for different types of research data. DANS accepts deposits in these preferred formats without question, so please read its table of preferred formats carefully.
As a general guideline, DANS states that the file formats best suited for sustainability and long-term accessibility are those that:
- are widely used;
- have open specifications;
- are independent of specific software, developers or suppliers.
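Before depositing a large project folder, a short script can help you spot files that are not yet in a preferred format. The sketch below is illustrative only: the extensions in `PREFERRED_EXTENSIONS` are a hypothetical example set, not the official DANS table, and the function name is likewise invented for this sketch. Replace the set with the formats listed in the DANS table of preferred formats before relying on the result.

```python
from pathlib import Path

# Hypothetical example set: replace with the extensions from the current
# DANS table of preferred formats before relying on the result.
PREFERRED_EXTENSIONS = {".csv", ".txt", ".xml", ".pdf", ".tif", ".wav"}


def non_preferred_files(folder: str) -> list[Path]:
    """Return files under `folder` (recursively) whose extension is not preferred."""
    flagged = []
    for path in sorted(Path(folder).rglob("*")):
        # Compare extensions case-insensitively, so "DATA.CSV" counts as ".csv".
        if path.is_file() and path.suffix.lower() not in PREFERRED_EXTENSIONS:
            flagged.append(path)
    return flagged
```

For example, `non_preferred_files("my_project")` lists the files you may want to convert (say, from a spreadsheet format to CSV) or document before deposit. The script only checks extensions; it cannot tell whether a file's content is actually valid in that format.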
Which data are suitable for archiving?
To decide on this, first consider the following general criteria:
- Research results that have a high social impact must always be verifiable (e.g. clinical trials)
- If applicable, observe the requirements, stipulations or conditions of your subsidy provider and publisher
- Archive data with a high potential reuse value
- Consider the scientific, cultural or historical significance of data. Data that are valuable for scientific-historical research, for example, are eligible for archiving
- Data whose value lies in how difficult they would be to recreate also qualify for archiving: in that case, keeping the data costs less than creating them again
- Look at the usability of the data: data format, sufficient documentation and metadata, clarity of ownership
Then you can make a detailed choice with the help of the following points:
- As a rule, archive as much of the raw data as possible. There can be reasons to share processed data instead, for example if your research is intended to demonstrate a new method. Archiving interviews in audio or video format is discouraged, because such data are difficult to anonymise
- In simulations, it is better to archive the data used for the simulation (instead of the data resulting from the simulation)
- In the case of experiments, it is always wise to archive all the data necessary to repeat your experiment
- Archive the text of your consent form and the accompanying information sheets. You may also need to archive the signed consent forms, as these are important for understanding how the data may be used in the future. Bear in mind, however, that these are personal data. Under the General Data Protection Regulation (GDPR), personal data should never be kept longer than necessary. You should also be extremely careful when sharing data that contain personal information
- When archiving completed questionnaires, it is not necessary to archive the blank questionnaires as well. But if you cannot share the answers because the data are sensitive, archive the blank questionnaires so that these can be shared
- Raw data from interviews containing personal data and sensitive information should be destroyed immediately after they have been anonymised
- Software and code are important to archive so that you can repeat the simulations yourself and so that other researchers can validate and further develop your code
- It is usually not necessary to archive intermediate data or auxiliary data. The final data, on the other hand, are important when they form the basis of your results. These data are crucial for the integrity and verification of your results
- Archive data from an external party only when the licence, or the conditions under which the data were licensed to you, allow you to archive them. If not, document the data properly and archive this documentation. If the external party has archived the data themselves, you can refer to those data
For the data that you do not archive, you must take follow-up action, such as deleting them carefully. Pay special attention to sensitive data. When deleting data, you must prevent them from being restored and ensure that they are deleted from all your storage locations. The most reliable way of destroying data is to render the storage medium completely physically unusable. For device-independent storage, or if you want to keep using your device, the files must be overwritten to make them inaccessible. This is called data erasure, data clearing, data wiping or data destruction. You can do this with software such as CCleaner.
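To show what "overwriting" means in practice, here is a minimal sketch that overwrites a single file's contents with random bytes before deleting it. The function name and parameters are hypothetical, and this is not a substitute for dedicated wiping software: on SSDs, on copy-on-write or journaling file systems, and in synced or backed-up folders, earlier copies of the data may survive an in-place overwrite.

```python
import os
import secrets


def overwrite_and_delete(path: str, passes: int = 1, chunk_size: int = 1024 * 1024) -> None:
    """Overwrite a file's contents with random bytes, then delete it.

    Caveat: on SSDs, copy-on-write or journaling file systems, and in synced
    or backed-up folders, earlier copies of the data may survive. Treat this
    as an illustration, not a certified erasure method.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            # Write random bytes over the full original length, chunk by chunk.
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(secrets.token_bytes(n))
                remaining -= n
            f.flush()
            # Ask the operating system to push the overwritten data to the device.
            os.fsync(f.fileno())
    os.remove(path)
```

Remember that the same data may also exist in other storage locations (backups, cloud copies, colleagues' laptops); each of those needs its own follow-up.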
The retention period of data depends on the field of study and its developments, the costs of storage and access, and the expected (re)use. Data sets that are considered heritage, such as the results of archaeological research, are generally preserved indefinitely. In some cases, the law stipulates how long data must be kept. The General Data Protection Regulation (GDPR) does not specify a concrete retention period for personal data, but it does state that such data may not be kept longer than necessary. According to the Dutch Code of Conduct for Scientific Practice, the minimum retention period for raw research data in the Netherlands is ten years. DANS therefore recommends treating the end of this period as the moment to reconsider whether research data should be retained or destroyed.
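As a planning aid, the ten-year minimum can be turned into a concrete review date to record in your data management plan. The sketch below assumes the period is counted from the date of deposit (an assumption: your plan may count from another moment, such as the end of the project), and the function name is hypothetical.

```python
from datetime import date


def retention_review_date(deposit_date: date, years: int = 10) -> date:
    """Return the date `years` after deposit, when retention should be reconsidered.

    Assumes the period runs from the deposit date. A deposit on 29 February
    is moved to 28 February when the target year is not a leap year.
    """
    try:
        return deposit_date.replace(year=deposit_date.year + years)
    except ValueError:
        # Only 29 February can fail here; fall back to 28 February.
        return deposit_date.replace(year=deposit_date.year + years, day=28)
```

For example, a data set deposited on 15 March 2024 would be due for review on 15 March 2034.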
By describing your data and providing accompanying metadata (data about data: characteristics or properties of the data), you make your data findable. There are different types of metadata, and metadata are classified at both file level and data level. Read more about the types and levels in this guide [Link to file Guide folder structure and data documentation.docx]. Using a metadata standard also ensures interoperability between systems, which makes your data more widely accessible.
DANS Easy, the preferred data archive for researchers of The Hague University of Applied Sciences, uses the Dublin Core metadata standard, a very common, general-purpose international metadata standard. DANS provides an overview and explanation of the metadata fields offered by DANS Easy. The more fields you fill in, the more findable the data become. The metadata are public, so the fields should contain only the personal data needed to account for the data set (such as the names of its creators) and never personal data of research subjects.
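To show what Dublin Core metadata look like outside a web form, the sketch below serialises a few fields as XML using the standard Dublin Core element namespace (`http://purl.org/dc/elements/1.1/`) and rejects names that are not among its fifteen elements. The `<metadata>` wrapper element, the function name and the example values are hypothetical, and this is not presented as DANS Easy's own deposit format; in practice you fill in the fields in the archive's deposit form.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialise Dublin Core fields to an XML string.

    The <metadata> wrapper element is a hypothetical choice for this sketch.
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            # Each value becomes its own element, e.g. <dc:creator>.
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `dublin_core_record({"title": ["Survey data 2023"], "creator": ["Doe, J."]})` produces a `<metadata>` element containing `<dc:title>` and `<dc:creator>`. Note that the creator is the data set's author, not a research subject.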
In addition to general metadata standards such as Dublin Core, there are also domain-specific metadata standards. These are metadata fields that relate to, for example, numerical data (social sciences), material objects and their visualisations (archaeology), primary biodiversity data (biology) or tools to capture data (engineering).
Domain-specific data archives naturally use domain-specific metadata standards, but domain-specific metadata can also serve as a supplement to a general metadata standard (used by a general data archive). For example, DANS Easy contains specific fields for archaeology data that refer to the Archaeological Basis Register.
There are many different domain-specific metadata standards, depending on the research community, the purpose and the function within the domain. The UK's Digital Curation Centre provides a good overview. The Research Data Alliance (RDA), which the European Commission helped establish, also maintains a list.
The Hague University of Applied Sciences advises researchers to archive their data in the DANS Easy data archive after completing their research. DANS Easy has published a manual with points of attention for depositing data in its archive; please read it carefully. When you deposit your data, you enter into an agreement with the archive:
- DANS is granted the right to include the data set in its archive and to make it available under the conditions indicated.
- The agreement is a "non-exclusive" licence. This means that the owner of the data set remains free to deposit it and/or make it available elsewhere.
- You declare that you are the rightful claimant, or that you have permission from possible rightful claimants to deposit and make available the data set. Think of copyright, database right or patent right.
- You do not waive database rights or any copyrights, unless you choose to place the data set in the public domain.
DANS Easy is a general data archive but has delivery specifications for certain disciplines, including the social and behavioural sciences. You can also choose a domain-specific data archive within your field, if one is available. The advantage of a discipline-specific data archive is that its options are even more closely tailored to the research community concerned, and the data can be described more richly using discipline-specific metadata standards. The re3data.org website offers an overview of general and disciplinary data archives worldwide. You can filter by subject, specialisation or country. You can also search for data archives that hold a certification seal. Such a seal indicates that independent parties consider the archive a Trusted Digital Repository, and that the research data deposited there can still be found and shared in the future. A data archive with a seal such as the CoreTrustSeal prioritises long-term access, security, findability and standardisation.
Publicly sharing the data of completed research (or completed parts of it) supports the transparency and openness of research. It helps you meet the requirements of subsidy providers and publishers and respect codes of conduct and declarations. It also increases the impact of your research within and beyond your specialisation and raises your visibility as a researcher.
In essence, four things are important when sharing data: (1) place the reusable data in a data archive that uses (2) a metadata standard and (3) a persistent identifier and (4) licence the data. (1) is necessary for accessibility, (2) is necessary for exchangeability, (3) is necessary for (re)findability and citability and (4) is necessary to make actual reuse possible (and can also be used for exchangeability).
DANS Easy, the data archive recommended by The Hague University of Applied Sciences, offers all four. Metadata standards are discussed above in the Data Archiving section. Below, we look at licensing and citation, but first we discuss the reasons for restricting data sharing.
Sharing research data is not an all or nothing choice. It ranges from making data completely open on the one hand to keeping it completely closed on the other, with various possible forms of restricted/controlled access in between.
Open research data are data that 'can be freely used, modified and shared by anyone for any purpose' (opendefinition.org). Closed research data are data that are temporarily embargoed or cannot be shared at all. Restricted/controlled research data are data that are not shared in a fully open manner, but made available under more restricted access and use conditions. This means that there are limits to who can access and use the data, how and/or for what purpose. Access to data can be restricted in various ways:
- First, access can be limited to users who log in or authenticate through a particular institution or organisation, or who hold a membership.
- You can also work with a Data Use Agreement between you, as the provider of the data, and those who want to reuse your data. In it, you agree on the conditions under which, and the ways in which, your data may be reused.
- A data archive such as DANS Easy also offers the possibility to (temporarily) embargo your data: during the embargo period, the description of the data set is often published, but the data itself is not available for reuse by others. If you want to configure an embargo on the data in DANS Easy (maximum of two years), you can do so in the field 'Date available'.
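If you plan an embargo, it can help to check the 'Date available' you intend to enter against the two-year maximum mentioned above. The sketch below assumes the two years are counted from the deposit date (an assumption; the archive's own form is authoritative), and the function names are hypothetical.

```python
from datetime import date

MAX_EMBARGO_YEARS = 2  # maximum embargo mentioned for DANS Easy


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def check_date_available(deposit: date, available: date) -> None:
    """Raise ValueError if `available` is before deposit or beyond the maximum embargo.

    Assumes the embargo period is counted from the deposit date.
    """
    if available < deposit:
        raise ValueError("'Date available' cannot be before the deposit date")
    latest = add_years(deposit, MAX_EMBARGO_YEARS)
    if available > latest:
        raise ValueError(f"embargo too long: latest allowed date is {latest.isoformat()}")
```

For example, for a deposit on 1 May 2024, `check_date_available` accepts 1 May 2026 and rejects 2 May 2026.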
At DANS Easy you can choose the access category 'Restricted Access'. Others can then request permission from you via the data archive to view and download your data, and they have to justify their request. Before granting access, you can impose additional conditions.
Whether you choose open, closed or restricted/controlled depends largely on what is appropriate given the nature of the data and ownership (whether you have the right permissions). Reasons to limit data sharing:
- The data constitute or contain personal data, i.e. any information relating to an identified or identifiable living individual (directly or indirectly). If possible, anonymise this data.
- You otherwise have a duty or have agreed to keep the data confidential (for example, by signing a confidentiality agreement or an agreement with a confidentiality clause).
- The data could potentially cause damage (e.g. to endangered species, vulnerable locations or groups, public health, national security, etc.) if made public.
- The data are not generated in the course of your own research project, but are provided by another party (e.g. commercial provider, government agency, etc.).
- Research data (or rather, the form in which they are expressed) may, under certain circumstances, be protected by copyright and/or database right.
- The research data may constitute a patentable invention or contain commercially valuable know-how. If they are shared (prematurely), this could jeopardise your valorisation efforts.
If you have a legitimate reason such as those described above, subsidy providers, institutions and reputable journals and publishers will make an exception to their data sharing conditions. You are, however, expected to provide an appropriate justification, for example in the data management plan or in a data accessibility statement included in your publication. A data accessibility statement is usually included in the 'Acknowledgments' section of your article. It indicates where and how the data on which the article is based can be consulted and, if the data cannot be made available, why not.
When publishing research data, it is important to let potential users know in advance what they are allowed to do with the data. Licences are an effective way of doing this. A good data archive will normally apply a licence to each data set it contains, and you can usually choose the licence when you deposit data. DANS Easy offers a list of licences, each linked to a website with more information about that licence.
Good practice is to apply a standard, open licence to open research data, as this ensures legal interoperability and the widest possible reuse. A family of standard licences widely used for research data is Creative Commons (CC):
- The CC Attribution Licence (CC BY) gives others maximum freedom to reuse the data (i.e. copy, redistribute, adapt), provided they give proper acknowledgement.
- The CC Attribution-ShareAlike (CC BY-SA) licence gives others the same freedom as CC BY, but requires derivative works (based on your data) to be redistributed under the same licence.
- You could use the CC Attribution-NonCommercial (CC BY-NC) licence when applying for a patent or otherwise commercialising your research. In research at The Hague University of Applied Sciences, however, which is publicly funded, commercially exploiting your own data is uncommon. Note also that CC BY-NC can hardly be considered an open licence.
- The CC Attribution-NoDerivatives (CC BY-ND) licence allows others to use the data and share it as is, but not to modify or transform it in any way. For data, this is tantamount to 'All rights reserved': others can do little more than verify results already derived from the data. This, too, can hardly be considered an open licence.
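The CC licences above are built from a few combinable conditions: attribution (BY, always present), NonCommercial (NC), NoDerivatives (ND) and ShareAlike (SA). The sketch below maps those choices to a licence code to make the combinations explicit; the function name is hypothetical, it covers only the six attribution-based CC licences (not CC0), and it is no substitute for reading the licence texts or using a selection tool.

```python
def cc_licence(allow_commercial: bool = True, allow_derivatives: bool = True,
               share_alike: bool = False) -> str:
    """Return the Creative Commons licence code matching the chosen conditions."""
    if share_alike and not allow_derivatives:
        # ShareAlike governs adaptations, so it cannot be combined with NoDerivatives.
        raise ValueError("ShareAlike requires that derivatives are allowed")
    parts = ["CC BY"]  # every licence in this family requires attribution
    if not allow_commercial:
        parts.append("NC")
    if not allow_derivatives:
        parts.append("ND")
    elif share_alike:
        parts.append("SA")
    return "-".join(parts)
```

With the default arguments the function returns "CC BY", the most open option discussed above; `cc_licence(share_alike=True)` returns "CC BY-SA".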
The CC licences are general-purpose licences: well suited to data, but also to publications and other works. There are also licences designed specifically for data: the Open Data Commons licences, which come in three variants:
- Public Domain Dedication and Licence (PDDL)
- Attribution Licence (ODC-By)
- Open Database Licence (ODC-ODbL)
If your data set contains software, you can use an open source licence for it.
Need help selecting a suitable standard licence? Check out this EUDAT licence selection tool.
There are various ways of increasing the visibility and accessibility of your data. When you create research output such as a poster, paper or other publication, you can add your data as supplementary material. An enhanced publication is an online publication in which an article is accompanied by, for example, (links to) research data, illustrations, visualisations, internet sources and comments.
Have you created or collected a special data set? Or is the methodology used innovative and worthy of more extensive discussion than just a short paragraph in an article? Then you might consider publishing an article in a data journal. This is valuable for the following reasons:
- Together with your article, your data set is peer-reviewed and thus receives scientific accreditation for reuse.
- With the article about your data set, you make your methodology and results even more transparent.
- Publication in a data journal provides another accessible route to your data set, which increases its visibility.
Publishing (publicly sharing) data sets increasingly counts as a citable contribution to a researcher's track record. Data citation is part of the altmetrics (alternative metrics) movement, which holds that the impact of your research is determined by (references to) a wide range of research outputs such as data sets, software, blog posts and presentations.
Data citation:
- makes data easier to find;
- promotes reproducibility;
- promotes the reuse of data;
- makes it possible to track the impact of the research data;
- creates a publication structure that allows for long-term availability of data;
- provides a structure within which the impact of the data can be traced back to the researchers who created it.
To be citable, a data set needs a persistent identifier (PID). When you publish data in a data archive such as DANS Easy, a PID is automatically assigned to the data set. The PID ensures that your data can always be found, even if their name or location changes: using a PID to retrieve data prevents broken links and 'page not found' errors. Digital Object Identifiers (DOIs) are widely accepted as the persistent identifier for data citation, and DANS Easy also uses DOIs.
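In practice, DOIs are written in several forms: with a `doi:` prefix, as a bare `10.xxxx/...` string, or as a resolver URL. The sketch below normalises them to the https://doi.org/ form used in the citation example. The function name is hypothetical, and the regular expression is a simplified check (an assumption), not the full DOI syntax.

```python
import re

# Simplified DOI pattern: "10." + a numeric registrant code + "/" + a suffix.
# Real DOI syntax is broader; this is a pragmatic check for illustration.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
PREFIXES = ("https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_to_url(raw: str) -> str:
    """Normalise a DOI string to its https://doi.org/ resolver URL."""
    doi = raw.strip()
    for prefix in PREFIXES:
        # Match prefixes case-insensitively, e.g. "DOI:" as well as "doi:".
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return "https://doi.org/" + doi
```

For example, `doi_to_url("doi:10.17026/dans-xns-un6c")` returns `https://doi.org/10.17026/dans-xns-un6c`.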
An example of a standard data citation to a data set in DANS Easy:
Coenen, M.J.H. (Radboud University) (1) (2022) (2): Data from: Genome-wide association study of nociceptive musculoskeletal pain treatment response in UK Biobank. (3) DANS. (4) https://doi.org/10.17026/dans-xns-un6c (5)
- (1) Author: the person(s) and/or organisation that created the data set.
- (2) Date: the year or exact date the data set was published.
- (3) Title: the name given to the data set or the name of the research project.
- (4) Publisher: the archive responsible for making the data set available.
- (5) Online location: DOI or other persistent identifier.
Other possible elements in data citation are:
- Editor: person or persons (other than the author) responsible for the compilation, editing and correction of the data set.
- The format of the data files.
- Version, if more than one version of the data set has been deposited.
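If you cite many data sets, it can help to assemble citations from their elements in a fixed order. The sketch below follows the order of the DANS Easy example citation. The class name and the placement of the optional version and file format (appended to the title) are hypothetical conventions of this sketch, and the editor element is not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    author: str                        # (1) creator(s) of the data set
    year: str                          # (2) year or exact publication date
    title: str                         # (3) name of the data set
    publisher: str                     # (4) archive that makes the data available
    doi_url: str                       # (5) persistent identifier as a URL
    version: Optional[str] = None      # optional: only if several versions exist
    file_format: Optional[str] = None  # optional: format of the data files

    def format(self) -> str:
        """Assemble the elements in the order of the DANS Easy example."""
        title = self.title
        # Hypothetical convention: optional elements follow the title.
        if self.version:
            title += f" (version {self.version})"
        if self.file_format:
            title += f" [{self.file_format}]"
        return f"{self.author} ({self.year}): {title}. {self.publisher}. {self.doi_url}"
```

Filled with the elements of the example above, `format()` reproduces that citation without the numbered markers.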
When cooperating with other institutions or organisations, it will be necessary to examine together which institutions archive which data and where, and whether and how the sharing of data is facilitated. These agreements must be included in the (joint) data management plan but also laid down in writing in a consortium agreement. Periodically check that all parties continue to observe the procedures that have been agreed upon.
Support by a Data Steward
Researchers can receive support in research data management. The research data steward(s) of THUAS can be contacted at email@example.com.