Data Management Best Practices

The following best practices should be considered before starting a data project. They provide guidance for managing data in the post-active research stage of the Research Data Lifecycle.

Data Storage

To prevent data from being lost to incompatibility, store it in formats and on hardware that follow open standards rather than proprietary ones.

Data Documentation

In your documentation, use metadata to record details about the data collection process (e.g., a study), such as:

  • its context

  • the dates of data collections

  • data collection methods, etc.
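As a minimal sketch of what such a record might look like in practice, the snippet below collects the three details listed above into a small JSON document that could be stored alongside the data files. The function name, field names, and sample values are hypothetical and chosen only for illustration; a real project would follow the schema required by its repository.

```python
import json
from datetime import date


def build_metadata_record(context, collection_dates, methods):
    """Return a plain dict describing how a dataset was collected.

    The field names are illustrative assumptions, not a formal schema.
    """
    return {
        "context": context,
        # ISO 8601 strings keep dates unambiguous and sortable.
        "collection_dates": [d.isoformat() for d in collection_dates],
        "collection_methods": list(methods),
    }


# Hypothetical example values.
record = build_metadata_record(
    context="Hypothetical survey of commuting habits",
    collection_dates=[date(2023, 3, 1), date(2023, 3, 15)],
    methods=["online questionnaire", "follow-up phone interview"],
)

# Serialize to JSON text; this could be saved next to the data files.
metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

Plain JSON is used here because it is an open, text-based format that stays readable without special software.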

Sharing

Sharing data makes it possible for researchers to validate research results and to reuse data for teaching and further research. Sharing is also required by an increasing number of funders and publishers. Funders seek to maximize the impact of the research they fund by encouraging or requiring data sharing.

Depositing data in an established repository helps to ensure that they remain consistently available and accessible, and are preserved for future use. The choice of a data repository depends on various factors, such as discipline, accepted data formats, and data sharing policies. You can obtain assistance from Data Services to identify a repository in which to publish your research data.

| Type of Data | Recommended Formats | Acceptable Formats |
| --- | --- | --- |
| Plain Text | txt, PDF/A, xml | docx, doc, rtf |
| Tabular Text | csv, tsv | xlsx, xls, sav, dta |
| Image | tiff, JPEG 2000 | jpg, psd, png, gif, bmp |
| Audio | wave, aiff | mp3, wma, aac, ogg |
| Archiving | zip | rar |
| Video | Motion JPEG 2000, mov, avi | mpeg-4 |
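The format table above can be turned into a quick check on a project's files. The sketch below is one hypothetical way to do that: the function name and the "unlisted" label are assumptions, and only entries that map cleanly onto a single file extension are included (PDF/A, JPEG 2000, wave, Motion JPEG 2000, and mpeg-4 are left out because an extension alone does not identify them reliably).

```python
from pathlib import Path

# Extensions transcribed from the table above (lowercase, without the dot).
# Entries that do not map onto one unambiguous extension are omitted.
RECOMMENDED = {"txt", "xml", "csv", "tsv", "tiff", "aiff", "zip", "mov", "avi"}
ACCEPTABLE = {
    "docx", "doc", "rtf", "xlsx", "xls", "sav", "dta",
    "jpg", "psd", "png", "gif", "bmp",
    "mp3", "wma", "aac", "ogg", "rar",
}


def classify_format(filename):
    """Classify a file by its final extension.

    Returns "recommended", "acceptable", or "unlisted".
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in RECOMMENDED:
        return "recommended"
    if ext in ACCEPTABLE:
        return "acceptable"
    return "unlisted"


for name in ["survey.CSV", "survey.xlsx", "notes.md"]:
    print(name, "->", classify_format(name))
```

A check like this only inspects file names, so it is a first pass at best; it does not verify that a file's contents actually conform to the format its extension suggests.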

| Subject/Discipline | Example Archive/Repository |
| --- | --- |
| Ecology | Dryad |
| DNA Sequences | GenBank |
| Chemistry | Cambridge Crystallographic Data Centre |
| Social Sciences | ICPSR |


Research Data Lifecycle

Research data has a "life cycle" that describes and identifies the steps to be taken at the different stages of the research cycle to ensure successful data curation and preservation. The research data lifecycle can be divided into two main parts: the Active Research Stage and the Post-Active Research Stage.

During the active research stage, research activities mainly include data planning, acquisition, and analysis. During the post-active research stage, the focus is on long-term data preservation, sharing, and re-use (see also Data Management Best Practices).

Active Research Stage

Planning - The stage in which it is determined how data will be managed. Typical considerations include:

  • The type and format of data that will be used.

  • Whether any collected data will involve human subjects.

  • Where the data will be stored and whether it will be re-used or shared at the end of the project.

Acquire (or "Find") - The stage in which data is found or collected. A few steps can help you develop your approach:

  • Define your topic as specifically as possible. For example:

    • What is the average SAT score by race for the last 10 years?

  • Identify the unit of analysis, meaning what you will specifically be analyzing and by what measure. For example:

    • Geographic unit, e.g., local, national, international

    • Frequency, e.g., annual, quarterly, daily

    • Unit of analysis, e.g., individual, institution

    • Time series, e.g., cross-sectional, longitudinal (or panel)

  • Identify data sources. For example:

    • Government agencies, e.g., census

    • Organization, e.g., International Monetary Fund (IMF)

    • Commercial Subscription Services, e.g., Inter-University Consortium for Political and Social Research (ICPSR), Statista

Collaborate and Analyze - The stage of your (and your collaborators') active use of the research data.

  • What data processing tool(s) are you using? e.g., Excel, Stata, SPSS, Python, R

  • What kind of data are you working on? e.g., numerical, categorical, text

  • What kind of data tasks are you performing? e.g., data cleaning, descriptive statistics

  • Are you working in a team and is there a designated project manager?

  • Are you looking for a web-based tool for working on your data?

Post-Active Research Stage

Store and Preserve - The planning stage for how the data will be archived for long-term preservation. Considerations include:

  • What archive/repository/database have you identified as a place to deposit data? e.g., Dataverse

  • How long will data be kept beyond the life of the project?

  • What metadata schema will you use? Established domain-specific repositories will usually only accept data that meet their standards for file formats, documentation and metadata, e.g., Dublin Core

Share (or Publish) - The stage in which data is shared (or re-used) after a project. Some considerations include:

  • Through what resources/platforms the data will be made available, e.g., a server or data repository

  • When the data will be made available, e.g., immediately or after a 12-month embargo

  • If the dataset was collected by the researchers, how it will be licensed to others, e.g., a Creative Commons license

Discovery and Re-Use - This stage involves facilitating data sharing, that is, publicly sharing data from completed (parts of) research, and making data reusable outside your project or research team.

  • Whether any permission restrictions need to be placed on the data, e.g., non-commercial use

  • What the intended or foreseeable uses of the data are and who the users will be

The following video explains the data management activities that can take place at different stages of the research process.
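To make the metadata schema question from the Store and Preserve stage more concrete, here is a minimal sketch of a Dublin Core record built with Python's standard library. The element names (title, creator, date, format, rights) and the namespace URI come from the Dublin Core element set; the `metadata` wrapper element, the function name, and all sample values are hypothetical, and a given repository may expect a different wrapper or additional fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dublin_core(fields):
    """Build a wrapper element holding one dc:* child per (name, value) pair.

    The "metadata" wrapper is an assumption made for this sketch.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical dataset description.
record = build_dublin_core([
    ("title", "Hypothetical commuting survey dataset"),
    ("creator", "Example Research Team"),
    ("date", "2023-03-15"),
    ("format", "text/csv"),
    ("rights", "CC BY-NC 4.0"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The `rights` value shows where a licensing decision, such as a non-commercial Creative Commons license, could be recorded so that it travels with the data.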