Research data has a "lifecycle" that identifies and describes the steps to be taken at each stage of the research cycle to ensure successful data curation and preservation. The research data lifecycle can be divided into two main parts: the Active Research Stage and the Post-Active Research Stage.
During the active research stage, research activities mainly include data planning, acquisition, and analysis; during the post-active research stage, the focus is on long-term data preservation, sharing, and re-use (see also Data Management Best Practices).
Planning - The stage in which it is determined how data will be managed. Typical considerations include:
The type and format of data that will be used.
Whether any collected data will involve human subjects.
Where the data will be stored and whether it will be re-used or shared at the end of the project.
Acquire (or "Find") - The stage in which data is found or collected. There are a few steps that can help you develop your approach:
Define your topic as specifically as possible. For example:
What is the average SAT score by race for the last 10 years?
Identify the unit of analysis, meaning what you will specifically be analyzing and by what measure. For example:
Geographic unit, e.g., local, national, international
Frequency, e.g., annual, quarterly, daily
Unit of analysis, e.g., individual, institution
Time series, e.g., cross-sectional, longitudinal (or panel)
Identify data sources. For example:
Government agencies, e.g., census
Organization, e.g., International Monetary Fund (IMF)
Commercial Subscription Services, e.g., Inter-University Consortium for Political and Social Research (ICPSR), Statista
Collaborate and Analyze - The stage of your (and your collaborators') active use of the research data. Questions to consider include:
What data processing tool(s) are you using? e.g., Excel, Stata, SPSS, Python, R
What kind of data are you working on? e.g., numerical, categorical, text
What kind of data tasks are you performing? e.g., data cleaning, descriptive statistics.
Are you working in a team and is there a designated project manager?
Are you looking for a web-based tool for working on your data?
Store and Preserve - The planning stage for how the data will be archived for long-term preservation. Considerations include:
What archive/repository/database have you identified as a place to deposit data? e.g., Dataverse
How long will data be kept beyond the life of the project?
What metadata schema will you use, e.g., Dublin Core? Established domain-specific repositories will usually only accept data that meet their standards for file formats, documentation, and metadata.
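As an illustration of the metadata schema question above, the following is a minimal sketch of a Dublin Core record built with Python's standard library. The namespace URI is the published Dublin Core elements namespace; the function name `build_dublin_core`, the `metadata` wrapper element, and all field values are hypothetical and chosen only for the example. A real repository will define its own required elements and submission format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dublin_core(fields):
    """Return a <metadata> element with one dc:* child per (name, value) pair.

    The <metadata> wrapper is an assumption for this sketch; repositories
    may expect a different container element.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical values, for illustration only.
record = build_dublin_core([
    ("title", "Example survey dataset"),
    ("creator", "Doe, Jane"),
    ("date", "2024-01-15"),
    ("format", "text/csv"),
    ("rights", "CC BY 4.0"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the record as a list of (name, value) pairs allows an element such as `creator` to be repeated, which Dublin Core permits.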
Share (or Publish) - The stage in which data is shared (or re-used) after a project. Some considerations include:
Through what resources/platforms the data will be made available, e.g., a server or data repository
When the data will be made available, e.g., immediately or after a 12-month embargo
If the dataset was collected by the researchers, how it will be licensed to others, e.g., under a Creative Commons license
Discovery and Re-Use - The stage that involves facilitating data sharing, which refers to publicly sharing data from completed research (or completed parts of it), and making the data reusable outside your project or research team. Considerations include:
Whether any permission restrictions need to be placed on the data, e.g., non-commercial use
What the intended or foreseeable uses of the data are and who the users will be
The following video explains the data management activities that can take place at different stages of the research process.
The following are some best practices that should be considered before starting a data project and that provide guidance for managing data in the post-active research stage of the research data lifecycle.
To prevent data from being lost to incompatibility, store it in formats and on hardware that follow open standards rather than proprietary ones.
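As one way to act on this advice for tabular data, the sketch below serializes rows to CSV, an open plain-text format, using only Python's standard library. The function name `rows_to_csv` and the sample values are hypothetical.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and data rows to CSV text with "\n" line endings."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical observations, for illustration only.
csv_text = rows_to_csv(
    ["site", "year", "count"],
    [["A", 2023, 14], ["B", 2023, 9]],
)
print(csv_text)
```

To write directly to disk instead, the same `csv.writer` can be given a file opened with `open(path, "w", newline="", encoding="utf-8")`.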
In your documentation, use metadata to record details about the data collection process (e.g., a study) such as:
its context
the dates of data collection
data collection methods, etc.
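The details listed above can be kept in a small machine-readable file stored next to the data. The following is a minimal sketch, assuming a JSON "sidecar" file is acceptable for your project; the function name `make_study_metadata`, the key names, and the sample values are all hypothetical rather than part of any standard.

```python
import json


def make_study_metadata(context, collection_start, collection_end, methods):
    """Bundle basic documentation about a study into a plain dictionary."""
    return {
        "context": context,
        "collection_dates": {"start": collection_start, "end": collection_end},
        "collection_methods": list(methods),
    }


# Hypothetical study details, for illustration only.
metadata = make_study_metadata(
    context="Pilot survey of commuting habits",
    collection_start="2024-03-01",
    collection_end="2024-03-31",
    methods=["online questionnaire", "follow-up phone interview"],
)

metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

Writing dates in ISO 8601 form (YYYY-MM-DD) avoids ambiguity between day-first and month-first conventions.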
Sharing data makes it possible for researchers to validate research results and to reuse data for teaching and further research. Sharing is also required by an increasing number of funders and publishers. Funders seek to maximize the impact of the research they fund by encouraging or requiring data sharing.
Depositing data in an established repository helps to ensure that the data are consistently available, accessible, and preserved for future use. The choice of a data repository depends on various factors, such as discipline, accepted data formats, and data sharing policies. You can obtain assistance from Data Services to identify a repository in which to publish your research data.
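Type of Data - Recommended Formats; Acceptable Formats
Plain Text - Recommended: txt, pdf/A, xml; Acceptable: docx, doc, rtf
Tabular Text - Recommended: csv, tsv; Acceptable: xlsx, xls, sav, dta
Image - Recommended: tiff, JPEG2000; Acceptable: jpg, psd, png, gif, bmp
Audio - Recommended: wave, aiff; Acceptable: mp3, wma, aac, ogg
Archiving - Recommended: zip; Acceptable: rar
Video - Recommended: motion jpeg 2000, mov, avi; Acceptable: mpeg-4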
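The format recommendations above can be turned into a quick check on file names before deposit. This is a minimal sketch: the function name `classify_format` is hypothetical, and the sets include only the entries from the table that are written as plain file extensions. Formats such as pdf/A, JPEG 2000, wave, Motion JPEG 2000, and mpeg-4 are deliberately left out because mapping them to extensions would be an assumption, and an extension alone cannot confirm, for example, that a PDF is PDF/A.

```python
from pathlib import Path

# Extensions taken literally from the format table; see the note above
# about the entries that are intentionally omitted.
RECOMMENDED = {"txt", "xml", "csv", "tsv", "tiff", "aiff", "zip", "mov", "avi"}
ACCEPTABLE = {
    "docx", "doc", "rtf", "xlsx", "xls", "sav", "dta",
    "jpg", "psd", "png", "gif", "bmp",
    "mp3", "wma", "aac", "ogg", "rar",
}


def classify_format(filename):
    """Return 'recommended', 'acceptable', or 'unlisted' for a file name."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in RECOMMENDED:
        return "recommended"
    if ext in ACCEPTABLE:
        return "acceptable"
    return "unlisted"


for name in ["survey.CSV", "survey.xlsx", "notes.md"]:
    print(name, "->", classify_format(name))
```

A result of "unlisted" only means the extension is not covered by this sketch; it does not mean a repository will reject the file.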
Subject/Discipline
Example Archive/Repository
Ecology
DNA Sequences
Chemistry
Social Sciences