Chemistry 470

A research guide for Chemistry Thesis Students at Reed College

Data Storage and Backup

  • Keep multiple copies: two copies of your data, in two different locations, one of them in the cloud (such as Reed’s Google Drive) unless data size or privacy requirements rule that out. Email if you think you might have special circumstances.
  • Back up your data regularly. Backup recommendations from Reed.
  • Regularly audit your backups: are they actually happening? Are they happening the way you intended? Check that each copy holds the expected number of files, that file sizes match between copies, and that the files can be opened.
  • Keep the 10 Steps to Safer Computing in mind
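The audit checks above (file counts, matching sizes, files that open) can be scripted. The following is a minimal sketch, assuming both copies are reachable as ordinary folders on your computer, for example a synced cloud folder. The function names `list_files` and `audit_backup` are hypothetical, and a successful read of the first byte only shows that a file is accessible, not that its contents are intact.

```python
from pathlib import Path


def list_files(root):
    """Map each file's path (relative to root) to its size in bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def audit_backup(source, backup):
    """Compare a source folder with its backup copy and report differences."""
    src = list_files(source)
    bak = list_files(backup)

    # Files present in the source but absent from the backup.
    missing = sorted(set(src) - set(bak))

    # Files present in both copies whose sizes disagree.
    size_mismatch = sorted(
        name for name in set(src) & set(bak) if src[name] != bak[name]
    )

    # Files in the backup that cannot be opened and read.
    unreadable = []
    for name in sorted(bak):
        try:
            with open(Path(backup) / name, "rb") as handle:
                handle.read(1)
        except OSError:
            unreadable.append(name)

    return {
        "source_count": len(src),
        "backup_count": len(bak),
        "missing": missing,
        "size_mismatch": size_mismatch,
        "unreadable": unreadable,
    }
```

Calling `audit_backup("thesis_data", "backup/thesis_data")` (both folder names are placeholders) returns a dictionary you can print or paste into your lab notebook. Empty `missing`, `size_mismatch`, and `unreadable` lists suggest the two copies agree.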

Folder and File Management

  • Consistency is key!
  • Pick a system and stick with it. It needs to make sense to you so that you will use it.
  • Follow a consistent folder system for storing data - ideally one that makes sense without deep knowledge of the project.
  • Document your system in your lab notebook
  • Do not rely on the folder structure to provide critical context for files. Files copied elsewhere will lose this information.
  • Avoid spaces and characters other than - and _ (dash and underscore) in your filenames: not all software applications can handle them, and the system might change them.
  • For dates, use the format YYYYMMDD, YYYY-MM-DD, or YYYY_MM_DD. With this date format, files sorted alphabetically will also sort chronologically.
  • Keep track of different file versions by using a suffix to represent the version number. Example: ProjectName_Instrument_Condition_YYYYMMDD_v01.txt
  • Decide in advance what kinds of changes will warrant a new version number.
Use this                                    | Instead of this
Project1/SiteB/SiteB_2021-09-15_rawdata.txt | Project1/SiteB/2021/09/15/rawdata.txt
sequence_readings_2021-09-15.csv            | sequence_readings_3rd_sept.csv
SiteB_2021-09-15_processed_v02.r            | SiteB_2021-09-15_processed_redo.r
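One way to stay consistent with the naming rules above is to generate filenames with a small helper rather than typing them by hand. This is a rough sketch, not a required tool: the function names `build_filename` and `is_safe_filename` are hypothetical, and the pattern follows the ProjectName_Instrument_Condition_YYYYMMDD_v01.txt example from the list.

```python
import re
from datetime import date

# Letters, digits, dash, and underscore only (no spaces or other characters).
SAFE_PART = re.compile(r"[A-Za-z0-9_-]+")
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def is_safe_filename(name):
    """Return True if the name uses only safe characters plus one extension."""
    return SAFE_NAME.fullmatch(name) is not None


def build_filename(project, instrument, condition, run_date, version, ext="txt"):
    """Build Project_Instrument_Condition_YYYYMMDD_vNN.ext from its parts."""
    parts = [project, instrument, condition]
    for part in parts:
        if SAFE_PART.fullmatch(part) is None:
            raise ValueError(f"unsafe characters in filename part: {part!r}")
    stem = "_".join(parts)
    stamp = run_date.strftime("%Y%m%d")  # sorts chronologically as text
    return f"{stem}_{stamp}_v{version:02d}.{ext}"
```

For example, `build_filename("ProjectName", "Instrument", "Condition", date(2021, 9, 15), 1)` returns `ProjectName_Instrument_Condition_20210915_v01.txt`, and `is_safe_filename("raw data.txt")` returns `False` because of the space.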

Data Documentation

  • Document the decisions you make about your data during the research process. These decisions are easy to forget, and you may need to refer to them later while writing your thesis.

  • Describe the content of your data files in a data dictionary. List variables with definitions, units of measure, scope notes, and coded values. Document how missing values are represented. List the file formats you are using.
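A data dictionary can be as simple as a CSV file stored next to the data. The sketch below shows one possible layout. The column names, the `write_data_dictionary` helper, and the example variables are all illustrative assumptions, not a Reed standard, so adapt them to your own project.

```python
import csv

# Columns for the dictionary itself (an assumed layout; adjust as needed).
FIELDS = ["variable", "definition", "units", "coded_values", "missing_value"]

# Made-up entries for illustration only.
EXAMPLE_ENTRIES = [
    {
        "variable": "sample_id",
        "definition": "Unique label written on the sample vial",
        "units": "",
        "coded_values": "",
        "missing_value": "",
    },
    {
        "variable": "absorbance",
        "definition": "Measured absorbance of the sample",
        "units": "absorbance units",
        "coded_values": "",
        "missing_value": "NA",
    },
    {
        "variable": "solvent",
        "definition": "Solvent used to prepare the sample",
        "units": "",
        "coded_values": "W = water; M = methanol",
        "missing_value": "NA",
    },
]


def write_data_dictionary(path, entries):
    """Write the data dictionary entries to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(entries)
```

Running `write_data_dictionary("data_dictionary.csv", EXAMPLE_ENTRIES)` creates a file that opens in any spreadsheet program or text editor, which keeps the documentation itself in an open format.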

Data Analysis

Data at Reed provides a lot of resources for working with R. Still need help? Send an email to the team.

After your thesis is handed in

  • If possible, migrate your data out of software-specific formats to an open format (such as CSV). This is especially important for files that require an expensive, specialized software program to open: will you still be able to access your data after you graduate?
  • Consider uploading the data to the Reed College Dataverse