Analyse your Data
Providing data cleaning instructions and analysis code (ideally using nonproprietary software) is a vital part of making your research transparent and reproducible. See the PRO Initiative's basic guidelines for making your analyses public for further information. This collection of resources provides helpful links to facilitate and improve your analysis sharing practices.
Analysing your data with free software such as R enhances reproducibility without the limitations of proprietary software. A common way to use R is to write analysis and graphics code in the RStudio interface. RStudio also integrates R Markdown, which can be used to create fully reproducible documents, reports, and presentations.
Here are some helpful links for researchers who want to learn R:
- RStudio links to different online guides to learning R: https://www.rstudio.com/online-learning/
- Swirl is an interactive learning tool which can be directly embedded in RStudio.
- RStudio cheat sheets: https://www.rstudio.com/resources/cheatsheets/
- Grolemund & Wickham's R for Data Science
- Hadley Wickham's Tidy Data
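To illustrate the R Markdown approach mentioned above, here is a minimal document sketch that interleaves prose and executable R code (the title and chunk contents are invented examples; the `iris` dataset is built into R):

````markdown
---
title: "Example analysis"
output: html_document
---

The mean sepal length in the built-in iris data is:

```{r}
mean(iris$Sepal.Length)
```
````

Rendering ("knitting") this file runs the code chunk and embeds its output in the resulting document, so the report and the analysis can never drift apart.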
The PRO Initiative gives a few basic guidelines for authors on how to facilitate reproducibility when sharing your analyses: https://opennessinitiative.org/making-your-analyses-public/
For a hands-on example of reproducible research, Lars Vilhuber created a Replication Tutorial that walks through a fully reproducible analysis. More specific collections of useful advice on this topic can be found in the following sources:
- Code and Data for the Social Sciences: A Practitioner’s Guide by Matthew Gentzkow & Jesse M. Shapiro
- R for Reproducible Scientific Analysis: A course by Thomas Wright and Naupaka Zimmerman
- The Research Cycle course on principles of reproducible research and practical training in statistical programming with R, taught by Dale Barr and Lisa DeBruine
- If you want to make your code citable, see this GitHub Guide: https://guides.github.com/activities/citable-code/
- The DRESS Protocol by Project TIER describes what the final documentation of your study should consist of (for the empirical social sciences).
- Codebook Cookbook by Ruben Arslan: an R package (and online tool) to create a codebook for your dataset.
- Creating a codebook within SPSS: https://libguides.library.kent.edu/SPSS/Codebooks
"Works on my Machine" Error
In his paper (available here), Nicholas Eubank makes the case for increasing reproducibility by testing files on a different computer. Testing or even sharing code via cloud-based platforms prevents the reproducibility deficits that arise when code runs on the researcher's local machine but not on others'. Avoid this so-called "works on my machine" error (WOMME) by using tools like the following:
- Rouder, Haaf, and Snyder (2018) wrote a helpful tutorial on how to organize a lab in the face of Open Science practices: https://psyarxiv.com/gxcy5
- R Markdown: Create fully reproducible documents that combine code execution and documentation. A big advantage is the variety of output formats: documents (e.g. Word, PDF, interactive R notebooks, HTML), presentation slides, Shiny apps, websites, and more. Multiple languages including R, Python, and SQL can be used.
- GitHub serves as a data repository and an active research workflow tool. Its version control tracks every contributor's changes to your files, which makes it especially useful for research teams collaborating on code.
- Among its many features, the Open Science Framework offers version control and a live-editing mode that facilitate collaboration within the research team.
The following resources address p-hacking and other questionable research practices that transparent, reproducible workflows help to prevent:
- p-Hack like a pro by Felix Schönbrodt
- Do's and Don'ts of Data Analysis by Felix Schönbrodt
- An overview of questionable research practices (QRPs) by Ulrich Schimmack can be found here.
- Check out http://shinyapps.org/apps/p-hacker/ for an interactive tool demonstrating p-hacking
The Jupyter Notebook is an open-source web application for interactive computing. Notebooks support over 40 programming languages, can be shared and collaboratively edited, and can return interactive output. Uses range from data cleaning, transformation, and visualization to statistical modeling, machine learning, and more.
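As a small sketch of the kind of cleaning-and-summarising step one might run in a notebook cell, the following Python snippet uses only the standard library (the raw data are made up for illustration):

```python
import statistics

# Raw "survey" responses as they might arrive: strings, with missing
# and unusable entries mixed in.
raw_scores = ["4", "5", "", "3", "n/a", "5", "4"]

# Cleaning step: keep only entries that parse as non-negative integers.
scores = [int(s) for s in raw_scores if s.strip().isdigit()]

# Summary statistics, displayed interactively as notebook output.
print("n =", len(scores))        # → n = 5
print("mean =", statistics.mean(scores))
print("sd =", round(statistics.stdev(scores), 2))
```

Keeping the cleaning rule explicit in code (rather than editing the raw file by hand) is exactly what makes such a notebook rerunnable by others.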