The aim of this tutorial is to provide an introduction to data manipulation in R, primarily using tools from the tidyverse. Given that lots of people are currently moving their data collection procedures online, we will use an output file from Gorilla as an example. However, the tools should readily apply regardless of your data collection software.
In this first part of the tutorial, we will cover the basics of extracting the relevant data from your output files. In Part 2, we will cover some extra tips and tricks for monitoring sample size during online data collection, scaling up the tools to more complex datasets, and re-organising your data flexibly.
This introduction is aimed at beginners with very little experience of coding in R.
However, it does assume that you can find your way around RStudio (i.e., you are familiar with the script and console, and know how to run one or several lines of code), and that you can navigate your working directory to access your data. A basic understanding of functions and arguments will also help. Some recommended resources on this are:
If you prefer some light interactive activities, you can also try:
As with most programming tasks, there are several ways of achieving the same thing. Within the R programming language, there are some clusters of approaches (“dialects” or “grammars”, if you like). It’s not essential to acknowledge the difference - mostly you will settle for whatever tools you can get to work! However, when searching for solutions to your issues on StackExchange, you will see that many people provide alternative solutions, and will often refer to using Base R, the tidyverse (including the dplyr and tidyr packages), and data.table. This document will focus on data manipulation using the tidyverse.
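To give a flavour of what these "dialects" look like, here is a minimal sketch that computes the same summary twice, once in Base R and once with dplyr. The `trials` data frame and its column names are made up purely for illustration (they are not from the Gorilla output file), and the code assumes the dplyr package is installed.

```r
library(dplyr)  # assumes dplyr is installed

# Hypothetical toy data: two reaction times for each of three participants
trials <- data.frame(
  participant = c("p1", "p1", "p2", "p2", "p3", "p3"),
  rt = c(520, 610, 480, 450, 700, 650)
)

# Base R: mean reaction time per participant
base_means <- aggregate(rt ~ participant, data = trials, FUN = mean)

# tidyverse (dplyr): the same summary written as a pipeline
tidy_means <- trials %>%
  group_by(participant) %>%
  summarise(rt = mean(rt))

# Check that both approaches produce the same means
all.equal(base_means$rt, tidy_means$rt)
```

Neither version is more "correct" than the other; they are simply two ways of expressing the same operation.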
What are the pros?
…and the cons?
Learning to use the tidyverse took me from years of copying and pasting random bits of code from the internet to being able to sit down and write a data processing script from scratch. I hope you’ll find it helpful too!
Ready? Let’s begin!
Thank you to Catia Oliveira, Ruth Lee, Claudia Mazzuca, and Jon Flavell for trialling and providing valuable feedback on this tutorial. Additional feedback would be gratefully received via email to firstname.lastname@example.org.