Guidelines to improve your workflow in R scripts.
rm(list=ls())
will empty your complete workspace. Be aware of that if you get a script with rm
from a colleague. Saving the workspace automatically can be switched off in RStudio.
Have sections for
Add comments to your script
Also use blank lines to format your script.
Use options()
to configure your R output.
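As a rough illustration of what these options change, the sketch below formats two numbers under the settings used in the template that follows and then restores the previous values. The variable names (old_opts, pi_txt, big_txt) are made up for this example.

```r
# options() returns the previous values, so they can be restored later
old_opts <- options(digits = 5, scipen = 999)

pi_txt  <- format(pi)    # limited to 5 significant digits
big_txt <- format(1e10)  # fixed notation instead of 1e+10

print(pi_txt)
print(big_txt)

# Restore the settings that were active before
options(old_opts)
```

Storing the return value of options() is optional, but it makes it easy to undo the changes at the end of a script.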
# ---------------------------------------------
# Name and aim of the script, author, date, version, contact
# Take some lines to describe what the script is doing
# If the script depends on a little-known or development package
# then state the package version here.
# ---------------------------------------------
# ----- Clean workspace and options -----
rm(list=ls())
options(max.print = 10e5,  # max. number of entries printed
        digits = 5,        # number of significant digits shown
        scipen = 999)      # penalty against scientific notation (e.g. 1e+04)
# ----- Load required packages -----
library(tidyverse)
library(lubridate)
library(zoo)
# ----- Paths -----
path <- "/directory/path/to/my/folder/"
w.path <- "/directory/path/to/my/folder/output/"
# ---------------------------------------------
# 0. Functions
# ---------------------------------------------
cv <- function(x) sd(x) / mean(x)
# ---------------------------------------------
# 1. Main program
# ---------------------------------------------
# -1.1---- Load data -----
df <- read_tsv("file.tsv")
# -1.2---- Preparation of data -----
df1 <- df %>% ... %>%
# -1.3---- Analysis 1: Filter to find dates -----
df2 <-
# -1.4---- Analysis 2 (with backup): Arrange to find maximum -----
df3 <-
write_tsv(df2...)
# -1.5---- Analysis 3: Prepare for plotting (join_) -----
df_final <-
# -1.6--- Write new data.frame in file -----
write_tsv(df_final ) # format is tsv in w.path folder
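The template defines path and w.path but leaves the read and write calls open. A minimal sketch of how the two folders might be used is shown below. It assumes the readr package is installed, uses tempdir() as a stand-in for the real folders, and works on a small made-up data set; the file name df_final.tsv and the filter step are hypothetical.

```r
library(readr)

# Stand-ins for the real folders (assumption: tempdir() instead of a project path)
path   <- file.path(tempdir(), "project")
w.path <- file.path(path, "output")
dir.create(w.path, recursive = TRUE, showWarnings = FALSE)

# Made-up input file so that the example is self-contained
in_file <- file.path(path, "file.tsv")
write_tsv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), in_file)

# 1.1 Load data from the input folder
df <- suppressMessages(read_tsv(in_file))

# 1.2 Placeholder analysis step: keep rows with value above 3
df_final <- df[df$value > 3, ]

# 1.6 Write the result into the output folder
out_file <- file.path(w.path, "df_final.tsv")
write_tsv(df_final, out_file)
```

Building file names with file.path() keeps the folder definition in one place, so moving the project only requires changing path and w.path.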
Try to code within a maximum line width of 80 characters (70 is even better). If a command is longer, add a line break within the command or split it into two separate commands. In RStudio you can configure a margin guide that indicates where the maximum number of characters is reached.
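Both ways of shortening a long command can be sketched as follows. The station name, the count and the dates are made up for illustration.

```r
# Option 1: one command with a line break after a comma
msg_a <- paste("Station", "A12", "reported", 3, "missing values",
               "between", "2020-01-01", "and", "2020-01-31")

# Option 2: split the command into two shorter ones
period <- paste("between", "2020-01-01", "and", "2020-01-31")
msg_b  <- paste("Station", "A12", "reported", 3, "missing values", period)

print(msg_a)
print(identical(msg_a, msg_b))
```

R continues reading on the next line as long as the command is incomplete, for example after a comma or an operator.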
In the tidyverse you can put a line break after each pipe %>% to improve readability.
df %>%
  group_by(date) %>%
  summarise(x = mean(y)) %>%
  arrange(-x) %>%
  slice(1:3)
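To try the pipeline above, it can be run on a small made-up data set. The sketch assumes the dplyr package is installed; the dates and values are invented, and top3 is a hypothetical name for the result.

```r
library(dplyr)

# Made-up data: two measurements y per date
df <- data.frame(
  date = rep(as.Date("2020-01-01") + 0:3, each = 2),
  y    = c(1, 3, 10, 20, 5, 7, 2, 2)
)

# One step per line: daily mean, sorted descending, top three days
top3 <- df %>%
  group_by(date) %>%
  summarise(x = mean(y)) %>%
  arrange(-x) %>%
  slice(1:3)

print(top3)
```

With one verb per line, a single step can be commented out or inspected without touching the rest of the chain.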
Code should be indented according to hierarchy.
for (i in vector) {
  z <- z + i
  for (j in vector2) {
    zz <- z + j
  }
}
if (condition) {
  one or more lines
} else {
  one or more lines
}
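A runnable variant of the two patterns above, combined into one nested example, might look like this. The vectors and the variable names (values, offsets, total) are made up for illustration.

```r
values  <- c(1, 2, 3)
offsets <- c(10, 20)
total   <- 0

for (i in values) {
  total <- total + i            # level 1: inside the outer loop
  for (j in offsets) {
    if (j > 10) {               # level 2: inside the inner loop
      total <- total + j        # level 3: inside the condition
    } else {
      total <- total + 1
    }
  }
}

print(total)
```

Each opening brace increases the indentation by one level (two spaces here), and each closing brace returns to the level of the statement that opened it.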
If you re-open your script in 1 month or 1 year, you should be able to understand what the script is doing in less than 5 minutes. In this course it would be helpful to implement different exercises in different scripts/files. Normally you put one task into one file but if the script gets longer and longer then
Including new functions and commands in existing scripts often requires new packages to be loaded. Add them continuously to the list at the top of the script. It can be really frustrating to make it two-thirds of the way through a long script only to find out that a dependency hasn’t been installed.
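One possible way to catch a missing dependency right at the top of a script is sketched below. It is an optional addition, not part of the template; required_pkgs and missing_pkgs are hypothetical names, and the two packages listed ship with R so that the example runs anywhere.

```r
# Packages the script depends on (replace with your own list)
required_pkgs <- c("stats", "utils")

# requireNamespace() returns FALSE instead of failing if a package is absent
is_installed <- vapply(required_pkgs, requireNamespace,
                       logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

# Stop early with a clear message instead of failing halfway through
if (length(missing_pkgs) > 0) {
  stop("Please install: ", paste(missing_pkgs, collapse = ", "))
}
```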
Use a consistent style within your code. For example, give all data.frames a name ending in _df and all matrices a name ending in _mat. Consistency makes code easier to read and problems easier to spot. Use numbers or an indication of content for subsequent objects, e.g.:
df_read, df1, df2, df3, df4, df_write
df_read, df_all, df_day, df_hr, df_hr2, df_hr_avg, df_write
Don’t use common package or function names for your variables. Some names are already taken by widely used functions, e.g. mean(), n(), c() etc., and are best treated as reserved. Try to keep variable names short, around 5 to 7 characters on average.
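The short sketch below illustrates the naming advice with made-up data: consistent suffixes for object types, and what happens when a common function name is reused. The object names (temp_df, temp_mat, masked, restored) are hypothetical.

```r
# Consistent suffixes make the object type visible in the name
temp_df  <- data.frame(hour = 1:3, temp = c(11.5, 12.0, 12.8))
temp_mat <- as.matrix(temp_df)

# Reusing a common function name hides the original function
mean <- function(x) "not the mean anymore"
masked <- mean(temp_df$temp)

# Removing the own definition makes base::mean() visible again
rm(mean)
restored <- mean(temp_df$temp)

print(masked)
print(restored)
```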
Use <-, not =, for assignment.
x <- 1:10
After some time of coding you will develop useful helper functions or specific settings that you want available for all your scripts. You can put those settings and functions into a separate R script and load it in the first lines of your daily scripts.
source("/directory/my_genius_functions.R")
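A self-contained sketch of this idea: a helper file is written to a temporary folder and then loaded with source(). The file name my_helper_functions.R and the use of tempdir() are assumptions for illustration; the helper itself is the cv function from the template.

```r
# Create a small helper script (normally this file already exists)
helper_file <- file.path(tempdir(), "my_helper_functions.R")
writeLines("cv <- function(x) sd(x) / mean(x)", helper_file)

# Load the helper at the top of a daily script
source(helper_file)

# The function is now available as if it were defined in this script
cv_value <- cv(c(1, 2, 3))
print(cv_value)
```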
…to be continued…