project

Start a research project under version control

install R, RStudio and Git
create a well-named GitHub repository (https://github.com/new), initialize it with a README
on the repository page: Code -> copy the URL (SSH)
RStudio -> File -> New Project -> Version Control -> Git: paste the URL, set the subdirectory, create the project
RStudio -> File -> New File -> R Script / Quarto Document
follow good practices
work, then commit changes and push to GitHub

Organize a research project

in the git folder you could have something like:

project/
├── raw_data_large/
├── reduce_data_size.R
├── raw_data_small/
│   ├── file1.csv
│   └── file2.csv
├── process_data.R
├── data_full.csv
├── functions.R
├── test_functions.R
└── main_file.qmd

raw_data_large/ (only locally, i.e. listed in .gitignore)
reduce_data_size.R: read the big files, select the relevant parts and store them in raw_data_small/ with write.csv. If (many) text entries contain commas but no tabs, use write.table(..., sep="\t", row.names=FALSE, fileEncoding="UTF-8") instead.
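A minimal sketch of what reduce_data_size.R could look like. The file name big_file.csv, the columns id, value, note and year, and the filter on year are all hypothetical. The example builds a stand-in "large" file in a temporary directory so that it runs on its own.

```r
# Stand-in for a large raw file (hypothetical names and columns),
# created in a temporary directory so the sketch is self-contained.
large_dir <- file.path(tempdir(), "raw_data_large")
small_dir <- file.path(tempdir(), "raw_data_small")
dir.create(large_dir, showWarnings = FALSE)
dir.create(small_dir, showWarnings = FALSE)
big <- data.frame(
  id = 1:100,
  value = seq(0.5, 50, by = 0.5),
  note = "unused",
  year = rep(2001:2010, 10)
)
write.csv(big, file.path(large_dir, "big_file.csv"), row.names = FALSE)

# The actual reduction step: read, keep relevant rows and columns, write.
raw <- read.csv(file.path(large_dir, "big_file.csv"))
small <- raw[raw$year >= 2009, c("id", "value")]
write.csv(small, file.path(small_dir, "file1.csv"), row.names = FALSE)
```

In a real project the paths would simply be "raw_data_large/..." and "raw_data_small/..." relative to the project root. Passing row.names = FALSE keeps the row index out of the file, so reading it back does not add an extra column.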
process_data.R with

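# collect all small raw files (assumed to share at least one key column)
csvfiles <- list.files("raw_data_small", pattern = "\\.csv$", full.names = TRUE)
data_csv <- lapply(csvfiles, read.csv)
# merge joins on the columns the data frames have in common
data_full <- Reduce(merge, data_csv)
write.csv(data_full, "data_full.csv", row.names = FALSE)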
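To illustrate what Reduce(merge, ...) does: merge joins two data frames on the columns they have in common and, by default, keeps only rows present in both. Reduce applies this pairwise across the list. The three small tables below and their key column id are made up for the example.

```r
# Three hypothetical tables sharing the key column "id".
a  <- data.frame(id = c(1, 2, 3), height = c(1.6, 1.7, 1.8))
b  <- data.frame(id = c(2, 3, 4), weight = c(60, 70, 80))
c3 <- data.frame(id = c(3, 2),    age    = c(30, 40))

# Default merge: only ids present in every table survive (here 2 and 3).
inner <- Reduce(merge, list(a, b, c3))

# Keeping all rows instead: pass all = TRUE, missing values become NA.
outer <- Reduce(function(x, y) merge(x, y, all = TRUE), list(a, b, c3))
```

If rows seem to disappear from data_full.csv, the default inner join is a likely cause, and the all = TRUE variant may be what you want.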

functions.R with

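# placeholder: replace with the actual per-column computation
helper <- function(x) x
# apply helper to every column of a data frame
analyze <- function(df) sapply(df, helper)
# relies on full_data being defined before the call (see main_file.qmd)
visualize <- function(column) plot(analyze(full_data)[, column])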
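As a possible variant (an assumption, not the layout above): visualize reads the global full_data, so it only works after that object has been created. Passing the data frame as an argument makes the function easier to test. The name visualize_df and the demo data are hypothetical.

```r
helper <- function(x) x
analyze <- function(df) sapply(df, helper)

# Hypothetical variant of visualize(): the data frame is passed explicitly
# instead of being looked up as a global variable.
visualize_df <- function(df, column) {
  values <- analyze(df)[, column]
  plot(values, ylab = column)
  invisible(values)  # return the plotted values so they can be checked
}

pdf(NULL)  # discard graphics output in this demo
demo <- data.frame(x = c(1, 4, 9), y = c(2, 3, 5))
plotted <- visualize_df(demo, "y")
invisible(dev.off())
```

In main_file.qmd the call would then be visualize_df(full_data, "columnname").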

test_functions.R with

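source("functions.R")
# input, example, example_df and expected are placeholders for your own test cases
# alternative ways to check a result:
stopifnot(helper(input) == expected)
checkmate::assert_number(helper(example))
testthat::expect_equal(analyze(example_df), expected)
res <- analyze(example_df)
if (!isTRUE(all.equal(res, expected))) stop("analyze(example_df) should be ", toString(expected), ", not ", toString(res))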
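A concrete, self-contained version of such a test, using only base R. The example data frame and the expected matrix are made up. helper and analyze are repeated here only so the snippet runs without functions.R.

```r
# Same placeholder definitions as in functions.R, repeated for a standalone run.
helper <- function(x) x
analyze <- function(df) sapply(df, helper)

# Hypothetical test case: with the identity helper, analyze() should
# return the columns of the data frame as a matrix.
example_df <- data.frame(a = 1:2, b = 3:4)
expected <- cbind(a = 1:2, b = 3:4)

res <- analyze(example_df)
# all.equal compares whole objects; isTRUE turns its result into a single
# TRUE/FALSE, which a plain != comparison on a matrix would not give.
stopifnot(isTRUE(all.equal(res, expected)))
```

Running the test file with Rscript test_functions.R stops with an error as soon as one check fails, which is enough for a small project. testthat adds nicer reporting once the number of tests grows.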

main_file.qmd with code chunks for

full_data <- read.csv("data_full.csv")
source("functions.R")
visualize("columnname")