Running all scripts in an R project that are spread across folders and subfolders
As the title suggests, this post discusses how to run all the R scripts in a project when they are located in different folders. The project and the main script (run_all_scripts.R) are in a parent folder, while each child folder contains numerous subfolders along with a script_n.R file.
The problem at hand is to write a single script, run_all_scripts.R, that runs all the other scripts in the child folders. Each of these scripts writes data to a specific location within its own folder and subfolders, so the working directory in effect when a script runs matters. The R packages misty and MplusAutomation also come into play.
Background
R is a popular programming language and environment for statistical computing and graphics. While it has many features that make it easy to work with data, it can become cumbersome when trying to manage multiple scripts that need to access different folders and files.
misty is an R package of miscellaneous helper functions, mostly for descriptive statistics. One of them, misty::setsource(), sets the working directory to the location of the current script's source file and is intended for use within RStudio.
When we try to run a script from another location, like this:
source("child_folder/script_n.R", chdir = TRUE)
This fails. Although chdir = TRUE temporarily switches the working directory to the child folder, the script is still being driven from the main script in the parent folder, so a call to misty::setsource() inside script_n.R does not resolve to the child folder.
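For reference, source(chdir = TRUE) on its own does change the working directory for the duration of the call and restores it afterwards. The sketch below checks this on a throwaway project in a temporary directory. The folder and file names are made up for illustration, and the child script only records its working directory.

```r
# Build a throwaway project: <tmp>/demo_project/child_folder/script_n.R
# (all names here are hypothetical, for illustration only)
project <- file.path(tempdir(), "demo_project")
child   <- file.path(project, "child_folder")
dir.create(child, recursive = TRUE, showWarnings = FALSE)

# The child script simply records the directory it is run from
writeLines("wd_seen_by_script <- getwd()", file.path(child, "script_n.R"))

old_wd <- setwd(project)   # pretend we are working in the parent folder
env <- new.env()           # keep the script's variables out of the workspace
source("child_folder/script_n.R", local = env, chdir = TRUE)

# During source() the working directory was the child folder ...
ran_in_child <- normalizePath(env$wd_seen_by_script) == normalizePath(child)
# ... and afterwards it is back at the parent folder
restored <- normalizePath(getwd()) == normalizePath(project)

print(c(ran_in_child = ran_in_child, restored = restored))
setwd(old_wd)              # leave the session as we found it
```

Both checks should come out TRUE, which suggests that any remaining trouble comes from code inside the child script that determines its own location rather than from source() itself.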
Another package involved is MplusAutomation, which automates running Mplus models from R and reading their output. Its functions take paths to Mplus input and output files, so they also depend on which working directory is in effect when a script runs.
Finding All R Scripts
Before running all our other scripts, let’s first find all of them. We can create a script called find_all_scripts.R that finds all .R and .r files within a given folder (in this case, child_folder) and stores them in a vector. We’ll then use the source() function to run each one.
## Finds all .R and .r files within a folder and sources them
sourceFolder <- function(folder, recursive = FALSE, ...) {
  # Collect the full paths of all R files
  files <- list.files(folder, pattern = "[.][rR]$", full.names = TRUE, recursive = recursive)
  if (!length(files))
    stop(sprintf('No R files in folder "%s"', folder))
  # Source each file; arguments in ... (e.g. chdir = TRUE) are passed on to source()
  src <- lapply(files, source, ...)
  message(sprintf('%s files sourced from folder "%s"', length(src), folder))
  invisible(src)
}
# chdir = TRUE makes each script run from its own folder
sourceFolder("child_folder", recursive = TRUE, chdir = TRUE)
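This script takes a folder and, with recursive = TRUE, looks through all of its subfolders for .R and .r files. It then sources each file in turn and, once all of them have run, prints a single message reporting how many files were sourced. Any extra arguments are passed through to source().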
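To make the file-matching step concrete, here is a small sketch of what the pattern "[.][rR]$" picks up with and without recursion. The folder tree is created in a temporary directory and all file names are hypothetical.

```r
# Throwaway folder tree with a mix of file types (names are hypothetical)
root <- file.path(tempdir(), "pattern_demo")
dir.create(file.path(root, "sub_a", "deep"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, c("script_1.R", "notes.Rmd", "sub_a/script_2.r",
                              "sub_a/data.RData", "sub_a/deep/script_3.R")))

# Non-recursive: only the top level of `root` is searched
top_only <- list.files(root, pattern = "[.][rR]$", recursive = FALSE)

# Recursive: subfolders are searched too, and paths relative to `root` are returned
all_levels <- list.files(root, pattern = "[.][rR]$", recursive = TRUE)

print(top_only)
print(all_levels)
```

Because the pattern is anchored at the end of the name, files such as notes.Rmd and data.RData are left out, while both the .R and .r extensions are matched.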
Running All Scripts
Once we have found all of our other scripts, let’s create the run_all_scripts.R script that will run them all:
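## Runs all R scripts in child_folder and its subfolders
library(lavaan)  # assumption: simulateData() is taken to be lavaan's
nobs <- 500      # assumption: sample size, left undefined in the original
# Find every .R or .r file below child_folder
files <- list.files("child_folder", pattern = "[.][rR]$", full.names = TRUE, recursive = TRUE)
for (file in files) {
  cat(file, "\n")
  # Source the script from its own folder; it is assumed to define a model `mod0`
  env <- new.env()
  source(file, local = env, chdir = TRUE)
  # Simulate data from that model
  dat <- simulateData(env$mod0, sample.nobs = nobs)
  # Write the data to the _Data_ subfolder next to the script
  out_dir <- file.path(dirname(file), "_Data_")
  dir.create(out_dir, showWarnings = FALSE)
  out_name <- paste0("Data_", tools::file_path_sans_ext(basename(file)), ".dat")
  write.table(dat, file.path(out_dir, out_name), col.names = FALSE,
              row.names = FALSE, quote = FALSE)
}
This script lists every .R or .r file under child_folder and loops over them. For each file, it sources the script from its own folder, simulates data from the model that script defines (using simulateData()), and writes the result as a table to a file named after the script. The use of lavaan, the value of nobs, and the model object mod0 are assumptions made to keep the example runnable; adjust them to match your own scripts.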
We can easily expand on the above script by running other functions from different scripts and writing their results to separate locations within our project’s structure.
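One such extension, sketched here under stated assumptions, is to keep going when a single script fails and to collect a status for each file. The helper name run_scripts_safely() and the demo files are hypothetical. Each script is sourced with chdir = TRUE so that relative output paths land in the script's own folder.

```r
# Source every R script below `folder`, each from its own directory, and
# record whether it succeeded. `run_scripts_safely` is a hypothetical helper.
run_scripts_safely <- function(folder) {
  files <- list.files(folder, pattern = "[.][rR]$", full.names = TRUE, recursive = TRUE)
  status <- vapply(files, function(f) {
    tryCatch({
      source(f, local = new.env(), chdir = TRUE)
      "ok"
    }, error = function(e) paste("failed:", conditionMessage(e)))
  }, character(1))
  data.frame(script = basename(files), status = unname(status),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Demo on a throwaway folder: one script writes a file, the other throws an error
demo <- file.path(tempdir(), "safe_demo", "sub")
dir.create(demo, recursive = TRUE, showWarnings = FALSE)
writeLines("writeLines('1 2 3', 'out.dat')", file.path(demo, "good.R"))
writeLines("stop('boom')", file.path(demo, "bad.R"))

result <- run_scripts_safely(file.path(tempdir(), "safe_demo"))
print(result)
```

The returned data frame makes it easy to see which scripts need attention without rerunning the whole batch.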
Handling misty::setsource() Commands
One thing we need to handle is child scripts that call misty::setsource(), since this function does not pick up the child script's location when the script is run through source(). We could run each script individually in its own process using system(), but that is probably overkill for what we are trying to accomplish, and because setsource() is intended for use within RStudio it may not help anyway.
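A lighter workaround, sketched below under stated assumptions, is to let the runner set a flag that child scripts check before calling misty::setsource(). The option name run_all.active and the folder layout are made up for this illustration; they are not part of misty or base R.

```r
# Child script (hypothetical): skips misty::setsource() when the runner's flag is set
child_code <- c(
  'if (!isTRUE(getOption("run_all.active"))) misty::setsource()',
  'writeLines("done", "result.txt")  # relative path, resolved via chdir = TRUE'
)
child_dir <- file.path(tempdir(), "guard_demo", "child_folder")
dir.create(child_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(child_code, file.path(child_dir, "script_n.R"))

# Runner: set the flag, source the child from its own folder, then reset the flag
old_opts <- options(run_all.active = TRUE)
source(file.path(child_dir, "script_n.R"), local = new.env(), chdir = TRUE)
options(old_opts)

# The output file should have been written inside the child folder
print(file.exists(file.path(child_dir, "result.txt")))
```

With the flag unset, for example when script_n.R is opened and run directly in RStudio, the guarded call to misty::setsource() would run as before.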
Handling MplusAutomation Commands
Another potential issue arises if any of our child scripts call MplusAutomation functions, such as runModels(), with relative paths. These calls need to resolve correctly when the script is run from child_folder. Sourcing with chdir = TRUE should usually take care of this; failing that, running the script in a separate process with system(), as mentioned above, or some other workaround may be needed.
Final Notes
Overall, using sourceFolder() to find all R files and then sourcing them one by one is a reasonable way to run multiple scripts at once. Just be aware that if any of these scripts use commands that depend on how the script is launched (like misty::setsource()), some kind of workaround will be needed.
Last modified on 2024-06-13