Form and File: Estimating Running Form in R | R-bloggers

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers.]

There are lots of ways for runners and cyclists to analyze training data. The question most fitness enthusiasts want answered is “How am I doing?”.

How you are doing is known as form.

Unsurprisingly, form can be estimated in a number of ways. One method uses training stress scores (acute training load and chronic training load) to assess form as a training stress balance. The abbreviations of these terms are expressly copyrighted(!) by TrainingPeaks. That’s why I will refer to acute training load as fatigue, chronic training load as fitness and training stress balance as form. Some notes on how these are calculated can be found below.

Let’s calculate these scores for running using R.

The plots above show my scores for this year so far. Because of the way these scores are calculated, it takes 7 days to get a meaningful fatigue score and 42 days to get a meaningful fitness score. In other words, the calculation starts from 0 on New Year’s Day when, in fact, I carried fitness and fatigue over from December. Still, it is a good way to track form.

So what does this tell us? This year I have spent a lot of time in the grey zone, only entering the optimal zone during periods of intense activity. That’s because my baseline activity (and therefore fitness) is pretty high. It means that if I want to target improvement, I need to periodise my training: the runner interleaves intense periods (blocks) with less active spells.

How to Calculate the Data

Using summary data from Garmin Connect (downloadable as CSV), all we need for each run is the average heart rate, the duration and the date of the activity.

## The purpose of this script is to load and process CSV data from the Garmin Connect website.
## It loads all csv files in data/ (in the current wd) and filters for running (and treadmill running).
## Place one or more Garmin csv outputs in the data folder for inclusion. Dates for activities may overlap;
## duplicates are dealt with, so you can keep adding a csv with the latest data and run the script again.
## Use of `find_form(from, to)` lets the user check their running form within the specified window.

library(ggplot2)
library(hms)
library(reshape2)
library(patchwork)

## set up preferred directory structure in wd
ifelse(!dir.exists("data"), dir.create("data"), "Folder already exists")
ifelse(!dir.exists("Output"), dir.create("Output"), "Folder already exists")
ifelse(!dir.exists("Output/Data"), dir.create("Output/Data"), "Folder already exists")
ifelse(!dir.exists("Output/Plots"), dir.create("Output/Plots"), "Folder already exists")
ifelse(!dir.exists("Script"), dir.create("Script"), "Folder already exists")

## functions

getWindowActivities <- function(activity, fromStr, toStr, df) {
  # filter for the activity type
  df_window <- subset(df, grepl(tolower(activity), tolower(df$Activity.Type)))
  # activities within the window
  fromDate <- as.Date(fromStr)
  toDate <- as.Date(toStr)
  df_window <- subset(df_window, as.Date(df_window$Date) >= fromDate & as.Date(df_window$Date) <= toDate)
  # put them in order
  df_window <- df_window[order(as.numeric(df_window$Date)), ]
  return(df_window)
}

makeDateDF <- function(fromStr, toStr) {
  # one row per day in the window, with training loads initialised to 0
  temp <- seq(as.Date(fromStr), as.Date(toStr), by = "days")
  df <- data.frame(Date = temp, ATL = rep(0, length(temp)), CTL = rep(0, length(temp)))
  return(df)
}

process_load <- function(activityStr, fromStr, toStr) {
  all_files <- list.files("data", pattern = "*.csv", full.names = TRUE)
  df_all <- read.csv(all_files[1], header = TRUE, stringsAsFactors = FALSE)
  df_all <- subset(df_all, select = c(Activity.Type, Date, Title, Distance, Time, Avg.HR))
  for (filename in all_files[-1]) {
    df_temp <- read.csv(filename, stringsAsFactors = FALSE)
    # subset data because Garmin can add or remove columns and we don't need them all
    df_temp <- subset(df_temp, select = c(Activity.Type, Date, Title, Distance, Time, Avg.HR))
    df_all <- rbind(df_all, df_temp)
  }
  # remove duplicates
  df_all <- df_all[!duplicated(df_all), ]
  # format Date column to POSIXct
  df_all$Date <- as.POSIXct(strptime(df_all$Date, format = "%Y-%m-%d %H:%M:%S"))
  # convert average HR to numeric
  df_all$Avg.HR <- as.numeric(df_all$Avg.HR)
  # replace NA with the mean of average HR
  df_all$Avg.HR[is.na(df_all$Avg.HR)] <- mean(df_all$Avg.HR, na.rm = TRUE)
  # retrieve activities that match the activity type in the time window of interest
  df_all <- getWindowActivities(activityStr, fromStr, toStr, df_all)
  # add a column containing the load for each activity
  # one way to calculate load is to multiply the time in hours by the average HR and add 2.5 times the average HR
  # this value, fitted as y = ax + b, is related to load by a = 0.418, b = -150
  df_all$load <- 0.418 * ((as.numeric(lubridate::hms(df_all$Time)) / 3600 * df_all$Avg.HR) + (2.5 * df_all$Avg.HR)) - 150
  return(df_all)
}

sumDays <- function(df, daydf) {
  df$Date <- as.Date(df$Date)
  tempdf <- aggregate(load ~ Date, data = df, sum)
  # keep every day in the window; days without activity get a load of 0
  newdf <- merge(daydf, tempdf, all.x = TRUE)
  newdf[is.na(newdf)] <- 0
  return(newdf)
}

calculateTL <- function(df) {
  for (i in 1:nrow(df)) {
    # add today's load to the training loads
    df$ATL[i] <- df$ATL[i] + df$load[i]
    df$CTL[i] <- df$CTL[i] + df$load[i]
    # decay today's totals into the following days
    for (j in (i + 1):(i + 42)) {
      if (j > nrow(df)) {
        break
      }
      df$ATL[j] <- df$ATL[i] * exp(-(j - i) / 7)
      df$CTL[j] <- df$CTL[i] * exp(-(j - i) / 42)
    }
  }
  df <- df[, 1:3]
  df[2] <- df[2] / 7
  df[3] <- df[3] / 42
  # form (training stress balance) = fitness - fatigue
  df$TSS <- df$CTL - df$ATL
  return(df)
}

# run the analysis
find_form <- function(from, to) {
  # load data and calculate load for each activity
  mydata <- process_load("running", from, to)
  # create a data frame representing each day in our time window
  tl <- makeDateDF(from, to)
  # sum the loads for each day
  df <- sumDays(mydata, tl)
  # calculate training loads
  df <- calculateTL(df)
  # data frame for form zones
  rects <- data.frame(ystart = c(20, 5, -10, -30, -50),
                      yend = c(30, 20, 5, -10, -30),
                      xstart = rep(as.Date(from), 5),
                      xend = rep(as.Date(to), 5),
                      col = factor(c("Transition", "Fresh", "Grey Zone", "Optimal", "High Risk"),
                                   levels = c("Transition", "Fresh", "Grey Zone", "Optimal", "High Risk")))

  # first plot = fitness and fatigue
  p1 <- ggplot(df, aes(x = Date)) +
    geom_area(aes(y = CTL), fill = "#58abdf", alpha = 0.2) +
    geom_line(aes(y = CTL), colour = "#58abdf") +
    geom_line(aes(y = ATL), colour = "#5e3cc4") +
    geom_text(aes(x = as.Date(to), y = 0, vjust = "inward", hjust = "inward", label = "Fitness"), colour = "#58abdf") +
    geom_text(aes(x = as.Date(from), y = max(ATL), vjust = "inward", hjust = "inward", label = "Fatigue"), colour = "#5e3cc4") +
    labs(x = "", y = "Training load per day") +
    theme_bw() +
    theme(legend.position = "none")

  # second plot = form
  # NOTE: the source text is truncated between `theme(legend.` and `tss_`; the legend setting,
  # the patchwork stacking and the Output/Plots/ path prefix below are reconstructed assumptions
  p2 <- ggplot(df, aes(x = Date, y = TSS)) +
    geom_line(colour = "#0a0a0a") +
    geom_rect(data = rects, inherit.aes = FALSE,
              aes(xmin = xstart, xmax = xend, ymin = ystart, ymax = yend, fill = col), alpha = 0.2) +
    scale_fill_manual(values = c("#DDB140", "#58ABDF", "#A3A3A3", "#67C75D", "#CB2A1D")) +
    labs(x = "", y = "Form") +
    theme_bw() +
    theme(legend.title = element_blank())

  # combine the two plots with patchwork and save
  p3 <- p1 / p2
  ggsave(paste0("Output/Plots/tss_", from, "_", to, ".png"), plot = p3, width = 8, height = 4, dpi = "print")
}

# run
find_form("2022-01-01", "2022-11-05")

I have functionalized the code to make it easier to understand how it works. In short, the data is read in, cleaned up a bit and then the “load” is calculated for each activity. We only want to plot a certain window, so we subset the data and then calculate the total load for each day within this window. Then the stress scores are calculated and the plots are generated using ggplot and patchwork.
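The per-day summing step can be tried on its own. The following is a minimal sketch using made-up loads (the `runs` and `days` data frames are illustrative, not Garmin data): activities are totalled per date and then joined onto a full calendar so that rest days show up with a load of 0.

```r
# Toy activities (made-up values): two runs on 3 Jan, one on 5 Jan
runs <- data.frame(
  Date = as.Date(c("2022-01-03", "2022-01-03", "2022-01-05")),
  load = c(40, 25, 60)
)

# Every day in the window, so that rest days are represented too
days <- data.frame(
  Date = seq(as.Date("2022-01-01"), as.Date("2022-01-07"), by = "days")
)

# Total load per day, then left-join onto the full calendar
daily <- aggregate(load ~ Date, data = runs, FUN = sum)
daily <- merge(days, daily, all.x = TRUE)

# Days with no activity come back as NA after the merge; set them to 0
daily$load[is.na(daily$load)] <- 0

print(daily)
```

Keeping the rest days matters because the decay in the stress scores is applied day by day, including days with no run.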

Some Notes on Stress Scores

Although the names of the stress scores are copyrighted, what they do is not very mysterious. Fatigue is how tired you are feeling that week and fitness is how much training you have done over six weeks. In other words, fatigue is an exponentially weighted average of load over 7 days, while fitness is an exponentially weighted average of load over 42 days. Form is the difference between the two: fitness minus fatigue. There’s probably a function in R to do a fast weighted average, but I just wrote something quick to do it in the function calculateTL().
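As a rough illustration of the weighted average, here is a sketch that uses base R's `stats::filter()` with `method = "recursive"`. It assumes the recursion y[i] = load[i] + exp(-1/tau) * y[i-1] and the same division by the time constant as in the script; the helper name `decayed_load` and the toy loads are made up for this example.

```r
# Hypothetical helper (not from the original script): exponentially decayed
# training load with a time constant of `tau` days, scaled by 1 / tau.
decayed_load <- function(load, tau) {
  # recursive filter: y[i] = load[i] + exp(-1 / tau) * y[i - 1]
  y <- stats::filter(load, filter = exp(-1 / tau), method = "recursive")
  as.numeric(y) / tau
}

# Toy data (made up): a single load of 100 on day 1 followed by rest days
load <- c(100, rep(0, 13))

fatigue <- decayed_load(load, 7)   # fast-decaying, 7-day time constant
fitness <- decayed_load(load, 42)  # slow-decaying, 42-day time constant
form <- fitness - fatigue

print(round(data.frame(day = seq_along(load), fatigue, fitness, form), 2))
```

With this toy input, fatigue starts high and fades within a couple of weeks while fitness barely moves, so form is negative straight after the hard day and recovers as the fatigue decays.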

So how do we calculate load? We just need a measure of how stressful the activity was. We can’t use distance or time alone, because neither really tells us how hard we worked. Speed would be better, but then the terrain could be hilly or flat. If we were measuring cycling performance and had power measurements, that would be ideal. For running, we can use heart rate data instead (as long as we have it for all activities).

The training stress score in Intervals.icu appears to work like this. I used its estimate of load to back-calculate and derive a similar metric for my data in R.

In short, the average heart rate for the activity multiplied by the duration of the activity gets us very close to an estimate of the load. This makes perfect sense: if you run for 30 minutes at a given heart rate and then run for 1 hour at the same average heart rate another day, that should be twice the load. However, the relationship was not completely linear and had some outliers, i.e. particularly difficult or easy runs. So it needed a correction, and voila, I had something that estimates that “load” metric. Keep in mind that if you are re-running my code with your data, the values may need to change.
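To make the correction concrete, here is the load formula from the script as a stand-alone function. The name `estimate_load` and the example heart rate are illustrative; the constants 0.418, 2.5 and -150 are the ones fitted to my data and may not suit yours.

```r
# Load estimate from the script: duration in hours times average heart rate,
# plus 2.5 times average heart rate, rescaled with a = 0.418 and b = -150.
# `estimate_load` is a hypothetical name used only for this example.
estimate_load <- function(hours, avg_hr) {
  0.418 * (hours * avg_hr + 2.5 * avg_hr) - 150
}

# Same average heart rate (150 bpm, made up), 30 minutes versus 1 hour
short_run <- estimate_load(0.5, 150)
long_run <- estimate_load(1, 150)

print(c(short_run = short_run, long_run = long_run))
```

Because of the correction terms, the one-hour run comes out at a little less than twice the half-hour run rather than exactly double.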

Load can be calculated in a more sophisticated way by splitting each activity into the periods spent in each heart rate zone. However, we only need a big-picture view here, and this approximation serves the purpose.


The title of the post comes from “Form and File” by Archers of Loaf, from their album “All the Nations Airports”.

