I once had the opportunity to work as a customer service agent in a call center. My job was to help customers with problems regarding their wearable body sensors. Wearable body sensors, or trackers as we commonly call them, are revolutionizing fitness programs by providing feedback to the user. A tracker tells you how well you performed during today’s exercise routine and even lets you compare how far you have advanced since you started your program three months ago or even three years ago.
Trackers have improved not only individual exercise programs but also programs that cater to groups of people. More and more companies see the benefit of offering their employees a physical fitness program with incentives based on feedback from wearable body sensors. A local version of the show “The Biggest Loser” can easily be organized using trackers to monitor progress and help decide who the winner is. Companies are joining in to keep their workforce healthy, increase productivity, and decrease medical insurance costs for both the employee and the company.
Manufacturers of wearable sensors have tapped into the popularity of social media and accomplished what public health programs have long sought to do: increase exercise frequency among the general population. People across geographical boundaries come together in social media events that arrange friendly competitions, where the winner is whoever logs the most steps or climbs the most floors in a day, week, or month.
I was allowed to borrow one tracker at a time from the company, for a week each time. Several models with different features were available. One of the trackers I borrowed had a GPS (Global Positioning System) receiver, which lets you track your geo-position during an activity. I thought it was pretty neat! I liked running or taking long bike rides on weekends from Quezon City to Antipolo (Philippines). During one summer break I even reached Lucban, Quezon, and visited the Kamay ni Hesus Shrine. Join me as I explore the data gathered from the trackers I borrowed using R, and have a great time creating visualizations along the way.
The simplest tracker counts the number of steps you take. It does this with the help of sensors that monitor changes along the x, y, and z axes during regular body movements. Put simply, a tracker worn on the wrist monitors the swinging movements of the arm. When you step forward with your right leg, your left arm naturally swings backward. The tracker counts the number of times your arm swings back and forth and records that as the number of steps you took.
More advanced trackers contain sensors that detect changes in altitude to determine whether you are climbing stairs. Other trackers can monitor heart rate and even sleep.
Sensors are not the only source of information for trackers. When you create an account on the tracker manufacturer’s website, you are asked for information like your height, weight, sex, and age. This information is then used to estimate your basal metabolic rate, stride length, and other quantities.
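To make this concrete, here is a minimal sketch of how profile data could be turned into such estimates. The manufacturer’s actual formulas are not described here, so the sketch assumes two widely used equations: the Mifflin-St Jeor equation for basal metabolic rate, and the rule of thumb that stride length is about 41.5% of height for men and 41.3% for women. The function names and the example profile are hypothetical.

```r
# Sketch only: assumes Mifflin-St Jeor for BMR and the common
# stride-length rule of thumb; not necessarily what any tracker uses.

estimate_bmr <- function(weight_kg, height_cm, age, sex = c("male", "female")) {
  sex <- match.arg(sex)
  base <- 10 * weight_kg + 6.25 * height_cm - 5 * age
  # kcal per day: +5 for men, -161 for women
  if (sex == "male") base + 5 else base - 161
}

estimate_stride_m <- function(height_cm, sex = c("male", "female")) {
  sex <- match.arg(sex)
  ratio <- if (sex == "male") 0.415 else 0.413
  height_cm * ratio / 100  # convert centimetres to metres
}

# Hypothetical profile: 70 kg, 175 cm, 30-year-old man
estimate_bmr(70, 175, 30, "male")   # about 1649 kcal/day
estimate_stride_m(175, "male")      # about 0.73 m
```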
At the heart of a tracker is an algorithm that collects all this information and processes it to derive even more. From your stride length and the number of steps the sensors register, it computes the distance you have travelled. And if you are walking up an incline, the algorithm adds a few more calories burned, since walking uphill takes more energy.
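The distance calculation is simply steps multiplied by stride length. For the extra energy cost of walking uphill, the sketch below swaps in the ACSM metabolic equation for walking (oxygen cost in mL/kg/min = 3.5 + 0.1 × speed + 1.8 × speed × grade, with speed in metres per minute and grade as a fraction) together with the approximation of about 5 kcal per litre of oxygen. This only illustrates the idea; it is not the tracker’s proprietary algorithm, and the inputs are hypothetical.

```r
# Distance travelled from step count and stride length (metres)
distance_m <- function(steps, stride_m) steps * stride_m

# Energy cost of walking, assuming the ACSM walking equation.
# speed_m_min: walking speed in metres per minute
# grade: incline as a fraction (0.10 means a 10% slope)
walking_kcal_per_min <- function(speed_m_min, grade, weight_kg) {
  vo2 <- 3.5 + 0.1 * speed_m_min + 1.8 * speed_m_min * grade  # mL O2/kg/min
  litres_o2_per_min <- vo2 * weight_kg / 1000
  litres_o2_per_min * 5  # roughly 5 kcal per litre of oxygen consumed
}

distance_m(10000, 0.72625)          # 7262.5 m
walking_kcal_per_min(80, 0.00, 70)  # flat ground: about 4.0 kcal/min
walking_kcal_per_min(80, 0.10, 70)  # 10% incline: about 9.1 kcal/min
```

The grade term is what makes the uphill walk cost more at the same speed, which is the effect the paragraph above describes.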
The algorithm is also responsible for the tracker’s accuracy. We move our arms all the time, even when we are just sitting, so the algorithm has to decide which movements count as a step and which do not.
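A toy version of such a rule counts a step at each local peak of the motion signal, but only if the peak is tall enough (an amplitude threshold) and far enough from the previous step (a minimum gap). The signals, sampling rate, threshold, and gap below are all made-up values for illustration; real trackers use their own, more elaborate algorithms.

```r
# Count local peaks that exceed `threshold` and are at least
# `min_gap` samples after the previously counted peak.
count_steps <- function(signal, threshold, min_gap) {
  n <- length(signal)
  if (n < 3) return(0)
  steps <- 0
  last_step <- -Inf
  for (i in 2:(n - 1)) {
    is_peak <- signal[i] > signal[i - 1] && signal[i] >= signal[i + 1]
    if (is_peak && signal[i] > threshold && (i - last_step) >= min_gap) {
      steps <- steps + 1
      last_step <- i
    }
  }
  steps
}

fs <- 20                              # hypothetical sampling rate (Hz)
t <- (0:(10 * fs - 1)) / fs           # 10 seconds of samples
walking <- sin(2 * pi * 1 * t)        # one strong arm swing per second
sitting <- 0.2 * sin(2 * pi * 1 * t)  # small fidgeting movements

count_steps(walking, threshold = 0.5, min_gap = 8)  # 10
count_steps(sitting, threshold = 0.5, min_gap = 8)  # 0
```

The threshold is what keeps small movements made while sitting from being counted, and the minimum gap stops a single jittery swing from being counted twice.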
Some trackers even have algorithms that estimate whether you are raising your heart rate to the level needed to gain the most benefit from an exercise session. Others estimate how much oxygen your body consumes during the session. Traditionally, these data were obtained in a controlled environment like a hospital using expensive equipment, and the consumers of this technology were mostly highly paid athletes who needed to boost their performance to match their salaries. The accuracy of these tracker predictions has been questioned, but their practical use has not escaped the medical community.
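A simple way to judge whether a heart rate is at the desired level is to compare it with a percentage of an estimated maximum heart rate. The sketch below assumes the common 220 - age estimate and the intensity bands used in CDC physical activity guidance: roughly 64-76% of maximum heart rate for moderate and 77-93% for vigorous intensity. The function name and the age of 30 are hypothetical; the first two heart rates are the session averages reported later in this post, and 160 bpm is made up.

```r
# Classify exercise intensity from heart rate, assuming
# max HR = 220 - age and bands of 64-76% (moderate) and 77-93% (vigorous).
classify_intensity <- function(hr, age) {
  max_hr <- 220 - age
  pct <- hr / max_hr
  as.character(cut(pct,
                   breaks = c(-Inf, 0.64, 0.77, 0.94, Inf),
                   labels = c("light", "moderate", "vigorous", "very hard"),
                   right = FALSE))  # intervals closed on the left: [a, b)
}

classify_intensity(c(98.84, 136.33, 160), age = 30)
# "light" "moderate" "vigorous"
```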
When I left my job, I lost access to my tracker’s account. I tried logging in, but I couldn’t remember the right password anymore, and I no longer have access to my old company email address. Luckily, I had saved some of my data in .tcx, .csv, and .RData formats the last time I explored the data while I was still with the company.
The company that owned the tracker offers a nice and easy way for its customers to view their data, but I will take this opportunity to exercise my R skills and explore the R package trackeR by Hannah Frick and Ioannis Kosmidis. The package is available on CRAN and has a function that can read .tcx files.
list.files("./data")
## [1] "5744338855.tcx" "5760817276.tcx" "5793241381.tcx" "5793565236.tcx"
## [5] "5846949327.tcx" "5846949328.tcx" "fitbit.csv" "fitbit.tcx"
## [9] "fitbit1.RData" "fitbit2.tcx" "fitbit3.tcx"
I downloaded the .tcx files from the tracker’s website. Back then I either didn’t know about the trackeR package or it wasn’t available yet, and I was disappointed that I couldn’t create customized visualizations of the GPS data. We’ll leave the GPS data for now and concentrate on the number of steps I took per day.
The RData files contained the data I had previously downloaded using the R package fitbitScraper by Cory Nissen, which is also available on CRAN.
Let’s take a look at the content of the fitbit1.RData file. R is a great statistical program and a lot more. With all the resources now available, learning R has become a lot easier. Best of all, it’s free.
load("./data/fitbit1.RData") ###Load variables in global environment
ls() ###List variables in global environment
## [1] "cookie" "d" "daydf" "dt" "fitbit.dt"
## [6] "fitbit.pwd" "fitbit.usr" "i" "iris" "target"
The variables cookie, fitbit.dt, fitbit.pwd, fitbit.usr, target, and i were arguments for the fitbitScraper functions I needed to download my data. They don’t work now because I no longer have access to my tracker’s account. I leave it to you to explore the fitbitScraper package in case you want to use it to download your own data.
It turns out that the variable d contains all the data, and the rest (daydf and dt) are just subsets of d.
iris, as most R users know, is the name of a popular built-in dataset in R. I guess I did a couple of practice rounds with that data while downloading my Fitbit data. The .tcx files contain the GPS data.
library(dplyr)
library(lubridate)
library(leaflet)
library(ggplot2)
library(gridExtra)
library(ggthemes)
We loaded the packages dplyr, lubridate, leaflet, ggplot2, gridExtra, and ggthemes, which will help us manipulate and visualize the data. Packages allow us to extend the functionality of R in different ways.
str(d) ###See content of variable d
## 'data.frame': 1536 obs. of 4 variables:
## $ time : POSIXct, format: "2016-11-18 00:00:00" "2016-11-18 00:15:00" ...
## $ steps : num 0 0 0 0 0 0 0 0 0 0 ...
## $ day : chr "11-18" "11-18" "11-18" "11-18" ...
## $ timestamp: chr "13:00:00" "13:15:00" "13:30:00" "13:45:00" ...
The R object d is a data frame that contains 4 variables.
We can see the first six rows of data below.
head(d) ### Show first six rows of d
## time steps day timestamp
## 1 2016-11-18 00:00:00 0 11-18 13:00:00
## 2 2016-11-18 00:15:00 0 11-18 13:15:00
## 3 2016-11-18 00:30:00 0 11-18 13:30:00
## 4 2016-11-18 00:45:00 0 11-18 13:45:00
## 5 2016-11-18 01:00:00 0 11-18 14:00:00
## 6 2016-11-18 01:15:00 0 11-18 14:15:00
We can also manipulate the data in R in order to show other details with regard to time like days of the week.
d$weekday <- wday(d$time, ### get weekday from d$time
label = TRUE, ### display as words
abbr = TRUE) ### display abbreviated version
d$date <- date(d$time) ### create new variable date
head(d)
## time steps day timestamp weekday date
## 1 2016-11-18 00:00:00 0 11-18 13:00:00 Fri 2016-11-18
## 2 2016-11-18 00:15:00 0 11-18 13:15:00 Fri 2016-11-18
## 3 2016-11-18 00:30:00 0 11-18 13:30:00 Fri 2016-11-18
## 4 2016-11-18 00:45:00 0 11-18 13:45:00 Fri 2016-11-18
## 5 2016-11-18 01:00:00 0 11-18 14:00:00 Fri 2016-11-18
## 6 2016-11-18 01:15:00 0 11-18 14:15:00 Fri 2016-11-18
Let’s summarize our data to get the total number of steps per day during that 16-day period.
day_sum <- d %>% ### create variable day_sum which
group_by(day) %>% ### summarize the data as the number
summarize(Total_steps = sum(steps)) ### of steps per day
tail(day_sum) ### show last 6 rows of day_sum
## # A tibble: 6 x 2
## day Total_steps
## <chr> <dbl>
## 1 11-28 10429
## 2 11-29 15245
## 3 11-30 13795
## 4 12-01 16376
## 5 12-02 14539
## 6 12-03 13404
We can appreciate that data much better in a plot.
ggplot(day_sum, ### data = day_sum
aes(x = day, ### plot day on x axis
y = Total_steps, ### plot Total_steps on y axis
fill = Total_steps)) + ### use Total_steps to color the bars
geom_bar(stat = "identity") + ### create a bar graph
geom_hline(yintercept = 15000) + ### create horizontal line at 15000
labs(title = "Number of steps per day", ### Provide labels for axis and title
x = " Date (Nov 18 - Dec 3)",
y = "Steps") +
theme(plot.title = element_text(face = "bold", ### specify font face
size = 17, ### specify font size of title
vjust = 2), ### specify distance of title from plot
axis.title = element_text(size = 15), ### specify font size of axis titles
axis.title.x = element_text(vjust = -2), ### specify distance of label from axis
axis.text = element_text(size = 13)) + ### specify font size of axis text
scale_x_discrete(labels = c(18:30, 1:3))
We can see right away that I reached my goal of 15,000 steps only twice during that 16-day period. I was transitioning from a baseline goal of 10,000 steps per day to 15,000 steps but wasn’t having much success. My 16-day total was 200,081 steps.
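The same facts can be read off the summarized table instead of the plot. To keep the sketch self-contained, it rebuilds only the last six days printed by tail(day_sum) earlier, so its total covers those six days rather than all 16; applied to the full day_sum, the same two lines would give the 16-day total and every day that met the goal.

```r
# The last six rows of day_sum, as printed by tail(day_sum) earlier
day_sum_tail <- data.frame(
  day = c("11-28", "11-29", "11-30", "12-01", "12-02", "12-03"),
  Total_steps = c(10429, 15245, 13795, 16376, 14539, 13404),
  stringsAsFactors = FALSE
)

goal <- 15000
sum(day_sum_tail$Total_steps)                        # 83788 steps over these six days
day_sum_tail$day[day_sum_tail$Total_steps >= goal]   # "11-29" "12-01"
```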
We can improve the previous graph so that it conveys this information more readily.
d %>% group_by(date) %>%
summarize(Total_steps = sum(steps)) %>%
mutate(target_met = Total_steps >= 15000) %>% ### TRUE when the daily total reaches 15000
ggplot(aes(x = date, ### plot date on x axis
y = Total_steps, ### plot Total_steps on y axis
fill = target_met)) +
geom_bar(stat = "identity") + ### plot data as barplot
geom_hline(yintercept = 15000, ### place horizontal line at 15000 steps
linetype = "dashed") + ### specify type of line
labs(title = "Number of steps per day", ### Provide labels for axis and title
x = "Date",
y = "Steps") +
theme(plot.title = element_text(face = "bold", ### specify font face
size = 17, ### specify font size of title
vjust = 2), ### specify distance of title from plot
axis.title = element_text(size = 15), ### specify font size of axis titles
axis.title.x = element_text(vjust = -2), ### specify distance of label from axis
axis.text = element_text(size = 13)) + ### specify font size of axis text
scale_x_date(breaks = unique(d$date), ### specify axis breaks and labels
labels = c("", "", "", "", "",
"", "", "", "", "",
"", "Nov 29", "",
"Dec 1", "", ""))
Right away we see that the target of 15,000 steps was met on November 29 and December 1.
Let’s find out on which day of the week I had the highest mean number of steps. This time, let’s also change the default look of ggplot. I like the ggplot colors, but from time to time I feel like trying something else. With the help of the ggthemes package, we can do this with little effort.
d %>% ### take the variable d
group_by(date, weekday) %>% ### group by the variable date and weekday
summarise(Total_steps = sum(steps)) %>% ### sum the total number of steps / day
ungroup() %>% ### ungroup
group_by(weekday) %>% ### group by weekday
summarize(Ave_steps = mean(Total_steps)) %>% ### take the mean number of steps/day of the week
ggplot(aes(x = weekday, ### plot the variable weekday on the x axis
y = Ave_steps, ### plot the variable Ave_steps on the y axis
fill = Ave_steps)) + ### reflect the number of steps by the color of the bars
geom_bar(stat = "identity") + ### express data as barplot
theme_hc(bgcolor = "darkunica") + ### Highcharts "dark unica" theme from ggthemes
theme(legend.key.width = unit(2, "cm")) + ### specify legend width
labs(title = "Mean number of steps\n per day of the week") ### Specify title
By altering different aspects of the plot we create a theme that alters the way the information is conveyed. The different elements of the plot contribute to the mood of the plot.
If we want to see the number of steps throughout the day in 15-minute intervals, we can do the following:
d$hrminsec <- format(d$time, "%H:%M:%S") ### create new variable showing only the time of day
hr_labels <- d$hrminsec[grep(":00:00", ### get only time that is an exact hour
d$hrminsec)]
xtick_labels <- substr(hr_labels, 1, 5) ### remove zeroes which stand for seconds
ggplot(d, aes(hrminsec, ### plot the number of steps every
steps, ### 15 minutes
fill = steps)) + ### let color denote the number of steps
geom_bar(stat = "identity") + ### create a bar chart
facet_grid(day~.) + ### show each day in a different panel
xlab("15 minute interval") + ### label x axis
ggtitle("steps by 15 min interval") + ### provide a title
scale_x_discrete(breaks = hr_labels, ### provide breaks for every hour on the x axis
labels = xtick_labels) + ### provide label for axis ticks
theme(legend.position = "bottom", ### put legend at the bottom
legend.direction = "horizontal", ### make legend span horizontally
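axis.text.y = element_text(size = 11), ### specify size of y axis tick labels
axis.text.x = element_text(angle = 50, ### rotate x axis tick labels
size = 15, ### font size of x axis tick labels
vjust = 0.5), ### vertical justification of x axis tick labels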
plot.title = element_text(face = "bold", ### specify dimensions of title
vjust = 2,
size = 20),
axis.title.y = element_text(size = 17, ### specify size and justification of y
vjust = 2), ### axis label
axis.title.x = element_text(size = 17, ### specify size and justification of x
vjust = 0), ### axis label
legend.key.size = unit(1.2, "cm"), ### specify legend key dimensions
legend.text = element_text(size = 15),
legend.title = element_text(size = 17)) +
scale_fill_gradient(low="darkkhaki", high="darkgreen")
To help you read the plot: the x-axis represents the time of day, beginning at midnight at the leftmost tick mark and spanning 24 hours to the rightmost tick mark. Each bar represents the number of steps in one 15-minute period. Labelling every 15-minute tick took too much space, making the labels either too small or overlapping. Luckily, ggplot provides a solution for this problem: I changed the tick labels to show only the hours so that they stay readable.
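The labelling trick can be checked on its own with one day of 15-minute time stamps. This sketch keeps only the on-the-hour slots as axis breaks and drops the seconds for the labels. It uses format() with an explicit "%H:%M:%S" pattern and endsWith() as alternatives to the substr() and grep() calls in the code above; the date and the UTC time zone are arbitrary.

```r
# One day of 15-minute slots (date and time zone are arbitrary)
slots <- seq(as.POSIXct("2016-11-18 00:00:00", tz = "UTC"),
             by = "15 min", length.out = 96)
hrminsec <- format(slots, "%H:%M:%S")                # e.g. "00:15:00"

hr_breaks <- hrminsec[endsWith(hrminsec, ":00:00")]  # on-the-hour slots only
hr_labels <- substr(hr_breaks, 1, 5)                 # "00:00", "01:00", ...

length(hrminsec)   # 96 bars per day
length(hr_breaks)  # 24 labelled ticks
```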
By choosing the right colors and contrast, we can make certain aspects of the data stand out from the rest of the plot. In this plot, the green bars highlight the hours of the day when I was active.
I usually left the office at around 10 am and went back at around 10 pm. That’s right, I was on the graveyard shift. The customers we served were on another continent, in a different time zone.
I distinctly remember that I was taking the Bayesian Statistics course offered by Duke University on Coursera at the time, and the graveyard shift didn’t make learning any easier.
d %>%
group_by(hrminsec) %>%
summarize(Ave_steps = mean(steps)) %>%
ggplot(aes(x = hrminsec,
y = Ave_steps,
fill = Ave_steps)) +
geom_bar(stat = "identity") +
theme_igray() + ### theme_igray from the ggthemes package
scale_fill_gradient_tableau("Red", ### color the bars with a red Tableau gradient
name = "Mean Steps") + ### specify legend title
theme(legend.position = c(0.65, 0.6), ### specify the following:
legend.key.width = unit(1.6, "cm"), ### width of the legend key
legend.text = element_text(size = 13), ### size of the text of the legend
legend.title = element_text(size = 15), ### size of the title of the legend
axis.text.y = element_text(size = 15), ### size of the y axis tick labels
axis.text.x= element_text(angle = 0, ### angle of the x axis text label
size = 15, ### font size of the x axis text label
vjust = 0.5), ### distance of the label from the axis
plot.title = element_text(face = "bold", ### specify dimensions of the title
vjust = 2, ### distance of the title from the plot
size = 18), ### font size of the title
axis.title.y = element_text(size = 17, ### font size of the y axis title
vjust = 2), ### distance of the y axis title from axis
axis.title.x = element_text(size = 17, ### font size of the x axis title
vjust = 0)) + ### distance of x axis title from the axis
labs(title = "Mean number of steps\n throughout the day", ### add title
x = "Steps by 15 minute intervals", ### add x axis label
y = "") + ### make y axis label empty
scale_x_discrete(breaks = hr_labels, labels = xtick_labels) ### specify tick breaks and labels
Wow! That bar really sticks out among the rest. But that’s just me beating the traffic by walking the rest of the way instead of waiting for the bus to reach my stop.
Let’s now turn our attention to the .tcx files that contain the GPS data. The first file contains a run I did around the neighborhood that lasted a little over an hour. Besides geo-positions, the tracker also recorded altitude, time, heart rate, and distance. The variables speed, cadence, and power contain only missing values.
We read in the data using the readTCX function from the trackeR package.
library(trackeR)
am_run <- readTCX(file = "./data/fitbit.tcx", ### read tcx file
timezone = "Asia/Taipei") ### UTC+8, the same offset as Manila
str(am_run)
## 'data.frame': 3638 obs. of 9 variables:
## $ time : POSIXct, format: "2017-01-25 05:36:44" "2017-01-25 05:36:45" ...
## $ latitude : num 14.6 14.6 14.6 14.6 14.6 ...
## $ longitude : num 121 121 121 121 121 ...
## $ altitude : num 77.6 74.5 72.6 70.6 68.8 ...
## $ distance : num 0 0 0 0 0.13 0.49 1.18 1.18 3.39 4.55 ...
## $ heart.rate: num 74 74 74 74 74 74 74 74 74 74 ...
## $ speed : num NA NA NA NA NA NA NA NA NA NA ...
## $ cadence : num NA NA NA NA NA NA NA NA NA NA ...
## $ power : num NA NA NA NA NA NA NA NA NA NA ...
summary(am_run)
## time latitude longitude
## Min. :2017-01-25 05:36:44 Min. :14.61 Min. :121
## 1st Qu.:2017-01-25 05:54:24 1st Qu.:14.61 1st Qu.:121
## Median :2017-01-25 06:11:46 Median :14.62 Median :121
## Mean :2017-01-25 06:10:52 Mean :14.62 Mean :121
## 3rd Qu.:2017-01-25 06:26:54 3rd Qu.:14.62 3rd Qu.:121
## Max. :2017-01-25 06:42:03 Max. :14.62 Max. :121
##
## altitude distance heart.rate speed
## Min. :-23.50 Min. : 0 Min. : 69.0 Min. : NA
## 1st Qu.: 12.60 1st Qu.:1166 1st Qu.: 95.0 1st Qu.: NA
## Median : 20.60 Median :2467 Median :102.0 Median : NA
## Mean : 19.56 Mean :2443 Mean :100.2 Mean :NaN
## 3rd Qu.: 26.80 3rd Qu.:3711 3rd Qu.:107.0 3rd Qu.: NA
## Max. : 77.59 Max. :4960 Max. :122.0 Max. : NA
## NA's :3638
## cadence power
## Min. : NA Min. : NA
## 1st Qu.: NA 1st Qu.: NA
## Median : NA Median : NA
## Mean :NaN Mean :NaN
## 3rd Qu.: NA 3rd Qu.: NA
## Max. : NA Max. : NA
## NA's :3638 NA's :3638
We’ll transform our data frame into a time-series object (class trackeRdata) to make the variables easier to plot, using the trackeRdata function from the trackeR package.
am_run_ts <- trackeRdata(am_run) ### transform dataframe to time series
str(am_run_ts, 2) ### show structure of the object, two levels deep
## List of 1
## $ :'zoo' series from 2017-01-25 05:36:39 to 2017-01-25 06:42:08
## Data: num [1:3659, 1:9] 14.6 14.6 14.6 14.6 14.6 ...
## ..- attr(*, "dimnames")=List of 2
## Index: POSIXct[1:3659], format: "2017-01-25 05:36:39" "2017-01-25 05:36:39" ...
## - attr(*, "operations")=List of 2
## ..$ smooth : NULL
## ..$ threshold: NULL
## - attr(*, "units")='data.frame': 10 obs. of 2 variables:
## ..$ variable: chr [1:10] "latitude" "longitude" "altitude" "distance" ...
## ..$ unit : chr [1:10] "degree" "degree" "m" "m" ...
## - attr(*, "class")= chr [1:2] "trackeRdata" "list"
We can see a summary of my performance by using the summary function.
summary(am_run_ts, movingThreshold = 1) ### create a summary of data
##
## *** Session 1 ***
##
## Session times: 2017-01-25 05:36:39 - 2017-01-25 06:42:08
## Distance: 4960.91 m
## Duration: 1.09 hours
## Moving time: 0.93 hours
## Average speed: 1.26 m_per_s
## Average speed moving: 1.49 m_per_s
## Average pace (per 1 km): 13:11 min:sec
## Average pace moving (per 1 km): 11:12 min:sec
## Average cadence: NA steps_per_min
## Average cadence moving: NA steps_per_min
## Average power: NA W
## Average power moving: NA W
## Average heart rate: 98.84 bpm
## Average heart rate moving: 101.32 bpm
## Average heart rate resting: 84.74 bpm
## Work to rest ratio: 5.6
##
## Moving threshold: 1 m_per_s
The summary function not only summarized variables such as total distance, duration, average speed, and average heart rate; it also combined the data to produce other summaries, such as average heart rate while moving or resting and the work-to-rest ratio.
We can also plot heart rate, distance, and pace.
plot(am_run_ts, what = c("heart.rate", ### plot data
"distance",
"pace"))
Looking at the plot, we can see a series of ups and downs in pace and heart rate. These are due to the many crossroads and the vehicular traffic in the area. It would be better to sustain a steady heart rate and pace.
We can map the route using the plotRoute function from the trackeR package
plotRoute(am_run_ts, ### use plotRoute to map data
zoom = 15, ### specify amount of zoom
source = "google") ### use google map
## Source : https://maps.googleapis.com/maps/api/staticmap?center=14.618019,121.028885&zoom=15&size=640x640&scale=2&maptype=terrain&language=en-EN
or the leaflet function from the leaflet package, which gives us a lot of flexibility over how the map looks.
leaflet(am_run) %>%
addTiles() %>%
addProviderTiles("OpenStreetMap.Mapnik") %>% ### Use openstreetmap
setView(121.0289, 14.61739, zoom = 15) %>% ### set center of map
addPolylines(~longitude, ~latitude) ### plot route
The next two .tcx files contain a session on the stationary bike. Wearing the tracker on the wrist would probably produce errors in the step count, since the tracker’s algorithm was designed to count steps from the swinging of the arms during walking. Because riding a stationary bike does not involve that arm swing, I decided to wear the tracker on my ankle and see what would happen. And since the bike is stationary, we won’t have any use for the GPS data.
The value of the tracker for this exercise is to monitor the heart rate. The longer you can keep your heart rate at a higher level during an exercise session, the more calories you burn. I wanted to find out how high my heart rate would be at the peak of my effort.
bike1 <- readTCX(file = "./data/fitbit2.tcx", ### read tcx file
timezone = "Asia/Taipei") ### UTC+8, the same offset as Manila
bike2 <- readTCX(file = "./data/fitbit3.tcx",
timezone = "Asia/Taipei")
stat_bike <- rbind(bike1, bike2) ### combine the two files
stat_bike_ts <- trackeRdata(stat_bike) ### convert to time series
str(stat_bike_ts, 2)
## List of 1
## $ :'zoo' series from 2017-02-01 05:38:05 to 2017-02-01 06:09:02
## Data: num [1:1604, 1:9] 14.6 14.6 14.6 14.6 14.6 ...
## ..- attr(*, "dimnames")=List of 2
## Index: POSIXct[1:1604], format: "2017-02-01 05:38:05" "2017-02-01 05:38:05" ...
## - attr(*, "operations")=List of 2
## ..$ smooth : NULL
## ..$ threshold: NULL
## - attr(*, "units")='data.frame': 10 obs. of 2 variables:
## ..$ variable: chr [1:10] "latitude" "longitude" "altitude" "distance" ...
## ..$ unit : chr [1:10] "degree" "degree" "m" "m" ...
## - attr(*, "class")= chr [1:2] "trackeRdata" "list"
summary(stat_bike_ts, movingThreshold = 1) ### create a summary of data
##
## *** Session 1 ***
##
## Session times: 2017-02-01 05:38:05 - 2017-02-01 06:09:02
## Distance: 3227.98 m
## Duration: 30.95 mins
## Moving time: 25.55 mins
## Average speed: 1.74 m_per_s
## Average speed moving: 2.11 m_per_s
## Average pace (per 1 km): 9:35 min:sec
## Average pace moving (per 1 km): 7:55 min:sec
## Average cadence: NA steps_per_min
## Average cadence moving: NA steps_per_min
## Average power: NA W
## Average power moving: NA W
## Average heart rate: 136.33 bpm
## Average heart rate moving: 136.2 bpm
## Average heart rate resting: 138.37 bpm
## Work to rest ratio: 4.73
##
## Moving threshold: 1 m_per_s
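The work-to-rest ratio in these summaries appears to be moving time divided by resting time (total duration minus moving time): for the bike session, 25.55 / (30.95 - 25.55) is about 4.73, matching the value above. The sketch below shows that arithmetic, plus how a moving threshold of 1 m/s would split a series of speed readings into moving and resting time. The speed vector and the one-reading-per-second sampling are assumptions for illustration; trackeR’s internals may differ in detail.

```r
# Work-to-rest ratio: time moving divided by time resting
work_to_rest <- function(duration, moving) moving / (duration - moving)

work_to_rest(30.95, 25.55)  # about 4.73, as in the bike summary

# Hypothetical speeds (m/s), one reading per second
speed <- c(0, 0.4, 1.2, 2.0, 2.1, 1.9, 0.8, 0.2, 1.5, 2.2)
moving <- speed > 1                    # moving threshold of 1 m/s
moving_s <- sum(moving)                # 6 seconds moving
work_to_rest(length(speed), moving_s)  # 6 / 4 = 1.5
```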
Because I put maximum effort into the stationary biking session from start to finish, I was able to maintain an average heart rate of 136.33 beats per minute (bpm). Because of that sustained effort, I was only able to keep going for about 30 minutes.
The plot shows that I reached a peak heart rate of about 145 bpm, and the steep rise of the curve shows how quickly I got there.
plot(stat_bike_ts) ### plot heart rate and pace during workout
You can also plot the percentage of time your heart rate stayed within certain ranges (zones).
zone2 <- zones(stat_bike_ts) ### compute time spent in each zone
plot(zone2) ### bar chart of time spent in each zone
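Conceptually, a zones summary bins every heart-rate reading and reports the share of time spent in each bin. Here is a base-R sketch of that idea with cut() and prop.table(); the readings are invented, and the zone boundaries are hypothetical rather than trackeR’s defaults.

```r
# Hypothetical heart-rate readings (bpm), one per second
hr <- c(95, 110, 125, 132, 138, 141, 144, 145, 139, 128)

zone <- cut(hr, breaks = c(0, 120, 130, 140, 150),
            labels = c("<120", "120-130", "130-140", "140-150"),
            right = FALSE)                 # intervals closed on the left: [a, b)
round(100 * prop.table(table(zone)), 1)    # 20, 20, 30 and 30 percent of the time
```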
There are many other useful and interesting functions in the trackeR package, but because we have a limited amount of data, we can’t show them here. You can find the introductory tutorial for the trackeR package at https://cran.r-project.org/web/packages/trackeR/vignettes/TourDetrackeR.html
I was also able to find a .csv file of the data for the same period, with variables that recorded the amount of sleep and the calories burned. With the help of R, we can run a linear regression on these data to help us plan how many calories we’d like to burn in an exercise session. I had to clean the data a bit to make it easier to manipulate in R. One of the .csv files contained two tables.
week <- read.csv("fitbit.csv", ### read csv file in R
stringsAsFactors = FALSE, ### don't convert strings to factor
skip = 1) ### skip first row
week$Date <- mdy(week$Date) ### parse Date
nov17todec4 <- read.csv("fitbit_export_20161206.csv", ### read file
stringsAsFactors = FALSE, ### don't read strings as factors
skip = 22, ### skip the first 22 lines
nrows = 18) ### read 18 rows
nov17todec4_sleep <- read.csv("fitbit_export_20161206.csv", ### read file
stringsAsFactors = FALSE, ### don't read strings as factors
skip = 43, ### skip the first 43 lines
nrows = 19) ### read 19 rows
twoweeks <- cbind(nov17todec4, ### bind the two dataframes from
nov17todec4_sleep[, 2:5]) ### the same csv file by column
threeweeks <- rbind(week[, -c(2:4)], ### bind the two dataframes by row
twoweeks)
num_cols <- c("Calories.Burned", "Steps", ### columns stored as text
"Activity.Calories", "Minutes.Sedentary") ### with comma thousands separators
threeweeks[num_cols] <- lapply(threeweeks[num_cols], ### remove commas and
function(x) as.numeric(gsub(",", "", x))) ### change to class numeric
Below are two plots: the first shows the linear relationship between calories burned and the number of floors climbed, and the second shows the linear relationship between calories burned and the number of steps taken. The blue line is the least-squares regression line.
p1 <- threeweeks %>%
filter(Calories.Burned > 2000) %>% ### remove days with incomplete data
ggplot(aes(x = Floors, ### plot number of floors on x axis
y = Calories.Burned)) + ### plot number of cal burned on y axis
geom_point() + ### represent data by points
geom_smooth(method = "lm") ### show least square line
p2 <- threeweeks %>%
filter(Calories.Burned > 2000) %>%
ggplot(aes(x = Steps, ### plot number of steps on x axis
y = Calories.Burned)) + ### plot number of cal burned on y axis
geom_point() + ### represent data by points
geom_smooth(method = "lm") ### show least square line
grid.arrange(p1, p2) ### show plots in single column
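The blue line is an ordinary least-squares fit, whose slope is cov(x, y) / var(x) and whose intercept makes the line pass through the point of means. The short sketch below checks this against lm() on a few made-up (steps, calories) pairs; the numbers are illustrative, not my tracker data.

```r
# Made-up daily totals, for illustration only
steps    <- c(8000, 10000, 12000, 14000, 16000)
calories <- c(2650, 2800, 2950, 3000, 3250)

slope <- cov(steps, calories) / var(steps)         # 0.07 kcal per step
intercept <- mean(calories) - slope * mean(steps)  # 2090 kcal

fit <- lm(calories ~ steps)
c(by_hand = slope, lm = unname(coef(fit)["steps"]))  # same slope up to rounding
```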
You might be wondering what the practical use is of regressing calories burned on the number of steps or floors. For me, the answer was quite obvious.
I used to work with kids with diabetes, and even for those who had great self-control, there were occasions when they took in more calories than they should. It could be because they joined friends for a chat at a fast-food place with no healthy alternatives, or simply because a Reese’s Peanut Butter Cups commercial flashed on the TV and they couldn’t resist.
Most diabetic kids know that the easiest way to bring their blood sugar down is to compensate with extra insulin. However, do that often enough and you start to gain weight.
Living in a developing country makes you appreciate how precious every drop of insulin is and motivates you to look for other ways to burn those extra calories, especially when mom or dad is on your case about how much insulin you use each month.
Walking an extra block or climbing a couple of flights of stairs at the mall before going home might do the trick. The plots above can help you work out how many steps or flights of stairs you need to burn the extra calories you took in. You can also check the slope of each regression line in R as follows.
threeweeks %>%
filter(Calories.Burned > 2000) %>% ### remove days with incomplete data
lm(data = ., ### Calories burned as dependent variable
Calories.Burned ~ Floors) %>% ### Floors as independent variable
summary() ### show summary of linear regression
##
## Call:
## lm(formula = Calories.Burned ~ Floors, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -443.72 -251.50 10.87 215.37 497.80
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2777.624 118.847 23.372 0.000000000000000541 ***
## Floors 12.172 5.257 2.316 0.0313 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 276.3 on 20 degrees of freedom
## Multiple R-squared: 0.2114, Adjusted R-squared: 0.172
## F-statistic: 5.362 on 1 and 20 DF, p-value: 0.03132
threeweeks %>%
filter(Calories.Burned > 2000) %>% ### remove days with incomplete data
lm(data = ., ### Calories burned as dependent variable
Calories.Burned ~ Steps) %>% ### Steps as independent variable
summary() ### show summary of linear regression
##
## Call:
## lm(formula = Calories.Burned ~ Steps, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -325.87 -224.04 -42.04 198.62 411.64
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2053.50158 355.69424 5.773 0.000012 ***
## Steps 0.07606 0.02773 2.743 0.0125 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 265.2 on 20 degrees of freedom
## Multiple R-squared: 0.2733, Adjusted R-squared: 0.237
## F-statistic: 7.522 on 1 and 20 DF, p-value: 0.01255
Based on the summaries above, each floor I climb burns about 12 extra calories (kcal), and each step burns about 0.076 kcal. Keep in mind that these estimates come from only 22 days, and each model explains only about a fifth to a quarter of the day-to-day variation (R-squared of 0.21 and 0.27). Still, for me that’s very useful! Alas, a tracker costs more than most families with type 1 diabetes in a developing country can afford.
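Turning the two slopes into a plan is a matter of division: extra calories divided by calories per unit of activity. The 200 kcal surplus below is a hypothetical example, and the results inherit all the uncertainty of the fitted slopes.

```r
kcal_per_step  <- 0.07606  # slope from the Steps model above
kcal_per_floor <- 12.172   # slope from the Floors model above

extra_kcal <- 200          # hypothetical calorie surplus to burn off
ceiling(extra_kcal / kcal_per_step)   # 2630 extra steps
ceiling(extra_kcal / kcal_per_floor)  # 17 extra floors
```

Rounding up with ceiling() errs on the side of doing slightly more activity than the point estimate suggests.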
One thing that struck me while working as a call center agent was how many callers were concerned about the number of hours of sleep they were getting. We can plot the sleep data in R.
threeweeks %>%
ggplot(aes(x = Date, ### Plot date on x axis
y = Minutes.Asleep)) + ### plot minutes asleep on y axis
geom_bar(stat = "identity", ### specify stat as identity
fill = "steelblue") + ### specify color of bars
geom_hline(yintercept = mean(threeweeks$Minutes.Asleep), ### show mean as horizontal line
color = "salmon") + ### color line as salmon
geom_hline(yintercept = 480, ### horizontal line at 480 minutes
color = "turquoise") ### color line as turquoise
I was averaging 3.88 hours of sleep (the salmon horizontal line on the plot) during that three-week period, but that figure is inaccurate because on certain days I didn’t wear the tracker while sleeping or had to return it to the company. I fondly remember those rare days when I got a full 8 hours of sleep (the turquoise horizontal line at 480 minutes).
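Nights when the tracker was not worn show up as little or no sleep and pull the average down. One rough fix, sketched below with invented numbers, is to drop nights below a plausibility cutoff before averaging; the 120-minute cutoff is an arbitrary assumption, not a standard.

```r
# Invented minutes asleep; 0 means the tracker was not worn that night
minutes_asleep <- c(410, 0, 385, 0, 0, 450, 360, 20)

mean(minutes_asleep) / 60         # naive average: about 3.4 hours
worn <- minutes_asleep >= 120     # hypothetical plausibility cutoff
mean(minutes_asleep[worn]) / 60   # plausible nights only: about 6.7 hours
```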
Wearable sensors are great motivators for individuals and groups of people to exercise because they provide feedback. They can help set realistic, achievable goals based on past performance and provide a measure of one’s achievement.
Creating custom visualizations in R is a fun way to view the data one has accumulated.
sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7600)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] trackeR_1.0.0 zoo_1.8-1 bindrcpp_0.2.2
## [4] ggthemes_3.4.2 gridExtra_2.3 ggplot2_2.2.1.9000
## [7] leaflet_2.0.0 lubridate_1.7.4 dplyr_0.7.4
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.16 lattice_0.20-35 prettyunits_1.0.2
## [4] png_0.1-7 assertthat_0.2.0 rprojroot_1.3-2
## [7] digest_0.6.15 utf8_1.1.3 mime_0.5
## [10] R6_2.2.2 plyr_1.8.4 backports_1.1.2
## [13] evaluate_0.10.1 pillar_1.2.2 RgoogleMaps_1.4.1
## [16] rlang_0.2.0.9000 progress_1.1.2.9002 lazyeval_0.2.1
## [19] rstudioapi_0.7.0-9000 Matrix_1.2-14 rmarkdown_1.9.8
## [22] labeling_0.3 stringr_1.3.0 selectr_0.4-1
## [25] ansistrings_1.0.0.9000 htmlwidgets_1.2 munsell_0.4.3
## [28] shiny_1.0.5 compiler_3.4.1 httpuv_1.4.1
## [31] pkgconfig_2.0.1 mgcv_1.8-23 htmltools_0.3.6
## [34] tibble_1.4.2 XML_3.98-1.11 crayon_1.3.4
## [37] withr_2.1.2 later_0.7.2 bitops_1.0-6
## [40] grid_3.4.1 nlme_3.1-137 jsonlite_1.5
## [43] xtable_1.8-2 gtable_0.2.0 magrittr_1.5
## [46] scales_0.5.0.9000 cli_1.0.0.9001 stringi_1.1.7
## [49] reshape2_1.4.3 promises_1.0.1 xml2_1.2.0
## [52] rjson_0.2.15 rematch2_2.0.1 tools_3.4.1
## [55] ggmap_2.7.900 glue_1.2.0 hms_0.4.2
## [58] crosstalk_1.0.0 jpeg_0.1-8 yaml_2.1.19
## [61] colorspace_1.3-2 knitr_1.20 bindr_0.1.1