Wearable Body Sensors

Synopsis

I once had the opportunity to work as a customer service agent in a call center. My job was to provide assistance to customers with problems regarding their wearable body sensors. Wearable body sensors or trackers as we commonly refer to them are revolutionizing fitness programs by providing feedback to the user. It provides an assessment of how well you performed today during your exercise routine and even allows you to compare how well you have advanced since you started your program three months ago or even three years ago.

Not only has trackers improved individual exercise programs but also programs that cater to groups of people. More and more companies are seeing the benefit of incorporating a physical fitness program for their employees with some form of incentives based on the feedback from wearable body sensors. Local version of the show “Biggest loser” can easily be orgainzed using trackers to monitor progress and aid in deciding who the winner is. Companies are joining in to keep their workforce healthy, increase productivity, and decrease medical insurance cost for the employee and the company.

Manufacturers of wearable sensors have tapped into the popularity of social media and accomplished what Public Health Programs have sought to do previously, increase exercise frequency among the general population. People across geographical boundaries come together to join in social media events that arrange friendly competitions whose winner is based on who had the most number of steps or climbed the highest number of floors for the week, day, or month.

I was allowed to borrow from the company one tracker for a week at a time. There were several models with different features available. One of the tracker that i borrowed had a GPS (Global Positioning System) Receiver which allows you to track your geo-positions while doing an activity. I thought it was pretty neat!!! I liked running or taking long bike rides on weekends from Quezon City to Antipolo (Philippines). I was even able to reach Lucban, Quezon and visit the Kamay ni Hesus Shrine during a summer break. Join me as I explore the data gathered from the trackers I borrowed using R and have a great time creating visualizations using plots.

What is a tracker

The simplest tracker counts the number of steps that you take. It does this through the help of sensors that monitor changes along the x, y, and z axis during regular body movements. To put it in a simplistic way, trackers that are worn on the wrist monitors the swinging movements of the arm. When you take a step forward with your right leg, your left arm moves naturally backward. The tracker counts the number of times your arm moves backward and forward and records that as the number of steps that you took.

More advanced trackers contain sensors for changes in altitude to determine if you are climbing stairs. Other trackers can monitor heart rate, and even sleep.

Sensors are not the only source of information for trackers. When you create an account at the website of the manufacturer of your tracker, you are asked to provide information like your height, weight, sex, and age. These information are then extended to predict your basal metabolic rate, stride length, and other information.

At the heart of a tracker is an algorithm that collects all the information and process them to provide even more information. Based on your stride length and the number of steps that the sensors registers, it computes the distance you have travelled. And if you are walking upward on an inclined plane, the algorithm add a few more calories burned as it takes more energy to walk upward.

The algorithm is also responsible for the trackers accuracy. We move our arms all the time even when we are just sitting. The algorithm specifies which movements are interpreted as a step and which ones are not.

Some trackers even have algorithms that predict whether you are elevating your heart rate to the desired level to gain the most benefit from the exercise session. Others predict the amount of oxygen that your body consumes during the session. Traditionally, these are data obtained in a controlled environment like a hospital using expensive equipment. Consumers of these technology are mostly highly paid athletes who need to boost their performance to match their salary. The accuracy of these predictions from trackers however, have been questioned but its practical use has not escaped the medical community.

The Data

When I left my job I lost access to my tracker’s account. I tried accessing it but I Couldn’t remember the right password anymore and i don’t have any access to my old company email address. Luckily, I was able to save some of my data in .tcx, .csv, and .RData formats the last time I tried to explore the data while I was still with the company.

The company that owned the tracker offers a nice and easy way for its customers to view their data but i will take this opportunity to exercise my learned R skills and explore the R package TrackeR by Hannah Frick and Ioannis Kosmidis. The package is available in CRAN and has a function that can read .tcx files.

list.files("./data")

##  [1] "5744338855.tcx" "5760817276.tcx" "5793241381.tcx" "5793565236.tcx"
##  [5] "5846949327.tcx" "5846949328.tcx" "fitbit.csv"     "fitbit.tcx"    
##  [9] "fitbit1.RData"  "fitbit2.tcx"    "fitbit3.tcx"

I downloaded the .tcx files from the tracker’s website. Back then I didn’t know about the package TrackeR or it wasn’t available yet. I was disappointed that i couldn’t create customized visualization of the GPS data. We’ll leave the GPS data for now and concentrate on the number of steps that i took per day.

Intraday data

The RData files contained the data I was able to previously download using the R package fitbitScraper by Cory Nissen which is also available in CRAN.

Let’s take a look at the content of the fitbit1.RData file. By now you would have noticed the rectangular tabs labelled as “Code” on the right side of the screen. Clicking on the tabs will display the codes that created the output in R. R is a great statistical program and a lot more. With all the available resources now available, learning R has become a lot easier. The best thing about it is it’s free.

load("./data/fitbit1.RData")  ###Load variables in global environment
ls()                          ###List variables in global environment

##  [1] "cookie"     "d"          "daydf"      "dt"         "fitbit.dt" 
##  [6] "fitbit.pwd" "fitbit.usr" "i"          "iris"       "target"

The variables: cookie, fitbit.dt, fitbit.pwd, fitbit.usr, target, and i were arguments for the functions in fitbitScraper needed to download my data. It doesn’t work now because I no longer have access to my trackers account. I leave it to you to excplore the fitbitScraper package in case you want to use it to download your data.

It turns out that the variable d contains all the data and the rest are just subsets of d.

iris, as most R users know is the name of a popular dataset in R. I guess i did a couple of practice rounds with that data while downloading my Fitbit data. The .tcx files contain the gps data.

library(dplyr)
library(lubridate)
library(leaflet)
library(ggplot2)
library(gridExtra)
library(leaflet)      
library(ggthemes)

We loaded the packages: dplyr, lubridate, leaflet, and ggplot which will help us to manipulate the data and visualize it. Packages allows us to extend the functionality of R in different ways.

str(d)      ###See content of variable d

## 'data.frame':    1536 obs. of  4 variables:
##  $ time     : POSIXct, format: "2016-11-18 00:00:00" "2016-11-18 00:15:00" ...
##  $ steps    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ day      : chr  "11-18" "11-18" "11-18" "11-18" ...
##  $ timestamp: chr  "13:00:00" "13:15:00" "13:30:00" "13:45:00" ...

The R object d is a dataframe that contains 4 variables.

The variable time records the day and the time when the number of steps were recorded.
The variab1e steps contain the number of steps for the various 15 minute periods throughout the day.
The variable day records the day the number of steps were recorded and
the variable timestamp is a housekeeping variable that tells the time when the tracker recorded the steps for the fifteen minute period.

we cans see the first six rows of data below.

head(d)     ###Show first sixs rows of d

##                  time steps   day timestamp
## 1 2016-11-18 00:00:00     0 11-18  13:00:00
## 2 2016-11-18 00:15:00     0 11-18  13:15:00
## 3 2016-11-18 00:30:00     0 11-18  13:30:00
## 4 2016-11-18 00:45:00     0 11-18  13:45:00
## 5 2016-11-18 01:00:00     0 11-18  14:00:00
## 6 2016-11-18 01:15:00     0 11-18  14:15:00

We can also manipulate the data in R in order to show other details with regard to time like days of the week.

d$weekday <- wday(d$time,          ### get weekday from d$time
                  label = TRUE,    ### display as words
                  abbr = TRUE)     ### display abbreviated version

d$date <- date(d$time)             ### create new variable date 
head(d)

##                  time steps   day timestamp weekday       date
## 1 2016-11-18 00:00:00     0 11-18  13:00:00     Fri 2016-11-18
## 2 2016-11-18 00:15:00     0 11-18  13:15:00     Fri 2016-11-18
## 3 2016-11-18 00:30:00     0 11-18  13:30:00     Fri 2016-11-18
## 4 2016-11-18 00:45:00     0 11-18  13:45:00     Fri 2016-11-18
## 5 2016-11-18 01:00:00     0 11-18  14:00:00     Fri 2016-11-18
## 6 2016-11-18 01:15:00     0 11-18  14:15:00     Fri 2016-11-18

Total Number of Steps per day

Let’s summarize our data to reflect the total number of steps per day during that 2 week period.

day_sum <- d %>%                             ### create variable day_sum which
        group_by(day) %>%                    ### summarize the data as the number
        summarize(Total_steps = sum(steps))  ### of steps per day

tail(day_sum)                                ### show last 6 rows of day_sum

## # A tibble: 6 x 2
##   day   Total_steps
##   <chr>       <dbl>
## 1 11-28       10429
## 2 11-29       15245
## 3 11-30       13795
## 4 12-01       16376
## 5 12-02       14539
## 6 12-03       13404

We can appreciate that data much better in a plot.

ggplot(day_sum,                                        ### data = day_sum
       aes(x = day,                                    ### plot day on x axisl
           y = Total_steps,                            ### plot Total_steps on y axis
           fill = Total_steps)) +                      ### Use Totals_steps to color 
        geom_bar(stat = "identity") +                  ### create a bar graph
        geom_hline(yintercept = 15000) +               ### create horizontal line at 15000
        labs(title = "Number of steps per day",        ### Provide labels for axis and title 
            x = " Date (Nov 18 - Dec 3)",
            y = "Steps") +
        theme(plot.title =element_text(face = "bold",  ### specify font face
                                       size = 17,      ### specify font size of title
                                       vjust = 2),     ### specify distance of title from plot
              axis.title = element_text(size = 15),    ### specify font size of axis titles
              axis.title.x = element_text(vjust = -2), ### specify distance of label from axis  
              axis.text = element_text(size = 13)) +   ### specify font size of axis text
        scale_x_discrete(labels = c(18:30, 1:3))

We can see right away that I was only able to reach my goal of 15,000 steps twice during that 16 day period. I was transitioning from a baseline goal of 10,000 steps per day to 15,000 steps but wasn’t meeting much success. I had a 16 day total of 200081 steps.

Prettier plot

We can improve the previous graph such that it conveys the information readily.

d %>% group_by(date) %>%
        summarize(Total_steps = sum(steps)) %>%
        mutate(target_met = Total_steps >= 15000) %>%   ### create a variable which gives
        ggplot(aes(x = date,                            ### the value of true if the number 
                   y = Total_steps,                     ### of steps exceed 15000
                   fill = target_met)) +
        geom_bar(stat = "identity") +                   ### plot data as barplot
        geom_hline(yintercept = 15000,                  ### place horizontal line at 15000 steps 
                   linetype = "dashed") +               ### specify type of line
        labs(title = "Number of steps per day",         ### Provide labels for axis and title 
            x = "Date",
            y = "Steps") +
        theme(plot.title =element_text(face = "bold",   ### specify font face
                                       size = 17,       ### specify font size of title
                                       vjust = 2),      ### specify distance of title from plot
              axis.title = element_text(size = 15),     ### specify font size of axis titles
              axis.title.x = element_text(vjust = -2),  ### specify distance of label from axis  
              axis.text = element_text(size = 13)) +    ### specify font size of axis text
        scale_x_date(breaks = unique(d$date),           ### specify axis breaks and labels
                     labels = c("", "", "", "", "",
                                "", "", "", "", "", 
                                "", "Nov 29", "",
                                "Dec 1", "", ""))

Right away we see that the target of 15000 steps were met on November 29 and Decemeber 1.

Mean Number of Steps per day of the week

Let’s find out which day of the week I had the highest mean number of steps. This time let’s try to change the default output of ggplot. I like the colors in ggplot but from time to time I get that feeling that I want to try something else. With the help of the package ggthemes we can do this with little effort.

d %>%                                                               ### take the variable d
        group_by(date, weekday) %>%                                 ### group by the variable date and weekday
        summarise(Total_steps = sum(steps)) %>%                     ### sum the total number of steps / day
        ungroup() %>%                                               ### ungroup
        group_by(weekday) %>%                                       ### group by weekday
        summarize(Ave_steps = mean(Total_steps)) %>%                ### take the mean number of steps/day of the week
        ggplot(aes(x = weekday,                                     ### plot the variable weekday on the x axis
                   y = Ave_steps,                                   ### plot the variable Ave_steps on the y axis
                   fill = Ave_steps)) +                             ### reflect the number of steps by the color of the bars
        geom_bar(stat = "identity") +                               ### express data as barplot
        theme_hc(bgcolor = "darkunica") +                           ### specify theme
        scale_colour_hc("darkunica") +                              ### specify color of bars
        theme(legend.key.width = unit(2, "cm")) +                   ### specify legend width
        labs(title = "Mean number of steps\n per day of the week")  ### Specify title

By altering different aspects of the plot we create a theme that alters the way the information is conveyed. The different elements of the plot contribute to the mood of the plot.

Steps throughout the day

If we want to see number of steps throughout the day by 15 min intervals we can:

d$hrminsec <- substr(d$time, 12, 19)                     ### create new variable showing only time
hr_labels <- d$hrminsec[grep(":00:00",                   ### get only time that is an exact hour
                             d$hrminsec)]
xtick_labels <- substr(hr_labels, 1, 5)                  ### remove zeroes which stand for seconds
ggplot(d, aes(hrminsec,                                  ### plot the number of steps every
              steps,                                     ### 15 minutes
              fill = steps)) +                           ### let color denote the number of steps
        geom_bar(stat = "identity") +                    ### create a bar chart 
        facet_grid(day~.) +                              ### show each day in a different panel
        xlab("15 minute interval") +                     ### label x axis
        ggtitle("steps by 15 min interval") +            ### provide a title
        scale_x_discrete(breaks = hr_labels,             ### provide breaks for every hour on the x axis
                         labels = xtick_labels) +        ### provide label for axis ticks
        theme(legend.position = "bottom",                ### put legend at the bottom
              legend.direction = "horizontal",           ### make legend span horizontally
              axis.text.y = element_text(size = 11),     ### specify size ofy axis tick labels
              axis.text.x= element_text(angle = 50,
                                        size = 15,
                                        vjust = 0.5),
              plot.title = element_text(face = "bold",   ### specify dimensions of title
                                        vjust = 2,
                                        size = 20),
              axis.title.y = element_text(size = 17,     ### specify size and justification of y 
                                          vjust = 2),    ### axis label
              axis.title.x = element_text(size = 17,     ### specify size and justification of x
                                          vjust = 0),    ### specify size and justification of y
              legend.key.size = unit(1.2, "cm"),         ### specify legend key dimensions
              legend.text = element_text(size = 15),    
              legend.title = element_text(size = 17)) +
        scale_fill_gradient(low="darkkhaki", high="darkgreen")

To help you understand the plot, the y-axis represents the time beginning at 12 midnight at the left most tick mark and ending a period of 24 hours at the rightmost tick mark. Each bar represents the number of steps at each 15 minute period throughout the day. The tick mark labels for every 15 seconds took too much space making the size of the letters too small or overlap. Luckily, ggplot has a provided a solution for this particular problem. I changed the labels on the tick marks to reflect only hours to make the labels readable.

By choosing the right color and contrast we can highlight certain aspect of the data to stick out of the rest of the plot. iIn this plot, the green bars highlight which hour of the day i was active.

I usually left the office at around 10 am and went back around 10 pm. That’s right I was in the graveyard shift. The customers we catered to were from another continent in a different time zone.

I remember distictly that I was taking the course Bayesian Statistics course offered by the Duke University in Coursera at the time and the graveyard shift didn’t help any to ease learning.

Mean number of steps throughout the day

d %>%
        group_by(hrminsec) %>%
        summarize(Ave_steps = mean(steps)) %>%
        ggplot(aes(x = hrminsec,
                   y = Ave_steps,
                   fill = Ave_steps)) +
        geom_bar(stat = "identity") +
        theme_igray() +                                              ### specify theme as theme-igray from the
        scale_fill_gradient_tableau("Red",                           ### package ggthemes. Specify color of the 
                                    name = "Mean Steps") +           ### bars as red. specify legend title
        theme(legend.position = c(0.65, 0.6),                        ### specify the following:
              legend.key.width = unit(1.6, "cm"),                    ### width of the the legend key
              legend.text = element_text(size = 13),                 ### size of the text of the legend
              legend.title = element_text(size = 15),                ### size of the title of the legend
              axis.text.y = element_text(size = 15),                 ### size of the y axis tick labels
              axis.text.x= element_text(angle = 0,                   ### angle of the x axis text label
                                        size = 15,                   ### font size of the x axis text label
                                        vjust = 0.5),                ### distance of the label from the axis
              plot.title = element_text(face = "bold",               ### specify dimensions of the title
                                        vjust = 2,                   ### distance of the title from the plot
                                        size = 18),                  ### font size of the title
              axis.title.y = element_text(size = 17,                 ### font size of the y axis title
                                          vjust = 2),                ### distance of the y axis title from axis
              axis.title.x = element_text(size = 17,                 ### font size of the x axis title
                                          vjust = 0)) +              ### distance of x axis title from the axis
        labs(title = "Mean number of steps\n throughout the day",    ### add title
             x = "Steps by 15 minute intervals",                     ### add x axis label
             y = "") +                                               ### make y axis label empty
        scale_x_discrete(breaks = hr_labels, labels = xtick_labels)  ### specify tick breaks and labels

Wow!!! That bar seems to stick out among the rest. But, that’s just me overtaking the traffic by walking the rest of the way instead of waiting for the bus to reach my stop.

GPS data

Let’s turn our attention now to the .tcx files that contain the GPS data. The first file contains a run that I did around the neighborhood lasting for a little over an hour. Aside from providing information about the geo-positions, the tracker also provides information about altitude, time, heart rate, and distance. The variables speed, cadence, and power contained only missing values.

Read in the data

We read in the data using the readTCX function from the trackeR package.

library(trackeR)
am_run <- readTCX(file = "./data/fitbit.tcx",     ### read tcx file
                  timezone = "Asia/Taipei")       ### use asia timezone
str(am_run)

## 'data.frame':    3638 obs. of  9 variables:
##  $ time      : POSIXct, format: "2017-01-25 05:36:44" "2017-01-25 05:36:45" ...
##  $ latitude  : num  14.6 14.6 14.6 14.6 14.6 ...
##  $ longitude : num  121 121 121 121 121 ...
##  $ altitude  : num  77.6 74.5 72.6 70.6 68.8 ...
##  $ distance  : num  0 0 0 0 0.13 0.49 1.18 1.18 3.39 4.55 ...
##  $ heart.rate: num  74 74 74 74 74 74 74 74 74 74 ...
##  $ speed     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cadence   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ power     : num  NA NA NA NA NA NA NA NA NA NA ...

summary(am_run)

##       time                        latitude       longitude  
##  Min.   :2017-01-25 05:36:44   Min.   :14.61   Min.   :121  
##  1st Qu.:2017-01-25 05:54:24   1st Qu.:14.61   1st Qu.:121  
##  Median :2017-01-25 06:11:46   Median :14.62   Median :121  
##  Mean   :2017-01-25 06:10:52   Mean   :14.62   Mean   :121  
##  3rd Qu.:2017-01-25 06:26:54   3rd Qu.:14.62   3rd Qu.:121  
##  Max.   :2017-01-25 06:42:03   Max.   :14.62   Max.   :121  
##                                                             
##     altitude         distance      heart.rate        speed     
##  Min.   :-23.50   Min.   :   0   Min.   : 69.0   Min.   : NA   
##  1st Qu.: 12.60   1st Qu.:1166   1st Qu.: 95.0   1st Qu.: NA   
##  Median : 20.60   Median :2467   Median :102.0   Median : NA   
##  Mean   : 19.56   Mean   :2443   Mean   :100.2   Mean   :NaN   
##  3rd Qu.: 26.80   3rd Qu.:3711   3rd Qu.:107.0   3rd Qu.: NA   
##  Max.   : 77.59   Max.   :4960   Max.   :122.0   Max.   : NA   
##                                                  NA's   :3638  
##     cadence         power     
##  Min.   : NA    Min.   : NA   
##  1st Qu.: NA    1st Qu.: NA   
##  Median : NA    Median : NA   
##  Mean   :NaN    Mean   :NaN   
##  3rd Qu.: NA    3rd Qu.: NA   
##  Max.   : NA    Max.   : NA   
##  NA's   :3638   NA's   :3638

Convert data frame to time series data

We’ll tranform our data frame to a time series data to better plot the variables. We’ll use the function trackeRdata from the trackeR package.

am_run_ts <- trackeRdata(am_run)    ### transform dataframe to time series
str(am_run_ts, 2)                   ### show dimension of data

## List of 1
##  $ :'zoo' series from 2017-01-25 05:36:39 to 2017-01-25 06:42:08
##   Data: num [1:3659, 1:9] 14.6 14.6 14.6 14.6 14.6 ...
##   ..- attr(*, "dimnames")=List of 2
##   Index:  POSIXct[1:3659], format: "2017-01-25 05:36:39" "2017-01-25 05:36:39" ...
##  - attr(*, "operations")=List of 2
##   ..$ smooth   : NULL
##   ..$ threshold: NULL
##  - attr(*, "units")='data.frame':    10 obs. of  2 variables:
##   ..$ variable: chr [1:10] "latitude" "longitude" "altitude" "distance" ...
##   ..$ unit    : chr [1:10] "degree" "degree" "m" "m" ...
##  - attr(*, "class")= chr [1:2] "trackeRdata" "list"

Summary

We can see a summary of my performance by using the summary function.

summary(am_run_ts, movingThreshold = 1)     ### create a summary of data

## 
##  *** Session 1 ***
## 
##  Session times: 2017-01-25 05:36:39 - 2017-01-25 06:42:08 
##  Distance: 4960.91 m 
##  Duration: 1.09 hours 
##  Moving time: 0.93 hours 
##  Average speed: 1.26 m_per_s 
##  Average speed moving: 1.49 m_per_s 
##  Average pace (per 1 km): 13:11 min:sec
##  Average pace moving (per 1 km): 11:12 min:sec
##  Average cadence: NA steps_per_min 
##  Average cadence moving: NA steps_per_min 
##  Average power: NA W 
##  Average power moving: NA W 
##  Average heart rate: 98.84 bpm 
##  Average heart rate moving: 101.32 bpm 
##  Average heart rate resting: 84.74 bpm 
##  Work to rest ratio: 5.6 
## 
##  Moving threshold: 1 m_per_s

The summary function not only provided a summary of the variables like: total distance, duration, average speed and average heart rate, It also combined or extended the data to come up with other summaries such as average heart rate when moving or resting and work to rest ratio.

Plotting Heartbeat and pace

We can also plot heartbeat and pace.

plot(am_run_ts, what = c("heart.rate",       ### plot data
                         "distance",
                         "pace"))

Looking at the plot we can see a series of up and down movement in pace and heartbeat. This is due to the many crossroads and vehicular traffic in the area. It would be better if we have a sustained level of heart rate and pace.

Mapping the run

We will use the plotRoute function from the package TrackeR

plotRoute(am_run_ts,               ### use plotRoute to map data
          zoom = 15,               ### specify amount of zoom
          source = "google")       ### use google map

## Source : https://maps.googleapis.com/maps/api/staticmap?center=14.618019,121.028885&zoom=15&size=640x640&scale=2&maptype=terrain&language=en-EN

or the leaflet function from the package leaflet which gives us a lot of flexibility on how our plot should look

leaflet(am_run) %>%                                           
                addTiles() %>% 
                addProviderTiles("OpenStreetMap.Mapnik") %>%    ### Use openstreetmap
                setView(121.0289, 14.61739, zoom = 15) %>%      ### set center of map
                addPolylines(~longitude, ~latitude)             ### plot route

The next .tcx files contain a session on the stationary bike. Wearing the tracker on the wrist will probably result in errors in the number of steps counted since the algorithm contained in the tracker was designed to monitor the swinging of arms during walking, which in turn gives the number of steps. And since riding a stationary bike does not mimic the swinging movement of the arms, I decided to wear the tracker on my ankle and see what would happen. Since the bike is also stationary, we won’t have any use for GPS data.

Stationary bike data

The value of the tracker for this exercise is to monitor the heart rate. The longer you can keep your heart rate at a higher level during an exercise session, the more calories you burn. I wanted to find out how high my heart rate would be at the peak of my effort.

bike1 <- readTCX(file = "./data/fitbit2.tcx",  ### read tcx file
                 timezone = "Asia/Taipei")     ### use asia time zone
bike2 <- readTCX(file = "./data/fitbit3.tcx",  
                 timezone = "Asia/Taipei")
stat_bike <- rbind(bike1, bike2)               ### combine the two files
stat_bike_ts <- trackeRdata(stat_bike)         ### convert to time series
str(stat_bike_ts, 2)

## List of 1
##  $ :'zoo' series from 2017-02-01 05:38:05 to 2017-02-01 06:09:02
##   Data: num [1:1604, 1:9] 14.6 14.6 14.6 14.6 14.6 ...
##   ..- attr(*, "dimnames")=List of 2
##   Index:  POSIXct[1:1604], format: "2017-02-01 05:38:05" "2017-02-01 05:38:05" ...
##  - attr(*, "operations")=List of 2
##   ..$ smooth   : NULL
##   ..$ threshold: NULL
##  - attr(*, "units")='data.frame':    10 obs. of  2 variables:
##   ..$ variable: chr [1:10] "latitude" "longitude" "altitude" "distance" ...
##   ..$ unit    : chr [1:10] "degree" "degree" "m" "m" ...
##  - attr(*, "class")= chr [1:2] "trackeRdata" "list"

Summary data

summary(stat_bike_ts, movingThreshold = 1)     ### create a summary of data

## 
##  *** Session 1 ***
## 
##  Session times: 2017-02-01 05:38:05 - 2017-02-01 06:09:02 
##  Distance: 3227.98 m 
##  Duration: 30.95 mins 
##  Moving time: 25.55 mins 
##  Average speed: 1.74 m_per_s 
##  Average speed moving: 2.11 m_per_s 
##  Average pace (per 1 km): 9:35 min:sec
##  Average pace moving (per 1 km): 7:55 min:sec
##  Average cadence: NA steps_per_min 
##  Average cadence moving: NA steps_per_min 
##  Average power: NA W 
##  Average power moving: NA W 
##  Average heart rate: 136.33 bpm 
##  Average heart rate moving: 136.2 bpm 
##  Average heart rate resting: 138.37 bpm 
##  Work to rest ratio: 4.73 
## 
##  Moving threshold: 1 m_per_s

Because I poured all my effort from start to finish of the stationary biking session I was able to maintain an average heart rate of 136.33 beats per mnute (bpm). Because of the sustained best effort, i was only able to keep going for 30 mminutes.

Plotting heart rate and pace

The plot shows that i was able to reach a peak heart rate of about 145 (bpm) and the steep incline of the plot showed how fast I achieved the peak heart rate

plot(stat_bike_ts)  ### plot heart rate and pace during workout

You can also plot the percentage of the time you were able to maintain your heart beat at a certain range.

zone2 <- zones(stat_bike_ts)  ### create bar chart of heart rate and speed
plot(zone2)

There are so many other useful and interesting functions in the TrackeR package but because we have a limited amount of data, we are unable to show them here. You can find the intoructory tutorial for the package trackeR at this URL https://cran.r-project.org/web/packages/trackeR/vignettes/TourDetrackeR.html

Linear Regression

I was able find a .csv file of the data for the same period above with variables that recorded the amount of sleep and calories burned. With the help of R we can do linear regression with our data to help us plan the amount of calories we’d like to burn in an exercise session. I had to clean the data a bit to make it more suitable for manipulating in R. One of the csv file contained two dataframes.

week <- read.csv("fitbit.csv",                                             ### read csv file in R
                 stringsAsFactors = FALSE,                                 ### don't convert strings to factor
                 skip = 1)                                                 ### skip first row
week$Date <- mdy(week$Date)                                                ###  parse Date

nov17todec4 <- read.csv("fitbit_export_20161206.csv",                      ### read file
                        stringsAsFactors = FALSE,                          ### don't read strings as factors
                        skip = 22,                                         ### skip to row 22
                        nrows = 18)                                        ### read 18 rows

nov17todec4_sleep <- read.csv("fitbit_export_20161206.csv",                ### read file
                              stringsAsFactors = FALSE,                    ### don't read strings as factors
                              skip = 43,                                   ### skip to row 43
                              nrows = 19)                                  ### read 19 rows

twoweeks <- cbind(nov17todec4,                                             ### bind the two dataframes from
                  nov17todec4_sleep[, 2:5])                                ### the same csv file by column

threeweeks <- rbind(week[, -c(2:4)],                                       ### bind the two dataframes by row
                    twoweeks)                                              
threeweeks$Calories.Burned <- gsub(",",                                    ### remove commas
                                   "",                    
                                   threeweeks$Calories.Burned)
threeweeks$Steps <- gsub(",",                                              ### remove commas
                         "",
                         threeweeks$Steps)              
threeweeks$Activity.Calories <- gsub(",",                                  ### remove commas
                                     "",
                                     threeweeks$Activity.Calories)
threeweeks$Minutes.Sedentary <- gsub(",",                                  ### remove commas
                                     "",
                                     threeweeks$Minutes.Sedentary)
threeweeks$Calories.Burned <- as.numeric(threeweeks$Calories.Burned)       ### change to class numeric
threeweeks$Steps <- as.numeric(threeweeks$Steps)                           ### change to class numeric
threeweeks$Minutes.Sedentary <- as.numeric(threeweeks$Minutes.Sedentary)   ### change to class numeric
threeweeks$Activity.Calories <- as.numeric(threeweeks$Activity.Calories)   ### change to class numeric

Below are the plots showing the linear relationship between calories burned and the number of floors climbed and below that another plot showing the linear relationship between calories burned and the number of steps taken. The blue line represents the least square line.

p1 <- threeweeks %>%                            
        filter(Calories.Burned > 2000) %>%     ### remove days with incomplete data
        ggplot(aes(x = Floors,                 ### plot number of floors on x axis
                   y = Calories.Burned)) +     ### plot number of cal burned on y axis
        geom_point() +                         ### represent data by points
        geom_smooth(method = "lm")             ### show least square line

p2 <- threeweeks %>%
        filter(Calories.Burned > 2000) %>%
        ggplot(aes(x = Steps,                  ### plot number of stepss on x axis
                   y = Calories.Burned)) +     ### plot number of cal burned on y axis
        geom_point() +                         ### represent data by points
        geom_smooth(method = "lm")             ### show least square line

grid.arrange(p1, p2)                           ### show plots in single column

You might be wondering what’s the practical use of doing a regression on the number of steps or floors on the amount of calories burned? For me the answer was quite obvious.

I used to work with kids with diabetes and even for those who had great self-control, occasions arise when they take in more calories than they should. It could be because of circumstances like joining your friends in a friendly chat at a fast food place where there are no healthy alternatives or simply because a commercial of Reeses Chocolate cups flashed on the tv and you couldn’t resist.

Most diabetic kids know that the easiest way to reduce the blood sugar would be to compensate with an additional volume of insulin. Howerver, you do that often enough and you start to gain weight.

Living in a developing country can make you appreciate how precious every drop of insulin and motivate you to search for other alternatives on how to burn those extra calories. Especially when mom or dad is on your case because of the amount of insulin you consume per month.

Walking an extra block or climbing a couple of flights of stairs in the mall before going home might do the trick. The plots above can help you determine how many steps or flights of stairs you need to do in order to burn the extra calories you took. You can also check the slope of the regression line by doing the following in R.

threeweeks %>%
        filter(Calories.Burned > 2000) %>%        ### remove days with incomplete data
        lm(data = .,                              ### Calories burned as dependent variable
           Calories.Burned ~ Floors) %>%          ### Floors as independent variable
        summary()                                 ### show summary of linear regression

## 
## Call:
## lm(formula = Calories.Burned ~ Floors, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -443.72 -251.50   10.87  215.37  497.80 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept) 2777.624    118.847  23.372 0.000000000000000541 ***
## Floors        12.172      5.257   2.316               0.0313 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 276.3 on 20 degrees of freedom
## Multiple R-squared:  0.2114, Adjusted R-squared:  0.172 
## F-statistic: 5.362 on 1 and 20 DF,  p-value: 0.03132

threeweeks %>%
        filter(Calories.Burned > 2000) %>%        ### remove days with incomplete data
        lm(data = .,                              ### Calories burned as dependent variable
           Calories.Burned ~ Steps) %>%           ### Steps as independent variable
        summary()                                 ### Calories burned as dependent variable

## 
## Call:
## lm(formula = Calories.Burned ~ Steps, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -325.87 -224.04  -42.04  198.62  411.64 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2053.50158  355.69424   5.773 0.000012 ***
## Steps          0.07606    0.02773   2.743   0.0125 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 265.2 on 20 degrees of freedom
## Multiple R-squared:  0.2733, Adjusted R-squared:  0.237 
## F-statistic: 7.522 on 1 and 20 DF,  p-value: 0.01255

Based on the summary above, for every floor I climb, I will burn about 12 calories. In terms of steps, 1 step is equivalent to 0.076 calories burned. For me that’s very usefull! Alas, the cost of a tracker is quite prohibitive for most families with type 1 diabetes in a developing country.

Sleep

One thing that struck me while working as a call center agent was the volume of callers who were concerned about the number of hours of sleep they were getting. We can create a plot of the variable sleep in R.

threeweeks %>%
        ggplot(aes(x = Date,                                     ### Plot date on x axis
                   y = Minutes.Asleep)) +                        ### Plot minutes on y axis
        geom_bar(stat = "identity",                              ### specify stat as identity
                 fill = "steelblue") +                           ### specify color of bars
        geom_hline(yintercept = mean(threeweeks$Minutes.Asleep), ### show mean as horizontal line
                   color = "salmon") +                           ### color line as salmon
        geom_hline(yintercept = 480,                             ### horizontal line at 480 minutes
                   color = "turquoise")                          ### color line as turquoise

I was averaging 3.88 hours of sleep (salmon colored horizontal line on the plot) during that three week period but that is inaccurate due to certains days when i failed to wear the tracker while sleeping or had to return the tracker to the company. I remember fondly those rare days when i could get a full 8 hours of sleep (turquoise colored horizontal line on the plot).

Conclusion

Wearable sensors are great motivators for individuals and groups of people to exercise by providing feedback. It can help set goals that are realistic and realizable based on past performance and provide a measure of one’s achievement.

Creating custom visualization in R is a fun way to view the data one has accumulated.

sessionInfo()

## R version 3.4.1 (2017-06-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7600)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] trackeR_1.0.0      zoo_1.8-1          bindrcpp_0.2.2    
## [4] ggthemes_3.4.2     gridExtra_2.3      ggplot2_2.2.1.9000
## [7] leaflet_2.0.0      lubridate_1.7.4    dplyr_0.7.4       
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16           lattice_0.20-35        prettyunits_1.0.2     
##  [4] png_0.1-7              assertthat_0.2.0       rprojroot_1.3-2       
##  [7] digest_0.6.15          utf8_1.1.3             mime_0.5              
## [10] R6_2.2.2               plyr_1.8.4             backports_1.1.2       
## [13] evaluate_0.10.1        pillar_1.2.2           RgoogleMaps_1.4.1     
## [16] rlang_0.2.0.9000       progress_1.1.2.9002    lazyeval_0.2.1        
## [19] rstudioapi_0.7.0-9000  Matrix_1.2-14          rmarkdown_1.9.8       
## [22] labeling_0.3           stringr_1.3.0          selectr_0.4-1         
## [25] ansistrings_1.0.0.9000 htmlwidgets_1.2        munsell_0.4.3         
## [28] shiny_1.0.5            compiler_3.4.1         httpuv_1.4.1          
## [31] pkgconfig_2.0.1        mgcv_1.8-23            htmltools_0.3.6       
## [34] tibble_1.4.2           XML_3.98-1.11          crayon_1.3.4          
## [37] withr_2.1.2            later_0.7.2            bitops_1.0-6          
## [40] grid_3.4.1             nlme_3.1-137           jsonlite_1.5          
## [43] xtable_1.8-2           gtable_0.2.0           magrittr_1.5          
## [46] scales_0.5.0.9000      cli_1.0.0.9001         stringi_1.1.7         
## [49] reshape2_1.4.3         promises_1.0.1         xml2_1.2.0            
## [52] rjson_0.2.15           rematch2_2.0.1         tools_3.4.1           
## [55] ggmap_2.7.900          glue_1.2.0             hms_0.4.2             
## [58] crosstalk_1.0.0        jpeg_0.1-8             yaml_2.1.19           
## [61] colorspace_1.3-2       knitr_1.20             bindr_0.1.1

Wearable Body Sensors

Edmund Julian Ofilada

March 13, 2018

Wearable Body Sensors

Synopsis

What is a tracker

The Data

Intraday data

Total Number of Steps per day

Prettier plot

Mean Number of Steps per day of the week

Steps throughout the day

Mean number of steps throughout the day

GPS data

Read in the data

Convert data frame to time series data

Summary

Plotting Heartbeat and pace

Mapping the run

Stationary bike data

Summary data

Plotting heart rate and pace

Linear Regression

Sleep

Conclusion