For this project I will be showing how to include data from a national survey in an interactive map in R. Data that include geographic information are sometimes best presented using maps in order to highlight the geographic difference in the results of the survey.
The data we will be using will be the National Oral Health Survey (NOHS) of 2006 conducted by the Department of Education under the direction of Dr. Juan Araojo Jr, Dr. Bella Monse, Dr. Susan Mabunga, Prof Jesus Sarol, and Prof. Dr. Roswitha Heinrich Weltzien.
The study employed a modified, stratified cluster sampling design. For each region, two urban and two rural schools were selected. If you want to find out more about the survey just click on this link.
We will be using the R package Mapview authored by Tim Appelhans, Florian Detsch, Christoph Reudenbach, and Stefan Woellauer to create the interactive map. Interactive maps are great for enlisting user participation and is a great teaching and learning tool. Aside from the visual information that is delivered, the experience of changing the settings according to one’s own interest or predisposition allow the information to be received with greater chance of retention due to its association with a previous memory.
library(dplyr)
library(mapview)
library(raster)
library(tigris)
Shapefiles are files that contain information that can be used to create maps. Different type of administrative maps are available for download at the PhilGIS website, ranging from the country level down to the level of Barangays, the smallest administrative unit in the Philippines. Instead of clicking on the link on the website to download the zipfile, we will be using R to download and unzip the file.
if (!"stormData.csv.bz2" %in% dir("./")) {
fileURL="http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile="stormData.csv.bz2", mode="wb")
}
if (!"stormData.csv" %in% dir("./")) {
bunzip2("stormData.csv.bz2", remove=FALSE)
}download.file("https://tinyurl.com/ybtpkwxz",
destfile = "census.zip", mode = "wb")
unzip("census.zip") # unzip the files
census_de = readr::read_csv2(list.files(pattern = "Gitter.csv"))
To see the content of the zip file we can use the list.files
function in R.
list.files("./regions")
## [1] "Regions.dbf" "Regions.prj" "Regions.sbn" "Regions.sbx"
## [5] "Regions.shp" "Regions.shp.xml" "Regions.shx"
We are interested in the file named Regions.shp
which contains the feature geometry (polygons) and read it in R using the shapefile
function from the raster
package. We will save it in an R object, which we will name regions
. A description of the `regions can be found below.
regions <- shapefile("./regions/Regions.shp")
regions
## class : SpatialPolygonsDataFrame
## features : 17
## extent : 116.9283, 126.6053, 4.58694, 21.07014 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 1
## names : REGION
## min values : Autonomous Region of Muslim Mindanao (ARMM)
## max values : Zamboanga Peninsula (Region IX)
To see the structure of Regions.shp
we can use the str
function.
str(regions, 2)
## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
## ..@ data :'data.frame': 17 obs. of 1 variable:
## ..@ polygons :List of 17
## ..@ plotOrder : int [1:17] 4 12 6 10 16 9 5 8 15 2 ...
## ..@ bbox : num [1:2, 1:2] 116.93 4.59 126.61 21.07
## .. ..- attr(*, "dimnames")=List of 2
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
We see that region is of class SpatialPolygonsDataFrame and contains 5 slots.
the data slot contains the attributes or variables that describe the shapefiles.
the polygons slot contains the list of polygons or shapes from the.shp file.
the plotOrder is the index file that tells the order in which the map is to be plotted.
the bbox contains the limits of the map in terms of latitude and longitude.
the proj4string contains the map projection or the coordinate reference system.
Click on the tab at the far right labelled code to see how to access the content of a slot.
head(regions@data) ###first 6 rows of data
## REGION
## 0 Autonomous Region of Muslim Mindanao (ARMM)
## 1 Bicol Region (Region V)
## 2 CALABARZON (Region IV-A)
## 3 Cagayan Valley (Region II)
## 4 Caraga (Region XIII)
## 5 Central Luzon (Region III)
Shown above are the first 6 rows of the attribute REGION
in the data slot. it contains the names of the different administrative regions of the Philippines. We will save the unique entries of the Region variable in an R object called reg_names. We will be using it later to create the dataframe that will contain the results of the NOHS survey.
reg_names <- unique(regions@data$REGION)
reg_names
## [1] "Autonomous Region of Muslim Mindanao (ARMM)"
## [2] "Bicol Region (Region V)"
## [3] "CALABARZON (Region IV-A)"
## [4] "Cagayan Valley (Region II)"
## [5] "Caraga (Region XIII)"
## [6] "Central Luzon (Region III)"
## [7] "Central Visayas (Region VII)"
## [8] "Cordillera Administrative Region (CAR)"
## [9] "Davao Region (Region XI)"
## [10] "Eastern Visayas (Region VIII)"
## [11] "Ilocos Region (Region I)"
## [12] "MIMAROPA (Region IV-B)"
## [13] "Metropolitan Manila"
## [14] "Northern Mindanao (Region X)"
## [15] "SOCCSKSARGEN (Region XII)"
## [16] "Western Visayas (Region VI)"
## [17] "Zamboanga Peninsula (Region IX)"
Next we create a vector of the results of the survey pertaining to caries or tooth decay by region
I have a copy of the complete report of the NOHS given to me by Dr.Susan Mabunga. The report details the results of the survey according to the administrative regions of the Philippines. I will lift the data from the pdf file and create a dataframe in R containing the variable DMFT and BMI.
DMFT <- c(3.1, 2.5, 2.3, 2.3, 3.3, 2.6, 1.9, 2.9, 4.0, 3.7, 2.2, 2.7, 3.9, 3.5, 3.1, 2.1, 3.3)
BMI <-c(30.0, 31.7, 25.8, 20.3, 32.5, 9.2, 26.9, 18.6, 37.3, 29.1, 25.2, 38.7, 27.6, 31.6, 29.2, 26.1, 37.3)
mydf <- data.frame(list(REGION = reg_names, dmft = DMFT, bmi = BMI))
mydf
## REGION dmft bmi
## 1 Autonomous Region of Muslim Mindanao (ARMM) 3.1 30.0
## 2 Bicol Region (Region V) 2.5 31.7
## 3 CALABARZON (Region IV-A) 2.3 25.8
## 4 Cagayan Valley (Region II) 2.3 20.3
## 5 Caraga (Region XIII) 3.3 32.5
## 6 Central Luzon (Region III) 2.6 9.2
## 7 Central Visayas (Region VII) 1.9 26.9
## 8 Cordillera Administrative Region (CAR) 2.9 18.6
## 9 Davao Region (Region XI) 4.0 37.3
## 10 Eastern Visayas (Region VIII) 3.7 29.1
## 11 Ilocos Region (Region I) 2.2 25.2
## 12 MIMAROPA (Region IV-B) 2.7 38.7
## 13 Metropolitan Manila 3.9 27.6
## 14 Northern Mindanao (Region X) 3.5 31.6
## 15 SOCCSKSARGEN (Region XII) 3.1 29.2
## 16 Western Visayas (Region VI) 2.1 26.1
## 17 Zamboanga Peninsula (Region IX) 3.3 37.3
We just created a dataframe called mydf with the following variables:
REGION - The names of the administrative regions of the Philippines copied from the shapefile.
dmft - the mean number of decayed, missing, and filled teeth among 12 years olds surveyed in each region.
bmi - the mean percent of the population in the region that has a Body Mass Index (BMI) that is below what is considered normal.
I chose to include the dmft variable for this project because I was a dentist before I discovered R programming. I used to extract teeth now I extract bytes. Oral health, particularly caries or tooth decay remains a significant Public Health problem in the Philippines to this day.
I included the BMI data because in a visit to the island of Samar in 2013, I came to an epiphany about the state of malnutrition in our country. For some time before 2013, I have been receiving an invitation to conduct a dental mission in the town of Mondragon, Samar. I finally decided to hop on a bus and acquiesce to the invitation bringing with me my dental instruments and anesthesia.
I also brought with me used school uniforms consisting of white polo shirts and khaki pants, the usual elementary uniform for the Philippines. I solicited these from my fellow parents in my son’s school. I was going to donate them to the local Public School in Mondragon for the children who can’t afford to buy their own uniform. The uniforms I solicited was from children in grade school who were about to cross to High School which has a different set of uniforms. Hence, the parents were glad to give the old unforms away.
We’ll continue the story later. Meanwhile let’s get back to creating our interactive map. We will now merge the mydf dataframe containing the DMFT and BMI information with the SpatialPolygonsDataFrame regions that contain the information how to create the philippine map.
merged_df <- geo_join(regions, mydf, "REGION", "REGION")
We now have an R object called merged_df
, of class SpatialPolygonsDataFrame, that contains the DMFT and BMI data we want to show.
head(merged_df@data)
## REGION
## 0 Autonomous Region of Muslim Mindanao (ARMM)
## 1 Bicol Region (Region V)
## 2 CALABARZON (Region IV-A)
## 3 Cagayan Valley (Region II)
## 4 Caraga (Region XIII)
## 5 Central Luzon (Region III)
## REGION.1 dmft bmi
## 0 Autonomous Region of Muslim Mindanao (ARMM) 3.1 30.0
## 1 Bicol Region (Region V) 2.5 31.7
## 2 CALABARZON (Region IV-A) 2.3 25.8
## 3 Cagayan Valley (Region II) 2.3 20.3
## 4 Caraga (Region XIII) 3.3 32.5
## 5 Central Luzon (Region III) 2.6 9.2
let’s now examine the NOHS results for the island of Luzon in an interactive map.
luzon <- merged_df[merged_df$REGION == "Ilocos Region (Region I)" | merged_df$REGION == "Cagayan Valley (Region II)" | merged_df$REGION == "Central Luzon (Region III)" | merged_df$REGION == "Cordillera Administrative Region (CAR)" | merged_df$REGION == "CALABARZON (Region IV-A)" | merged_df$REGION == "Metropolitan Manila" | merged_df$REGION == "MIMAROPA (Region IV-B)" | merged_df$REGION == "Bicol Region (Region V)", ]
mapview(luzon, ### data we want to plot
color = "springgreen", ### color of the tiles
popup = popupTable(luzon,
zcol = c("REGION",
"dmft", ### variables to display
"bmi")))