---
title: "Introduction to R (Part 3)"
author: "PiBS Professional Skill Set"
output: html_document
---
### Introduction
There are 3 main environments in R to create graphics:
* R base utilities
* lattice package
* ggplot2 package
There is also the powerful "grid" package, along with a couple of interactive packages still in development.
In this tutorial we will concentrate on the popular ggplot2 package.
**Useful Online resources:**
[The R Graph Gallery: https://www.r-graph-gallery.com/](https://www.r-graph-gallery.com/)
[ggplot Quick reference: http://r-statistics.co/ggplot2-cheatsheet.html](http://r-statistics.co/ggplot2-cheatsheet.html)
```{r }
#load package
library(ggplot2)
```
### Epilepsy attacks dataset
Data from a clinical trial of 59 patients with epilepsy (Breslow, 1996).
Thall and Vail reported data from a clinical trial of 59 patients with epilepsy, 31 of whom were
randomized to receive the anti-epilepsy drug Progabide and 28 of whom received a placebo.
Baseline data consisted of the patient's age and the number of epileptic seizures recorded during the 8-week
period prior to randomization.
The response consisted of counts of seizures occurring during the four consecutive
follow-up periods of two weeks each.
* ID - Patient identification number
* Y1 - Number of epilepsy attacks patients have during the first follow-up period
* Y2 - Number of epilepsy attacks patients have during the second follow-up period
* Y3 - Number of epilepsy attacks patients have during the third follow-up period
* Y4 - Number of epilepsy attacks patients have during the fourth follow-up period
* Base - Number of epileptic attacks recorded during the 8-week period prior to randomization
* Age - Age of the patients
* Trt - A factor with levels placebo and progabide, indicating whether the patient received the anti-epilepsy drug Progabide or a placebo
* Ysum - Total number of epilepsy attacks patients have during the four follow-up periods
* Age10 - Age of the patients divided by 10
* Base4 - Variable Base divided by 4
```{r comment=""}
# Read data
dt <- read.csv("http://rcs.bu.edu/classes/FC764/epilepsy.csv")
# Explore the data
str(dt)
head(dt)
```
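The derived columns in the variable list follow directly from their definitions. The chunk below is a minimal sketch of how such columns could be computed in base R. It uses a small made-up data frame (`toy`, hypothetical values, not the trial data) with the same column names; whether the CSV file was built exactly this way is an assumption.

```{r}
# Hypothetical values for illustration only (not taken from epilepsy.csv)
toy <- data.frame(
  Y1 = c(4, 2, 7), Y2 = c(3, 5, 6), Y3 = c(1, 0, 9), Y4 = c(2, 3, 8),
  Base = c(12, 8, 40), Age = c(30, 25, 40)
)
# Total number of attacks over the four follow-up periods
toy$Ysum <- toy$Y1 + toy$Y2 + toy$Y3 + toy$Y4
# Rescaled covariates
toy$Age10 <- toy$Age / 10
toy$Base4 <- toy$Base / 4
toy
```

With the real data, `all(dt$Ysum == dt$Y1 + dt$Y2 + dt$Y3 + dt$Y4)` is one way to check whether the file follows the same definition.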
### ggplot() function
The ggplot2 package works with data frames.
A ggplot2 call is composed of several parts. We will start with three of them:
1. A data set
2. The aesthetic mapping ( aes() )
3. A geometric object ( geom_ )
`ggplot( dataset, aes( x = var1, y = var2) ) + geom_*()`
A plot is built up by adding further layers to an existing plot. The call below only initializes the plot, so it draws the axes without any data:
```{r}
ggplot(dt, aes(x=Base, y=Ysum))
```
Once the plot is initialized we can start adding "geom" layers to it.
To make a simple scatterplot we use geom_point()
```{r}
ggplot(dt, aes(x=Base, y=Ysum)) + geom_point()
```
We can add a fitted line through the points using geom_smooth(). By default it also draws a confidence band around the line:
```{r, warning=FALSE, message=FALSE}
ggplot(dt, aes(x=Base, y=Ysum)) + geom_point() + geom_smooth()
```
If we want to draw a linear regression line, we can specify the *lm* method:
```{r, warning=FALSE, message=FALSE}
ggplot(dt, aes(x=Base, y=Ysum)) + geom_point() + geom_smooth(method="lm")
```
To see all the options available for geom_smooth(), consult the help page for this function (`?geom_smooth`).
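As a minimal sketch of two commonly used `geom_smooth()` options, the chunk below turns off the confidence band with `se = FALSE` and, alternatively, widens it with `level = 0.99`. The `toy` data frame holds made-up values with the same column names as `dt`; replace it with `dt` to apply the options to the trial data.

```{r, warning=FALSE, message=FALSE}
library(ggplot2)
# Hypothetical values for illustration only
toy <- data.frame(Base = c(8, 11, 12, 27, 52, 66),
                  Ysum = c(9, 14, 18, 20, 42, 70))
p <- ggplot(toy, aes(x = Base, y = Ysum)) + geom_point()
# Linear fit without the confidence band
p_line <- p + geom_smooth(method = "lm", se = FALSE)
# Linear fit with a 99% band instead of the default 95%
p_band <- p + geom_smooth(method = "lm", level = 0.99)
p_line
p_band
```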
The graph can be stored in an R object, and additional layers can be added to it later:
```{r, warning=FALSE, message=FALSE}
g <- ggplot(dt, aes(x=Base, y=Ysum))
g1 <- g + geom_point()
g2 <- g1 + geom_smooth(method="lm")
g2
```
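Adding a layer with `+` returns a new plot object and leaves the original untouched, which is why `g`, `g1`, and `g2` can coexist. The sketch below counts the layers stored in each object, using made-up data (`toy` is hypothetical). It assumes that the `layers` element of a ggplot object can be reached with `$`, as in current ggplot2 releases.

```{r}
library(ggplot2)
# Hypothetical values for illustration only
toy <- data.frame(Base = c(8, 11, 12, 27, 52, 66),
                  Ysum = c(9, 14, 18, 20, 42, 70))
p0 <- ggplot(toy, aes(x = Base, y = Ysum))   # initialized plot, no layers yet
p1 <- p0 + geom_point()                      # adds a point layer
p2 <- p1 + geom_smooth(method = "lm")        # adds a smoothing layer
# Number of layers in each object; p0 is not modified by the later additions
layer_counts <- c(p0 = length(p0$layers),
                  p1 = length(p1$layers),
                  p2 = length(p2$layers))
layer_counts
```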
### Specifying the title and axis labels
Let's add a title and labels to our existing plot:
```{r}
g2 + ggtitle("Epilepsy attacks",
subtitle="based on the data from Breslow, 1996") +
xlab("Number of attacks prior to randomization") +
ylab("Number of attacks after randomization")
```
A similar result can be achieved using a single function, labs():
```{r}
g2 + labs(title = "Epilepsy attacks",
caption="Breslow, 1996",
x="Number of attacks prior to randomization",
y="Number of attacks after randomization")
```
### Color and Size of the points
A fixed color and size for the elements in the plot are set through arguments of the related geom, outside aes():
```{r}
g <- ggplot(dt, aes(x=Base, y=Ysum))
g + geom_point(col = "blue", size=3) + geom_smooth(method="lm", col="brown")
```
To change the color based on the value of a categorical variable, map that variable inside the aes() function:
```{r}
g <- ggplot(dt, aes(x=Base, y=Ysum))
g + geom_point(aes(col=Trt), size=3) + geom_smooth(method="lm", col="firebrick")
```
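The difference between setting a color and mapping one matters in practice. The sketch below uses made-up data (`toy` is hypothetical). A constant passed outside `aes()` is used as the literal color. The same constant inside `aes()` is treated as a one-level categorical variable, so it is colored from the default palette and gets a legend. `layer_data()` is used here only to inspect the colors ggplot2 ends up drawing.

```{r}
library(ggplot2)
# Hypothetical values for illustration only
toy <- data.frame(Base = c(8, 11, 12, 27, 52, 66),
                  Ysum = c(9, 14, 18, 20, 42, 70))
p <- ggplot(toy, aes(x = Base, y = Ysum))
# Setting: every point is drawn in blue
p_set <- p + geom_point(col = "blue", size = 3)
# Mapping a constant: "blue" becomes a category label, not a color
p_map <- p + geom_point(aes(col = "blue"), size = 3)
# Colors actually used in each plot
set_cols <- unique(layer_data(p_set)$colour)
map_cols <- unique(layer_data(p_map)$colour)
set_cols
map_cols
```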
We can move the position of the legend with the theme() function:
```{r}
g <- ggplot(dt, aes(x=Base, y=Ysum))
g + geom_point(aes(col=Trt), size=3) + geom_smooth(method="lm", col="firebrick")+
theme( legend.position="bottom")+
labs(color="Treatment") # change legend title
```
Set legend.position to *"none"* if you want to remove the legend completely.
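A minimal sketch of removing the legend, on made-up data (`toy` is hypothetical; use `dt` for the trial data):

```{r}
library(ggplot2)
# Hypothetical values for illustration only
toy <- data.frame(Base = c(8, 11, 12, 27, 52, 66),
                  Ysum = c(9, 14, 18, 20, 42, 70),
                  Trt  = rep(c("placebo", "progabide"), each = 3))
p_nolegend <- ggplot(toy, aes(x = Base, y = Ysum)) +
  geom_point(aes(col = Trt), size = 3) +
  theme(legend.position = "none")   # hide the legend entirely
p_nolegend
```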
#### ggplot2 themes
There are a few themes you can use:
* theme_gray
* theme_bw
* theme_light
* theme_dark
* theme_minimal
* theme_classic
* theme_linedraw
* theme_void
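Each theme is a function that is added to the plot with `+`, like a layer. The sketch below applies two of the themes listed above to made-up data (`toy` is hypothetical). The `base_size` argument, which sets the base font size, is optional.

```{r}
library(ggplot2)
# Hypothetical values for illustration only
toy <- data.frame(Base = c(8, 11, 12, 27, 52, 66),
                  Ysum = c(9, 14, 18, 20, 42, 70),
                  Trt  = rep(c("placebo", "progabide"), each = 3))
p <- ggplot(toy, aes(x = Base, y = Ysum)) +
  geom_point(aes(col = Trt), size = 3)
p_bw  <- p + theme_bw()                     # white background with grid lines
p_min <- p + theme_minimal(base_size = 14)  # minimal theme with larger text
p_bw
p_min
```

A complete theme such as `theme_bw()` replaces earlier `theme()` settings, so add it before calls like `theme(legend.position = "bottom")`.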
Final scatterplot:
```{r}
g <- ggplot(dt, aes(x=Base, y=Ysum)) + geom_point(aes(col=Trt), size=5) + geom_smooth(method="lm", col="firebrick")+
theme( legend.position="bottom")+ ggtitle("Epilepsy attacks",
subtitle="based on the data from Breslow, 1996") +
xlab("Number of attacks prior to randomization") +
ylab("Number of attacks after randomization")+
labs(color="Treatment")
g
```
### Boxplot
```{r}
g<- ggplot(dt, aes(x=Trt, y=Ysum)) + geom_boxplot()
g
```
#### Exercise:
Change the X and Y labels and add a title to the plot.
```{r}
g + geom_jitter(width=0.2)
```
Let's change the size of the title and make it bold:
```{r}
g + geom_jitter(width=0.2)+
ggtitle("Number of attacks vs. Treatment")+
theme ( plot.title=element_text(size=14,face="bold" ) )
```
### Histogram
```{r}
g<- ggplot(dt, aes(x=Age)) + geom_histogram()
g
```
The default number of bins (30) is not necessarily optimal. Let's set the bin width explicitly instead:
```{r}
g<- ggplot(dt, aes(x=Age)) + geom_histogram(binwidth=5)
g
```
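The number of bins can also be set directly with the `bins` argument instead of `binwidth`. A minimal sketch, using made-up ages (`toy` is hypothetical; use `dt` for the trial data):

```{r}
library(ggplot2)
# Hypothetical ages for illustration only
toy <- data.frame(Age = c(18, 21, 22, 25, 26, 28, 30, 31, 33, 36, 40, 42))
h <- ggplot(toy, aes(x = Age)) + geom_histogram(bins = 6)
h
# Number of bars computed for the histogram layer
n_bars <- nrow(layer_data(h))
n_bars
```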
Let's improve the appearance:
```{r}
g<- ggplot(dt, aes(x=Age)) + geom_histogram(binwidth=5, col="black", fill="white")
g
```
We can add a density curve to the plot. The histogram must then be drawn on the density scale rather than as counts:
```{r}
g<-ggplot(dt, aes(x=Age)) +
geom_histogram(binwidth=5, col="black", fill="white", aes(y=after_stat(density))) + # after_stat() replaces the deprecated ..density.. notation
geom_density( alpha=.2, fill="darkorange")
g
```
### A few more useful links:
* Data Visualization with ggplot2: [http://r4ds.had.co.nz/data-visualisation.html](http://r4ds.had.co.nz/data-visualisation.html)
* Graphics for Communication: [http://r4ds.had.co.nz/graphics-for-communication.html](http://r4ds.had.co.nz/graphics-for-communication.html)
* A ggplot2 tutorial with examples: [http://r-statistics.co/](http://r-statistics.co/)