---
title: "Introduction to R (Part 3)"
author: "PiBS Professional Skill Set"
output: html_document
---

### Introduction

There are 3 main environments in R for creating graphics:

* R base utilities
* lattice package
* ggplot2 package

There is also the powerful "grid" package, as well as a couple of interactive graphics packages that are still in development. In this tutorial we will concentrate on the popular ggplot2 package.

**Useful online resources:**

[The R Graph Gallery: https://www.r-graph-gallery.com/](https://www.r-graph-gallery.com/)

[ggplot Quick reference: http://r-statistics.co/ggplot2-cheatsheet.html](http://r-statistics.co/ggplot2-cheatsheet.html)

```{r}
# Load the package
library(ggplot2)
```

### Epilepsy attacks dataset

The data come from a clinical trial of 59 patients with epilepsy (Breslow, 1996). Thall and Vail reported data from this trial: 31 of the patients were randomized to receive the anti-epilepsy drug Progabide, and 28 received a placebo.
Baseline data consisted of each patient's age and the number of epileptic seizures recorded during the 8-week period prior to randomization.
The response consisted of counts of seizures occurring during four consecutive follow-up periods of two weeks each.

* ID - Patient identification number
* Y1 - Number of epilepsy attacks during the first follow-up period
* Y2 - Number of epilepsy attacks during the second follow-up period
* Y3 - Number of epilepsy attacks during the third follow-up period
* Y4 - Number of epilepsy attacks during the fourth follow-up period
* Base - Number of epileptic attacks recorded during the 8-week period prior to randomization
* Age - Age of the patient
* Trt - a factor with levels placebo and progabide indicating whether the patient received the anti-epilepsy drug Progabide
* Ysum - Total number of epilepsy attacks during the four follow-up periods
* Age10 - Age of the patient divided by 10
* Base4 - Variable Base divided by 4

```{r comment=""}
# Read the data. Since R 4.0.0, read.csv() keeps text columns as character,
# so we ask for factors explicitly to make Trt a factor as described above
dt <- read.csv("http://rcs.bu.edu/classes/FC764/epilepsy.csv", stringsAsFactors = TRUE)

# Explore the data
str(dt)
head(dt)
```
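The derived columns follow directly from the definitions in the list above. As a quick consistency check, the sketch below recomputes Ysum, Age10 and Base4 from their definitions. It uses a small made-up data frame (toy: three hypothetical patients with invented counts, not values from the trial) so that it runs without downloading the CSV, and check_derived() is a hypothetical helper written only for this illustration. Dividing the 8-week baseline count by 4 puts Base4 on the same two-week scale as Y1 to Y4.

```r
# Made-up counts for three hypothetical patients (NOT data from the trial)
toy <- data.frame(
  Y1   = c(4, 0, 7),
  Y2   = c(2, 1, 6),
  Y3   = c(3, 0, 8),
  Y4   = c(1, 2, 5),
  Base = c(20, 8, 44),
  Age  = c(28, 35, 41)
)

# Derived variables, computed from their definitions
toy$Ysum  <- with(toy, Y1 + Y2 + Y3 + Y4)  # total over the four 2-week periods
toy$Age10 <- toy$Age / 10
toy$Base4 <- toy$Base / 4                  # 8-week baseline rescaled to 2 weeks

# Hypothetical helper: TRUE if the derived columns match their definitions
check_derived <- function(d) {
  with(d, all(Ysum == Y1 + Y2 + Y3 + Y4) &&
          isTRUE(all.equal(Age10, Age / 10)) &&
          isTRUE(all.equal(Base4, Base / 4)))
}

toy
check_derived(toy)
```

Running check_derived(dt) on the downloaded data applies the same check; a FALSE result would point to a column that does not follow its definition exactly (for example, because of rounding).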

### ggplot() function

The ggplot2 package works with data frames. A ggplot2 call is composed of several parts. We will start with three of them:

1. A data set
2. The aesthetic mapping ( aes() )
3. A geometric object ( geom_ )

```
ggplot(dataset, aes(x = var1, y = var2)) + geom_*()
```

The plot is built by adding further layers to the existing plot with the + operator.

```{r}
ggplot(dt, aes(x=Base, y=Ysum))
```

Once the plot is initialized, we can start adding "geom" layers to it. To make a simple scatterplot we use geom_point():

```{r}
ggplot(dt, aes(x=Base, y=Ysum)) +
  geom_point()
```

We can add a smoothed line through the points using geom_smooth(). By default it fits a loess curve (for fewer than 1,000 observations) and adds a 95% confidence band around the line:

```{r, warning=FALSE, message=FALSE}
ggplot(dt, aes(x=Base, y=Ysum)) +
  geom_point() +
  geom_smooth()
```

If we want to draw a linear regression line instead, we can specify the *lm* method:

```{r, warning=FALSE, message=FALSE}
ggplot(dt, aes(x=Base, y=Ysum)) +
  geom_point() +
  geom_smooth(method="lm")
```

To see all the options of geom_smooth(), consult its help page (?geom_smooth).

The graph can be stored in an R object, and additional layers can be added to it later:

```{r, warning=FALSE, message=FALSE}
g <- ggplot(dt, aes(x=Base, y=Ysum))
g1 <- g + geom_point()
g2 <- g1 + geom_smooth(method="lm")
g2
```
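To see that geom_smooth(method="lm") draws the same line as an ordinary lm() fit, the sketch below compares the two. It uses the built-in mtcars data (car weight wt against fuel economy mpg) rather than the epilepsy data, so it runs offline. layer_data() returns the data ggplot2 computed for one layer; for the smoother layer this includes x, the fitted y, and the confidence limits ymin and ymax. Passing formula = y ~ x, which is the formula geom_smooth() uses for lm by default, only silences the message about the formula.

```r
library(ggplot2)

# Scatterplot with a linear regression line (built-in mtcars data)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Computed data for layer 2 (the smoother): x, fitted y, ymin, ymax, ...
smooth_df <- layer_data(p, 2)

# Fit the same model directly and predict at the x positions used by ggplot2
fit  <- lm(mpg ~ wt, data = mtcars)
pred <- predict(fit, newdata = data.frame(wt = smooth_df$x))

# TRUE if the drawn line coincides with lm()'s fitted values
isTRUE(all.equal(unname(pred), smooth_df$y))

# The shaded band is a confidence interval around the fitted line
all(smooth_df$ymin <= smooth_df$y & smooth_df$y <= smooth_df$ymax)
```

With the epilepsy data loaded, the same comparison can be made for g2 against lm(Ysum ~ Base, data = dt).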

### Specifying the title and axis labels

Let's add a title and axis labels to our existing plot:

```{r}
g2 +
  ggtitle("Epilepsy attacks", subtitle="based on the data from Breslow, 1996") +
  xlab("Number of attacks prior to randomization") +
  ylab("Number of attacks after randomization")
```

A similar result can be achieved with the single function labs() (here with a caption instead of a subtitle):

```{r}
g2 + labs(title = "Epilepsy attacks",
          caption = "Breslow, 1996",
          x = "Number of attacks prior to randomization",
          y = "Number of attacks after randomization")
```
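ggtitle(), xlab() and ylab() are convenience shortcuts that set individual entries of labs(). The sketch below, using the built-in mtcars data and made-up label text, builds the same plot both ways and compares the title and axis labels stored in the two plot objects (read through each plot's labels element).

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Version 1: separate helper functions
p1 <- base +
  ggtitle("Fuel economy") +
  xlab("Weight (1000 lbs)") +
  ylab("Miles per gallon")

# Version 2: a single labs() call
p2 <- base +
  labs(title = "Fuel economy", x = "Weight (1000 lbs)", y = "Miles per gallon")

# Compare the stored labels; all three should be TRUE
c(title = identical(p1$labels$title, p2$labels$title),
  x     = identical(p1$labels$x, p2$labels$x),
  y     = identical(p1$labels$y, p2$labels$y))
```

Because labs() collects all labels in one call, its result can also be stored in a variable and added to several plots.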

### Color and Size of the points

Fixed colors and sizes of the plot elements are set as arguments of the corresponding geom, outside of aes():

```{r}
g <- ggplot(dt, aes(x=Base, y=Ysum))
g + geom_point(col = "blue", size=3) +
  geom_smooth(method="lm", col="brown")
```

To change the color based on the value of a categorical variable, map that variable inside aes():

```{r}
g <- ggplot(dt, aes(x=Base, y=Ysum))
g + geom_point(aes(col=Trt), size=3) +
  geom_smooth(method="lm", col="firebrick")
```

We can move the legend with the theme() function:

```{r}
g <- ggplot(dt, aes(x=Base, y=Ysum))
g + geom_point(aes(col=Trt), size=3) +
  geom_smooth(method="lm", col="firebrick") +
  theme(legend.position="bottom") +
  labs(color="Treatment")  # change the legend title
```

Set legend.position to "none" if you want to remove the legend completely.

#### ggplot2 themes

ggplot2 provides a few complete themes, which are added to a plot like any other layer (for example, + theme_bw()):

* theme_gray
* theme_bw
* theme_light
* theme_dark
* theme_minimal
* theme_classic
* theme_linedraw
* theme_void

Final scatterplot:

```{r}
g <- ggplot(dt, aes(x=Base, y=Ysum)) +
  geom_point(aes(col=Trt), size=5) +
  geom_smooth(method="lm", col="firebrick") +
  theme(legend.position="bottom") +
  ggtitle("Epilepsy attacks", subtitle="based on the data from Breslow, 1996") +
  xlab("Number of attacks prior to randomization") +
  ylab("Number of attacks after randomization") +
  labs(color="Treatment")
g
```

### Boxplot

```{r}
g <- ggplot(dt, aes(x=Trt, y=Ysum)) +
  geom_boxplot()
g
```

#### Exercise: Change the X and Y labels and add a title to the plot

To show the individual observations on top of the boxes, we can add jittered points. geom_jitter() adds a small random offset to each point so that points with equal values do not overlap; width=0.2 limits the horizontal offset to 0.2 in each direction (by default a small vertical offset is added as well, which height=0 turns off):

```{r}
g + geom_jitter(width=0.2)
```

Let's add a title, increase its size and make it bold:

```{r}
g + geom_jitter(width=0.2) +
  ggtitle("Number of attacks vs. Treatment") +
  theme(plot.title=element_text(size=14, face="bold"))
```

### Histogram

```{r}
g <- ggplot(dt, aes(x=Age)) +
  geom_histogram()
g
```

By default, geom_histogram() uses 30 bins (it prints a message suggesting that you pick a better value with binwidth), and 30 bins is not necessarily a good choice for a given dataset.
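One data-driven alternative to the default of 30 bins is the Freedman-Diaconis rule, which sets the bin width to 2 * IQR(x) / n^(1/3). This rule is not part of the original tutorial; it is shown only as one possible choice. geom_histogram() accepts a function for binwidth, which is called on the x values. The sketch below applies the rule to the built-in faithful data (waiting times between geyser eruptions) and inspects the resulting bins with layer_data(); fd_width() is a hypothetical helper defined for this illustration.

```r
library(ggplot2)

# Freedman-Diaconis rule: bin width = 2 * IQR(x) / n^(1/3)
fd_width <- function(x) 2 * IQR(x) / length(x)^(1/3)

# Histogram whose bin width is computed from the data
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = fd_width, col = "black", fill = "white")

# One row per bin, with count, xmin, xmax, ...
bins_df <- layer_data(p)

# Every observation falls in exactly one bin, so the counts add up to n
sum(bins_df$count) == nrow(faithful)

# All bins have the width given by the rule
isTRUE(all.equal(bins_df$xmax - bins_df$xmin,
                 rep(fd_width(faithful$waiting), nrow(bins_df))))
```

To try the same rule on the epilepsy data, replace faithful and waiting with dt and Age.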
Let's try a bin width of 5 years instead:

```{r}
g <- ggplot(dt, aes(x=Age)) +
  geom_histogram(binwidth=5)
g
```

Let's improve the appearance:

```{r}
g <- ggplot(dt, aes(x=Age)) +
  geom_histogram(binwidth=5, col="black", fill="white")
g
```

We can add a density curve to the plot. To put the bars on the same scale as the curve, the histogram heights are mapped to the density rather than the counts. after_stat(density) replaces the older ..density.. notation, which has been deprecated since ggplot2 3.4.0:

```{r}
g <- ggplot(dt, aes(x=Age)) +
  geom_histogram(aes(y=after_stat(density)), binwidth=5, col="black", fill="white") +
  geom_density(alpha=.2, fill="darkorange")
g
```

### A few more useful links:

* Data Visualization with ggplot2: [http://r4ds.had.co.nz/data-visualisation.html](http://r4ds.had.co.nz/data-visualisation.html)
* Graphics for Communication: [http://r4ds.had.co.nz/graphics-for-communication.html](http://r4ds.had.co.nz/graphics-for-communication.html)
* A ggplot2 tutorial with examples: [http://r-statistics.co/](http://r-statistics.co/)