Introduction

There are three main environments in R for creating graphics:

  • base R graphics utilities
  • the lattice package
  • the ggplot2 package

There is also the powerful “grid” package, and a couple of interactive plotting packages under development.

In this tutorial we will concentrate on the popular ggplot2 package.

Useful online resources:

The R Graph Gallery: https://www.r-graph-gallery.com/
ggplot Quick reference: http://r-statistics.co/ggplot2-cheatsheet.html

#load package
library(ggplot2)

Epilepsy attacks dataset

The data come from a clinical trial of 59 patients with epilepsy reported by Thall and Vail (Breslow, 1996): 31 patients were randomized to receive the anti-epilepsy drug Progabide and 28 received a placebo.
Baseline data consisted of each patient’s age and the number of epileptic seizures recorded during the 8-week period prior to randomization.
The response consisted of the seizure counts during four consecutive two-week follow-up periods.

# Read data
dt <- read.csv("http://rcs.bu.edu/classes/FC764/epilepsy.csv")

# Explore the data
str(dt)
'data.frame':   59 obs. of  11 variables:
 $ ID   : int  104 106 107 114 116 118 123 126 130 135 ...
 $ Y1   : int  5 3 2 4 7 5 6 40 5 14 ...
 $ Y2   : int  3 5 4 4 18 2 4 20 6 13 ...
 $ Y3   : int  3 3 0 1 9 8 0 23 6 6 ...
 $ Y4   : int  3 3 5 4 21 7 2 12 5 0 ...
 $ Base : int  11 11 6 8 66 27 12 52 23 10 ...
 $ Age  : int  31 30 25 36 22 29 31 42 37 28 ...
 $ Trt  : chr  "placebo" "placebo" "placebo" "placebo" ...
 $ Ysum : int  14 14 11 13 55 22 12 95 22 33 ...
 $ Age10: num  3.1 3 2.5 3.6 2.2 2.9 3.1 4.2 3.7 2.8 ...
 $ Base4: num  2.75 2.75 1.5 2 16.5 6.75 3 13 5.75 2.5 ...
head(dt)
   ID Y1 Y2 Y3 Y4 Base Age     Trt Ysum Age10 Base4
1 104  5  3  3  3   11  31 placebo   14   3.1  2.75
2 106  3  5  3  3   11  30 placebo   14   3.0  2.75
3 107  2  4  0  5    6  25 placebo   11   2.5  1.50
4 114  4  4  1  4    8  36 placebo   13   3.6  2.00
5 116  7 18  9 21   66  22 placebo   55   2.2 16.50
6 118  5  2  8  7   27  29 placebo   22   2.9  6.75
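Judging from the str() and head() output, three columns appear to be derived from the others. Ysum looks like the sum of the four follow-up counts Y1 to Y4, Age10 looks like Age/10, and Base4 looks like Base/4. Base4 would then presumably put the 8-week baseline count on the same two-week scale as Y1 to Y4. The sketch below checks this reading on the six rows printed above. It types them in by hand as a data frame we call `first6` (our name, not part of the original data) so it runs without downloading the file. On the full data the same checks can be run on `dt`.

```r
# The six rows printed by head(dt), copied by hand so this runs offline
first6 <- data.frame(
  Y1    = c(5, 3, 2, 4, 7, 5),
  Y2    = c(3, 5, 4, 4, 18, 2),
  Y3    = c(3, 3, 0, 1, 9, 8),
  Y4    = c(3, 3, 5, 4, 21, 7),
  Base  = c(11, 11, 6, 8, 66, 27),
  Age   = c(31, 30, 25, 36, 22, 29),
  Ysum  = c(14, 14, 11, 13, 55, 22),
  Age10 = c(3.1, 3.0, 2.5, 3.6, 2.2, 2.9),
  Base4 = c(2.75, 2.75, 1.50, 2.00, 16.50, 6.75)
)

# Ysum: total seizures over the four two-week follow-up periods
ysum_ok  <- all(first6$Ysum == rowSums(first6[, c("Y1", "Y2", "Y3", "Y4")]))
# Age10 and Base4: rescaled versions of Age and Base (compared with a tolerance)
age10_ok <- isTRUE(all.equal(first6$Age10, first6$Age / 10))
base4_ok <- isTRUE(all.equal(first6$Base4, first6$Base / 4))

c(ysum = ysum_ok, age10 = age10_ok, base4 = base4_ok)
```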



ggplot() function

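The ggplot2 package works with data frames.

A ggplot2 call is composed of several parts. We will start with three of them:

  1. A data set
  2. The aesthetic mapping (aes()), which links variables in the data to visual properties such as the x and y positions
  3. A geometric object (geom_*()), which determines how the data are drawn: points, lines, bars, and so on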

ggplot( dataset, aes( x = var1, y = var2) ) + geom_*()
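To make the three parts concrete, the sketch below writes the template out with named arguments (`data =`, `mapping =`) and then looks inside the resulting plot object. It uses a hand-typed copy of the six rows printed by head(dt) (we call it `first6`; the name is ours) so it runs offline. With the real data, replace `first6` by `dt`. Reading the parts back with `$` is only for illustration and assumes a ggplot2 version that exposes them that way.

```r
library(ggplot2)

# Hand-typed copy of the rows shown by head(dt); only the columns used here
first6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22),
  Trt  = "placebo"
)

# 1. data, 2. aesthetic mapping, 3. geometric object
p <- ggplot(data = first6, mapping = aes(x = Base, y = Ysum)) +
  geom_point()

nrow(p$data)                   # the data set stored in the plot
names(p$mapping)               # the aesthetics that were mapped
class(p$layers[[1]]$geom)[1]   # the geom of the plot's single layer
```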

A plot is built by adding layers to it with the + operator. Calling ggplot() on its own only initializes the plot; no data are drawn until a geom layer is added:

ggplot(dt, aes(x=Base, y=Ysum))

Once the plot is initialized, we can start adding “geom” layers to it. To make a simple scatterplot, we use geom_point():

ggplot(dt, aes(x=Base, y=Ysum)) + geom_point()

We can add a smoothed trend line through the points with geom_smooth(). By default it fits a loess curve (for fewer than 1,000 observations) and adds a 95% confidence band around the line:

ggplot(dt, aes(x=Base, y=Ysum)) + geom_point() + geom_smooth()

To draw a linear regression line instead, set method="lm":

ggplot(dt, aes(x=Base, y=Ysum)) + geom_point() + geom_smooth(method="lm")
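To see what method="lm" actually draws, the sketch below extracts the fitted line with layer_data() and compares it with an ordinary lm() fit of the same formula. It also passes `formula = y ~ x` explicitly, which silences the "using formula" message shown further down, and `se = FALSE`, which drops the confidence band. The six-row `first6` data frame is a hand-typed excerpt of head(dt) (our name), used only so the code runs offline.

```r
library(ggplot2)

# Hand-typed excerpt of head(dt)
first6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22)
)

p <- ggplot(first6, aes(x = Base, y = Ysum)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # straight line, no band

# Coordinates of the drawn line (layer 2 = geom_smooth)
line_pts <- layer_data(p, 2)

# The same model fitted directly with lm(), evaluated at the same x values
fit <- lm(Ysum ~ Base, data = first6)
expected <- unname(predict(fit, newdata = data.frame(Base = line_pts$x)))

same_line <- isTRUE(all.equal(line_pts$y, expected))
same_line
```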

For all available options of geom_smooth(), see its help page (?geom_smooth).

A plot can be stored in an R object, and more layers can be added to it later:

g <- ggplot(dt, aes(x=Base, y=Ysum)) 
g1 <- g + geom_point() 
g2 <- g1 + geom_smooth(method="lm")
g2
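Because + returns a new plot object instead of modifying the one on its left, g, g1 and g2 above are three separate plots with zero, one and two layers. The sketch below counts the layers of each object to show this, again on the hand-typed `first6` excerpt of head(dt) (our name).

```r
library(ggplot2)

first6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22)
)

g  <- ggplot(first6, aes(x = Base, y = Ysum))           # initialized, no layers yet
g1 <- g + geom_point()                                   # new object with one layer
g2 <- g1 + geom_smooth(method = "lm", formula = y ~ x)  # new object with two layers

# g itself is unchanged by the two additions above
n_layers <- sapply(list(g = g, g1 = g1, g2 = g2), function(p) length(p$layers))
n_layers
```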



Specifying the title and axis labels

Let’s add a title and labels to our existing plot:

g2 + ggtitle("Epilepsy attacks",
             subtitle="based on the data from Breslow, 1996") +
  xlab("Number of attacks prior to randomization") + 
  ylab("Number of attacks after randomization")
## `geom_smooth()` using formula 'y ~ x'

A similar result can be achieved with a single call to labs(), which also accepts a caption:

g2 + labs(title = "Epilepsy attacks",
          caption="Breslow, 1996",
          x="Number of attacks prior to randomization", 
          y="Number of attacks after randomization")
## `geom_smooth()` using formula 'y ~ x'
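ggtitle(), xlab() and ylab() are shortcuts that set the same entries as labs(). One way to see this is to build a plot both ways and compare the stored labels. The sketch assumes the labels can be read back from the plot's `labels` element. `first6` is the hand-typed excerpt of head(dt) used in the other examples.

```r
library(ggplot2)

first6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22)
)
g <- ggplot(first6, aes(x = Base, y = Ysum)) + geom_point()

# Shortcut functions
p1 <- g + ggtitle("Epilepsy attacks") +
  xlab("Number of attacks prior to randomization") +
  ylab("Number of attacks after randomization")

# A single labs() call
p2 <- g + labs(title = "Epilepsy attacks",
               x = "Number of attacks prior to randomization",
               y = "Number of attacks after randomization")

# Compare the three labels that were set, one at a time
keys <- c("title", "x", "y")
same <- vapply(keys, function(k) identical(p1$labels[[k]], p2$labels[[k]]), logical(1))
same
```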



Color and Size of the points

Fixed colors and sizes are set as arguments of the related geom, outside of aes(). Here every point is drawn in blue with size 3, and the regression line in brown:

g <- ggplot(dt, aes(x=Base, y=Ysum)) 
g + geom_point(col = "blue", size=3) + geom_smooth(method="lm", col="brown")
## `geom_smooth()` using formula 'y ~ x'

To color the points by the value of a categorical variable, map that variable to col inside aes():

g <- ggplot(dt, aes(x=Base, y=Ysum)) 
g + geom_point(aes(col=Trt), size=3) + geom_smooth(method="lm", col="firebrick")
## `geom_smooth()` using formula 'y ~ x'
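The difference between the last two calls is where col appears. Outside aes(), it sets one fixed color for every point. Inside aes(), it maps a variable to color and produces a legend. A common slip is to put a fixed color inside aes(). ggplot2 then treats "blue" as a one-level variable and colors the points with its default palette, not blue. The sketch compares the colors ggplot2 actually assigns, read back with layer_data(), on the hand-typed `first6` excerpt of head(dt) (our name).

```r
library(ggplot2)

first6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22),
  Trt  = "placebo"
)
g <- ggplot(first6, aes(x = Base, y = Ysum))

set_col  <- g + geom_point(col = "blue", size = 3)    # setting: one fixed value
map_col  <- g + geom_point(aes(col = Trt), size = 3)  # mapping: variable -> color
mistaken <- g + geom_point(aes(col = "blue"))         # constant inside aes(): not blue

unique(layer_data(set_col)$colour)   # the fixed color
unique(layer_data(map_col)$colour)   # a palette color for the single Trt level
unique(layer_data(mistaken)$colour)  # also a palette color, not "blue"
```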

We can move the legend with the theme() function and change its title with labs():

g <- ggplot(dt, aes(x=Base, y=Ysum)) 
g + geom_point(aes(col=Trt), size=3) + geom_smooth(method="lm", col="firebrick")+
  theme( legend.position="bottom")+
  labs(color="Treatment")   # change legend title
## `geom_smooth()` using formula 'y ~ x'

Set legend.position to "none" to remove the legend completely.
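For example, the sketch below removes the legend from the treatment-colored plot. Note that the value is the lowercase string "none". An alternative is guides(color = "none"), which hides only the color legend and leaves any others in place. The hand-typed `first6` excerpt of head(dt) (our name) stands in for `dt`. The final check assumes the setting can be read back from the plot's `theme` element.

```r
library(ggplot2)

first6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22),
  Trt  = "placebo"
)

# Remove every legend through the theme
p <- ggplot(first6, aes(x = Base, y = Ysum)) +
  geom_point(aes(col = Trt), size = 3) +
  theme(legend.position = "none")

# Alternative: hide only the color legend
p_alt <- ggplot(first6, aes(x = Base, y = Ysum)) +
  geom_point(aes(col = Trt), size = 3) +
  guides(color = "none")

p$theme$legend.position
```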

ggplot2 themes

ggplot2 comes with several complete themes, which are added to a plot with + like any other layer:

  • theme_gray
  • theme_bw
  • theme_light
  • theme_dark
  • theme_minimal
  • theme_classic
  • theme_linedraw
  • theme_void
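Each entry in the list is a function that returns a complete theme. Complete themes also take a base_size argument that scales all of the plot's text. The sketch applies theme_bw() with a larger base font to a scatterplot of the hand-typed `first6` excerpt (our name). As a quick check, it also confirms that every function in the list returns a theme object.

```r
library(ggplot2)

first6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22)
)

p <- ggplot(first6, aes(x = Base, y = Ysum)) +
  geom_point() +
  theme_bw(base_size = 14)   # black-and-white theme, 14 pt base font size

# Every theme listed above is a function returning a complete theme
theme_funs <- list(theme_gray, theme_bw, theme_light, theme_dark,
                   theme_minimal, theme_classic, theme_linedraw, theme_void)
all_themes <- all(vapply(theme_funs, function(f) inherits(f(), "theme"), logical(1)))
all_themes
```

To use one theme for every plot in a session, theme_set(theme_bw()) makes it the default.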

Final scatterplot:

g <- ggplot(dt, aes(x=Base, y=Ysum)) +
  geom_point(aes(col=Trt), size=5) +
  geom_smooth(method="lm", col="firebrick") +
  theme(legend.position="bottom") +
  ggtitle("Epilepsy attacks",
          subtitle="based on the data from Breslow, 1996") +
  xlab("Number of attacks prior to randomization") +
  ylab("Number of attacks after randomization") +
  labs(color="Treatment")
g
## `geom_smooth()` using formula 'y ~ x'

Boxplot

g<- ggplot(dt, aes(x=Trt, y=Ysum)) + geom_boxplot()
g

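Exercise:

Change the X and Y labels and add a title to the plot.

To show the individual observations on top of the boxes, add jittered points. geom_jitter() adds a small amount of random noise to each point's position so that points with equal values do not overlap; width=0.2 limits the horizontal spread:

g + geom_jitter(width=0.2)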

Let’s change the size of the title and make it bold:

g + geom_jitter(width=0.2)+
  ggtitle("Number of attacks vs. Treatment")+
  theme ( plot.title=element_text(size=14,face="bold" ) )
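geom_jitter() is shorthand for geom_point() with position_jitter(). By default it also moves points vertically, which for seizure counts shifts the plotted values slightly away from the recorded ones. Setting height = 0 keeps y exact, and a seed makes the jitter reproducible between runs. outlier.shape = NA hides the boxplot's own outlier points so they are not drawn a second time by the jittered layer. The sketch uses the hand-typed `first6` excerpt of head(dt) (our name). All six rows are from the placebo group, so only one box appears.

```r
library(ggplot2)

first6 <- data.frame(
  Trt  = "placebo",
  Ysum = c(14, 14, 11, 13, 55, 22)
)

p <- ggplot(first6, aes(x = Trt, y = Ysum)) +
  geom_boxplot(outlier.shape = NA) +   # the jittered layer already shows every point
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1)) +
  ggtitle("Number of attacks vs. Treatment") +
  theme(plot.title = element_text(size = 14, face = "bold"))

# Layer 2 holds the jittered points: x is spread around the box at position 1,
# while y keeps the recorded counts
pts <- layer_data(p, 2)
range(unclass(pts$x))
sort(pts$y)
```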

Histogram

g<- ggplot(dt, aes(x=Age)) + geom_histogram()
g
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The default of 30 bins is rarely a good choice, and the message above suggests picking a better value with binwidth. Let’s set the bin width to 5 years:

g<- ggplot(dt, aes(x=Age)) + geom_histogram(binwidth=5)
g
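binwidth fixes the width of each bar in data units (years here), while bins fixes the number of bars. boundary (or center) controls where the bar edges fall. The sketch below uses only the six ages from head(dt), typed in by hand. That is too few for a meaningful histogram but enough to show the mechanics. It reads the computed bars back with layer_data(): with binwidth = 5 every bar is 5 years wide, boundary = 20 places a bar edge at 20 years, and the bar counts add up to the number of observations.

```r
library(ggplot2)

# Ages of the six patients shown by head(dt), typed in by hand
first6 <- data.frame(Age = c(31, 30, 25, 36, 22, 29))

# Bars 5 years wide, with one bar edge placed at 20 years
p <- ggplot(first6, aes(x = Age)) +
  geom_histogram(binwidth = 5, boundary = 20, col = "black", fill = "white")

bars <- layer_data(p)
bars[, c("xmin", "xmax", "count")]   # one row per bar
sum(bars$count)                      # every observation falls in exactly one bar
```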

Let’s improve the appearance:

g<- ggplot(dt, aes(x=Age)) + geom_histogram(binwidth=5, col="black", fill="white")
  
g

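We can add a density curve to the plot. The density curve is on the density scale, so the histogram bars must be drawn on the same scale by mapping y to the computed density (after_stat(density); older code writes this as ..density..):

g<-ggplot(dt, aes(x=Age)) + 
  geom_histogram(binwidth=5, col="black", fill="white", aes(y=after_stat(density))) +
  geom_density( alpha=.2, fill="darkorange")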
  
g
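Mapping y to the computed density is what makes the two layers comparable. geom_density() draws a curve whose area is 1, so the bars are rescaled so that their total area (bar height times bin width, summed over all bars) is 1 as well. The sketch checks this on the bars ggplot2 computes for the six hand-typed ages from head(dt). It uses after_stat(density), the spelling current ggplot2 versions prefer over ..density...

```r
library(ggplot2)

# Ages of the six patients shown by head(dt), typed in by hand
first6 <- data.frame(Age = c(31, 30, 25, 36, 22, 29))

p <- ggplot(first6, aes(x = Age)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 col = "black", fill = "white") +
  geom_density(alpha = .2, fill = "darkorange")

bars <- layer_data(p, 1)
# Total bar area: density (bar height) times bar width, summed over all bars
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
total_area
```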