There are 3 main environments for creating graphics in R:
* R base utilities
* lattice package
* ggplot2 package
There is also the powerful “grid” package, along with a couple of interactive packages that are still in development.
In this tutorial we will concentrate on the popular ggplot2 package.
Useful online resources:
The R Graph Gallery: https://www.r-graph-gallery.com/
ggplot quick reference: http://r-statistics.co/ggplot2-cheatsheet.html
#load package
library(ggplot2)
The data come from a clinical trial of 59 patients with epilepsy (Breslow,
1996). Thall and Vail reported data from this trial, in which 31 patients
were randomized to receive the anti-epilepsy drug Progabide and 28 received
a placebo.
Baseline data consisted of the patient’s age and the number of epileptic
seizures recorded during the 8-week period prior to randomization.
The response consisted of counts of seizures occurring during four
consecutive follow-up periods of two weeks each.
# Read data
dt <- read.csv("http://rcs.bu.edu/classes/FC764/epilepsy.csv")
# Explore the data
str(dt)
'data.frame': 59 obs. of 11 variables:
$ ID : int 104 106 107 114 116 118 123 126 130 135 ...
$ Y1 : int 5 3 2 4 7 5 6 40 5 14 ...
$ Y2 : int 3 5 4 4 18 2 4 20 6 13 ...
$ Y3 : int 3 3 0 1 9 8 0 23 6 6 ...
$ Y4 : int 3 3 5 4 21 7 2 12 5 0 ...
$ Base : int 11 11 6 8 66 27 12 52 23 10 ...
$ Age : int 31 30 25 36 22 29 31 42 37 28 ...
$ Trt : chr "placebo" "placebo" "placebo" "placebo" ...
$ Ysum : int 14 14 11 13 55 22 12 95 22 33 ...
$ Age10: num 3.1 3 2.5 3.6 2.2 2.9 3.1 4.2 3.7 2.8 ...
$ Base4: num 2.75 2.75 1.5 2 16.5 6.75 3 13 5.75 2.5 ...
head(dt)
ID Y1 Y2 Y3 Y4 Base Age Trt Ysum Age10 Base4
1 104 5 3 3 3 11 31 placebo 14 3.1 2.75
2 106 3 5 3 3 11 30 placebo 14 3.0 2.75
3 107 2 4 0 5 6 25 placebo 11 2.5 1.50
4 114 4 4 1 4 8 36 placebo 13 3.6 2.00
5 116 7 18 9 21 66 22 placebo 55 2.2 16.50
6 118 5 2 8 7 27 29 placebo 22 2.9 6.75
The ggplot2 package works with data frames.
The composition of ggplot2 calls has a few parts. We will start with 3 of them:
ggplot( dataset, aes( x = var1, y = var2) ) + geom_*()
A plot is built by adding layers to an existing plot with the + operator. The first call only initializes the plot (data and axes) and draws no data yet:
ggplot(dt, aes(x=Base, y=Ysum))
Once the plot is initialized, we can start adding “geom” layers to it. To make a simple scatterplot we use geom_point():
ggplot(dt, aes(x=Base, y=Ysum)) + geom_point()
We can add a smoothed trend line through the points using geom_smooth(). By default it adds confidence bands to the graph:
ggplot(dt, aes(x=Base, y=Ysum)) + geom_point() + geom_smooth()
If we want to draw a linear regression line instead, we can specify the lm method:
ggplot(dt, aes(x=Base, y=Ysum)) + geom_point() + geom_smooth(method="lm")
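The line drawn with method="lm" is an ordinary least-squares fit of y on x. As a minimal sketch, the code below compares it with a direct lm() fit. To stay self-contained it uses only the six rows printed by head(dt) above (the name dt6 is made up for this illustration), so the coefficients are not those of the full data set.

```r
library(ggplot2)

# First six rows of the epilepsy data, copied from the head(dt) output
dt6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22)
)

# Fit the model that geom_smooth(method = "lm") uses with formula y ~ x
fit <- lm(Ysum ~ Base, data = dt6)
coef(fit)   # intercept and slope of the regression line

# Extract the points of the line that ggplot2 computes for the smooth layer
p  <- ggplot(dt6, aes(x = Base, y = Ysum)) +
  geom_smooth(method = "lm", formula = y ~ x)
sm <- layer_data(p, 1)

# Largest difference between the plotted line and the lm() predictions
max_diff <- max(abs(sm$y - predict(fit, newdata = data.frame(Base = sm$x))))
max_diff
```

Passing formula = y ~ x explicitly also silences the "using formula" message that appears in the output of this tutorial.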
For the full list of geom_smooth() options, see the help page for this function (?geom_smooth).
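Two of the options that are often useful are se, which switches the confidence band on or off, and level, which sets its confidence level. The sketch below is only an illustration on the six rows shown by head(dt); dt6, p_no_band, p_95 and p_99 are names invented here.

```r
library(ggplot2)

# First six rows of the epilepsy data, copied from the head(dt) output
dt6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22)
)
g <- ggplot(dt6, aes(x = Base, y = Ysum)) + geom_point()

# Regression line without the confidence band
p_no_band <- g + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Default 95% band versus a wider 99% band
p_95 <- g + geom_smooth(method = "lm", formula = y ~ x)
p_99 <- g + geom_smooth(method = "lm", formula = y ~ x, level = 0.99)

# Width of each band along the x axis (layer 2 is the smooth layer)
w95 <- with(layer_data(p_95, 2), ymax - ymin)
w99 <- with(layer_data(p_99, 2), ymax - ymin)
summary(w99 - w95)   # how much wider the 99% band is
```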
The graph can be stored in an R object and additional layers can be added to it later:
g <- ggplot(dt, aes(x=Base, y=Ysum))
g1 <- g + geom_point()
g2 <- g1 + geom_smooth(method="lm")
g2
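Adding a layer returns a new plot object and leaves the original one unchanged, which is why g, g1 and g2 can all be reused. A small sketch of this, assuming the six rows from head(dt) as data (dt6 is a made-up name) and looking at the layers element of each plot object:

```r
library(ggplot2)

# First six rows of the epilepsy data, copied from the head(dt) output
dt6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22)
)

g  <- ggplot(dt6, aes(x = Base, y = Ysum))               # data and mapping only
g1 <- g + geom_point()                                   # adds a point layer
g2 <- g1 + geom_smooth(method = "lm", formula = y ~ x)   # adds a smooth layer

# Count the layers stored in each object
n_layers <- sapply(list(g = g, g1 = g1, g2 = g2), function(p) length(p$layers))
n_layers
```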
Let’s add a title and labels to our existing plot:
g2 + ggtitle("Epilepsy attacks",
subtitle="based on the data from Breslow, 1996") +
xlab("Number of attacks prior to randomization") +
ylab("Number of attacks after randomization")
## `geom_smooth()` using formula 'y ~ x'
A similar result can be achieved with the single function labs() (here the source is given as a caption rather than a subtitle):
g2 + labs(title = "Epilepsy attacks",
caption="Breslow, 1996",
x="Number of attacks prior to randomization",
y="Number of attacks after randomization")
## `geom_smooth()` using formula 'y ~ x'
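labs() also accepts a subtitle argument, so the ggtitle()/xlab()/ylab() version can be reproduced in one call. The following is a sketch on the six rows printed by head(dt); dt6 and p_labs are names chosen for this example.

```r
library(ggplot2)

# First six rows of the epilepsy data, copied from the head(dt) output
dt6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22)
)

g2 <- ggplot(dt6, aes(x = Base, y = Ysum)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Title, subtitle and both axis labels set in a single labs() call
p_labs <- g2 + labs(title    = "Epilepsy attacks",
                    subtitle = "based on the data from Breslow, 1996",
                    x        = "Number of attacks prior to randomization",
                    y        = "Number of attacks after randomization")
p_labs
```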
A fixed color and size for the elements in the plot are set through arguments of the related geom, outside the aes() function:
g <- ggplot(dt, aes(x=Base, y=Ysum))
g + geom_point(col = "blue", size=3) + geom_smooth(method="lm", col="brown")
## `geom_smooth()` using formula 'y ~ x'
To change the color based on the value of a categorical variable, map that variable inside the aes() function:
g <- ggplot(dt, aes(x=Base, y=Ysum))
g + geom_point(aes(col=Trt), size=3) + geom_smooth(method="lm", col="firebrick")
## `geom_smooth()` using formula 'y ~ x'
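The difference between setting a color and mapping a color is easy to get wrong, so here is a short sketch of three variants. The data frame toy and its values are made up for illustration and are not taken from the trial; layer_data() is used to look at the colors ggplot2 actually assigns.

```r
library(ggplot2)

# Hypothetical toy data (values invented for this example)
toy <- data.frame(
  Base = c(11, 6, 27, 19, 10, 38),
  Ysum = c(14, 11, 22, 20, 6, 30),
  Trt  = c("placebo", "placebo", "placebo",
           "progabide", "progabide", "progabide")
)
g <- ggplot(toy, aes(x = Base, y = Ysum))

p_set    <- g + geom_point(col = "blue", size = 3)        # constant color, outside aes()
p_mapped <- g + geom_point(aes(col = Trt), size = 3)      # color mapped to a variable
p_oops   <- g + geom_point(aes(col = "blue"), size = 3)   # "blue" treated as a data value

# Colors that end up on the points in each case
col_set    <- unique(layer_data(p_set, 1)$colour)
col_mapped <- unique(layer_data(p_mapped, 1)$colour)
col_oops   <- unique(layer_data(p_oops, 1)$colour)
col_set; col_mapped; col_oops
```

In the third variant the string "blue" is handled like a one-level categorical variable, so the points get the first color of the default palette and a legend entry labelled "blue".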
We can move the legend with the theme() function:
g <- ggplot(dt, aes(x=Base, y=Ysum))
g + geom_point(aes(col=Trt), size=3) + geom_smooth(method="lm", col="firebrick")+
theme( legend.position="bottom")+
labs(color="Treatment") # change legend title
## `geom_smooth()` using formula 'y ~ x'
Set legend.position to "none" if you want to remove the legend completely.
ggplot2 also comes with several complete themes you can use, such as theme_bw() and theme_minimal().
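A minimal sketch of switching themes, using theme_bw(), theme_minimal() and theme_classic(), which ship with ggplot2. It runs on the six rows shown by head(dt); dt6 and p_theme are names made up for this illustration.

```r
library(ggplot2)

# First six rows of the epilepsy data, copied from the head(dt) output
dt6 <- data.frame(
  Base = c(11, 11, 6, 8, 66, 27),
  Ysum = c(14, 14, 11, 13, 55, 22)
)
g <- ggplot(dt6, aes(x = Base, y = Ysum)) + geom_point()

g + theme_bw()        # white background with grid lines and a frame
g + theme_minimal()   # no background or frame
g + theme_classic()   # axis lines only, no grid

# Individual tweaks go after the complete theme
p_theme <- g + theme_classic() + theme(legend.position = "bottom")
```

A complete theme replaces all theme settings made before it, so a theme() call such as legend.position="bottom" has to come after it to take effect.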
Final scatterplot:
g <- ggplot(dt, aes(x=Base, y=Ysum)) +
  geom_point(aes(col=Trt), size=5) +
  geom_smooth(method="lm", col="firebrick") +
  theme(legend.position="bottom") +
  ggtitle("Epilepsy attacks",
          subtitle="based on the data from Breslow, 1996") +
  xlab("Number of attacks prior to randomization") +
  ylab("Number of attacks after randomization") +
  labs(color="Treatment")
g
## `geom_smooth()` using formula 'y ~ x'
g<- ggplot(dt, aes(x=Trt, y=Ysum)) + geom_boxplot()
g
Overlay the individual observations on the boxplot with geom_jitter():
g + geom_jitter(width=0.2)
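When points are jittered on top of a boxplot, two details are worth knowing: outliers are drawn twice (once by each geom), and geom_jitter() by default also moves points vertically. The sketch below addresses both with outlier.shape=NA and height=0. The data frame toy and its values are invented for this example.

```r
library(ggplot2)

# Hypothetical toy data with one deliberately extreme value per group
toy <- data.frame(
  Trt  = rep(c("placebo", "progabide"), each = 6),
  Ysum = c(14, 14, 11, 13, 22, 95, 12, 9, 20, 6, 15, 70)
)

set.seed(1)  # make the random horizontal jitter reproducible
p <- ggplot(toy, aes(x = Trt, y = Ysum)) +
  geom_boxplot(outlier.shape = NA) +       # do not draw outliers in the boxplot layer
  geom_jitter(width = 0.2, height = 0)     # jitter horizontally only

# The y values of the jittered points are the original counts
jit_y <- layer_data(p, 2)$y
sort(jit_y)
```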
Let’s add a title, change its size, and make it bold:
g + geom_jitter(width=0.2)+
ggtitle("Number of attacks vs. Treatment")+
theme ( plot.title=element_text(size=14,face="bold" ) )
g<- ggplot(dt, aes(x=Age)) + geom_histogram()
g
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The default of 30 bins is not necessarily optimal. Let’s set the bin
width to 5 years instead:
g<- ggplot(dt, aes(x=Age)) + geom_histogram(binwidth=5)
g
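geom_histogram() can be controlled either by the number of bins (bins) or by their width (binwidth), and boundary fixes where a bin edge falls. A small sketch, assuming only the six ages printed by head(dt) (dt6 is a made-up name), that prints the bins ggplot2 computes:

```r
library(ggplot2)

# Ages of the first six patients, copied from the head(dt) output
dt6 <- data.frame(Age = c(31, 30, 25, 36, 22, 29))

# Same data, binned by number of bins and by bin width
p_bins  <- ggplot(dt6, aes(x = Age)) + geom_histogram(bins = 4)
p_width <- ggplot(dt6, aes(x = Age)) +
  geom_histogram(binwidth = 5, boundary = 20)   # bin edges at multiples of 5

# Inspect the bins computed for the binwidth version
h_width <- layer_data(p_width, 1)
h_width[, c("xmin", "xmax", "count")]
```

With boundary set, the bin edges land on round values, which usually makes the histogram easier to read than the default centering.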
Let’s improve the appearance:
g<- ggplot(dt, aes(x=Age)) + geom_histogram(binwidth=5, col="black", fill="white")
g
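We can add a density curve to the plot. The histogram is drawn on the density scale so that both layers share the same y axis:
g<-ggplot(dt, aes(x=Age)) +
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
geom_histogram(binwidth=5, col="black", fill="white", aes(y=after_stat(density))) +
geom_density( alpha=.2, fill="darkorange")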
g
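Mapping y to the computed density rescales the bars so that their total area is 1, which puts the histogram on the same scale as the density curve. The sketch below checks this on the six ages printed by head(dt). It uses after_stat(density), the newer spelling of ..density.., which assumes ggplot2 3.3.0 or later; dt6, h and bar_area are names invented here.

```r
library(ggplot2)

# Ages of the first six patients, copied from the head(dt) output
dt6 <- data.frame(Age = c(31, 30, 25, 36, 22, 29))

p <- ggplot(dt6, aes(x = Age)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 col = "black", fill = "white") +
  geom_density(alpha = 0.2, fill = "darkorange")

# Bar heights (density) times bar widths give the total area of the histogram
h <- layer_data(p, 1)
bar_area <- sum(h$density * (h$xmax - h$xmin))
bar_area
```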