DESeq2 tutorial website: (https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#how-to-get-help-for-deseq2)

Harvard’s workshop for DESeq2 analysis: (https://hbctraining.github.io/DGE_workshop/lessons/04_DGE_DESeq2_analysis.html)

We first need to load all the libraries we need to use in the script

library(DESeq2)
library(airway)

For this example we will use the dataset that comes from the airway package:

data("airway")

You may see the following type of instructions:

Prepare the data

Convert the data to a DESeqDataSet object

dds <- DESeqDataSetFromMatrix(countData = assay(airway),
                              colData = colData(airway),
                              design = ~ cell + dex)

Hmmmm… What is assay(airway) or colData(airway) or what does ~cell+dex mean?

Let’s try to figure out.

In R, anything that has round parenthesis is a function, so we can explore each function alone and see what objects it produce. Be careful though… Bioinformatics data might be very large! Explore the object before you try to print it!

x <- assay(airway)

First see what class of an object it is:

class(x)
[1] "matrix" "array" 

Great! This is a matrix, so we should be able to find its dimensions:

dim(x)
[1] 64102     8

Since there are many rows but only 8 columns we can take a look at the beginning of this matrix. head() function works for data frames, matrices and many other R objects.

head(x)
                SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516 SRR1039517 SRR1039520 SRR1039521
ENSG00000000003        679        448        873        408       1138       1047        770        572
ENSG00000000005          0          0          0          0          0          0          0          0
ENSG00000000419        467        515        621        365        587        799        417        508
ENSG00000000457        260        211        263        164        245        331        233        229
ENSG00000000460         60         55         40         35         78         63         76         60
ENSG00000000938          0          0          2          0          1          0          0          0

Great! By looking at the first few rows, we now know that this matrix has row names and row columns.

Let’s now move to the next function - colData(airway). Let’s see what it does:

x <- colData(airway)

Again, let’s see the “class” of this object:

class(x)
[1] "DFrame"
attr(,"package")
[1] "S4Vectors"

The output says “DFrame”. It is not exactly a data frame object we saw before, but maybe something similar? Let’s try head() function:

head(x)
DataFrame with 6 rows and 9 columns
           SampleName     cell      dex    albut        Run avgLength Experiment    Sample    BioSample
             <factor> <factor> <factor> <factor>   <factor> <integer>   <factor>  <factor>     <factor>
SRR1039508 GSM1275862  N61311     untrt    untrt SRR1039508       126  SRX384345 SRS508568 SAMN02422669
SRR1039509 GSM1275863  N61311     trt      untrt SRR1039509       126  SRX384346 SRS508567 SAMN02422675
SRR1039512 GSM1275866  N052611    untrt    untrt SRR1039512       126  SRX384349 SRS508571 SAMN02422678
SRR1039513 GSM1275867  N052611    trt      untrt SRR1039513        87  SRX384350 SRS508572 SAMN02422670
SRR1039516 GSM1275870  N080611    untrt    untrt SRR1039516       120  SRX384353 SRS508575 SAMN02422682
SRR1039517 GSM1275871  N080611    trt      untrt SRR1039517       126  SRX384354 SRS508576 SAMN02422673

Ah! It does look like a data frame! How many rows and how many columns does it have?

dim(x)
[1] 8 9

The last argument that was passed to the `function wasdesign = ~ cell + dex. The expression that follows the tilde~` sign is a formula. Usually it is a list of column names from a data frame that we want to include in our analysis as independent variables.

So now, let’s execute that function:

dds <- DESeqDataSetFromMatrix(countData = assay(airway),
                              colData = colData(airway),
                              design = ~ cell + dex)

The output is stored in a variable dds. What is in it? What can be done with it. Again, try using functions we used before:

class(dds)
[1] "DESeqDataSet"
attr(,"package")
[1] "DESeq2"

Well, it is some “DESeq” Data Set. What can I do with it. Usually you will be given directions, but you also can look for help:

help(DESeqDataSet)

Let’s use other funtions to explore what is hidden in this object:

head(dds)
class: DESeqDataSet 
dim: 6 8 
metadata(1): version
assays(1): counts
rownames(6): ENSG00000000003 ENSG00000000005 ... ENSG00000000460 ENSG00000000938
rowData names(0):
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(9): SampleName cell ... Sample BioSample

Function str() is very helpful, but for very complex large objects, the output could be a little too big….

str(dds)

Now we are ready to perform differential expression analysis

dds_a <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

And again, it might be a good idea to understand what did we get as a result. So we use the same functions - class, head, str to “see” what is hidden in this object

class(dds_a)
[1] "DESeqDataSet"
attr(,"package")
[1] "DESeq2"
head(dds_a)
class: DESeqDataSet 
dim: 6 8 
metadata(1): version
assays(4): counts mu H cooks
rownames(6): ENSG00000000003 ENSG00000000005 ... ENSG00000000460 ENSG00000000938
rowData names(34): baseMean baseVar ... deviance maxCooks
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(10): SampleName cell ... BioSample sizeFactor

And the tutorial (or lecture directions) tells us to use results() function to “view” the results.

# Extract results
res <- results(dds_a)
# View the results
head(res)
log2 fold change (MLE): dex untrt vs trt 
Wald test p-value: dex untrt vs trt 
DataFrame with 6 rows and 6 columns
                  baseMean log2FoldChange     lfcSE      stat      pvalue       padj
                 <numeric>      <numeric> <numeric> <numeric>   <numeric>  <numeric>
ENSG00000000003 708.602170      0.3812540  0.100654  3.787752 0.000152016 0.00128363
ENSG00000000005   0.000000             NA        NA        NA          NA         NA
ENSG00000000419 520.297901     -0.2068126  0.112219 -1.842943 0.065337292 0.19654609
ENSG00000000457 237.163037     -0.0379204  0.143445 -0.264356 0.791505742 0.91145948
ENSG00000000460  57.932633      0.0881682  0.287142  0.307054 0.758801924 0.89503278
ENSG00000000938   0.318098      1.3782270  3.499873  0.393793 0.693733530         NA
# Plot results
plotMA(res, main="DESeq2", ylim=c(-2,2))

# Save results to a CSV file
write.csv(as.data.frame(res), file="DESeq2_results.csv")
LS0tCnRpdGxlOiAiUiBOb3RlYm9vayIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKREVTZXEyIHR1dG9yaWFsIHdlYnNpdGU6CihodHRwczovL3d3dy5iaW9jb25kdWN0b3Iub3JnL3BhY2thZ2VzL3JlbGVhc2UvYmlvYy92aWduZXR0ZXMvREVTZXEyL2luc3QvZG9jL0RFU2VxMi5odG1sI2hvdy10by1nZXQtaGVscC1mb3ItZGVzZXEyKQoKSGFydmFyZCdzIHdvcmtzaG9wIGZvciAqREVTZXEyKiBhbmFseXNpczogKGh0dHBzOi8vaGJjdHJhaW5pbmcuZ2l0aHViLmlvL0RHRV93b3Jrc2hvcC9sZXNzb25zLzA0X0RHRV9ERVNlcTJfYW5hbHlzaXMuaHRtbCkKCgoKCldlIGZpcnN0IG5lZWQgdG8gbG9hZCBhbGwgdGhlIGxpYnJhcmllcyB3ZSBuZWVkIHRvIHVzZSBpbiB0aGUgc2NyaXB0CgpgYGB7cn0KbGlicmFyeShERVNlcTIpCmxpYnJhcnkoYWlyd2F5KQpgYGAKCkZvciB0aGlzIGV4YW1wbGUgd2Ugd2lsbCB1c2UgdGhlIGRhdGFzZXQgdGhhdCBjb21lcyBmcm9tIHRoZSAqYWlyd2F5KiBwYWNrYWdlOgoKYGBge3J9CmRhdGEoImFpcndheSIpCmBgYAoKIyBZb3UgbWF5IHNlZSB0aGUgZm9sbG93aW5nIHR5cGUgb2YgaW5zdHJ1Y3Rpb25zOgojIFByZXBhcmUgdGhlIGRhdGEKIyBDb252ZXJ0IHRoZSBkYXRhIHRvIGEgREVTZXFEYXRhU2V0IG9iamVjdApgYGB7cn0KZGRzIDwtIERFU2VxRGF0YVNldEZyb21NYXRyaXgoY291bnREYXRhID0gYXNzYXkoYWlyd2F5KSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgY29sRGF0YSA9IGNvbERhdGEoYWlyd2F5KSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgZGVzaWduID0gfiBjZWxsICsgZGV4KQpgYGAKCgpIbW1tbS4uLiBXaGF0IGlzIGBhc3NheShhaXJ3YXkpYCBvciBgY29sRGF0YShhaXJ3YXkpYCBvciB3aGF0IGRvZXMgYH5jZWxsK2RleGAgbWVhbj8KCkxldCdzIHRyeSB0byBmaWd1cmUgb3V0LgoKSW4gUiwgYW55dGhpbmcgdGhhdCBoYXMgcm91bmQgcGFyZW50aGVzaXMgaXMgYSBmdW5jdGlvbiwgc28gd2UgY2FuIGV4cGxvcmUgZWFjaCAKZnVuY3Rpb24gYWxvbmUgYW5kIHNlZSB3aGF0IG9iamVjdHMgaXQgcHJvZHVjZS4KQmUgY2FyZWZ1bCB0aG91Z2guLi4gQmlvaW5mb3JtYXRpY3MgZGF0YSBtaWdodCBiZSB2ZXJ5IGxhcmdlISAKRXhwbG9yZSB0aGUgb2JqZWN0IGJlZm9yZSB5b3UgdHJ5IHRvIHByaW50IGl0IQoKYGBge3J9CnggPC0gYXNzYXkoYWlyd2F5KQpgYGAKCkZpcnN0IHNlZSB3aGF0IGNsYXNzIG9mIGFuIG9iamVjdCBpdCBpczoKYGBge3J9CmNsYXNzKHgpCmBgYAoKR3JlYXQhIFRoaXMgaXMgYSBtYXRyaXgsIHNvIHdlIHNob3VsZCBiZSBhYmxlIHRvIGZpbmQgaXRzIGRpbWVuc2lvbnM6CmBgYHtyfQpkaW0oeCkKYGBgCgpTaW5jZSB0aGVyZSBhcmUgbWFueSByb3dzIGJ1dCBvbmx5IDggY29sdW1ucyB3ZSBjYW4gdGFrZSBhIGxvb2sgYXQgdGhlIGJlZ2lubmluZyBvZiB0aGlzIG1hdHJpeC4KYGhlYWQoKWAgZnVuY3Rpb24gd29ya3MgZm9yIGRhdGEgZnJhbWVzLCBtYXRyaWNlcyBhbmQgbWFueSBvdGhlciBSIG9iamVjdHMuCmBgYHtyfQpoZWFkKHgpCmBgYApHcmVhdCEgQnkgbG9va2luZyBhdCB0aGUgZmlyc3QgZmV3IHJvd3MsIHdlIG5vdyBrbm93IHRoYXQgdGhpcyBtYXRyaXggaGFzIHJvdyBuYW1lcyBhbmQgcm93IGNvbHVtbnMuIAoKTGV0J3Mgbm93IG1vdmUgdG8gdGhlIG5leHQgZnVuY3Rpb24gLSBgY29sRGF0YShhaXJ3YXkpYC4gTGV0J3Mgc2VlIHdoYXQgaXQgZG9lczoKCmBgYHtyfQp4IDwtIGNvbERhdGEoYWlyd2F5KQpgYGAKCkFnYWluLCBsZXQncyBzZWUgdGhlICJjbGFzcyIgb2YgdGhpcyBvYmplY3Q6CgpgYGB7cn0KY2xhc3MoeCkKYGBgCgpUaGUgb3V0cHV0IHNheXMgIkRGcmFtZSIuIApJdCBpcyBub3QgZXhhY3RseSBhICpkYXRhIGZyYW1lKiBvYmplY3Qgd2Ugc2F3IGJlZm9yZSwgYnV0IG1heWJlIHNvbWV0aGluZyBzaW1pbGFyPwpMZXQncyB0cnkgYGhlYWQoKWAgZnVuY3Rpb246CgpgYGB7cn0KaGVhZCh4KQpgYGAKQWghIEl0IGRvZXMgbG9vayBsaWtlIGEgZGF0YSBmcmFtZSEgSG93IG1hbnkgcm93cyBhbmQgaG93IG1hbnkgY29sdW1ucyBkb2VzIGl0IGhhdmU/CgpgYGB7cn0KZGltKHgpCmBgYAoKClRoZSBsYXN0IGFyZ3VtZW50IHRoYXQgd2FzIHBhc3NlZCB0byB0aGUgYGAgZnVuY3Rpb24gd2FzIGBkZXNpZ24gPSB+IGNlbGwgKyBkZXhgLiAgIApUaGUgZXhwcmVzc2lvbiB0aGF0IGZvbGxvd3MgdGhlIHRpbGRlIGB+YCBzaWduIGlzIGEgKmZvcm11bGEqLiBVc3VhbGx5IGl0IGlzIGEgbGlzdCBvZiBjb2x1bW4gbmFtZXMgZnJvbSBhIGRhdGEgZnJhbWUgdGhhdCB3ZSB3YW50IHRvIGluY2x1ZGUgaW4gb3VyIGFuYWx5c2lzIGFzIGluZGVwZW5kZW50IHZhcmlhYmxlcy4KClNvIG5vdywgbGV0J3MgZXhlY3V0ZSB0aGF0IGZ1bmN0aW9uOgpgYGB7cn0KZGRzIDwtIERFU2VxRGF0YVNldEZyb21NYXRyaXgoY291bnREYXRhID0gYXNzYXkoYWlyd2F5KSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgY29sRGF0YSA9IGNvbERhdGEoYWlyd2F5KSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgZGVzaWduID0gfiBjZWxsICsgZGV4KQpgYGAKClRoZSBvdXRwdXQgaXMgc3RvcmVkIGluIGEgdmFyaWFibGUgYGRkc2AuIFdoYXQgaXMgaW4gaXQ/IFdoYXQgY2FuIGJlIGRvbmUgd2l0aCBpdC4gQWdhaW4sIHRyeSB1c2luZyBmdW5jdGlvbnMgd2UgdXNlZCBiZWZvcmU6CmBgYHtyfQpjbGFzcyhkZHMpCmBgYApXZWxsLCBpdCBpcyBzb21lICJERVNlcSIgRGF0YSBTZXQuIFdoYXQgY2FuIEkgZG8gd2l0aCBpdC4gVXN1YWxseSB5b3Ugd2lsbCBiZSBnaXZlbiBkaXJlY3Rpb25zLCBidXQgeW91IGFsc28gY2FuIGxvb2sgZm9yIGhlbHA6CmBgYHtyfQpoZWxwKERFU2VxRGF0YVNldCkKYGBgCgpMZXQncyB1c2Ugb3RoZXIgZnVudGlvbnMgdG8gZXhwbG9yZSB3aGF0IGlzIGhpZGRlbiBpbiB0aGlzIG9iamVjdDoKYGBge3J9CmhlYWQoZGRzKQpgYGAKCkZ1bmN0aW9uIGBzdHIoKWAgaXMgdmVyeSBoZWxwZnVsLCBidXQgZm9yIHZlcnkgY29tcGxleCBsYXJnZSBvYmplY3RzLCB0aGUgb3V0cHV0IGNvdWxkIGJlIGEgbGl0dGxlIHRvbyBiaWcuLi4uCmBgYHtyfQpzdHIoZGRzKQpgYGAKCgpOb3cgd2UgYXJlIHJlYWR5IHRvIHBlcmZvcm0gZGlmZmVyZW50aWFsIGV4cHJlc3Npb24gYW5hbHlzaXMKCmBgYHtyfQpkZHNfYSA8LSBERVNlcShkZHMpCmBgYApBbmQgYWdhaW4sIGl0IG1pZ2h0IGJlIGEgZ29vZCBpZGVhIHRvIHVuZGVyc3RhbmQgd2hhdCBkaWQgd2UgZ2V0IGFzIGEgcmVzdWx0LiBTbyB3ZSB1c2UgdGhlIHNhbWUgZnVuY3Rpb25zIC0gCmBjbGFzc2AsIGBoZWFkYCwgYHN0cmAgdG8gInNlZSIgd2hhdCBpcyBoaWRkZW4gaW4gdGhpcyBvYmplY3QKYGBge3J9CmNsYXNzKGRkc19hKQpgYGAKCmBgYHtyfQpoZWFkKGRkc19hKQpgYGAKQW5kIHRoZSB0dXRvcmlhbCAob3IgbGVjdHVyZSBkaXJlY3Rpb25zKSB0ZWxscyB1cyB0byB1c2UgYHJlc3VsdHMoKWAgZnVuY3Rpb24gdG8gInZpZXciIHRoZSByZXN1bHRzLgpgYGB7cn0KIyBFeHRyYWN0IHJlc3VsdHMKcmVzIDwtIHJlc3VsdHMoZGRzX2EpCmBgYAoKYGBge3J9CiMgVmlldyB0aGUgcmVzdWx0cwpoZWFkKHJlcykKYGBgCgpgYGB7cn0KIyBQbG90IHJlc3VsdHMKcGxvdE1BKHJlcywgbWFpbj0iREVTZXEyIiwgeWxpbT1jKC0yLDIpKQpgYGAKCgpgYGB7cn0KIyBTYXZlIHJlc3VsdHMgdG8gYSBDU1YgZmlsZQp3cml0ZS5jc3YoYXMuZGF0YS5mcmFtZShyZXMpLCBmaWxlPSJERVNlcTJfcmVzdWx0cy5jc3YiKQpgYGAKCgoK