Chapter 2 R project setup
There are a few ways to set up an R project. We will use the devtools package.
2.1 Selecting a name for your package
Select a name for your package.
The official R manual Writing R Extensions sets the following constraints:
The mandatory ‘Package’ field gives the name of the package. This should contain only (ASCII) letters, numbers and dot, have at least two characters and start with a letter and not end in a dot.
Please note that the underscore character _ is not allowed.
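The naming rule quoted above can be checked with a regular expression. The following is a minimal sketch in base R; the helper name is_valid_pkg_name() is hypothetical (it is not part of R or devtools) and it only tests the syntax of the name, not whether the name is already taken.

```r
# Hypothetical helper (not part of base R or devtools): checks a candidate
# name against the rule quoted from Writing R Extensions.
is_valid_pkg_name <- function(name) {
  # Start with a letter, continue with letters, digits or dots, and end with
  # a letter or digit; the two anchored classes force at least two characters.
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

candidates <- c("myutils", "my.utils", "my_utils", "utils.", "2fast", "a")
valid <- is_valid_pkg_name(candidates)
names(valid) <- candidates
print(valid)
```

In this sketch only "myutils" and "my.utils" pass: the others contain an underscore, end in a dot, start with a digit, or are too short.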
More than 70% of R packages are named using lowercase letters only, but you can use upper case or camel case if the name of your package is an abbreviation or consists of multiple words.
Avoid names that already exist on CRAN.
# get all package names in CRAN
options(repos = list(CRAN="http://cran.rstudio.com/"))
pkgs <- available.packages(filters = c("CRAN", "duplicates"))[,'Package']
# check if pkgs vector contains a name "myutils"
"myutils" %in% pkgs
If you plan to use Bioconductor, check whether the package name already exists there. It is a good idea to check it anyway. You can do so by looking up the name in the Bioconductor package list.
Alternatively, you can install the available package and use its function of the same name:
available::available("myutils")
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories'
for details
replacement repositories:
CRAN: https://cran.us.r-project.org
Urban Dictionary can contain potentially offensive results,
should they be included? [Y]es / [N]o:
1: Y
── myutils ──────────────────────────────────────────────────────────────────────────────
Name valid: ✔
Available on CRAN: ✔
Available on Bioconductor: ✔
Available on GitHub: ✔
Abbreviations: http://www.abbreviations.com/myutils
Wikipedia: https://en.wikipedia.org/wiki/myutils
Wiktionary: https://en.wiktionary.org/wiki/myutils
Urban Dictionary:
Not found.
Sentiment:???
2.2 Package Initialization
Make sure you set your current directory. The R package will be created in a sub-directory.
# set current directory where the package will reside
setwd("/Users/ktrn/Dropbox (BOSTON UNIVERSITY)/Projects/R/")
# create a new package
usethis::create_package("myutils")
Alternatively, you can use the RStudio menu: File > New Project > New Directory > R Package.
Opening an R package project
The create_package() function creates a sub-folder named after the package. It also creates an R project and a few files and folders there:
dir("myutils", all.files=TRUE)
## [1] "." ".." ".gitignore" ".Rbuildignore"
## [5] ".Rhistory" ".Rproj.user" "DESCRIPTION" "LICENSE.md"
## [9] "myutils.Rproj" "NAMESPACE" "R"
Let’s open the myutils.Rproj R project in RStudio.
2.3 R package structure
An R package usually includes the following components:
- Functions
- Data (optional)
- Documentation (function manuals and examples of how to use the functions included in the package)
- Vignettes (longer descriptions of package function usage)
- Tests (R scripts to test the package functions)
|- DESCRIPTION
|- LICENSE
|- NAMESPACE
|- R
|  |- script1.R
|  |- script2.R
|- man
|  |- function1.Rd
|  |- function2.Rd
|  |- function3.Rd
|- myutils.Rproj
DESCRIPTION - file containing metadata about your package: authors, current version, dependencies
LICENSE - file describing the package usage agreement
NAMESPACE - file listing the functions your package exports and the functions it imports from other packages
R - directory containing R scripts
man - directory containing documentation files
myutils.Rproj - R project file associated with the package
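As a quick sanity check, the presence of the core components listed above can be verified from R. The sketch below builds a throwaway skeleton in a temporary directory purely for illustration; the helper missing_components() is hypothetical, and the choice of which components count as required is an assumption of this example.

```r
# Build a throwaway skeleton in a temporary directory (illustration only)
pkg_dir <- file.path(tempdir(), "myutils")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(pkg_dir, c("DESCRIPTION", "NAMESPACE")))

# Hypothetical helper: return the names of expected components that are absent
missing_components <- function(path,
                               required = c("DESCRIPTION", "NAMESPACE", "R")) {
  required[!file.exists(file.path(path, required))]
}

# Core components of the skeleton
missing_components(pkg_dir)

# The man directory only appears once documentation has been generated
missing_components(pkg_dir, required = c("R", "man"))
```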
2.4 DESCRIPTION file: R package information
Let’s open the DESCRIPTION file created by the create_package() function. By default it looks like this:
Package: myutils
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R:
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre"),
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
    license
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
Each line consists of a field name and a value, separated by a colon. When values span multiple lines, they need to be indented.
It is important to set these fields appropriately and to provide a good Title and Description for your package, as this information is displayed on the package's CRAN webpage.
There are two other fields that are not present in the default DESCRIPTION file but which we might need to add later: Imports and Suggests. If our package uses functions from other R packages (e.g. ggplot2, dplyr, etc.), we need to list those packages in the Imports field. If some optional functionality in our package uses other packages, they need to be listed in the Suggests field.
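To see how the field-and-value format works in practice, a DESCRIPTION file can be parsed with the base R function read.dcf(). The sketch below writes a small example file to a temporary location; the Title and the package names under Imports and Suggests are illustrative values, not the ones used later in this workshop.

```r
# Write a small example DESCRIPTION to a temporary file (illustrative values)
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: myutils",
  "Title: Summaries for Numeric and Character Vectors",
  "Version: 0.0.0.9000",
  "Imports:",
  "    dplyr,",
  "    ggplot2",
  "Suggests:",
  "    testthat"
), desc_path)

# read.dcf() returns a character matrix with one column per field;
# indented continuation lines are folded into the value of their field
desc <- read.dcf(desc_path)
colnames(desc)
desc[1, "Imports"]
```

Inside a package project, the same fields can be edited without touching the file by hand, for example with usethis::use_package("ggplot2") for Imports or usethis::use_package("testthat", type = "Suggests") for Suggests.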
2.4.1 Title and Description
The Title field describes what the package does in one line, while Description is a one-paragraph text that gives a more detailed overview of the package functionality.
Do not include the package name in the title. Do not start the title or Description with “This package …” or similar words. Take a look at the list of CRAN packages to view examples of titles of existing R packages.
Each line in the Description field should contain no more than 80 characters, and continuation lines should be indented with 4 spaces.
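Wrapping a long Description by hand is error-prone, so it can help to let R do it. The sketch below uses the base function strwrap(); the description text itself is a placeholder invented for this example.

```r
# Placeholder text for illustration only
description <- paste(
  "Provides helper functions that summarise numeric and character vectors,",
  "including counts of missing and unique values. Intended as a small example",
  "package for a workshop on R package development."
)

# Wrap below 80 characters; exdent indents every line after the first
lines <- strwrap(paste("Description:", description), width = 80, exdent = 4)
cat(lines, sep = "\n")
```

The printed lines can be pasted into the DESCRIPTION file as they are.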
2.4.2 Version
The version must include at least two integers separated by dot (.) and/or dash (-) symbols. Usually, the version contains three or four integers:
<major>.<minor>.<patch>
or <major>.<minor>.<patch>.<development>
To increase any component of the version, you can use the use_version() function from the usethis package.
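Version strings can be explored with the base R function package_version(), which parses them and allows comparisons. The sketch below is only an illustration of the format; the use_version() call is left commented out because it modifies the DESCRIPTION file of the active package project.

```r
# package_version() parses and validates version strings
dev_version <- package_version("0.0.0.9000")
release     <- package_version("0.1.0")

# Integer components of the development version
unlist(dev_version)

# Versions are compared component by component
release > dev_version

# A dash is accepted as a separator as well as a dot
package_version("1.2-3") == package_version("1.2.3")

# Inside a package project, usethis can bump the Version field:
# usethis::use_version("patch")   # other choices: "major", "minor", "dev"
```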
You can read more about how to set the version at Semantic Versioning.
2.4.4 License
In the US, copyright is automatic: if you don’t choose a license for your software, no one else can use it!
There are two major types of open-source licenses - permissive and copyleft.
Permissive licenses allow the code to be copied, modified in any way, and published as long as the license is preserved. The modified code does not have to be distributed as open source. A popular example is the MIT license.
To create an MIT license, execute: usethis::use_mit_license("Company Name")
Copyleft licenses allow you to copy and modify the code for personal use. An example of a copyleft license is the GPL license. If the modified code is then published or distributed, it must also be licensed under the GPL and must be distributed as open source.
To create a GPL license, execute: usethis::use_gpl_license(version = 3, include_future = TRUE)
Most CRAN packages use copyleft licenses.
You can read more about R licenses at www.r-project.org/Licenses/
- Open-source licenses: Choose a License
- Proprietary licenses: binPress Software License Generator
2.4.5 .Rbuildignore
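The .Rbuildignore file tells R which files to ignore when the package is built. Each line is a Perl-style regular expression that is matched, case-insensitively, against file and directory paths relative to the package root, so a literal file name has to be escaped and anchored: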
^.*\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
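The effect of these patterns can be previewed in base R. The sketch below is an approximation of the matching done at build time, not the actual build code; the helper is_ignored() and the list of paths are made up for this example.

```r
# The three patterns shown above, written as R strings (backslashes doubled)
patterns <- c("^.*\\.Rproj$", "^\\.Rproj\\.user$", "^LICENSE\\.md$")

# Example paths relative to the package root (illustrative)
paths <- c("myutils.Rproj", ".Rproj.user", "LICENSE.md",
           "DESCRIPTION", "R/my_summaries.R")

# Hypothetical helper: TRUE if any pattern matches the path
is_ignored <- function(path, patterns) {
  any(vapply(patterns, grepl, logical(1),
             x = path, perl = TRUE, ignore.case = TRUE))
}

ignored <- vapply(paths, is_ignored, logical(1), patterns = patterns)
print(ignored)
```

When adding entries from within a package project, usethis::use_build_ignore() can be used instead of editing the file by hand; by default it escapes the file name so that it is matched literally.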
2.5 Adding R functions to the package
Let’s now create a new R script and add a couple of R functions to it. Make sure you have the devtools package loaded; we will use it throughout the workshop! We will place our R functions in a file named my_summaries.R. This file should be placed in the R sub-folder. We will use the use_r() function from the usethis package (which is attached when we load the devtools package):
library(devtools)
use_r("my_summaries")
Once the file is created, we can open it and add a couple of functions (the first draft of our functions):
# This function creates a summary for a numeric vector
numeric_summary <- function(x, na.rm=FALSE){

  min = min(x, na.rm=na.rm)
  max = max(x, na.rm=na.rm)
  mean = mean(x, na.rm=na.rm)
  sd = sd(x, na.rm=na.rm)
  length = length(x)
  Nmiss = sum(is.na(x))

  c(min=min, max=max, mean=mean, sd=sd, length=length, Nmiss=Nmiss)
}

# This function creates a summary for a character vector
char_summary <- function(x, na.rm=FALSE){

  length = length(x)
  Nmiss = sum(is.na(x))
  Nunique = length(unique(x))

  c(length = length,
    Nmiss = Nmiss,
    Nunique = Nunique )
}
To use these functions from within the package, we will not use the source() function as we usually do. Instead, we will use the load_all() function from the devtools package:
load_all()
cvec <- c("Boston", NA, "Brookline", "Brighton", NA, "Boston")
nvec <- c(2022, 2021, NA, 2021, 2021, NA, 2022)
char_summary(cvec)
numeric_summary(nvec, na.rm=TRUE)
We probably want to modify our functions, for example, to improve the output format. But we will return to this later.
2.6 Summary: Create new R Package workflow
- Set working directory
- library(devtools) - Load devtools package
- create("package_name") - Create an R project that contains the main components of an R package
- Set up information in the DESCRIPTION file
- use_r("r_file_name") - Create an R script and place it into the R sub-folder
- Open the r_file_name.R script and add desired functions
- load_all() - Load all functions in the package
Once the package is created, you can continue to add new R scripts using use_r() and load them into the environment using the load_all() function.