tl;dr
Today I learned how to
- Build an R package into source and binary bundles for distribution.
- Create a local drat repository.
- Add an R package to the repository and install it from there.
- Host the repository remotely in an AWS S3 bucket.
Many thanks to Dirk Eddelbuettel for creating and documenting the drat R package! (As always, any mistakes are my own.)
Motivation
There are multiple ways for developers to share R packages publicly, e.g.
- Submit them to the Comprehensive R Archive Network (CRAN),
- Contribute them to the Bioconductor project,
- Publish them via rOpenSci’s R-universe.
Users can then install these packages via the familiar install.packages() command.
Alternatively, authors can share their code through version control systems like GitHub or GitLab, and users can install the packages with third-party tools, e.g. the remotes R package.
But how can you make an R package available privately, e.g. for use within an organization?
In this tutorial, I demonstrate how to set up your own package repository with Dirk Eddelbuettel’s drat R package, add a package, make R aware of the new repo - and host it remotely on AWS S3.
Why drat?
Dirk Eddelbuettel highlights two main advantages:
- A package installed from a drat repository will be supported by install.packages() and update.packages(), so the user has easy methods for keeping up-to-date.
- The package author has better control over the package version users install, because they actively push specific releases into the repository.
Please see Dirk’s drat FAQ for additional points, e.g. ‘Why could install_github be wrong?’
Prerequisites
Hadley Wickham and Jenny Bryan have documented how to author, document and build R packages in their freely-available R Packages book. In this walkthrough I am using macOS (v13.1), but you can find instructions to set up Windows or Linux build environments in their R build toolchain chapter.
Bundling an R package’s source code for distribution
First, we need an R package that’s ready for distribution. Here, I am using the toy R package that you can retrieve from GitHub, either via git clone https://github.com/tomsing1/toy or by downloading its source code as a zip file. (Feel free to follow along with another R package instead - as long as you have the source package, the following steps apply.)
Next, we bundle the package into a single compressed file with the .tar.gz file extension. Let’s download the .zip file linked above into the ~/Downloads folder and use the R CMD build command to create a source bundle 1:
cd ~/Downloads
curl -s -L -O https://github.com/tomsing1/toy/archive/refs/heads/main.zip
unzip -o -q main.zip
rm main.zip
R CMD build --force toy-main
* checking for file ‘toy-main/DESCRIPTION’ ... OK
* preparing ‘toy’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
Omitted ‘LazyData’ from DESCRIPTION
* building ‘toy_0.1.0.tar.gz’
We now have the toy_0.1.0.tar.gz file, ready to be inserted into a new (or existing) drat repository.
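Before inserting the bundle, it can be reassuring to confirm that it really contains the package and version you expect. The following is only a minimal sketch in base R: bundle_description() is a hypothetical helper name of my own (it is not part of drat), and it assumes that the bundle holds a single top-level package directory.

```r
# Hypothetical helper (not part of drat): read the Package and Version
# fields from the DESCRIPTION file inside a source bundle.
bundle_description <- function(tarball) {
  tarball <- path.expand(tarball)
  # List the archive members and locate the top-level DESCRIPTION file.
  members <- utils::untar(tarball, list = TRUE)
  desc <- grep("^[^/]+/DESCRIPTION$", members, value = TRUE)
  stopifnot(length(desc) == 1L)
  # Extract only that file into a scratch directory and parse it.
  exdir <- tempfile("bundle-")
  utils::untar(tarball, files = desc, exdir = exdir)
  read.dcf(file.path(exdir, desc), fields = c("Package", "Version"))
}

# Usage, assuming the bundle was built in ~/Downloads as shown above.
tarball <- path.expand("~/Downloads/toy_0.1.0.tar.gz")
if (file.exists(tarball)) {
  print(bundle_description(tarball))
}
```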
Creating a local drat repository
To create a new repository, we start by installing the drat R package itself (if it’s not available on your system already) with the following R commands:
if (!requireNamespace("drat", quietly = TRUE)) {
  install.packages("drat")
}
library(drat)
You can specify the path of your drat repository either by setting the dratRepo option 2:
options(dratRepo = "~/drat-tutorial")
getOption("dratRepo")
[1] "~/drat-tutorial"
or by providing it as an argument to the drat::insertPackage() function (see below).
Let’s create a new drat repository in our home directory 3, and populate it with a minimal index.html file (to avoid HTTP 404 Not Found errors later).
dir.create("~/drat-tutorial", showWarnings = FALSE)
writeLines(
text = "<!doctype html><title>My awesome drat repository!</title>",
con = "~/drat-tutorial/index.html"
)
Now we are ready to insert the toy package bundle into the repository with drat’s insertPackage() command 4:
drat::insertPackage(file = "~/Downloads/toy_0.1.0.tar.gz",
                    repodir = "~/drat-tutorial")
Now, the ~/drat-tutorial folder contains the index.html file and a src/contrib subdirectory with the source bundle and its PACKAGES index files.
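To inspect the repository layout yourself, you can list its files and read the package index from within R. This is a minimal sketch in base R; repo_summary() is a hypothetical helper of my own, and it assumes the usual CRAN-like layout in which source packages are indexed in src/contrib/PACKAGES.

```r
# Hypothetical helper: list all files in a repository directory and, if a
# source package index exists, report the packages it advertises.
repo_summary <- function(repodir) {
  repodir <- path.expand(repodir)
  files <- list.files(repodir, recursive = TRUE)
  index <- file.path(repodir, "src", "contrib", "PACKAGES")
  packages <- if (file.exists(index)) {
    read.dcf(index, fields = c("Package", "Version"))
  } else {
    NULL  # no source packages have been inserted yet
  }
  list(files = files, packages = packages)
}

# Usage, assuming the repository from this tutorial exists.
repo <- path.expand("~/drat-tutorial")
if (dir.exists(repo)) {
  print(repo_summary(repo))
}
```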
Accessing the local drat repository
When you prompt your R installation to install or update R packages, it searches repositories specified in the repos option. On my system, only the default repository is set in a fresh R session 5:
getOption("repos")
CRAN
"https://cloud.r-project.org"
If I try to install our example toy R package, I don’t succeed:
install.packages("toy", type = "source")
Installing package into '/Users/sandmann/Library/R/x86_64/4.2/library'
(as 'lib' is unspecified)
Warning: package 'toy' is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
because R is not yet aware of our new repository.
At this point, we must add the type="source" argument, because we have only added the source bundle to the repository. We will add a compiled version in a moment - read on!
To test our local repository, we add its path to the list of known repositories.
drat::addRepo("LocalRepo", "file://Users/sandmann/drat-tutorial")
getOption("repos")
CRAN LocalRepo
"https://cloud.r-project.org" "file://Users/sandmann/drat-tutorial"
By default, drat’s addRepo() command assumes that repositories are hosted on GitHub Pages. Because we want to access a repo via the filesystem (either locally or on a network drive), we need to explicitly add the file:/ prefix - and use the absolute file path (e.g. returned by path.expand("~/drat-tutorial")) to specify its location.
In this case, concatenating file:/ with /Users/sandmann/drat-tutorial produces the final file://Users/sandmann/drat-tutorial location (note the double forward slashes).
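The prefixing rule described above can be wrapped into a small helper. This is just a sketch: file_repo_url() is a hypothetical function name of my own, and it assumes a Unix-style absolute path (after expanding a leading ~).

```r
# Hypothetical helper: turn a local directory into the file:/ location
# described above by prefixing its absolute path with "file:/".
file_repo_url <- function(path) {
  abs_path <- path.expand(path)
  # Assumption: Unix-style absolute paths that start with a forward slash.
  stopifnot(startsWith(abs_path, "/"))
  paste0("file:/", abs_path)
}

print(file_repo_url("/nfs/groups/groupABC/R/drat"))
# [1] "file://nfs/groups/groupABC/R/drat"
```

The result can then be passed as the second argument to drat::addRepo(), e.g. file_repo_url("~/drat-tutorial") for the repository in this tutorial.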
Now, we can install the toy package with the usual install.packages() command 6:
install.packages("toy", type = "source")
Installing package into '/Users/sandmann/Library/R/x86_64/4.2/library'
(as 'lib' is unspecified)
Great! We have successfully installed our toy R package from our brand new repository. Now it is time to make it available to other users as well.
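Before sharing a repository, it can be useful to check which packages and versions R actually sees in it. Here is a minimal sketch based on base R's available.packages(); repo_packages() is a hypothetical wrapper of my own, and the usage line assumes the local repository location used above.

```r
# Hypothetical wrapper: list the packages (and versions) that a single
# repository advertises for the given package type.
repo_packages <- function(repo_url, type = "source") {
  ap <- utils::available.packages(repos = repo_url, type = type)
  ap[, c("Package", "Version"), drop = FALSE]
}

# Usage, assuming the local repository from this tutorial exists:
# repo_packages("file://Users/sandmann/drat-tutorial")
```

If the repository cannot be read, available.packages() issues a warning and the result is an empty matrix, which is a quick hint that the location is wrong.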
Building binary packages
Windows and Mac users who install packages from CRAN or any user installing files from the Posit Public Package Manager (PPPM) will usually receive a binary package. CRAN accepts package bundles and creates the platform-specific binary file for distribution. To offer the same service to users of our drat repository, we need to compile the binary package ourselves.
Here, I create the macOS binary package from the bundle we obtained above by executing the following command on my Mac:
cd ~/Downloads
R CMD INSTALL --build toy_0.1.0.tar.gz
* installing to library ‘/Users/sandmann/Library/R/x86_64/4.2/library’
* installing *source* package ‘toy’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* creating tarball
packaged installation of ‘toy’ as ‘toy_0.1.0.tgz’
* DONE (toy)
This command will first install the package into my default R library, and then create the binary toy_0.1.0.tgz file.
Next, we add it to our local drat repository (note the .tgz file suffix).
drat::insertPackage(file = "~/Downloads/toy_0.1.0.tgz",
                    repodir = "~/drat-tutorial")
Now, the ~/drat-tutorial folder contains a new subdirectory (bin) with the binary files for macOS.
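R works out which subdirectory of a repository to search from the repository location and the package type. Base R's contrib.url() function shows this mapping; the sketch below uses the local repository location from this tutorial. I only show the expected output for source packages, because the binary path depends on the platform and the R version in use.

```r
repo <- "file://Users/sandmann/drat-tutorial"

# Source bundles are looked up under src/contrib.
src_url <- utils::contrib.url(repo, type = "source")
print(src_url)
# [1] "file://Users/sandmann/drat-tutorial/src/contrib"

# Binary packages are looked up under bin/, in a folder that depends on
# the platform and on the major.minor version of the running R.
bin_url <- utils::contrib.url(repo, type = "mac.binary")
print(bin_url)

# The package type that install.packages() requests by default here.
print(getOption("pkgType"))
```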
At long last, now we can omit the type="source" argument from calls to install.packages():
install.packages("toy")
Installing package into '/Users/sandmann/Library/R/x86_64/4.2/library'
(as 'lib' is unspecified)
The downloaded binary packages are in
/var/folders/wc/9tswmr4s74s0x90wqh2007300000gp/T//Rtmpg6tHJA/downloaded_packages
Hosting your drat repository on AWS S3
drat repositories can be hosted in any location
- that you can write files to and
- that can serve files via HTTP.
But unless you placed your drat repository into a network drive that is accessible by multiple users, it is currently only useful to yourself.
If you chose a network drive as the location of your drat repository, then other users can benefit from it right away, as long as they can read from the shared directory. As before, the absolute path must be prefixed with file:/. For example, a repository that is available on the users’ systems at /nfs/groups/groupABC/R/drat would be added to the list of R repositories via drat::addRepo("workgroup", "file://nfs/groups/groupABC/R/drat").
The drat documentation illustrates how you can use git and GitHub Pages to make your repository publicly available.
Here, we are interested in hosting a repository privately instead, e.g. in a location that is only accessible from within our own organization:
- If you already have access to a private server that serves files to your users (e.g. via HTTP), then you can simply copy your repository there.
- If your organization uses Amazon Web Services (AWS), you can also use an S3 bucket to host your repository and take advantage of the access controls set by your organization.
Although this use case focuses on hosting private repositories, you can of course also make repositories in S3 buckets publicly available. Alas, data storage in S3 buckets incurs costs, while other options (e.g. GitHub Pages, CRAN, Bioconductor, etc.) are free, so this might not be your preferred option.
We will assume that you have write access to an S3 bucket that is configured to serve static files via HTTP. (For a brief outline of the necessary steps, please see the appendix.) Here, I am using a bucket called drat-tutorial - but you should create / access your own bucket to follow along.
AWS S3 buckets can be configured to either be visible publicly, or access can be restricted to specific IP addresses, security groups or other AWS resources. Please make sure you have configured your bucket in a way that suits your needs.
Amazon S3 static website endpoints do not support the HTTPS protocol. If you require an encrypted file transfer, you might need a different solution.
To share our repository, we must first copy its folder to the S3 bucket, either via the AWS Console or (more conveniently) with the aws command line interface 7. (If you are adventurous, you can also mount an S3 bucket as a file system with goofys.)
Assuming you have set the necessary AWS credentials, the following aws s3 sync command copies our repository to the repo folder within the drat-tutorial bucket that I created in the us-west-1 AWS region.
aws s3 sync ~/drat-tutorial s3://drat-tutorial/repo
We can use the aws s3 ls
command to confirm the upload:
aws s3 ls s3://drat-tutorial/repo/
PRE bin/
PRE src/
2023-01-21 20:28:30 58 index.html
Whenever we make changes to our local repository, e.g. after adding new packages or package versions, we have to rerun the aws s3 sync command to copy the new files to the S3 bucket.
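Because the sync has to be repeated after every change, it may be convenient to trigger it from the same R session that inserts the packages. The following is a rough sketch rather than a tested workflow: s3_sync_args() and sync_repo() are hypothetical helpers of my own, they assume that the aws command line interface is installed and configured, and they default to the --dryrun option of aws s3 sync so that nothing is uploaded by accident.

```r
# Hypothetical helper: assemble the arguments for `aws s3 sync`.
s3_sync_args <- function(repodir, s3_uri, dry_run = TRUE) {
  args <- c("s3", "sync", shQuote(path.expand(repodir)), shQuote(s3_uri))
  # With --dryrun, the aws CLI only reports what it would copy.
  if (dry_run) args <- c(args, "--dryrun")
  args
}

# Hypothetical helper: run the sync via the aws command line interface.
sync_repo <- function(repodir, s3_uri, dry_run = TRUE) {
  if (!nzchar(Sys.which("aws"))) {
    stop("The aws command line interface was not found on the PATH.")
  }
  system2("aws", s3_sync_args(repodir, s3_uri, dry_run))
}

# Show the command that would be run for the bucket used in this tutorial.
print(s3_sync_args("~/drat-tutorial", "s3://drat-tutorial/repo"))
```

Calling sync_repo("~/drat-tutorial", "s3://drat-tutorial/repo", dry_run = FALSE) would then perform the actual upload.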
Now that the files are in place, we can add our remote repository to the list of R repositories in our R session. First, we remove the LocalRepo repository that we had added earlier, which points to the folder on our local filesystem.
options(repos = getOption("repos")[
setdiff(names(getOption("repos")), "LocalRepo")
])
Then we add the remote repository instead, by pointing to the URL of the S3 bucket 8.
drat::addRepo("S3repo", "http://drat-tutorial.s3.us-west-1.amazonaws.com/repo/")
getOption("repos")
CRAN
"https://cloud.r-project.org"
S3repo
"http://drat-tutorial.s3.us-west-1.amazonaws.com/repo/"
Let’s try to install the toy package from our S3 drat repository:
install.packages("toy")
Installing package into '/Users/sandmann/Library/R/x86_64/4.2/library'
(as 'lib' is unspecified)
The downloaded binary packages are in
/var/folders/wc/9tswmr4s74s0x90wqh2007300000gp/T//Rtmpg6tHJA/downloaded_packages
Success! R has successfully connected to the remote repository and installed the (binary) R package.
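Users who only want to install packages from the repository do not necessarily need the drat package for that, because R simply consults the repos option. The sketch below registers the repository with base R only, e.g. from a user's ~/.Rprofile; add_repo() is a hypothetical helper name of my own, and the URL is the one for the bucket used in this tutorial.

```r
# Hypothetical helper: add (or overwrite) a named entry in the `repos`
# option without requiring the drat package.
add_repo <- function(name, url) {
  repos <- getOption("repos")
  repos[name] <- url
  options(repos = repos)
  invisible(repos)
}

add_repo("S3repo", "http://drat-tutorial.s3.us-west-1.amazonaws.com/repo/")
print(getOption("repos"))
```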
Conclusions
- The drat R package makes it extremely simple to create a CRAN-like repository.
- The static files can be served via HTTP, making it straightforward to host the repository e.g. in an AWS S3 bucket with a restrictive access policy.
Appendix
Creating and configuring an S3 bucket to host static files
The following steps briefly outline how to create and configure an S3 bucket to act as a static web server via the AWS web interface (e.g. the AWS Console). For more details, please read the AWS S3 documentation and / or consult your local AWS expert.
Storing files on AWS S3 is not free. In this tutorial, we only upload a limited number of small files, but please don’t forget to purge them from your AWS account afterward.
- Create a new bucket (skip if you already have one). Make sure you create the bucket in the region that works best for your organization (e.g. us-west-1 if you want to host your files in California). You do not need to enable public access; stick to the defaults for your organization.
  [Figure: Create an S3 bucket]
- Next, navigate to your bucket’s properties, scroll all the way to the bottom of the page and enable Static website hosting.
  [Figure: Bucket properties]
  [Figure: Enable static hosting]
- (Typically) specify index.html as the Index document.
  [Figure: Define the index document]
- Under the Permissions tab, add a bucket policy that makes your content available within your organization.
  Warning: These settings determine who can access your files. Proceed with caution to avoid inadvertently exposing your data to the world!
  For example, the following policy grants read access to all files in the s3://drat-tutorial/ bucket to requests originating (only) from the 192.0.2.0 IP address. (Your own configuration will be different, of course.)
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject"],
        "Resource": ["arn:aws:s3:::drat-tutorial/*"],
        "Condition": {
          "IpAddress": {"aws:SourceIp": "192.0.2.0/32"}
        }
      }
    ]
  }
This work is licensed under a Creative Commons Attribution 4.0 International License.
Footnotes
Alternatively, you can also create the bundle from within R using the devtools::build() command.
Of course, you can place it anywhere you like, including e.g. network drives, as long as you can write to the directory. If you are using Windows, please remember to use backward instead of forward slashes in your paths.
In this tutorial, I use the :: notation to highlight which package a function originates from. Because we attached the package with the library(drat) command before, the drat:: prefix could be omitted.
If you use Bioconductor, the BiocManager::repositories() function specifies additional repositories that host its annotation and software packages.
You can look up the URL for your bucket in the AWS S3 console.