install.packages("boxr")
tl;dr
This week, I learned how to create custom apps within the box cloud storage service, and how to read & write files from within R using the awesome boxr R package.
I used two different ways of authenticating with box: OAuth 2.0 and Java Web Tokens (JWT), best suited for interactive or unattended use cases, respectively.
Overview
box is a commercial cloud storage provider. Many of my colleagues use it through its web interface and share data and metadata files with me.
While box doesn’t offer an official API for R, the excellent boxr R package has enabled me to read & write files without having to manually download them first. The boxr
package has excellent documentation, but I tend to forget how to set it up, especially on a new system, or to assist other colleagues who are new to boxr
. That’s why I am documenting my workflow here today.
This tutorial is split into two main parts, using either
- An interactive app authenticating via OAuth 2.0 or
- A service app authenticating with Java Web Tokens (JWT)
The former is straightforward to set up and doesn’t require a developer account or authorization by an admin. The latter requires more work to set up, but is a great choice for unattended processes. Please refer to this article in the official boxr documentation for more details.
Creating a box developer account
To interact with box, you need to create an account first (or ask your organization to provide you with one). For individuals, you can sign up for a free developer plan. which is all you need to follow along 1.
Here, I will assume that you have either already signed up for a box developer account or have access to a paid account e.g. through your organiation, and that you are familiar with logging into it through your web browser.
Creating a Custom App to interact with box
Box offers multiple different ways to access content computationally. Instead of a single API, users can create one or more Apps, and configure them to fit their needs.
Here, I am creating a Custom App, a versatile option that offers three different ways to authenticate.
- OAuth 2.0: An app using OAuth 2.0 exposes the same permissions to
boxr
that I have as an interactive user, e.g. it can browse, read, write and share the same files that I can interact with when I am logged into the box web site. That’s a great option e.g. for interactive analyses. - Java web token: An app using a JWT creates a new service account, which can only access files and folders that have been explicitly shared with it. (See below for an example). This allows for more granular access control and is my method of choice for unsupervised dashboards or other automation tasks.
- Client Credentials Grant: Finally, apps can use a Client Credentials Grant (CCG), but that’s not something I have explored, yet.
In this walkthrough, I will set up two Custom Apps, one using OAuth 2.0 and another using a JSON Web Token (JWT) for authentication.
Using OAuth2 for authentication
Creating a new app: r_oauth
If you want to connect to box computationally, you need to create an app that governs how you interact with the box API. The following instructions assume that you don’t have any apps, yet, so I will create one from scratch.
- Log into the box developer console, with your usual box username & password.
- If this is the first app you are creating, you will be presented with three choices. For my purposes, a
Custom App
is the best choice. (If you already created at least one app before, you will see it listed, and can use theCreate Platform App
button on the upper right instead, see the JWT section below where I create a second app, below.)
- Next, I choose a suitable name for the app (e.g.
r_oauth
), and add a brief description before clicking theNext
button.
- On the second page of the modal, I pick
User Authentication (OAuth 2.0)
as the authentication method and then hit theCreate App
button. - This takes me to main configuration page of my new
r_oauth
app.
- The
Configuration
tab includes theOAuth 2.0 Credentials
section that provides me with theClient ID
and theClient Secret
strings, which are used in the same way as myusername
andpassword
for my box account. (So I will keep the secret and not share them with anybody!)- I can invalidate my current ID / Secret pair and obtain a new set any time by clicking on the
Reset
button.
- I can invalidate my current ID / Secret pair and obtain a new set any time by clicking on the
- Next, I scroll down to the
OAuth 2.0 Redirect URI
section of the page, pastehttp://localhost:1410/
into theRedirect URI
text box and then hit the blueAdd
button. The new URI is now shown below the text box.
- Finally, I scroll down further to the
Application Scopes
section. By default, my app only has theRead all files and folders in Box
box ticked, providing read-only access. Because I also want to be able to write files and create folders, I also check theWrite all files and folders in Box
option.
If I forget to grant write access to box, operations that try to modify files or folders fail with the $ operator is invalid for atomic vectors
error message.
To make these change permanent, I click the
Save Changes
button at the top of the Configuration page.I keep this configuration page open, so I can refer back to it below.
Authenticating
Let’s start by accessing my box account from within R on my local system, e.g. where I can open a web browser to complete the authentication process.
First, I install the boxr
package, which is available from CRAN.
Next, I attach the package to my R session and use the box_auth()
function to authenticate with the r_oauth
app I created above.
library(boxr)
options(boxr.progress = FALSE, boxr.verbose = FALSE)
I can fine tune the behavior of the boxr
package through different options.
boxr_options()
$boxr.interactive
[1] TRUE
$boxr.print_tibble
[1] FALSE
$boxr.progress
[1] FALSE
$boxr.verbose
[1] FALSE
$boxr.wd
$boxr.wd$id
[1] 0
$boxr.wd$name
[1] "All Files"
$boxr.wd.path
[1] ""
For this tutorial, I switch off the progress bar and suppress messages with by setting the boxr.progress
and boxr.verbose
options to FALSE
.
Environmental variables
To communicate the Client ID
and Client Secret
information shown in the Configuration -> OAuth 2.0 Credentials
section of my app’s configuration page (see above), I expose them as environmental variables.
The Client ID and Client Secret strings shown in this tutorial are no longer valid, so please replace them with your own to follow along.
Sys.setenv(BOX_CLIENT_ID="0o5h6s0mey2l5f6l89bkjrmk42vjv9g2")
Sys.setenv(BOX_CLIENT_SECRET="Zfv4UX5Fi2rQAK9Cskcf3RgtFlptrH2c")
Usually, I add the BOX_CLIENT_ID
and BOX_CLIENT_SECRET
to my .Renviron
file instead of specifying them via Sys.setenv()
. That way, box_auth()
will use them automatically next time and avoid including any credentials in my analysis code.
::box_auth() boxr
Providing the BOX_CLIENT_ID
and BOX_CLIENT_SECRET
strings is not sufficient to gain access to my box account, though. To complete the OAuth 2.0 hand-shake, I am now directed to a website, which opens in my default web browser, where I can log into my box account interactively (using my usual username and password). Afterward, I confirm that I want to grant my r_oauth
app access to my files on my behalf.
A call to box_auth()
is required every time I want to interact with box in an R session, e.g. I include the following two lines at the top of each of my R markdown / quarto documents:
library(boxr)
box_auth()
The cache file
When box_auth()
runs successfully for the first time, it creates a cache file (by default as the hidden ~/.box-oauth
file in my home directory)2. The next call to box_auth()
will refresh the information stored in the cache file instead of prompting me to sign in via the box website again.
file.info("~/.boxr-oauth")[, c("size", "mode", "mtime", "ctime")]
size mode mtime ctime
~/.boxr-oauth 5940 600 2024-11-30 10:23:46 2024-11-30 10:23:46
Writing and reading files from box
Now that I have authenticated with my r_oauth
app, I can browse the folders and files available to me on box. The box_getwd()
function confirms that I am looking at the root directory of my box account, which has the directory ID 0
:
::box_getwd() boxr
[1] 0
The box_ls
function returns a boxr_object_list S3 object with all objects that are stored in the location specified by the dir_id
argument. It can be coerced into a data.frame, e.g. to filter the output with verbs from the dplyr or poorman R packages.
<- boxr::box_ls(dir_id = box_getwd())
objects class(objects)
[1] "boxr_object_list"
Right now, there is a single folder tutorial
in my box account3. Its unique identifier is shown in the id
column and I can refer to objects in my box account (both folders and files) using this identifier.
as.data.frame(objects)
name type id size description owner
1 tutorial folder 296254108479 12754 tomsing1+box@gmail.com
path modified_at content_modified_at sha1 version version_no
1 All Files 2024-11-30 10:35:37 2024-11-30 10:35:37 <NA> 1 1
version_id
1 <NA>
I extract the unique identifier of the tutorial
folder, so I can refer to it downstream:
suppressPackageStartupMessages(library(poorman))
<- objects |>
folder_id as.data.frame() |>
::filter(name == "tutorial") |>
poorman::pull("id")
poorman folder_id
[1] "296254108479"
Alternatively, I can also copy this identifier from the end of the URL that is showin in my web browser when I open the tutorial
folder. Similarly pasting the URL https://app.box.com/folder/296254108479 into a web browser opens my tutorial
folder there.
So let’s store a copy of the mtcars
dataset included in R’s dataset
package as a CSV file in the tutorials
folder (specified by its id, see above).
::box_write(object = mtcars, file_name = "mtcars.csv", dir_id = folder_id) boxr
box.com remote file reference
name : mtcars.csv
file id : 1713588728473
version : V10
size : 1.3 kB
modified at : 2024-11-30 10:36:46
created at : 2024-11-30 10:36:46
uploaded by : tomsing1+box@gmail.com
owned by : tomsing1+box@gmail.com
shared link : None
parent folder name : tutorial
parent folder id : 296254108479
To verify the successful upload, I can list the contents of the tutorial
folder:
::box_ls(dir_id = folder_id) |>
boxras.data.frame()
name type id size description owner
1 mtcars.csv file 1713588728473 1281 tomsing1+box@gmail.com
2 mtcars.rds file 1713594291978 1225 tomsing1+box@gmail.com
path modified_at content_modified_at
1 All Files/tutorial 2024-11-30 10:36:46 2024-11-30 10:36:46
2 All Files/tutorial 2024-11-30 10:35:37 2024-11-30 10:35:37
sha1 version version_no version_id
1 dfd44f97e2bd9b5efe4bd2e87851cfe717f3fc2f 10 10 1886588891382
2 c41ea4b33768b6a4ae886e592e13090939fab642 1 1 1886592083178
To read the file into my R session, I extract the CSV file’s id and pass it to the box_read_csv
function:
<- boxr::box_ls(dir_id = folder_id) |>
file_id as.data.frame() |>
::filter(name == "mtcars.csv") |>
poorman::pull(id)
poorman::box_read_csv(file_id) |>
boxrhead()
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The boxr
packages offers convenience functions for different file types, e.g.
- box_write()
- box_save_rds()
- box_save()
- box_read_csv()
- box_read_tsv()
- box_read_json()
- box_read_excel()
- box_read_rds()
The box_write
and box_read
functions allow me to specify the function that write / reads the file myself via the write_fun
and read_fun
arguments, respectively.
To simply copy files from / to box without reading them, I can use the box_dl() and box_ul(), respectively.
Finally, I can also move the file into the trash (from which it will be automatically removed after one month) with the box_delete_file() function.
For other functions, e.g. to search the box index, please refer to the boxr reference.
Success!
Great! Now I have a working app that allows me to interact with folders and files in my box account, all from within the comfort of my R session.
Using a JSON Web Token for authentication
Another way of authenticating a Custom App within box is via a JSON Web Token. This requires that the jose R package is installed on your system:
library(jose)
In this mode, the app authenticates using a signed JWT token, which requires a public/private key-pair for added security (see below).
To authenticate via this route, I need to create another app in my box developer console. Because I already created a first app (see above), the console’s home screen now looks different, listing my r_oauth
app. To add another app, I click the blue Create Platform App
in the upper right corner of the page.
Once again, I choose to create a Custom App
and pick a unique name for my app, e.g. r_jwt
. But on the second page of the modal, I now choose Server Authentication (with JWT)
as my authentication method before clicking the Create App
button.
Afterwards, I am redirected to the configuration page of my new r_jwt
app, where I make the following changes:
As in my first example, I allow the app to write / modify files and folders on my behalf (see Figure 6).
To make these changes permanent, I click the blue
Save Changes
button on the upper right of the Configuration page.To use a
JWT
for authentication, I need one more piece of information: an RSA keypair to sign and authenticate the requests made by my app. I scroll down to theAdd and Manage Public Keys
section.There, I use the
Generate a new Public/Private Key pair
button to generate and download a new key pair for my app.
The key information is downloaded in a JSON file with content similar to this:
{
"boxAppSettings": {
"clientID": "jlccq2qe365jy2wo4llm9z9isz9didhv",
"clientSecret": "VSN5WNe9q4hI2h9PJbppQKsIxm8oxJp8",
"appAuth": {
"publicKeyID": "wvslfab2",
"privateKey": "-----BEGIN ENCRYPTED PRIVATE KEY-----\nMIIFHDBOBgkqhki[TRUNCATED]RvOqeWR1mg==\n-----END ENCRYPTED PRIVATE KEY-----\n",
"passphrase": "87babeaed61e4ad87309f8063918bd04"
}
},
"enterpriseID": "1246725007"
}
This JSON file contains the
clientID
: this unique identifier of the app - I will need it to authorize my app in the Admin console (see below).clientSecret
: the secret associated with theclientID
appAuth
: the public and private RSA key pair, along with their pass phraseenterpriseID
: the unique identifier of myenterprise
, e.g. the organization that manages my box account4.
The contents (or location) of this file is required to authenticate with box. The boxr package authors reccommend creating storing this file in a newly created ~/.boxr-auth
folder, and in this tutorial stored the file there under the path ~/.boxr-auth/jwt_config.json
5.
The contents of the JSON file is sufficient to access my box account. So I need to keep it secret, and - if in doubt - revoke and replace the key pair with a new one.
Authenticating with the JSON web-token
For JWT authentication, the box_auth_service()
function is used. I can either point its token_file
argument to the JSON file downloaded above, or provide its content as a string via its token_text
argument 7. (Alternatively, I could also provide the path to the JSON file via the BOX_TOKEN_FILE
environmental variable.)
::box_auth_service(token_file = "~/.boxr-auth/jwt_config.json") boxr
Once the box_auth_service()
call has completed successfully, I can explore the files and folders in box.
::box_ls() boxr
box.com remote object list: Empty (no objects returned)
Wait, the box_ls()
command does return any objects - where is my tutorials
folder? This bring me to a key difference between OAuth 2.0 and JWT backed apps: the former has access to my usual box account. But the latter uses a service account, an entirely separate box account that has access to its own file system, but cannot access those of my main account.
Modifying collaborations
We can also revoke the permissions of the service account, by deleting the collaboration, if necessary:
1::box_auth()
boxr2<- boxr::box_collab_get(folder_id)
collab <- collab |>
collab_id as.data.frame() |>
::pull(id)
poorman::box_collab_delete(collab_id)
boxr3<- boxr::box_collab_get(folder_id) collab
- 1
- Re-authenticate as the original owner of the folder.
- 2
-
Retrieve information about the sharing settings / collaborations of the folder, including the
id
of the collaboration. - 3
- Delete the collaboration by specifying its unique id.
Reproducibility
Session Information
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Los_Angeles
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jose_1.2.1 openssl_2.2.2 poorman_0.2.7 boxr_0.3.6
loaded via a namespace (and not attached):
[1] bit_4.5.0 jsonlite_1.8.9 dplyr_1.1.4 compiler_4.4.1
[5] renv_1.0.11 tidyselect_1.2.1 stringr_1.5.1 assertthat_0.2.1
[9] yaml_2.3.10 fastmap_1.2.0 mime_0.12 R6_2.5.1
[13] generics_0.1.3 curl_6.0.1 knitr_1.49 tibble_3.2.1
[17] pillar_1.9.0 R.utils_2.12.3 rlang_1.1.4 utf8_1.2.4
[21] stringi_1.8.4 xfun_0.49 fs_1.6.5 bit64_4.5.2
[25] cli_3.6.3 magrittr_2.0.3 rio_1.2.3 digest_0.6.37
[29] askpass_1.2.1 lifecycle_1.0.4 R.methodsS3_1.8.2 R.oo_1.27.0
[33] vctrs_0.6.5 evaluate_1.0.1 glue_1.8.0 data.table_1.16.2
[37] fansi_1.0.6 rmarkdown_2.29 purrr_1.0.2 httr_1.4.7
[41] tools_4.4.1 pkgconfig_2.0.3 htmltools_0.5.8.1
This work is licensed under a Creative Commons Attribution 4.0 International License.
Footnotes
The main difference between a free developer account and a regular free account is that the developer version has access to the administrator console, which will allow you to authorize JWT and CCG applications (source). If you are part of an organization with dedicated admins, then they can authorize apps that use these authentication methods for you. With a regular (e.g. non-developer) free account you can still use OAuth 2.0 for authentication, but won’t be able to run the code examples shown in the JWT section of this tutorial.↩︎
To create cache file under a different path, I could specify it’s location via the
cache
argument of thebox_auth()
function. For example, I could point to a network drive that is accessible at the same path from different computers.↩︎The
dir_id
defaults to the output ofbox_getwd()
and I only specify it here explicitly to highlight that this argument is available. And yes, there is also abox_setwd()
function to set the remote working directory.↩︎Please note that my
enterpriseID
is set, even though I am a single developer on a free box plan. If yourenterpriseID
is set to “0”, you probably didn’t sign up for a Developer account. (Only free Developer accounts or paid plans can authorize apps via JWT or CCG).↩︎The authors also recommend to keep access to the folder restricted if you are on a system shared with others, e.g. by creating the directory with the
fs::dir_create("~/.boxr-auth", mode = 700)
command.↩︎If you are part of an organization with dedicated admins, you need to ask them to authorize your app for you, and provide them with its
ClienID
.↩︎I do some of my analyses using Saturn Cloud, which offers convenient ways to inject secrets into cloud instances - either as environmental variables or text files. Because the JWT file is plain text, it works very well with that setup.↩︎