Interacting with box cloud storage: the boxr R package

R
box
TIL
Author

Thomas Sandmann

Published

November 30, 2024

tl;dr

This week, I learned how to create custom apps within the box cloud storage service, and how to read & write files from within R using the awesome boxr R package.

I used two different ways of authenticating with box: OAuth 2.0 and Java Web Tokens (JWT), best suited for interactive or unattended use cases, respectively.

Overview

box is a commercial cloud storage provider. Many of my colleagues use it through its web interface and share data and metadata files with me.

While box doesn’t offer an official API for R, the excellent boxr R package has enabled me to read & write files without having to manually download them first. The boxr package has excellent documentation, but I tend to forget how to set it up, especially on a new system, or to assist other colleagues who are new to boxr. That’s why I am documenting my workflow here today.

This tutorial is split into two main parts, using either

  1. An interactive app authenticating via OAuth 2.0 or
  2. A service app authenticating with Java Web Tokens (JWT)

The former is straightforward to set up and doesn’t require a developer account or authorization by an admin. The latter requires more work to set up, but is a great choice for unattended processes. Please refer to this article in the official boxr documentation for more details.

Creating a box developer account

To interact with box, you need to create an account first (or ask your organization to provide you with one). For individuals, you can sign up for a free developer plan. which is all you need to follow along 1.

Here, I will assume that you have either already signed up for a box developer account or have access to a paid account e.g. through your organiation, and that you are familiar with logging into it through your web browser.

Creating a Custom App to interact with box

Box offers multiple different ways to access content computationally. Instead of a single API, users can create one or more Apps, and configure them to fit their needs.

Here, I am creating a Custom App, a versatile option that offers three different ways to authenticate.

  1. OAuth 2.0: An app using OAuth 2.0 exposes the same permissions to boxr that I have as an interactive user, e.g. it can browse, read, write and share the same files that I can interact with when I am logged into the box web site. That’s a great option e.g. for interactive analyses.
  2. Java web token: An app using a JWT creates a new service account, which can only access files and folders that have been explicitly shared with it. (See below for an example). This allows for more granular access control and is my method of choice for unsupervised dashboards or other automation tasks.
  3. Client Credentials Grant: Finally, apps can use a Client Credentials Grant (CCG), but that’s not something I have explored, yet.

In this walkthrough, I will set up two Custom Apps, one using OAuth 2.0 and another using a JSON Web Token (JWT) for authentication.

Using OAuth2 for authentication

Creating a new app: r_oauth

If you want to connect to box computationally, you need to create an app that governs how you interact with the box API. The following instructions assume that you don’t have any apps, yet, so I will create one from scratch.

  • Log into the box developer console, with your usual box username & password.
  • If this is the first app you are creating, you will be presented with three choices. For my purposes, a Custom App is the best choice. (If you already created at least one app before, you will see it listed, and can use the Create Platform App button on the upper right instead, see the JWT section below where I create a second app, below.)
Figure 1
  • Next, I choose a suitable name for the app (e.g. r_oauth), and add a brief description before clicking the Next button.
Figure 2
Figure 3
  • On the second page of the modal, I pick User Authentication (OAuth 2.0) as the authentication method and then hit the Create App button.
  • This takes me to main configuration page of my new r_oauth app.
Figure 4
  • The Configuration tab includes the OAuth 2.0 Credentials section that provides me with the Client ID and the Client Secret strings, which are used in the same way as my username and password for my box account. (So I will keep the secret and not share them with anybody!)
    • I can invalidate my current ID / Secret pair and obtain a new set any time by clicking on the Reset button.
  • Next, I scroll down to the OAuth 2.0 Redirect URI section of the page, paste http://localhost:1410/ into the Redirect URI text box and then hit the blue Add button. The new URI is now shown below the text box.
Figure 5
  • Finally, I scroll down further to the Application Scopes section. By default, my app only has the Read all files and folders in Box box ticked, providing read-only access. Because I also want to be able to write files and create folders, I also check the Write all files and folders in Box option.
Figure 6
$ operator is invalid for atomic vectors

If I forget to grant write access to box, operations that try to modify files or folders fail with the $ operator is invalid for atomic vectors error message.

  • To make these change permanent, I click the Save Changes button at the top of the Configuration page.

  • I keep this configuration page open, so I can refer back to it below.

Authenticating

Let’s start by accessing my box account from within R on my local system, e.g. where I can open a web browser to complete the authentication process.

First, I install the boxr package, which is available from CRAN.

install.packages("boxr")

Next, I attach the package to my R session and use the box_auth() function to authenticate with the r_oauth app I created above.

library(boxr)
options(boxr.progress = FALSE, boxr.verbose = FALSE)

I can fine tune the behavior of the boxr package through different options.

boxr_options()
$boxr.interactive
[1] TRUE

$boxr.print_tibble
[1] FALSE

$boxr.progress
[1] FALSE

$boxr.verbose
[1] FALSE

$boxr.wd
$boxr.wd$id
[1] 0

$boxr.wd$name
[1] "All Files"


$boxr.wd.path
[1] ""

For this tutorial, I switch off the progress bar and suppress messages with by setting the boxr.progress and boxr.verbose options to FALSE.

Environmental variables

To communicate the Client ID and Client Secret information shown in the Configuration -> OAuth 2.0 Credentials section of my app’s configuration page (see above), I expose them as environmental variables.

Important

The Client ID and Client Secret strings shown in this tutorial are no longer valid, so please replace them with your own to follow along.

Sys.setenv(BOX_CLIENT_ID="0o5h6s0mey2l5f6l89bkjrmk42vjv9g2")
Sys.setenv(BOX_CLIENT_SECRET="Zfv4UX5Fi2rQAK9Cskcf3RgtFlptrH2c")

Usually, I add the BOX_CLIENT_ID and BOX_CLIENT_SECRET to my .Renviron file instead of specifying them via Sys.setenv(). That way, box_auth() will use them automatically next time and avoid including any credentials in my analysis code.

boxr::box_auth()

Providing the BOX_CLIENT_ID and BOX_CLIENT_SECRET strings is not sufficient to gain access to my box account, though. To complete the OAuth 2.0 hand-shake, I am now directed to a website, which opens in my default web browser, where I can log into my box account interactively (using my usual username and password). Afterward, I confirm that I want to grant my r_oauth app access to my files on my behalf.

Figure 7
Figure 8
Tip

A call to box_auth() is required every time I want to interact with box in an R session, e.g. I include the following two lines at the top of each of my R markdown / quarto documents:

library(boxr)
box_auth()

The cache file

When box_auth() runs successfully for the first time, it creates a cache file (by default as the hidden ~/.box-oauth file in my home directory)2. The next call to box_auth() will refresh the information stored in the cache file instead of prompting me to sign in via the box website again.

file.info("~/.boxr-oauth")[, c("size", "mode", "mtime", "ctime")]
              size mode               mtime               ctime
~/.boxr-oauth 5940  600 2024-11-30 10:23:46 2024-11-30 10:23:46

Writing and reading files from box

Now that I have authenticated with my r_oauth app, I can browse the folders and files available to me on box. The box_getwd() function confirms that I am looking at the root directory of my box account, which has the directory ID 0:

boxr::box_getwd()
[1] 0

The box_ls function returns a boxr_object_list S3 object with all objects that are stored in the location specified by the dir_id argument. It can be coerced into a data.frame, e.g. to filter the output with verbs from the dplyr or poorman R packages.

objects <- boxr::box_ls(dir_id = box_getwd())
class(objects)
[1] "boxr_object_list"

Right now, there is a single folder tutorial in my box account3. Its unique identifier is shown in the id column and I can refer to objects in my box account (both folders and files) using this identifier.

as.data.frame(objects)
      name   type           id  size description                  owner
1 tutorial folder 296254108479 12754             tomsing1+box@gmail.com
       path         modified_at content_modified_at sha1 version version_no
1 All Files 2024-11-30 10:35:37 2024-11-30 10:35:37 <NA>       1          1
  version_id
1       <NA>

I extract the unique identifier of the tutorial folder, so I can refer to it downstream:

suppressPackageStartupMessages(library(poorman))

folder_id <- objects |>
  as.data.frame() |> 
  poorman::filter(name == "tutorial") |> 
  poorman::pull("id")
folder_id
[1] "296254108479"
Tip

Alternatively, I can also copy this identifier from the end of the URL that is showin in my web browser when I open the tutorial folder. Similarly pasting the URL https://app.box.com/folder/296254108479 into a web browser opens my tutorial folder there.

So let’s store a copy of the mtcars dataset included in R’s dataset package as a CSV file in the tutorials folder (specified by its id, see above).

boxr::box_write(object = mtcars, file_name = "mtcars.csv", dir_id = folder_id)

box.com remote file reference

 name        : mtcars.csv 
 file id     : 1713588728473 
 version     : V10 
 size        : 1.3 kB 
 modified at : 2024-11-30 10:36:46 
 created at  : 2024-11-30 10:36:46 
 uploaded by : tomsing1+box@gmail.com 
 owned by    : tomsing1+box@gmail.com 
 shared link : None 

 parent folder name :  tutorial 
 parent folder id   :  296254108479 

To verify the successful upload, I can list the contents of the tutorial folder:

boxr::box_ls(dir_id = folder_id) |>
  as.data.frame()
        name type            id size description                  owner
1 mtcars.csv file 1713588728473 1281             tomsing1+box@gmail.com
2 mtcars.rds file 1713594291978 1225             tomsing1+box@gmail.com
                path         modified_at content_modified_at
1 All Files/tutorial 2024-11-30 10:36:46 2024-11-30 10:36:46
2 All Files/tutorial 2024-11-30 10:35:37 2024-11-30 10:35:37
                                      sha1 version version_no    version_id
1 dfd44f97e2bd9b5efe4bd2e87851cfe717f3fc2f      10         10 1886588891382
2 c41ea4b33768b6a4ae886e592e13090939fab642       1          1 1886592083178

To read the file into my R session, I extract the CSV file’s id and pass it to the box_read_csv function:

file_id <- boxr::box_ls(dir_id = folder_id) |>
  as.data.frame() |>
  poorman::filter(name == "mtcars.csv") |>
  poorman::pull(id)
boxr::box_read_csv(file_id) |>
  head()
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Functions to interact with files stored in box

The boxr packages offers convenience functions for different file types, e.g.

The box_write and box_read functions allow me to specify the function that write / reads the file myself via the write_fun and read_fun arguments, respectively.

To simply copy files from / to box without reading them, I can use the box_dl() and box_ul(), respectively.

Finally, I can also move the file into the trash (from which it will be automatically removed after one month) with the box_delete_file() function.

For other functions, e.g. to search the box index, please refer to the boxr reference.

Success!

Great! Now I have a working app that allows me to interact with folders and files in my box account, all from within the comfort of my R session.

Using a JSON Web Token for authentication

Another way of authenticating a Custom App within box is via a JSON Web Token. This requires that the jose R package is installed on your system:

library(jose)

In this mode, the app authenticates using a signed JWT token, which requires a public/private key-pair for added security (see below).

To authenticate via this route, I need to create another app in my box developer console. Because I already created a first app (see above), the console’s home screen now looks different, listing my r_oauth app. To add another app, I click the blue Create Platform App in the upper right corner of the page.

Once again, I choose to create a Custom App and pick a unique name for my app, e.g. r_jwt. But on the second page of the modal, I now choose Server Authentication (with JWT) as my authentication method before clicking the Create App button.

Figure 9
Figure 10

Afterwards, I am redirected to the configuration page of my new r_jwt app, where I make the following changes:

  • As in my first example, I allow the app to write / modify files and folders on my behalf (see Figure 6).

  • To make these changes permanent, I click the blue Save Changes button on the upper right of the Configuration page.

  • To use a JWT for authentication, I need one more piece of information: an RSA keypair to sign and authenticate the requests made by my app. I scroll down to theAdd and Manage Public Keys section.

    There, I use the Generate a new Public/Private Key pair button to generate and download a new key pair for my app.

Figure 11

The key information is downloaded in a JSON file with content similar to this:

{
"boxAppSettings": {
  "clientID": "jlccq2qe365jy2wo4llm9z9isz9didhv",
  "clientSecret": "VSN5WNe9q4hI2h9PJbppQKsIxm8oxJp8",
  "appAuth": {
    "publicKeyID": "wvslfab2",
    "privateKey": "-----BEGIN ENCRYPTED PRIVATE KEY-----\nMIIFHDBOBgkqhki[TRUNCATED]RvOqeWR1mg==\n-----END ENCRYPTED PRIVATE KEY-----\n",
    "passphrase": "87babeaed61e4ad87309f8063918bd04"
  }
},
"enterpriseID": "1246725007"
}

This JSON file contains the

  • clientID: this unique identifier of the app - I will need it to authorize my app in the Admin console (see below).
  • clientSecret: the secret associated with the clientID
  • appAuth: the public and private RSA key pair, along with their pass phrase
  • enterpriseID: the unique identifier of my enterprise, e.g. the organization that manages my box account4.

The contents (or location) of this file is required to authenticate with box. The boxr package authors reccommend creating storing this file in a newly created ~/.boxr-auth folder, and in this tutorial stored the file there under the path ~/.boxr-auth/jwt_config.json 5.

Important

The contents of the JSON file is sufficient to access my box account. So I need to keep it secret, and - if in doubt - revoke and replace the key pair with a new one.

Authorizing the JWT-backed app

In contrast to the OAuth 2.0 app I created above, any JWT-backed app needs to be explicitly authorized by the organization’s admin. Here, I am using a free developer box account, so I am my own admin and can perform the following steps.

  1. I navigate to the home page of my box account and click on the Admin Console button on the lower left of the page. (If you don’t see an option to access the Admin Console the you either have not signed up for a Developer account, or your organization has dedicated admins6 and you are not one of them.)
Figure 12
  1. Within the admin console, I select Integrations in the left hand menu, and then navigate to the Platform Apps Manager tab. Here, I can see all previously authorized apps - but as this is my first JWT app there aren’t any listed.
Figure 13
  1. To add my new app, I click on the Add Platform App button on the upper right of the page.
  • Within the modal that pops up, I paste my app’s ClientID, which I can find either in the JSON token file or on the app’s configuration page ( see above).
  • After pressing the Next button, I can confirm the details and authorize my app.

  1. Once the authorization is complete, I can see my r_jwt app listed in the Platform Apps Manager list.
Figure 14

Authenticating with the JSON web-token

For JWT authentication, the box_auth_service() function is used. I can either point its token_file argument to the JSON file downloaded above, or provide its content as a string via its token_text argument 7. (Alternatively, I could also provide the path to the JSON file via the BOX_TOKEN_FILE environmental variable.)

boxr::box_auth_service(token_file = "~/.boxr-auth/jwt_config.json")

Once the box_auth_service() call has completed successfully, I can explore the files and folders in box.

boxr::box_ls()

box.com remote object list: Empty (no objects returned)

Wait, the box_ls() command does return any objects - where is my tutorials folder? This bring me to a key difference between OAuth 2.0 and JWT backed apps: the former has access to my usual box account. But the latter uses a service account, an entirely separate box account that has access to its own file system, but cannot access those of my main account.

Sharing files and folders with the service account

For some applications, up- and down-loading files that are only accessible via the service account might be all that is needed. But to collaborate with others (including my interactive account), it is helpful to share files with this new account. That’s what collaboration workflows are for.

Here, I will give the service account access to my tutorial folder, which is owned by my main account (and which I accessed using OAuth 2.0 above).

To share the folder (e.g. to create a collaboration in box’ lingo), I need to

  1. Log in to my main account, either interactive via the box website or using OAuth 2.0.
  2. Provide the identifier of the folder I want to share
  3. Provide user_id of the service account I want to share the folder with.
  • This information was returned by the box_auth_service() call above. In this example the user_id is 38323581037.
  1. Decide on the role, e.g. the set of permissions, I want to grant to the service account.
1box_auth()
2objects <- boxr::box_ls()
3folder_id <- objects |>
  as.data.frame() |> 
  poorman::filter(name == "tutorial") |> 
  poorman::pull("id")
4boxr::box_collab_get(folder_id)
5boxr::box_collab_create(
  dir_id = folder_id, 
  user_id = "38323581037", 
  role = "co-owner") |>
  as.data.frame()
1
Re-authenticate with my OAuth 2.0 credentials.
2
List files & folders at the root of my box account.
3
Extract the identifier of the tutorial folder.
4
Confirm that the folder currently doesn’t have collaborators, e.g. is not shared with any other box account (service or otherwise).
5
Share the tutorial folder with the service account, whose use_id is 38323581037.
           type          id created_by.type created_by.id created_by.name
1 collaboration 58843247194            user   38300848207 Thomas Sandmann
        created_by.login                created_at               modified_at
1 tomsing1+box@gmail.com 2024-11-30T10:36:50-08:00 2024-11-30T10:36:50-08:00
  expires_at   status accessible_by.type accessible_by.id accessible_by.name
1       <NA> accepted               user      38323581037              r_jwt
                                  accessible_by.login accessible_by.is_active
1 AutomationUser_2316060_8nQ4K9ugzs@boxdevedition.com                    TRUE
  invite_email     role           acknowledged_at item.type      item.id
1         <NA> co-owner 2024-11-30T10:36:50-08:00    folder 296254108479
  item.sequence_id item.etag item.name is_access_only
1                0         0  tutorial          FALSE

Now that the service account has access to the tutorial folder, I re-authenticate via box_auth_service(). Ther_jwtapp can now list the contents of thetutorial` folder:

boxr::box_auth_service(token_file = "~/.boxr-auth/jwt_config.json")
objects <- boxr::box_ls(folder_id)
objects |> as.data.frame()
        name type            id size description                  owner
1 mtcars.csv file 1713588728473 1281             tomsing1+box@gmail.com
2 mtcars.rds file 1713594291978 1225             tomsing1+box@gmail.com
                path         modified_at content_modified_at
1 All Files/tutorial 2024-11-30 10:36:46 2024-11-30 10:36:46
2 All Files/tutorial 2024-11-30 10:35:37 2024-11-30 10:35:37
                                      sha1 version version_no    version_id
1 dfd44f97e2bd9b5efe4bd2e87851cfe717f3fc2f      10         10 1886588891382
2 c41ea4b33768b6a4ae886e592e13090939fab642       1          1 1886592083178

and it can also write to it:

boxr::box_save_rds(object = mtcars,
                   dir_id = folder_id, 
                   file_name = "mtcars.rds",
                   description = "The mtcars dataset serialized as an RDS file")

box.com remote file reference

 name        : mtcars.rds 
 description : The mtcars dataset serialized as an RDS file 
 file id     : 1713594291978 
 version     : V3 
 size        : 1.2 kB 
 modified at : 2024-11-30 10:36:53 
 created at  : 2024-11-30 10:36:53 
 uploaded by : AutomationUser_2316060_8nQ4K9ugzs@boxdevedition.com 
 owned by    : tomsing1+box@gmail.com 
 shared link : None 

 parent folder name :  tutorial 
 parent folder id   :  296254108479 
boxr::box_ls(dir_id = folder_id) |>
  as.data.frame()
        name type            id size
1 mtcars.csv file 1713588728473 1281
2 mtcars.rds file 1713594291978 1225
                                   description                  owner
1                                              tomsing1+box@gmail.com
2 The mtcars dataset serialized as an RDS file tomsing1+box@gmail.com
                path         modified_at content_modified_at
1 All Files/tutorial 2024-11-30 10:36:46 2024-11-30 10:36:46
2 All Files/tutorial 2024-11-30 10:36:53 2024-11-30 10:36:53
                                      sha1 version version_no    version_id
1 dfd44f97e2bd9b5efe4bd2e87851cfe717f3fc2f      10         10 1886588891382
2 c41ea4b33768b6a4ae886e592e13090939fab642       3          3 1886587197566

Modifying collaborations

We can also revoke the permissions of the service account, by deleting the collaboration, if necessary:

1boxr::box_auth()
2collab <- boxr::box_collab_get(folder_id)
collab_id <- collab |>
  as.data.frame() |>
  poorman::pull(id)
boxr::box_collab_delete(collab_id)
3collab <- boxr::box_collab_get(folder_id)
1
Re-authenticate as the original owner of the folder.
2
Retrieve information about the sharing settings / collaborations of the folder, including the id of the collaboration.
3
Delete the collaboration by specifying its unique id.

Reproducibility

Session Information
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] jose_1.2.1    openssl_2.2.2 poorman_0.2.7 boxr_0.3.6   

loaded via a namespace (and not attached):
 [1] bit_4.5.0         jsonlite_1.8.9    dplyr_1.1.4       compiler_4.4.1   
 [5] renv_1.0.11       tidyselect_1.2.1  stringr_1.5.1     assertthat_0.2.1 
 [9] yaml_2.3.10       fastmap_1.2.0     mime_0.12         R6_2.5.1         
[13] generics_0.1.3    curl_6.0.1        knitr_1.49        tibble_3.2.1     
[17] pillar_1.9.0      R.utils_2.12.3    rlang_1.1.4       utf8_1.2.4       
[21] stringi_1.8.4     xfun_0.49         fs_1.6.5          bit64_4.5.2      
[25] cli_3.6.3         magrittr_2.0.3    rio_1.2.3         digest_0.6.37    
[29] askpass_1.2.1     lifecycle_1.0.4   R.methodsS3_1.8.2 R.oo_1.27.0      
[33] vctrs_0.6.5       evaluate_1.0.1    glue_1.8.0        data.table_1.16.2
[37] fansi_1.0.6       rmarkdown_2.29    purrr_1.0.2       httr_1.4.7       
[41] tools_4.4.1       pkgconfig_2.0.3   htmltools_0.5.8.1

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Footnotes

  1. The main difference between a free developer account and a regular free account is that the developer version has access to the administrator console, which will allow you to authorize JWT and CCG applications (source). If you are part of an organization with dedicated admins, then they can authorize apps that use these authentication methods for you. With a regular (e.g. non-developer) free account you can still use OAuth 2.0 for authentication, but won’t be able to run the code examples shown in the JWT section of this tutorial.↩︎

  2. To create cache file under a different path, I could specify it’s location via the cache argument of the box_auth() function. For example, I could point to a network drive that is accessible at the same path from different computers.↩︎

  3. The dir_id defaults to the output of box_getwd() and I only specify it here explicitly to highlight that this argument is available. And yes, there is also a box_setwd() function to set the remote working directory.↩︎

  4. Please note that my enterpriseID is set, even though I am a single developer on a free box plan. If your enterpriseID is set to “0”, you probably didn’t sign up for a Developer account. (Only free Developer accounts or paid plans can authorize apps via JWT or CCG).↩︎

  5. The authors also recommend to keep access to the folder restricted if you are on a system shared with others, e.g. by creating the directory with the fs::dir_create("~/.boxr-auth", mode = 700) command.↩︎

  6. If you are part of an organization with dedicated admins, you need to ask them to authorize your app for you, and provide them with its ClienID.↩︎

  7. I do some of my analyses using Saturn Cloud, which offers convenient ways to inject secrets into cloud instances - either as environmental variables or text files. Because the JWT file is plain text, it works very well with that setup.↩︎