How to encrypt an SPSS file using cyphr


Is there a way to encrypt SPSS files (.sav) using the cyphr package? Encrypting .csv files works fine, but when I try to encrypt a .sav file, I get the following error message:

  Error in db_lookup(dat$ns, dat$name, file_arg) : 
  Rewrite rule for haven::write_sav not found

Answer by Dierforth (best answer):

I found a solution: first convert the original files (*.csv and *.sav) into *.rds files, then encrypt those. This works as intended.

For every *.csv and *.sav file in the original folder, this procedure creates an encrypted *.rds file with the same name in a separate folder.

Load packages:

library(rio)
library(stringr)
library(cyphr)

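Set the paths to the folder with the original unencrypted data (data_originals) and to the folder that will hold the encrypted data (data_encrypted). Both folders must already exist; normalizePath() makes the paths absolute so that the setwd() calls further down keep working:

path_originals <- normalizePath("./data_originals")
path_encrypted <- normalizePath("./data_encrypted")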

Set working directory:

setwd(path_originals)

Specify the directory in which the encrypted files are to be stored (data_encrypted):

data_dir <- file.path(path_encrypted)

Set the path to the folder with your personal SSH key pair:

path_key_user <- "~/.ssh/"

Create a key for the data and encrypt that key with your personal key:

data_admin_init(data_dir, path_user = path_key_user)

Get the data key, which is used further down to encrypt the data:

key <- cyphr::data_key(data_dir, path_user = path_key_user)

For *.csv-files:

List all *.csv files in the folder data_originals:

# pattern is a regular expression, so escape the dot and anchor it at the end
filenames_csv <- list.files(path = path_originals, pattern = "\\.csv$")
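
Note that the pattern argument of list.files() is a regular expression, not a shell glob. As a minimal sketch (the temporary folder and the file names are made up for illustration), an anchored regex or glob2rx() selects only names that really end in .csv:

```r
# Hypothetical demo folder with three empty files
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("a.csv", "b.csv.bak", "c.sav"))))

# Anchored regular expression: a literal ".csv" at the end of the name
csv_regex <- list.files(demo_dir, pattern = "\\.csv$")

# Same selection, translating a glob into a regular expression
csv_glob <- list.files(demo_dir, pattern = glob2rx("*.csv"))

print(csv_regex)  # "a.csv"
print(csv_glob)   # "a.csv"
```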

Read in *.csv files located in the folder data_originals:

df_csv <- lapply(filenames_csv, read.csv2)

Create the list of *.rds file names that correspond to the *.csv files:

# escape the dot and anchor the pattern so only the extension is replaced
filenames_csv2rds <- str_replace(filenames_csv, "\\.csv$", ".rds")
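
In str_replace() the pattern is also a regular expression, so an unescaped dot matches any character. A small sketch with a made-up file name shows the difference; only the escaped, anchored pattern is limited to the extension:

```r
library(stringr)

# Hypothetical file name that contains "csv" twice
x <- "a_csv_export.csv"

# Unescaped: "." matches any character, so the first hit is "_csv"
loose <- str_replace(x, ".csv", ".rds")

# Escaped and anchored: only the trailing ".csv" is replaced
strict <- str_replace(x, "\\.csv$", ".rds")

print(loose)   # "a.rds_export.csv"
print(strict)  # "a_csv_export.rds"
```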

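Save the data read from the *.csv files as *.rds files in the folder created for the encrypted files (data_encrypted). At this point the *.rds files are still unencrypted:

setwd(path_encrypted)
for (i in seq_along(df_csv)) {
  # [[i]] extracts the data frame itself rather than a one-element list
  export(df_csv[[i]], filenames_csv2rds[i])
}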

For *.sav-files:

Set working directory:

setwd(path_originals)

List all *.sav files in the folder data_originals:

# pattern is a regular expression, so escape the dot and anchor it at the end
filenames_sav <- list.files(path = path_originals, pattern = "\\.sav$")

Read in *.sav files located in the folder data_originals:

df_sav <-
  lapply(filenames_sav,
         Hmisc::spss.get,
         use.value.labels = TRUE,  # convert labelled values to factors
         lowernames = TRUE)        # convert variable names to lower case

Create the list of *.rds file names that correspond to the *.sav files:

# escape the dot and anchor the pattern so only the extension is replaced
filenames_sav2rds <- str_replace(filenames_sav, "\\.sav$", ".rds")

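Save the data read from the *.sav files as *.rds files in the folder created for the encrypted files (data_encrypted). At this point the *.rds files are still unencrypted:

setwd(path_encrypted)
for (i in seq_along(df_sav)) {
  # [[i]] extracts the data frame itself rather than a one-element list
  export(df_sav[[i]], filenames_sav2rds[i])
}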

List the *.rds files that are now in the data_encrypted folder and still have to be encrypted:

filenames <- list.files(path = path_encrypted, pattern = "\\.rds$")

Read in all *.rds files located in the folder data_encrypted.

ldf <- lapply(filenames, readRDS)

Define the output paths:

paths <- file.path(data_dir, filenames)

Encrypt all files and save them in the folder data_encrypted, overwriting the unencrypted *.rds files:

for (i in seq_along(ldf)) {
  # [[i]] passes the object itself, not a one-element list
  encrypt(saveRDS(ldf[[i]], paths[i]), key)
}
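
As a rough sketch of a shorter route, each .sav file could be written straight to an encrypted .rds file, so that no unencrypted copy ever lands in the target folder, and then read back with decrypt(). The helper encrypt_sav_as_rds() is hypothetical, the key from key_sodium() is only a stand-in for the data_key() used above, and haven::read_sav() is swapped in for Hmisc::spss.get() to keep the example small:

```r
library(cyphr)

# Stand-in key for the demo; the answer above obtains its key via data_key()
key <- key_sodium(sodium::keygen())

# Hypothetical helper: read one .sav file and save it directly as an
# encrypted .rds file in out_dir (no unencrypted intermediate file)
encrypt_sav_as_rds <- function(sav_file, out_dir, key) {
  dat <- haven::read_sav(sav_file)
  out <- file.path(out_dir, sub("\\.sav$", ".rds", basename(sav_file)))
  encrypt(saveRDS(dat, out), key)
  out
}

# Demo data: a small SPSS file in a temporary location
src <- tempfile(fileext = ".sav")
haven::write_sav(data.frame(a = 1:3, b = 6:4), src)
out_dir <- tempfile("encrypted_")
dir.create(out_dir)

enc <- encrypt_sav_as_rds(src, out_dir, key)

# Reading back requires the key; readRDS() is wrapped in decrypt()
restored <- decrypt(readRDS(enc), key)
print(restored)
```

The same decrypt(readRDS(...), key) call should also work for the files produced by the loop above, using the key returned by data_key().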

Answer by caldwellst:

If cyphr does not explicitly support a read/write function, you have to tell it which argument of that function holds the file path, so that encryption can happen. Do this with the file_arg argument.

library(cyphr)
library(haven)

df <- data.frame(a = 1:3, b = 6:4)

# path to a temporary SPSS file
f <- tempfile(fileext = ".sav")

# generate a key
key <- key_sodium(sodium::keygen())

# write the data frame to an encrypted file;
# "path" is the file argument of write_sav()
encrypt(
  expr = write_sav(df, f),
  key = key,
  file_arg = "path"
)

# check file is encrypted
read_sav(f)
#> Error: Failed to parse /private/var/folders/b7/_6hwb39d43l71kpy59b_clhr0000gn/T/RtmpV23nxX/file1325251862c5d.sav: Invalid file, or file has unsupported features.

# decrypt the file
decrypt(
  read_sav(f),
  key,
  file_arg = "file"
)
#> # A tibble: 3 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     6
#> 2     2     5
#> 3     3     4

This is detailed in the package README, which also covers how to register the function if you want to do this frequently.
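
Registering the functions once could look like the following sketch. It assumes cyphr::rewrite_register(package, name, arg) as described in the README; the data frame and the temporary file are placeholders. After registration, file_arg should no longer be needed in the current R session:

```r
library(cyphr)
library(haven)

# Tell cyphr which argument holds the file path for each function
rewrite_register("haven", "write_sav", "path")
rewrite_register("haven", "read_sav", "file")

# Placeholder key and data for the demo
key <- key_sodium(sodium::keygen())
df <- data.frame(a = 1:3, b = 6:4)
f <- tempfile(fileext = ".sav")

# No file_arg needed once the functions are registered
encrypt(write_sav(df, f), key)
out <- decrypt(read_sav(f), key)
print(out)
```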