Can't find table when using {pointblank} in a {targets} pipeline


I am trying to use the {pointblank} package to validate some data in a {targets} pipeline, but I keep getting an error saying it can't find the data object to operate on. When I run the line validate_data(my_data, yaml_file) directly (outside the pipeline), it produces the expected output.

The following is a limited reprex:

library(targets)
library(tarchetypes)
library(pointblank)
library(data.table)

validate_data <- function(my_data, qc_yaml_file) {
  yaml_read_agent(qc_yaml_file) |>
    interrogate()
}

create_agent(
  tbl = ~ my_data,
  label = "check on mtcars",
  actions = action_levels(
    warn_at = 0.10,
    stop_at = 0.25
  )
) |>
  col_exists(columns = c(cyl, vs, am, gear, carb)) |>
  yaml_write(filename = "check_mtcars.yaml")



# Run the R scripts in the R/ folder with your custom functions:
tar_source()

# Replace the target list below with your own:
tar_plan(
  my_data = setDT(copy(mtcars)),
  tar_file(
    yaml_file,
    "check_mtcars.yaml"
  ),
  tar_target(
    qc_results,
    validate_data(my_data, yaml_file)
  )
)

The YAML file produced by the create_agent() pipe looks like this:

type: agent
tbl: ~my_data
tbl_name: ~my_data
label: check on mtcars
lang: en
locale: en
actions:
  warn_fraction: 0.1
  stop_fraction: 0.25
steps:
- col_exists:
    columns: c(cyl, vs, am, gear, carb)

Any guidance on why the targets pipeline isn't finding the my_data object would be appreciated.
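A likely reason for the error: the YAML file does not contain the data, only the formula `tbl: ~my_data`, which pointblank resolves by name when `interrogate()` runs. Note also that `validate_data()` never uses its `my_data` argument. As I understand it, {targets} loads a target's dependencies into a target-specific environment rather than the global environment, so the name cannot be found where the formula is evaluated. In an interactive session `my_data` sits in the global environment and the lookup succeeds. The base-R code below is only an analogy for that lookup. `resolve_tbl()`, `run_like_a_target()` and `run_with_local_formula()` are hypothetical helpers, not pointblank or {targets} internals.

```r
# Hypothetical helper: evaluate the right-hand side of a one-sided formula
# (e.g. `~ my_data`) in the environment the formula was created in.
resolve_tbl <- function(f) eval(f[[2]], envir = environment(f))

# Created at top level, so the formula's environment is the global one.
tbl_formula <- ~ my_data

run_like_a_target <- function() {
  # Stand-in for a target: `my_data` exists only in this function's
  # environment, not where `tbl_formula` was created.
  my_data <- head(mtcars)
  tryCatch(
    resolve_tbl(tbl_formula),
    error = function(e) conditionMessage(e)
  )
}

run_with_local_formula <- function() {
  my_data <- head(mtcars)
  f <- ~ my_data  # created here, so it captures this environment
  nrow(resolve_tbl(f))
}

run_like_a_target()       # "object 'my_data' not found"
run_with_local_formula()  # 6
```

Passing the data object itself to the agent, as in the answer below, avoids the name lookup entirely.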

Answer by brodrigues:

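Any time there are side effects (here, writing the .yaml file every time _targets.R is sourced), {targets} gets tricky to use, so I would recommend not relying on the .yaml file at all and building the agent inside a target instead: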

library(targets)
library(pointblank)
library(data.table)

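wrapper_create_agent <- function(dataset) {
  # Build the agent from the data object passed in by the target,
  # instead of a `~my_data` formula that has to be resolved by name later.
  create_agent(
    tbl = dataset,
    tbl_name = "my_data",  # display name used in reports
    label = "check on mtcars",
    actions = action_levels(
      warn_at = 0.10,
      stop_at = 0.25
    )
  ) |>
    col_exists(columns = c(cyl, vs, am, gear, carb))
}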


# Replace the target list below with your own:
list(

  tar_target(
    my_data,
    setDT(copy(mtcars))
  ),

  tar_target(
    my_agent,
    wrapper_create_agent(my_data)
  ),

  tar_target(
    qc_results,
    interrogate(my_agent)
  )
)

After running tar_make(), you can check the results with tar_read(qc_results).
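If you would rather keep the checks in a YAML file, one possible alternative is to hand the data to the agent explicitly before interrogating. This is a sketch, assuming your installed pointblank version provides `set_tbl()`. `validate_data()` keeps the question's name and signature, so the question's target list can stay as it was; the temporary-file example at the end is only for trying it outside the pipeline.

```r
library(pointblank)
library(data.table)

# The YAML still stores `tbl: ~my_data`; set_tbl() replaces that formula
# with the data object that {targets} passes in as `my_data`.
validate_data <- function(my_data, qc_yaml_file) {
  yaml_read_agent(qc_yaml_file) |>
    set_tbl(tbl = my_data) |>
    interrogate()
}

# Example outside the pipeline: write a small YAML file, then validate.
yaml_path <- tempfile(fileext = ".yaml")
create_agent(tbl = ~ my_data, label = "check on mtcars") |>
  col_exists(columns = c(cyl, vs, am, gear, carb)) |>
  yaml_write(filename = yaml_path)

qc_results <- validate_data(setDT(copy(mtcars)), yaml_path)
all_passed(qc_results)  # TRUE when every listed column exists
```

The trade-off is that the YAML file is still a side effect outside {targets}' tracking unless it is tracked as a file target, as the question already does with tar_file().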