I am trying to use the {pointblank} package to validate some data in a {targets} pipeline, and I keep getting an error saying that it can't find the data object to operate on. When I execute the line validate_data(my_data, yaml_file) interactively, it produces the expected output.
Here is a minimal reprex:
library(targets)
library(tarchetypes)
library(pointblank)
library(data.table)

validate_data <- function(my_data, qc_yaml_file) {
  yaml_read_agent(qc_yaml_file) |>
    interrogate()
}

create_agent(
  tbl = ~ my_data,
  label = "check on mtcars",
  actions = action_levels(
    warn_at = 0.10,
    stop_at = 0.25
  )
) |>
  col_exists(columns = c(cyl, vs, am, gear, carb)) |>
  yaml_write(filename = "check_mtcars.yaml")

# Run the R scripts in the R/ folder with your custom functions:
tar_source()

# Replace the target list below with your own:
tar_plan(
  my_data = setDT(copy(mtcars)),
  tar_file(
    yaml_file,
    "check_mtcars.yaml"
  ),
  tar_target(
    qc_results,
    validate_data(my_data, yaml_file)
  )
)
The yaml file produced by the create_agent pipe looks like:
type: agent
tbl: ~my_data
tbl_name: ~my_data
label: check on mtcars
lang: en
locale: en
actions:
  warn_fraction: 0.1
  stop_fraction: 0.25
steps:
- col_exists:
    columns: c(cyl, vs, am, gear, carb)
Any guidance on why the targets pipeline isn't finding the my_data object would be appreciated.
So, any time there are side effects, it gets tricky to use {targets}, so I would recommend not relying on the .yaml file at all. After running the pipeline with tar_make(), you can check the results with
tar_read(qc_results)
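A minimal sketch of what dropping the .yaml file could look like, assuming the validation rules can live in R code instead of YAML. The agent is created inside the function the target calls, and the data frame itself is passed as `tbl` rather than the formula `~ my_data`, so pointblank never has to look up an object by name. The one-argument `validate_data()` below is a hypothetical rewrite that reuses the thresholds and column checks from the question; it is an illustration under those assumptions, not a definitive implementation.

```r
# _targets.R (sketch): build the pointblank agent inside the target
library(targets)
library(tarchetypes)
library(pointblank)
library(data.table)

# Hypothetical one-argument version of validate_data(): it receives the
# data itself from targets and returns an interrogated agent.
validate_data <- function(my_data) {
  create_agent(
    tbl = my_data,  # the actual table, not the formula ~ my_data
    label = "check on mtcars",
    actions = action_levels(
      warn_at = 0.10,
      stop_at = 0.25
    )
  ) |>
    col_exists(columns = c(cyl, vs, am, gear, carb)) |>
    interrogate()
}

tar_plan(
  my_data = setDT(copy(mtcars)),
  tar_target(
    qc_results,
    validate_data(my_data)  # my_data is now an explicit dependency
  )
)
```

Because `my_data` appears in the command of `qc_results`, {targets} tracks it as an upstream dependency and reruns the validation when the data changes. The stored value is the interrogated agent, so `tar_read(qc_results)` returns it for inspection.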