I'm trying to locally test a Kubeflow component from `kfp.v2.dsl` (which works in a pipeline) using pytest, but I'm struggling with the input/output arguments combined with fixtures.
Here is a code example to illustrate the issue:
First, I created a fixture to mock a dataset. This fixture is also a kubeflow component.
```python
# ./fixtures/
import pandas as pd
import pytest
from kfp.v2.dsl import Dataset, Output, component


@pytest.fixture
@component()
def sample_df(dataset: Output[Dataset]):
    df = pd.DataFrame(
        {
            'name': ['Ana', 'Maria', 'Josh'],
            'age': [15, 19, 22],
        }
    )
    dataset.path += '.csv'
    df.to_csv(dataset.path, index=False)
    return
```
Let's suppose the component doubles the ages:
```python
# ./src/
import pandas as pd
from kfp.v2.dsl import Dataset, Input, Output, component


@component()
def double_ages(df_input: Input[Dataset], df_output: Output[Dataset]):
    df = pd.read_csv(df_input.path)
    double_df = df.copy()
    double_df['age'] = double_df['age'] * 2
    df_output.path += '.csv'
    double_df.to_csv(df_output.path, index=False)
```
Then, the test:
```python
# ./tests/
import pandas as pd
import pytest


@pytest.mark.usefixtures("sample_df")
def test_double_ages(sample_df):
    expected_df = pd.DataFrame(
        {
            'name': ['Ana', 'Maria', 'Josh'],
            'age': [30, 38, 44],
        }
    )

    df_component = double_ages(sample_df)  # This is where I call the component; sample_df is an Input[Dataset]
    df_output = df_component.outputs['df_output']
    df = pd.read_csv(df_output.path)

    assert df['age'].tolist() == expected_df['age'].tolist()
```
But that's when the problem occurs: the `Output[Dataset]` that should be passed as an output is not, so the component cannot work with it properly, and I get the following error on `assert df['age'].tolist() == expected_df['age'].tolist()`:

```
AttributeError: 'TaskOutputArgument' object has no attribute 'path'
```

Apparently, the object is of type `TaskOutputArgument` instead of `Dataset`.

Does anyone know how to fix this, or how to properly use pytest with KFP components? I've searched a lot on the internet but couldn't find a clue about it.
---

After spending my afternoon on this, I finally figured out a way to pytest a Python-based KFP component. As I found no other lead on this subject, I hope this can help:
## Access the function to test

The trick is not to directly test the KFP component created by the `@component` decorator. However, you can access the inner decorated Python function through the component's `python_func` attribute.

## Mock artifacts
Regarding the `Input` and `Output` artifacts: since you get around KFP to access and call the tested function, you have to create them manually and pass them to the function.

I had to come up with a workaround for how the `Artifact.path` property works (which also applies to all KFP `Artifact` subclasses: `Dataset`, `Model`, ...). If you look at the KFP source code, you'll find that it uses the `_get_path()` method, which returns `None` if the `uri` attribute does not start with one of the defined cloud prefixes: `"gs://"`, `"s3://"` or `"minio://"`. As we're manually building artifacts with local paths, a tested component that reads the `path` property of an artifact would get a `None` value.

So I made a simple method that builds a subclass of an `Artifact` (or a `Dataset`, or any other `Artifact` child class). The built subclass is simply altered to return the `uri` value instead of `None` in this specific case of a non-cloud `uri`.

## Your example
Putting this all together for your test and your fixture, we can get the following code to work:
### `src/double_ages_component.py`: your component to test

Nothing changes here. I just added the `pandas` import:
### `tests/utils.py`: the Artifact subclass builder

I am still not sure it is the most proper workaround. You could also manually create a subclass for each `Artifact` class that you use (
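A minimal sketch of that builder, written against the `_get_path()` behaviour described above (this relies on a private KFP method, so treat it as an assumption that may break between SDK versions):

```python
# tests/utils.py
def make_test_artifact(artifact_cls):
    """Build a subclass of the given KFP Artifact class whose `path`
    resolves to the raw local `uri` when `_get_path()` returns None
    (i.e. when the uri has no gs://, s3:// or minio:// prefix)."""

    class TestArtifact(artifact_cls):
        def _get_path(self):
            # fall back to the local uri instead of None
            return super()._get_path() or self.uri

    return TestArtifact
```

Usage is simply `make_test_artifact(Dataset)(uri='/some/local/path')`.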
`Dataset` in your example). Or you could directly mock the `kfp.v2.dsl.Artifact` class using pytest-mock.

### `tests/conftest.py`: your fixture

I separated the sample dataframe creator component from the fixture. Hence we have a standard KFP component definition, plus a fixture that builds its output artifact and calls its Python function:
The fixture returns an artifact object referencing a selected local path where the sample dataframe has been saved.
### `tests/test_component.py`: your actual component test

Once again, the idea is to build the I/O artifact(s) and to call the component's `python_func`:
## Result

With this setup, the test passes: the component writes the doubled ages to a local CSV, and the test reads them back through the artifact's `path`.