Problem with SDV (synthetic data vault): Getting back identical synthetic datasets

177 Views Asked by user3786999 At 03 October 2023 at 00:56

I'm using the following code from the SDV library to create a synthetic dataset with the same shape as my original dataset. Each synthetic dataset differs from the original, but all the synthetic datasets are identical to each other. I expected the generation process to include some randomness, so that each output would be slightly different. This happens across sessions, even when I set a different random seed. How should I interpret what's happening?

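    from sdv.lite import SingleTablePreset
    from sdv.metadata import SingleTableMetadata

    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(data=input_data)  # input_data is the original DataFrame
    synthesizer = SingleTablePreset(metadata=metadata, name='FAST_ML')
    synthesizer.fit(data=input_data)
    synthetic_data = synthesizer.sample(num_rows=len(input_data))  # same row count as the original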

There is 1 best solution below

Neha Patki On 15 December 2023 at 16:47

I believe SDV synthesizers set an internal seed when they run, which explains the determinism you're seeing. This is expected behavior.

If you want different data, you can call the sample method multiple times on the same synthesizer. Each subsequent call should give you different data. In the code below, all 3 samples of synthetic data will be different.

    synthetic_data_1 = synthesizer.sample(num_rows=len(input_data))
    synthetic_data_2 = synthesizer.sample(num_rows=len(input_data))
    synthetic_data_3 = synthesizer.sample(num_rows=len(input_data))

For more info, see the sampling docs, particularly the reset_sampling method to get back to the initial state.
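To make the behavior concrete, here is a minimal toy sketch of the pattern described above. It is not SDV's implementation: `ToySynthesizer` is a hypothetical class, and the assumption is simply that a synthesizer keeps its own fixed-seed random generator, advances it on every `sample` call, and rewinds it on `reset_sampling`.

```python
import random


class ToySynthesizer:
    """Hypothetical stand-in for a synthesizer; not SDV's actual code."""

    def __init__(self, seed=0):
        # Assumption: the seed is fixed internally, independent of any global seed.
        self._seed = seed
        self._rng = random.Random(seed)

    def sample(self, num_rows):
        # Each call advances the private generator, so consecutive calls differ.
        return [self._rng.random() for _ in range(num_rows)]

    def reset_sampling(self):
        # Rewind the private generator to its initial state.
        self._rng = random.Random(self._seed)


# Changing the global seed does not affect a freshly created synthesizer.
random.seed(123)
first = ToySynthesizer().sample(3)
random.seed(456)
also_first = ToySynthesizer().sample(3)

# Repeated calls on one synthesizer differ; resetting reproduces the first sample.
synth = ToySynthesizer()
a = synth.sample(3)
b = synth.sample(3)
synth.reset_sampling()
c = synth.sample(3)
```

In this sketch, `first` and `also_first` are identical (the situation in the question), `a` and `b` differ (repeated sampling), and `c` matches `a` again (the effect of resetting).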

BTW the team is always looking for feedback. To request more randomization options, you can file a feature request directly on GitHub.
