TL;DR: How can I conduct a sweep in Weights & Biases across not just hyperparameters, but also model architectures and datasets, in a way that supports sensible aggregation and interpretability?
I'm new to using Weights & Biases, and I have a conceptual/organizational question:
I'm trying to conduct a sweep across multiple datasets, multiple model types, and multiple hyperparameter options. Generally, the workflow should look something like this (rough sketch of a single run after the list):
- Iterate over all datasets, dividing each into train, validation, and test partitions
- Iterate over all models
- Iterate over hyperparameters
- Rank model performance across each dataset and generate informative aggregations.
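For concreteness, here is roughly how I picture a single run. Everything except the wandb calls is a placeholder for my own code: `load_dataset`, `build_model`, the metric names, and the project name are all made up.

```python
import wandb

def run_single_experiment():
    # One run = one (dataset, model, hyperparameter) combination.
    run = wandb.init(project="benchmark-sweep")  # project name is just an example
    cfg = wandb.config                           # filled in by the sweep controller

    # load_dataset / build_model are placeholders for my own code
    train, val, test = load_dataset(cfg.dataset)
    model = build_model(cfg.model, cfg)

    for epoch in range(cfg.epochs):
        train_loss = model.fit_one_epoch(train)  # placeholder training step
        val_loss = model.evaluate(val)           # used to pick the best epoch
        test_loss = model.evaluate(test)         # reported from that same epoch
        wandb.log({
            "epoch": epoch,
            "train_loss": train_loss,
            "val_loss": val_loss,
            "test_loss": test_loss,
        })

    run.finish()
```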
I have around 59 different datasets with both classification and regression targets.
I have a few organizational objectives that I think are worth respecting:
- I'd like to keep a single run as small as possible. In other words, I don't want one run to train a given model-hyperparameter pair on every dataset.
- Practically, I think I'll have to train and then evaluate on both the validation and test sets every epoch. That should be fine, and it will let me pick the best epoch from the validation results and take the test results from that same epoch.
- Each model has its own hyperparameters and exploration ranges, so the sampled hyperparameters need to be appropriate for the current model.
- This is probably the most important part, and the reason I'm posing this question: I want the results organized within Weights & Biases in a reasonable way, so that I can compare models on a specific dataset, compare a particular model across all datasets, and so on (see the metadata sketch after this list).
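To illustrate what I mean by "organized in a reasonable way", this is the kind of run metadata I'm imagining, though I don't know whether group/job_type/tags is the right mechanism for it. All names and values below are made up.

```python
import wandb

# Sketch of how I might tag a run so results can be sliced later
# (by dataset, by model, by task type). Names/values are illustrative only.
run = wandb.init(
    project="benchmark-sweep",
    group="dataset_17",            # group by dataset? or should this be the model?
    job_type="gradient_boosting",  # model family
    tags=["regression"],           # task type, for filtering
    config={
        "dataset": "dataset_17",
        "model": "gradient_boosting",
        "learning_rate": 0.05,
        "max_depth": 6,
    },
)
wandb.log({"val_rmse": 1.23})      # dummy value, just showing the shape
run.finish()
```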
Currently, my biggest conceptual issue is setting this up sensibly with wandb.config. I think the best approach would be to let wandb decide which dataset, modeling strategy, and hyperparameters a worker should use for a particular run. However, that seems to require a combination of grid search (over the dataset and model) and random search (over the hyperparameters), where the random search also has to respect the hyperparameter space of the particular model architecture. I'm also not sure how to set up the configuration(s)/project(s) to support the kind of querying I described above. Roughly what I've sketched so far is below.
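This is only a sketch, not something I'm confident in: one sweep per model architecture, so that each sweep carries that model's own hyperparameter space, with the dataset included as a sweep parameter. All parameter names, ranges, and the project name are invented for illustration.

```python
import wandb

# One sweep per model family, so the hyperparameter space is model-specific.
# Parameter names and ranges are made up for illustration.
mlp_sweep = {
    "method": "random",  # random over hyperparameters, but I'd want grid over the dataset
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "dataset": {"values": [f"dataset_{i}" for i in range(59)]},
        "model": {"value": "mlp"},
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-1},
        "hidden_size": {"values": [64, 128, 256]},
        "epochs": {"value": 50},
    },
}

sweep_id = wandb.sweep(mlp_sweep, project="benchmark-sweep")
# run_single_experiment is the per-run function sketched above
wandb.agent(sweep_id, function=run_single_experiment, count=200)
```

I could instead launch a separate grid-method sweep per (model, dataset) pair to avoid mixing search methods, but with 59 datasets that multiplies into a lot of sweeps, which is partly why I'm asking.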
So, given a task like this, how would you use WandB for your orchestration/observability? Feel free to tell me I'm doing everything wrong.