I have a training algorithm that needs to load a huge dataset into memory and then translate it into another format. After this step I can free the memory used to hold the first copy of the dataset. This single operation takes only a fraction of my total compute time, but it requires allocating twice as much memory as my dataset itself consumes. Has anyone had success swapping to disk for cases like this?
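For concreteness, the allocation pattern looks roughly like this (the loader and translator below are stand-ins, not my real code):

```python
import numpy as np

def load_raw_dataset():
    # Stand-in for the real loader: ~N bytes of data.
    return np.random.rand(1_000_000)

def translate(raw):
    # Stand-in for the format conversion: allocates a second ~N-byte copy.
    return raw.astype(np.float32)

raw = load_raw_dataset()      # ~N bytes resident
converted = translate(raw)    # ~2N resident while both copies exist
del raw                       # back to ~N; only this brief window needs the extra memory
```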
I understand that SageMaker runs my Docker container on the training instances, and I have tried passing ContainerArguments=["--memory-swap", "1g"] to the constructor of the Estimator. I logged the output of swapon -s before and after this change, but it produces no output in either case.
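This is a minimal sketch of what I'm doing, using the Python SDK's container_arguments keyword (which I understand maps to the ContainerArguments API field); the image URI, role, instance type, and data location are placeholders, not my real values:

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<my-training-image-uri>",
    role="<my-execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    container_arguments=["--memory-swap", "1g"],  # the flag I hoped would enable swap
)
estimator.fit("s3://<my-bucket>/<training-data-prefix>")
```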
The Estimator's container_arguments parameter passes arguments to the container's entry point (container_entry_point), not to the container runtime, so the --memory-swap flag you are passing is not having the effect you expect.
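To illustrate (a minimal sketch, not your actual setup): whatever you put in container_arguments arrives as command-line arguments to the process named by container_entry_point, much like arguments placed after the image name in docker run go to the container's entrypoint rather than to the Docker daemon.

```python
# entry_point.py -- hypothetical script referenced by container_entry_point.
# Conceptually:
#   docker run --memory-swap 1g <image>    <- runtime flag (what you want)
#   docker run <image> --memory-swap 1g    <- argument handed to the entrypoint
#                                             (what container_arguments gives you)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--memory-swap")  # the flag lands here; Docker never sees it
args, _ = parser.parse_known_args()
print(f"entry point received --memory-swap={args.memory_swap!r}")
```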