Adding random positional variance to the MNIST dataset


I am trying to train an autoencoder on the MNIST set, where the digits are supposed to have a random translation applied to them. Using the torch transforms I can resize and translate, but this doesn't have the desired effect (the digit gets translated out of frame). Does anyone here know of a transform or some other method that would let me get a smaller digit randomly translated?

I have tried to do so manually using the following code:

import random
import numpy as np

image = dataset[0][0][0]          # 28x28 digit tensor from the first (image, label) pair
background = np.zeros((56, 56))   # blank 56x56 canvas
topLeft = (random.randint(0, 27), random.randint(0, 27))
background[topLeft[0]:topLeft[0]+28, topLeft[1]:topLeft[1]+28] = image

but I am unable to apply this transformation to the actual MNIST set. Any help would be greatly appreciated.
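For reference, here is a minimal sketch of one way to wrap that paste-onto-a-larger-canvas idea into a custom transform that can be passed to torchvision.datasets.MNIST; the RandomPlacement class name, the 56x56 canvas size, and the data root path are assumptions for illustration:

import random
import torch
from torchvision import datasets, transforms

class RandomPlacement:
    """Paste a 28x28 digit at a random position on a larger blank canvas."""
    def __init__(self, canvas_size=56):
        self.canvas_size = canvas_size

    def __call__(self, img):
        # img is a 1x28x28 tensor after ToTensor()
        canvas = torch.zeros(1, self.canvas_size, self.canvas_size)
        max_offset = self.canvas_size - 28
        top = random.randint(0, max_offset)
        left = random.randint(0, max_offset)
        canvas[:, top:top + 28, left:left + 28] = img
        return canvas

transform = transforms.Compose([
    transforms.ToTensor(),
    RandomPlacement(canvas_size=56),
])

dataset = datasets.MNIST(root='data', train=True, download=True, transform=transform)
image, label = dataset[0]  # image is now 1x56x56 with the digit at a random position

Because the transform runs each time an item is fetched, every epoch sees each digit at a different position.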

There is 1 answer below.

Prajot Kuvalekar (best answer):

I have done it with an affine transform (torchvision's RandomAffine):

from PIL import Image
from pathlib import Path
import matplotlib.pyplot as plt

import torch
from torchvision.transforms import v2

plt.rcParams["savefig.bbox"] = 'tight'


torch.manual_seed(0)

# you can download the assets and the
# helpers from https://github.com/pytorch/vision/tree/main/gallery/
from helpers import plot
orig_img = Image.open(Path('gallery/assets/astronaut.jpg'))

affine_transformer = v2.RandomAffine(degrees=0, translate=(0.1, 0.3), scale=(0.5, 0.5))
affine_imgs = [affine_transformer(orig_img) for _ in range(4)]
plot([orig_img] + affine_imgs)

(Output: the original astronaut image alongside four randomly translated and scaled copies.)

On top of this you can also resize to 56x56.
See the RandomAffine documentation for more details; you can play with the translate and scale params to shift the digit away from the center, as in the sketch below.
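A minimal sketch of how this could look applied to MNIST, assuming a 56x56 resize followed by RandomAffine; the specific translate/scale values and the data root path are illustrative assumptions:

import torch
from torchvision import datasets
from torchvision.transforms import v2

# Upscale the 28x28 digit to 56x56 first, then let RandomAffine shrink it
# back to roughly 28 pixels and shift it around inside the 56x56 frame.
mnist_transform = v2.Compose([
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Resize((56, 56)),
    v2.RandomAffine(degrees=0,
                    translate=(0.25, 0.25),  # up to +/-14 px shift in x and y
                    scale=(0.5, 0.5)),       # shrink the digit to half size
])

dataset = datasets.MNIST(root='data', train=True, download=True,
                         transform=mnist_transform)
img, label = dataset[0]  # img is 1x56x56 with a roughly 28x28 digit placed off-center

With scale fixed at 0.5 the resized digit comes back to about its original size, and a translate of 0.25 keeps the whole digit inside the frame (a centered 28 px digit has 14 px of margin on each side).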

I hope this helps