I have a set of 240x240 px PNG images (~1 GB in total) for training a classification algorithm. I need to crop the bottom 26 px from each image (leaving 240x214 px), because that strip contains a number and a date that I do not want in the final image set.
I started with the Python library Pillow, but soon realised that after reading, cropping and re-saving the PNG images, the total file size increased about 5x (to ~5 GB).
I don't think this will affect the outcome of the CNN, only the time it takes to train it, but I still wonder whether there is a way to compress the output better, or some other method that keeps the original file size.
I will attach one image before (11 KB) and after (74 KB) processing as an example.
I'll also attach a link to imgur where the images are in their original PNG format.
This is the code I've used for cropping the images:
import os
from PIL import Image

def cropp_images_in_folder(from_folder, to_folder):
    for image in os.listdir(from_folder):
        if image.endswith(".png"):
            # os.path.join works whether or not the folder path ends with a slash
            im = Image.open(os.path.join(from_folder, image))
            # Box is (left, upper, right, lower): keep the top 214 rows
            im1 = im.crop((0, 0, 240, 214))
            # With optimize=True, Pillow already uses compress_level 9
            im1.save(os.path.join(to_folder, image), optimize=True, compress_level=9)


The first image is not a PNG; it is a JPEG. The second image is a PNG, so the lossy JPEG data has been re-encoded losslessly, which is why the file grew. If you want a comparable size for the second image, save it as JPEG as well.
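As a rough sketch of that suggestion: Pillow reports the format it detected from the file contents in `Image.format`, regardless of the file extension, so you can check each file and re-save the crop in the same format. The function name `crop_keep_format` and the `jpeg_quality=85` default are my own assumptions, not something from the question. Note that re-saving a JPEG re-encodes it, so some extra quality loss is expected.

```python
import os
from PIL import Image

def crop_keep_format(from_folder, to_folder, box=(0, 0, 240, 214), jpeg_quality=85):
    """Crop each image and re-save it in the format Pillow detected.

    Hypothetical helper: the name and the jpeg_quality default are assumptions.
    Returns a dict mapping file name -> detected source format.
    """
    os.makedirs(to_folder, exist_ok=True)
    detected = {}
    for name in sorted(os.listdir(from_folder)):
        if not name.lower().endswith((".png", ".jpg", ".jpeg")):
            continue
        with Image.open(os.path.join(from_folder, name)) as im:
            # Format comes from the file contents, not the extension
            fmt = im.format
            cropped = im.crop(box)
            dst = os.path.join(to_folder, name)
            if fmt == "JPEG":
                # Lossy re-encode; an explicit format overrides the extension
                cropped.save(dst, format="JPEG", quality=jpeg_quality, optimize=True)
            else:
                cropped.save(dst, format="PNG", optimize=True)
        detected[name] = fmt
    return detected
```

Calling `crop_keep_format("in/", "out/")` (folder names are placeholders) and printing the returned dict shows which of the ".png" files are actually JPEGs. If most of them are, that would explain the 5x growth when they are written back out as real PNGs.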