I am new to deep learning and computer vision. I recently worked through a tutorial on implementing YOLOv1, and now I am trying it with a different dataset.
Here is my code:
def get_bboxes(filename):
    name = filename[:-4]
    indices = meta_df[meta_df['new_img_id'] == float(name)]
    bounding_boxes = []
    for index, row in indices.iterrows():
        x = float(row['x'])
        y = float(row['y'])
        w = float(row['width'])
        h = float(row['height'])
        im_w = float(row['img_width'])
        im_h = float(row['img_height'])
        class_id = int(row['cat_id'])
        bounding_box = [(x+w/2)/(im_w), (y+h/2)/(im_h), w/im_w, h/im_h, class_id]
        bounding_boxes.append(bounding_box)
    return tf.convert_to_tensor(bounding_boxes, dtype=tf.float32)

def generate_output(bounding_boxes):
    output_label = np.zeros([int(SPLIT_SIZE), int(SPLIT_SIZE), int(N_CLASSES+5)])
    for b in range(len(bounding_boxes)):
        grid_x = bounding_boxes[..., b, 0]*SPLIT_SIZE
        grid_y = bounding_boxes[..., b, 1]*SPLIT_SIZE
        i = int(grid_x)
        j = int(grid_y)
        output_label[i, j, 0:5] = [1., grid_x % 1, grid_y % 1, bounding_boxes[..., b, 2], bounding_boxes[..., b, 3]]
        output_label[i, j, 5+int(bounding_boxes[..., b, 4])] = 1.
    return tf.convert_to_tensor(output_label, tf.float32)

def get_imboxes(im_path, map):
    img = tf.io.decode_jpeg(tf.io.read_file('./new_dataset/'+im_path))
    img = tf.cast(tf.image.resize(img, [H, W]), dtype=tf.float32)
    bboxes = tf.numpy_function(func=get_bboxes, inp=[im_path], Tout=tf.float32)
    return img, bboxes

train_ds2 = train_ds1.map(get_imboxes)
val_ds2 = val_ds1.map(get_imboxes)

transforms = A.Compose([
    A.Resize(H, W),
    A.RandomCrop(
        width=np.random.randint(int(0.9*W), W),
        height=np.random.randint(int(0.9*H), H), p=0.5),
    A.RandomScale(scale_limit=0.1, interpolation=cv.INTER_LANCZOS4, p=0.5),
    A.HorizontalFlip(p=0.5),
    A.Resize(H, W)
], bbox_params=A.BboxParams(format='yolo'))

def aug_albument(image, bboxes):
    augmented = transforms(image=image, bboxes=bboxes)
    return [tf.convert_to_tensor(augmented['image'], dtype=tf.float32), tf.convert_to_tensor(augmented['bboxes'], dtype=tf.float32)]

def process_data(image, bboxes):
    aug = tf.numpy_function(func=aug_albument, inp=[image, bboxes], Tout=(tf.float32, tf.float32))
    return aug[0], aug[1]

train_ds3 = train_ds2.map(process_data)
I am getting an error from Albumentations. When I run these lines:
for i, j in train_ds3:
    print(j)
I get this error:

InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_2_device_/job:localhost/replica:0/task:0/device:CPU:0}} ValueError: Expected y_min for bbox (0.203125, -0.002777785062789917, 0.8187500238418579, 0.8694444596767426, 5.0) to be in the range [0.0, 1.0], got -0.002777785062789917.
I checked all the labels of train_ds2 but could not find any negative values. What am I missing here?
I also tried commenting out all the Albumentations operations on the image, but I still get the same problem.
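For reference, this is roughly how I checked the labels of train_ds2 for out-of-range values (a sketch, the exact snippet may have differed):

for img, boxes in train_ds2:
    # boxes has shape (num_boxes, 5): x_center, y_center, w, h, class_id
    if tf.reduce_any(boxes[..., :4] < 0.0) or tf.reduce_any(boxes[..., :4] > 1.0):
        print(boxes)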
In your get_bboxes() function, you read the values as follows:

x = float(row['x'])
y = float(row['y'])
w = float(row['width'])
h = float(row['height'])

When the data comes in the form x, y, w, h, that usually implies x and y are already the center coordinates (x_center, y_center). So when you compute (x+w/2)/(im_w) or (y+h/2)/(im_h), you are actually calculating x_max and y_max instead. The boxes end up in the wrong format, but nothing complains while you build train_ds2. The problem only appears once Albumentations receives them with format='yolo': it gets an incorrect format, and that is probably where the negative values come from.
The solution, I think, is to modify your get_bboxes() function as shown below. I don't know the format of your data, so if I'm wrong, please provide more details on the format used.
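A sketch of what I mean, assuming x and y really are the box centers in pixels:

def get_bboxes(filename):
    name = filename[:-4]
    indices = meta_df[meta_df['new_img_id'] == float(name)]
    bounding_boxes = []
    for index, row in indices.iterrows():
        x = float(row['x'])      # assumed to already be the box center x (pixels)
        y = float(row['y'])      # assumed to already be the box center y (pixels)
        w = float(row['width'])
        h = float(row['height'])
        im_w = float(row['img_width'])
        im_h = float(row['img_height'])
        class_id = int(row['cat_id'])
        # YOLO format expects normalized x_center, y_center, width, height in [0, 1]
        bounding_box = [x/im_w, y/im_h, w/im_w, h/im_h, class_id]
        bounding_boxes.append(bounding_box)
    return tf.convert_to_tensor(bounding_boxes, dtype=tf.float32)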