InceptionResnet (vggface2) Pytorch giving incorrect facial predictions


I am creating a facial recognition system with around 40 faces to be recognized. The process involves:

  1. Using OpenCV to stream the IP camera
  2. Facenet-Pytorch MTCNN to detect faces
  3. InceptionResnetV1 (vggface2) to generate embeddings
  4. Load pickle file with the trained model
  5. SVM to classify and predict the face
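To make steps 4 and 5 concrete, here is a minimal sketch of loading a pickled SVM and classifying an embedding. This is illustrative only: it uses random vectors in place of real Facenet embeddings, an in-memory buffer in place of the pickle file on disk, and assumes the classifier was trained with scikit-learn's SVC.

```python
import io
import pickle

import numpy as np
from sklearn.svm import SVC

# Illustrative stand-in for real 512-d Facenet embeddings:
# 40 identities, 5 training images each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 512))
y_train = np.repeat(np.arange(40), 5)

clf = SVC(kernel='linear')
clf.fit(X_train, y_train)

# Step 4: persist and reload the trained classifier with pickle
# (an in-memory buffer stands in for the pickle file on disk).
buf = io.BytesIO()
pickle.dump(clf, buf)
buf.seek(0)
model = pickle.load(buf)

# Step 5: classify the embedding of a newly detected face
new_embedding = rng.normal(size=(1, 512))
pred = model.predict(new_embedding)
```

At inference time, `new_embedding` would come from `extract_features` and `pred` would be the predicted identity label.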

I gathered the data using the laptop webcam and trained the model on it. I then deployed the model to an IP camera. I get very few errors on the laptop webcam, but the number of incorrect predictions is much higher on the IP camera.

I believe this could be due to two reasons, but I am unable to figure out the issue:

  1. When I use OpenCV to stream the IP camera, the resolution is very large, so I resize each frame to a smaller size to fit the window. I believe this is degrading the embeddings generated from the IP camera stream, hence the incorrect predictions.
frame = cv2.resize(frame, (1280, 720), interpolation = cv2.INTER_AREA)
  2. There is a problem of unbalanced data in the embedding generation. Some faces have around 200–300 images, whereas others have only 50 to 100. Some of the images contain side poses and are blurred or poorly lit, which, coupled with the environment of the IP camera, leads to incorrect predictions.

This is the code for the facial detection and recognition:
import torch
from torchvision import transforms
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1, extract_face
import modules  # local module providing the Whitening() transform


class FaceRecognition:
    def __init__(self):
        self.device_one = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        self.aligner = MTCNN(min_face_size=90)  # min_face_size=80
        self.facenet_preprocess = transforms.Compose([modules.Whitening()])
        # Pretrained on VGGFace2; eval() disables dropout / batch-norm updates
        self.facenet = InceptionResnetV1(pretrained='vggface2').eval()

        print('[INFO] FaceRecognition initialized')

    def detect_faces(self, img):
        # Returns (bounding_boxes, confidences), or (None, None) if no face is found
        bbs, confidence = self.aligner.detect(img)
        if bbs is None:
            return None, None
        return bbs, confidence

    def extract_features(self, img, bbs):
        # Crop each detected face and compute one embedding per face
        pil_image = Image.fromarray(img)
        faces = torch.stack([extract_face(pil_image, bb) for bb in bbs])
        with torch.no_grad():
            embeddings = self.facenet(self.facenet_preprocess(faces)).numpy()
        return embeddings
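Regarding the class-imbalance concern in point 2, one common mitigation (an illustration, not the poster's code) is to weight classes inversely to their frequency when training the SVM, via scikit-learn's `class_weight='balanced'`:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Imbalanced toy embeddings: identity 0 has 300 samples, identity 1 only 50.
# The two classes are separated by a shift of 0.5 in every dimension.
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 512)),
               rng.normal(0.5, 1.0, size=(50, 512))])
y = np.array([0] * 300 + [1] * 50)

# class_weight='balanced' scales the penalty C inversely to class frequency,
# so the minority identity is not drowned out by the majority one.
clf = SVC(kernel='linear', class_weight='balanced')
clf.fit(X, y)
```

With real embeddings, this lets identities with only 50 images compete on equal footing with those that have 300, without having to discard data.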

1 Answer


Either train a neural network end to end (rather than generating embeddings separately and classifying them with an SVM), or make sure the test images have the same resolution as the training images.