Understanding the implementation of T-Net in the PointNet model

27 Views Asked by ado sar At 26 February 2024 at 21:45

I am reading the PointNet paper and trying to understand how to implement the T-Net block of the model (the idea is the same for both the input and the feature transform). All the PyTorch implementations I have looked at do the same thing; the snippet below is taken from one of them:

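import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class STNkd(nn.Module):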
    def __init__(self, k=64):
        super(STNkd, self).__init__()
        self.conv1 = torch.nn.Conv1d(k, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, k*k)
        self.relu = nn.ReLU()

        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)

        self.k = k

    def forward(self, x):
        batchsize = x.size()[0]
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = torch.max(x, 2, keepdim=True)[0]
        x = x.view(-1, 1024)

        x = F.relu(self.bn4(self.fc1(x)))
        x = F.relu(self.bn5(self.fc2(x)))
        x = self.fc3(x)

        iden = Variable(torch.from_numpy(np.eye(self.k).flatten().astype(np.float32))).view(1,self.k*self.k).repeat(batchsize,1)
        if x.is_cuda:
            iden = iden.cuda()
        x = x + iden
        x = x.view(-1, self.k, self.k)
        return x

I can follow all the steps up to the iden part. If I understand correctly, they add an identity matrix to the regressed matrix. Does this make sense?
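To make the iden step concrete, here is a minimal sketch of what the addition does in isolation. It assumes the last layer is zero-initialized, which is my own assumption for illustration and not something the snippet above does; `fc3` here is a hypothetical stand-in for the layer of the same name, and `k`, `batch` and `features` are made-up values. Under that assumption the regressed offset is zero, so the block outputs exactly the identity matrix before any training.

```python
import torch
import torch.nn as nn

k = 3
batch = 4

# Hypothetical stand-in for the last layer of the T-Net (fc3 in the snippet above).
fc3 = nn.Linear(256, k * k)

# Assumption for illustration: zero the final layer so the regressed offset is zero.
nn.init.zeros_(fc3.weight)
nn.init.zeros_(fc3.bias)

features = torch.randn(batch, 256)           # made-up pooled features
offset = fc3(features)                       # (batch, k*k), all zeros here
iden = torch.eye(k).flatten().unsqueeze(0)   # (1, k*k), broadcasts over the batch
transform = (offset + iden).view(-1, k, k)   # (batch, k, k)

# With a zero offset, every matrix in the batch equals the identity.
identity_at_start = torch.allclose(transform, torch.eye(k).expand(batch, k, k))
print(identity_at_start)
```

Seen this way, the network regresses an offset from the identity rather than the matrix itself, which may be what "initialized as an identity matrix" is meant to convey.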

According to the paper:

The output matrix is initialized as an identity matrix.
The feature transformation matrix is constrained to be close to an orthogonal matrix.

The second point is clear: just add the regularization term ||I - AA^T||_F^2, where A is the predicted feature transformation matrix, to the final loss. The first point is unclear to me. I think "initialized" is a poor word choice, since this matrix is actually predicted by the network. Adding an identity matrix at every forward pass (even at inference) doesn't sound like an initialization.
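For reference, the orthogonality term mentioned above can be sketched as follows. I am reading it as the squared Frobenius norm ||I - AA^T||_F^2, with A the predicted k x k matrix (the tensor returned by forward in the snippet). The function name `orthogonality_regularizer` and the mean reduction over the batch are my own assumptions, not taken from any particular implementation.

```python
import torch


def orthogonality_regularizer(transform: torch.Tensor) -> torch.Tensor:
    """Batch mean of ||I - A A^T||_F^2 for A of shape (batch, k, k)."""
    k = transform.size(1)
    identity = torch.eye(k, device=transform.device, dtype=transform.dtype)
    # A A^T for every matrix in the batch: (batch, k, k)
    gram = torch.bmm(transform, transform.transpose(1, 2))
    # The (k, k) identity broadcasts against the batch dimension.
    diff = identity - gram
    # Squared Frobenius norm per matrix, then averaged over the batch (assumed reduction).
    return (diff ** 2).sum(dim=(1, 2)).mean()


# An orthogonal matrix (here the identity) incurs no penalty.
print(orthogonality_regularizer(torch.eye(3).unsqueeze(0)))
```

The term would then be added to the task loss with a small weight that presumably needs tuning.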

Can someone fill these gaps between the paper and the implementation?

Some of the PyTorch implementations I have checked:
