I'm trying to profile inference of a tiny model with dlprof, but I can't capture per-iteration information when I let it run for multiple iterations. This is what the code does:
```python
import argparse

import nvidia_dlprof_pytorch_nvtx
import torch
import torch.nn as nn


class SmallModel(nn.Module):
    def __init__(self):
        super(SmallModel, self).__init__()
        self.layer1 = nn.Linear(784, 512)
        self.layer2 = nn.Linear(512, 256)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer2(x))
        return x


model = SmallModel().cuda().half()
input_data = torch.randn(64, 784).cuda().half()

# Initialize dlprof's PyTorch NVTX plugin
nvidia_dlprof_pytorch_nvtx.init(enable_function_stack=True)

parser = argparse.ArgumentParser("Nvidia Profiler")
parser.add_argument("--num_iter", dest="num_iter", help="number of iterations to perform", type=int, required=True)
args = parser.parse_args()

with torch.no_grad():
    # Make every autograd op emit an NVTX range that dlprof can pick up
    with torch.autograd.profiler.emit_nvtx():
        for i in range(args.num_iter):
            _ = model(input_data)
```
This is the command I'm running:
```shell
dlprof --mode=pytorch --key_node=LINEAR_1 -f true --reports=summary,detail,iteration --iter_start=5 --iter_stop=8 python profile_sample_model.py --num_iter 10
```
This is what dlprof writes to the log:
```
Found 2 iterations using key_op "LINEAR_1"
Iterations: [12495162999, 12520617892]
Aggregating data over 1 iterations: iteration 1 start (12495162999 ns) to iteration 1 end (12520617892 ns)
```
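Reading those numbers literally, here is a quick sketch. It assumes (my interpretation of the log, not documented dlprof behaviour) that the bracketed `Iterations:` list holds key-node timestamps in nanoseconds and that one iteration is the span between two consecutive timestamps. `parse_boundaries` is a hypothetical helper, not part of dlprof.

```python
import re

# The dlprof log output from above, joined into one string
LOG = ('Found 2 iterations using key_op "LINEAR_1" '
       'Iterations: [12495162999, 12520617892] '
       'Aggregating data over 1 iterations: iteration 1 start (12495162999 ns) '
       'to iteration 1 end (12520617892 ns)')


def parse_boundaries(log: str) -> list[int]:
    """Extract the integer timestamps (ns) listed after 'Iterations:'."""
    match = re.search(r"Iterations:\s*\[([^\]]*)\]", log)
    if match is None:
        return []
    return [int(tok) for tok in match.group(1).split(",")]


boundaries = parse_boundaries(LOG)
# Assumption: each pair of consecutive timestamps bounds one iteration
spans = list(zip(boundaries, boundaries[1:]))
print(len(boundaries), "key-node timestamps ->", len(spans), "span(s)")
for start, end in spans:
    print(f"{(end - start) / 1e6:.2f} ms")
```

This prints `2 key-node timestamps -> 1 span(s)` and then `25.45 ms`, which matches the single aggregated iteration in the log.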
I want dlprof to capture iterations 5 through 8 as separate iterations. Instead, it skips aggregation until the first time it encounters the specified key_node and then aggregates the remaining 9 iterations as a single iteration. `--iter_start=5 --iter_stop=8` doesn't seem to have any effect. What am I doing wrong here?
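For comparison, this is the selection I expected `--iter_start=5 --iter_stop=8` to make. It is a hypothetical sketch, not dlprof's code: it assumes iterations are numbered from 1 (as in the log's "iteration 1"), that each one spans two consecutive key-node timestamps, and that the window is inclusive. `select_window` is a made-up name. Under those assumptions the window needs at least 9 key-node timestamps, while dlprof listed only 2.

```python
def select_window(boundaries, iter_start, iter_stop):
    """Return (start_ns, end_ns) spans for iterations iter_start..iter_stop.

    Hypothetical semantics, not dlprof's implementation: iterations are
    numbered from 1, each spans two consecutive key-node timestamps, and
    the window is inclusive on both ends.
    """
    spans = list(zip(boundaries, boundaries[1:]))
    if not 1 <= iter_start <= iter_stop <= len(spans):
        raise ValueError(
            f"window {iter_start}..{iter_stop} needs at least "
            f"{iter_stop + 1} key-node timestamps, got {len(boundaries)}"
        )
    return spans[iter_start - 1 : iter_stop]


# The two timestamps dlprof reported for key_node LINEAR_1
logged = [12495162999, 12520617892]
print(select_window(logged, 1, 1))  # the single span dlprof aggregated
try:
    select_window(logged, 5, 8)
except ValueError as err:
    print(err)  # the 5..8 window cannot be satisfied with 2 timestamps
```

With only two timestamps there is nothing for a 5..8 window to select, which would be consistent with the flags appearing to have no effect.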
I'd really appreciate any guidance on this.