Problem
I don't see anything in the Profile section of TensorBoard. I get the following TensorBoard interface after running tensorboard --logdir logdir
The tree of logdir is as follows:
logdir
├── events.out.tfevents.17026478. gpu.profile-empty
└── plugins
    └── profile
        ├── 2023_12_15_12_41_18
        │   ├── gpu.input_pipeline.pb
        │   ├── gpu.kernel_stats.pb
        │   ├── gpu.memory_profile.json.gz
        │   ├── gpu.overview_page.pb
        │   ├── gpu.tensorflow_stats.pb
        │   ├── gpu.trace.json.gz
        │   └── gpu.xplane.pb
        ├── 2023_12_15_12_41_21
        │   ├── gpu.input_pipeline.pb
        │   ├── gpu.kernel_stats.pb
        │   ├── gpu.memory_profile.json.gz
        │   ├── gpu.overview_page.pb
        │   ├── gpu.tensorflow_stats.pb
        │   ├── gpu.trace.json.gz
        │   └── gpu.xplane.pb
        ├── 2023_12_15_12_41_22
        │   ├── gpu.input_pipeline.pb
        │   ├── gpu.kernel_stats.pb
        │   ├── gpu.memory_profile.json.gz
        │   ├── gpu.overview_page.pb
        │   ├── gpu.tensorflow_stats.pb
        │   ├── gpu.trace.json.gz
        │   └── gpu.xplane.pb
        ├── 2023_12_15_12_41_23
        │   ├── gpu.input_pipeline.pb
        │   ├── gpu.kernel_stats.pb
        │   ├── gpu.memory_profile.json.gz
        │   ├── gpu.overview_page.pb
        │   ├── gpu.tensorflow_stats.pb
        │   ├── gpu.trace.json.gz
        │   └── gpu.xplane.pb
        ├── 2023_12_15_12_41_24
        │   ├── gpu.input_pipeline.pb
        │   ├── gpu.kernel_stats.pb
        │   ├── gpu.memory_profile.json.gz
        │   ├── gpu.overview_page.pb
        │   ├── gpu.tensorflow_stats.pb
        │   ├── gpu.trace.json.gz
        │   └── gpu.xplane.pb
        ├── 2023_12_15_12_41_25
        │   ├── gpu.input_pipeline.pb
        │   ├── gpu.kernel_stats.pb
        │   ├── gpu.memory_profile.json.gz
        │   ├── gpu.overview_page.pb
        │   ├── gpu.tensorflow_stats.pb
        │   ├── gpu.trace.json.gz
        │   └── gpu.xplane.pb
        └── 2023_12_15_12_41_26
            ├── gpu.input_pipeline.pb
            ├── gpu.kernel_stats.pb
            ├── gpu.memory_profile.json.gz
            ├── gpu.overview_page.pb
            ├── gpu.tensorflow_stats.pb
            ├── gpu.trace.json.gz
            └── gpu.xplane.pb
9 directories, 50 files
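To rule out an incomplete copy of the folder, the layout can be checked with a short script. This is only a minimal sketch using the standard library: the function name list_profile_runs is hypothetical, and the only thing it assumes is the logdir/plugins/profile/<run> layout shown in the tree.

```python
from pathlib import Path


def list_profile_runs(logdir):
    """Map each run directory under <logdir>/plugins/profile to its sorted file names."""
    profile_root = Path(logdir) / "plugins" / "profile"
    if not profile_root.is_dir():
        # No profile data at all under this logdir.
        return {}
    runs = {}
    for run_dir in sorted(p for p in profile_root.iterdir() if p.is_dir()):
        runs[run_dir.name] = sorted(f.name for f in run_dir.iterdir() if f.is_file())
    return runs


if __name__ == "__main__":
    # "logdir" is the folder name used in this question.
    for run, files in list_profile_runs("logdir").items():
        print(f"{run}: {len(files)} files")
        for name in files:
            print(f"  {name}")
```

Running it on both the cluster and the laptop and comparing the output would show whether scp dropped any run directory or file.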
Code to generate logdir
This is a simple training loop using the tf.profiler.experimental.Profile API, inspired by this TF tutorial:
for epoch in range(1, epochs+1):
    if dataset_exists is True:
        #with tf.profiler.experimental.Trace('train', step_num=epoch, _r=1):
        with tf.profiler.experimental.Profile("logdir"):
            loss_train = model.training_step(dataset, optimizer)
    else:
        loss_train = training_step(model._model, X_train, Y_train, optimizer)
Additional information
I am running the code on a cluster in order to use a GPU. Then I use scp to copy the logdir folder from the cluster to my personal laptop.
Output of the command tensorboard --logdir logdir --inspect:
======================================================================
Processing event files... (this can take a few minutes)
======================================================================
Found event files in:
logdir
These tags are in logdir:
audio -
histograms -
images -
scalars -
tensor -
======================================================================
Event statistics for logdir:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor -
=====================================================================
I can add the warnings in the terminal if necessary.
Versions
- tensorboard==2.8.0
- tensorboard-data-server==0.6.1
- tensorboard-plugin-profile==2.15.0
- tensorboard-plugin-wit==1.8.1
- tensorflow==2.8.0
- tensorflow-io-gcs-filesystem==0.34.0
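The versions listed here can also be compared programmatically from inside the environment. This is a minimal sketch using only the standard library: the helper names (minor_series, installed_versions, out_of_step) are hypothetical, and treating tensorboard-plugin-profile as something that should share TensorFlow's major.minor series is my assumption, not a documented rule.

```python
from importlib import metadata

# Packages from the version list in this question.
PACKAGES = ("tensorflow", "tensorboard", "tensorboard-plugin-profile")


def minor_series(version):
    """Return the 'major.minor' prefix of a version string, e.g. '2.8.0' -> '2.8'."""
    return ".".join(version.split(".")[:2])


def installed_versions(packages=PACKAGES):
    """Look up installed versions; packages that are not installed map to None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def out_of_step(versions, reference="tensorflow"):
    """List packages whose major.minor series differs from the reference package."""
    ref = versions.get(reference)
    if ref is None:
        return []
    return [
        name
        for name, version in versions.items()
        if name != reference
        and version is not None
        and minor_series(version) != minor_series(ref)
    ]


if __name__ == "__main__":
    found = installed_versions()
    print(found)
    print("out of step with tensorflow:", out_of_step(found))
```

With the versions listed here, this check flags only tensorboard-plugin-profile (2.15 against 2.8); tensorflow and tensorboard are both in the 2.8 series.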
Questions/Remarks
- According to the TensorBoard release notes, TensorFlow and TensorBoard are supposed to be installed at the same version, since "Tensorboard version X tracks Tensorflow version X".
- For those who managed to use TensorBoard with TensorFlow, which versions of TensorFlow and TensorBoard did you use?
