What are the limits on CUDA printf arguments?


I am calling printf from a __device__ function for debugging purposes. The call is massive, with a multiline format string and 62 numerical values. Eh, debugging, what else can you do?

Unfortunately, the tail of that printf's output is complete gibberish. It looks like an uninitialized memory dump. After a day of poking around I started suspecting the printf itself and split it into 3 separate printf calls, with roughly 20 arguments each. And, hooray! The output made sense.

Thus the first question: is there a documented limit on the number of arguments or on the size of the output that CUDA printf can handle?
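For context, two limits look relevant here. The CUDA C++ Programming Guide's section on formatted output states that device printf accepts at most 32 arguments in addition to the format string, and the output passes through a fixed-size buffer whose size can be read and changed through the runtime API. Below is a minimal sketch of querying and raising that buffer size. The helper names and the constant kMaxPrintfArgs are made up for illustration, and the value 32 should be checked against the documentation for the toolkit version in use.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Assumption: per-call argument cap as stated in the CUDA C++ Programming
// Guide (the format string is not counted). Check it for your toolkit version.
constexpr int kMaxPrintfArgs = 32;

// Hypothetical helper: reads the size in bytes of the device-side printf buffer.
cudaError_t get_printf_fifo_bytes(size_t* bytes)
{
    return cudaDeviceGetLimit(bytes, cudaLimitPrintfFifoSize);
}

// Hypothetical helper: requests a different buffer size.
// Call it before launching any kernel that prints.
cudaError_t set_printf_fifo_bytes(size_t bytes)
{
    return cudaDeviceSetLimit(cudaLimitPrintfFifoSize, bytes);
}
```

A 62-argument call is well over the 32-argument cap, while three calls of about 20 arguments each stay under it, which would be consistent with the behavior described above.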

Now, after splitting the call into three, I have another problem. Each individual call to printf seems to lock stdout, so output from separate threads doesn't get mixed within a single call, but that no longer holds across multiple calls. I am now getting output like this:

First printf from thread 0
First printf from thread 1
Second printf from thread 0
First printf from thread 2
Third printf from thread 0
...

Thus the second question: if CUDA printf indeed has limitations and massive calls have to be broken into smaller ones, is there a way to force a cohesive output, without a race?
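One possible approach, sketched here under assumptions rather than offered as a documented guarantee, is to let only one thread of a block print at a time and to tag every line with its block and thread index. The kernel name, the helper name, and the layout of six values per thread are hypothetical. The sketch only orders threads within a single block; lines from different blocks can still interleave, which is why each line carries a tag that can be sorted on the host afterwards.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kValuesPerThread = 6;  // hypothetical: two values per printf call

// Each thread issues three printf calls. Threads of a block take turns, so
// the three lines of one thread should stay together in the output.
__global__ void dump_in_order(const float* vals)
{
    const int b = static_cast<int>(blockIdx.x);
    const int t = static_cast<int>(threadIdx.x);
    const int n = static_cast<int>(blockDim.x);
    const float* v = vals + (b * n + t) * kValuesPerThread;

    for (int turn = 0; turn < n; ++turn) {
        if (t == turn) {
            printf("[b%d t%d 1/3] %g %g\n", b, t, v[0], v[1]);
            printf("[b%d t%d 2/3] %g %g\n", b, t, v[2], v[3]);
            printf("[b%d t%d 3/3] %g %g\n", b, t, v[4], v[5]);
        }
        // Every thread reaches this barrier on every iteration,
        // so the next thread starts printing only after this one is done.
        __syncthreads();
    }
}

// Hypothetical host helper: launches one block of `threads` threads on zeroed data.
cudaError_t launch_dump_in_order(int threads)
{
    assert(threads > 0);
    const size_t bytes = static_cast<size_t>(threads) * kValuesPerThread * sizeof(float);
    float* d_vals = nullptr;
    cudaError_t err = cudaMalloc(&d_vals, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemset(d_vals, 0, bytes);
    if (err == cudaSuccess) {
        dump_in_order<<<1, threads>>>(d_vals);
        err = cudaGetLastError();
    }
    // Synchronizing also flushes the device printf buffer to stdout.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    cudaFree(d_vals);
    return err;
}
```

Calling launch_dump_in_order(4) should print the three lines of thread 0, then those of thread 1, and so on. The turn-taking loop serializes the whole block, so it is only sensible for debugging runs.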
