As far as I know, you need host code (for the CPU) and device code (for the GPU); without both, you can't run anything on the GPU.
I am learning PTX ISA and I don't know how to execute it on Windows. Do I need a .cu file to run it or is there another way to run it?
TL;DR:
You use the CUDA driver API. Relevant sample codes are vectorAddDrv (or perhaps any other driver API sample code) as well as ptxjit.
You do not need a .cu file (nor do you need nvcc) to use the driver API method, if you start with device code in PTX form.
Details:
The remainder of this answer is not intended to be a tutorial on driver API programming (use the references already given and the API reference manual here), nor is it intended to be a tutorial on PTX programming. For PTX programming I refer you to the PTX documentation.
To start with, we need an appropriate PTX kernel definition. (For that, rather than writing my own kernel PTX code, I will use the one from the vectorAddDrv sample code, from the CUDA 11.1 toolkit, converting that CUDA C++ kernel definition to an equivalent PTX kernel definition via nvcc -ptx vectorAdd_kernel.cu):
vectorAdd_kernel.ptx:
We'll also need a driver API C++ source code file that does all the host-side work to load this kernel and launch it. Again I will use the source code from the vectorAddDrv sample project (the .cpp file), with modifications to load PTX instead of fatbin:
vectorAddDrv.cpp:
(Note that I have stripped out various items such as deallocation calls. This is intended to demonstrate the overall method; the code above is merely a demonstrator.)
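For the "load PTX instead of fatbin" step, the host code first needs the PTX text in memory. Here is a minimal sketch of just that step, using only the C++ standard library. The helper name load_ptx_file is my own, not something from the sample project. The resulting string's c_str() is the kind of null-terminated image that a driver API call such as cuModuleLoadData() accepts.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the CUDA samples): read an entire PTX
// file into a std::string. The string's c_str() is null-terminated, which
// is what a PTX image handed to the driver API needs to be.
std::string load_ptx_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open PTX file: " + path);
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the whole file into the buffer
    return buffer.str();
}
```

Throwing on a missing file makes the "PTX file must sit next to the executable" requirement fail loudly instead of passing an empty image to the driver.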
On Linux:
We can compile and run the code as follows:
On Windows/Visual Studio: Create a new C++ project in Visual Studio. Add the above .cpp file to the project. Make sure the vectorAdd_kernel.ptx file is in the same directory as the built executable. You will also need to modify the project definition to point to the location of the CUDA include files and the CUDA library files. Here's what I did in VS2019:
Include directory: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include
Library directory: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64
Linker input: the cuda.lib file (for the driver API library)
After building, locate the directory containing the built executable, make sure the vectorAdd_kernel.ptx file is in that directory, and run the executable from that directory (i.e., open a command prompt, change to that directory, and run the application from the command prompt).
NOTE: If you are not using the CUDA 11.1 toolkit or newer, or if you are running on a GPU of compute capability 5.0 or lower, the above PTX code will not work, and so this example will not work verbatim. However, the overall method will work, and this question is not about how to write PTX code.
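The toolkit and GPU dependence mentioned in the note shows up in the PTX file itself: nvcc-generated PTX begins with .version and .target directives. Here is a small sketch for reading them back out of the PTX text, which can help when diagnosing a load failure. The helper name ptx_directive is hypothetical, and the sketch only extracts the values; it does not check them against your driver or GPU.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: return the first argument of the first line that
// starts with the given PTX directive (e.g. ".version" or ".target").
// Returns an empty string if the directive is not found.
std::string ptx_directive(const std::string& ptx, const std::string& directive) {
    std::istringstream lines(ptx);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string first, value;
        if (words >> first && first == directive && words >> value) {
            // ".target" may carry a comma-separated list; keep the first entry.
            if (!value.empty() && value.back() == ',') {
                value.pop_back();
            }
            return value;
        }
    }
    return std::string();
}
```

For example, given PTX text containing the line ".target sm_52", ptx_directive(ptx, ".target") returns "sm_52", which you can compare by eye against the compute capability of the GPU you are running on.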
EDIT: Responding to a question in the comments:
I'm not aware of a method provided by the NVIDIA toolchain to do this. It's pretty much the domain of the runtime API to create these unified binaries, from my perspective.
However, the basic process seems to be evident from what can be seen of the driver API flow already in the above example: whether we start with a .cubin or a .ptx file, either way the file is loaded into a string, and the string is handed off to a module-load call such as cuModuleLoadData() (cuModuleLoad() itself takes a file name rather than an in-memory image). Therefore, it doesn't seem that difficult to build a string out of a .cubin binary with a utility, and then incorporate that in the build process.
I'm really just hacking around here; you should use this at your own risk, and there may be any number of factors that I haven't considered. I'm just going to demonstrate on Linux for this part. Here is the source code and build example for the utility:
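As a rough illustration of the file-to-string idea (a stand-in sketch, not necessarily how the utility above is written), the following reads a binary file and renders it as a C array definition suitable for a header. The names read_binary_file and bytes_to_header, and the choice to append a trailing zero byte, are my own assumptions.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a file (e.g. a .cubin) as raw bytes.
std::vector<unsigned char> read_binary_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Hypothetical helper: render bytes as a C array definition. A trailing 0
// is appended so that a PTX text image would also be null-terminated and
// so that the array is never empty.
std::string bytes_to_header(const std::vector<unsigned char>& bytes,
                            const std::string& array_name) {
    std::ostringstream out;
    out << "static const unsigned char " << array_name << "[] = {\n";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        out << static_cast<unsigned>(bytes[i]) << ',';
        if (i % 16 == 15) {
            out << '\n';  // wrap after every 16 values
        }
    }
    out << "0\n};\n";
    return out.str();
}
```

Writing the result of bytes_to_header(read_binary_file("vectorAdd_kernel.cubin"), "image") to a header file gives an array that the host code can #include and pass as the image argument of cuModuleLoadData(), so no separate file has to ship alongside the executable.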
The next step here is to create a .cubin file for use. In the above example, I created the PTX file via nvcc -ptx vectorAdd_kernel.cu. We can just change that to nvcc -cubin vectorAdd_kernel.cu, or you can use whatever method you like to generate the .cubin file.
With the cubin file created, we need to convert that into something that can be sucked into our C++ code build process. That is the purpose of the f2s utility. You would use it like this:
(Probably it would be good to allow the f2s utility to accept an input filename as a command-line argument. Exercise left to reader. This is just for demonstration/amusement.)
After the creation of the above header file, we need to modify our .cpp file as follows:
It seems to work. YMMV. For further amusement, this approach seems to create a form of obfuscation:
LoL