I am struggling with the "-ta" flag of the PGI compiler, which I need in order to use GPU acceleration with OpenACC. I have not found a comprehensive answer. I know that it stands for "target accelerator" and gives the compiler information about the hardware to target. So which -ta value should I set if my GPU hardware is:
weugene@landau:~$ sudo lspci -vnn | grep VGA -A 12
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104GL [10de:1bb1] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation GP104GL [Quadro P4000] [10de:11a3]
Physical Slot: 4
Flags: bus master, fast devsel, latency 0, IRQ 46, NUMA node 0
Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at e000 [size=128]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
The CUDA versions shipped with the PGI compiler (/opt/pgi/linux86-64/2019/cuda) are 9.2, 10.0, and 10.1.
As you note, "-ta" stands for "target accelerator" and is a way for you to override the default target device when using "-acc" ("-acc" tells the compiler to use OpenACC, and using just "-ta" implies "-acc"). PGI currently supports two targets: "multicore" to target a multi-core CPU, and "tesla" to target an NVIDIA Tesla device. Other NVIDIA products such as Quadro and GeForce will also work under the "tesla" flag, provided they share the same architecture as a Tesla product.
By default, when using "-ta=tesla", the PGI compiler will create a unified binary supporting multiple NVIDIA architectures. The exact set of architectures depends on the compiler version and the CUDA device driver on the build system. For example, with PGI 19.4 on a system with a CUDA 9.2 driver, the compiler will target the Kepler (cc35), Maxwell (cc50), Pascal (cc60), and Volta (cc70) architectures. "cc" stands for compute capability. Note that if no CUDA driver can be found on the system, the 19.4 compiler defaults to using CUDA 10.0.
In your case, the Quadro P4000 uses the Pascal architecture (cc60), so it would be targeted by default. If you wanted the compiler to target only your device, as opposed to creating a unified binary, you'd use the option "-ta=tesla:cc60".
You can also override which CUDA version to use with a sub-option, for example "-ta=tesla:cuda10.1". For a complete list of sub-options, please run "pgcc -help -ta" from the command line or consult PGI's documentation.
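To make the options above concrete, here is a minimal sketch of an OpenACC kernel with the corresponding build commands in the header comment. The file name saxpy.c and the function saxpy are made up for illustration, and "-Minfo=accel" is an extra PGI flag, not mentioned above, that prints what the compiler generated for the accelerator. Treat it as a starting point rather than a recommended setup.

```c
/* saxpy.c (hypothetical file name): small OpenACC kernel for trying -ta sub-options.
 *
 * Possible build commands, using the flags discussed above:
 *   pgcc -ta=tesla:cc60 -Minfo=accel -c saxpy.c            (Pascal only)
 *   pgcc -ta=tesla:cc60,cuda10.1 -Minfo=accel -c saxpy.c   (Pascal, CUDA 10.1)
 *   pgcc -ta=multicore -Minfo=accel -c saxpy.c             (multi-core CPU)
 *
 * A compiler without OpenACC support ignores the pragma, so the loop
 * then simply runs serially on the host.
 */
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* copyin: x is only read on the device.
     * copy:   y is sent to the device and copied back afterwards. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Sub-options are comma-separated after the colon, so the architecture and the CUDA version can be set in the same "-ta" flag. The "-c" switch only compiles to an object file; link it with a main program of your own to run it.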
If you don't know the compute capability of the device, run the PGI utility "pgaccelinfo" which will give you this information. For example, here's the output for my system which has a V100:
Hope this helps!