clpeak – peak performance of your OpenCL device
Samsung R&D Institute India, Bangalore, India
clpeak is a benchmarking tool that helps developers fine-tune OpenCL kernels for a particular device or class of devices. It measures bandwidth and compute performance for different vector widths of a data type, e.g. float and float4. Developers are traditionally advised to write scalar code and rely on the OpenCL compiler to auto-vectorize it. Most of the time, however, the compiler cannot vectorize scalar code, so hand-written vector code is the more efficient option in performance-critical scenarios. The tool gives an idea of the device's internal architecture and shows which vector widths should be used to realize its full potential. It also measures host-to-device and device-to-host transfer bandwidths. Transfers can be done using enqueueWriteBuffer or enqueueMapBuffer, and a map may go through pinned memory or, on some devices, be zero-copy. The tool can indicate whether a transfer is zero-copy and measures memcpy bandwidth on the zero-copied memory.
January 23, 2014 by hgpu