Posts
Aug, 26
Optimizing Web Virtual Reality
Performance has always been a key factor in any virtual and augmented reality experience. Since Virtual Reality was conceived, performance has always been the factor that has often slowed down, or at times even halted the adoption of Virtual Reality related technologies. More recently, the hardware advancements have caught up with the development so that […]
Aug, 26
Auto-tuning Hybrid CPU-GPU Execution of Algorithmic Skeletons in SkePU
The trend in computer architectures has for several years been heterogeneous systems consisting of a regular CPU and at least one additional, specialized processing unit, such as a GPU.The different characteristics of the processing units and the requirement of multiple tools and programming languages makes programming of such systems a challenging task. Although there exist […]
Aug, 26
Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations
OpenMP is a directive-based shared memory parallel programming model and has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP’s high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran languages, without exposing too many details of GPU […]
Aug, 26
A Qualitative Comparison Study Between Common GPGPU Frameworks
The development of graphic processing units have during the last decade improved significantly in performance while at the same time becoming cheaper. This has developed a new type of usage of the device where the massive parallelism available in modern GPU’s are used for more general purpose computing, also known as GPGPU. Frameworks have been […]
Aug, 19
Kernel Tuner: A search-optimizing GPU code auto-tuner
A very common problem in GPU programming is that some combination of thread block dimensions and other code optimization parameters, like tiling or unrolling factors, results in dramatically better performance than other kernel configurations. To obtain highly-efficient kernels it is often required to search vast and discontinuous search spaces that consist of all possible combinations […]
Aug, 19
Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures
Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs) which provide state-of-the-art results for tasks like image recognition, neural machine translation and speech recognition. The computationally expensive nature of a convolution operation has led to the proliferation of implementations including matrix-matrix multiplication formulation, and direct convolution primarily targeting […]
Aug, 19
CosmoFlow: Using Deep Learning to Learn the Universe at Scale
Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading […]
Aug, 19
Matrix Factorization on GPUs with Memory Optimization and Approximate Computing
Matrix factorization (MF) discovers latent features from observations, which has shown great promises in the fields of collaborative filtering, data compression, feature extraction, word embedding, etc. While many problem-specific optimization techniques have been proposed, alternating least square (ALS) remains popular due to its general applicability e.g. easy to handle positive-unlabeled inputs, fast convergence and parallelization […]
Aug, 19
libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms
Hardware accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi co-processors (PHIs), and Field-Programmable Gate Arrays (FPGAs) are now ubiquitous in extreme-scale high performance computing (HPC), cloud, and Big data platforms to facilitate execution of workloads that demand high energy efficiency. They present unique interfaces and programming models therefore posing several limitations, which must […]
Aug, 11
A Cross-platform Evaluation of Graphics Shader Compiler Optimization
For real-time graphics applications such as games and virtual reality, performance is crucial to provide a smooth user experience. Central to this is the performance of shader programs which render images on the GPU. The rise of low-level graphics APIs such as Vulkan means compilation tools play an increasingly important role in the graphics ecosystem. […]
Aug, 11
GPU parallelization of a hybrid pseudospectral fluid turbulence framework using CUDA
An existing hybrid MPI-OpenMP scheme is augmented with a CUDA-based fine grain parallelization approach for multidimensional distributed Fourier transforms, in a well-characterized pseudospectral fluid turbulence code. Basics of the hybrid scheme are reviewed, and heuristics provided to show a potential benefit of the CUDA implementation. The method draws heavily on the CUDA runtime library to […]
Aug, 11
A Case Study in Using OpenCL on FPGAs: Creating an Open-Source Accelerator of the AutoDock Molecular Docking Software
In recent years, OpenCL has been increasingly adopted as it enables software programmers to harness the performance and power efficiency of FPGAs. Despite simplifying the FPGA programming challenge, achieving high performance and energy efficiency with OpenCL is still a difficult task. In order to further contribute to the advance of the OpenCL usage for FPGAs, […]