12437

Posts

Apr, 30

KernelInterceptor: automating GPU kernel verification by intercepting kernels and their parameters

GPUVerify is a static analysis tool for verifying that GPU kernels are free from data races and barrier divergence. It is intended as an automatic tool, but its usability is impaired by the fact that the user must explicitly supply the kernel source code, the number of threads, and some kernel arguments. Extracting this information […]
Apr, 27

Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures

Based on the premise that preconditioners needed for scientific computing are not only required to be robust in the numerical sense, but also scalable for up to thousands of light-weight cores, we argue that this two-fold goal is achieved for the recently developed self-adaptive multi-elimination preconditioner. For this purpose, we revise the underlying idea and […]
Apr, 21

GPU Encrypt: AES Encryption on Mobile Devices

In this report, we have taken the first steps in investigating the feasibility of using the GPU as a cryptographic accelerator for the AES algorithm on mobile devices. In particular, our focus was on exploring the use of OpenCL as a framework for implementing the algorithm. Using modifications of an existing implementation [11], we first […]
Apr, 12

Analysis and Review of Sorting Algorithms

One of the fundamental issues in computer science is ordering a list of items. Although there is a huge number of sorting algorithms, sorting problem has attracted a great deal of research; because efficient sorting is important to optimize the use of other algorithms. Sorting algorithms have been studied extensively since past three decades. Their […]
Apr, 11

High performance in silico virtual drug screening on many-core processors

Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their […]
Apr, 9

3D Hydrodynamic Simulation of Classical Nova Explosions

The purpose of this project is to develop a computer model to investigate the formation and life cycle of classical novae. A nova is an orbiting system consisting of a white dwarf and star. Over time, the white dwarf pulls hydrogen gas from the star which gathers onto the surface of the white dwarf (the […]
Apr, 1

GPU Based Performance Acceleration of Radar Imaging Algorithms

We consider the performance acceleration of the conventional Time Domain Backprojection and Kirchhoff Migration algorithms for imaging concealed targets. The Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) are used here for accelerating these algorithms on Graphics Processing Units (GPUs). Data generated by means of analytical methods, simulation and experiment are used for […]
Apr, 1

Enhanced Parallel NegaMax Tree Search Algorithm on GPU

Parallel performance for GPUs today surpasses the traditional multi-core CPUs. Currently, many AI algorithms started to be tested on GPUs rather than CPUs, especially after the release of libraries such as Cuda and OpenCL that allows the implementation of general algorithms on the GPU. One of the most famous game tree search algorithms is Negamax, […]
Apr, 1

Code Generation for Embedded Heterogeneous Architectures on Android

The success of Android is based on its unified Java programming model that allows to write platform-independent programs for a variety of different target platforms. However, this comes at the cost of performance. As a consequence, Google introduced APIs that allow to write native applications and to exploit multiple cores as well as embedded GPUs […]
Mar, 28

Improving Cache Locality for GPU-based Volume Rendering

We present a cache-aware method for accelerating texture-based volume rendering on a graphics processing unit (GPU). Because a GPU has hierarchical architecture in terms of processing and memory units, cache optimization is important to maximize performance for memory-intensive applications. Our method localizes texture memory reference according to the location of the viewpoint and dynamically selects […]
Mar, 28

GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy

We demonstrate acceleration on graphic processing units (GPU) of automatic identification of robust particle therapy beam setups, minimizing negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement calculation on GPU of the Port Homogeneity Index, a […]
Mar, 28

Implementation of Just In Time Value Specialization for the Optimization of Data Parallel Kernels

This dissertation explores just-in-time (JIT) specialization as an optimization for OpenCL data-parallel compute kernels. It describes the implementation and performance of two extensions to OpenCL, Bacon and Specialization Annotated OpenCL (SOCL). Bacon is a replacement interface for OpenCL that provides improved usability and has JIT specialization built in. SOCL is a simple extension to OpenCL […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: