
Posts

Mar, 29

Warp Size Impact in GPUs: Large or Small?

There are a number of design decisions that impact a GPU’s performance. Among such decisions, choosing the right warp size can deeply influence the rest of the design. Small warps reduce the performance penalty associated with branch divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also […]
Mar, 29

Graphics Processing Unit Acceleration of the Explicit Solution of the Time Domain Volume Integral Equation Using OpenACC

A graphics processing unit (GPU) accelerated implementation of the explicit solution of the time domain volume integral equation (TD-VIE) using the OpenACC application program interface (API) is presented. The use of the OpenACC API, which is based on a collection of compiler directives, allows for ease of porting as well as the efficient […]
Mar, 29

Parallel Simulation of Population Balance Model-Based Particulate Processes Using Multicore CPUs and GPUs

Computer-aided modeling and simulation are a crucial step in developing, integrating, and optimizing unit operations and, subsequently, entire processes in the chemical/pharmaceutical industry. This study details two methods of reducing the computational time needed to solve complex process models, namely the population balance model, which, given the source terms, can be very computationally intensive. Population […]
Mar, 29

Accelerating Graph Analysis with Heterogeneous Systems

Data analysis is a rising field of interest for computer science research due to the growing amount of information that is digitally available. A direct consequence of this increase in data is that any analysis becomes significantly more complex. By using structured representations for the data sets, like graphs, the analysis becomes feasible, but is still time-consuming. […]
Mar, 26

Adaptive OpenCL (ACL) Execution in GPU Architectures

Open Computing Language (OpenCL) has been proposed as a platform-independent, parallel execution model to target heterogeneous systems, including multiple central processing units, graphics processing units (GPUs), and digital signal processors (DSPs). OpenCL parallelism scales with the available resources and hardware generational improvements due to the data-parallel nature of its kernels. Such parallel expressions must adhere […]
Mar, 26

Parallel Sorting on the Heterogeneous AMD Fusion Accelerated Processing Unit

We explore efficient parallel radix sort for the AMD Fusion Accelerated Processing Unit (APU). Two challenges arise: efficiently partitioning data between the CPU and GPU and the allocation of data in memory regions. Our coarse-grained implementation utilizes both the GPU and CPU by sharing data at the beginning and end of the sort. Our fine-grained […]
Mar, 26

Accelerated Dictionary Learning with GPU/Multicore CPU and Its Application to Music Classification

K-means clustering and GMM training, as dictionary learning procedures, lie at the heart of many signal processing applications. Increasing data scale requires more efficient ways to perform this process. In this paper a new GPU and multi-core CPU accelerated k-means clustering and GMM training is proposed. We show that both methods can be concisely reformulated […]
Mar, 26

General Purpose Computation on Graphics Processing Units Using OpenCL

Computational Science has emerged as a third pillar of science along with theory and experiment, where the parallelization for scientific computing is promised by different shared and distributed memory architectures such as supercomputer systems, grid and cluster-based systems, and multi-core and multiprocessor systems. In recent years the use of GPUs (Graphics Processing Units) […]
Mar, 26

Improving Performance Portability in OpenCL Programs

We study the performance portability of OpenCL across diverse architectures including NVIDIA GPU, Intel Ivy Bridge CPU, and AMD Fusion APU. We present detailed performance analysis at assembly level on three exemplar OpenCL benchmarks: SGEMM, SpMV, and FFT. We also identify a number of tuning knobs that are critical to performance portability, including threads-data mapping, […]
Mar, 26

Creating Optimal Code for GPU-Accelerated CT Reconstruction Using Ant Colony Optimization

PURPOSE: CT reconstruction algorithms implemented on the GPU are highly sensitive to their implementation details and the hardware they run on. Fine-tuning an implementation for optimal performance can be a time-consuming task and can require many updates when the hardware changes. There are some techniques that do automatic fine-tuning of GPU code. These techniques, however, […]
Mar, 26

A Region Growing Segmentation Algorithm for GPUs

This paper proposes a parallel region growing image segmentation algorithm for Graphics Processing Units (GPUs). It is inspired by a sequential algorithm widely used by the Geographic Object Based Image Analysis (GEOBIA) community. Initially, all image pixels are considered as seeds or primitive segments. Fine-grained parallel threads assigned to individual pixels merge adjacent segments […]
Mar, 26

An application of graphical numerical accelerators in simulations of ion-transport through biological membranes

The modeling of ion-transport through biological membranes is important for understanding many life processes. The transmembrane potential and ion concentrations in the stationary state can be measured in in-vivo experiments. They can also be simulated within membrane models. Here we consider a basic model of ion transport that describes the time evolution of ion concentrations […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org