Posts

Nov, 5

cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

Modern processor architectures, in addition to having ever more cores, also require ever more attention to memory layout in order to run at full capacity. The usefulness of most languages is diminishing as their abstractions, structures or objects are hard to map onto modern processor architectures efficiently. The work in this paper introduces a new abstract […]
Nov, 5

Kite: Braided Parallelism for Heterogeneous Systems

Modern processors are evolving into hybrid, heterogeneous processors with both CPU and GPU cores used for general purpose computation. Several languages, such as BrookGPU, CUDA, and more recently OpenCL, have been developed to harness the potential of these processors. These languages typically involve control code running on a host CPU, while performance-critical, massively data-parallel kernel […]
Nov, 1

Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the […]
Nov, 1

Quantum.Ligand.Dock: protein-ligand docking with quantum entanglement refinement on a GPU system

Quantum.Ligand.Dock (protein-ligand docking with graphic processing unit (GPU) quantum entanglement refinement on a GPU system) is an original modern method for in silico prediction of protein-ligand interactions via high-performance docking code. The main flavour of our approach is a combination of fast search with a special account for overlooked physical interactions. On the one hand, […]
Nov, 1

DL: A data layout transformation system for heterogeneous computing

For many-core architectures like the GPUs, efficient off-chip memory access is crucial to high performance; the applications are often limited by off-chip memory bandwidth. Transforming data layout is an effective way to reshape the access patterns to improve off-chip memory access behavior, but several challenges had limited the use of automated data layout transformation systems […]
Nov, 1

Numerical Simulation of the Frank-Kamenetskii PDE: GPU vs. CPU Computing

The efficient solution of the Frank-Kamenetskii partial differential equation through the implementation of parallelized numerical algorithms on GPUs (Graphics Processing Units) in MATLAB is a natural progression of the work which has been conducted in an area of practical import. There is an on-going interest in the mathematics describing thermal explosions due to the significance […]
Nov, 1

An Intermediate Library for Multi-GPUs Computing Skeletons

This paper introduces a library that helps programmers write parallel programs for GPU architectures, especially for systems with multiple GPUs. The library is built on the idea of skeletons, which let us write parallel programs easily and quickly, as if writing sequential programs. Skeletons are usually described in a functional language which supports […]
Nov, 1

Speeding up the evaluation of evolutionary learning systems using GPGPUs

In this paper we introduce a method for computing fitness in evolutionary learning systems based on NVIDIA’s massively parallel technology using the CUDA library. Both the match process of a population of classifiers against a training set and the computation of the fitness of each classifier from its matches have been parallelized. This method has […]
Oct, 31

Efficient Pattern-Based Time Series Classification on GPU

Time series shapelet discovery algorithm finds subsequences from a set of time series for use as primitives for time series classification. This algorithm has drawn a lot of interest because of the interpretability of its results. However, computation requirements restrict the algorithm from dealing with large data sets and may limit its application in many […]
Oct, 31

Parallel Genetic Programming on Graphics Processing Units

In program inference, the evaluation of how well a candidate solution solves a certain task is usually a computationally intensive procedure. Most of the time, the evaluation involves either submitting the program to a simulation process or testing its behavior on many input arguments; both situations may turn out to be very time-consuming. Things get […]
Oct, 31

A Contour-Guided Deformable Image Registration Algorithm for Adaptive Radiotherapy

In adaptive radiotherapy, a deformable image registration is often conducted between the planning CT and the treatment CT (or cone beam CT) to generate a deformation vector field (DVF) for dose accumulation and contour propagation. The auto-propagated contours on the treatment CT may contain relatively large errors especially in low-contrast regions. Clinician’s inspection and editing […]
Oct, 31

GPU implementation of a Landau gauge fixing algorithm

We discuss how the steepest descent method with Fourier acceleration for Landau gauge fixing in lattice SU(3) simulations can be implemented using CUDA. The scaling of the gauge fixing code was investigated using a Tesla C2070 Fermi architecture, and compared with a parallel CPU gauge fixing code.

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors