
Posts

Aug, 26

A curved-element unstructured discontinuous Galerkin method on GPUs for the Euler equations

In this work we consider Runge-Kutta discontinuous Galerkin (RKDG) methods for the solution of hyperbolic equations, enabling high-order discretization in space and time. We aim at an efficient implementation of DG for the Euler equations on GPUs. A mesh curvature approach is presented for the proper resolution of the domain boundary. This approach is based […]
Aug, 22

Coding Ants: Using Ant Colony Optimization to Accelerate CT Reconstruction

There is no one-size-fits-all solution when it comes to CT reconstruction. Many different CT reconstruction algorithms and implementations have been devised in an attempt to solve the problem of producing an image under a specific set of constraints. One optimal CT reconstruction implementation can look very different from another optimal implementation, depending […]
Aug, 22

Parallel Trajectory Planning on GPU

The release of the CUDA architecture made massively parallel computing possible on ordinary desktops and opened the door to the enormous computing power of graphics adapters. Trajectory planning for aerial vehicles is one of the tasks that can benefit from it. The sought path must respect all physical limitations of the airplane and avoid all […]
Aug, 22

Improving OpenACC compatibility within accULL

The irruption of hardware accelerators such as GPUs into the HPC scene has made unprecedented performance available to developers. However, even expert developers may not be ready to exploit the new, complex processor hierarchies. We need to find a way to reduce the programming effort for these devices at the programming-language level; otherwise, developers will spend […]
Aug, 22

Approaches for the Parallelization of Software Implementation of Integer Multiplication

In this paper we consider several approaches to increasing the performance of a software implementation of integer multiplication on 32-bit and 64-bit platforms via parallelization. The main idea behind the parallelization is the delayed carry mechanism that the authors proposed earlier [11]. The delayed carry makes it possible to get rid of the connectivity in […]
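A minimal host-side sketch of the delayed-carry idea (not the authors' code; the limb count and the plain-C formulation are illustrative assumptions): each 32x32-bit partial product is split into its low and high halves and accumulated into per-column 64-bit sums, so the columns carry no serial dependency and could be distributed across threads, and carries are propagated in a single pass at the end.

#include <stdint.h>

#define LIMBS 8   /* number of 32-bit limbs per operand (illustrative size) */

/* Multiply two LIMBS-limb numbers into 2*LIMBS result limbs with deferred carries. */
void mul_delayed_carry(const uint32_t a[LIMBS], const uint32_t b[LIMBS],
                       uint32_t r[2 * LIMBS]) {
    uint64_t lo[2 * LIMBS] = {0};          /* per-column sums of low product halves  */
    uint64_t hi[2 * LIMBS] = {0};          /* per-column sums of high product halves */

    /* Column accumulation: independent across columns, hence parallelizable. */
    for (int i = 0; i < LIMBS; ++i)
        for (int j = 0; j < LIMBS; ++j) {
            uint64_t p = (uint64_t)a[i] * b[j];
            lo[i + j]     += (uint32_t)p;  /* low 32 bits go to column i+j      */
            hi[i + j + 1] += p >> 32;      /* high 32 bits go to column i+j+1   */
        }

    /* Single carry-propagation pass at the end. */
    uint64_t carry = 0;
    for (int k = 0; k < 2 * LIMBS; ++k) {
        uint64_t s = lo[k] + hi[k] + carry;
        r[k] = (uint32_t)s;
        carry = s >> 32;
    }
}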
Aug, 22

Numerical Study of Geometric Multigrid Methods on CPU–GPU Heterogeneous Computers

The geometric multigrid method (GMG) is one of the most efficient solving techniques for discrete algebraic systems arising from many types of partial differential equations. GMG utilizes a hierarchy of grids or discretizations and reduces the error at a number of frequencies simultaneously. Graphics processing units (GPUs) have recently burst onto the scientific computing scene […]
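A minimal sketch of the grid hierarchy the abstract refers to (not the paper's code; the 1-D Poisson model problem, weighted-Jacobi smoothing, and the sweep counts are illustrative assumptions, and it is written as plain host code for clarity): one V-cycle pre-smooths to damp high-frequency error, restricts the residual to a coarser grid, recurses, interpolates the coarse correction back, and post-smooths.

#include <stdlib.h>
#include <string.h>

/* Weighted-Jacobi smoother for -u'' = f, zero Dirichlet boundaries, spacing h. */
static void smooth(double *u, const double *f, int n, double h, int sweeps) {
    double *t = (double *)malloc(n * sizeof *t);
    for (int s = 0; s < sweeps; ++s) {
        memcpy(t, u, n * sizeof *t);
        for (int i = 1; i < n - 1; ++i) {
            double jac = 0.5 * (t[i - 1] + t[i + 1] + h * h * f[i]);
            u[i] = t[i] + (2.0 / 3.0) * (jac - t[i]);
        }
    }
    free(t);
}

/* One V-cycle on n = 2^k + 1 grid points. */
static void vcycle(double *u, const double *f, int n, double h) {
    if (n <= 3) { smooth(u, f, n, h, 50); return; }      /* coarsest level */

    smooth(u, f, n, h, 3);                               /* pre-smoothing */

    int nc = (n + 1) / 2;
    double *r  = (double *)calloc(n,  sizeof *r);
    double *rc = (double *)calloc(nc, sizeof *rc);
    double *ec = (double *)calloc(nc, sizeof *ec);

    for (int i = 1; i < n - 1; ++i)                      /* residual of -u'' = f */
        r[i] = f[i] + (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h);
    for (int i = 1; i < nc - 1; ++i)                     /* full-weighting restriction */
        rc[i] = 0.25 * (r[2 * i - 1] + 2.0 * r[2 * i] + r[2 * i + 1]);

    vcycle(ec, rc, nc, 2.0 * h);                         /* coarse-grid error equation */

    for (int i = 0; i < nc; ++i)     u[2 * i]     += ec[i];                       /* prolongation */
    for (int i = 0; i < nc - 1; ++i) u[2 * i + 1] += 0.5 * (ec[i] + ec[i + 1]);

    smooth(u, f, n, h, 3);                               /* post-smoothing */
    free(r); free(rc); free(ec);
}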
Aug, 21

Supporting Preemptive Task Executions and Memory Copies in GPGPUs

GPGPUs (General-Purpose Graphics Processing Units) provide massive computational power. However, applying GPGPU technology to real-time computing is challenging due to the non-preemptive nature of GPGPUs. In particular, a job running on a GPGPU or a data copy between a GPGPU and the CPU is non-preemptive. As a result, a high-priority job arriving in the middle […]
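One common way to approximate preemption for long transfers, sketched below, is to split a large host-to-device copy into smaller chunks so a higher-priority copy or kernel can be issued between chunks; this is not necessarily the paper's mechanism, and the chunk size is an illustrative choice, with the point where a scheduler would service higher-priority work indicated only by a comment.

#include <cuda_runtime.h>
#include <stddef.h>

#define CHUNK_BYTES (1 << 20)   /* 1 MiB chunks: granularity of the "preemption points" */

void chunked_h2d_copy(void *dst, const void *src, size_t bytes, cudaStream_t stream) {
    size_t done = 0;
    while (done < bytes) {
        size_t n = bytes - done < CHUNK_BYTES ? bytes - done : CHUNK_BYTES;
        cudaMemcpyAsync((char *)dst + done, (const char *)src + done, n,
                        cudaMemcpyHostToDevice, stream);
        cudaStreamSynchronize(stream);   /* boundary between chunks */
        done += n;
        /* ...a host-side scheduler could check here for a higher-priority request
           and issue its copy or kernel before continuing with the next chunk. */
    }
}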
Aug, 21

Streamed Watershed Transform on GPU for Processing of Large Volume Data

Since its introduction, the watershed transform has become a popular method for volume data segmentation. A range of algorithms for its computation has been developed, including parallel algorithms for different architectures. Recently, algorithms for consumer graphics accelerators have also been developed. None of these, however, is able to process data larger than the available memory […]
Aug, 21

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs

Given the extraordinary computational power of modern graphics processing units (GPUs), general-purpose computation on GPUs (GPGPU) has become an increasingly important platform for high-performance computing. To better understand how well the GPU resource is utilized by application developers, and to help them develop high-performance GPGPU code, we conduct an […]
Aug, 21

Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput

On-chip shared memory (a.k.a. local data share) is a critical resource to many GPGPU applications. In current GPUs, the shared memory is allocated when a thread block (also called a workgroup) is dispatched to a streaming multiprocessor (SM) and is released when the thread block is completed. As a result, the limited capacity of shared […]
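A minimal CUDA sketch of the allocation behavior the abstract describes (not the paper's multiplexing technique; the block size and the staging tile are illustrative): each thread block statically reserves its own shared-memory tile for its entire lifetime, so the number of blocks resident on an SM is bounded by the per-SM shared-memory capacity divided by the per-block usage.

#include <cuda_runtime.h>

#define TILE 256

__global__ void scale_with_tile(const float *in, float *out, float a, int n) {
    __shared__ float tile[TILE];                 /* allocated per resident thread block */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];        /* stage data through shared memory */
    __syncthreads();
    if (i < n) out[i] = a * tile[threadIdx.x];   /* the tile is released only when the block finishes */
}

With, say, 48 KB of shared memory per SM, a kernel that declares 12 KB per block can keep at most four blocks resident per SM, regardless of how few registers or threads it uses.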
Aug, 21

GPU-Accelerated Light Stemmer for the Arabic Language

Preprocessing of data is a vital aspect of information retrieval. Stemming is a major preprocessing task. The goal of stemming is to reduce the inflectional and some of the derivational forms of a word to its base form. When dealing with the massive amounts of data on the web, preprocessing generally consumes a major portion of […]
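A minimal sketch of what "light" stemming means (not the paper's stemmer, and shown as host code rather than a GPU kernel): a fixed list of common prefixes and suffixes is stripped without full morphological analysis. The affix lists below are illustrative Latin transliterations; a real Arabic light stemmer operates on the Arabic script, e.g. on UTF-8 byte sequences.

#include <string.h>

static const char *prefixes[] = { "wal", "al", "wa", "bi", "li" };  /* illustrative */
static const char *suffixes[] = { "una", "at", "an", "ha", "ia" };  /* illustrative */

/* Strip at most one known prefix and one known suffix, keeping a stem of >2 chars. */
void light_stem(char *word) {
    size_t len = strlen(word);
    for (size_t i = 0; i < sizeof prefixes / sizeof *prefixes; ++i) {
        size_t p = strlen(prefixes[i]);
        if (len > p + 2 && strncmp(word, prefixes[i], p) == 0) {
            memmove(word, word + p, len - p + 1);   /* drop the prefix in place */
            len -= p;
            break;
        }
    }
    for (size_t i = 0; i < sizeof suffixes / sizeof *suffixes; ++i) {
        size_t s = strlen(suffixes[i]);
        if (len > s + 2 && strcmp(word + len - s, suffixes[i]) == 0) {
            word[len - s] = '\0';                   /* drop the suffix */
            break;
        }
    }
}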
Aug, 20

Cosmological Calculations on the GPU

CONTEXT: Cosmological measurements require the calculation of nontrivial quantities over large datasets. The next generation of survey telescopes (such as DES, PanSTARRS, and LSST) will yield measurements of billions of galaxies. The scale of these datasets, and the nature of the calculations involved, make cosmological calculations ideal models for implementation on graphics processing units (GPUs). […]
