
Posts

Jun, 12

FastSpMM: An Efficient Library for Sparse Matrix Matrix Product on GPUs

Sparse matrix-matrix (SpMM) multiplication arises in a wide range of scientific and technical applications. The computational requirements for this kind of operation are enormous, especially for large matrices. This paper analyzes and evaluates a method to efficiently compute the SpMM product in a computing environment that includes graphics processing units (GPUs). Some libraries […]
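The abstract does not say which storage format FastSpMM uses. The following is only a minimal CPU sketch of the kind of product being accelerated. It assumes the sparse operand is stored in the common CSR (compressed sparse row) format and the second operand is dense. The function name `csr_spmm` is hypothetical and is not part of the library's API.

```python
from typing import List


def csr_spmm(indptr: List[int], indices: List[int], data: List[float],
             dense: List[List[float]]) -> List[List[float]]:
    """Compute C = A @ B, where A (m x k) is in CSR form and B (k x n) is dense.

    Hypothetical helper for illustration; not the FastSpMM API.
    """
    m = len(indptr) - 1
    n = len(dense[0]) if dense else 0
    out = [[0.0] * n for _ in range(m)]
    for row in range(m):
        acc = out[row]
        # Only the stored nonzeros of this row contribute to output row `row`.
        for p in range(indptr[row], indptr[row + 1]):
            a = data[p]
            b_row = dense[indices[p]]
            for j in range(n):
                acc[j] += a * b_row[j]
    return out


# A = [[1, 0, 2], [0, 3, 0]] in CSR form, times a dense 3 x 2 matrix B.
C = csr_spmm([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0],
             [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(C)  # [[11.0, 14.0], [9.0, 12.0]]
```

Each output row depends on only one row of A, so the outer loop is a natural unit of parallel work on a GPU. How the library actually maps work to threads is not stated in the abstract.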
Jun, 12

FFT-SPA Non-Binary LDPC Decoding on GPU

It is well known that non-binary LDPC codes outperform binary LDPC codes in BER performance for the same code length. The superior BER performance of non-binary codes comes at the expense of more complex decoding algorithms that demand higher computational power. In this paper, we propose parallel signal processing algorithms for performing the FFT-SPA […]
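FFT-based decoding of non-binary codes over GF(2^p) rests on one observation. Adding field elements is a bitwise XOR of their binary labels, so the distribution of a sum of independent symbols is an XOR convolution. The fast Walsh-Hadamard transform turns that convolution into a pointwise product. The sketch below shows only this check-node step in plain Python. It assumes the permutations by parity-check matrix coefficients have already been applied. The function names are hypothetical, and this is not the authors' GPU implementation.

```python
from typing import List


def wht(v: List[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of two."""
    a = list(v)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a


def check_node_update(incoming: List[List[float]]) -> List[List[float]]:
    """Extrinsic check-node messages for one parity check over GF(2^p).

    incoming[e][s] is the probability that the (already permuted) symbol on
    edge e equals s. Because the check forces the XOR of all symbols to zero,
    the message for edge e is the XOR convolution of every *other* message.
    """
    q = len(incoming[0])
    spectra = [wht(m) for m in incoming]
    out = []
    for e in range(len(incoming)):
        # Pointwise product in the transform domain == XOR convolution.
        prod = [1.0] * q
        for k, spec in enumerate(spectra):
            if k != e:
                prod = [a * b for a, b in zip(prod, spec)]
        # The WHT is its own inverse up to a factor of 1/q.
        out.append([x / q for x in wht(prod)])
    return out
```

For clarity this recomputes the product for every edge. A practical decoder would typically reuse partial products instead.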
Jun, 12

OpenCL Implementation of a Color Based Object Tracking

In this paper we present an algorithm for real-time, color-based object tracking. First, a two-layer perceptron is trained to cope with changes in scene illumination. Based on this training, OpenCL code is generated to harness the power of GPU computing. Then, color-based object tracking is done […]
Jun, 12

Performance of a GPU-based Direct Summation Algorithm for Computation of Small Angle Scattering Profile

Small Angle Scattering (SAS) of X-rays or neutrons is an experimental technique that provides valuable structural information for biological macromolecules under physiological conditions and with no limitation on the molecular size. In order to refine molecular structure against experimental SAS data, ab initio prediction of the scattering profile must be recomputed hundreds of thousands of […]
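A common direct-summation formula for this kind of computation is the Debye equation, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij) / (q r_ij). Its double sum over atom pairs is what makes repeated profile evaluation expensive. The sketch below evaluates it in plain Python under simplifying assumptions. The form factors f_i are taken as q-independent constants and solvent effects are ignored, so it illustrates the cost structure rather than the paper's exact model. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def debye_profile(coords: Sequence[Point], f: Sequence[float],
                  qs: Sequence[float]) -> List[float]:
    """Direct O(N^2) Debye sum for each scattering vector magnitude q.

    Assumes constant (q-independent) form factors f[i]; illustrative only.
    """
    n = len(coords)
    profile = []
    for q in qs:
        total = 0.0
        for i in range(n):
            total += f[i] * f[i]  # i == j term, since sin(x)/x -> 1 as x -> 0
            for j in range(i + 1, n):
                x = q * math.dist(coords[i], coords[j])
                sinc = math.sin(x) / x if x != 0.0 else 1.0
                total += 2.0 * f[i] * f[j] * sinc  # counts (i, j) and (j, i)
        profile.append(total)
    return profile


# Two unit scatterers 1.0 apart: I(0) = (1 + 1)^2 = 4, and I(pi) is about 2.
print(debye_profile([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], [0.0, math.pi]))
```

Every q value and every atom pair is independent of the others, which is why this sum maps well onto a GPU. The abstract does not specify how the authors partition that work.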
Jun, 10

OCLoptimizer: An Iterative Optimization Tool for OpenCL

Modern computers include several computational devices with parallel capabilities, such as multicore processors and Graphics Processing Units (GPUs). OpenCL enables the programming of all these kinds of devices. An OpenCL program consists of host code, which discovers the computational devices available in the host system and queues up commands to them, and […]
Jun, 10

Accelerating Genetic Programming Using Graphics Processing Units

Evolution through natural selection offers the possibility of automatically generating functionally complex solutions to a wide range of problems. Methods such as Genetic Programming (GP) show the promise of this approach but tend to stagnate after relatively few generations. To research this issue, execution speed must be substantially improved. This thesis presents work to accelerate […]
Jun, 10

Processing XPath Structural Constraints on GPU

Technologies such as CUDA and OpenCL have popularized the use of graphics cards (GPUs) for general-purpose programming, often with impressive performance gains. However, using such cards to speed up XML database processing has yet to be fully explored. XML databases offer much flexibility for Web-oriented systems. Nonetheless, such flexibility comes at a considerable computational […]
Jun, 10

A flexible algorithm for calculating pair interactions on SIMD architectures

Calculating interactions or correlations between pairs of particles is typically the most time-consuming task in particle simulation or correlation analysis. Straightforward implementations using a double loop over particle pairs have traditionally worked well, especially since compilers usually do a good job of unrolling the inner loop. In order to reach high performance on modern CPU […]
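For reference, the straightforward double loop that the abstract describes as the traditional approach looks roughly like the sketch below. It assumes a Lennard-Jones pair potential with a plain spherical cutoff and no periodic boundaries, chosen only for illustration. The SIMD-oriented scheme the paper proposes is not reproduced here, and the function name is hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def lj_energy(pos: Sequence[Vec3], cutoff: float,
              eps: float = 1.0, sigma: float = 1.0) -> float:
    """Total Lennard-Jones energy via a plain double loop over particle pairs."""
    rc2 = cutoff * cutoff
    s2 = sigma * sigma
    energy = 0.0
    n = len(pos)
    for i in range(n):
        xi, yi, zi = pos[i]
        # Inner loop starts at i + 1 so each unordered pair is visited once.
        for j in range(i + 1, n):
            dx, dy, dz = pos[j][0] - xi, pos[j][1] - yi, pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz
            if r2 < rc2:  # compare squared distances to avoid a sqrt
                sr6 = (s2 / r2) ** 3
                energy += 4.0 * eps * (sr6 * sr6 - sr6)
    return energy
```

The data-dependent cutoff test inside the inner loop is one reason such loops do not map directly onto SIMD lanes.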
Jun, 10

Recent Advances on GPU Computing in Operations Research

In the last decade, Graphics Processing Units (GPUs) have gained increasing popularity as accelerators for High Performance Computing (HPC) applications. Recent GPUs are not only powerful graphics engines but also highly threaded parallel computing processors that can achieve sustainable speedup compared with CPUs. In this context, researchers try to exploit the capability of […]
Jun, 9

GPU Acceleration of Algebraic Multigrid for Low-Frequency Finite Element Methods

This paper introduces a GPU acceleration of a Wavelet-based Algebraic Multigrid used as a preconditioner for solving Laplace’s equation discretized by the Finite Element Method. We conduct tests using a CPU-based direct solver, a CPU-based Preconditioned Conjugate Gradient (PCG), and a GPU-based PCG. Finally, we report the solution time and the speed-up achieved in solving […]
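The PCG solver that the abstract compares against can be sketched as follows. This is a minimal pure-Python version for a small dense symmetric positive-definite system. As a stated substitution, it uses a simple Jacobi (diagonal) preconditioner in place of the paper's wavelet-based algebraic multigrid. The function and helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in A]


def pcg_jacobi(A: Matrix, b: Vector, tol: float = 1e-10,
               max_iter: int = 1000) -> Vector:
    """Preconditioned conjugate gradient with a Jacobi preconditioner M = diag(A)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # r = b - A x, with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^{-1} r (preconditioner)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # stop on small residual norm
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# 1D Laplacian-like tridiagonal system whose exact solution is [1, 1, 1].
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
print(pcg_jacobi(A, [1.0, 0.0, 1.0]))  # [1.0, 1.0, 1.0]
```

Only the two lines that compute z apply the preconditioner. Swapping in a stronger preconditioner such as AMG changes those lines and leaves the rest of the iteration untouched.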
Jun, 9

Understanding Dynamic Parallelism at Any Scale with Allinea’s Unified Tools (webinar)

Dynamic Parallelism is a great new feature introduced by NVIDIA in CUDA 5. As powerful features like this are introduced, the complexity of debugging and profiling often increases. This webinar will provide technical insight into how Allinea’s powerful tools can save the day if bugs come up when developing with Dynamic Parallelism. The webinar, presented […]
Jun, 8

GPU Acceleration of Particle Advection Workloads in a Parallel, Distributed Memory Setting

Although there has been significant research in GPU acceleration, both of parallel simulation codes (i.e., GPGPU) and of single-GPU visualization and analysis algorithms, there has been relatively little research devoted to visualization and analysis algorithms on GPU clusters. This oversight is significant: parallel visualization and analysis algorithms have markedly different characteristics – computational load, […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org