
Posts

Sep, 19

Debugging CUDA

During six months of intensive NVIDIA CUDA C programming, many bugs were created. We pass on the software engineering lessons learnt, particularly those relevant to parallel general-purpose computation on graphics hardware (GPGPU).
Sep, 19

Parallel divide-and-evolve: experiments with OpenMP on a multicore machine

Multicore machines are becoming a standard way to speed up system performance. Having instantiated the evolutionary metaheuristic DAEX with the forward-search YAHSP planner, we investigate the global parallelism approach, which exploits the intrinsic parallelism of individual evaluation. This paper describes a parallel shared-memory version of the DAEYAHSP planning system using […]
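As a rough illustration of that global-parallelism scheme (with a stand-in Individual type and fitness function, not the paper's code), the evaluation loop of an evolutionary algorithm can be spread over the cores of a multicore machine with a single OpenMP directive:

```cuda
#include <omp.h>
#include <vector>

// Minimal sketch of "global parallelism": individual evaluations are
// independent, so the evaluation loop can be distributed across cores.
// In DAEYAHSP the evaluation would be a call into the YAHSP planner.
struct Individual { std::vector<int> genome; double fitness; };

double evaluate(const Individual& ind) {
    // placeholder for the costly, side-effect-free fitness computation
    double s = 0.0;
    for (int g : ind.genome) s += g;
    return s;
}

void evaluate_population(std::vector<Individual>& pop) {
    // dynamic scheduling helps when evaluation times vary a lot,
    // as planner runtimes typically do
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < (int)pop.size(); ++i)
        pop[i].fitness = evaluate(pop[i]);
}
```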
Sep, 19

DAMS: distributed adaptive metaheuristic selection

We present a distributed algorithm, Select Best and Mutate (SBM), in the Distributed Adaptive Metaheuristic Selection (DAMS) framework. DAMS is dedicated to adaptive optimization in distributed environments. Given a set of metaheuristics, the goal of DAMS is to coordinate their local execution on distributed nodes in order to optimize the global performance of the distributed […]
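A loose sketch of what a Select Best and Mutate decision step on one node might look like, with the distributed communication stubbed out (the neighbor reports are simply passed in); the names and structure are illustrative, not the DAMS API:

```cuda
#include <cstdlib>
#include <utility>
#include <vector>

// Illustrative only: metaheuristics are indices into a fixed portfolio.
constexpr int NUM_METAHEURISTICS = 4;
constexpr double MUTATION_RATE   = 0.05;

// One decision step on a node: adopt the metaheuristic of the best-performing
// neighbor (or keep our own if we are best), then mutate with small
// probability so that no metaheuristic disappears from the distributed system.
int sbm_step(int current_mh, double current_fitness,
             const std::vector<std::pair<int, double>>& neighbor_reports) {
    int best_mh = current_mh;
    double best_fitness = current_fitness;                  // minimization
    for (const auto& [mh, fit] : neighbor_reports)
        if (fit < best_fitness) { best_fitness = fit; best_mh = mh; }  // "Select Best"
    if ((double)std::rand() / RAND_MAX < MUTATION_RATE)     // "... and Mutate"
        best_mh = std::rand() % NUM_METAHEURISTICS;
    return best_mh;
}
```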
Sep, 19

ACO with tabu search on a GPU for solving QAPs using move-cost adjusted thread assignment

This paper proposes a parallel ant colony optimization (ACO) for solving quadratic assignment problems (QAPs) on a graphics processing unit (GPU) by combining tabu search (TS) with ACO in CUDA (compute unified device architecture). In TS for QAP, all neighbor moves are tested. These moves form two groups based on how the move cost is computed. In one […]
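For illustration, here is a kernel sketch (not the paper's code) that evaluates all swap moves of a QAP tabu search on the GPU with one thread per move, using the standard O(n) cost delta; the paper's move-cost adjusted assignment additionally gives more threads to moves that need this full recomputation than to moves whose delta can be updated in O(1) from stored values:

```cuda
// One thread per candidate move (r, s): compute the cost change of swapping
// the facilities assigned to locations r and s. a and b are the n x n flow
// and distance matrices (row-major), p is the current assignment, and
// delta[r*n + s] receives the O(n) delta (standard formula, shown as a sketch).
__global__ void qap_swap_deltas(const int* a, const int* b, const int* p,
                                int n, int* delta) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int r = idx / n, s = idx % n;
    if (r >= n || s <= r) return;               // only moves with r < s

    int d = a[r*n+r]*(b[p[s]*n+p[s]] - b[p[r]*n+p[r]])
          + a[r*n+s]*(b[p[s]*n+p[r]] - b[p[r]*n+p[s]])
          + a[s*n+r]*(b[p[r]*n+p[s]] - b[p[s]*n+p[r]])
          + a[s*n+s]*(b[p[r]*n+p[r]] - b[p[s]*n+p[s]]);
    for (int k = 0; k < n; ++k) {
        if (k == r || k == s) continue;
        d += a[k*n+r]*(b[p[k]*n+p[s]] - b[p[k]*n+p[r]])
           + a[k*n+s]*(b[p[k]*n+p[r]] - b[p[k]*n+p[s]])
           + a[r*n+k]*(b[p[s]*n+p[k]] - b[p[r]*n+p[k]])
           + a[s*n+k]*(b[p[r]*n+p[k]] - b[p[s]*n+p[k]]);
    }
    delta[r*n + s] = d;
}
```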
Sep, 19

Generalisation in genetic programming

Genetic programming can evolve large general solutions using a tiny fraction of possible fitness test sets. Just one test may be enough.
Sep, 19

A training roadmap for new HPC users

Many new users of TeraGrid or other HPC resources are scientists or other domain experts by training and are not necessarily familiar with core principles, practices, and resources within the HPC community. As a result, they often make inefficient use of their own time and effort, as well as of the computing resources. In this […]
Sep, 19

A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs

Sparse Matrix-Vector Multiplication (SpMV) is very common in scientific computing. The Graphics Processing Unit (GPU) has recently emerged as a high-performance computing platform due to its massive processing capability. This paper presents an innovative performance-model driven approach for partitioning sparse matrices into appropriate formats, and auto-tuning configurations of CUDA kernels to improve the performance of […]
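As a point of reference, the kind of per-format kernel whose launch configuration such an auto-tuner would explore is a plain CSR SpMV kernel with one thread per row; this is a generic sketch, not the framework described in the paper:

```cuda
// Baseline "scalar CSR" SpMV: y = A * x, one thread per matrix row.
// Auto-tuning would vary parameters such as threads per block or rows per
// thread, and partitioning would route parts of the matrix to other formats.
__global__ void spmv_csr_scalar(int num_rows,
                                const int* __restrict__ row_ptr,
                                const int* __restrict__ col_idx,
                                const float* __restrict__ vals,
                                const float* __restrict__ x,
                                float* __restrict__ y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += vals[j] * x[col_idx[j]];
        y[row] = sum;
    }
}
```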
Sep, 19

Computing without processors

Heterogeneous systems allow us to target our programming to the appropriate environment. From the programmer’s perspective, the distinction between hardware and software is being blurred. As programmers struggle to meet the performance requirements of today’s systems, they will face an ever-increasing need to exploit alternative computing elements such as GPUs (graphics processing units), which […]
Sep, 19

Real-time ray casting of algebraic B-spline surfaces

Piecewise algebraic B-spline surfaces (ABS surfaces) are capable of modeling globally smooth shapes of arbitrary topology. They can potentially be applied in geometric modeling, scientific visualization, computer animation and mathematical illustration. However, real-time ray casting of these surfaces remains an obstacle for interactive applications, due to the large number of numerical root findings of nonlinear […]
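The root-finding bottleneck can be illustrated with a generic sketch (not the paper's solver): along a ray o + t*d, a hit requires solving f(o + t*d) = 0 for the surface's implicit function f. A plain Newton iteration with a stand-in f is shown below; robust solvers for piecewise algebraic B-spline surfaces are considerably more elaborate:

```cuda
// Stand-in implicit surface (a unit sphere); a real ABS surface would
// evaluate a piecewise polynomial in B-spline form here.
__device__ float f(float3 p)       { return p.x*p.x + p.y*p.y + p.z*p.z - 1.0f; }
__device__ float3 grad_f(float3 p) { return make_float3(2.0f*p.x, 2.0f*p.y, 2.0f*p.z); }

// Newton iteration on g(t) = f(o + t*d), with g'(t) = dot(grad_f(o + t*d), d).
__device__ bool intersect(float3 o, float3 d, float t, float* t_hit) {
    for (int i = 0; i < 16; ++i) {
        float3 p = make_float3(o.x + t*d.x, o.y + t*d.y, o.z + t*d.z);
        float g = f(p);
        if (fabsf(g) < 1e-5f) { *t_hit = t; return true; }
        float3 n = grad_f(p);
        float gp = n.x*d.x + n.y*d.y + n.z*d.z;
        if (fabsf(gp) < 1e-8f) break;   // nearly tangential ray: Newton stalls
        t -= g / gp;
    }
    return false;
}
```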
Sep, 19

The TheLMA project: Multi-GPU implementation of the lattice Boltzmann method

In this paper, we describe the implementation of a multi-GPU (graphics processing unit) fluid flow solver based on the lattice Boltzmann method (LBM). The LBM is a novel approach in computational fluid dynamics, with numerous interesting features from a computational, numerical, and physical standpoint. Our program is based on CUDA and uses POSIX threads to […]
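A minimal sketch of this multi-GPU setup, assuming one POSIX thread per GPU, each bound to its device with cudaSetDevice; subdomain allocation, the LBM collide-and-stream kernels, and the halo exchange that make up the actual TheLMA solver are omitted:

```cuda
#include <pthread.h>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One POSIX thread per GPU: after cudaSetDevice, all CUDA calls made by this
// thread target its device.
void* worker(void* arg) {
    int dev = *static_cast<int*>(arg);
    cudaSetDevice(dev);
    // allocate this GPU's subdomain, run the LBM time loop, exchange halos ...
    std::printf("thread bound to GPU %d\n", dev);
    return nullptr;
}

int main() {
    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);
    std::vector<pthread_t> threads(num_gpus);
    std::vector<int> ids(num_gpus);
    for (int i = 0; i < num_gpus; ++i) {
        ids[i] = i;
        pthread_create(&threads[i], nullptr, worker, &ids[i]);
    }
    for (int i = 0; i < num_gpus; ++i)
        pthread_join(threads[i], nullptr);
    return 0;
}
```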
Sep, 19

Improving SIMD efficiency for parallel Monte Carlo light transport on the GPU

Monte Carlo light transport algorithms such as Path Tracing (PT), Bidirectional Path Tracing (BDPT) and Metropolis Light Transport (MLT) make use of random walks to sample light transport paths. When parallelizing these algorithms on the GPU, the stochastic termination of random walks results in an uneven workload between samples, which reduces SIMD efficiency. In this […]
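The problem, and one widely used remedy, path regeneration (not necessarily the technique proposed in this paper), can be sketched as follows: when a thread's random walk terminates early, it fetches a fresh sample from a global counter instead of idling while the rest of its warp keeps tracing:

```cuda
#include <curand_kernel.h>

// 'next_sample' is a device-side counter initialized to 0 by the host.
// Without regeneration, a thread whose walk terminates would sit idle until
// every other thread in its warp finished, wasting SIMD lanes.
__global__ void trace_paths(unsigned long long seed, int total_samples,
                            int* next_sample) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState rng;
    curand_init(seed, tid, 0, &rng);

    int sample = atomicAdd(next_sample, 1);
    while (sample < total_samples) {
        // ... extend this sample's random walk by one bounce ...
        bool terminated = curand_uniform(&rng) < 0.2f;   // stand-in for Russian roulette
        if (terminated)
            sample = atomicAdd(next_sample, 1);          // regenerate: start a new path
    }
}
```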
Sep, 19

Randomized selection on the GPU

We implement a fast, memory-sparing probabilistic top-k selection algorithm on the GPU. The algorithm proceeds via an iterative probabilistic guess-and-check process on pivots for a three-way partition. When the guess is correct, the problem is reduced to selection on a much smaller set. This probabilistic algorithm always gives a correct result and […]
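A hedged sketch of the guess-and-check step (not the authors' implementation), written with Thrust: given two pivot guesses lo and hi, which the real algorithm would draw from a random sample of the data, a three-way partition by counting either confirms that the k-th element lies in the small middle bucket or signals that the pivots must be re-guessed:

```cuda
#include <thrust/device_vector.h>
#include <thrust/count.h>
#include <thrust/copy.h>
#include <thrust/sort.h>

// Predicates for the three-way partition around the guessed pivots.
struct less_than {
    float v;
    __host__ __device__ bool operator()(float x) const { return x < v; }
};
struct in_range {
    float lo, hi;
    __host__ __device__ bool operator()(float x) const { return x >= lo && x <= hi; }
};

// Returns the k-th smallest element (0-based). If the pivot guess brackets
// rank k, only the small middle bucket is sorted; otherwise this sketch
// falls back to a full sort, where the real algorithm would re-guess pivots.
float select_kth(thrust::device_vector<float>& data, int k, float lo, float hi) {
    int below  = static_cast<int>(thrust::count_if(data.begin(), data.end(), less_than{lo}));
    int within = static_cast<int>(thrust::count_if(data.begin(), data.end(), in_range{lo, hi}));

    if (k >= below && k < below + within) {                  // guess was correct
        thrust::device_vector<float> mid(within);
        thrust::copy_if(data.begin(), data.end(), mid.begin(), in_range{lo, hi});
        thrust::sort(mid.begin(), mid.end());
        return mid[k - below];
    }
    thrust::sort(data.begin(), data.end());                  // fallback only; re-guess in practice
    return data[k];
}
```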

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
