
Posts

Sep, 20

CuMAPz: a tool to analyze memory access patterns in CUDA

The CUDA programming model provides a simple interface for programming GPUs, but tuning GPGPU applications for high performance remains quite challenging. Programmers must consider several architectural details, and small changes in source code, especially in memory access patterns, affect performance significantly. This paper presents CuMAPz, a tool to compare the memory performance of […]
Sep, 20

EFFEX: an embedded processor for computer vision based feature extraction

The deployment of computer vision algorithms in mobile applications is growing at a rapid pace. A primary component of the computer vision software pipeline is feature extraction, which identifies and encodes relevant image features. We present an embedded heterogeneous multicore design named EFFEX that incorporates novel functional units and memory architecture support, making it capable […]
Sep, 20

Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework

Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still at an initial stage, and the challenge of making the GPU a true shared resource […]
Sep, 20

Acceleration of genetic algorithms for sudoku solution on many-core processors

In this paper, we use the problem of solving Sudoku puzzles to demonstrate the possibility of achieving practical processing time through the use of many-core processors for parallel processing in the application of genetic computation. To increase accuracy, we propose a genetic operation that takes building-block linkage into account. As a parallel processing model for […]
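The individual evaluation step that such genetic approaches parallelize is a fitness function over candidate grids. As a hedged sketch of one common choice (counting constraint violations; zero means solved), not the authors' implementation:

```python
# Sketch: a typical fitness measure for GA-based Sudoku solvers --
# the number of duplicate symbols across rows, columns, and 3x3 boxes.
# Illustrative only; the paper's genetic operation and parallel model differ.

def sudoku_conflicts(grid):
    """grid: 9x9 list of lists with values 1..9. Returns conflict count (0 = valid)."""
    def dups(cells):
        return len(cells) - len(set(cells))

    score = 0
    for i in range(9):
        score += dups([grid[i][j] for j in range(9)])  # row i
        score += dups([grid[j][i] for j in range(9)])  # column i
    for bi in range(0, 9, 3):                          # 3x3 boxes
        for bj in range(0, 9, 3):
            box = [grid[bi + r][bj + c] for r in range(3) for c in range(3)]
            score += dups(box)
    return score
```

Evaluating this function independently for every individual in a population is an embarrassingly parallel workload, which is why many-core processors suit it well.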
Sep, 19

Debugging CUDA

During six months of intensive nVidia CUDA C programming, many bugs were created. We pass on the software engineering lessons learnt, particularly those relevant to parallel general-purpose computation on graphics hardware (GPGPU).
Sep, 19

Parallel divide-and-evolve: experiments with OpenMP on a multicore machine

Multicore machines are becoming a standard way to speed up system performance. After instantiating the evolutionary metaheuristic DAEX with the forward-search YAHSP planner, we investigate the global parallelism approach, which exploits the intrinsic parallelism of individual evaluation. This paper describes a parallel shared-memory version of the DAEYAHSP planning system using […]
Sep, 19

DAMS: distributed adaptive metaheuristic selection

We present a distributed algorithm, Select Best and Mutate (SBM), in the Distributed Adaptive Metaheuristic Selection (DAMS) framework. DAMS is dedicated to adaptive optimization in distributed environments. Given a set of metaheuristics, the goal of DAMS is to coordinate their local execution on distributed nodes in order to optimize the global performance of the distributed […]
Sep, 19

ACO with tabu search on a GPU for solving QAPs using move-cost adjusted thread assignment

This paper proposes a parallel ant colony optimization (ACO) for solving quadratic assignment problems (QAPs) on a graphics processing unit (GPU) by combining tabu search (TS) with ACO in CUDA (compute unified device architecture). In TS for QAP, all neighbor moves are tested. These moves form two groups based on the computation of the move cost. In one […]
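The "move cost" mentioned above is the change in the QAP objective caused by swapping two assignments, which tabu search evaluates for every neighbor move. As a hedged illustration (not the authors' GPU code), the following sketch computes this delta in O(n) per move using the standard incremental formula, checked against a full re-evaluation:

```python
# Sketch: QAP objective and O(n) swap move cost, as used in tabu search
# neighborhoods. Illustrative only -- not the paper's CUDA implementation.
# f: flow matrix, d: distance matrix, p: permutation (facility -> location).

def qap_cost(f, d, p):
    """Full QAP objective: sum over i, j of f[i][j] * d[p[i]][p[j]]."""
    n = len(p)
    return sum(f[i][j] * d[p[i]][p[j]] for i in range(n) for j in range(n))

def swap_move_cost(f, d, p, r, s):
    """Change in qap_cost if p[r] and p[s] are swapped, in O(n) time."""
    pr, ps = p[r], p[s]
    delta = (f[r][r] * (d[ps][ps] - d[pr][pr])
             + f[r][s] * (d[ps][pr] - d[pr][ps])
             + f[s][r] * (d[pr][ps] - d[ps][pr])
             + f[s][s] * (d[pr][pr] - d[ps][ps]))
    for k in range(len(p)):
        if k in (r, s):
            continue
        pk = p[k]
        delta += (f[k][r] * (d[pk][ps] - d[pk][pr])
                  + f[k][s] * (d[pk][pr] - d[pk][ps])
                  + f[r][k] * (d[ps][pk] - d[pr][pk])
                  + f[s][k] * (d[pr][pk] - d[ps][pk]))
    return delta
```

Because some deltas can be updated even faster than O(n) after a move while others cannot, implementations often split moves into groups by evaluation cost, which is the kind of distinction the paper exploits for thread assignment.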
Sep, 19

Generalisation in genetic programming

Genetic programming can evolve large general solutions using a tiny fraction of possible fitness test sets. Just one test may be enough.
Sep, 19

A training roadmap for new HPC users

Many new users of TeraGrid or other HPC resources are scientists or other domain experts by training and are not necessarily familiar with core principles, practices, and resources within the HPC community. As a result, they often make inefficient use of their own time and effort, as well as of the computing resources. In this […]
Sep, 19

A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs

Sparse Matrix-Vector Multiplication (SpMV) is very common in scientific computing. The Graphics Processing Unit (GPU) has recently emerged as a high-performance computing platform thanks to its massive processing capability. This paper presents an innovative performance-model-driven approach for partitioning a sparse matrix into appropriate formats, and for auto-tuning configurations of CUDA kernels to improve the performance of […]
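For readers unfamiliar with the operation being tuned: SpMV computes y = A·x for a sparse matrix A, commonly stored in a compressed format such as CSR. A minimal CPU-side reference sketch (illustrating the computation only, not the paper's GPU kernels or partitioning):

```python
# Sketch: sparse matrix-vector multiplication over CSR storage.
# CSR keeps only nonzeros: row_ptr delimits each row's slice of
# col_idx (column indices) and vals (values).

def spmv_csr(row_ptr, col_idx, vals, x):
    """Return y = A @ x for a CSR matrix A."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form:
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
vals = [1.0, 2.0, 3.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```

On GPUs, the choice of storage format and the mapping of rows to threads dominate SpMV performance, which motivates format partitioning and kernel auto-tuning.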
Sep, 19

Computing without processors

Heterogeneous systems allow us to target our programming to the appropriate environment. From the programmer’s perspective, the distinction between hardware and software is being blurred. As programmers struggle to meet the performance requirements of today’s systems, they will face an ever-increasing need to exploit alternative computing elements such as GPUs (graphics processing units), which […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: