high performance computing on graphics processing units: hgpu.org

Posts

Jan, 23

A study of parallel evolution strategy: pattern search on a GPU computing platform

This paper presents a massively parallel Evolution Strategy – Pattern Search Optimization (ES-PS) algorithm with graphics hardware acceleration on bound constrained nonlinear continuous optimization functions. The algorithm is specifically designed for a graphics processing unit (GPU) hardware platform featuring ‘Single Instruction – Multiple Thread’ (SIMT). GPU computing is an emerging desktop parallel computing platform. The […]

CUDA

Jan, 23

Multi-walk Parallel Pattern Search Approach on a GPU Computing Platform

This paper studies the efficiency of using Pattern Search (PS) on bound constrained optimization functions on a Graphics Processing Unit (GPU) computing platform. Pattern Search is a direct search optimization technique that does not require derivative information on non-linear programming problems. Pattern Search is ideally suited to a GPU computing environment due to its low […]

CUDA
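
As a rough illustration of the derivative-free idea described in the abstract above, here is a minimal serial sketch of a coordinate (compass) pattern search on a bound-constrained function. It is not the paper's multi-walk GPU implementation; the function name `pattern_search`, its parameters, and the step-halving rule are assumptions chosen for the example.

```python
def pattern_search(f, x0, lower, upper, step=1.0, tol=1e-6, max_iter=10_000):
    """Compass-style pattern search (illustrative sketch, not the paper's code).

    Only function values are used; no derivatives are required.
    """
    # Clamp the starting point into the bounds.
    x = [min(max(v, lo), hi) for v, lo, hi in zip(x0, lower, upper)]
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        # Poll each coordinate direction, +step and -step.
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                # Keep the trial point inside the bounds.
                trial[i] = min(max(x[i] + direction * step, lower[i]), upper[i])
                ft = f(trial)
                if ft < fx:
                    x, fx = trial, ft
                    improved = True
        # No poll point improved: shrink the pattern.
        if not improved:
            step *= 0.5
    return x, fx


if __name__ == "__main__":
    # Hypothetical test function with its minimum at (1, -2).
    best, value = pattern_search(
        lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
        x0=[4.0, 4.0], lower=[-5.0, -5.0], upper=[5.0, 5.0],
    )
    print(best, value)
```

Each poll point is evaluated independently of the others, which is one reason a method of this kind can map onto many GPU threads; a multi-walk variant would run several such searches from different starting points at once.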

Jan, 23

Deployment of CPU and GPU-based genetic programming on heterogeneous devices

A widely available and economic means of increasing the computing power applied to a problem is to use modern graphics processing units (GPUs) for parallel processing. We present a new, optimized general methodology for deploying genetic programming (GP) to the PC, Xbox 360 video game console, and Zune portable media device. This work describes, for […]

Jan, 23

Implementation of Parallel Genetic Algorithms on Graphics Processing Units

In this paper, we propose to parallelize a Hybrid Genetic Algorithm (HGA) on Graphics Processing Units (GPUs) which are available and installed on ubiquitous personal computers. HGA extends the classical genetic algorithm by incorporating the Cauchy mutation operator from evolutionary programming. In our parallel HGA, all steps except the random number generation procedure are performed […]
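
For readers unfamiliar with Cauchy mutation, the sketch below shows one common form of it: each gene is perturbed by a scaled draw from a standard Cauchy distribution, whose heavy tails occasionally produce long jumps. This is an assumption-laden illustration, not the operator as implemented in the paper; `cauchy_mutate`, the `scale` parameter, and the clamping to bounds are hypothetical choices.

```python
import math
import random


def cauchy_mutate(individual, scale, lower, upper, rng):
    """Perturb each gene with scaled standard Cauchy noise (illustrative sketch)."""
    mutated = []
    for gene, lo, hi in zip(individual, lower, upper):
        # Inverse-CDF sampling of a standard Cauchy variate from u in [0, 1).
        u = rng.random()
        step = scale * math.tan(math.pi * (u - 0.5))
        # Clamp so the child stays inside the search bounds (an assumption).
        mutated.append(min(max(gene + step, lo), hi))
    return mutated


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible demo
    parent = [0.0, 0.0, 0.0]
    child = cauchy_mutate(parent, 0.1, [-1.0] * 3, [1.0] * 3, rng)
    print(child)
```

The abstract notes that random number generation is the one step not performed on the GPU in the authors' parallel HGA; in a sketch like this, that would correspond to producing the `u` values on the host and passing them in.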

Jan, 23

High performance genetic programming on GPU

The availability of low cost powerful parallel graphics cards has stimulated the port of Genetic Programming (GP) on Graphics Processing Units (GPUs). Our work focuses on the possibilities offered by Nvidia G80 GPUs when programmed in the CUDA language. We compare two parallelization schemes that evaluate several GP programs in parallel. We show that the […]

CUDA

Jan, 23

An Improved MAGMA GEMM for Fermi Graphics Processing Units

We present an improved matrix-matrix multiplication routine (General Matrix Multiply [GEMM]) in the MAGMA BLAS library that targets the NVIDIA Fermi graphics processing units (GPUs) using Compute Unified Data Architecture (CUDA). We show how to modify the previous MAGMA GEMM kernels in order to make a more efficient use of the Fermi’s new architectural features, […]

CUDA

Jan, 22

Implementing molecular dynamics on hybrid high performance computers – short range forces

The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In […]

CUDA

Jan, 22

Swan: A tool for porting CUDA programs to OpenCL

The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-independence […]

CUDA • OpenCL

Jan, 22

A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems

For the last 30 years, several dynamic memory managers (DMMs) have been proposed. Such DMMs include first fit, best fit, segregated fit and buddy systems. Since the performance, memory usage and energy consumption of each DMM differs, software engineers often face difficult choices in selecting the most suitable approach for their applications. This issue has […]

Jan, 22

Porting estimation of distribution algorithms to the cell broadband engine

Current consumer-grade computers and game devices incorporate very powerful processors that can be used to accelerate many classes of scientific codes. In this paper we explore the ability of the Cell Broadband Engine to run two similar Estimation of Distribution Algorithms, one for the discrete domain and the other for the continuous domain. Starting from […]

Jan, 22

Evolving Soft Robotic Locomotion in PhysX

Given the complexity of the problem, genetic algorithms are one of the more promising methods of discovering control schemes for soft robotics. Since physically embodied evolution is time consuming and expensive, an outstanding challenge lies in developing fast and suitably realistic simulations in which to evolve soft robot gaits. We describe two parallel methods of […]

CUDA

Jan, 22

Evaluating the cell broadband engine as a platform to run estimation of distribution algorithms

Current consumer-grade computers and game devices incorporate very powerful processors that can be used to accelerate many classes of scientific codes. However, programming multi-core chips, hybrid multi-processors or graphical processing units is not an easy task for those programmers that deal mainly with sequential codes. In this paper, we explore the ability of the Cell […]
