high performance computing on graphics processing units: hgpu.org

Posts

Jan, 22

Implementing molecular dynamics on hybrid high performance computers – short range forces

The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In […]

CUDA

Jan, 22

Swan: A tool for porting CUDA programs to OpenCL

The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-independence […]

CUDA • OpenCL

Jan, 22

A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems

For the last 30 years, several dynamic memory managers (DMMs) have been proposed. Such DMMs include first fit, best fit, segregated fit and buddy systems. Since the performance, memory usage and energy consumption of each DMM differs, software engineers often face difficult choices in selecting the most suitable approach for their applications. This issue has […]

Jan, 22

Porting estimation of distribution algorithms to the cell broadband engine

Current consumer-grade computers and game devices incorporate very powerful processors that can be used to accelerate many classes of scientific codes. In this paper we explore the ability of the Cell Broadband Engine to run two similar Estimation of Distribution Algorithms, one for the discrete domain and the other for the continuous domain. Starting from […]

Jan, 22

Evolving Soft Robotic Locomotion in PhysX

Given the complexity of the problem, genetic algorithms are one of the more promising methods of discovering control schemes for soft robotics. Since physically embodied evolution is time consuming and expensive, an outstanding challenge lies in developing fast and suitably realistic simulations in which to evolve soft robot gaits. We describe two parallel methods of […]

CUDA

Jan, 22

Evaluating the cell broadband engine as a platform to run estimation of distribution algorithms

Current consumer-grade computers and game devices incorporate very powerful processors that can be used to accelerate many classes of scientific codes. However, programming multi-core chips, hybrid multi-processors or graphical processing units is not an easy task for those programmers that deal mainly with sequential codes. In this paper, we explore the ability of the Cell […]

Jan, 22

Strategies to minimise the total run time of cyclic graph based genetic programming with GPUs

In this paper, we describe our work to investigate how much cyclic graph based Genetic Programming (GP) can be accelerated on one machine using currently available mid-range Graphics Processing Units (GPUs). Cyclic graphs pose different problems for evaluation than do trees and we describe how our CUDA based, “population parallel” evaluator tackles these problems. Previous […]

CUDA

Jan, 22

Distributed genetic programming on GPUs using CUDA

Using a cluster of Graphics Processing Unit (GPU) equipped computers, it is possible to accelerate the evaluation of individuals in Genetic Programming. Program compilation, fitness case data and fitness execution are spread over the cluster of computers, allowing for the efficient processing of very large datasets. Here, the implementation is demonstrated on datasets containing […]

CUDA

Jan, 22

Improving SMT performance: an application of genetic algorithms to configure resizable caches

Simultaneous Multithreading (SMT) is a technology aimed at improving the throughput of the processor core by applying Instruction Level Parallelism (ILP) and Thread Level Parallelism (TLP). Nevertheless, a good control strategy is required when resources are shared among different threads, so that throughput is optimized. We study the application of evolutionary algorithms to improve the […]

Jan, 22

Accelerating evolutionary computation with graphics processing units

Graphics Processing Units (GPUs) have become a major source of computational power for numerical applications. Originally designed for application of time-consuming graphics operations, GPUs are stream processors that implement the SIMD paradigm. Modern programming tools allow developers to access the parallelism of the GPU in a flexible and convenient way, hiding many low level details […]

Jan, 22

Parallel genetic algorithm on the CUDA architecture

This paper deals with the mapping of the parallel island-based genetic algorithm with unidirectional ring migrations to the nVidia CUDA software model. The proposed mapping is tested using Rosenbrock’s, Griewank’s and Michalewicz’s benchmark functions. The obtained results indicate that our approach leads to speedups of up to seven thousand times compared to one CPU thread while […]

CUDA

Jan, 22

Parallel Genetic Algorithm Solving 0/1 Knapsack Problem Running on the GPU

In this work, we show that a consumer-level $100 GPU can be used to significantly speed up optimization of the 0/1 Knapsack problem. We identify strong and weak points of the GPU architecture and propose our parallel genetic algorithm model implemented in CUDA running entirely on the GPU. We show that the GPU must be utilized for a sufficiently long time […]

CUDA
