high performance computing on graphics processing units: hgpu.org

Posts

Jun, 21

libCudaOptimize: an Open Source Library of GPU-based Metaheuristics

Evolutionary Computation techniques and other metaheuristics have been increasingly used in recent years for solving many real-world tasks that can be formulated as optimization problems. Among their numerous strengths, a major one is their natural predisposition to parallelization. In this paper, we introduce libCudaOptimize, an open source library which implements some metaheuristics for continuous […]

CUDA

Jun, 21

CFMDS: CUDA-based fast multidimensional scaling for genome-scale data

BACKGROUND: Multidimensional scaling (MDS) is a widely used approach to dimensionality reduction. It has been applied to feature selection and visualization in various areas. Among diverse MDS methods, the classical MDS is a simple and theoretically sound solution for projecting data objects onto a low dimensional space while preserving the original distances among them as […]

CUDA

Jun, 21

Artificial Neural Network Simulation on CUDA

The advent of low-cost GPU hardware and user-friendly parallel programming APIs, such as NVIDIA CUDA, means that affordable, programmable, high-performance computing environments are now attainable for the development of scientific simulations. In this paper the authors present the MineHunter program, a parallel simulation of neural networks on NVIDIA CUDA. The simulation consists […]

CUDA

Jun, 21

On the Effect of Using Multiple GPUs in Solving QAPs with CUDA

In this paper, we implement ACO algorithms on a PC which has 4 GTX 480 GPUs. We implement two types of ACO models: the island model and the master/slave model. When we compare the two, the island model shows promising speedup values on class (iv) QAP instances. […]

CUDA

Jun, 21

Continuous Representation of Projected Attribute Spaces of Multifields over Any Spatial Sampling

For the visual analysis of multidimensional data, dimension reduction methods are commonly used to project to a lower-dimensional visual space. In the context of multifields, i.e., volume data with a multidimensional attribute space, the spatial arrangement of the samples in the volumetric domain can be exploited to generate a Continuous Representation of the Projected Attribute […]

CUDA

Jun, 19

Parallel Algorithms for Hybrid Multi-core CPU-GPU Implementations of Component Labelling in Critical Phase Models

Optimising the use of all the cores of a hybrid multi-core CPU and its accelerating GPUs is becoming increasingly important as such combined systems become widely available. We show how a complex interplay of cross-calling kernels and host components can be used to support good throughput performance on hybrid simulation tasks that have inherently serial […]

CUDA

Jun, 19

Deep learning with COTS HPC systems

Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and […]

CUDA

Jun, 19

Megakernels Considered Harmful: Wavefront Path Tracing on GPUs

When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to the SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage that lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory […]

CUDA

Jun, 19

Real-Time Geometry Decompression on Graphics Hardware

Real-Time Computer Graphics focuses on generating images fast enough to cause the illusion of a continuous motion. It is used in science, engineering, computer games, image processing, and design. Special purpose graphics hardware, a so-called graphics processing unit (GPU), accelerates the image generation process substantially. Therefore, GPUs have become indispensable tools for Real-Time Computer Graphics. […]

CUDA

Jun, 19

Parallel Asynchronous Modelization and Execution of Cholesky Algorithm using Petri Nets

Parallelization of algorithms with hard data dependencies suffers from a lack of task synchronization. Synchronous parallel versions are simple to model and program, but inefficient in terms of scalability and processor utilization. The same problem arises for asynchronous versions with elemental static task scheduling. Efficient asynchronous algorithms implement out-of-order execution and are complex […]

CUDA

Jun, 18

GPU Matrix Multiplication

Graphics Processing Units (GPUs) were developed originally to meet the computational needs of algorithms for rendering computer graphics. The rapid and enormous growth in sophistication of graphics applications such as computer games has resulted in the availability of GPUs that have hundreds of processors and peak performance near a teraflop and that sell for hundreds […]

CUDA

Jun, 18

Sorting On A Graphics Processing Unit (GPU)

One of the very first GPU sorting algorithms, an adaptation of bitonic sort, was developed by Govindraju et al. [12]. Since this algorithm was developed before the advent of CUDA, the algorithm was implemented using GPU pixel shaders. Zachmann et al. [13] improved on this sort algorithm by using Bitonic Trees to reduce the number […]

CUDA
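
For readers unfamiliar with the bitonic sort mentioned in the last abstract, the following is a minimal CPU-side sketch of a bitonic sorting network in Python. It is not the pixel-shader implementation or the Bitonic Trees variant cited there; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for illustration. The inner `for` loop marks the work that a GPU version would typically hand to one thread per element.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic sorting network.

    Illustrative sketch only: the input length must be a power of two
    (or zero), which keeps the compare-exchange pattern regular.
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    k = 2
    while k <= n:                # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:            # distance between compared elements
            # Each index i is independent within this pass, so a GPU
            # version could assign one thread per index.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

The comparison pattern depends only on the indices and never on the data, which is why this family of algorithms maps well onto data-parallel hardware.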

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
