high performance computing on graphics processing units: hgpu.org

Posts

Apr, 21

Genetic Algorithm Modeling with GPU Parallel Computing Technology

We present a multi-purpose genetic algorithm, designed and implemented with GPGPU / CUDA parallel computing technology. The model was derived from a multi-core CPU serial implementation, named GAME, which had already been successfully tested and scientifically validated on massive astrophysical data classification problems through a web application resource (DAMEWARE) specialized in data mining based on Machine Learning paradigms. […]

CUDA

Apr, 21

Graphical Processing Units (GPU)-based modeling for Acoustic and Ultrasonic NDE

With rapidly improving computational power, numerical models are being developed for ever more complex problems that cannot be solved analytically, making them more and more computationally intensive. Parallel computing has emerged as an important paradigm to speed up the processing of such models. In recent years, graphics processing units (GPUs) have been among the massively parallel […]

CUDA

Apr, 19

24th International Conference on Parallel Computational Fluid Dynamics, ParCFD2012

Parallel Computational Fluid Dynamics (ParCFD) Conference 2012 is the 24th in a series of annual international meetings held since 1989 and dedicated to the discussion of the most recent developments and applications of parallel computing in the field of CFD and related disciplines. ParCFD conferences are truly multi-cultural and international, attracting many researchers across the globe with diverse technical […]

Apr, 19

Performance Exploration of Selected Manually and Automatically Parallelized Codes on GPUs

General-Purpose computing on GPUs (GPGPU) provides the opportunity to utilize the tremendous computational power of graphics accelerators for a wider set of problems. These devices leverage massive parallelism to achieve high performance; however, creating highly parallelized code that is optimized for the characteristics of GPUs is no simple task. The polyhedron model is used successfully […]

CUDA

Apr, 19

GPU-Accelerated Numerical Simulations of the Knudsen Gas on Time-Dependent Domains

We consider the long-time behaviour of a free-molecular gas in a time-dependent vessel with absorbing boundary, in any space dimension. We first show, at the theoretical level, that the convergence towards equilibrium heavily depends on the initial data and on the time evolution law of the vessel. Subsequently, we describe a numerical strategy to simulate […]

CUDA

Apr, 19

Algorithm Construction for GPGPU

Today every personal computer and almost every work-related computer has a GPU powerful enough to be used as a supplementary computational device. One framework that enables this kind of use is OpenCL. We asked how one writes efficient algorithms for these GPGPU devices. We found that there are two major ways to run […]

OpenCL

Apr, 19

Multi-level Parallelism for Time- and Cost-efficient Parallel Discrete Event Simulation on GPUs

Developing complex technical systems requires a systematic exploration of the given design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires an extensive number of simulation runs. This in turn results in excessive runtime demands which severely hamper thorough design […]

CUDA

Apr, 19

GPU computing in medical physics: A review

The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization […]

CUDA

Apr, 18

Auto-tuning Dense Vector and Matrix-Vector Operations for Fermi GPUs

In this paper, we consider the automatic performance tuning of dense vector and matrix-vector operations on GPUs. Such operations form the backbone of level 1 and level 2 routines in the Basic Linear Algebra Subprograms (BLAS) library and are therefore of great importance in many scientific applications. As examples, we develop single-precision CUDA kernels for […]

CUDA

Apr, 18

Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study

In this paper, we analyze the trade-offs encountered when minimizing the total execution time of rake-based applications on GPUs. We use clustering data streams as a case study and present a rake-based implementation for it that is more efficient in terms of memory usage. In order to maximize performance for different problem sizes and […]

CUDA • OpenCL

Apr, 18

MetaCL – A Model-Based Approach to Programming Heterogeneous Architectures Using OpenCL

With demand for high-performance computing at an all-time high, especially from the scientific/numerical analysis community, leveraging the power of existing heterogeneous architectures has become increasingly desirable. The attempt to use GPUs for non-graphics computations has bred programming models and innovative architectures that have trended towards a general-purpose computing platform. The latest generation of programming tools […]

OpenCL

Apr, 18

OpenCL vs. OpenMP: A Programmability Debate

OpenCL and OpenMP are the most commonly used programming models for homogeneous multi-core processors. They are also fundamentally different in their approach to parallelization, in terms of granularity level, explicit/implicit constructs, and usability. In this paper, we compare these two models in terms of programmability, with a special focus on performance and productivity. For our […]

OpenCL

