high performance computing on graphics processing units: hgpu.org

Posts

Dec, 1

Performance Analysis of a High-level Abstractions-based Hydrocode on Future Computing Systems

In this paper we present research on applying a domain specific high-level abstractions (HLA) development strategy with the aim to "future-proof" a key class of high performance computing (HPC) applications that simulate hydrodynamics computations at AWE plc. We build on an existing high-level abstraction framework, OPS, that is being developed for the solution of multi-block […]

CUDA • OpenCL

Dec, 1

pyMIC: A Python Offload Module for the Intel Xeon Phi Coprocessor

Python has gained a lot of attention from the high performance computing community as an easy-to-use, elegant scripting language for rapid prototyping and development of flexible software. At the same time, there is an ever-growing need for more compute power to satisfy the demand for higher-accuracy simulation or more detailed modeling. The Intel Xeon […]

Nov, 29

A Framework for Composing High-Performance OpenCL from Python Descriptions

Parallel processors have become ubiquitous; most programmers today have access to parallel hardware such as multi-core processors and graphics processors. This has created an implementation gap, where efficiency programmers with knowledge of hardware details can attain high performance by exploiting parallel hardware, while productivity programmers with application-level knowledge may not understand low-level performance trade-offs. Ideally, […]

OpenCL

Nov, 29

Code Generation Compiler for the OpenMP 4.0 Accelerator Model onto OMPSS

OpenMP, a well-known shared-memory programming API, aims to make shared-memory multiprocessor programming easy through pragma directives. Until now, its interface has consisted of task- and iteration-level parallelism for general-purpose CPUs. However, OpenMP includes the accelerator model in its latest 4.0 specification. OmpSs is an OpenMP extended […]

CUDA • OpenCL

Nov, 29

A CUDA implementation of the High Performance Conjugate Gradient benchmark

The High Performance Conjugate Gradient (HPCG) benchmark has been recently proposed as a complement to the High Performance Linpack (HPL) benchmark currently used to rank supercomputers in the Top500 list. This new benchmark solves a large sparse linear system using a multigrid preconditioned conjugate gradient (PCG) algorithm. The PCG algorithm contains the computational and communication […]

CUDA

Nov, 29

Runtime Comparison of CPU and GPU Using Portable Programming Models

Since increasing clock speeds is no longer enough to speed up computation, several alternatives exist. One of them is parallelism. For some problems it is possible to use the graphics processor as a massively parallel system and gain high speedups. Since NVIDIA introduced the unified device architecture and AMD switched to the OpenCL programming […]

CUDA • OpenCL

Nov, 29

Parallel kNN on GPU Architecture Using OpenCL

In data mining applications, one of the useful algorithms for classification is the kNN algorithm. The kNN search has a wide usage in many research and industrial domains like 3-dimensional object rendering, content-based image retrieval, statistics, biology (gene classification), etc. In spite of some improvements in the last decades, the computation time required by the […]

OpenCL

Nov, 25

4th International Conference on Software and Computer Applications, ICSCA 2015

Submission Deadline: 2015-04-10. Topics: Software Engineering; AI and Knowledge based software engineering; Artificial Intelligence; Aspect-orientation and feature interaction; Business Process Reengineering & Science; Communication Systems and Networks; Component-Based Software Engineering; Computer & Software Engineering; Computer Animation and Design Contents; Computer Game Development, User Modeling and Management; Computer supported cooperative work; Cost Modeling and Analysis; Data […]

Nov, 25

Improving GPU Performance by Regrouping CPU-Memory Data

High-performance computing is essential for fast, effective analysis of large complex systems. The NVIDIA Compute Unified Device Architecture (CUDA)-assisted central processing unit (CPU) / graphics processing unit (GPU) computing platform has proven its potential for high-performance computing. In CPU/GPU computing, original data and instructions are copied from CPU main memory to […]

CUDA

Nov, 25

A Self-Optimizing Framework for Developing Metrology Software on Massive Parallel Processor Architectures

Standard PC hardware is rapidly gaining parallel computing power in the form of multicore CPUs and general-purpose GPUs. Taking advantage of this requires specialized code, which is very time-consuming and therefore expensive to write. One approach to solving this problem is the OpenCL (Open Computing Language) standard. […]

OpenCL

Nov, 25

Anisotropic interfacial tension, contact angles, and line tensions: A graphics-processing-unit-based Monte Carlo study of the Ising model

As a generic example for crystals where the crystal-fluid interface tension depends on the orientation of the interface relative to the crystal lattice axes, the nearest neighbor Ising model on the simple cubic lattice is studied over a wide temperature range, both above and below the roughening transition temperature. Using a thin film geometry $L_x […]

CUDA • OpenCL

Nov, 25

Ageing at the Spin-Glass/Ferromagnet Transition: Monte Carlo Simulation using GPUs

We study the non-equilibrium ageing behaviour of the +/-J Edwards-Anderson model in three dimensions for samples of size up to N=128^3 and for up to 10^8 Monte Carlo sweeps. In particular we are interested in the change of the ageing when crossing from the spin-glass phase to the ferromagnetic phase. The necessary long simulation […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)