high performance computing on graphics processing units: hgpu.org

Posts

Aug, 18

RubiCL, a Library Providing Automatic Parallelisation on CPU and GPU devices

This project presents a library that automates the parallelisation of several higher-order functions originally provided by the Ruby standard library. The library distributes computation across many compute units, following an annotation specifying that the primitives operate solely on numerical data. RubiCL harnesses the OpenCL framework to allow execution on CPU or GPU devices. […]

OpenCL

Aug, 18

Optimizing OpenCL Local Work Group Size With Machine Learning

GPU architectures are becoming increasingly important due to their high number of processors. The single instruction, multiple data (SIMD) architecture has proven to work not just for the graphics domain, but also for many other disciplines. This is due to the potential performance that can be achieved by a consumer-level GPU being significantly higher than the […]

OpenCL

Aug, 14

A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels

Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than today’s systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular control flow and memory access patterns. However, the growing complexity, exposed memory hierarchy, incoherence, heterogeneity, and […]

CUDA

Aug, 14

MPC: A Massively Parallel Compression Algorithm for Scientific Data

Due to their high peak performance and energy efficiency, massively parallel accelerators such as GPUs are quickly spreading in high-performance computing, where large amounts of floating-point data are processed, transferred, and stored. Such environments can greatly benefit from data compression if done sufficiently quickly. Unfortunately, most conventional compression algorithms are unsuitable for highly parallel execution. […]

CUDA

Aug, 14

Bufferless NOC Simulation of Large Multicore System on GPU Hardware

Last-level cache management and the core interconnection network play important roles in the performance and power consumption of multicore systems. Large-scale chip multicores widely use mesh interconnects due to the scalability and simplicity of the mesh interconnection design. As the interconnection network occupies significant area and consumes a significant percentage of system power, the bufferless network is an appealing […]

CUDA

Aug, 14

Automatic classification of object code using machine learning

Recent research has repeatedly shown that machine learning techniques can be applied to either whole files or file fragments to classify them for analysis. We build upon these techniques to show that for samples of un-labeled compiled computer object code, one can apply the same type of analysis to classify important aspects of the code, […]

CUDA

Aug, 14

Processing Markov Logic Networks with GPUs

Graphics Processing Units (GPUs) are being widely used to improve the performance of machine learning and logic programming systems. We propose using GPUs to improve the performance of Markov logic programs. In this paper we focus on the first step of the inference phase, the grounding of first-order logical formulas composing a Markov network. […]

CUDA

Aug, 13

An Introduction to High Performance Computing on AWS

This paper describes a range of high performance computing (HPC) applications that are running today on Amazon Web Services (AWS). You will learn best practices for cloud deployment, for cluster and job management, and for the management of third-party software. This whitepaper covers HPC use cases that include highly distributed, highly parallel grid computing applications, […]

CUDA • OpenCL

Aug, 13

Perception of Acoustical Spatial Attributes and Impression in Virtually Rendered Sound Field

Computational power for simulating sound fields from three-dimensional numerical models has progressed rapidly, for example through GPU cluster systems. We can render the directivity, position, distance, and reverberation of sound sources in a practical time. Furthermore, a multichannel sound field system can be realized with low-cost digital-to-analog converter modules. Moreover, some researchers are trying to […]

CUDA

Aug, 13

Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration

Image restoration is a long-standing problem in low-level computer vision with many interesting applications. We describe a flexible learning framework to obtain simple but effective models for various image restoration problems. The proposed approach is based on the concept of nonlinear reaction diffusion, but we extend conventional nonlinear reaction diffusion models by highly parametrized linear […]

CUDA

Aug, 12

Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application

‘How can GPU acceleration be obtained as a service in a cluster?’ This question has become increasingly significant due to the inefficiency of installing GPUs on all nodes of a cluster. The research reported in this paper addresses this question by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS), […]

CUDA

Aug, 12

Accelerating IISPH: A Parallel GPGPU Solution Using CUDA

CONTEXT: Simulating realistic fluid behavior in incompressible fluids for computer graphics has been pioneered with the implicit incompressible smoothed particle hydrodynamics (IISPH) solver. The algorithm converges faster than other incompressible SPH-solvers, but real-time performance (in the perspective of video games, 30 frames per second) is still an issue when the particle count increases. OBJECTIVES: This […]

CUDA
