high performance computing on graphics processing units: hgpu.org

Posts

Aug, 22

Accelerating Haskell array codes with multicore GPUs

Current GPUs are massively parallel multicore processors optimised for workloads with a large degree of SIMD parallelism. Good performance requires highly idiomatic programs, whose development is labour-intensive and requires expert knowledge. To raise the level of abstraction, we propose a domain-specific high-level language of array computations that captures appropriate idioms in the form of […]

CUDA

Aug, 22

A programming model for GPU-based parallel computing with scalability and abstraction

In this paper, we present a multi-level programming model for recent GPU-based high performance computing systems. Involving cooperative stream threads and symmetric multiprocessing threads, our model provides a computational framework that scales through multi-GPU environments to GPU-cluster systems. Instead of hiding the execution environment from the programmer using compiler extensions or metaprogramming techniques we aim […]

CUDA

Aug, 21

Compiling Python to a hybrid execution environment

A new compilation framework enables the execution of numerically intensive applications, written in Python, on a hybrid execution environment formed by a CPU and a GPU. This compiler automatically computes the set of memory locations that need to be transferred to the GPU, and produces the correct mapping between the CPU and the GPU address spaces. […]

OpenCL
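
The abstract above says the compiler works out which memory locations must be copied to the GPU and how CPU addresses map to GPU addresses, but the excerpt does not describe its analysis. As a minimal sketch of the idea, assuming a single 1-D loop that reads `a[i + c]` for a few constant offsets `c`, the accessed region is one interval that can be copied as a block, with host index `h` stored at device index `h - first`. All function names here are hypothetical, and the "device" is simulated by a Python list.

```python
# Hypothetical sketch, not the paper's compiler: interval analysis for a
# 1-D loop `for i in range(lo, hi)` whose body reads a[i + c] for constant c.
from typing import Callable, Iterable, List, Sequence, Tuple


def transfer_range(lo: int, hi: int, offsets: Iterable[int]) -> Tuple[int, int]:
    """Inclusive host index range [first, last] read by the loop."""
    offs = list(offsets)
    if hi <= lo or not offs:
        raise ValueError("loop is empty or has no array accesses")
    first = lo + min(offs)       # smallest index touched (at i = lo)
    last = (hi - 1) + max(offs)  # largest index touched (at i = hi - 1)
    return first, last


def host_to_device(first: int) -> Callable[[int], int]:
    """The device buffer holds a[first..last], so host index h lives at h - first."""
    return lambda h: h - first


def run_stencil_on_copy(a: Sequence[float], lo: int, hi: int) -> List[float]:
    """Evaluate out[i] = a[i - 1] + a[i + 1] using only the transferred block."""
    first, last = transfer_range(lo, hi, (-1, 1))
    device = list(a[first:last + 1])  # simulated host-to-device copy
    to_dev = host_to_device(first)
    return [device[to_dev(i - 1)] + device[to_dev(i + 1)] for i in range(lo, hi)]
```

For `lo = 10`, `hi = 20` and offsets `(-1, 0, 1)`, `transfer_range` returns `(9, 20)`, so 12 elements are copied instead of the whole array. A real compiler must also handle multi-dimensional, non-affine, and write accesses, which this sketch ignores.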

Aug, 21

A declarative API for particle systems

Recent trends in computer-graphics APIs and hardware have made it practical to use high-level functional languages for real-time graphics applications. Thus we have the opportunity to develop new approaches to computer graphics that take advantage of the high-level features of functional languages. This paper describes one such project that uses the techniques of functional programming […]

OpenCL

Aug, 21

Software architecture and system validation of an open, unified model for accelerated multicore computing

For systems that use hardware accelerators to combine multicore and multiprocess technology with libraries and computational kernels, the main drawbacks are the complexity of the programming model and the effort needed to verify the software and validate the system's performance capabilities. In this paper, we describe a software approach to utilizing the compute power of the […]

OpenCL

Aug, 21

Mind the gap!: bridging the dichotomy of design and implementation

This paper presents a revamping of a sparse linear algebra design pattern, targeting parallelization within scientific and engineering applications. A proof of concept implementation is developed to compare actual software practices and optimizations with those described in the original design pattern. The case study reveals that the design pattern did not tightly coincide with the […]

OpenCL
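
The excerpt does not show the design pattern itself. As background only, a kernel that sparse linear algebra codes commonly parallelize is the sparse matrix-vector product. The sketch below is an assumption rather than the paper's code: it stores the matrix in compressed sparse row (CSR) form, where `row_ptr[r]` to `row_ptr[r + 1]` delimits row `r`'s nonzeros in `col_idx` and `values`.

```python
# Minimal sketch (not the paper's design pattern): y = A x with A in CSR form.
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Rows are independent, so on a GPU this outer loop is commonly mapped
    # to one thread (or one warp) per row.
    for r in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y[r] = acc
    return y
```

For A = [[4, 0, 1], [0, 2, 0], [3, 0, 5]] and x = [1, 2, 3], the CSR arrays are `row_ptr = [0, 2, 3, 5]`, `col_idx = [0, 2, 1, 0, 2]`, `values = [4, 1, 2, 3, 5]`, and the result is `[7.0, 4.0, 18.0]`.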

Aug, 21

EpiGPU

MOTIVATION: Hundreds of genome-wide association studies have been performed over the last decade, but as single nucleotide polymorphism (SNP) chip density has increased so has the computational burden to search for epistasis [for n SNPs, the computation time grows as O(n(n-1)/2)]. While the theoretical contribution of epistasis toward phenotypes of medical and economic importance is […]

OpenCL
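
The O(n(n-1)/2) term in the EpiGPU abstract is the number of unordered SNP pairs tested in an exhaustive pairwise scan. A minimal sketch, with hypothetical helper names, counts those pairs and maps a flat index `k` to its pair `(i, j)`, the kind of mapping a GPU kernel could use to give each work-item one pair; the excerpt does not say how EpiGPU actually assigns work.

```python
# Hypothetical helpers for an exhaustive pairwise (epistasis-style) scan.
from itertools import combinations
from typing import Iterator, Tuple


def n_pair_tests(n: int) -> int:
    """Number of unordered pairs (i, j), i < j, among n SNPs: n(n-1)/2."""
    return n * (n - 1) // 2


def snp_pairs(n: int) -> Iterator[Tuple[int, int]]:
    """Enumerate pairs in lexicographic order: (0,1), (0,2), ..., (n-2,n-1)."""
    return combinations(range(n), 2)


def pair_from_index(k: int, n: int) -> Tuple[int, int]:
    """Map flat index k in [0, n(n-1)/2) to the k-th pair in snp_pairs order.

    Row i contributes n - 1 - i pairs; walk the rows until k falls inside one.
    A linear walk is used for clarity; closed-form inversions also exist.
    """
    if not 0 <= k < n_pair_tests(n):
        raise IndexError("pair index out of range")
    i = 0
    while k >= n - 1 - i:
        k -= n - 1 - i
        i += 1
    return i, i + 1 + k
```

For a hypothetical 500,000-SNP chip, `n_pair_tests(500_000)` is 124,999,750,000, roughly 1.25e11 pairwise tests, which shows why the quadratic term dominates as chip density grows.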

Aug, 21

Visual Computing in Biology and Medicine: Interactive visual analysis of contrast-enhanced ultrasound data based on small neighborhood statistics

Contrast-enhanced ultrasound (CEUS) has recently become an important technology for lesion detection and characterization in cancer diagnosis. CEUS is used to investigate the perfusion kinetics in tissue over time, which relates to tissue vascularization. In this paper we present a pipeline that enables interactive visual exploration and semi-automatic segmentation and classification of CEUS data. For […]

OpenCL
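
The title refers to small neighborhood statistics, but the excerpt does not define which statistics the pipeline computes. As an assumed example, the sketch below computes the mean and standard deviation of intensities in a (2r+1) x (2r+1) window around each pixel of one frame, clipping the window at the image border; for a CEUS sequence it could be repeated for every time step. Each output pixel is computed independently, which is what makes this kind of per-pixel statistic a natural fit for a GPU kernel. The function name is hypothetical.

```python
# Assumed example of per-pixel "small neighborhood" statistics on one 2-D frame.
from math import sqrt
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def neighborhood_stats(frame: Sequence[Sequence[float]], r: int = 1) -> Tuple[Grid, Grid]:
    """Return (means, stds) over a (2r+1) x (2r+1) window, clipped at borders."""
    h, w = len(frame), len(frame[0])
    means: Grid = [[0.0] * w for _ in range(h)]
    stds: Grid = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the neighborhood; border pixels see a smaller window.
            vals = [frame[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            var = sum((v - m) ** 2 for v in vals) / len(vals)  # population variance
            means[y][x] = m
            stds[y][x] = sqrt(var)
    return means, stds
```

For a 3 x 3 frame that is zero except for a 9 in the centre, the centre mean is 1.0 (nine pixels) while the corner mean is 2.25, because the clipped corner window holds only four pixels.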

Aug, 21

Reducing data access latency in SDSM systems using runtime optimizations

Software Distributed Shared Memory (SDSM) systems offer a convenient way to run applications developed for shared memory systems on distributed systems with no changes to them. However, since SDSM systems add an extra layer of abstraction to the memory hierarchy, applications may suffer performance problems when running on top of them. Our main research interest […]

OpenCL

Aug, 21

A new method for GPU based irregular reductions and its application to k-means clustering

The k-means algorithm is a frequently used clustering technique. It consists of two steps: a map step, which is simple to execute on a GPU, and a reduce step, which is more problematic. Previous researchers have used a hybrid approach in which the map step is computed on the GPU […]

OpenCL
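
The abstract splits k-means into a map step and a reduce step, but the excerpt stops before describing the new reduction method. For orientation, a plain sequential iteration looks like the sketch below (an assumed baseline, not the paper's algorithm). The map step labels each point independently, while the reduce step is irregular: each point's contribution goes to slot `labels[i]`, which depends on the data rather than on the point's position. On a GPU those scattered updates typically require atomics, sorting, or per-thread private accumulators.

```python
# Baseline k-means iteration (sequential), split into the two steps named above.
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Map step: index of the nearest centroid for every point (independent per point)."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def update(points: Sequence[Point], labels: Sequence[int],
           centroids: Sequence[Point]) -> List[Point]:
    """Reduce step: irregular reduction keyed by data-dependent labels."""
    k, dim = len(centroids), len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    # Keep the old centroid if a cluster received no points.
    return [tuple(s / counts[c] for s in sums[c]) if counts[c] else tuple(centroids[c])
            for c in range(k)]
```

One iteration is `update(points, assign(points, centroids), centroids)`; repeating it until the labels stop changing gives the usual Lloyd-style k-means loop.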

Aug, 21

Multi- and many-core data mining with adaptive sparse grids

Extracting knowledge from vast datasets is a major challenge in today's data-driven applications. Sparse grids provide a numerical method for both classification and regression in data mining that scales only linearly in the number of data points and is thus well suited to huge amounts of data. Due to the recursive nature of sparse grid […]

OpenCL
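
The abstract credits sparse grids with cost linear in the number of data points; the other factor in that cost is the number of grid points. As background (the excerpt does not say which grid variant the paper uses), a regular sparse grid of level n in d dimensions without boundary points, with level 1 being the single midpoint, has sum over i = 0..n-1 of 2^i * C(d-1+i, d-1) points, compared with (2^n - 1)^d for the matching full grid. The helper names below are hypothetical.

```python
# Point counts for regular sparse grids (no boundary points) vs. full grids.
from math import comb


def sparse_grid_points(n: int, d: int) -> int:
    """Points in a regular level-n sparse grid in d dimensions (interior only)."""
    return sum(2 ** i * comb(d - 1 + i, d - 1) for i in range(n))


def full_grid_points(n: int, d: int) -> int:
    """Points in the matching full grid: 2^n - 1 per dimension."""
    return (2 ** n - 1) ** d
```

For n = 10 and d = 3 this gives 47,103 sparse grid points versus 1,070,599,167 full grid points.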

Aug, 21

Sponge: portable stream programming on graphics engines

Graphics processing units (GPUs) provide a low cost platform for accelerating high performance computations. The introduction of new programming languages, such as CUDA and OpenCL, makes GPU programming attractive to a wide variety of programmers. However, programming GPUs is still a cumbersome task for two primary reasons: tedious performance optimizations and lack of portability. First, […]

CUDA

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
