high performance computing on graphics processing units: hgpu.org

Posts

Aug, 1

Matrix Convolution using Parallel Programming

The convolution theorem is used to multiply matrices of two different sizes, i.e., matrices in which the number of columns in the first matrix is not equal to the number of rows in the second matrix. In this study, the multiplication of 3×3 and 4×4 matrices was done using MPI. A 3×3 matrix was taken […]

OpenCL
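The abstract is truncated. It does not say which convolution variant is used or how MPI splits the work. The sketch below is serial and illustrative only. It assumes the "full" 2D convolution, whose output for a 4×4 input and a 3×3 kernel is 6×6. It also shows that each output row can be computed independently, which is what makes a row-wise split across processes possible. The function names `conv2d_full` and `conv2d_rows` are hypothetical.

```python
# Serial sketch of full 2D convolution, assuming "full" output size
# (m + p - 1) x (n + q - 1); the abstract does not say which variant is used.


def conv2d_full(a, k):
    """Scatter form: each a[i][j] adds a scaled copy of k at offset (i, j)."""
    m, n = len(a), len(a[0])
    p, q = len(k), len(k[0])
    out = [[0] * (n + q - 1) for _ in range(m + p - 1)]
    for i in range(m):
        for j in range(n):
            for u in range(p):
                for v in range(q):
                    out[i + u][j + v] += a[i][j] * k[u][v]
    return out


def conv2d_rows(a, k, row_start, row_end):
    """Gather form for output rows [row_start, row_end).

    Each output row depends only on a and k, so disjoint row ranges can be
    computed independently, e.g. one range per MPI rank (hypothetical split).
    """
    m, n = len(a), len(a[0])
    p, q = len(k), len(k[0])
    rows = []
    for s in range(row_start, row_end):
        row = []
        for t in range(n + q - 1):
            acc = 0
            for u in range(p):
                i = s - u
                if 0 <= i < m:
                    for v in range(q):
                        j = t - v
                        if 0 <= j < n:
                            acc += a[i][j] * k[u][v]
            row.append(acc)
        rows.append(row)
    return rows
```

In an MPI version, a root process could broadcast both matrices, let each rank call `conv2d_rows` on its own slice of output rows, and gather the slices. This is an assumption about one reasonable decomposition, not necessarily how the study distributed the work.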

Jul, 12

A GPGPU-based Pipeline for Accelerated Rendering of Point Clouds

Direct rendering of large point clouds has become common practice in architecture and archaeology in recent years. Due to the high point density, no meshes are reconstructed from the scanning data; instead, the points can be rendered directly as primitives of a graphics API such as OpenGL. However, these APIs and the hardware, which they are […]

OpenCL • OpenGL

Jul, 12

SIMD Divergence Optimization through Intra-Warp Compaction

SIMD execution units in GPUs are increasingly used for high-performance and energy-efficient acceleration of general-purpose applications. However, SIMD control-flow divergence can reduce execution efficiency in a class of GPGPU applications known as divergent applications. Improving SIMD efficiency therefore has the potential to bring significant performance and energy benefits […]

OpenCL • OpenGL
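The abstract is truncated before it describes the compaction scheme, so the following is only a simplified model. It is not presented as the paper's method. The model assumes a 16-thread warp runs one instruction on a 4-lane SIMD unit, which normally takes 4 cycles. It compares three cases for a given active-lane mask:

- the baseline;
- skipping 4-lane groups in which every thread is inactive;
- an idealized bound where active lanes could be packed freely.

`WARP_WIDTH`, `SIMD_WIDTH` and all function names are hypothetical.

```python
import math

# Illustrative model only (not necessarily the paper's exact scheme): a warp of
# WARP_WIDTH threads executes one instruction on a SIMD unit SIMD_WIDTH lanes wide,
# so it normally takes WARP_WIDTH // SIMD_WIDTH cycles. Bit i of `mask` is 1 when
# thread i is active after a divergent branch.
WARP_WIDTH = 16
SIMD_WIDTH = 4


def popcount(mask):
    return bin(mask).count("1")


def cycles_baseline(mask):
    """Without compaction, every lane group is issued, active or not."""
    return WARP_WIDTH // SIMD_WIDTH


def cycles_group_skip(mask):
    """Skip each SIMD_WIDTH-lane group whose threads are all inactive."""
    group = (1 << SIMD_WIDTH) - 1
    return sum(
        1
        for g in range(WARP_WIDTH // SIMD_WIDTH)
        if (mask >> (g * SIMD_WIDTH)) & group
    )


def cycles_ideal(mask):
    """Lower bound if active lanes could be freely packed into full groups."""
    return math.ceil(popcount(mask) / SIMD_WIDTH)


def simd_efficiency(masks, cycle_fn):
    """Fraction of issued lane slots that do useful work."""
    active = sum(popcount(m) for m in masks)
    slots = sum(cycle_fn(m) for m in masks) * SIMD_WIDTH
    return active / slots
```

**Split by halves.** Suppose an if/else sends the first and last 8 threads down different paths (masks `0x00FF` and `0xFF00`). Baseline efficiency is then 0.5, and group skipping brings it to 1.0.

**Split by parity.** If the split is by thread parity (`0x5555` and `0xAAAA`), every group still contains an active lane. Group skipping therefore stays at 0.5, while ideal packing would reach 1.0. This is why rearranging lanes, rather than only skipping empty groups, can matter.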

Jul, 7

CrowdCL: Web-Based Volunteer Computing with WebCL

We present CrowdCL, an open-source framework for the rapid development of volunteer computing and OpenCL applications on the web. Drawing inspiration from existing GPU libraries like PyCUDA, CrowdCL provides an abstraction layer for WebCL aimed at reducing boilerplate and improving code readability. CrowdCL also provides developers with a framework to easily run computations in the […]

OpenCL

Jul, 7

Comparative study of parallel programming models for multicore computing

Shared-memory multi-core processor technology has developed rapidly, with faster and more numerous processors per chip. This new architecture challenges programmers to write code that scales across these many cores to exploit the full computational power of these machines. Shared-memory parallel programming paradigms such as OpenMP and Intel Threading Building Blocks (TBB) […]

OpenCL

Jul, 5

Triangular mesh simplification on the GPU

We present a simplification algorithm for triangular meshes, implemented on the GPU. The algorithm performs edge collapses driven by a quadric error metric. It uses data parallelism as provided by OpenCL and has no sequential segments in its main iterative structure in order to fully exploit the processing power of the GPU. Our implementation produces […]

OpenCL
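The abstract names the quadric error metric but is truncated before describing it. As a hedged illustration, the sketch computes the standard Garland–Heckbert quadric cost for a single edge collapse, serially. For each triangle's plane p = (a, b, c, d) with unit normal, the fundamental quadric is K = p pᵀ. A vertex's quadric is the sum over its incident faces, and the collapse cost is vᵀ(Q₁ + Q₂)v. For simplicity, the candidate positions are only the endpoints and the midpoint, rather than the solution of the optimal-position linear system. All function names are hypothetical, and this is not the authors' OpenCL code.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def plane_quadric(p0, p1, p2):
    """4x4 fundamental error quadric K = p p^T for the triangle's plane p = (a, b, c, d)."""
    n = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(sum(x * x for x in n))
    a, b, c = (x / length for x in n)           # unit normal
    d = -(a * p0[0] + b * p0[1] + c * p0[2])    # plane offset so that p . (x, y, z, 1) = 0 on the plane
    p = [a, b, c, d]
    return [[p[i] * p[j] for j in range(4)] for i in range(4)]


def add_quadrics(q1, q2):
    return [[q1[i][j] + q2[i][j] for j in range(4)] for i in range(4)]


def quadric_error(q, v):
    """v^T Q v with homogeneous v = (x, y, z, 1): sum of squared distances to the planes."""
    h = [v[0], v[1], v[2], 1.0]
    return sum(h[i] * q[i][j] * h[j] for i in range(4) for j in range(4))


def collapse_cost(q1, v1, q2, v2):
    """Cost of collapsing edge (v1, v2) under Q1 + Q2.

    Simplification: evaluates only the endpoints and midpoint and returns
    (lowest error, chosen position).
    """
    q = add_quadrics(q1, q2)
    mid = [(v1[i] + v2[i]) / 2 for i in range(3)]
    return min((quadric_error(q, c), tuple(c)) for c in (v1, v2, mid))
```

Each edge's cost depends only on its two vertex quadrics. That independence is what lets such costs be evaluated with one work-item per edge on a GPU, an assumption consistent with the data-parallel design the abstract describes.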

Jul, 2

CFD Simulation of Jet Cooling and Implementation of Flow Solvers in GPU

In the rolling of steel into thin sheets, the final step is cooling the finished product on the runout table. In this thesis, heat transfer to a water jet impinging on a hot, flat steel plate was studied as the key cooling process on the runout table. The temperature of the plate was […]

OpenCL

Jul, 1

Towards Performance-Portable, Scalable, and Convenient Linear Algebra

The rise of multi- and many-core architectures also gave birth to a plethora of new parallel programming models. Among these, the open industry standard OpenCL addresses this heterogeneity of programming environments by providing a unified programming framework. The price to pay, however, is that OpenCL requires additional low-level boilerplate code when compared to vendor-specific solutions, […]

OpenCL

Jun, 29

Adaptation of algorithms for underwater sonar data processing to GPU-based systems

In this master's thesis, algorithms for acoustic simulations in underwater environments are ported for GPU processing. The GPU parallel computing platforms used are CUDA, OpenCL, and SkePU. The purpose of this master's thesis is to adapt and evaluate the ported algorithms’ performance on two modern NVIDIA GPUs, Tesla K20 and Quadro K5000. Several optimizations, described […]

CUDA • OpenCL

Jun, 29

Efficient computation of constrained parameterizations on parallel platforms

Constrained isometric planar parameterizations are central to a broad spectrum of applications. In this work, we present a nonlinear solver developed in OpenCL that is efficiently parallelizable on modern massively parallel architectures. We establish how parameterization relates to mesh smoothing and show how to efficiently and robustly solve the planar mesh parameterization problem with […]

OpenCL

Jun, 24

An Energy Efficient GPGPU Memory Hierarchy with Tiny Incoherent Caches

With each new generation and the ever-increasing promise of computing power, GPGPUs have been growing quickly in size, and at the same time energy consumption has become a major bottleneck for them. The first-level data cache and the scratchpad memory are critical to the performance of a GPGPU, but they are extremely energy-inefficient due […]

CUDA • OpenCL

Jun, 21

Parallel Language Programming In Different Platforms

The need to speed up computation has driven interest in exploring parallelism in algorithms and parallel programming. Technology is evolving fast, but sequential computing power is no longer increasing as quickly as it once did; instead, CPUs contain more and more parallel computing resources. However, parallel algorithms may not be able to exploit all the parallelism […]

CUDA • OpenCL
