high performance computing on graphics processing units: hgpu.org

Posts

Oct, 2

Exploiting Limited Access Distance of ODE Systems for Parallelism and Locality in Explicit Methods

The solution of initial value problems of large systems of ordinary differential equations (ODEs) is computationally intensive and demands efficient parallel solution techniques that take into account the complex architectures of modern parallel computer systems. This article discusses implementation techniques suitable for ODE systems with a special coupling structure, called limited access distance, which […]

OpenCL

Oct, 2

Parallelizing LINQ Program for GPGPU

Recent technologies have brought parallel infrastructure to general users. Nowadays parallel infrastructure is available in PCs and personal laptops. Single-core machines have become history. Even multi-core technologies are being replaced by GPGPUs when it comes to high performance computing, because GPGPUs provide many cores at low cost. Sequential programs of the past are […]

CUDA

Oct, 2

Multi2Sim: a simulation framework for CPU-GPU computing

Accurate simulation is essential for the proper design and evaluation of any computing platform. Upon the current move toward the CPU-GPU heterogeneous computing era, researchers need a simulation framework that can model both kinds of computing devices and their interaction. In this paper, we present Multi2Sim, an open-source, modular, and fully configurable toolset that enables […]

OpenCL

Oct, 1

Parallel Application Library for Object Recognition

Computer vision research enables machines to understand the world. Humans usually interpret and analyze the world through what they see – the objects they capture with their eyes. Similarly, machines can better understand the world by recognizing objects in images. Object recognition is therefore a major branch of computer vision. To achieve the highest accuracy, […]

OpenCL

Oct, 1

Accelerated Pressure Projection using OpenCL on GPUs

A GPU version of the pressure projection solver is implemented using OpenCL and compared with a CPU version accelerated with OpenMP. The GPU version shows a sensible reduction in time despite using a simple algorithm in the kernel. The final code is plugged into a commercial fluid simulator software. Different kinds […]

OpenCL

Oct, 1

GPGPU Accelerated Texture-Based Radiosity Calculation

Radiosity is a popular global illumination algorithm capable of achieving photorealistic rendering results. However, its use in interactive environments is limited by its computational complexity. This paper presents a GPGPU-based implementation of the gathering radiosity approach using texture-based discretisation and the OpenCL framework. Hemicubes are rendered to a texture array and are processed by OpenCL […]

OpenCL, OpenGL

Oct, 1

Compute Distance Matrices with GPU

Given a data matrix where the rows are objects and the columns are variables, researchers often want to compute all the pairwise distances among the objects. Due to the design of the Nvidia GPU architecture, CUDA code works most easily with data matrices whose numbers of rows and columns are multiples of sixteen. The present […]

CUDA
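
The padding idea in the abstract above can be illustrated with a small CPU-side sketch. This is not the paper's CUDA code: it only shows, under stated assumptions, why zero-padding a data matrix to multiples of sixteen leaves the pairwise distances of the real objects unchanged. The Euclidean metric, the tile width constant `TILE`, and the helper names `pad_to_multiple` and `pairwise_euclidean` are all assumptions made for illustration.

```python
import math

TILE = 16  # assumed tile width, taken from the "multiples of sixteen" remark


def pad_to_multiple(rows, tile=TILE):
    """Zero-pad a list-of-lists matrix so both dimensions are multiples of `tile`."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    padded_rows = -(-n_rows // tile) * tile  # round up to the next multiple
    padded_cols = -(-n_cols // tile) * tile
    # Extra columns are zeros, so they add nothing to squared differences.
    padded = [list(r) + [0.0] * (padded_cols - n_cols) for r in rows]
    # Extra rows are dummy objects whose distances are simply discarded later.
    padded += [[0.0] * padded_cols for _ in range(padded_rows - n_rows)]
    return padded


def pairwise_euclidean(rows, tile=TILE):
    """Return the n x n Euclidean distance matrix for the n real objects."""
    n = len(rows)
    padded = pad_to_multiple(rows, tile)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = sum((a - b) ** 2 for a, b in zip(padded[i], padded[j]))
            dist[i][j] = dist[j][i] = math.sqrt(s)  # matrix is symmetric
    return dist


if __name__ == "__main__":
    data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    for row in pairwise_euclidean(data):
        print(row)
```

On a GPU, the padded shape would let each 16 x 16 thread block load full tiles without boundary checks; the loop above computes the same values serially and keeps only the rows and columns that belong to real objects.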

Oct, 1

Synthesizing Structured Traversals from Attribute Grammars

We examine how to automatically decompose a program into structured parallel traversals over trees. In our system, programs are declaratively specified as attribute grammars and parallel traversals are defined by a compiler designed to optimize them for both GPUs and multicore CPUs. Our synthesizer automatically finds a correct schedule of the attribute grammar as structured […]

OpenCL

Sep, 30

CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs

With the prevalence of general-purpose computation on GPUs, shared memory programming models were proposed to ease the pain of GPU programming. However, with the demands of more intensive workloads, it is desirable to port GPU programs to a more scalable distributed memory environment, such as multi-GPUs. To achieve this, programs need to be re-written with […]

CUDA

Sep, 30

Nonperturbative Quantum Field Theory in Astrophysics

The extreme electromagnetic or gravitational fields associated with some astrophysical objects can give rise to macroscopic effects arising from the physics of the quantum vacuum. Therefore, these objects are incredible laboratories for exploring the physics of quantum field theories. In this dissertation, we explore this idea in three astrophysical scenarios.

CUDA

Sep, 30

ARVO-CL: The OpenCL version of the ARVO package – An efficient tool for computing the accessible surface area and the excluded volume of proteins via analytical equations

Introduction of Graphical Processing Units (GPUs) and computing using GPUs in recent years opened possibilities for simple parallelization of programs. In this update, we present the modernized version of program ARVO [J. Busa, J. Dzurina, E. Hayryan, S. Hayryan, C.-K. Hu, J. Plavka, I. Pokorny, J. Skivanek, M.-C. Wu, Comput. Phys. Comm. 165 (2005) 59]. […]

OpenCL

Sep, 30

Real-Time Computer Vision with OpenCV

Computer vision is a rapidly growing field devoted to analyzing, modifying, and high-level understanding of images. Its objective is to determine what is happening in front of a camera and use that understanding to control a computer or robotic system, or to provide people with new images that are more informative.

CUDA

* * *

Recent source codes

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

Most viewed papers (last 30 days)