high performance computing on graphics processing units: hgpu.org

Posts

Aug, 23

Estimation of Skin Optical Parameters for Real-Time Hyperspectral Imaging Applications using GPGPU Parallel Computing

Hyperspectral imaging with a high spatial and spectral resolution can be used to analyze materials using spectroscopic methods. This can be applied to skin as a general-purpose real-time diagnostic tool. Light transport models, like the diffusion model, can describe the light propagation in tissue before the light is captured by the hyperspectral camera. The […]

CUDA

Aug, 23

Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization

The use of GPUs to accelerate general-purpose scientific and engineering applications is mainstream today, but their adoption in current high-performance computing clusters is impaired primarily by acquisition costs and power consumption. Therefore, the benefits of sharing a reduced number of GPUs among all the nodes of a cluster can be remarkable for many applications. This […]

CUDA

Aug, 23

A Shader Library for OpenGL 4 and GLSL 4.3 Learning and Development

In the past decades, besides experiencing a huge development in terms of computation speed, we have also experienced the emergence of the programmable GPU, giving birth to languages like GLSL and CUDA. This technology gives great flexibility in the use of such powerful hardware, attracting the interest of many researchers and programmers to this […]

OpenGL

Aug, 23

Synthesis of Custom Networks of Heterogeneous Processing Elements for Complex Physical System Emulation

Physical system models that consist of thousands of ordinary differential equations can be synthesized to field-programmable gate arrays (FPGAs) for highly-parallelized, real-time physical system emulation. Previous work introduced synthesis of custom networks of homogeneous processing elements, consisting of processing elements that are either all general differential equation solvers or are all custom solvers tailored to […]

CUDA

Aug, 23

Median Based Parallel Steering Kernel Regression for Image Reconstruction

Image reconstruction is the process of recovering the original image from corrupted data. Applications of image reconstruction include computed tomography, radar imaging, and weather forecasting. Recently, the steering kernel regression method has been applied to image reconstruction [1]. There are two major drawbacks to this technique. Firstly, it is computationally intensive. Secondly, the output of the algorithm […]

CUDA

Aug, 23

Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC

With the rapid growth of the Internet, e-mail, with its convenience and efficiency, has become an important means of communication in people’s lives. It reduces the cost of communication, but it also brings spam. Spam e-mails, also known as "junk e-mails", are unsolicited messages sent in bulk with hidden or forged identity of the sender, address, […]

OpenCL

Aug, 21

Comparative Analysis of OpenACC, OpenMP and CUDA using Sequential and Parallel Algorithms

With the increased processing demands of recent years and the search for better-performing devices, a need has arisen in computing to parallelize processing, making it possible to sustain the performance of software and algorithms with high processing requirements. It’s possible to use the processing power of devices like the GPU to run parallel […]

CUDA

Aug, 21

Comparison of Cilk, Kaapi and CUDA for the Jacobi Method

While multi-core architectures allow for high performance gains, the use of a specialized framework is often required to maximize the use of the available hardware. This paper compares the performance of three different frameworks, Cilk, Kaapi and CUDA, in implementing the Jacobi method.

CUDA
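
For readers unfamiliar with the Jacobi method named in this entry, the following is a minimal serial sketch in Python. It is not taken from the paper and does not reproduce any of the Cilk, Kaapi or CUDA implementations it compares; the function name `jacobi`, the fixed iteration count, and the small test system are assumptions chosen for illustration.

```python
def jacobi(a, b, iterations=50):
    """Run a fixed number of Jacobi sweeps on the linear system a x = b.

    Assumes `a` is a square list of lists with non-zero diagonal entries;
    convergence is only guaranteed for suitable matrices, for example
    strictly diagonally dominant ones.
    """
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(iterations):
        # Each new component depends only on the previous iterate, so the
        # n updates are independent of one another. That independence is
        # what makes the method a natural fit for parallel frameworks.
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x


# Hypothetical, strictly diagonally dominant test system whose exact
# solution is x = [1, 1, 1].
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
B = [5.0, 6.0, 5.0]

solution = jacobi(A, B)
```

A parallel version would typically assign each row update (the body of the list comprehension) to a separate task or GPU thread and synchronize between sweeps.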

Aug, 21

Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS

GPU acceleration is a promising approach to speed up query processing of database systems by using low-cost graphics processors as coprocessors. Two major trends have emerged in this area: (1) The development of frameworks for scheduling tasks in heterogeneous CPU/GPU platforms, which is mainly in the context of coprocessing for applications and does not […]

Aug, 21

Parallel Voronoi Diagram computation on scaled distance planes using CUDA

Voronoi diagrams are fundamental data structures in computational geometry with several applications in different fields inside and outside computer science. This paper shows a CUDA algorithm to compute Voronoi diagrams on a 2D image where the distance between points cannot be directly computed in the Euclidean plane. The proposed method extends an existing Dijkstra-based GPU […]

CUDA

Aug, 21

Accurate Analytic Models to Estimate Execution Time on GPU Applications

Today, top-ranked HPC systems feature several GPUs, which deliver high processing speed at a low power budget for various parallel applications. Many scientific applications still demand more computing speed than is available today. A general approach to provide more processing speed is to scale the system. However, aspects such as interference, the amount […]

CUDA

Aug, 20

Efficient Heterogeneous Execution on Large Multicore and Accelerator Platforms: Case Study Using a Block Tridiagonal Solver

The algorithmic and implementation principles for gainfully exploiting GPU accelerators in conjunction with multicore processors on high-end systems with large numbers of compute nodes are explored and evaluated in an implementation of a scalable block tridiagonal solver. The accelerator of each compute node is exploited in combination with multicore processors of that node in performing […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
