high performance computing on graphics processing units: hgpu.org

Posts

Jun, 2

Integrated Modelling of Hydrodynamic Processes, Faecal Indicator Organisms and Related Parameters with Improved Accuracy using Parallel (GPU) Computing

Environmental problems and issues are not limited by artificial boundaries created by man. Usually there are different teams or individuals working on the catchments, estuaries, rivers and coastal basins in different countries using different parameters and formulations for various processes. However, the system is a natural one and as such no boundaries exist. When a […]

CUDA

Jun, 2

Accelerating NTRU based Homomorphic Encryption using GPUs

In this work we introduce a large polynomial arithmetic library optimized for Nvidia GPUs to support fully homomorphic encryption schemes. To realize the large polynomial arithmetic library we convert the polynomial with large coefficients using the Chinese Remainder Theorem into many polynomials with small coefficients, and then carry out modular multiplications in the residue space […]

CUDA

Jun, 2

Multi-target DPA attacks: Pushing DPA beyond the limits of a desktop computer

Following the pioneering CRYPTO ’99 paper by Kocher et al., differential power analysis (DPA) was initially geared around low-cost computations performed using standard desktop equipment with minimal reliance on device-specific assumptions. In subsequent years, the scope was broadened by, e.g., making explicit use of (approximate) power models. An important practical incentive for doing so is to […]

OpenCL

Jun, 2

Region Templates: Data Representation and Management for Large-Scale Image Analysis

Distributed memory machines equipped with CPUs and GPUs (hybrid computing nodes) are hard to program because of the multiple layers of memory and heterogeneous computing configurations. In this paper, we introduce a region template abstraction for the efficient management of common data types used in analysis of large datasets of high resolution images on clusters […]

CUDA

Jun, 1

Efficient Implementation of Hyperspectral Anomaly Detection Techniques on GPUs and Multicore Processors

Anomaly detection is an important task for hyperspectral data exploitation. Although many algorithms have been developed for this purpose in recent years, due to the large dimensionality of hyperspectral image data, fast anomaly detection remains a challenging task. In this work, we exploit the computational power of commodity graphics processing units (GPUs) and multicore processors […]

CUDA

Jun, 1

A Performance Model for the Communication in Fast Multipole Methods on HPC Platforms

Exascale systems are predicted to have approximately one billion cores, assuming Gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the current parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. There is therefore an urgent […]

CUDA

Jun, 1

An implementation of a reordering approach for increasing the product of diagonal entries in a sparse matrix

We present implementation details of a reordering strategy for permuting elements whose absolute value is large to the diagonal of a sparse matrix. This algorithm, based on work by Duff and Koster [9], is a critical component of the SPIKE-based preconditioner provided by the Spike::GPU library [2]. We discuss the four stages required to implement […]

CUDA

Jun, 1

Evaluating GPU Passthrough in Xen for High Performance Cloud Computing

With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their technical computing needs. This is due to the relative scalability, ease of use, and advanced user environment customization that clouds provide, as well as the many novel computing paradigms available for data-intensive applications. However, there is […]

CUDA

Jun, 1

A CUDA-enabled Parallel Implementation of Collaborative Filtering

Collaborative filtering (CF) is one of the essential algorithms in recommendation systems. Based on the performance analysis, two computational kernels are identified. In order to accelerate CF on large-scale data, a CUDA-enabled parallel CF approach is proposed, along with an efficient data partition scheme. Various optimization techniques are also applied to maximize […]

CUDA

May, 31

GPU Ray Tracing with CUDA

Ray tracing is a technique for rendering images in computer graphics by simulating how light rays interact with the virtual environment. By tracing the path of a light ray through a scene and emulating the effect of the ray as it intersects with virtual objects, the ray tracing algorithm can accurately portray reflections, refractions, shadows, […]

CUDA

May, 31

Parallel SAT solvers and their application in automatic parallelization

Since improvements in processor frequency slowed down, a new trend has emerged to let software take advantage of faster hardware: parallelization. However, unlike increasing processor frequency, parallelization requires a different kind of programming, parallel programming, which is usually harder than ordinary sequential programming. In this context, automatic […]

CUDA

May, 31

Fast parallel volume visualization on CUDA technology

In medical diagnosis and treatment planning, radiologists and surgeons rely heavily on the slices produced by medical imaging scanners. Unfortunately, most of these scanners can only produce two-dimensional images because the machines that can produce three-dimensional images are very expensive. The two-dimensional images from these devices are difficult to interpret because they […]

CUDA

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
