high performance computing on graphics processing units: hgpu.org

Posts

May, 26

Fast and accurate digital signal processing realized with GPGPU technology

This paper presents the idea of so-called quasi-maximum accuracy computations for improving the precision of floating-point digital signal processing with graphics processing units (GPUs). In the presented approach, increasing the precision of computations does not require increasing the length of the data words. Special attention has been […]

CUDA

May, 26

Parallelization of the Local Threshold and Boolean Function Based Edge Detection Algorithm Using CUDA

In this paper we present a parallelized algorithm for edge detection in grayscale images. The chosen method is local threshold and Boolean function based edge detection. This method differs from common edge detectors in that it uses bitmap patterns for edge recognition instead of analyzing gradient changes in the image. The parallelization […]

CUDA

May, 25

The Third International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering, PARENG2013

The conference will consider mathematical, computer science and engineering developments that impact on the use of HPC in engineering analysis, design, and simulation. Engineering is interpreted in its widest sense to include aeronautical, civil, mechanical, electrical, materials, bioengineering, geotechnical, structural and environmental fields. The range of topics considered by the Conference will include: The mathematical […]

May, 25

The 3rd International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering, 2012, GPU-SMP’ 2012

This international conference in Shenzhen will focus on understanding the potential usage of GPU and MIC from a computational scientific user point of view, particularly for multiscale problems in science and engineering. It brings together experts from China, Japan, and bordering Pacific countries such as the USA, Korea, Australia and Singapore. In addition to algorithmic research, […]

May, 25

Using Compute Unified Device Architecture (CUDA) in Parallelizing Different Digital Image Processing Techniques

Graphics Processing Units (GPUs) have been conventionally used in the acceleration of 2D and 3D graphics and video rendering. Because of its performance and capability, the GPU has evolved into a highly parallel programmable processor that specializes in memory bandwidth utilization and intensive computation. For operations involving graphics, GPUs offer a lot of gigaflops of processing […]

CUDA

May, 25

On the Simulations of Evolution-Communication P Systems with Energy without Antiport Rules for GPUs

In this report, we present our initial proposal for simulating computations on a restricted variant of the Evolution-Communication P system with energy (ECPe system), which will then be implemented on Graphics Processing Units (GPUs). This ECPe system variant prohibits the use of antiport rules for communication. Several possible levels of parallelization for simulating ECPe systems computations […]

CUDA

May, 25

Effective Sparse Matrix Representation for the GPU Architectures

General-purpose computation on graphics processing units (GPUs) is prominent in the current era of high performance computing. Porting data-parallel applications to the GPU yields a baseline performance improvement because of the increased number of computational units. Better performance can be seen if application-specific fine tuning is done with respect to the […]

CUDA

May, 25

Accelerating In-Memory Graph Database traversal using GPGPUS

The paper aims to provide a comparative analysis of the performance of in-memory databases as opposed to a customised graph database written from the ground up whose joins (searches) are performed on a GPGPU. This is done primarily to serve as a proof of concept on how databases that are represented as graphs can benefit by fostering […]

CUDA

May, 25

Parallel simulation of mixed-abstraction SystemC models on GPUs and multicore CPUs

This work presents a methodology that parallelizes the simulation of mixed-abstraction level SystemC models across multicore CPUs and graphics processing units (GPUs) for improved simulation performance. Given a SystemC model, we partition it into processes suitable for GPU execution and CPU execution. We convert the processes identified for GPU execution into GPU kernels with additional […]

CUDA

May, 24

Java on CUDA architecture

A traditional CPU is able to run only a few complex threads concurrently. A GPU, on the other hand, allows concurrent execution of hundreds or thousands of simpler threads. The GPU was originally designed for computer graphics, but nowadays it is also used for general-purpose calculations using GPGPU technology. CUDA, one of the […]

CUDA

May, 24

Sparse direct solvers with accelerators over DAG runtimes

The current trend in high performance computing shows a dramatic increase in the number of cores on shared memory compute nodes. Algorithms, especially those related to linear algebra, need to be adapted to these new computer architectures in order to be efficient. PASTIX is a sparse parallel direct solver that incorporates a dynamic […]

CUDA

May, 24

Tuning a Finite Difference Computation for Parallel Vector Processors

Current CPU and GPU architectures heavily use data and instruction parallelism at different levels. Floating point operations are organised in vector instructions of increasing vector length. For reasons of performance it is mandatory to use the vector instructions efficiently. Several ways of tuning a model problem finite difference stencil computation are discussed. The combination of […]

OpenCL

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

Most viewed papers (last 30 days)