high performance computing on graphics processing units: hgpu.org

Posts

May, 13

Interaction and Visualization Techniques for Immersive Exploration and Perception of 3D datasets

The objective in this case is not only to be realistic, but also to provide new and intelligible ways of model representation. This raises new issues in data perception. The question of perception of complex data, especially regarding visual feedback, is an open question, and it is the subject of this work. This PhD thesis […]

OpenGL

May, 13

A GPU based real-time video compression method for video conferencing

Recent years have seen a great increase in the everyday use of real-time video communication over the internet through video conferencing applications. Limitations on computational resources and network bandwidth require video encoding algorithms that provide acceptable quality at low bitrates and can support various resolutions inside the same stream. In this work, the authors present […]

CUDA

May, 13

Pedestrian Detection at Warp Speed: Exceeding 500 Detections per Second

Object detection, and in particular pedestrian detection, is a challenging task, due to the wide variety of appearances. The application domain is extremely broad, ranging from e.g. surveillance to automotive safety systems. Many practical applications however often rely on stringent real-time processing speeds combined with high accuracy needs. These demands are contradictory, and usually a […]

CUDA

May, 13

Improving Synchronization and Data Access in Parallel Programming Models

Today, parallel architectures are the main vector for exploiting available die area. The shift from architectures tuned for sequential programming models to ones optimized for parallel processing follows from the inability to further enhance sequential performance due to power and memory walls. On the other hand, efficient exploitation of parallel computing units looks a hard […]

CUDA • OpenCL

May, 13

Programming for scientific computing on peta-scale heterogeneous parallel systems

Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer […]

CUDA

May, 11

A Distributed CPU-GPU Framework for Pairwise Alignments on Large-Scale Sequence Datasets

Several problems in computational biology require the all-against-all pairwise comparisons of tens of thousands of individual biological sequences. Each such comparison can be performed with the well-known Needleman-Wunsch alignment algorithm. However, with the rapid growth of biological databases, performing all possible comparisons with this algorithm in serial becomes extremely time-consuming. The massive computational power of […]

CUDA

May, 11

Exploring Computer Vision and Image Processing Algorithms in Teaching Parallel Programming

Computer Vision (CV) is a rapidly growing field, intent on enabling computers to process, analyze, and understand the information of images to produce structured information and/or make decisions. In recent years, interest in computer vision has grown in part as a result of both cheaper and more capable cameras, but also largely because of affordable […]

CUDA • OpenCL

May, 11

Parallel implementation of the wideband DOA algorithm on single core, multicore, GPU and IBM cell BE processor

The Multiple Signal Classification (MUSIC) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals impinging on an antenna array. The algorithm is serial-based, mathematically intensive, and requires substantial computing power to realize in real time. Recently, multi-core processors have become more prevalent and affordable. The challenge of adapting existing serial-based algorithms to […]

CUDA

May, 11

Blum Blum Shub on the GPU

CONTEXT. The cryptographically secure pseudo-random number generator Blum Blum Shub (BBS) is a simple algorithm with a strong security proof; however, it requires very large numbers to be secure, which makes it computationally heavy. The Graphics Processing Unit (GPU) is a common vector processor originally dedicated to computer-game graphics, but has since been adapted to […]

OpenCL

May, 11

The GPU-based High-performance Pattern-matching Algorithm for Intrusion Detection

The Graphics Processing Unit (GPU) has evolved from a dedicated rendering device into a general-purpose parallel processor. It performs far better than the CPU in many fields of science. String matching is widely used, especially in information retrieval, intrusion detection, computational biology, etc. In this paper, we designed and implemented a GPU-based multi-string matching algorithm […]

CUDA

May, 11

A portable and high-performance matrix operations library for CPUs, GPUs and beyond

High-performance computing systems today include a variety of compute devices such as multi-core CPUs, GPUs and many-core accelerators. OpenCL allows programming different types of compute devices using a single API and kernel language. However, there is no standard matrix operations library in OpenCL for operations such as matrix multiplication that works well on a variety […]

OpenCL

May, 11

Real-Time Object Tracking by CUDA-accelerated Neural Network

An algorithm is proposed for tracking objects in real time. The algorithm is based on a neural network implemented on a GPU. The algorithm is investigated and its parameters are optimized. Tracking is accelerated 10 times and training 2 times compared with the sequential version of the algorithm. The maximum resolution of […]

CUDA
