high performance computing on graphics processing units: hgpu.org

Posts

Jun, 8

Dynamic load balancing on heterogeneous multicore/multiGPU systems

Parallel computing in heterogeneous environments is drawing considerable attention due to the growing number of these kinds of systems. Adapting existing code and libraries to such systems is a fundamental problem. The performance of this code is affected by the strong interdependence between the code and these parallel architectures. We have developed a dynamic load […]

Jun, 8

Research on Three-Dimensional Playing Video Technology in Virtual Education Environment

The paper analyzes video playback technology in virtual education environments, an application of pervasive computing driven by demand from important application fields that rely on visual technology. It proposes a set of effective solutions that use decoding, data compression, texture mapping and the GPU’s parallel computing ability, which improved the efficiency […]

Jun, 8

Stress Tensor Field Visualization for Implant Planning in Orthopedics

We demonstrate the application of advanced 3D visualization techniques to determine the optimal implant design and position in hip joint replacement planning. Our methods take as input the physiological stress distribution inside a patient’s bone under load and the stress distribution inside this bone under the same load after a simulated replacement surgery. The visualization […]

Jun, 8

Realtime Ray Tracing on a Hybrid Parallel Architecture

Octrees are attractive data structures for rendering volumes, as they provide both uniform and hierarchical data encapsulation. We present a simple and efficient algorithm for interactive ray tracing on hybrid architectures, which takes advantage of the available parallelism by heavily exploiting the CPU and GPU hardware, which has been observed to […]

Jun, 8

Hybrid Embarrassingly Parallel on heterogeneous platform

Embarrassingly Parallel (EP) is one of the kernel benchmarks of the NAS Parallel Benchmarks (NPB). EP generates pairs of Gaussian Random Deviates (GRDs) from large random numbers which are produced by a Linear Congruential Generator (LCG). In this paper, the Hybrid EP is efficiently implemented on a CPU/GPU heterogeneous platform. Experimental results show that the Hybrid EP is 11.98 times […]
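For readers unfamiliar with the EP kernel, the following is a minimal serial sketch of the idea described in the abstract: an LCG produces uniform numbers, and pairs of them are turned into Gaussian deviates with the polar acceptance-rejection method. It is not the paper's hybrid CPU/GPU implementation. The constants (multiplier 5^13, modulus 2^46, seed 271828183) are assumed from the NPB specification and do not appear in the abstract; the function names `lcg` and `gaussian_pairs` are hypothetical.

```python
import math

# Assumed NPB constants (not stated in the abstract above).
A = 5 ** 13   # LCG multiplier
M = 2 ** 46   # LCG modulus


def lcg(seed):
    """Yield uniform floats in (0, 1) from x_{k+1} = A * x_k mod 2^46."""
    x = seed
    while True:
        x = (A * x) % M
        yield x / M


def gaussian_pairs(n_pairs, seed=271828183):
    """Draw n_pairs candidate points and return the accepted Gaussian pairs."""
    stream = lcg(seed)
    accepted = []
    for _ in range(n_pairs):
        # Map two uniforms from (0, 1) to a point in the square (-1, 1)^2.
        x = 2.0 * next(stream) - 1.0
        y = 2.0 * next(stream) - 1.0
        t = x * x + y * y
        # Keep only points inside the unit circle (polar method).
        if 0.0 < t <= 1.0:
            f = math.sqrt(-2.0 * math.log(t) / t)
            accepted.append((x * f, y * f))
    return accepted


if __name__ == "__main__":
    pairs = gaussian_pairs(10000)
    print(len(pairs), "pairs accepted out of 10000 candidates")
```

Each candidate pair is independent of the others, which is what makes the benchmark "embarrassingly parallel": the candidate range can be split across CPU cores and GPU threads, provided each worker starts its LCG at the right point in the sequence.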

Jun, 8

Parallel rate-distortion optimized intra mode decision on multi-core graphics processors using greedy-based encoding orders

Rate-distortion (RD) optimized intra-prediction mode selection can lead to significant improvement in coding efficiency in intra-frame encoding. However, it incurs a considerable increase in encoding complexity. In this paper, we investigate how multi-core Graphics Processing Units (GPUs) can be efficiently utilized to undertake the task of RD optimized intra mode selection in AVS and H.264 […]

OpenGL

Jun, 8

Parallel accelerators for GlimmerHMM bioinformatics algorithm

In recent decades there has been exponential growth in the amount of genomic data that needs to be analyzed. A very important problem in biology is the extraction of the biologically functional genomic DNA from the actual genome of an organism. Many computational biology algorithms have been proposed that solve the gene finding […]

Jun, 7

Tuning Manifold Harmonics Filters

There are several techniques for automatic music visualization, which are included with virtually any media player. The basic ingredient of those techniques is spectral analysis of the sound, used to automatically generate parameters for procedural image generation. However, only a few music visualizations rely on 3D models. This paper proposes to use spectral mesh processing […]

OpenGL

Jun, 7

Fast, Processor-Cardinality Agnostic PRNG with a Tracking Application

As vision algorithms mature with increasing inspiration from the learning community, statistically independent pseudo random number generation (PRNG) becomes increasingly important. At the same time, execution time demands have seen algorithms being implemented on evolving parallel hardware such as GPUs. The Mersenne Twister (MT) has proven to be the current state of the art for […]

CUDA

Jun, 7

SRP Based Natural Interaction between Real and Virtual Worlds in Augmented Reality

In current video see-through augmented reality systems, natural interaction between real and virtual worlds is impossible. We analyze the cause and propose a new mode, named mix before projection, to combine real and virtual worlds for AR. Under this framework, we develop a tool named space registration particle (SRP) and speed multi-pass GPU […]

Jun, 7

A new parallel video understanding and retrieval system

In this paper, a hybrid parallel computing framework is proposed for video understanding and retrieval. It is a unified computing architecture based on the Map-Reduce programming model, which supports multi-core and GPU architectures. A key task scheduler is designed for the parallelization of computation tasks. The SVM method is used to train models for video […]

Jun, 7

Tangible video teleconference system using real-time image-based relighting

This paper deals with a real-time image-based relighting system for tangible video teleconferencing. The proposed image-based relighting system renders the extracted human object using virtual environmental images. The proposed system can virtually homogenize the lighting environments of remote users in a video teleconference, or render the humans as if they were in the […]

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
