high performance computing on graphics processing units: hgpu.org

Posts

Jan, 15

Identification and Elimination of Platform-Specific Code Smells in High Performance Computing Applications

A code smell is a code pattern that might indicate a code or design problem, which makes the application code hard to evolve and maintain. Automatic detection of code smells has been studied to help users find which parts of their application codes should be refactored. However, code smells have not been defined in a […]

CUDA • OpenCL

Jan, 15

Reducing overheads of dynamic scheduling on heterogeneous chips

In recent processor development, we have witnessed the integration of GPUs and CPUs into a single chip. The result of this integration is a reduction of the data communication overheads. This enables an efficient collaboration of both devices in the execution of parallel workloads. In this work, we focus on the problem of efficiently scheduling […]

OpenCL

Jan, 15

Batched Matrix Computations on Hardware Accelerators Based on GPUs

Scientific applications require solvers that work on many small problems that are independent of each other. At the same time, high-end hardware evolves rapidly and becomes ever more throughput-oriented, and thus there is an increasing need for an effective approach to developing energy-efficient, high-performance codes for these small matrix problems that we call […]

CUDA

Jan, 15

High Performance GPU-based Fourier Volume Rendering

FVR (Fourier volume rendering) is a significant visualization technique that has been used widely in digital radiography. As a result of its O(N^2 log N) time complexity, it provides a faster alternative to spatial domain volume rendering algorithms, which have O(N^3) computational complexity. Relying on the Fourier projection-slice theorem, this technique operates on the spectral representation of […]

CUDA • OpenGL

Jan, 15

International Conference on Signal Processing, ICOSP 2015

Topics: Adaptive Filtering & Signal Processing; Ad-Hoc and Sensor Networks; Analog and Mixed Signal Processing; Biometrics & Authentication; Biosignal Processing & Understanding; Communication and Broadband Networks; Communication Signal Processing; Computer Vision & Virtual Reality; Cryptography and Network Security; Design and Implementation of Signal Processing Systems; Image and Multidimensional Signal Processing; Image Processing & Understanding; Machine […]

Jan, 13

Linear Performance-Breakdown Model: A Framework for GPU kernel programs performance analysis

In this paper we describe our performance-breakdown model for GPU programs. GPUs are a popular choice as accelerator hardware due to their high performance, high availability and relatively low price. However, writing highly efficient programs is a difficult and time-consuming task for programmers because of the complexities of GPU architecture and the […]

OpenCL

Jan, 13

Development of an Algorithm for Extracting Parallelism and Pipeline Structure from Stream-based Processing flow with Spanning Tree

It has become common to use manycore accelerators to increase the computing power of a computing platform. The GPU in particular is one of the main families of high performance computing hardware and is also employed by the top supercomputers in the world. Programming methods for such accelerators include the development of control programs which the accelerators execute to […]

CUDA • OpenCL

Jan, 13

Convolutional Neural Networks for Human Activity Recognition using Mobile Sensors

A variety of real-life mobile sensing applications are becoming available, especially in the life-logging, fitness tracking and health monitoring domains. These applications use mobile sensors embedded in smart phones to recognize human activities in order to get a better understanding of human behavior. While progress has been made, human activity recognition remains a challenging task. […]

CUDA

Jan, 13

A Time Optimal Parallel Algorithm for the Dynamic Programming on the Hierarchical Memory Machine

The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of the architecture of CUDA-enabled GPUs. The main contribution of this paper is to present an efficient implementation of the O(n^3)-time dynamic programming algorithm for solving the optimal triangulation problem for a convex n-gon in the HMM. Although the HMM can […]

CUDA
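
For readers unfamiliar with the O(n^3)-time dynamic programming algorithm mentioned in the excerpt above, here is a minimal sequential sketch in Python. It is not the paper's HMM implementation. The function name `min_triangulation_cost` and the triangle cost used in the example (the product of three hypothetical vertex weights) are assumptions chosen for illustration; the excerpt does not say which cost function the paper uses.

```python
def min_triangulation_cost(n, triangle_cost):
    """Minimum total cost of triangulating a convex n-gon with vertices 0..n-1.

    triangle_cost(i, k, j) is a caller-supplied (hypothetical) cost of the
    triangle with corners i < k < j.
    """
    if n < 3:
        return 0
    # best[i][j]: minimum cost of triangulating the sub-polygon i, i+1, ..., j.
    best = [[0] * n for _ in range(n)]
    # Fill the table by increasing distance between i and j: O(n^2) entries,
    # each minimized over O(n) split points k, giving O(n^3) time in total.
    for gap in range(2, n):
        for i in range(n - gap):
            j = i + gap
            best[i][j] = min(
                best[i][k] + best[k][j] + triangle_cost(i, k, j)
                for k in range(i + 1, j)
            )
    return best[0][n - 1]


# Example with assumed vertex weights; the cost of a triangle is their product.
weights = [3, 7, 4, 5]


def product_cost(i, k, j):
    return weights[i] * weights[k] * weights[j]


print(min_triangulation_cost(len(weights), product_cost))  # prints 144
```

All entries with the same `gap` are independent of each other, which is the property a parallel version would typically exploit.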

Jan, 13

Thorough Evaluation of GPU Shared Memory Load and Store Instructions

This work focuses on measuring the number of GPU clock cycles necessary to execute load/store instructions in both bank-conflicting and bank-conflict-free shared memory access patterns. To this end, a number of parameters have been varied in the experiments, including the number of warps (w), the number of memory bank conflicts (k) as […]

CUDA
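
As a rough illustration of what a bank conflict is, the Python sketch below models which shared memory bank each thread of a warp touches. It assumes 32 banks of one word each and a warp of 32 threads, which matches many NVIDIA GPUs but is not taken from the excerpt above; the helper names `conflict_degree` and `strided` are hypothetical, and the model says nothing about the clock-cycle counts the paper measures.

```python
from collections import defaultdict


def conflict_degree(word_indices, num_banks=32):
    """Largest number of distinct words that map to a single bank.

    A degree of 1 means the access pattern is conflict-free in this
    simplified model; threads reading the same word count once (broadcast).
    """
    banks = defaultdict(set)
    for word in word_indices:
        banks[word % num_banks].add(word)
    return max(len(words) for words in banks.values())


def strided(stride, warp_size=32):
    """Word index accessed by each thread when thread t reads word t * stride."""
    return [t * stride for t in range(warp_size)]


for stride in (1, 2, 32, 33):
    print(stride, conflict_degree(strided(stride)))
```

In this model a stride of 32 sends every thread to the same bank, while a stride of 33 spreads the accesses over all banks again, which is why padding arrays by one word is a common remedy.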

Jan, 12

A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-volatile On-chip Caches

Recent trends of CMOS scaling and the increasing number of on-chip cores have led to a large increase in the size of on-chip caches. Since SRAM has low density and consumes a large amount of leakage power, its use in designing on-chip caches has become more challenging. To address this issue, researchers are exploring the use of […]

Jan, 10

Face Recognition: A Tutorial on Computational Aspects

Face recognition is a sophisticated problem requiring a significant commitment of computer resources. A modern GPU architecture provides a practical platform for performing face recognition in real time. The majority of the calculations of an eigenpicture implementation of face recognition are matrix multiplications. For this type of computation, a conventional computer GPU is capable of […]

CUDA
