Posts
Jan, 5
PARRAY: A Unifying Array Representation for Heterogeneous Parallelism
This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports succinct system-level programming for heterogeneous parallel systems like GPU clusters. The current practice of software development requires combining several low-level libraries like Pthread, OpenMP, CUDA and MPI, and achieving productivity and portability is difficult when systems differ in the number and model of their GPUs. PARRAY extends […]
Jan, 5
Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms
Nowadays, multicore processors and graphics cards are commodity hardware found in personal computers, and both CPUs and GPUs are capable of high-end computation. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method, as an example of fine-grained parallelism, and Bondeli’s algorithm, […]
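For context, cyclic reduction repeatedly eliminates every other equation of the tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], which is what exposes its fine-grained parallelism. Below is a minimal sketch of one forward-reduction step; the kernel name, in-place layout and launch assumptions are illustrative and not the implementation compared in the paper.

// Hypothetical sketch of one cyclic-reduction forward step (not the paper's code).
// Rows read at distance `stride` are not written in this step, so updating in
// place is race-free. Launch with roughly n/(2*stride) threads; the guard drops extras.
__global__ void cr_forward_step(float* a, float* b, float* c, float* d,
                                int n, int stride)
{
    // Rows updated in this step: i = 2*stride-1, 4*stride-1, ...
    int i = (blockIdx.x * blockDim.x + threadIdx.x + 1) * 2 * stride - 1;
    if (i >= n) return;
    int lo = i - stride;                        // left neighbour always exists here
    int hi = i + stride;                        // right neighbour may fall off the end
    float k1 = a[i] / b[lo];
    float k2 = (hi < n) ? c[i] / b[hi] : 0.0f;
    a[i] = -a[lo] * k1;
    b[i] = b[i] - c[lo] * k1 - ((hi < n) ? a[hi] * k2 : 0.0f);
    c[i] = (hi < n) ? -c[hi] * k2 : 0.0f;
    d[i] = d[i] - d[lo] * k1 - ((hi < n) ? d[hi] * k2 : 0.0f);
}

After log2(n) such steps a single equation remains, and a symmetric back-substitution phase recovers the remaining unknowns.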
Jan, 5
Implementation of a Fast Image Coding and Retrieval System Using a GPU
Sparse coding of image patches is a compact but computationally expensive method of representing images. As part of our SenSIP consortium industry projects, we implement the Orthogonal Matching Pursuit algorithm as a single CUDA kernel on a GPU, so that sparse codes for image patches are obtained in parallel. Image-based "exact search" and "visually similar search" […]
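The costliest part of each Orthogonal Matching Pursuit iteration is correlating the current residual with every dictionary atom before selecting the best one. The sketch below shows only that step, assuming a column-major dictionary and one thread per atom; it is an illustration, not the single-kernel design described in the paper.

// Hypothetical OMP atom-selection step: score[j] = |<D[:,j], r>| for each atom j.
// D is m x k, stored column-major; the host (or a reduction kernel) then picks
// the argmax, solves the small least-squares problem and updates the residual r.
__global__ void omp_correlate(const float* D, const float* r, float* score,
                              int m, int k)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= k) return;
    float dot = 0.0f;
    for (int i = 0; i < m; ++i)
        dot += D[j * m + i] * r[i];
    score[j] = fabsf(dot);
}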
Jan, 5
Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA
PURPOSE: List-mode processing is an efficient way of dealing with the sparse nature of positron emission tomography (PET) data sets and is the processing method of choice for time-of-flight (ToF) PET image reconstruction. However, the massive amount of computation involved in forward projection and backprojection limits the application of list-mode reconstruction in practice, and makes […]
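To give a feel for the per-event work involved, the sketch below backprojects one list-mode event per thread by stepping along its line of response and weighting each sample with a Gaussian time-of-flight kernel. The event layout, step size and ToF sigma are illustrative assumptions, not the authors' projector.

// Hypothetical ToF-weighted backprojection: one thread per list-mode event.
// Each event stores its LOR endpoints (x1,y1,z1)-(x2,y2,z2) and a ToF offset
// along the line, in mm from the LOR midpoint. Coordinates are assumed to be
// in mm with the origin at the corner of the image volume.
struct Event { float x1, y1, z1, x2, y2, z2, tof_mm; };

__global__ void tof_backproject(const Event* ev, int n_events, float* image,
                                int nx, int ny, int nz, float voxel_mm,
                                float sigma_mm, float step_mm)
{
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= n_events) return;
    Event v = ev[e];
    float dx = v.x2 - v.x1, dy = v.y2 - v.y1, dz = v.z2 - v.z1;
    float len = sqrtf(dx * dx + dy * dy + dz * dz);
    int nsteps = (int)(len / step_mm);
    for (int s = 0; s < nsteps; ++s) {
        float t = (s + 0.5f) / nsteps;             // position along the LOR in [0,1]
        float d = (t - 0.5f) * len - v.tof_mm;     // distance from the ToF centre
        float w = expf(-0.5f * d * d / (sigma_mm * sigma_mm));
        int ix = (int)((v.x1 + t * dx) / voxel_mm);
        int iy = (int)((v.y1 + t * dy) / voxel_mm);
        int iz = (int)((v.z1 + t * dz) / voxel_mm);
        if (ix < 0 || iy < 0 || iz < 0 || ix >= nx || iy >= ny || iz >= nz) continue;
        atomicAdd(&image[(iz * ny + iy) * nx + ix], w);   // scatter into the image
    }
}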
Jan, 5
BFROST: Binary Features from Robust Orientation Segment Tests accelerated on the GPU
We propose a fast local image feature detector and descriptor that is implementable on the GPU. Our method is the first GPU implementation of the popular FAST detector. We also propose a simple but novel method of feature orientation estimation that can be computed in constant time. The robustness and reliability of our orientation estimation is […]
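For context, the segment test at the heart of FAST compares 16 pixels on a Bresenham circle of radius 3 against the centre pixel. Below is a simplified, single-threshold sketch with one thread per pixel and a contiguity requirement of 9; it illustrates the test only and is neither BFROST's implementation nor its orientation estimator.

// Simplified FAST-style segment test (illustrative only).
// A pixel is a corner if at least 9 contiguous circle pixels are all brighter
// than centre + t or all darker than centre - t.
__constant__ int CIRCLE_X[16] = { 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3,-3,-3,-2,-1};
__constant__ int CIRCLE_Y[16] = {-3,-3,-2,-1, 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3};

__global__ void fast_detect(const unsigned char* img, unsigned char* corner,
                            int width, int height, int t)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 3 || y < 3 || x >= width - 3 || y >= height - 3) return;
    int c = img[y * width + x];
    int run_bright = 0, run_dark = 0, best_bright = 0, best_dark = 0;
    // Walk the circle twice so a contiguous run may wrap around index 0.
    for (int k = 0; k < 32; ++k) {
        int p = img[(y + CIRCLE_Y[k & 15]) * width + (x + CIRCLE_X[k & 15])];
        run_bright = (p > c + t) ? run_bright + 1 : 0;
        run_dark   = (p < c - t) ? run_dark + 1 : 0;
        best_bright = max(best_bright, run_bright);
        best_dark   = max(best_dark, run_dark);
    }
    corner[y * width + x] = (best_bright >= 9 || best_dark >= 9) ? 255 : 0;
}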
Jan, 5
A Parallel Supercomputer Implementation of a Biologically Inspired Neural Network and Its Use for Pattern Recognition
A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding-by-synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for two implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA) running on clusters of multicore supercomputers and […]
Jan, 5
Implementation of Keccak hash function in Tree hashing mode on Nvidia GPU
This paper presents a Graphics Processing Unit implementation of the KECCAK cryptographic hash function in a parallel tree hashing mode, to exploit the parallel compute capacity of graphics cards. The NVIDIA CUDA language is used to exploit the specific features of the GPU hardware (memory hierarchy, host-device memory transfers). After optimizations of the cooperation […]
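As a structural illustration only, the sketch below shows the shape of a parallel tree-hashing mode: each thread hashes one fixed-size leaf of the message, and the leaf digests are combined afterwards into a root digest. The leaf_mix() function is a toy placeholder and is not Keccak; the paper's kernels, block sizes and memory-transfer optimizations are not reproduced here.

// Skeleton of a parallel tree-hashing mode (placeholder leaf hash, NOT Keccak).
__device__ unsigned int leaf_mix(const unsigned char* data, int len)
{
    unsigned int h = 2166136261u;                 // FNV-1a style toy mixer
    for (int i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

__global__ void hash_leaves(const unsigned char* msg, int msg_len, int leaf_size,
                            unsigned int* leaf_digest, int n_leaves)
{
    int leaf = blockIdx.x * blockDim.x + threadIdx.x;
    if (leaf >= n_leaves) return;
    int off = leaf * leaf_size;
    int len = min(leaf_size, msg_len - off);      // last leaf may be shorter
    leaf_digest[leaf] = leaf_mix(msg + off, len);
    // The host (or a second kernel) then hashes the concatenated leaf digests
    // to obtain the root of the tree.
}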
Jan, 5
Pyramidal Image Blending Using CUDA Framework
We propose and implement a pyramidal image blending algorithm using modern programmable graphics processing units. This algorithm is an essential part of an image stitching process for producing a seamless panoramic mosaic. The CUDA framework is a novel GPU programming framework from NVIDIA. We achieve significant acceleration in the computation of the pyramidal image blending algorithm by […]
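At each pyramid level the blend itself is an independent per-pixel operation, which is what makes it a good GPU fit. A minimal sketch of that per-level step is below; the names and layout are assumptions, not the authors' code.

// Hypothetical per-level blend of two Laplacian pyramid levels A and B using a
// Gaussian-pyramid mask M (all the same size at this level): out = M*A + (1-M)*B.
__global__ void blend_level(const float* A, const float* B, const float* M,
                            float* out, int n_pixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_pixels) return;
    out[i] = M[i] * A[i] + (1.0f - M[i]) * B[i];
}

The blended levels are then collapsed back into a single full-resolution image to produce the seamless mosaic.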
Jan, 5
Abundance Estimation Algorithms using NVIDIA CUDA Technology
Spectral unmixing of hyperspectral images is the process by which the constituent members of a mixed pixel are determined and the fractional abundance of each element is estimated. Several algorithms have been developed in the past to obtain abundance estimates from hyperspectral data; however, most of them are characterized by being […]
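Under the linear mixing model x ≈ E a (x the pixel spectrum, E the endmember matrix, a the abundances), a common unconstrained estimator is a = (E^T E)^{-1} E^T x, which is embarrassingly parallel over pixels once the pseudoinverse is precomputed. The sketch below assumes exactly that layout; it is an illustration and not necessarily one of the algorithms evaluated in the paper.

// Hypothetical unconstrained least-squares unmixing: one thread per pixel.
// P is the p x bands pseudoinverse (E^T E)^{-1} E^T, precomputed on the host;
// x is stored pixel-major (n_pixels x bands); a is n_pixels x p abundances.
__global__ void estimate_abundances(const float* P, const float* x, float* a,
                                     int n_pixels, int bands, int p)
{
    int pix = blockIdx.x * blockDim.x + threadIdx.x;
    if (pix >= n_pixels) return;
    for (int e = 0; e < p; ++e) {
        float acc = 0.0f;
        for (int b = 0; b < bands; ++b)
            acc += P[e * bands + b] * x[pix * bands + b];
        a[pix * p + e] = acc;                    // abundance of endmember e in this pixel
    }
}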
Jan, 4
Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster
We propose a method to parallelize the training of a convolutional neural network using a CUDA-based cluster, and attain a substantial increase in the performance of the algorithm itself. We investigate the feasibility of batch versus online training and provide a performance comparison between them. Furthermore, we propose an implementation of an […]
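Batch-mode training maps naturally onto a cluster as data parallelism: each node computes gradients for its shard of the batch, and the gradients are averaged before the weight update. The sketch below shows that generic pattern with MPI_Allreduce; it is an assumption about the general approach, not the paper's implementation, and it omits the CUDA kernels that actually compute the gradients.

// Generic data-parallel averaging of per-node gradients (illustrative only).
// Each rank has computed `grad` (already copied back from its GPU) for its shard.
#include <mpi.h>

void average_gradients(float* grad, int n_params, int n_ranks)
{
    // Sum gradients across all ranks in place, then divide by the rank count.
    MPI_Allreduce(MPI_IN_PLACE, grad, n_params, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    for (int i = 0; i < n_params; ++i)
        grad[i] /= (float)n_ranks;
}

Online (per-sample) training offers no such independent work per step, which is one reason batch-versus-online feasibility matters on a cluster.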
Jan, 4
Implementing Parallel SMO to Train SVM on CUDA-Enabled Systems
We implement a Sequential Minimal Optimization type algorithm to solve for the Lagrangian weights of the dual form of the Support Vector Machine problem. Unlike the original SMO algorithm, the modified SMO algorithm uses a first-order variable selection heuristic to avoid explicit computation of the KKT conditions. Parallelism in the algorithm is exposed via a […]
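The first-order heuristic mentioned here is commonly realized as maximal-violating-pair selection over the gradient of the dual objective. The host-side sketch below shows that selection rule only; the names and the plain loop (which the GPU version would replace with a parallel reduction) are illustrative assumptions, not the authors' code.

// Illustrative first-order (maximal-violating-pair) working-set selection for SMO.
// G is the gradient of the dual objective, y the labels (+1/-1), alpha the duals.
void select_working_pair(const float* G, const float* y, const float* alpha,
                         int n, float C, int* i_up, int* j_low)
{
    float best_up = -1e30f, best_low = 1e30f;
    *i_up = -1; *j_low = -1;
    for (int t = 0; t < n; ++t) {
        float v = -y[t] * G[t];
        bool in_up  = (y[t] > 0.0f && alpha[t] < C) || (y[t] < 0.0f && alpha[t] > 0.0f);
        bool in_low = (y[t] < 0.0f && alpha[t] < C) || (y[t] > 0.0f && alpha[t] > 0.0f);
        if (in_up  && v > best_up)  { best_up  = v; *i_up  = t; }
        if (in_low && v < best_low) { best_low = v; *j_low = t; }
    }
    // (i_up, j_low) is the pair that most violates optimality; training stops
    // when best_up - best_low falls below a small tolerance.
}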
Jan, 4
Task and Data Distribution in Hybrid Parallel Systems
This paper describes my work with the Operating Systems and Middleware group for the HPI Research School on "Service-Oriented Systems Engineering". Computer architecture is shifting, and the upper levels of the software stack must therefore be adapted to benefit from current and future hardware capabilities. In this paper, we present the Hybrid.Parallel […]