high performance computing on graphics processing units: hgpu.org

Posts

Mar, 3

Increasing predictability of GPU’s

GPUs are massively multicore architectures managing several thousand concurrent threads. This concurrency, maintained through several schedulers, is necessary to sustain high performance but negatively impacts predictability. In this work, we first propose measures of predictability as well as CUDA tests to estimate these measures for the warp and block schedulers on architectures from G80 to […]

CUDA

Mar, 1

Applications of Linux-Based QT-CUDA Parallel Architecture

Joint programming of QT and CUDA is an urgent problem on Linux; to address it, a Linux-based QT-CUDA parallel architecture has been built. As an example, a fast parallel rendering algorithm for seismic and GPR imaging is proposed and implemented on the Linux QT-CUDA parallel architecture. It is shown that the parallel rendering algorithm is about […]

CUDA • OpenGL

Mar, 1

Reducing Beamforming Calculation Time with GPU Accelerated Algorithms

Beamforming algorithms place high demands on computer hardware, and computation time is an important factor in assessing this method. This paper describes techniques for optimizing the implementation of beamforming algorithms with regard to calculation time. The main focus is on using the Graphics Processing Unit to accelerate beamforming. After a brief […]

CUDA • OpenCL

Mar, 1

CPU-GPU Collaboration for Output Quality Monitoring

In this paper, we propose a new low-overhead collaborative technique for output quality monitoring in approximate computing on GPUs. In this technique, the CPU is responsible for quality monitoring while the GPU executes approximate kernels. For two image processing applications, we show that this technique outperforms previous quality monitoring techniques.

CUDA • OpenCL

Mar, 1

A Multi GPU Read Alignment Algorithm with Model-based Performance Optimization

This paper describes a performance model for the read alignment problem, one of the most computationally intensive tasks in bioinformatics. We adapted a Burrows-Wheeler transform based index for use with GPUs to reduce the overall memory footprint. A mathematical model of computation and communication costs was developed to find the optimal memory partitioning for the index and queries. […]

CUDA

Mar, 1

Comparison of Hybrid Sorting Algorithms Implemented on Different Parallel Hardware Platforms

Sorting is a common problem in computer science. There are many well-known sorting algorithms designed for sequential execution on a single processor. Recently, hardware platforms have made it possible to create highly parallel algorithms. Standard processors consist of multiple cores, and hardware accelerators such as GPUs are available. Graphics cards, with their parallel architecture, offer new possibilities […]

CUDA

Feb, 28

2014 3rd International Conference on Computer Technology and Science, ICCTS 2014

All papers for the ICCTS 2014 will be published in the IJCEE (ISSN: 1793-8163) as one volume, and will be indexed by Ulrich’s Periodicals Directory, Google Scholar, EBSCO, Engineering & Technology Digital Library, Crossref, ProQuest, DOAJ and EI (INSPEC, IET) and Electronic Journals Library. 2014-04-05 Algorithms Artificial Intelligence Automated Software Engineering Bio-informatics Biomedical Engineering Compilers […]

Feb, 28

Extending a Run-time Resource Management framework to support OpenCL and Heterogeneous Systems

From mobile to High-Performance Computing (HPC) systems, performance and energy efficiency are becoming increasingly challenging requirements. In this regard, heterogeneous systems, composed of a general-purpose processor and one or more hardware accelerators, are emerging as affordable solutions. However, the effective exploitation of such platforms requires specific programming languages, such as OpenCL, and suitable […]

OpenCL

Feb, 28

Expanding the VPE-qGM Environment Towards a Parallel Quantum Simulation of Quantum Processes Using GPUs

Quantum computing proposes quantum algorithms exponentially faster than their classical analogues when executed by a quantum computer. As quantum computers are currently unavailable for general use, one approach for analyzing the behavior and results of such algorithms is the simulation using classical computers. As this simulation is inefficient due to the exponential growth of the […]

CUDA

Feb, 28

A high performance computing for AOM stock trading order matching using GPU

Matching trading orders in financial markets is very challenging because of the speed at which requests arrive. In this paper, GPU technology and CUDA programming are explored as a potential means of accelerating this task. The trading method in Automatic Order Matching (AOM) of the Stock Exchange of Thailand (SET) […]

CUDA

Feb, 28

Performance Assessment of A Multi-block Incompressible Navier-Stokes Solver using Directive-based GPU Programming in a Cluster Environment

OpenACC, a directive-based GPU programming standard, is emerging as a promising technology for massively parallel accelerators, such as General-purpose computing on graphics processing units (GPGPU), Accelerated Processing Unit (APU) and Many Integrated Core Architecture (MIC). The heterogeneous nature of these accelerators calls for careful design of parallel algorithms and data management, which imposes a great hurdle […]

CUDA

Feb, 28

Heterogenous Acceleration for Linear Algebra in Multi-Coprocessor Environments

We present an efficient and scalable programming model for the development of linear algebra in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Examples are given as the basis for solving linear systems’ algorithms – the LU, QR, and […]


Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
