high performance computing on graphics processing units: hgpu.org

Posts

Aug, 11

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an […]

OpenCL

Aug, 11

9th International Conference on Machine Learning and Computing (ICMLC), 2017

Paper Publication: ICMLC 2017 proceedings will be published in the International Conference Proceedings Series by ACM, archived in the ACM Digital Library, indexed by Ei Compendex and Scopus, and submitted for review to the Thomson Reuters Conference Proceedings Citation Index (ISI Web of Science). Proceedings ISBN: 978-1-4503-4783-9. Submission Methods: You can […]

Aug, 11

International Conference on Bioinformatics and Computing Technologies (ICBCT), 2017

Publication: All papers accepted by this conference will be published in the International Journal of Bioscience, Biochemistry and Bioinformatics (IJBBB) or the International Journal of Machine Learning and Computing (IJMLC), and will be submitted to EI (INSPEC) for inclusion. Submission: Please submit your full paper to icbct@saise.org

Aug, 11

9th International Conference on Computer and Automation Engineering (ICCAE), 2017

Publication: All accepted papers of ICCAE 2017 will be published in the International Conference Proceedings Series by ACM (ISBN: 978-1-4503-4791-4), archived in the ACM Digital Library, indexed by Ei Compendex and Scopus, and submitted for review to the Thomson Reuters Conference Proceedings Citation Index (ISI Web of Science). ICCAE 2016 conference […]

Aug, 8

Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing

Three-dimensional particle-in-cell laser-plasma simulation is an important area of computational physics. Solving state-of-the-art problems requires large-scale simulation on a supercomputer using specialized codes. A growing demand for computational resources inspires research into improving efficiency and co-design for supercomputers based on many-core architectures. This paper presents first performance results of the particle-in-cell plasma simulation code […]

Aug, 8

Accelerating Computational Finance Simulations with OpenCL

Computational finance is a domain where performance is in high demand. Therefore, we investigate the suitability of two families of accelerators for computational finance simulations. Specifically, we use a scenario-based ALM (Asset Liability Management) model and design a suitable OpenCL implementation. We further improve the performance of the application by applying several typical optimization techniques […]

OpenCL

Aug, 8

A Comprehensive Performance Analysis of HSA and OpenCL 2.0

Heterogeneous systems that marry CPUs and GPUs together in a range of configurations are quickly becoming the design paradigm for today’s platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems, the GPU is only treated as an accelerator by the CPU, working as a slave to the CPU master. But […]

OpenCL

Aug, 8

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

A genome-wide association study (GWAS) correlates marker variation with trait variation in a sample of individuals. Each study subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here we assume that subjects are unrelated and collected at random and that trait values are normally distributed or transformed to normality. Over […]

OpenCL

Aug, 8

OpenCL-accelerated object classification in video streams using Spatial Pooler of Hierarchical Temporal Memory

We present a method to classify objects in video streams using a brain-inspired Hierarchical Temporal Memory (HTM) algorithm. Object classification is a challenging task where humans still significantly outperform machine learning algorithms due to their unique capabilities. We have implemented a system which achieves very promising performance in terms of recognition accuracy. Unfortunately, conducting more […]

OpenCL

Aug, 5

Daino: A High-level Framework for Parallel and Efficient AMR on GPUs

Adaptive Mesh Refinement (AMR) methods reduce the computational requirements of problems by increasing resolution only in areas of interest. However, in practice, efficient AMR implementations are difficult because the mesh hierarchy management must be optimized for the underlying hardware. The architectural complexity of GPUs can make efficient AMR particularly challenging in GPU-accelerated supercomputers. This paper […]

CUDA

Aug, 4

Parallel experiments with RARE-BLAS

Numerical reproducibility failures arise in parallel computation because of the non-associativity of floating-point summation. Optimizations on massively parallel systems dynamically modify the floating-point operation order. Hence, numerical results may change from one run to another. We propose to ensure reproducibility by extending as far as possible the IEEE-754 correct rounding property to larger operation sequences. […]
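The non-associativity mentioned in this excerpt can be seen in a few lines of Python. This is only a minimal illustration of the problem, not the RARE-BLAS method: the input values and the helper `sum_in_order` are made up for the example, and `math.fsum` from the standard library stands in as one example of a correctly rounded sum that does not depend on operation order.

```python
import math

def sum_in_order(values):
    """Plain left-to-right floating-point summation (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total

# Illustrative inputs: the exact mathematical sum is 2.0.
values = [1e16, 1.0, -1e16, 1.0]

# Same numbers, two different operation orders.
forward = sum_in_order(values)            # 1.0: the first 1.0 is absorbed by 1e16
reordered = sum_in_order(sorted(values))  # 0.0: both 1.0 terms are absorbed by -1e16

# Correctly rounded sum, independent of the order of the inputs.
exact = math.fsum(values)                 # 2.0

print(forward, reordered, exact)
```

A parallel reduction that changes the order of partial sums from run to run can therefore return different results for identical inputs, which is the reproducibility failure the excerpt describes.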

Aug, 4

RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers with a particular focus on efficient training of recurrent neural network topologies on multiple GPUs. The source of the software package is public and freely available for academic research purposes and can be […]

CUDA

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
