high performance computing on graphics processing units: hgpu.org

Posts

Mar, 15

Melia: A MapReduce Framework on OpenCL-based FPGAs

MapReduce, originally developed by Google for search applications, has recently become a popular programming framework for parallel and distributed environments. This paper presents an energy-efficient architecture design for MapReduce on Field Programmable Gate Arrays (FPGAs). The major goal is to enable users to program FPGAs with simple MapReduce interfaces, and meanwhile to embrace automatic performance […]

OpenCL

Mar, 15

Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with massively multicore and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity for accelerating MF much further when appropriately exploiting the GPU architectural characteristics. This paper presents cuMF, a CUDA-based matrix factorization library that implements memory-optimized […]

CUDA

Mar, 14

2nd IEEE International Conference on Computer and Communications (ICCC), 2016

Submission Date: Before July 1. History: Good news! All papers from ICCC 2015 have been included in IEEE Xplore. Supported by: ICCC 2016 is hosted by IEEE and Sichuan Institute of Electronics, co-organized by Southwest Jiaotong University and Xihua University. Publication: All accepted papers must be written in English and will be published into conference […]

Mar, 14

The First Int. Conference on Multimedia and Image Processing (ICMIP), 2016

ICMIP 2016 is organized by University of Brunei Darussalam, Brunei Darussalam. Publication: After a careful reviewing process, all accepted papers will be published in the Conference Proceedings and sent to be reviewed by EI Compendex. Invited speakers from prestigious international universities: Prof. Amine Bermak, IEEE Fellow, Hong Kong University of Science and Technology, Hong Kong […]

Mar, 14

6th Int. Workshop on Computer Science and Engineering (WCSE), 2016

All accepted papers of WCSE 2016 will be published in the conference proceedings, which will be indexed by EI and Scopus. Keynote & Plenary Speakers: Prof. Hayato Ohwada, Tokyo University of Science, Japan; Prof. Taku Harada, Tokyo University of Science, Japan; Prof. Akiko Aizawa, National Institute of Informatics, Japan; Prof. Hiroyuki Nishiyama, Tokyo University of Science, Japan. Conference Program […]

Mar, 12

Machine Learning at the Limit

Many systems have been developed for machine learning at scale. Performance has steadily improved, but there has been relatively little work on explicitly defining or approaching the limits of performance. In this paper we describe the application of roofline design, an approach borrowed from computer architecture, to large-scale machine learning. In roofline design, one exposes […]

CUDA

Mar, 12

SGO: An ultrafast engine for atomic structure global optimization by differential evolution

This paper presents a fast method for global search of atomic structures. The structures global optimization (SGO) engine consists of a high-efficiency differential evolution algorithm, accelerated local relaxation methods and an ultrafast density functional theory plane-wave code run on GPU machines. It can search the global minimum configuration of crystals, two-dimensional materials and quantum clusters […]

Mar, 12

A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the […]

Mar, 12

Clinically applicable Monte Carlo-based biological dose optimization for the treatment of head and neck cancers with spot-scanning proton therapy

Purpose: To demonstrate the feasibility of fast Monte Carlo (MC) based inverse biological planning for the treatment of head and neck tumors in spot-scanning proton therapy. Methods: Recently, a fast and accurate Graphics Processor Unit (GPU)-based MC simulation of proton transport was developed and used as the dose calculation engine in a GPU-accelerated IMPT optimizer. […]

Mar, 12

Automatic and Explicit Parallelization Approaches for Mathematical Simulation Models

The move from single core and processor systems to multi-core and many-processors systems comes with the requirement of implementing computations in a way that can utilize these multiple units efficiently. This task of writing efficient multi-threaded algorithms will not be possible without improving programming languages and compilers to provide the mechanisms to do so. Computer aided mathematical modeling […]

OpenCL

Mar, 10

Accelerating Wright-Fisher Forward Simulations on the Graphics Processing Unit

Forward Wright-Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the CPU, thus limiting their usefulness. The single-locus Wright-Fisher forward algorithm is, however, exceedingly parallelizable, with many steps which are so-called embarrassingly parallel, consisting of a vast number of individual computations that are all […]

CUDA

Mar, 10

Automatic Data Layout Generation and Kernel Mapping for CPU+GPU Architectures

The ubiquity of hybrid CPU+GPU architectures has led to renewed interest in automatic data layout generation owing to the fact that data layouts have a large impact on performance, and that different data layouts yield the best performance on CPUs vs. GPUs. Unfortunately, current programming models still fail to provide an effective solution to the […]

OpenCL


