high performance computing on graphics processing units: hgpu.org

Posts

Mar, 25

A development of an accelerator board dedicated for multi-precision arithmetic operations and its application to Feynman loop integrals II

Evaluation of a wide variety of Feynman diagrams with multi-loop integrals and physical parameters and its comparison with high energy experiments are expected to investigate new physics beyond the Standard Model. We have been developing a direct computation method of multi-loop integrals of Feynman diagrams. One of features of our method is that we adopt […]

OpenCL

Mar, 25

MALBEC: a new CUDA-C ray-tracer in General Relativity

A new CUDA-C code for tracing orbits around non-charged black holes is presented. This code is named MALBEC, and take advantage of the graphic processing units and the CUDA platform in order to track the geodesic motion of null and timelike test particles in Schwarzschild and Kerr. Additionally, a new general set of equations that […]

CUDA

Mar, 25

Accelerating CNN inference on FPGAs: A Survey

Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for dedicated and tailored hardware support methods. Moreover, CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as […]

OpenCL

Mar, 22

The VOLNA-OP2 Tsunami Code (Version 1.0)

In this paper, we present the VOLNA-OP2 tsunami model and implementation; a finite volume non-linear shallow water equations (NSWE) solver built on the OP2 domain specific language for unstructured mesh computations. VOLNA-OP2 is unique among tsunami solvers in its support for several high performance computing platforms: CPUs, the Intel Xeon Phi, and GPUs. This is […]

CUDA

Mar, 22

FPGA in HPC: High Level Synthesys of OpenCL kernels for Molecular Dynamics

The overall goal of this thesis is to evaluate the feasibility of FPGA based computer system in HPC. This works is performed within ExaNeSt, an EU funded project which aims to develop and prototype energy efficient solutions for the production of exascale-level supercomputers. As the matter of fact, the current computer architectures need to be […]

OpenCL

Mar, 22

A multi-agent architecture for scheduling of high performance services in a GPU cluster

Nowadays, clusters containing multiple GPU nodes are widely used to execute high-performance computing applications. Diverse disciplines use these clusters to improve the performance of several services that consume high computational resources. The challenge of executing high-performance computing applications becomes harder when the applications are executed concurrently and each one of them may demand multiple GPU […]

Mar, 22

TBD: Benchmarking and Analyzing Deep Neural Network Training

The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference — i.e. how to efficiently execute already trained models and (ii) image classification networks as the primary benchmark for evaluation. Our primary […]

CUDA

Mar, 22

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use

Graphics Processing Units (GPUs) perform the majority of computations in state-of-the-art supercomputers. Programming these GPUs is often assisted using a programming model such as (amongst others) the directive-driven OpenACC. Unfortunately, OpenACC (and other similar models) are incapable of automatically targeting and distributing work across several GPUs, which decreases productivity and forces needless manual labor upon […]

Mar, 18

International Conference on Biomedicine & Pharmacotherapy, 2018

International Conference on Biomedicine & Pharmacotherapy is going to be held during August 06-07, 2018 in Osaka, Japan. The conferences focuses on foremost topics such as Biomedicine, Biomedical Statistics, Biomedical Diagnosis, Frontiers in Biomedicine, Industrial Pharmacy, Pharmacotherapy, Molecular Biomedicine, Computational Biomedicine, Tissue Engineering, Medical Devices, Biomedical Model, Personalized Medicine, Biomedical Technology, Nanotechnology, Pharmacotherapy, Pharmaceutical Sciences, […]

Mar, 18

8th International Workshop on Computer Science and Engineering (WCSE’18), 2018

Meeting time：June 28-30, 2018 Meeting place：1880 New Petchburi Road, Bangkok 10310 Thailand Organized by Science and Engineering Institute, co organized by Bauman Moscow State technical University, Russia, Tokyo University of Science, Japan and China Agricultural University, 2018 the 8th International Workshop on Computer Science and Engineering (WCSE 2018) to Bangkok, Thailand during June 28-30, 2018. […]

Mar, 18

International Conference on Robotics, Artificial Intelligence, Automation and Mechatronics, 2018

Mechatronics and Robotics 2018 warmly welcome all the researchers, developers, experts, students from the field of mechatronics & robotics to attend International Conference on Mechatronics & Robotics during October 15-16, 2018, Helsinki, Finland. The Conference will be composed around the theme “Unfolding Knowledge with a Delineate Technical World". Sessions covered as mentioned below: 1. Mechatronics […]

Mar, 18

The 3rd IEEE International Conference on Image, Vision and Computing (ICIVC), 2018

Meeting time: June 27-29, 2018 Meeting place: No.2 Chong Wen Road, Nan An District, Chongqing 400065, P. R. China Keynote speakers Prof. Lap-Pui Chau, Nanyang Technological University, Singapore (IEEE Fellow) Prof. Shahram Latifi, UNIVERSITY OF NEVADA, USA (IEEE Fellow) Published by: All accepted papers must be written in English and will be published into conference proceedings and indexed […]

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

* * *

high performance computing on graphics processing units: hgpu.org

Posts

A development of an accelerator board dedicated for multi-precision arithmetic operations and its application to Feynman loop integrals II

MALBEC: a new CUDA-C ray-tracer in General Relativity

Accelerating CNN inference on FPGAs: A Survey

The VOLNA-OP2 Tsunami Code (Version 1.0)

FPGA in HPC: High Level Synthesys of OpenCL kernels for Molecular Dynamics

A multi-agent architecture for scheduling of high performance services in a GPU cluster

TBD: Benchmarking and Analyzing Deep Neural Network Training

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use

International Conference on Biomedicine & Pharmacotherapy, 2018

8th International Workshop on Computer Science and Engineering (WCSE’18), 2018

International Conference on Robotics, Artificial Intelligence, Automation and Mechatronics, 2018

The 3rd IEEE International Conference on Image, Vision and Computing (ICIVC), 2018

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)