high performance computing on graphics processing units: hgpu.org

Posts

Jul, 5

High Performance Computing and Cluster Technologies Conference (HPCCT), 2016

Publication: All the accepted papers will be published in the HPCCT 2016 conference Proceedings, All the accepted papers will be published in the HPCCT 2016 conference Proceedings, and which will be indexed by Ei Compendex. Conference Schedule: December 17th, 2016 : Registration, collecting conference materials December 18th, 2016 : Opening Remarks & Keynote Speeches Oral […]

Jul, 5

International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence (ISMSI), 2017

Publication: All the accepted papers will be published in the ISMSI 2017 conference Proceedings,and reviewed by the IEEE Conference Publication Program for IEEE Xplore and Ei Compendex. Contact: Ms.Yvonne Miller Email:sub@ismsi.org

Jul, 5

4th International Conference on Mechatronics, Electronics and Automation Engineering (ICMEAE), 2017

Publication: All accepted papers of ICMEAE 2017 will be published in ICMEAE 2017 Conference Proceedings,which will be indexed by Ei Compendex, Inspec, DOAJ, CPCI (Web of Science) and Scopus. Conference Venue: Chulalongkorn University, Thailand Conference Schedule: April 1st, 2017: Registration and Collecting conference materials April 2nd, 2017: Keynote speeches and Oral presentation April 3rd, 2017: […]

Jul, 5

IEEE 2nd International Conference on Control and Robotics Engineering (ICCRE), 2017

Publication: All submissions will be peer reviewed, and all the accepted papers will be published in the ICCRE 2017 conference Proceedings, and reviewed by the IEEE Conference Publication Program for IEEE Xplore and Ei Compendex. ICCRE 2016 have been included in the IEEE Xplore! Conference Schedule: April 1, 2017: Registration and Collecting conference materials April […]

Jul, 5

Accelerated cryo-EM structure determination with parallelisation using GPUs in relion-2

By reaching near-atomic resolution for a wide range of specimens, single-particle cryo-EM structure determination is transforming structural biology. However, the necessary calculations come at increased computational costs, introducing a bottleneck that is currently limiting throughput and the development of new methods. Here, we present an implementation of the relion image processing software that uses graphics […]

CUDA

Jul, 5

An Adaptive Multi-Spline Refinement Algorithm in Simulation Based Sailboat Trajectory Optimization Using Onboard Multi-Core Computer Systems

A new dynamic programming based parallel algorithm adapted to on-board heterogeneous computers for simulation based trajectory optimization is studied in the context of "high-performance sailing". The algorithm uses a new discrete space of continuously differentiable functions called the multi-splines as its search space representation. A basic version of the algorithm is presented in detail (pseudo-code, […]

OpenCL

Jul, 5

A Test Drive of the NVIDIA Jetson TX1 Developer Kit for Deep Learning and Computer Vision Applications

The Jetson TX1 module is NVIDIA’s latest processor system-on-module for embedded applications, based on the Tegra X1 chip. The Jetson TX1 Developer Kit is a low-cost, feature-rich development kit based on the Jetson TX1 module. BDTI, a technology analysis firm, used the Jetson TX1 Developer Kit to develop a deep-learning-based computer vision application-a camera that […]

CUDA

Jul, 5

Time Predictability of GPU Kernel on an HSA Compliant Platform

During recent years, the importance of utilizing more computational power in smaller computer systems has increased. The utilization of more computational power in smaller packages, the ability to combine more than one type of processor unit has become more popular in the industry. By combining, one achieves more power efficiency as well as gain more […]

OpenCL

Jul, 5

OpenCL Implementation of a Parallel Universal Kriging Algorithm for Massive Spatial Data Interpolation on Heterogeneous Systems

In some digital Earth engineering applications, spatial interpolation algorithms are required to process and analyze large amounts of data. Due to its powerful computing capacity, heterogeneous computing has been used in many applications for data processing in various fields. In this study, we explore the design and implementation of a parallel universal kriging spatial interpolation […]

OpenCL

Jun, 30

GPRM: a high performance programming framework for manycore processors

Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient […]

Jun, 30

Compiler-Assisted Workload Consolidation For Efficient Dynamic Parallelism on GPU

GPUs have been widely used to accelerate computations exhibiting simple patterns of parallelism – such as flat or two-level parallelism – and a degree of parallelism that can be statically determined based on the size of the input dataset. However, the effective use of GPUs for algorithms exhibiting complex patterns of parallelism, possibly known only […]

CUDA

Jun, 30

DeepBE: Learning Deep Binary Encoding for Multi-Label Classification

The track 2 and track 3 of ChaLearn 2016 can be considered as Multi-Label Classification problems. We present a framework of learning deep binary encoding (DeepBE) to deal with multi-label problems by transforming multi-labels to single labels. The transformation of DeepBE is in a hidden pattern, which can be well addressed by deep convolutions neural […]

CUDA

* * *

high performance computing on graphics processing units: hgpu.org

Posts

High Performance Computing and Cluster Technologies Conference (HPCCT), 2016

International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence (ISMSI), 2017

4th International Conference on Mechatronics, Electronics and Automation Engineering (ICMEAE), 2017

IEEE 2nd International Conference on Control and Robotics Engineering (ICCRE), 2017

Accelerated cryo-EM structure determination with parallelisation using GPUs in relion-2

An Adaptive Multi-Spline Refinement Algorithm in Simulation Based Sailboat Trajectory Optimization Using Onboard Multi-Core Computer Systems

A Test Drive of the NVIDIA Jetson TX1 Developer Kit for Deep Learning and Computer Vision Applications

Time Predictability of GPU Kernel on an HSA Compliant Platform

OpenCL Implementation of a Parallel Universal Kriging Algorithm for Massive Spatial Data Interpolation on Heterogeneous Systems

GPRM: a high performance programming framework for manycore processors

Compiler-Assisted Workload Consolidation For Efficient Dynamic Parallelism on GPU

DeepBE: Learning Deep Binary Encoding for Multi-Label Classification

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Most viewed papers (last 30 days)