high performance computing on graphics processing units: hgpu.org

Posts

Oct, 12

Neural Network Computing Using On-Chip Accelerators

The use of neural networks, machine learning, or artificial intelligence, in its broadest and most controversial sense, has been a tumultuous journey involving three distinct hype cycles and a history dating back to the 1960s. Resurgent, enthusiastic interest in machine learning and its applications bolsters the case for machine learning as a fundamental computational kernel. […]

Oct, 12

Portage: Bringing Hackers’ Wisdom to Science

Providing users of HPC systems with a wide variety of up-to-date software packages is a challenging task. Large software stacks built from source are difficult to manage, requiring powerful package management tools. The Portage package manager from Gentoo is a highly flexible tool that offers a mature solution to this otherwise daunting task. […]

Oct, 12

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory […]

CUDA

Oct, 9

International Conference on Digital Signal Processing (ICDSP), 2017

For papers submitted to ICDSP 2017, we offer the following publication options: 1. Publication in proceedings, which will be indexed by EI Compendex, Scopus, and ISI CPCS. 2. Publication in the International Journal of Signal Processing Systems, which will be indexed by EI (INSPEC, IET), Google Scholar, etc. There are two methods for submitting […]

Oct, 9

6th International Conference on Frontiers of Information Technology (ICFIT), 2017

For papers submitted to ICFIT 2017, we offer the following publication options: 1. Publication in Proceedings. Submissions will be peer reviewed by conference committees, and accepted papers will be published in proceedings, which will be indexed by EI Compendex, Scopus, and ISI CPCS. 2. Publication in Journal. Submissions will be reviewed by the conference committees […]

Oct, 9

6th International Conference on Software and Computing Technologies (ICSCT), 2017

For papers submitted to ICSCT 2017, we offer the following publication options: 1. Publication in Proceedings. Submissions will be peer reviewed by conference committees, and accepted papers will be published in proceedings, which will be indexed by EI Compendex, Scopus, and ISI CPCS. 2. Publication in Journal. Submissions will be reviewed by the conference committees […]

Oct, 9

2nd IEEE International Conference on Signal and Image Processing (ICSIP), 2017

1. Publication: After a careful review process, all accepted papers, following proper registration and presentation, will be published in the conference proceedings by IEEE and sent for review by the IEEE Conference Publication Program for IEEE Xplore and Ei Compendex. 2. Submission Methods: Electronic Submission System (.pdf) https://www.easychair.org/conferences/?conf=icsip2017

Oct, 8

Implementation of Frequency Domain Convolution for the Caffe-Framework

Deep Convolutional Neural Networks have received a lot of attention over the past few years as a promising technique for object classification in images. In this thesis, we implemented the frequency domain convolution for the popular Caffe framework. Deep Convolutional Neural Networks suffer from long training times even on contemporary hardware, which we want to […]

CUDA

Oct, 8

GPU Concurrency Choices in Graph Analytics

Graph analytics is becoming ever more ubiquitous in today’s world. However, situational dynamic changes in input graphs, such as changes in traffic and weather patterns, lead to variations in concurrency. Moreover, graph algorithms are known to have data-dependent loops and fine-grain synchronization that make them hard to scale on parallel machines. Recent trends in […]

OpenCL

Oct, 8

BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images

In cryo-electron microscopy (EM), molecular structures are determined from large numbers of projection images of individual particles. To harness the full power of this single-molecule information, we use the Bayesian inference of EM (BioEM) formalism. By ranking structural models using posterior probabilities calculated for individual images, BioEM in principle addresses the challenge of working with […]

CUDA

Oct, 8

Rinnegan: Efficient Resource Use in Heterogeneous Architectures

Current processors provide a variety of different processing units to improve performance and power efficiency. For example, ARM’s big.LITTLE, AMD’s APUs, and Oracle’s M7 provide heterogeneous processors, on-die GPUs, and on-die accelerators. However, the performance experienced by programs using these processing units can vary widely due to contention from multiprogramming, thermal constraints and other issues. […]

OpenCL

Oct, 8

A Runtime Controller for OpenCL Applications on Heterogeneous System Architectures

Heterogeneous architectures are becoming very attractive in the embedded and mobile markets because they make it possible to pick the best computational resource and so optimize the performance-per-watt figure of merit. Unfortunately, deciding the right resource to use and its operating frequency is a difficult problem that depends on the actual conditions in which […]

OpenCL

