high performance computing on graphics processing units: hgpu.org

Posts

Feb, 27

International Conference on Bioinformatics and Computational Intelligence (ICBCI), 2017

Publication: ICBCI 2017 will be published in Proceedings. Submission Methods: Electronic Submission System (.pdf), http://www.easychair.org/conferences/?conf=icbci2017. Contacts: Ms. Ada R. L. Wei; Email: icbci@zhconf.ac.cn; Tel: +86-28-8625-6789 (10 am–12 am, 2 pm–6 pm, Monday to Friday).

Feb, 27

The 2nd International Conference on Network Security (ICNS), 2017

The 2017 2nd International Conference on Network Security (ICNS 2017) will be held in Kunming, China, during December 8-10, 2017. ICNS 2017 will bring together professors, researchers and students in the field of Network Security, making the conference a platform to share experience, foster collaborations across industry and academia, and […]

Feb, 26

Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

OpenCL FPGA has recently gained great popularity with emerging needs for workload acceleration such as Convolutional Neural Network (CNN), which is the most popular deep learning architecture in the domain of computer vision. While OpenCL enhances the code portability and programmability of FPGA, it comes at the expense of performance. The key challenge is to […]

OpenCL

Feb, 26

liquidSVM: A Fast and Versatile SVM package

liquidSVM is a package written in C++ that provides SVM-type solvers for various classification and regression tasks. Because of a fully integrated hyper-parameter selection, very carefully implemented solvers, multi-threading and GPU support, and several built-in data decomposition strategies it provides unprecedented speed for small training sizes as well as for data sets of tens of […]

CUDA

Feb, 26

Is GPGPU CCL worth it? A performance comparison between some GPU and CPU algorithms for solving connected components labeling on binary images

Connected component labeling (CCL) is a traditionally sequential problem that is hard to parallelize. This report aims to test the performance of solving CCL using massively parallel hardware through GPGPU. To achieve this, several CCL algorithms were researched and implemented using C++ and OpenCL. The results showed an improvement of up to a factor of […]

OpenCL

Feb, 26

Large-Scale Stochastic Learning using GPUs

In this work we propose an accelerated stochastic learning system for very large-scale applications. Acceleration is achieved by mapping the training algorithm onto massively parallel processors: we demonstrate a parallel, asynchronous GPU implementation of the widely used stochastic coordinate descent/ascent algorithm that can provide up to 35x speed-up over a sequential CPU implementation. In order […]

CUDA

Feb, 26

First Experiences Optimizing Smith-Waterman on Intel’s Knights Landing Processor

The well-known Smith-Waterman (SW) algorithm is the most commonly used method for local sequence alignments. However, SW is very computationally demanding for large protein databases. There exist several implementations that take advantage of parallel computing on many-cores, FPGAs or GPUs in order to increase the alignment throughput. In this paper, we have explored SW acceleration […]

Feb, 22

Dynamic Buffer Overflow Detection for GPGPUs

Buffer overflows are a common source of program crashes, data corruption, and security problems. In this work, we demonstrate that GPU-based workloads can also cause buffer overflows, a problem that was traditionally ignored because CPUs and GPUs had separate memory spaces. Modern GPUs share virtual, and sometimes physical, memory with CPUs, meaning that GPU-based buffer […]

OpenCL

Feb, 22

MCBooster: a library for fast Monte Carlo generation of phase-space decays on massively parallel platforms

MCBooster is a header-only, C++11-compliant library that provides routines to generate and perform calculations on large samples of phase space Monte Carlo events. To achieve superior performance, MCBooster is capable of performing most of its calculations in parallel using CUDA- and OpenMP-enabled devices. MCBooster is built on top of the Thrust library and runs on […]

CUDA

Feb, 22

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted great attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput than GPU counterparts. In this paper, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy […]

CUDA

Feb, 22

Efficient Large-scale Approximate Nearest Neighbor Search on the GPU

We present a new approach for efficient approximate nearest neighbor (ANN) search in high dimensional spaces, extending the idea of Product Quantization. We propose a two-level product and vector quantization tree that reduces the number of vector comparisons required during tree traversal. Our approach also includes a novel highly parallelizable re-ranking method for candidate vectors […]

CUDA

Feb, 22

Blocking Self-avoiding Walks Stops Cyber-epidemics: A Scalable GPU-based Approach

Cyber-epidemics, the wide spread of fake news or propaganda through social media, can cause devastating economic and political consequences. A common countermeasure against cyber-epidemics is to disable a small subset of suspected social connections or accounts to effectively contain the epidemics. An example is the recent shutdown of 125,000 ISIS-related Twitter accounts. Despite many proposed methods […]

CUDA

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration