high performance computing on graphics processing units: hgpu.org

Posts

Jan, 30

International Conference on Robotics and Automation Engineering (ICRAE), 2016

Call for Papers: robot design, development and control; vehicle control applications; modeling, simulation and architectures; vision, recognition and reconstruction; hybrid dynamical systems; etc. Publication: after a careful review process, all accepted papers will, after proper registration and presentation, be published in the conference proceedings and indexed by EI Compendex. Keynote Speakers: Prof. Dong […]

Jan, 29

GPU-Accelerated Recurrent Neural Networks: OpenCLLink and SymbolicC

The paper presents an application of OpenCLLink in Wolfram Mathematica to accelerate fully recurrent neural networks on the GPU. We also show how parts of the source code can be generated automatically using SymbolicC.

OpenCL

Jan, 29

High-Performance Tensor Contractions for GPUs

We present a computational framework for high-performance tensor contractions on GPUs. High performance is difficult to obtain using existing libraries, especially for many independent contractions where each contraction is very small, e.g., sub-vector/warp in size. However, by using our framework to batch contractions and exploit application specifics, we demonstrate close to peak performance. In particular, to accelerate large […]

CUDA
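To illustrate what batching many tiny, independent contractions means, here is a minimal Python sketch; it is not the authors' framework, and `contract` and `batched_contract` are hypothetical names. The same small contraction C[b][i][k] = Σ_j A[b][i][j]·B[b][j][k] is applied to every batch item. Because the items are independent, one plausible GPU mapping (an assumption here) assigns one thread block or warp per item, so a single kernel launch amortizes overhead across all the small problems.

```python
from typing import List

Matrix = List[List[float]]


def contract(a: Matrix, b: Matrix) -> Matrix:
    """Contract the shared index j: C[i][k] = sum_j a[i][j] * b[j][k]."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][j] * b[j][k] for j in range(m)) for k in range(p)]
            for i in range(n)]


def batched_contract(batch_a: List[Matrix], batch_b: List[Matrix]) -> List[Matrix]:
    """Apply the same small contraction independently to every batch item.

    Each item is independent, so on a GPU the loop body could (hypothetically)
    be mapped to one thread block or warp per item and run in a single launch.
    """
    if len(batch_a) != len(batch_b):
        raise ValueError("batch sizes differ")
    return [contract(a, b) for a, b in zip(batch_a, batch_b)]


if __name__ == "__main__":
    A = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
    B = [[[5.0, 6.0], [7.0, 8.0]], [[2.0, 3.0], [4.0, 5.0]]]
    print(batched_contract(A, B))
```

Keeping the per-item work identical is what makes batching attractive: every item follows the same control flow, which suits SIMT hardware.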

Jan, 29

GPU Based Methods for Interactive Information Visualization of Big Data

Interactive visual analysis has been a key component of gaining insights in the information visualization area. However, the amount of data has increased exponentially in the past few years, and existing information visualization techniques lack the scalability to deal with big data, such as graphs with millions of nodes or millions of multidimensional data records. Recently, the remarkable […]

CUDA • OpenGL

Jan, 29

Towards Interactive Visual Exploration of Parallel Programs using a Domain-specific Language

The utilization of GPUs and the massively parallel computing paradigm has become increasingly prominent in many research domains. Recent platforms such as OpenCL and CUDA enable heterogeneous parallel computing across a wide range of fields. However, using parallel hardware efficiently requires in-depth knowledge of parallel programming and of the hardware itself. […]

CUDA • OpenCL

Jan, 29

GeNN: a code generation framework for accelerated brain simulations

Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses about brain function and testing their consistency and plausibility. However, computational speed remains an ongoing challenge for simulating realistic models. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational […]

CUDA

Jan, 26

Improving GPU-accelerated Adaptive IDW Interpolation Algorithm Using Fast kNN Search

This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm on modern Graphics Processing Units (GPUs). The presented algorithm improves on our previous GPU-accelerated AIDW algorithm by adopting a fast k-Nearest Neighbors (kNN) search. AIDW needs to find several nearest neighboring data points for each interpolated point to adaptively […]

CUDA
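A minimal sketch of the kNN step described above, assuming plain (non-adaptive) IDW with a fixed power parameter; AIDW's adaptive choice of the power is not shown. The brute-force neighbor search stands in for the fast kNN search the paper adopts, and `knn_idw`, `k`, and `power` are hypothetical names.

```python
import heapq
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def knn_idw(data: Sequence[Tuple[float, float, float]], query: Point,
            k: int = 4, power: float = 2.0) -> float:
    """Estimate the value at `query` from its k nearest data points (x, y, value).

    Assumption: fixed-power IDW, not the adaptive (AIDW) variant.
    """
    if not data:
        raise ValueError("no data points")
    qx, qy = query
    # Brute-force kNN: keep the k smallest distances, sorted ascending.
    # A GPU implementation would replace this with a faster search structure.
    nearest = heapq.nsmallest(
        k, ((math.hypot(x - qx, y - qy), v) for x, y, v in data),
        key=lambda dv: dv[0])
    weight_sum = 0.0
    value_sum = 0.0
    for dist, value in nearest:
        if dist == 0.0:
            return value  # query coincides with a data point
        w = 1.0 / dist ** power  # closer points get larger weights
        weight_sum += w
        value_sum += w * value
    return value_sum / weight_sum
```

For example, `knn_idw([(0, 0, 1.0), (1, 0, 3.0), (0, 1, 3.0), (5, 5, 100.0)], (0.5, 0.5), k=3)` averages only the three equidistant nearby points, so the distant outlier at (5, 5) has no influence. Restricting each interpolated point to its k neighbors is what makes the per-point work independent and bounded, which suits one-thread-per-query GPU execution.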

Jan, 26

Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition

Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. […]

Jan, 26

Compositional Compilation for Sparse, Irregular Data Parallelism

While contemporary GPU architectures are heavily biased towards the execution of predictably regular data parallelism, many real application domains are based around data structures which are naturally sparse and irregular. In this paper we demonstrate that high level programming and high performance GPU execution for sparse, irregular problems are not mutually exclusive. Our insight is […]

OpenCL

Jan, 26

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are related to one another through their data dependences in […]
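To show how an algorithm-by-blocks induces a task graph, here is a minimal Python sketch for a simplified case: a dense, right-looking tiled Cholesky on an nb × nb grid of blocks, not the paper's sparse incomplete variant. The task names (POTRF, TRSM, SYRK, GEMM) follow common LAPACK/BLAS kernel naming, and `cholesky_task_graph` is a hypothetical helper. Each task records which earlier tasks wrote the blocks it touches.

```python
from typing import Dict, List, Set, Tuple

Block = Tuple[int, int]
Task = Tuple  # e.g. ("POTRF", k), ("TRSM", i, k), ("SYRK", i, k), ("GEMM", i, j, k)


def cholesky_task_graph(nb: int) -> Dict[Task, Set[Task]]:
    """Build the task DAG of a dense tiled Cholesky on an nb x nb block grid.

    Returns a map from each task to the set of tasks it must wait for.
    Only read-after-write dependences are tracked: each task waits for the
    last task that wrote each block it reads or updates.
    """
    last_writer: Dict[Block, Task] = {}
    deps: Dict[Task, Set[Task]] = {}

    def add(task: Task, reads: List[Block], write: Block) -> None:
        deps[task] = {last_writer[b] for b in reads + [write] if b in last_writer}
        last_writer[write] = task

    for k in range(nb):
        add(("POTRF", k), [], (k, k))                    # factor diagonal block
        for i in range(k + 1, nb):
            add(("TRSM", i, k), [(k, k)], (i, k))        # solve panel block
        for i in range(k + 1, nb):
            add(("SYRK", i, k), [(i, k)], (i, i))        # update diagonal block
            for j in range(k + 1, i):
                add(("GEMM", i, j, k), [(i, k), (j, k)], (i, j))  # update off-diagonal block
    return deps
```

For nb = 3 this yields 10 tasks; tasks with no path between them in the graph (for instance the two TRSM tasks of the first panel) may run concurrently, which is the source of task parallelism. In the sparse incomplete setting, blocks that are empty would simply generate no tasks.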

Jan, 26

Power Consumption Modeling and Prediction in a Hybrid CPU-GPU-MIC Supercomputer

Power consumption is a major obstacle for High Performance Computing (HPC) systems in their quest towards the holy grail of ExaFLOP performance. Significant advances in power efficiency have to be made before this goal can be attained, and accurate modeling is an essential step towards power efficiency, since it allows system operating parameters to be optimized to match dynamic […]

Jan, 26

4th International Symposium on Computational and Business Intelligence (IEEE-ISCBI), 2016

The 2016 4th International Symposium on Computational and Business Intelligence (ISCBI 2016) will be held in Olten, Switzerland, on September 5-7, 2016. ISCBI 2016, the flagship event of INNS-India, is organized by the International Neural Network Society (INNS) India Regional Chapter and the University of Applied Sciences and Arts Northwestern Switzerland. All submissions will be peer […]
