high performance computing on graphics processing units: hgpu.org

Posts

Feb, 23

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL

OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. The OpenCL specification tightly binds its workflow abstraction, or "command queue", to a specific device for the entire program. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort […]

OpenCL

Feb, 23

Sparse Convex Optimization on GPUs

Convex optimization is a fundamental mathematical framework used for general problem solving. The computational time taken to optimize problems formulated as Linear Programming, Integer Linear Programming or Quadratic Programming has an immediate impact on countless application fields, and it is critical to determining which problems we will be able to solve in the future. Since […]

CUDA

Feb, 23

Togpu: Automatic Source Transformation from C++ to CUDA using Clang/LLVM

Parallel processing using GPUs provides substantial increases in algorithm performance across many disciplines. As a result serial algorithms are commonly translated to parallel algorithms written in CUDA or OpenCL. To perform this translation a user must first overcome various barriers to entry. These obstacles change depending on the user but in general may include learning […]

CUDA

Feb, 19

A GPU-based Large-scale Monte Carlo Simulation Method for Systems with Long-range Interactions

In this work we present an efficient implementation of Canonical Monte Carlo simulation for Coulomb many body systems on graphics processing units (GPU). Our method takes advantage of the GPU Single Instruction, Multiple Data (SIMD) architectures. It adopts the sequential updating scheme of Metropolis algorithm, and makes no approximation in the computation of energy. It […]

CUDA

Feb, 19

HeSP: a simulation framework for solving the task scheduling-partitioning problem on heterogeneous architectures

In this paper we describe HeSP, a complete simulation framework to study a general task scheduling-partitioning problem on heterogeneous architectures, which treats recursive task partitioning and scheduling decisions on equal footing. Considering recursive partitioning as an additional degree of freedom, tasks can be dynamically partitioned or merged at runtime for each available processor type, exposing […]

CUDA

•

OpenCL

Feb, 19

Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems

General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. A key feature of our scheme is that […]

OpenCL

Feb, 19

LN-Annote: An Alternative Approach to Information Extraction from Emails using Locally-Customized Named-Entity Recognition

Personal mobile devices offer a growing variety of personalized services that enrich considerably the user experience. This is made possible by increased access to personal information, which to a large extent is extracted from user email messages and archives. There are, however, two main issues. First, currently these services can be offered only by large […]

OpenCL

Feb, 19

Gravitational wave astrophysics, data analysis and multimessenger astronomy

This paper reviews gravitational wave sources and their detection. One of the most exciting potential sources of gravitational waves are coalescing binary black hole systems. They can occur on all mass scales and be formed in numerous ways, many of which are not understood. They are generally invisible in electromagnetic waves, and they provide opportunities […]

CUDA

Feb, 19

The 4th International Symposium on Computing and Networking

Following the success of past ICNC conferences, 2010 in Hiroshima, 2011 in Osaka, and 2012 in Okinawa, and CANDAR symposiums 2013 in Matsuyama, 2014 in Shizuoka, 2015 in Sapporo, CANDAR 2016 will be held in Hiroshima, Japan. CANDAR 2016 will serve as a forum for exchanging the latest findings and experiences ranging from theoretical research […]

Feb, 18

Deep Feature-based Face Detection on Mobile Devices

We propose a deep feature-based face detector for mobile devices to detect user’s face acquired by the front facing camera. The proposed method is able to detect faces in images containing extreme pose and illumination variations as well as partial faces. The main challenge in developing deep feature-based algorithms for mobile devices is the constrained […]

OpenCL

Feb, 18

Speeding Up Reinforcement Learning with Graphics Processing Units

Conventionally programmed systems (e.g. robots) are not able to adapt to unforeseen changes in their task or environment. Reinforcement learning (RL), a machine learning approach, could grant this flexibility. Many fields of work could greatly benefit from this, be it in terms of cost, time or some other parameter. With RL, a learning agent tries […]

OpenCL

Feb, 17

Deep Learning on FPGAs: Past, Present, and Future

The rapid growth of data size and accessibility in recent years has instigated a shift of philosophy in algorithm design for artificial intelligence. Instead of engineering algorithms by hand, the ability to learn composable systems automatically from massive amounts of data has led to ground-breaking performance in important domains such as computer vision, speech recognition, […]

OpenCL

* * *

high performance computing on graphics processing units: hgpu.org

Posts

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL

Sparse Convex Optimization on GPUs

Togpu: Automatic Source Transformation from C++ to CUDA using Clang/LLVM

A GPU-based Large-scale Monte Carlo Simulation Method for Systems with Long-range Interactions

HeSP: a simulation framework for solving the task scheduling-partitioning problem on heterogeneous architectures

Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems

LN-Annote: An Alternative Approach to Information Extraction from Emails using Locally-Customized Named-Entity Recognition

Gravitational wave astrophysics, data analysis and multimessenger astronomy

The 4th International Symposium on Computing and Networking

Deep Feature-based Face Detection on Mobile Devices

Speeding Up Reinforcement Learning with Graphics Processing Units

Deep Learning on FPGAs: Past, Present, and Future

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

Most viewed papers (last 30 days)