17936

Posts

Jan, 20

Comprehensive Optimization of Parametric Kernels for Graphics Processing Units

This work deals with the optimization of computer programs targeting Graphics Processing Units (GPUs). The goal is to lift, from programmers to optimizing compilers, the heavy burden of determining program details that are dependent on the hardware characteristics. The expected benefit is to improve robustness, portability and efficiency of the generated computer programs. We address […]
Jan, 20

SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks

Going deeper and wider in neural architectures improves the accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need change to less desired network architectures, or nontrivially dissect a network across multiGPUs. These distract DL practitioners from concentrating on their original machine learning tasks. […]
Jan, 20

Fast and Flexible GPU Accelerated Binding Free Energy Calculations within the AMBER Molecular Dynamics Package

Alchemical free energy calculations (AFE) based on molecular dynamics (MD) simulations are key tools in both improving our understanding of a wide variety of biological processes and accelerating the design and optimization of therapeutics for numerous diseases. Computing power and theory have, however, long been insufficient to enable AFE calculations to be routinely applied in […]
Jan, 20

Evaluation of Machine Learning Fameworks on Finis Terrae II

Machine Learning (ML) and Deep Learning (DL) are two technologies used to extract representations of the data for a specific purpose. ML algorithms take a set of data as input to generate one or several predictions. To define the final version of one model, usually there is an initial step devoted to train the algorithm […]
Jan, 15

The 2018 International Conference on High Performance Computing & Simulation (HPCS), 2018

The 2018 International Conference on High Performance Computing & Simulation (HPCS 2018) will be held on July 16 – 20, 2018 in Orléans, France (Tentative). Under the theme of “HPC and Modeling & Simulation for the 21st Century," HPCS 2018 will focus on a wide range of the state-of-the-art as well as emerging topics pertaining […]
Jan, 13

ImageCL: Language and source-to-source compiler for performance portability, load balancing, and scalability prediction on heterogeneous systems

Applications written for heterogeneous CPU-GPU systems often suffer from poor performance portability. Finding good work partitions can also be challenging as different devices are suited for different applications. This article describes ImageCL, a high-level domain-specific language and source-to-source compiler, targeting single system as well as distributed heterogeneous hardware. Initially targeting image processing algorithms, our framework […]
Jan, 13

pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment

BACKGROUND: Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore adopted less than could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of […]
Jan, 13

Graph Processing on GPUs: A Survey

In the big data era, much real-world data can be naturally represented as graphs. Consequently, many application domains can be modeled as graph processing. Graph processing, especially the processing of the large scale graphs with the number of vertices and edges in the order of billions or even hundreds of billions, has attracted much attention […]
Jan, 13

High Performance Stencil Code Generation with Lift

Stencil computations are widely used from physical simulations to machine-learning. They are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing Units. Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging. Domain Specific Languages (DSLs) have raised the programming abstraction and offer good performance. However, this places […]
Jan, 13

Deep In-GPU Experience Replay

Experience replay allows a reinforcement learning agent to train on samples from a large amount of the most recent experiences. A simple in-RAM experience replay stores these most recent experiences in a list in RAM, and then copies sampled batches to the GPU for training. I moved this list to the GPU, thus creating an […]
Jan, 6

GPU Acceleration of a High-Order Discontinuous Galerkin Incompressible Flow Solver

We present a GPU-accelerated version of a high-order discontinuous Galerkin discretization of the unsteady incompressible Navier-Stokes equations. The equations are discretized in time using a semi-implicit scheme with explicit treatment of the nonlinear term and implicit treatment of the split Stokes operators. The pressure system is solved with a conjugate gradient method together with a […]
Jan, 6

Rubus: A compiler for seamless and extensible parallelism

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: