
Posts

Dec, 14

Translating OpenMP Device Constructs to OpenCL using Unnecessary Data Transfer Elimination

In this paper, we propose a framework that translates OpenMP 4.0 accelerator directives to OpenCL. By translating an OpenMP program to an OpenCL program, the program can be executed on any hardware platform that supports OpenCL. We also propose a run-time optimization technique that automatically eliminates unnecessary data transfers between the host and the target […]
Dec, 14

Towards Comprehensive Parametric Code Generation Targeting Graphics Processing Units in Support of Scientific Computation

The most popular multithreaded languages based on the fork-join concurrency model (CilkPlus, OpenMP) are currently being extended to support other forms of parallelism (vectorization, pipelining and single-instruction-multiple-data (SIMD)). In the SIMD case, the objective is to execute the corresponding code on a many-core device, like a GPGPU, for which the CUDA language is a natural […]
Dec, 14

nmfgpu4R: GPU-Accelerated Computation of the Non-Negative Matrix Factorization (NMF) Using CUDA Capable Hardware

In this work, a novel package called nmfgpu4R is presented, which offers the computation of Non-negative Matrix Factorization (NMF) on Compute Unified Device Architecture (CUDA) platforms within the R environment. Benchmarks show a remarkable speed-up in terms of time per iteration by utilizing the parallelization capabilities of modern graphics cards. Therefore the application of NMF […]
Dec, 14

GaDei: On Scale-up Training As A Service For Deep Learning

Deep learning (DL) training-as-a-service (TaaS) is an important emerging industrial workload. The unique challenge of TaaS is that it must satisfy a wide range of customers who have neither the experience nor the resources to tune DL hyper-parameters, and meticulous tuning for each user’s dataset is prohibitively expensive. Therefore, TaaS hyper-parameters must be fixed with values that […]
Dec, 13

5th International Conference on Sustainable Development (ICSD), 2017

The 5th ICSD 2017 will be an excellent opportunity to share your ideas and research findings relevant to Sustainability Science through the European network of academics. Papers will be published in the EJSD Journal (Thomson Reuters) and Proceedings. The European Center of Sustainable Development, in collaboration with CIT University, will organize the 5th ICSD 2017 Rome, […]
Dec, 10

cusFFT: A High-Performance Sparse Fast Fourier Transform Algorithm on GPUs

The Fast Fourier Transform (FFT) is one of the most important numerical tools, widely used in many scientific and engineering applications. The algorithm performs O(n log n) operations on n input data points in order to calculate only a small number k of large coefficients, while the remaining n − k coefficients are zero or negligibly small. […]
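
The sparsity assumption in the abstract above can be illustrated with a small sketch. This is not the cusFFT algorithm and makes no claim about how the paper works: it builds a signal with k nonzero frequencies, runs a naive dense DFT, and checks that only k coefficients are large. The function names `make_sparse_signal`, `dft`, and `top_k_coefficients` are hypothetical and chosen only for this example.

```python
import cmath
import math


def make_sparse_signal(n, freqs):
    """Build a length-n signal that is a sum of unit complex exponentials.

    Its spectrum is exactly sparse: only the frequencies in `freqs` are nonzero.
    """
    return [
        sum(cmath.exp(2j * math.pi * f * t / n) for f in freqs)
        for t in range(n)
    ]


def dft(signal):
    """Naive O(n^2) dense DFT, used here only to inspect the full spectrum."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def top_k_coefficients(spectrum, k):
    """Return (frequency, coefficient) pairs for the k largest magnitudes,
    ordered by frequency index."""
    ranked = sorted(range(len(spectrum)), key=lambda f: abs(spectrum[f]), reverse=True)
    return [(f, spectrum[f]) for f in sorted(ranked[:k])]


n = 64                      # number of input samples (assumed for the demo)
freqs = [3, 17, 40]         # the k = 3 "large" frequencies (assumed for the demo)
spectrum = dft(make_sparse_signal(n, freqs))
large = top_k_coefficients(spectrum, len(freqs))

# Everything outside the top-k set should be numerically negligible.
rest = [abs(c) for f, c in enumerate(spectrum) if f not in freqs]
print([f for f, _ in large], max(rest) < 1e-6)
```

A dense transform still computes all n coefficients; a sparse FFT aims to recover only the k large ones with less work, which is the motivation the abstract describes.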
Dec, 10

Implementing and Evaluating Candidate-Based Invariant Generation

The discovery of inductive invariants lies at the heart of static program verification. This paper describes our efforts to apply candidate-based invariant generation in GPUVerify, a static checker of programs that run on GPUs. We study a set of 383 GPU programs that contain loops, drawn from a number of open source suites and vendor […]
Dec, 10

Performance Evaluation and Optimization of HPCG benchmark on CPU + MIC platform

High-performance conjugate gradient (HPCG) is the latest benchmark adopted by the TOP500 organization, and optimizing the HPCG source code for different heterogeneous computing platforms to achieve a higher floating-point computation rate has therefore become a hot issue in the HPC field. In this paper, we used the CPU + MIC heterogeneous computing […]
Dec, 10

GPGPU Accelerated Deep Object Classification on a Heterogeneous Mobile Platform

Deep convolutional neural networks achieve state-of-the-art performance in image classification. The computational and memory requirements of such networks are, however, huge, which is an issue on embedded devices due to their constraints. Most of this complexity derives from the convolutional layers and in particular from the matrix multiplications they entail. This paper proposes a […]
Dec, 10

Adaptive Work-Efficient Connected Components on the GPU

This report presents an adaptive work-efficient approach for implementing the Connected Components algorithm on GPUs. The results show a considerable increase in performance (up to 6.8x) over current state-of-the-art solutions.
Dec, 10

BrainFrame: A heterogeneous accelerator platform for neuron simulations

OBJECTIVE: The advent of High-Performance Computing (HPC) in recent years has led to its increasing use in brain study through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field […]
Dec, 6

Brownian Dynamics of Active Sphere Suspensions Confined Near a No-Slip Boundary

We develop numerical methods for performing efficient Brownian dynamics of colloidal suspensions confined to remain in the vicinity of a no-slip wall by gravity or active flows. We present a stochastic Adams-Bashforth integrator for the Brownian dynamic equations, which is second-order accurate deterministically and uses a random finite difference to capture the stochastic drift proportional […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
