
May, 19

SPOC: GPGPU Programming Through Stream Processing With OCaml

General-purpose computing on graphics processing units (GPGPU) uses GPUs to handle computations commonly handled by CPUs. GPGPU programming involves developing specific programs to run on GPUs, managed by a host program running on the CPU. Achieving high performance requires explicitly organizing memory transfers between devices. Besides, different incompatible frameworks exist […]
May, 19

C-DAC’s Efforts – Application Kernels on HPC Cluster with GPU Accelerators

We describe the problem of parallelizing finite difference method (FDM) and finite element method (FEM) computations for a certain class of partial differential equations (PDEs) on a High Performance Computing (HPC) GPU cluster. For FDM, structured grids are employed and optimal data rearrangement operations are performed in GPU computations. For FEM, unstructured triangular and […]
May, 19

An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)

LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method for solving large-scale linear systems in seismic tomography. This paper presents a parallel MPI-CUDA implementation of the LSQR solver. At the CUDA level, our contributions include: (1) utilizing CUBLAS and CUSPARSE to compute the major steps in LSQR; (2) optimizing memory copies between host memory […]
May, 19

High Performance Monte Carlo and Time-Stepping Dynamics for the Classical Spin Heisenberg Model on GPUs

The Heisenberg model of classical spins makes use of both Monte Carlo stochastic dynamics and time integration of its equation of motion. These two schemes have different parallelisation strategies and tradeoffs. We implement both algorithms using a data-parallel approach for Graphics Processing Units (GPUs) and we discuss the resulting performance on various combinations of […]
May, 19

Accelerated GPU Simulation of Compressible Flow by the Discontinuous Evolution Galerkin Method

The aim of the present paper is to report on our recent results for GPU accelerated simulations of compressible flows. For numerical simulation the adaptive discontinuous Galerkin method with the multidimensional bicharacteristic based evolution Galerkin operator has been used. For time discretization we have applied the explicit third order Runge-Kutta method. Evaluation of the genuinely […]
May, 19

Programming and Scheduling Model for Supporting Heterogeneous Accelerators in Linux

Computer systems increasingly integrate heterogeneous computing elements like graphics processing units and specialized co-processors. The systematic programming and exploitation of such heterogeneous systems is still a subject of research. While many efforts address the programming of accelerators, scheduling heterogeneous systems, i.e., mapping parts of an application to accelerators at runtime, is still performed from […]
May, 19

Comparison and Analysis of GPGPU and Parallel Computing on Multi-Core CPU

There are two ways to improve the performance of algorithm computation: general-purpose computation on GPUs (GPGPU) and parallel computation on multi-core CPUs. By comparing and analyzing the main differences between them, we conclude that the GPU is suitable for processing large-scale, data-parallel, compute-intensive workloads with relatively simple branching logic, […]
May, 19

Applying Object Oriented Design Patterns to CUDA based Pyramidal Image Blending – An Experience

In this paper, we present a pyramidal image blending algorithm based on the Compute Unified Device Architecture (CUDA), using object-oriented design patterns. This algorithm is an essential part of the image stitching process for a seamless panoramic mosaic. The CUDA framework is a novel GPU programming framework from NVIDIA. We introduce an object-oriented framework […]
May, 19

The Linear Direct Sparse Solver on GPU for Bundle Adjustment Method

This work implements a direct solver for symmetric positive definite sparse matrices of general structure that exploits the parallelism of the graphics card (GPU). It also implements a direct solver using the Schur complement, tailored to the requirements of the sparse systems arising in bundle adjustment.
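
For readers unfamiliar with the Schur complement trick in bundle adjustment, a common textbook formulation is sketched below. The block names (B for the camera block, C for the point block, E for the coupling block, v and w for the right-hand sides) are assumed notation for illustration and are not taken from the paper, which may partition or eliminate the system differently.

```latex
% Assumed block form of the normal equations (notation is illustrative):
\begin{align}
\begin{bmatrix} B & E \\ E^{\top} & C \end{bmatrix}
\begin{bmatrix} \Delta x_c \\ \Delta x_p \end{bmatrix}
&=
\begin{bmatrix} v \\ w \end{bmatrix} \\
% Eliminate the point block C (block diagonal, so cheap to invert):
\left( B - E\, C^{-1} E^{\top} \right) \Delta x_c &= v - E\, C^{-1} w \\
% Back-substitute to recover the point update:
\Delta x_p &= C^{-1} \left( w - E^{\top} \Delta x_c \right)
\end{align}
```

The matrix B - E C^{-1} E^T is the Schur complement of C. It is typically much smaller than the full system, which is what makes a direct factorization of it attractive.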
May, 19

Accurate CUDA Performance Modeling for Sparse Matrix-Vector Multiplication

This paper presents an integrated analytical and profile-based CUDA performance modeling approach to accurately predict the kernel execution times of sparse matrix-vector multiplication for CSR, ELL, COO, and HYB SpMV CUDA kernels. Based on our experiments conducted on a collection of 8 widely-used testing matrices on NVIDIA Tesla C2050, the execution times predicted by our […]
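
As a point of reference for the CSR format named in the abstract, the following is a minimal sequential sketch of sparse matrix-vector multiplication over CSR arrays. It is plain Python rather than a CUDA kernel, and the names `csr_spmv`, `row_ptr`, `col_idx`, and `values` are illustrative assumptions, not identifiers from the paper.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and entries.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example 3x3 matrix (hypothetical data):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

In a GPU version, the outer loop over rows is usually what gets distributed across threads or warps. The uneven number of nonzeros per row is one reason execution time is hard to predict.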
May, 19

Use of FPGA or GPU-based architectures for remotely sensed hyperspectral image processing

Hyperspectral imaging is a growing area in remote sensing in which an imaging spectrometer collects hundreds of images (at different wavelength channels) for the same area on the surface of the Earth. Hyperspectral images are extremely high-dimensional, and require advanced on-board processing algorithms able to satisfy near real-time constraints in applications such as wildland fire […]
May, 16

An Introduction to the OpenCL Programming Model

This paper presents an overview of the OpenCL 1.1 standard [Khronos 2012]. We first motivate the need for GPGPU computing and then discuss the various concepts and technological background necessary to understand the programming model. We use concurrent matrix multiplication as a framework for explaining various performance characteristics of compiling and running OpenCL code, and […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
