Posts

Dec, 26

Multifrontal Factorization of Sparse SPD Matrices on GPUs

Solving large sparse linear systems is often the most computationally intensive component of many scientific computing applications. In the past, sparse multifrontal direct factorization has been shown to scale to thousands of processors on dedicated supercomputers resulting in a substantial reduction in computational time. In recent years, an alternative computing paradigm based on GPUs has […]
Dec, 26

OpenCL in Action: How to Accelerate Graphics and Computations

SUMMARY: OpenCL in Action is a thorough, hands-on presentation of OpenCL, with an eye toward showing developers how to build high-performance applications of their own. It begins by presenting the core concepts behind OpenCL, including vector computing, parallel programming, and multi-threaded operations, and then guides you step-by-step from simple data structures to complex functions. ABOUT […]
Dec, 26

A Novel GPU-Based Deformation Pipeline

We present a new deformation pipeline that is independent of the integration solver used and allows fast rendering of deformable soft bodies on the GPU. The proposed method exploits the transform feedback mechanism of modern GPUs to bypass the CPU read-back, thereby reusing the modified positions and/or velocities of the deformable object in a […]
Dec, 26

SHADOW3 API: The Application Programming Interface for the ray tracing code SHADOW

We developed the third version of SHADOW, ray-tracing software widely used to design optical systems in the synchrotron community. SHADOW3 is written in Fortran 2003 and follows current software engineering standards. Users can still run the program in the traditional file-oriented way. Moreover, advanced users can create personalized scripts, macros […]
Dec, 26

Fast Parallel Machine Learning Algorithms for Large Datasets Using Graphic Processing Unit

This dissertation deals with developing parallel processing algorithms for Graphics Processing Units (GPUs) in order to solve machine learning problems for large datasets. In particular, it contributes to the development of fast GPU-based algorithms for calculating distance (i.e., similarity, affinity, closeness) matrices. It also presents the algorithm and implementation of a fast parallel Support […]
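To make the distance-matrix computation concrete, here is a minimal CPU sketch. It assumes squared Euclidean distance and uses the expansion ||x||² + ||y||² - 2x·y, a common way GPU implementations turn the problem into a matrix product. The function name `distance_matrix` is hypothetical and not taken from the dissertation.

```python
def distance_matrix(X, Y):
    """Squared Euclidean distances: D[i][j] = ||X[i] - Y[j]||^2.

    Uses ||x||^2 + ||y||^2 - 2 x.y, so the dominant cost is the inner-product
    term, i.e. a matrix multiply, which maps well onto GPU hardware.
    """
    # Precompute squared norms once per point.
    x_norms = [sum(v * v for v in x) for x in X]
    y_norms = [sum(v * v for v in y) for y in Y]
    D = []
    for i, x in enumerate(X):
        row = []
        for j, y in enumerate(Y):
            dot = sum(a * b for a, b in zip(x, y))
            # Clamp tiny negative values caused by floating-point cancellation.
            row.append(max(x_norms[i] + y_norms[j] - 2.0 * dot, 0.0))
        D.append(row)
    return D


X = [[0.0, 0.0], [3.0, 4.0]]
Y = [[0.0, 0.0], [1.0, 1.0]]
print(distance_matrix(X, Y))  # [[0.0, 2.0], [25.0, 13.0]]
```

On a GPU the double loop would typically become one thread (or tile) per output entry, with the dot products computed as a single matrix product.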
Dec, 26

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme. In […]
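For reference, the ELLPACK-R scheme the abstract compares against can be sketched as follows. This is a minimal CPU simulation under our own assumptions: each iteration of the outer loop stands in for one GPU thread handling one row. The helper names `to_ellpack_r` and `spmv_ellpack_r` are hypothetical.

```python
def to_ellpack_r(dense):
    """Convert a dense row-major matrix to ELLPACK-R arrays.

    val and col are stored column-major (entry k of row i lives at
    k * n_rows + i) and padded to the longest row; rl holds each row's
    true number of nonzeros.
    """
    n_rows = len(dense)
    rows = [[(j, v) for j, v in enumerate(r) if v != 0] for r in dense]
    width = max((len(r) for r in rows), default=0)
    val = [0.0] * (n_rows * width)  # padded slots stay 0.0
    col = [0] * (n_rows * width)
    rl = [len(r) for r in rows]
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(r):
            val[k * n_rows + i] = v
            col[k * n_rows + i] = j
    return val, col, rl, n_rows


def spmv_ellpack_r(val, col, rl, n_rows, x):
    """y = A @ x; each outer iteration plays the role of one GPU thread."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Only rl[i] entries are visited, so padding costs memory, not work.
        for k in range(rl[i]):
            idx = k * n_rows + i
            acc += val[idx] * x[col[idx]]
        y[i] = acc
    return y


A = [[4.0, 0.0, 1.0],
     [0.0, 2.0, 0.0],
     [3.0, 5.0, 6.0]]
val, col, rl, n = to_ellpack_r(A)
print(spmv_ellpack_r(val, col, rl, n, [1.0, 1.0, 1.0]))  # [5.0, 2.0, 14.0]
```

The column-major layout lets neighbouring GPU threads read neighbouring memory addresses. Here 3 of the 9 stored slots are padding; this is the kind of memory overhead the abstract says pJDS may substantially reduce.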
Dec, 25

Designing Fast Architecture Sensitive Tree Search on Modern Multi-Core/Many-Core Processors

In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join and aggregation. However, unlike other primitives, tree search presents significant challenges due to […]
Dec, 25

Turbo Bayesian Compressed Sensing

Compressed sensing (CS) theory specifies a new signal acquisition approach, potentially allowing the acquisition of signals at a much lower data rate than the Nyquist sampling rate. In CS, the signal is not directly acquired but reconstructed from a few measurements. One of the key problems in CS is how to recover the original signal […]
Dec, 25

Modeling Parallel Programs for Heterogeneous Computing

With the growing interest in multicore processors and Graphics Processing Units (GPUs), heterogeneous computing is an emerging necessity to fully utilize computing resources. In the traditional approach to parallel programming, programmers are required to manage the sequential part of a program while rewriting the parallel part to a more efficient parallel programming model. Our evaluation […]
Dec, 25

Methodology of control and supervision of web connected mobile robots with CUDA technology application

The main subject of this paper is the control and supervision of web-connected mobile robots. Taking up this subject is justified by the need to develop new methods for the control, supervision and integration of existing modules (inspection robots, autonomous robots, mobile base station). The methodology consists of: multi-robotic system structure, cognitive model of […]
Dec, 25

Accelerating linear system solutions using randomization techniques

We illustrate how linear algebra calculations can be enhanced by statistical techniques in the case of a square linear system Ax = b. We study a random transformation of A that enables us to avoid pivoting and then to reduce the amount of communication. Numerical experiments show that this randomization can be performed at a […]
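The abstract does not name the transformation. One well-known transformation of this kind is the random butterfly transformation (RBT): solve (UᵀAV)y = Uᵀb by Gaussian elimination without pivoting, then recover x = Vy. The sketch below is our assumption, not necessarily the paper's exact method. It uses a single-level butterfly (the paper may use a recursive variant), an assumed distribution for the random diagonals, and hypothetical helper names throughout.

```python
import math
import random


def butterfly(n, rng):
    """One-level random butterfly B = (1/sqrt 2) [[R, S], [R, -S]], n even.

    R and S are random diagonal matrices; the exp(r/10) entries with r uniform
    in [-1/2, 1/2] are an assumed choice, not taken from the paper.
    """
    h = n // 2
    r = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    s = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    c = 1 / math.sqrt(2)
    B = [[0.0] * n for _ in range(n)]
    for i in range(h):
        B[i][i], B[i][h + i] = c * r[i], c * s[i]
        B[h + i][i], B[h + i][h + i] = c * r[i], -c * s[i]
    return B


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(r) for r in zip(*A)]


def solve_no_pivot(A, b):
    """Gaussian elimination WITHOUT pivoting, then back substitution."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]  # would fail on a zero pivot
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def rbt_solve(A, b, seed=0):
    rng = random.Random(seed)
    n = len(A)
    U, V = butterfly(n, rng), butterfly(n, rng)
    Ar = matmul(matmul(transpose(U), A), V)                           # U^T A V
    Utb = [sum(U[k][i] * b[k] for k in range(n)) for i in range(n)]   # U^T b
    y = solve_no_pivot(Ar, Utb)
    return [sum(V[i][j] * y[j] for j in range(n)) for i in range(n)]  # x = V y


A = [[0.0, 1.0], [1.0, 0.0]]  # zero leading pivot: plain elimination divides by 0
print(rbt_solve(A, [2.0, 3.0]))  # close to [3.0, 2.0], up to rounding
```

Calling `solve_no_pivot(A, b)` directly on this `A` raises `ZeroDivisionError`. After the random transformation, the leading pivot is nonzero.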
Dec, 25

G-CP: Providing Fault Tolerance on the GPU through Software Checkpointing

GPUs have become increasingly popular in recent years, in large part due to their potential to offer a large amount of computational power at low prices. GPU designers have also made GPU pipelines more general purpose and more programmable, which has made GPUs more attractive to a wider audience. Thus, it is increasingly important to […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors