Posts

Nov, 3

Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices

Sparse matrix-vector multiplication (SpMV) is an important linear algebra primitive. Recent research has focused on improving the performance of SpMV on GPUs when using compressed sparse row (CSR), the most frequently used matrix storage format on CPUs. Efficient CSR-based SpMV obviates the need for other GPU-specific storage formats, thereby saving runtime and storage overheads. […]
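For readers unfamiliar with the CSR layout mentioned in this abstract, here is a minimal sequential sketch of CSR-based SpMV in Python. It is not the CSR-Adaptive algorithm from the paper and says nothing about GPU execution; the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative names)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        # The nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


# Example matrix A = [[10, 0, 0], [0, 0, 20], [30, 40, 0]]
row_ptr = [0, 1, 2, 4]
col_idx = [0, 2, 0, 1]
values = [10.0, 20.0, 30.0, 40.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Rows with very different numbers of nonzeros make the inner loop lengths uneven, which is the kind of irregularity the title refers to.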
Nov, 3

Software Defined Radio over CUDA

Software Defined Radio (SDR) is a wireless communication system in which components of transmitters and receivers are mostly implemented by software (filters, mixers, modulators). Thanks to this approach, it is possible to implement a single universal radio transceiver, capable of multi-mode and multi-standard wireless communications. These capabilities are very useful for researchers and radio amateurs, who […]
Oct, 31

Energy-Efficient Execution of Data-Parallel Applications on Heterogeneous Mobile Platforms

State-of-the-art mobile system-on-chips (SoCs) include heterogeneity in various forms for accelerated and energy-efficient execution of a diverse range of applications. Modern SoCs now include programmable cores such as CPUs and GPUs with very different functionality. SoCs also integrate performance-heterogeneous cores with different power-performance characteristics but the same instruction-set architecture, such as ARM big.LITTLE. […]
Oct, 31

Estimation of numerical reproducibility on CPU and GPU

Differences in simulation results may be observed from one architecture to another or even inside the same architecture. Such reproducibility failures are often due to different rounding errors generated by different orders in the sequence of arithmetic operations. Reproducibility problems are particularly noticeable on new computing architectures such as multicore processors or GPUs (Graphics Processing […]
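The effect of operation order on rounding described in this abstract can be seen in a few lines of Python. This is a small sketch using IEEE 754 double precision on a CPU, with values chosen for illustration; it is not taken from the paper.

```python
# Three values whose exact sum is 1.0.
values = [1e16, 1.0, -1e16]

# Order 1: the 1.0 is absorbed when added to 1e16, because the spacing
# between adjacent doubles near 1e16 is 2.
left_to_right = (values[0] + values[1]) + values[2]

# Order 2: the large terms cancel first, so the 1.0 survives.
reordered = (values[0] + values[2]) + values[1]

print(left_to_right, reordered)
```

A parallel reduction may combine partial sums in a different order from a sequential loop, which is one way the same program can give different results on different architectures.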
Oct, 31

Parallelization of Encryption and Hashing Algorithm Using GPU

With the development of GPGPU (general-purpose computing on graphics processing units), more and more computing problems are solved by exploiting the parallelism of the GPU (Graphics Processing Unit). CUDA (Compute Unified Device Architecture) is a framework that makes GPGPU more accessible and easier to learn for the general population of programmers. This is […]
Oct, 31

Investigation of General-Purpose Computing on Graphics Processing Units and its Application to the Finite Element Analysis of Electromagnetic Problems

In this dissertation, the hardware and API architectures of GPUs are investigated, and the corresponding acceleration techniques are applied to the traditional frequency-domain finite element method (FEM), the element-level time-domain methods, and the nonlinear discontinuous Galerkin method. First, the assembly and the solution phases of the FEM are parallelized and mapped onto the granular […]
Oct, 31

A general tridiagonal solver for coprocessors: Adapting g-Spike for the Intel Xeon Phi

Manycores like the Intel Xeon Phi and graphics processing units like the NVIDIA Tesla series are prime examples of systems for accelerating applications that run on current CPU multicores. It is therefore of interest to build fast, reliable linear system solvers targeting these architectures. Moreover, it is of interest to conduct cross comparisons between algorithmic […]
Oct, 31

Asynchronous Parallel Computing Algorithm implemented in 1D Heat Equation with CUDA

In this note, we present the stability as well as performance analysis of an asynchronous parallel computing algorithm implemented for the 1D heat equation with CUDA. The primary objective of this note lies in the dissemination of the asynchronous parallel computing algorithm by providing CUDA code for fast and easy implementation. We show that the simulations carried out on […]
Oct, 29

Padding Free Bank Conflict Resolution for CUDA-Based Matrix Transpose Algorithm

Advances in Graphics Processing Unit (GPU) technology and the introduction of the CUDA programming model facilitate the development of new solutions for sparse and dense linear algebra solvers. Matrix transpose is an important linear algebra procedure that has a deep impact on various computational science and engineering applications. Several factors hinder the expected performance of large matrix transpose […]
Oct, 29

CLOP: A Multi-stage Compiler to Seamlessly Embed Heterogeneous Code

Heterogeneous programming complicates software development. We present CLOP, a platform that embeds code targeting heterogeneous compute devices in a convenient and clean way, allowing unobstructed data flow between the host code and the devices, reducing the amount of source code by an order of magnitude. The CLOP compiler uses the standard facilities of the D […]
Oct, 29

Approximation of BEM matrices using GPGPUs

The efficiency of boundary element methods depends crucially on the time required for setting up the stiffness matrix. The far-field part of the matrix can be approximated by compression schemes like the fast multipole method or $\mathcal{H}$-matrix techniques. The near-field part is typically approximated by special quadrature rules like the Sauter-Schwab technique that can handle […]
Oct, 29

GPU Ray-Traced Collision Detection for Cloth Simulation

We propose a method to perform collision detection for cloth using ray tracing. Our method is able to perform collision detection between cloths and volumetric objects (rigid or deformable) as well as collision detection between cloths (including auto-collision). Our method casts rays between objects to perform collision detection, and an inversion-handling algorithm is introduced to correct […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org