Posts

Aug, 17

Solving 3D viscous incompressible Navier-Stokes equations using CUDA

A CUDA implementation of the 3D viscous incompressible Navier-Stokes equations is proposed, using the BFECC (Back and Forth Error Compensation and Correction) scheme as the advection operator. The Poisson problem for pressure is solved with a CG (Conjugate Gradient) method, preconditioning the system with FFTs (Fast Fourier Transforms). Study cases such as Lid-Driven Cavity and Flow Past […]
Aug, 17

Performance Analysis of a Symmetric Cryptography Algorithm on GPU and GPU Cluster

This article presents a performance analysis of the symmetric encryption algorithm AES (Advanced Encryption Standard) on a machine with one GPU and on a cluster of GPUs, for cases in which the memory required by the algorithm exceeds that of a single GPU. Two implementations were carried out, based on the C language, that use the […]
Aug, 17

Formal specification and verification of OpenCL Kernel optimization

Solving general-purpose problems on the graphics processing unit (GPU) of a device is an emerging field. The parallel structure of the GPU allows for massive concurrency when executing a program. Therefore, by executing (a part of) the code on the GPU, a previously unused resource can be used to achieve a speed-up of an application. […]
Aug, 17

Acceleration of Feynman loop integrals in high-energy physics on many core GPUs

Current and future colliders in high-energy physics require theorists to carry out large-scale computations for a precise comparison between experimental and theoretical results. In a perturbative approach, several methods to evaluate the Feynman loop integrals that appear in the theoretical calculation of cross-sections are well established at the one-loop level; however, more […]
Aug, 17

Studying the core-cusp problem in cold dark matter halos using N-body simulations on GPU clusters

The discrepancy in the mass-density profile of dark matter halos between simulations and observations, the core-cusp problem, is a long-standing open question in the standard paradigm of cold dark matter cosmology. Here, we study the dynamical response of dark matter halos to oscillations of the galactic potential which are induced by a cycle of gas […]
Aug, 16

Accelerating Random Forests on CPUs and GPUs for Object-Class Image Segmentation

Random forests are a machine learning method that has recently become popular in the computer vision community to solve image segmentation and object detection tasks. Existing random forest implementations are either general purpose and not efficiently applicable for image segmentation or focus only on the speed of prediction. The implementation for the Microsoft Kinect gaming […]
Aug, 16

GPU-Accelerated Scalable Solver for Banded Linear Systems

Solving a banded linear system efficiently is important to many scientific and engineering applications. Current solvers achieve good scalability only on the linear systems that can be partitioned into independent subsystems. In this paper, we present a GPU based, scalable Bi-Conjugate Gradient Stabilized solver that can be used to solve a wide range of banded […]
Aug, 16

Lossless LZW Data Compression Algorithm on CUDA

Data compression is an important area of information and communication technologies; it seeks to reduce the number of bits used to store or transmit information. It makes efficient use of memory space and allows data to be transmitted within a limited bandwidth. Most compression is achieved by removing data redundancy while preserving information content. Data […]
Aug, 16

Towards Path Tracing in Games

We investigate GPU path tracing performance in the context of real-time rendering for games. We propose a reformulation of Russian roulette, as well as an efficient implementation of the path regeneration algorithm by Novak et al. [Novak et al. 2010]. We show that a combination of these algorithms provides high performance for a variety of […]
Aug, 16

GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm

Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex […]
Aug, 15

Parallel Gravitation Field Algorithm Based on the CUDA Platform

The Gravitation Field Algorithm (GFA) is a simple but very effective heuristic search algorithm. It has clear advantages over SA and GA in multimodal function optimization problems. However, obtaining a more precise global optimum requires a large number of initial dusts in the computation, which causes a low efficiency […]
Aug, 15

General Transformations for GPU Execution of Tree Traversals

With the advent of programmer-friendly GPU computing environments, there has been much interest in offloading workloads that can exploit the high degree of parallelism available on modern GPUs. Exploiting this parallelism and optimizing for the GPU memory hierarchy are well understood for regular applications that operate on dense data structures such as arrays and matrices. However, […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org