Posts
Aug, 17
Studying the core-cusp problem in cold dark matter halos using N-body simulations on GPU clusters
The discrepancy in the mass-density profile of dark matter halos between simulations and observations, the core-cusp problem, is a long-standing open question in the standard paradigm of cold dark matter cosmology. Here, we study the dynamical response of dark matter halos to oscillations of the galactic potential which are induced by a cycle of gas […]
Aug, 16
Accelerating Random Forests on CPUs and GPUs for Object-Class Image Segmentation
Random forests are a machine learning method that has recently become popular in the computer vision community for solving image segmentation and object detection tasks. Existing random forest implementations are either general-purpose and not efficiently applicable to image segmentation, or focus only on the speed of prediction. The implementation for the Microsoft Kinect gaming […]
Aug, 16
GPU-Accelerated Scalable Solver for Banded Linear Systems
Solving a banded linear system efficiently is important to many scientific and engineering applications. Current solvers achieve good scalability only on linear systems that can be partitioned into independent subsystems. In this paper, we present a GPU-based, scalable Bi-Conjugate Gradient Stabilized solver that can be used to solve a wide range of banded […]
Aug, 16
Lossless LZW Data Compression Algorithm on CUDA
Data compression is an important area of information and communication technologies; it seeks to reduce the number of bits used to store or transmit information. It makes efficient use of memory and allows data to be transmitted within a limited bandwidth. Most compression is achieved by removing data redundancy while preserving information content. Data […]
Aug, 16
Towards Path Tracing in Games
We investigate GPU path tracing performance in the context of real-time rendering for games. We propose a reformulation of Russian roulette, as well as an efficient implementation of the path regeneration algorithm by Novak et al. [Novak et al. 2010]. We show that a combination of these algorithms provides high performance for a variety of […]
Aug, 16
GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm
Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex […]
Aug, 15
Parallel Gravitation Field Algorithm Based on the CUDA Platform
Gravitation Field Algorithm (GFA) is a simple but very effective heuristic search algorithm. It has clear advantages over SA and GA in multimodal function optimization problems. However, obtaining a more precise global optimum requires a large number of initial dusts in the computation, which causes a low efficiency […]
Aug, 15
General Transformations for GPU Execution of Tree Traversals
With the advent of programmer-friendly GPU computing environments, there has been much interest in offloading workloads that can exploit the high degree of parallelism available on modern GPUs. Exploiting this parallelism and optimizing for the GPU memory hierarchy is well-understood for regular applications that operate on dense data structures such as arrays and matrices. However, […]
Aug, 15
Programming Dense Linear Algebra Kernels on Vectorized Architectures
The high performance computing (HPC) community is obsessed with the general matrix-matrix multiply (GEMM) routine. This obsession is not without reason. Most, if not all, Level 3 Basic Linear Algebra Subprograms (BLAS) can be written in terms of GEMM, and the performance of many higher-level linear algebra solvers (e.g., LU, Cholesky) depends on GEMM’s […]
Aug, 15
First experiences with the Intel MIC architecture at LRZ
With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high performance computing over roughly the past five years. In particular, GPGPUs have recently become very popular; however, programming GPGPUs using languages like CUDA or OpenCL is cumbersome and error-prone. Trying to overcome these difficulties, Intel developed their own […]
Aug, 15
Detecting Data Races on OpenCL Kernels with Symbolic Execution
We present an automatic analysis technique for checking data races on OpenCL kernels. Our method defines symbolic execution techniques based on separation logic with suitable abstractions to automatically detect non-benign racy behaviours on kernels.
Aug, 14
Lattice Boltzmann Method for Simulating Turbulent Flows
The lattice Boltzmann method (LBM) is a relatively new method for fluid flow simulations and has recently been gaining popularity due to its simple algorithm and parallel scalability. Although the method has been successfully applied to a wide range of flow physics, its capabilities in simulating turbulent flow are still under-validated. Hence, in this project, a […]