
Posts

Jan, 5

Opportunities for Parallelism in Matrix Multiplication

BLIS is a new framework for rapid instantiation of the BLAS. We describe how BLIS extends the "GotoBLAS approach" to implementing matrix multiplication (GEMM). While GEMM was previously implemented as three loops around an inner kernel, BLIS exposes two additional loops within that inner kernel, casting the computation in terms of the BLIS micro-kernel so […]
Jan, 5

Crack-free rendering of dynamically tesselated B-Rep models

We propose a versatile pipeline to render B-Rep models interactively, precisely and without rendering-related artifacts such as cracks. Our rendering method is based on dynamic surface evaluation using both tessellation and ray-casting, and direct GPU surface trimming. An initial rendering of the scene is performed using dynamic tessellation. The algorithm we propose reliably detects then […]
Jan, 5

A Push-Relabel-Based Maximum Cardinality Bipartite Matching Algorithm on GPUs

We design, develop, and evaluate an atomic- and lock-free GPU implementation of the push-relabel algorithm in the context of finding maximum cardinality matchings in bipartite graphs. The problem has applications in computer science, scientific computing, bioinformatics, and other areas. Although the GPU parallelization of the push-relabel technique has been investigated in the context of flow […]
Jan, 5

Parallel Irradiance Caching on the GPU

While ray tracing is highly parallelizable in concept, the Radiance suite of programs for architectural global illumination simulation was written for serial execution and makes use of certain heuristic techniques that are not easily performed in parallel environments. It uses irradiance caching to store and reuse the results of expensive indirect irradiation computations. The irradiance […]
Jan, 5

Multiple Bounding Boxes Algorithm in Collision Detection and Its Performances in Sequential vs CUDA Parallel Processing

The traditional method for detecting collisions in a 2D computer game uses an axis-aligned bounding box around each sprite and periodically checks whether the bounding boxes overlap. Using this single bounding box method may result in a large number of pixel intersection tests, since a sprite may be composed of areas where the […]
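The overlap check described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's implementation: the `Box` type, the `overlaps` function, and the `sprites_collide` helper (which treats a sprite as a list of smaller boxes) are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type for illustration)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two boxes share interior area.

    Boxes that only touch along an edge are not counted as overlapping.
    """
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def sprites_collide(boxes_a, boxes_b) -> bool:
    """Assumed multi-box variant: each sprite is covered by several boxes,
    and the sprites collide if any pair of boxes overlaps."""
    return any(overlaps(a, b) for a in boxes_a for b in boxes_b)
```

Covering a sprite with several tight boxes instead of one loose box is one plausible way to reduce false positives before any pixel-level test; whether the paper does exactly this cannot be confirmed from the truncated abstract.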
Jan, 3

A GPU-based real time trigger for rare kaon decays at NA62

This thesis reports a study for a new real-time trigger for the NA62 experiment based on Graphics Processing Units (GPUs). The NA62 experiment was devised to study with unprecedented precision the ultra-rare decay $K^+ \to \pi^+ \nu \bar{\nu}$, a process mediated by Flavour-Changing Neutral Currents (FCNC) whose exceptional theoretical cleanliness provides a unique probe to test the […]
Jan, 3

Wavelet Encoding and Multi-GPU Programming

We investigate compression of large-volume spatial data using the wavelet transform, computed in a massively parallel fashion on NVIDIA graphics processing units (GPUs). In particular, Haar basis wavelets are used to achieve compression ratios of 100x or more. Computation is distributed over multiple computing nodes with multiple GPUs per node. Significantly […]
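As a rough illustration of how Haar wavelets support compression, the following Python sketch applies one level of a 1D Haar transform and zeroes out small detail coefficients. It uses the unnormalized average/difference form for readability, runs serially on the CPU, and all function names are hypothetical; it does not reproduce the paper's multi-GPU implementation.

```python
def haar_forward(signal):
    """One level of the (unnormalized) 1D Haar transform.

    Each pair (a, b) becomes an average (a + b) / 2 and a detail (a - b) / 2.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    averages, details = [], []
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        averages.append((a + b) / 2)
        details.append((a - b) / 2)
    return averages, details


def haar_inverse(averages, details):
    """Invert haar_forward: a = avg + det, b = avg - det."""
    signal = []
    for avg, det in zip(averages, details):
        signal.extend([avg + det, avg - det])
    return signal


def drop_small_details(details, threshold):
    """Lossy step: zero every detail whose magnitude is at most threshold."""
    return [d if abs(d) > threshold else 0.0 for d in details]
```

In smooth regions the details are close to zero, so most of them can be dropped and the remaining sparse coefficients stored compactly. Each pair is processed independently, which is why the transform maps naturally onto parallel hardware.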
Jan, 3

Adhoc On-Demand Distance Vector Protocol For Energy Efficiency

The use of computer networks is growing rapidly, and with it the need to enhance existing network protocols and enforce communication security. Researchers use tools such as network simulators to test new scenarios and protocols in a controlled and reproducible environment. They allow the user to represent various topologies, simulate […]
Jan, 3

Accelerating Simulation Codes through the GeMTC Framework

GPU computing uses a high-level language to run the sequential part of the code on the CPU and speeds up the parallel part by running it on GPUs. However, GPUs are SIMD by default, which means they can run only a single instruction on multiple data. The GeMTC framework [1] addresses these limitations by […]
Jan, 3

Nemo: A parallelized Lagrangian particle-tracking model

Lagrangian particle-tracking models are a computationally intensive but massively parallelizable method for investigating marine larval dispersal processes, seed dispersal of plants, or a variety of other material transport processes. In order to fully capture the distribution of potential dispersal patterns, highly efficient models with the capacity to simulate tens of millions or more particles are […]
Jan, 2

Fast Parallel Image Registration on CPU and GPU for Diagnostic Classification of Alzheimer’s Disease

Nonrigid image registration is an important but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, e.g. for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: […]
Jan, 2

Interactive Ray-tracing Based on OptiX to Visualize Signed Distance Fields

We propose a parallel ray-tracing technique to visualize signed distance fields generated from triangular meshes based on NVIDIA OptiX. Our method visualizes signed distance fields with various distance offset values at interactive rates (2-12 fps). Our method utilizes a parallel kd-tree implementation to query the nearest triangle and the sphere tracing method to visualize the […]
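The sphere tracing method mentioned in this abstract can be sketched as follows. This is a minimal CPU sketch under stated assumptions, not the OptiX implementation: the distance field here is an analytic sphere rather than a kd-tree query against a triangle mesh, and `sphere_sdf`, `sphere_trace`, and the `offset` parameter are hypothetical names chosen for illustration.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere (stand-in for a mesh SDF)."""
    return math.dist(p, center) - radius


def sphere_trace(origin, direction, sdf, offset=0.0,
                 max_steps=128, eps=1e-4, max_dist=100.0):
    """March along a ray, stepping by the distance to the nearest surface.

    direction is assumed to be a unit vector. offset shifts the visualized
    iso-surface outward (positive) or inward (negative). Returns the ray
    parameter t at the hit, or None if nothing is hit.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p) - offset
        if dist < eps:
            return t  # close enough to the surface: report a hit
        t += dist     # safe step: no surface is closer than dist
        if t > max_dist:
            break
    return None
```

Because the signed distance bounds how far the ray can travel without crossing the surface, each step is as large as it can safely be. In a renderer, one such march runs per pixel, and the marches are independent of each other.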

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
