
Posts

Jan, 26

CUDA raytracing algorithm for visualizing discrete element model output

A raytracing algorithm is constructed using the CUDA API for visualizing output from a CUDA discrete element model, which outputs spatial information in dynamic particle systems. The raytracing algorithm is optimized with constant memory and compilation flags, and performance is measured as a function of the number of particles and the number of pixels. The […]
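A raytracer for particle output repeatedly tests each pixel's ray against the particles. The serial Python sketch below shows that ray-sphere intersection test and a brute-force nearest-hit search. It assumes the particles are spheres given as (center, radius) pairs. The function names are hypothetical and do not come from the paper. A CUDA version would typically give each pixel its own thread, and each thread would run this loop.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Distance t > 0 to the nearest hit of ray origin + t*direction, or None (hypothetical helper)."""
    # Solve |o + t d - c|^2 = r^2, a quadratic a t^2 + b t + c = 0 in t.
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None  # the ray misses the sphere
    sq = math.sqrt(disc)
    t0 = (-b - sq) / (2.0 * a)
    t1 = (-b + sq) / (2.0 * a)
    if t0 > 0:
        return t0  # entry point in front of the camera
    if t1 > 0:
        return t1  # origin is inside the sphere
    return None  # sphere lies behind the ray origin

def nearest_particle(origin, direction, particles):
    """Brute-force search over (center, radius) pairs; returns (t, index) or None."""
    best = None
    for idx, (center, radius) in enumerate(particles):
        t = ray_sphere_hit(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, idx)
    return best
```

The brute-force loop costs O(particles) per pixel. Total work therefore grows with both quantities the abstract varies: the number of particles and the number of pixels.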
Jan, 26

CUDA Fortran for Scientists and Engineers

This document is intended for scientists and engineers who develop or maintain computer simulations and applications in Fortran and who would like to harness the parallel processing power of graphics processing units (GPUs) to accelerate their code. The goal here is to provide the reader with the fundamentals of GPU programming using CUDA Fortran as well […]
Jan, 26

A Parallel Ant Colony Optimization Algorithm for the Travelling Salesman Problem: Improving Performance Using CUDA

The ant colony optimization (ACO) algorithm is a metaheuristic algorithm used for combinatorial optimization problems. It is a good choice for many hard combinatorial problems because it is more efficient than brute-force methods and produces better solutions than greedy algorithms. However, ACO is computationally expensive, and it can still take a long time to […]
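In the classic Ant System formulation, an ant at city i chooses the next city j with probability proportional to τ_ij^α · η_ij^β. Here τ is the pheromone level and η = 1/d is the inverse distance. The plain-Python sketch below implements that rule and the construction of a single ant's tour. The parameter values α = 1 and β = 2 and all function names are illustrative assumptions, and the pheromone update step is omitted. Each ant builds its tour independently, which is why tour construction is a usual target for GPU parallelization. The paper's own parallel scheme may differ.

```python
import random

def transition_probabilities(current, unvisited, tau, dist, alpha=1.0, beta=2.0):
    """Ant System rule: p_ij proportional to tau_ij^alpha * (1/d_ij)^beta over unvisited cities j."""
    weights = {j: (tau[current][j] ** alpha) * ((1.0 / dist[current][j]) ** beta)
               for j in unvisited}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def choose_next_city(current, unvisited, tau, dist, rng, alpha=1.0, beta=2.0):
    """Roulette-wheel selection using the probabilities above."""
    probs = transition_probabilities(current, unvisited, tau, dist, alpha, beta)
    cities = list(probs)
    return rng.choices(cities, weights=[probs[c] for c in cities], k=1)[0]

def build_tour(start, n, tau, dist, rng):
    """One ant's tour: visit every city exactly once, starting at `start`."""
    tour = [start]
    unvisited = [c for c in range(n) if c != start]
    while unvisited:
        nxt = choose_next_city(tour[-1], unvisited, tau, dist, rng)
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour
```

With uniform pheromone and β = 2, a city at distance 1 gets four times the weight of a city at distance 2. The closer city is therefore chosen with probability 0.8 and the farther one with probability 0.2.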
Jan, 26

Using a GPU, Online Diarization – Offline Diarization

This article presents a low-latency, online speaker diarization system ("who is speaking now?") based on the repeated execution of a GPU-optimized, highly efficient offline diarization system ("who spoke when"). The system fulfills all requirements of the diarization task, i.e., it does not require any a priori information about the input, including specific speaker models. In […]
Jan, 26

Parallel Symbolic Analysis of Large Analog Circuits on GPU Platforms

Graph-based symbolic analysis is a viable tool for calculating the behavior or characteristics of an analog circuit. Traditional symbolic analysis tools are typically used to calculate the behavior or characteristics of a circuit in terms of symbolic parameters (Gielen et al., 1994). The introduction of symbolic analysis based on determinant decision diagrams allows […]
Jan, 26

Visual Data Mining Using the Point Distribution Tensor

We explore a novel algorithm to analyze arbitrary distributions of 3D points. A direct tensor field visualization technique makes it easy to identify regions of linear, planar or isotropic structure. The approach is well suited to visual data mining and is demonstrated on geoscience applications. It makes it possible to distinguish, for example, power lines and flat terrains in […]
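A tensor built from the directions to a point's neighbors can reveal whether those neighbors form a line, a plane or an isotropic cloud. The NumPy sketch below assumes one possible definition: the average of the outer products of unit vectors from the point to its neighbors within a radius r. The paper's exact weighting and neighborhood may differ. To read the shape from the eigenvalues λ1 ≥ λ2 ≥ λ3, the sketch uses Westin-style measures c_l = (λ1 - λ2)/S, c_p = 2(λ2 - λ3)/S and c_s = 3λ3/S, with S = λ1 + λ2 + λ3. This is a standard choice but not necessarily the one in the paper. The function names are hypothetical.

```python
import numpy as np

def point_distribution_tensor(points, center, radius):
    """Assumed definition: mean outer product of unit vectors from center to neighbors within radius."""
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    dist = np.linalg.norm(d, axis=1)
    mask = (dist > 0) & (dist <= radius)  # exclude the center itself and far points
    if not mask.any():
        return np.zeros((3, 3))
    u = d[mask] / dist[mask][:, None]     # unit direction vectors, shape (k, 3)
    return (u.T @ u) / mask.sum()         # sum of u u^T divided by k

def shape_measures(T):
    """Westin-style linear, planar and spherical measures; they sum to 1."""
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(T))[::-1]  # eigenvalues in descending order
    s = l1 + l2 + l3
    if s <= 0:
        return 0.0, 0.0, 0.0
    return (l1 - l2) / s, 2.0 * (l2 - l3) / s, 3.0 * l3 / s
```

For a point on a power line, the linear measure should dominate. For a point on flat terrain, the planar measure should dominate. This matches the distinction the abstract mentions.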
Jan, 25

Scalable Parallel Minimum Spanning Forest Computation

The proliferation of data in graph form calls for the development of scalable graph algorithms that exploit parallel processing environments. One such problem is the computation of a graph’s minimum spanning forest (MSF). Past research has proposed several parallel algorithms for this problem, yet none of them scales to large, high-density graphs. In this paper […]
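Borůvka's algorithm is a common starting point for parallel MSF computation. In each round, every component independently selects its cheapest outgoing edge, and those choices can be made in parallel. The sketch below is a sequential Python version built on union-find. It is not the algorithm proposed in the paper. Edges are assumed to be (weight, u, v) tuples. Comparing whole tuples gives a strict tie-breaking order, which keeps the selected edges cycle-free. On a disconnected graph, the function returns a forest with one tree per component.

```python
def boruvka_msf(n, edges):
    """Minimum spanning forest of vertices 0..n-1; edges are (weight, u, v) tuples."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    while True:
        # Cheapest edge leaving each component, indexed by component root.
        cheapest = [None] * n
        for e in edges:
            _, u, v = e
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge is internal to a component
            if cheapest[ru] is None or e < cheapest[ru]:
                cheapest[ru] = e
            if cheapest[rv] is None or e < cheapest[rv]:
                cheapest[rv] = e
        added = False
        for e in cheapest:
            if e is None:
                continue
            _, u, v = e
            ru, rv = find(u), find(v)
            if ru != rv:  # the same edge may be chosen by both endpoints
                parent[ru] = rv
                forest.append(e)
                added = True
        if not added:
            return forest  # no component has an outgoing edge left
```

Each round at least halves the number of components that still have outgoing edges, so at most O(log n) rounds are needed.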
Jan, 25

Parallel LDPC Decoder Implementation on GPU Based on Unbalanced Memory Coalescing

We consider a flexible decoder implementation of low-density parity-check (LDPC) codes via Compute Unified Device Architecture (CUDA) programming on graphics processing units (GPUs), a subject of considerable recent research interest. An important issue in CUDA-GPU-based LDPC decoder design is realizing coalesced memory access, a technique that considerably reduces memory transaction time. In previous works along […]
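Whether a warp's loads coalesce depends on how many distinct aligned memory segments its 32 addresses fall into. The simplified Python model below counts those segments. The 128-byte segment size and the 4-byte elements are assumptions: actual transaction rules vary by GPU generation, and newer GPUs work with 32-byte sectors. Nothing here reproduces the paper's unbalanced coalescing scheme. The helper names are hypothetical.

```python
WARP_SIZE = 32
ELEM_BYTES = 4  # assumed element size, e.g. a 32-bit float message

def warp_addresses(index_fn, base=0):
    """Byte addresses read by the 32 threads of one warp; thread t reads element index_fn(t)."""
    return [base + index_fn(t) * ELEM_BYTES for t in range(WARP_SIZE)]

def segments_touched(byte_addresses, segment_bytes=128):
    """Number of distinct aligned segments touched, as a proxy for memory transactions."""
    return len({addr // segment_bytes for addr in byte_addresses})

if __name__ == "__main__":
    print(segments_touched(warp_addresses(lambda t: t)))       # consecutive: fully coalesced
    print(segments_touched(warp_addresses(lambda t: t + 1)))   # shifted by one element: misaligned
    print(segments_touched(warp_addresses(lambda t: t * 32)))  # large stride: one segment per thread
```

Consecutive indices fit in one segment. A shift of one element spills into a second segment. A stride of 32 elements needs 32 segments. Irregular index patterns, such as those that follow the nonzeros of a parity-check matrix, tend to fall between these extremes.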
Jan, 25

Multifrontal Sparse Matrix Factorization on Graphics Processing Units

For many finite element problems, when represented as sparse matrices, iterative solvers are found to be unreliable because they can impose computational bottlenecks. Early pioneering work by Duff et al. explored an alternative strategy called multifrontal sparse matrix factorization. This approach, by representing the sparse problem as a tree of dense systems, maps well to […]
Jan, 25

TAP: A TLP-Aware Cache Management Policy for a CPU-GPU Heterogeneous Architecture

Combining CPUs and GPUs on the same chip has become a popular architectural trend. However, these heterogeneous architectures put more pressure on shared resource management. In particular, managing the last-level cache (LLC) is critical to performance. Researchers have recently proposed several shared cache management mechanisms, including dynamic cache partitioning and promotion-based cache management, […]
Jan, 25

Parallel Algorithm Design and Implementation of Regular/Irregular Problems: An In-depth Performance Study on Graphics Processing Units

Interest in using the Graphics Processing Unit (GPU) for general-purpose parallel application development and research has recently grown. Much current GPU research focuses on accelerating regular problems, since irregular problems typically do not achieve the same level of performance on this hardware. We explore the potential of the GPU […]
Jan, 25

PyCOOL – a Cosmological Object-Oriented Lattice code written in Python

There are a number of different phenomena in the early universe that have to be studied numerically with lattice simulations. This paper presents a graphics processing unit (GPU) accelerated Python program called PyCOOL that solves the evolution of scalar fields in a lattice with very precise symplectic integrators. The program has been written with the […]
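Symplectic integrators preserve the phase-space structure of Hamiltonian systems, so energy errors stay bounded instead of drifting over long runs. The NumPy sketch below evolves a free massive scalar field on a periodic 1D lattice. It uses a second-order kick-drift-kick leapfrog, the simplest symplectic scheme. The 1D lattice, the potential V = m²φ²/2 and all parameter values are assumptions. PyCOOL's own, more precise integrators are not reproduced here.

```python
import numpy as np

def laplacian(phi, dx):
    # Second-order central difference with periodic boundaries.
    return (np.roll(phi, -1) - 2.0 * phi + np.roll(phi, 1)) / dx**2

def force(phi, dx, m):
    # d(pi)/dt = lap(phi) - m^2 phi for the assumed potential V = m^2 phi^2 / 2.
    return laplacian(phi, dx) - m**2 * phi

def energy(phi, pi, dx, m):
    # Discrete Hamiltonian whose phi-derivative matches force() exactly.
    grad = (np.roll(phi, -1) - phi) / dx
    return float(np.sum(0.5 * pi**2 + 0.5 * grad**2 + 0.5 * m**2 * phi**2) * dx)

def leapfrog(phi, pi, dx, m, dt, steps):
    phi, pi = phi.copy(), pi.copy()
    for _ in range(steps):
        pi += 0.5 * dt * force(phi, dx, m)  # half kick
        phi += dt * pi                      # full drift
        pi += 0.5 * dt * force(phi, dx, m)  # half kick
    return phi, pi

if __name__ == "__main__":
    N, dx, m = 64, 1.0, 1.0
    x = np.arange(N) * dx
    phi0 = np.exp(-((x - N * dx / 2) ** 2) / (2 * 3.0**2))  # Gaussian bump
    pi0 = np.zeros(N)
    E0 = energy(phi0, pi0, dx, m)
    phi1, pi1 = leapfrog(phi0, pi0, dx, m, dt=0.02, steps=2000)
    print("relative energy error:", abs(energy(phi1, pi1, dx, m) - E0) / E0)
```

The printed relative energy error should stay small, and it should oscillate rather than grow as the number of steps increases. Higher-order symplectic schemes shrink that oscillation further at the cost of more force evaluations per step.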

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
