Posts

Apr, 7

Practical examples of GPU computing optimization principles

In this paper, we provide examples to optimize signal processing or visual computing algorithms written for SIMT-based GPU architectures. These implementations demonstrate the optimizations for CUDA or its successors OpenCL and DirectCompute. We discuss the effect and optimization principles of memory coalescing, bandwidth reduction, processor occupancy, bank conflict reduction, local memory elimination and instruction optimization. […]
Apr, 7

ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere

We describe a hybrid Fourier/direct space convolution algorithm for compact radial (azimuthally symmetric) kernels on the sphere. For high resolution maps covering a large fraction of the sky, our implementation takes advantage of the inexpensive massive parallelism afforded by consumer graphics processing units (GPUs). Applications involve modeling of instrumental beam shapes in terms of compact […]
Apr, 7

Scaling Hierarchical N-body Simulations on GPU Clusters

This paper focuses on the use of GPGPU-based clusters for hierarchical N-body simulations. Whereas the behavior of these hierarchical methods has been studied in the past on CPU-based architectures, we investigate key performance issues in the context of clusters of GPUs. These include kernel organization and efficiency, the balance between tree traversal and force computation […]
Apr, 6

Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs

To exploit the full potential of GPGPUs for general purpose computing, DOACR parallelism abundant in scientific and engineering applications must be harnessed. However, the presence of cross-iteration data dependences in DOACR loops poses an obstacle to executing their computations concurrently using a massive number of fine-grained threads. This work focuses on iterative PDE solvers rich […]
Apr, 6

An algorithmic incremental and iterative development method to parallelize dusty-deck FORTRAN HPC codes in GPGPUs using CUDA

State-of-the-art high-speed and economical graphic card processors (GPUs) provide high multiprocessing power for high performance computing (HPC). But software development for high performance computing is profound and requires a good comprehension of algorithms, applications, and architectures. This paper outlines an incremental and iterative software development process for porting dusty-deck HPC application source codes to a […]
Apr, 6

Heuristic Optimization Methods for Improving Performance of Recursive General Purpose Applications on GPUs

Due to the demand of high definition graphics presentation in gaming and video market, graphics processing units (GPUs) have drastically increased their computational capacities. General-purpose computation on GPUs uses the fragment shader multicore of these processing units to concurrently process data streams. However, the I/O overheads in recursive GPGPU applications have a negative impact in […]
Apr, 6

High Performance Hybrid Functional Petri Net Simulations of Biological Pathway Models on CUDA

Hybrid functional Petri nets are a wide-spread tool for representing and simulating biological models. Due to their potential of providing virtual drug testing environments, biological simulations have a growing impact on pharmaceutical research. Continuous research advancements in biology and medicine lead to exponentially increasing simulation times thus raising the demand for performance accelerations by efficient […]
Apr, 6

A Hybrid Computational Grid Architecture for Comparative Genomics

Comparative genomics provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved among species, as well as genes that give each organism its unique characteristics. However, the huge datasets involved make this approach impractical on traditional computer architectures, leading to prohibitively long runtimes. In this paper, we present […]
Apr, 6

Barnes-Hut treecode on GPU

General-purpose computation on graphics processing units (GPGPU) has become a popular field of study. Due to its high computing capacity and relatively low price, the GPU has been an ideal processing unit for many scientific applications, among which is N-body simulation. According to the published papers, a simple O(N^2) algorithm of N-body simulation has achieved some […]
Apr, 6

Parallel and distributed seismic wave field modeling with combined Linux clusters and graphics processing units

General-purpose computing on graphics processing units (GPGPU) is a fast-developing method of high performance computing (HPC). In some cases even a low-end video card can be several to dozens of times faster than a modern CPU core. Seismic wave field modeling is one of the problems of this kind. But in some modern methods of […]
Apr, 6

Implementation of Ant Colony Algorithm Based on GPU

The ant colony algorithm is an efficient intelligent algorithm for solving NP-hard problems. This paper presents a parallel computing solution based on General Purpose GPU (GPGPU) to solve the traveling salesman problem (TSP) with the max-min ant system (MMAS). The experimental result shows that it is more efficient than pure CPU computing.
Apr, 6

High-speed electromagnetic field simulation by HIE-FDTD method with GPGPU

The HIE-FDTD (Hybrid Implicit-Explicit FDTD) method is very useful for simulating computational domains with thin cells. This paper describes the HIE-FDTD method with GPGPU (General-Purpose computing on Graphics Processing Units) for massively parallel electromagnetic field simulation. First, the properties of the HIE-FDTD method are explained. Next, a 3D HIE-FDTD method is implemented with CUDA. Finally, it […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
