Posts

Oct, 13

Extendable Pattern-Oriented Optimization Directives (extended version)

Algorithm-specific (i.e., semantic-specific) optimizations have been observed to bring significant performance gains, especially across a diverse set of multi/many-core architectures. However, current programming models and compiler technologies for state-of-the-art architectures do not exploit these performance opportunities well. In this paper, we propose a pattern-making methodology that enables algorithm-specific optimizations to be encapsulated into "optimization […]
Oct, 13

GPU-Based Local-Dimming for Power Efficient Imaging

This paper describes a local dimming method for reducing the power consumption of LCD monitors. Reducing this load is of ever-growing importance as it is becoming the dominant power consumer in mobile computing. As a side effect, our method not only significantly reduces power consumption but also improves visual quality (see […]
Oct, 13

Automatic Parallelization of Tiled Loop Nests with Enhanced Fine-Grained Parallelism on GPUs

Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of GPUs to obtain high performance. One state-of-the-art approach makes use of the polyhedral model to extract parallelism from a loop nest by applying a sequence of affine transformations to the loop nest. However, how to automate this process to exploit both intra- and […]
Oct, 9

Accelerating Mean Shift Segmentation Algorithm on Hybrid CPU/GPU Platforms

Image segmentation is a very important step in many GIS applications. Mean shift is an advanced and versatile technique for clustering-based segmentation, and is favored in many cases because it is non-parametric. However, mean shift is very computationally intensive compared with other simple methods such as k-means. In this work, we present a hybrid design […]
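As a rough illustration of why mean shift is non-parametric and costly, the sketch below runs the basic procedure on one-dimensional data in plain Python. It is not the hybrid CPU/GPU design from the paper; the function name `mean_shift_1d`, the Gaussian kernel, and the `bandwidth`, `max_iter`, and `tol` parameters are assumptions made for this example. Every point climbs to a density mode by repeated kernel-weighted averaging over all points, so the number of clusters is never fixed in advance, and the cost grows quadratically with the number of points.

```python
import math


def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Return, for each input point, the density mode it converges to."""
    modes = []
    for start in points:
        x = start
        for _ in range(max_iter):
            # Gaussian kernel weight of every data point relative to x.
            weights = [
                math.exp(-((x - p) ** 2) / (2.0 * bandwidth ** 2))
                for p in points
            ]
            # Move x to the kernel-weighted mean of the data.
            shifted = sum(w * p for w, p in zip(weights, points)) / sum(weights)
            converged = abs(shifted - x) < tol
            x = shifted
            if converged:
                break
        modes.append(x)
    return modes


# Hypothetical sample data: two well-separated groups of values.
sample = [0.9, 1.0, 1.1, 4.8, 5.0, 5.2]
modes = mean_shift_1d(sample, bandwidth=0.5)
```

Points whose modes coincide (up to a small tolerance) would be assigned to the same segment. By contrast, k-means needs the number of clusters as an input.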
Oct, 9

Applying Genetic Algorithms to Tune Heterogeneous Platform Configurations

The need to move towards heterogeneous architectures is well established. This has increased the importance of parallelizing software to achieve good performance. The use of mixed architectures exponentially increases the programmer's need to understand the intricacies of the underlying hardware to achieve optimal speedup. Obtaining optimal performance on one such architecture is […]
Oct, 9

A PCG Implementation of an Elliptic Kernel in an Ocean Global Circulation Model Based on GPU Libraries

In this paper, an inverse preconditioner for the numerical solution of an elliptic Laplace problem of a global circulation ocean model is presented. The inverse preconditioning technique is adopted in order to efficiently compute the numerical solution of the elliptic kernel by using the Conjugate Gradient (CG) method. We show how the performance and […]
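To make the role of an inverse preconditioner in CG concrete, here is a minimal preconditioned conjugate gradient sketch in plain Python. It is not the paper's GPU-library implementation: the function names `pcg`, `laplacian_1d`, and `jacobi_inverse` are hypothetical, the 1D Laplacian is a toy stand-in for the ocean model's elliptic operator, and a Jacobi (diagonal) approximate inverse is swapped in because the abstract does not specify the preconditioner. The point is that an approximate inverse is applied with a plain matrix-vector product, with no triangular solves.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, apply_inverse, tol=1e-10, max_iter=500):
    """Solve A x = b for symmetric positive definite A.

    matvec(v) returns A v; apply_inverse(r) returns M r, where M
    approximates the inverse of A.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    if math.sqrt(dot(r, r)) < tol:
        return x
    z = apply_inverse(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            break
        z = apply_inverse(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def laplacian_1d(v):
    """Toy operator: tridiagonal (-1, 2, -1) with zero boundary values."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def jacobi_inverse(r):
    """Assumed preconditioner: inverse of the diagonal of the toy operator."""
    return [ri / 2.0 for ri in r]


rhs = laplacian_1d([float(i % 3) for i in range(10)])
solution = pcg(laplacian_1d, rhs, jacobi_inverse)
```

Because each iteration consists only of matrix-vector products, dot products, and vector updates, the loop maps directly onto standard GPU linear algebra routines.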
Oct, 9

Streaming Parallel GPU Acceleration of Large-Scale filter-based Spiking Neural Networks

The arrival of graphics processing unit (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of […]
Oct, 9

Efficient deconvolution methods for astronomical imaging: algorithms and IDL-GPU codes

The Richardson-Lucy method is the most popular deconvolution method in astronomy because it preserves the number of counts and the non-negativity of the original object. Regularization is, in general, obtained by an early stopping of Richardson-Lucy iterations. In the case of point-wise objects such as binaries or open star clusters, iterations can be pushed to […]
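The two properties named above, count preservation and non-negativity, follow directly from the multiplicative form of the Richardson-Lucy update. The sketch below shows this in one dimension with plain Python. It is not the paper's IDL-GPU code; the function names, the periodic boundary handling, the odd-length PSF, and the `eps` guard are assumptions made for this illustration. The `iterations` argument plays the role of the early-stopping regularization parameter.

```python
def circular_convolve(signal, kernel):
    """Convolve with periodic boundaries; the kernel must have odd length."""
    n, half = len(signal), len(kernel) // 2
    return [
        sum(w * signal[(i - (k - half)) % n] for k, w in enumerate(kernel))
        for i in range(n)
    ]


def richardson_lucy(data, psf, iterations, eps=1e-12):
    """Run a fixed number of Richardson-Lucy iterations on 1D data."""
    total = sum(psf)
    psf = [w / total for w in psf]       # normalize the PSF to unit sum
    psf_flipped = psf[::-1]              # adjoint of the blur operator
    # Flat, positive starting estimate carrying the same total counts.
    estimate = [sum(data) / len(data)] * len(data)
    for _ in range(iterations):
        blurred = circular_convolve(estimate, psf)
        ratio = [d / max(b, eps) for d, b in zip(data, blurred)]
        correction = circular_convolve(ratio, psf_flipped)
        # Multiplicative update: non-negative factors keep the estimate >= 0.
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Hypothetical test scene: flat background plus two point sources.
scene = [1.0] * 32
scene[10] += 100.0
scene[13] += 60.0
observed = circular_convolve(scene, [0.25, 0.5, 0.25])
restored = richardson_lucy(observed, [0.25, 0.5, 0.25], iterations=50)
```

With a normalized PSF and periodic boundaries, the sum of the estimate equals the sum of the data after every iteration, up to floating-point rounding. Running fewer iterations yields a smoother result, which is the early-stopping regularization mentioned above.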
Oct, 8

Learning hash codes for efficient content reuse detection

Content reuse is extremely common in user-generated media. Reuse detection serves as the basis for many applications. However, with the explosion of the Internet and the continuously growing use of user-generated media, the task becomes more critical and difficult. In this paper, we present a novel, efficient, and scalable approach to detect content […]
Oct, 8

Realtime Two-Way Coupling of Meshless Fluids and Nonlinear FEM

In this paper, we present a novel method to couple Smoothed Particle Hydrodynamics (SPH) and nonlinear FEM to animate the interaction of fluids and deformable solids in real time. To accurately model the coupling, we generate proxy particles over the boundary of deformable solids to facilitate the interaction with fluid particles, and develop an efficient […]
Oct, 8

Measuring the Performance of Realtime DSP Using Pure Data and GPU

In order to achieve greater amounts of computation while lowering the cost of artistic and scientific projects that rely on realtime digital signal processing techniques, it is interesting to study the performance of commodity parallel processing GPU cards coupled with commonly used software for realtime DSP. In this article, we describe the measurement of data […]
Oct, 8

GPU Accelerated NIDS Search

A Network Intrusion Detection System (NIDS) analyzes network traffic for malicious activities and reports findings from events that intend to compromise the security of computers and other equipment. A NIDS looks into both headers and payloads of network packets to identify possible intrusions. NIDS models that only use Central Processing Units (CPU) such as the […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
