
Nov, 4

Accelerating a TV based JPEG decompression algorithm with Cuda

In previous works, we have developed a mathematical model for artifact-free decompression of JPEG images. There, the problem of finding an artifact-free decompression for a given JPEG-compressed image is related to a convex minimization problem. We use a primal-dual algorithm to solve this problem, for which we have developed a Matlab and C++ […]
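The abstract names a primal-dual algorithm for a convex problem but gives no details. As a rough illustration only, the sketch below applies the Chambolle–Pock primal-dual iteration to 1D total-variation denoising, min_u ½‖u − f‖² + λ‖Du‖₁. This is a swapped-in toy problem, not the authors' JPEG model, which additionally constrains the DCT coefficients to the quantization intervals stored in the compressed file. The step sizes τ = σ = 0.45 are assumptions chosen so that τσ‖D‖² < 1, using ‖D‖² ≤ 4 for forward differences.

```python
def grad(u):
    """Forward differences (Du)_i = u[i+1] - u[i]; length n-1."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]


def grad_adjoint(p, n):
    """Adjoint D^T p (length n), so that <Du, p> == <u, D^T p>."""
    out = [0.0] * n
    for i, pi in enumerate(p):
        out[i] -= pi
        out[i + 1] += pi
    return out


def tv_denoise_1d(f, lam, iters=10000, tau=0.45, sigma=0.45):
    """Chambolle-Pock for min_u 0.5*||u - f||^2 + lam*||Du||_1 (toy 1D sketch)."""
    n = len(f)
    u = list(f)
    u_bar = list(f)
    p = [0.0] * (n - 1)  # dual variable, one entry per difference
    for _ in range(iters):
        # Dual ascent step, then prox of F*: projection onto [-lam, lam].
        g = grad(u_bar)
        p = [max(-lam, min(lam, pi + sigma * gi)) for pi, gi in zip(p, g)]
        # Primal descent step, then prox of tau * 0.5*||u - f||^2.
        dtp = grad_adjoint(p, n)
        u_new = [(ui - tau * di + tau * fi) / (1.0 + tau)
                 for ui, di, fi in zip(u, dtp, f)]
        # Over-relaxation of the primal variable.
        u_bar = [2.0 * a - b for a, b in zip(u_new, u)]
        u = u_new
    return u
```

For a step signal `[0, 0, 1, 1]` with λ = 0.2, the exact minimizer is `[0.1, 0.1, 0.9, 0.9]`: each flat segment of length 2 moves λ/2 toward the other, and the iteration converges to it.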
Nov, 4

A CUDA Implementation of Independent Component Analysis in the Time-Frequency Domain

Blind separation of convolutive mixtures requires huge processing power. In this paper we propose a massively parallel implementation of Independent Component Analysis in the time-frequency domain that uses the processing power of current graphics adapters within the CUDA framework. The commonly used approach for solving the separation task is the […]
Nov, 4

Parallelization of maximum likelihood fits with OpenMP and CUDA

Data analyses based on maximum likelihood fits are commonly used in the high-energy physics community for fitting statistical models to data samples. This technique requires the numerical minimization of the negative log-likelihood function. MINUIT is the most common package used for this purpose in the field. The main algorithm in this […]
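In such a fit, the expensive step is usually evaluating the negative log-likelihood. The minimizer calls it many times, and each call is a sum over all events. That sum is a reduction, so it splits naturally across workers. The sketch below assumes a hypothetical single-Gaussian model and uses Python threads only to show the chunk-and-reduce structure. It does not call MINUIT, and because of CPython's GIL it will not actually speed up pure-Python arithmetic. In OpenMP the same pattern is a `reduction(+:...)` clause; on CUDA, each thread block could produce one partial sum.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def gaussian_nll_chunk(xs, mu, sigma):
    """Partial NLL of a (hypothetical) Gaussian model over one chunk of events."""
    const = math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    return sum(0.5 * ((x - mu) / sigma) ** 2 + const for x in xs)


def parallel_nll(data, mu, sigma, n_workers=4):
    """Split events into chunks, evaluate partial sums concurrently, then reduce."""
    size = max(1, math.ceil(len(data) / n_workers))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda c: gaussian_nll_chunk(c, mu, sigma), chunks)
        # fsum reduces the partial sums with minimal rounding error.
        return math.fsum(partials)
```

A minimizer would call `parallel_nll` with trial values of `mu` and `sigma`. Only the per-event sum is parallelized; the minimizer's own logic stays serial.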
Nov, 4

Combined acoustic and optical trapping

Combining several methods for contact-free micro-manipulation of small particles such as cells or micro-organisms provides the advantages of each method in a single setup. Optical tweezers, which employ focused laser beams, offer very precise and selective handling of single particles. On the other hand, acoustic trapping with wavelengths of about 1 mm allows the […]
Nov, 4

PEPSC: A Power-Efficient Processor for Scientific Computing

The rapid advancements in the computational capabilities of the graphics processing unit (GPU) as well as the deployment of general programming models for these devices have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost […]
Nov, 4

Gyrofluid Modeling of Turbulent, Kinetic Physics

Gyrofluid models of plasma turbulence combine the advantages of fluid models, such as lower dimensionality and well-developed intuition, with those of gyrokinetic models, such as finite Larmor radius (FLR) effects. This makes gyrofluid models more tractable computationally while still capturing much of the physics related to the FLR of the particles. We […]
Nov, 4

Semi-Global Matching – Motivation, Developments and Applications

Since its original publication, the Semi-Global Matching (SGM) technique has been re-implemented by many researchers and companies. The method offers a very good trade-off between runtime and accuracy, especially at object borders and fine structures. It is also robust against radiometric differences and not sensitive to the choice of parameters. Therefore, it is well […]
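In Hirschmüller's formulation, the core of SGM aggregates a pixelwise matching cost C(p, d) along several 1D paths r with the recurrence L_r(p, d) = C(p, d) + min(L_r(p-r, d), L_r(p-r, d-1) + P1, L_r(p-r, d+1) + P1, min_i L_r(p-r, i) + P2) - min_k L_r(p-r, k). Here P1 penalizes disparity changes of one step and P2 penalizes larger jumps. The sketch below implements this recurrence for a single left-to-right path along one scanline. Full SGM sums L_r over 8 or 16 directions and picks, per pixel, the disparity with the smallest sum. The cost values and penalties in the example are arbitrary assumptions.

```python
def aggregate_path(cost, P1, P2):
    """Aggregate cost[x][d] along one left-to-right scanline path (SGM recurrence)."""
    D = len(cost[0])
    L = [list(cost[0])]  # the first pixel on the path has no predecessor
    for x in range(1, len(cost)):
        prev = L[-1]
        prev_min = min(prev)
        row = []
        for d in range(D):
            candidates = [prev[d], prev_min + P2]  # same disparity, or a large jump
            if d > 0:
                candidates.append(prev[d - 1] + P1)  # small change downward
            if d < D - 1:
                candidates.append(prev[d + 1] + P1)  # small change upward
            # Subtracting prev_min keeps values bounded without changing the argmin.
            row.append(cost[x][d] + min(candidates) - prev_min)
        L.append(row)
    return L
```

For example, `aggregate_path([[0, 5, 5], [5, 0, 5]], P1=1, P2=3)` yields `[[0, 5, 5], [5, 1, 8]]`. At the second pixel, disparity 1 still wins, but its cost now includes the P1 penalty for departing from disparity 0 at the first pixel.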
Nov, 4

Inter-cluster communication on clustered SIMD architectures

This work envisions that in the near future, GPU-like architectures will find their way into embedded systems. Accompanied by a small RISC control core, they will not merely be a hardware accelerator, but the heart of the system itself. Taking a state-of-the-art GPU, a baseline architecture is constructed with the embedded context in mind. Next, […]
Nov, 4

VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron

General-purpose graphical processing units (GPUs) offer high processing speeds for certain classes of highly parallelizable computations, such as matrix operations and Fourier transforms, that lie at the heart of first-principles electronic structure calculations. Including exact exchange increases the cost of density functional theory by orders of magnitude, motivating the use of GPUs. Porting the […]
Nov, 4

Computing Optimal Cycle Mean in Parallel on CUDA

Computing the optimal cycle mean of a directed weighted graph has many applications in program analysis, in particular performance verification. In this paper we propose a data-parallel algorithmic solution to the problem and show how the computation of the optimal cycle mean can be efficiently accelerated with CUDA. We show how the problem […]
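The abstract does not describe the authors' data-parallel algorithm. For reference, a classic sequential method for the minimum cycle mean is Karp's algorithm. Let D_k(v) be the minimum weight of a walk of exactly k edges ending at v, with D_0(v) = 0 for every v. Then the minimum cycle mean is min_v max_{0 ≤ k < n} (D_n(v) - D_k(v)) / (n - k), taken over vertices with finite D_n(v). The sketch below implements Karp's algorithm, not the paper's method; the maximum cycle mean follows from negating all weights. Within each round k, every vertex's new value depends only on round k-1, so the relaxation step is naturally data-parallel.

```python
import math


def min_cycle_mean(n, edges):
    """Karp's algorithm. edges: list of (u, v, w). Returns None if the graph is acyclic."""
    INF = math.inf
    # D[k][v]: minimum weight of a walk with exactly k edges ending at v.
    D = [[INF] * n for _ in range(n + 1)]
    D[0] = [0.0] * n  # walks may start at any vertex
    for k in range(1, n + 1):
        for u, v, w in edges:  # independent relaxations within round k
            if D[k - 1][u] + w < D[k][v]:
                D[k][v] = D[k - 1][u] + w
    best = INF
    for v in range(n):
        if D[n][v] == INF:
            continue
        worst = max((D[n][v] - D[k][v]) / (n - k) for k in range(n) if D[k][v] < INF)
        best = min(best, worst)
    return None if best == INF else best
```

On a triangle `0→1→2→0` with weights 1, 2, 3 and a self-loop of weight 5 on vertex 1, the minimum cycle mean is 2.0 (the triangle). Negating the weights and negating the result gives the maximum cycle mean, 5.0 (the self-loop).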
Nov, 3

A Mutable Hardware Abstraction to Replace Threads

Ever since the first digital images appeared, computer scientists all over the world have been trying to estimate their similarity computationally. So far, no solution as good as the human brain has been found. This paper presents another technique that tackles this issue, using singular value decomposition – a matrix factorization method which extracts main features of […]
Nov, 3

Parallelization of the Generalized Hough Transform on GPU

Programs developed under the Compute Unified Device Architecture (CUDA) obtain the highest performance when the exploitation of hardware resources on a Graphics Processing Unit (GPU) is maximized. To achieve this, load balancing among threads and high processor occupancy, i.e., a high ratio of active threads, are indispensable. However, in […]
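Occupancy, in the sense used above, is the ratio of active threads on a multiprocessor (SM) to the maximum the hardware supports. It is capped by whichever per-SM resource runs out first. The sketch below estimates occupancy from threads per block, registers per thread, and shared memory per block. Its default limits are assumptions, typical of some NVIDIA GPUs but not of all: 2048 threads, 32 blocks, 65,536 registers, and 48 KiB of shared memory per SM. Because the sketch ignores warp, register, and shared-memory allocation granularity, its result is an optimistic estimate. The CUDA runtime function `cudaOccupancyMaxActiveBlocksPerMultiprocessor` reports the exact figure for a real kernel.

```python
def estimated_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                        max_threads_per_sm=2048, max_blocks_per_sm=32,
                        regs_per_sm=65536, smem_per_sm=49152):
    """Rough occupancy estimate; all device limits are assumed, not queried."""
    # Each resource independently limits how many blocks fit on one SM.
    by_threads = max_threads_per_sm // threads_per_block
    by_regs = (regs_per_sm // (regs_per_thread * threads_per_block)
               if regs_per_thread else max_blocks_per_sm)
    by_smem = (smem_per_sm // smem_per_block
               if smem_per_block else max_blocks_per_sm)
    blocks = min(max_blocks_per_sm, by_threads, by_regs, by_smem)
    # Occupancy = active threads / maximum resident threads per SM.
    return blocks * threads_per_block / max_threads_per_sm
```

Under these assumed limits, 256-thread blocks using 32 registers per thread reach full occupancy (1.0). Doubling the register use to 64 halves the resident blocks and yields 0.5. Likewise, 128-thread blocks that each need 16 KiB of shared memory are limited to 3 blocks per SM (0.1875).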


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
