Posts

Aug, 16

Efficient GPU implementation of parameter estimation of a statistical model for online advertisement optimization

The optimization problem of estimating parameters using a maximum a posteriori (MAP) [3] approach on a non-linear statistical model with a large data set can be solved using the L-BFGS [10] algorithm. When dealing with an ever-changing reality, the evaluation needs to be fast to capture the immediacy of the observations. This thesis will present […]
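
For readers unfamiliar with L-BFGS [10], here is a minimal sketch of its standard two-loop recursion. It computes the search direction -H_k g from the last m curvature pairs (s_i, y_i) without ever forming a matrix. This is the textbook recursion, not the thesis's GPU implementation. The function names and the scaled-identity initial inverse Hessian are illustrative assumptions.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad: Sequence[float],
                    s_hist: List[List[float]],
                    y_hist: List[List[float]]) -> List[float]:
    """Return -H_k @ grad via the L-BFGS two-loop recursion (hypothetical helper).

    s_hist[i] = x_{i+1} - x_i and y_hist[i] = g_{i+1} - g_i, oldest first;
    every pair is assumed to satisfy the curvature condition s.y > 0.
    """
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_hist, y_hist)]

    # First loop: newest pair to oldest, stripping curvature information from q.
    alphas = []
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):
        a = rho * dot(s, q)
        alphas.append(a)
        q = [qi - a * yi for qi, yi in zip(q, y)]

    # Assumed initial inverse Hessian H0 = gamma * I, scaled by the newest pair.
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]

    # Second loop: oldest pair to newest; alphas were stored newest-first.
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        b = rho * dot(y, r)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]

    return [-ri for ri in r]
```

Because the newest update is applied last, the implicit H_k satisfies the secant condition H_k y_{k-1} = s_{k-1}. With an empty history the direction reduces to steepest descent, -g.
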
Aug, 15

vCUDA Framework Development for GPU Virtualization

vCUDA is a middleware that allows an application to use a CUDA-compatible graphics processing unit (GPU) installed in a remote computer as if it were installed in the computer where the application is being executed. vCUDA is designed following the client-server distributed architecture. On one side, the client employs a library of wrappers to the […]
Aug, 15

Real root isolation for univariate polynomials on GPUs and multicores

I participate in the development of the cumodp library. My objective is to develop code for the exact calculation of the real roots of univariate polynomials. Stating this problem is very easy. However, as one dives into the details, one realizes that there are many challenges in reaching highly efficient algorithmic and […]
Aug, 15

GPU-Assisted Cryptography of Log-Structured Indices

General-purpose programming of Graphics Processing Units (GPUs) is a relatively new technological advancement. GPUs contain vast amounts of computational power thanks to their many-core architectures. In many computer systems, the power of these GPUs often goes unused outside the realm of graphics. Many of today’s common computational tasks are well suited for the single […]
Aug, 15

A New Cooperative Evolutionary Multi-Swarm Optimizer Algorithm Based on CUDA Parallel Architecture Applied to Solve Engineering Optimization Problems

This paper presents a new Cooperative Evolutionary MultiSwarm Optimization Algorithm (CEMSO-GPU) based on the CUDA parallel architecture and applied to solve engineering problems. The focus of this approach is the use of master/slave swarms with a data-sharing mechanism, and a parallelization method based on the paradigm of General Purpose Computing on Graphics […]
Aug, 15

Optimizing the Computation of Eigenvalues Using Graphics Processing Units

In this paper, we first briefly describe some mathematical aspects regarding the computation of eigenvalues, followed by an original approach: a bisection algorithm useful in computing eigenvalues for a tridiagonal symmetric matrix of arbitrary size, using the computing capabilities of the latest graphics processing units that incorporate the Compute Unified Device Architecture. The novel approach […]
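
The bisection idea can be illustrated with the classic Sturm-count approach for a symmetric tridiagonal matrix. Count the negative pivots of the LDL^T factorization of T - xI to learn how many eigenvalues lie below x. Then bisect a Gershgorin interval until the k-th eigenvalue is bracketed. This is a minimal CPU sketch of the standard technique and not the authors' CUDA implementation. The function names, the pivmin guard and the tolerance are assumptions. A GPU version would typically run many such bisections in parallel, one per eigenvalue or per interval.

```python
def sturm_count(diag, off, x, pivmin=1e-300):
    """Number of eigenvalues of the symmetric tridiagonal matrix below x.

    diag holds the n diagonal entries and off the n-1 off-diagonal entries.
    By Sylvester's law of inertia, this equals the number of negative pivots
    in the LDL^T factorization of T - x*I.
    """
    count = 0
    q = 1.0
    for i, a in enumerate(diag):
        q = a - x - (off[i - 1] ** 2 / q if i > 0 else 0.0)
        if abs(q) < pivmin:  # guard against an exact zero pivot
            q = -pivmin
        if q < 0.0:
            count += 1
    return count


def tridiag_eigenvalue(diag, off, k, rtol=1e-12):
    """k-th smallest eigenvalue (0-based) by bisection on the Sturm count."""
    n = len(diag)
    # Gershgorin discs bound the whole spectrum.
    radius = [(abs(off[i - 1]) if i > 0 else 0.0) +
              (abs(off[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(a - r for a, r in zip(diag, radius))
    hi = max(a + r for a, r in zip(diag, radius))
    # Widen slightly so that count(lo) <= k < count(hi) holds at the start.
    pad = 1e-9 * (1.0 + abs(lo) + abs(hi))
    lo, hi = lo - pad, hi + pad
    while hi - lo > rtol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if sturm_count(diag, off, mid) > k:
            hi = mid  # more than k eigenvalues below mid: lambda_k < mid
        else:
            lo = mid  # at most k eigenvalues below mid: lambda_k >= mid
    return 0.5 * (lo + hi)
```

For example, the 3×3 matrix with diagonal (2, 2, 2) and off-diagonal (-1, -1) has eigenvalues 2 - √2, 2 and 2 + √2. Calls with k = 0, 1 and 2 recover them.
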
Aug, 14

Performance of FORTRAN and C GPU Extensions for a Benchmark Suite of Fourier Pseudospectral Algorithms

A comparison of PGI OpenACC, FORTRAN CUDA, and Nvidia CUDA pseudospectral methods on a single GPU and GCC FORTRAN on single and multiple CPU cores is reported. The GPU implementations use cuFFT and the CPU implementations use FFTW. Porting pre-existing FORTRAN codes to utilize GPUs is efficient and easy to implement with OpenACC and […]
Aug, 14

Accelerating cellular automata simulations using AVX and CUDA

We investigated various methods of parallelizing the Frisch-Hasslacher-Pomeau (FHP) cellular automata algorithm for modeling fluid flow. These methods include SSE, AVX, and POSIX Threads for central processing units (CPUs) and CUDA for graphics processing units (GPUs). We present implementation details of the FHP algorithm based on AVX/SSE and CUDA technologies. We found that (a) […]
Aug, 14

Dynamic Warp Resizing in High-Performance SIMT

Modern GPUs synchronize threads grouped in a warp at every instruction. This improves SIMD efficiency and makes it possible to share fetch and decode resources. The number of threads included in each warp (or warp size) affects divergence, synchronization overhead and the efficiency of memory access coalescing. Small warps reduce the performance penalty associated with […]
Aug, 14

A Second-Order Distributed Trotter-Suzuki Solver with a Hybrid Kernel

The Trotter-Suzuki approximation leads to an efficient algorithm for solving the time-dependent Schroedinger equation. Using existing highly optimized CPU and GPU kernels, we developed a distributed version of the algorithm that runs efficiently on a cluster. Our implementation also improves single node performance, and is able to use multiple GPUs within a node. The scaling […]
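
For reference, the generic second-order (symmetric) Trotter-Suzuki splitting can be written as follows. It assumes the Hamiltonian is split into two parts, H = A + B, with ħ = 1. How the authors actually decompose H for their hybrid kernel is not stated in this excerpt.

```latex
e^{-i(A+B)\Delta t} = e^{-iA\,\Delta t/2}\, e^{-iB\,\Delta t}\, e^{-iA\,\Delta t/2} + \mathcal{O}(\Delta t^{3})
```

Applying this step N = t/Δt times gives a global error of order Δt², which is what makes the scheme second order. Each factor only needs the exponential of one simpler operator, which is what lets optimized CPU and GPU kernels apply them efficiently.
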
Aug, 14

A GPU implementation of the Simulated Annealing Heuristic for the Quadratic Assignment Problem

The quadratic assignment problem (QAP) is one of the most difficult combinatorial optimization problems. An effective heuristic for obtaining approximate solutions to the QAP is simulated annealing (SA). Here we describe an SA implementation for the QAP which runs on a graphics processing unit (GPU). GPUs are composed of low-cost commodity graphics chips which […]
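
The following minimal sketch shows simulated annealing for the QAP on the CPU. A permutation assigns facility i to location perm[i], and the cost is the sum of F[i][j] · D[perm[i]][perm[j]]. It uses random pair swaps, Metropolis acceptance and geometric cooling. This is a generic SA, not the authors' GPU scheme. The starting temperature, cooling factor and iteration counts are illustrative assumptions. The full O(n²) cost is recomputed after each swap for clarity. Practical implementations usually use an O(n) swap delta instead.

```python
import math
import random


def qap_cost(F, D, perm):
    """QAP objective: sum over i, j of F[i][j] * D[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(F[i][j] * D[perm[i]][perm[j]] for i in range(n) for j in range(n))


def anneal_qap(F, D, t0=100.0, alpha=0.95, iters_per_temp=100, t_min=1e-3, seed=0):
    """Simulated annealing with pair-swap moves (hypothetical parameters, n >= 2)."""
    rng = random.Random(seed)
    n = len(F)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = qap_cost(F, D, perm)
    best, best_cost = perm[:], cost
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            r, s = rng.sample(range(n), 2)
            perm[r], perm[s] = perm[s], perm[r]  # propose a swap
            new_cost = qap_cost(F, D, perm)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = perm[:], cost
            else:
                perm[r], perm[s] = perm[s], perm[r]  # reject: undo the swap
        t *= alpha  # geometric cooling
    return best, best_cost
```

Swaps keep every candidate a valid permutation, so no repair step is needed. On a GPU, a common pattern is to run many independent chains, or evaluate many candidate swaps, in parallel.
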
Aug, 13

Orthorectification by Using GPGPU Method

Owing to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power of more than one teraflops. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors with very fast computing capabilities and high memory bandwidth […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org