Posts
Apr, 17
Performance of Optical Flow Techniques on Graphics Hardware
Since graphics cards have become programmable in recent years, numerous computationally intensive algorithms have been implemented on what are now called general-purpose graphics processing units (GPGPUs). While the results show that GPGPUs regularly outperform CPU-based implementations, the question arises how optical flow algorithms can be ported to graphics hardware. To answer this question, the […]
Apr, 17
Data handling inefficiencies between CUDA, 3D rendering, and system memory
While GPGPU programming offers faster computation of highly parallelized code, the memory bandwidth between the system and the GPU can create a bottleneck that reduces the potential gains. CUDA is a prominent GPGPU API that can transfer data to and from system memory, and that can also access data used by 3D rendering APIs. In […]
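The bottleneck the excerpt describes can be illustrated with a back-of-the-envelope model. The bandwidth and throughput figures below are illustrative assumptions, not numbers from the paper:

```python
# Rough model of the host<->GPU transfer bottleneck: even a kernel that
# streams data 100x faster than the bus can move it loses most of its
# advantage once upload and download times are counted.

def kernel_time_s(n_bytes, gpu_throughput_gbs=100.0):
    """Time the GPU needs to stream over the data once (assumed rate)."""
    return n_bytes / (gpu_throughput_gbs * 1e9)

def transfer_time_s(n_bytes, pcie_bandwidth_gbs=8.0):
    """Time to move the data across the bus one way (assumed rate)."""
    return n_bytes / (pcie_bandwidth_gbs * 1e9)

def effective_speedup(n_bytes, cpu_time_s):
    """End-to-end speedup once upload + download costs are included."""
    gpu_total = 2 * transfer_time_s(n_bytes) + kernel_time_s(n_bytes)
    return cpu_time_s / gpu_total

# 1 GiB of data and a CPU baseline of 1 s: transfers dominate the total.
print(round(effective_speedup(1 << 30, 1.0), 1))
```

With these assumed rates the transfers take roughly 25x longer than the kernel itself, which is why keeping data resident on the GPU (e.g. sharing buffers with the rendering API instead of round-tripping through system memory) matters.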
Apr, 17
A CUDA-Based Implementation of Stable Fluids in 3D with Internal and Moving Boundaries
Fluid simulation has been an active research field in computer graphics for the last 30 years. Stam’s stable fluids method, among others, is used for solving the equations that govern fluids (i.e., the Navier-Stokes equations). An implementation of stable fluids in 3D using NVIDIA’s Compute Unified Device Architecture (CUDA) is provided in this paper. This CUDA-based […]
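The core of Stam’s method is semi-Lagrangian advection: each cell traces backwards along the velocity field and interpolates the value found there, which keeps the scheme stable for any time step. A minimal 1D sketch in plain Python (the paper’s solver is 3D and CUDA-based; this only shows the idea):

```python
# One semi-Lagrangian advection step in 1D, in the spirit of Stam's
# stable fluids. Backtracing + interpolation is what makes the scheme
# unconditionally stable.

def advect(field, velocity, dt, dx):
    """Advect `field` by `velocity` (one value per cell) over one step."""
    n = len(field)
    out = [0.0] * n
    for i in range(n):
        # Trace backwards in grid coordinates and clamp to the domain.
        src = i - dt * velocity[i] / dx
        src = max(0.0, min(n - 1.0, src))
        lo = int(src)
        hi = min(lo + 1, n - 1)
        t = src - lo
        # Linear interpolation between the two neighbouring cells.
        out[i] = (1.0 - t) * field[lo] + t * field[hi]
    return out

# A spike of density drifting right under uniform velocity.
density = [0.0, 0.0, 1.0, 0.0, 0.0]
velocity = [1.0] * 5
print(advect(density, velocity, dt=1.0, dx=1.0))  # spike moves one cell right
```

Because every output cell is computed independently, the loop maps naturally onto one CUDA thread per cell, which is what a GPU implementation exploits.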
Apr, 17
Efficient JPEG2000 EBCOT Context Modeling for Massively Parallel Architectures
Embedded Block Coding with Optimal Truncation (EBCOT) is the fundamental and computationally very demanding part of the compression process of JPEG2000 image compression standard. In this paper, we present a reformulation of the context modeling of EBCOT that allows full parallelization for massively parallel architectures such as GPUs with their single instruction multiple threads architecture. […]
Apr, 16
Exploiting Computational Resources in Distributed Heterogeneous Platforms
We have been witnessing a continuous growth of both heterogeneous computational platforms (e.g., Cell blades, or the joint use of traditional CPUs and GPUs) and multicore processor architectures, and it remains an open question how applications can fully and efficiently exploit such computational potential. In this paper we introduce a run-time environment and programming framework […]
Apr, 16
Computation of Voronoi diagrams using a graphics processing unit
A parallel algorithm to compute a discrete approximation to the Voronoi diagram is presented. The algorithm, which executes in single instruction multiple data (SIMD) mode, was implemented on a high-end graphics processing unit (GPU) using NVIDIA’s compute unified device architecture (CUDA) development environment. The performance of the resulting code is investigated and presented, and a […]
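A discrete Voronoi approximation assigns every grid cell the index of its nearest seed, and each cell’s computation is independent of every other, which is exactly the per-pixel parallelism a SIMD/CUDA version exploits. A small brute-force sketch with illustrative seeds:

```python
# Brute-force discrete Voronoi diagram on a grid: every pixel stores the
# index of its nearest seed (by squared Euclidean distance). On a GPU,
# each pixel would be one thread.

def discrete_voronoi(width, height, seeds):
    """Return a row-major grid of nearest-seed indices."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Independent per-pixel work -> trivially data-parallel.
            best = min(range(len(seeds)),
                       key=lambda k: (x - seeds[k][0]) ** 2
                                   + (y - seeds[k][1]) ** 2)
            row.append(best)
        grid.append(row)
    return grid

# Two seeds in opposite corners of a 4x3 grid.
for row in discrete_voronoi(4, 3, seeds=[(0, 0), (3, 2)]):
    print(row)
```

The brute-force cost is O(pixels × seeds); published GPU approaches refine this, but the per-pixel independence is what makes the problem a good SIMD fit in the first place.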
Apr, 16
Statistical testing of random number sequences using CUDA
Previous research in the field of statistical testing of random number sequences using Graphics Processing Units (GPU) has shown that this approach yields a significant increase in performance for a subset of the statistical tests proposed by National Institute of Standards and Technology (NIST). The present paper aims at further improvements in the performance of […]
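The simplest statistic in the NIST suite is the frequency (monobit) test, whose per-bit sum is the kind of embarrassingly parallel reduction a GPU implementation accelerates. A plain-Python sketch of that test (following the NIST SP 800-22 definition; the input sequences are illustrative):

```python
# NIST SP 800-22 frequency (monobit) test: map bits to +/-1, sum them,
# and convert the normalized sum to a p-value. A random sequence keeps
# the sum near zero, so the p-value stays large.
import math

def monobit_p_value(bits):
    """p-value of the frequency test for a sequence of 0/1 bits."""
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)       # the parallelizable reduction
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

# A perfectly balanced sequence passes; an all-ones sequence fails badly.
print(monobit_p_value([0, 1] * 50))
print(monobit_p_value([1] * 100))
```

The other NIST tests are more elaborate, but many share this shape — a large independent per-bit or per-block computation followed by a reduction — which is why a subset of them speeds up well on GPUs.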
Apr, 16
3-SAT on CUDA: Towards a massively parallel SAT solver
This work presents the design and implementation of a massively parallel 3-SAT solver, specifically targeting random problem instances. Our approach is deterministic and features very little communication overhead and essentially no load-balancing cost. Whereas most current parallel SAT solvers run on only a handful of cores, we implemented our solver […]
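One way to get a deterministic, communication-free partition of a SAT search is to fix the first k variables, splitting the assignment space into 2**k independent subtrees that need no shared state. A sketch of that idea in sequential Python (the paper’s solver targets GPUs; the clause encoding and the partitioning scheme here are illustrative assumptions, not the authors’ method):

```python
# Communication-free partitioning of exhaustive 3-SAT search: each
# prefix over the first k variables is an independent work unit.
from itertools import product

def satisfies(clauses, assignment):
    """Clauses are lists of non-zero ints: +v means var v, -v its negation."""
    return all(any(assignment[abs(l) - 1] == (l > 0) for l in c)
               for c in clauses)

def solve_subtree(clauses, n_vars, prefix):
    """Exhaustively search assignments that start with `prefix` bits."""
    for tail in product([False, True], repeat=n_vars - len(prefix)):
        a = list(prefix) + list(tail)
        if satisfies(clauses, a):
            return a
    return None

def parallel_solve(clauses, n_vars, k=2):
    # Each prefix could run on its own core/GPU block with no messaging.
    for prefix in product([False, True], repeat=k):
        result = solve_subtree(clauses, n_vars, prefix)
        if result is not None:
            return result
    return None

# (x1 or x2 or not x3) and (not x1 or x3 or x2)
print(parallel_solve([[1, 2, -3], [-1, 3, 2]], n_vars=3))
```

For random instances the subtrees tend to be similarly sized, which is consistent with the excerpt’s claim of negligible load-balancing cost.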
Apr, 16
GP-GPU: Bridging the Gap between Modelling & Experimentation
Within the field of neural electrophysiology, there exists a divide between experimentalists and computational modellers. This is caused by the different spheres of expertise required to perform each discipline, as well as the differing resource requirements of the two parties. This paper considers several forms of hardware acceleration for implementation within a laboratory alongside time […]
Apr, 16
Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters
MapReduce is a programming model that enables efficient massive data processing in large-scale computing environments such as supercomputers and clouds. Such large-scale computers employ GPUs for their high peak performance and memory bandwidth. Since the performance of each job depends on the characteristics of the running application and the underlying computing environment, scheduling MapReduce tasks onto […]
Apr, 16
Parallel Lexicographic Names Construction with CUDA
The suffix array is a simpler and more compact alternative to the suffix tree, and lexicographic name construction is the fundamental building block of the suffix array construction process. This paper describes the design issues of the first data-parallel implementation of the lexicographic name construction algorithm on a commodity multiprocessor GPU using the Compute Unified Device Architecture (CUDA) platform, […]
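In prefix-doubling suffix array construction, each naming step keys every suffix by a pair of previous ranks, sorts the keys, and gives equal keys equal names; the sort and the per-suffix name comparisons are the data-parallel pieces a CUDA version maps to threads. A minimal sequential sketch with illustrative rank pairs:

```python
# One lexicographic-naming step: assign integer names so that equal keys
# share a name and the sorted order of keys is preserved.

def lexicographic_names(keys):
    """`keys` is one tuple per suffix, e.g. (rank[i], rank[i + h]).

    Returns names in the original suffix order.
    """
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    names = [0] * len(keys)
    name = 0
    for pos, i in enumerate(order):
        # Bump the name only when the key differs from its predecessor;
        # each comparison is independent, so a GPU can do them in parallel
        # and turn the 0/1 bumps into names with a prefix sum.
        if pos > 0 and keys[i] != keys[order[pos - 1]]:
            name += 1
        names[i] = name
    return names

# Rank pairs for four suffixes; two suffixes share the key (1, 2).
print(lexicographic_names([(1, 2), (0, 1), (1, 2), (2, 0)]))
```

Iterating this step with doubled offsets until all names are distinct yields the suffix array; the sort-plus-scan structure of each step is what makes it a good fit for GPU primitives.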
Apr, 16
A High-Performance Multi-user Service System for Financial Analytics Based on Web Service and GPU Computation
In finance, securities, such as stocks, funds, warrants and bonds, are actively traded in financial markets. Abundant market data and accurate pricing of a security can help practitioners arbitrage or hedge their positions. It can also help researchers and traders design better trading strategies. In this work, we develop a pricing and data/information […]