Posts

Apr, 17

Efficient JPEG2000 EBCOT Context Modeling for Massively Parallel Architectures

Embedded Block Coding with Optimal Truncation (EBCOT) is the fundamental and computationally very demanding part of the compression process in the JPEG2000 image compression standard. In this paper, we present a reformulation of the context modeling of EBCOT that allows full parallelization for massively parallel architectures such as GPUs with their single instruction multiple threads architecture. […]
Apr, 16

Exploiting Computational Resources in Distributed Heterogeneous Platforms

We have been witnessing a continuous growth of both heterogeneous computational platforms (e.g., Cell blades, or the joint use of traditional CPUs and GPUs) and multicore processor architectures, and it is still an open question how applications can fully exploit such computational potential efficiently. In this paper we introduce a run-time environment and programming framework […]
Apr, 16

Computation of Voronoi diagrams using a graphics processing unit

A parallel algorithm to compute a discrete approximation to the Voronoi diagram is presented. The algorithm, which executes in single instruction multiple data (SIMD) mode, was implemented on a high-end graphics processing unit (GPU) using NVIDIA's compute unified device architecture (CUDA) development environment. The performance of the resulting code is investigated and presented, and a […]
Apr, 16

Statistical testing of random number sequences using CUDA

Previous research in the field of statistical testing of random number sequences using Graphics Processing Units (GPUs) has shown that this approach yields a significant increase in performance for a subset of the statistical tests proposed by the National Institute of Standards and Technology (NIST). The present paper aims at further improvements in the performance of […]
Apr, 16

3-SAT on CUDA: Towards a massively parallel SAT solver

This work presents the design and implementation of a massively parallel 3-SAT solver, specifically targeting random problem instances. Our approach is deterministic and features very little communication overhead and basically no load-balancing cost at all. In the context of most current parallel SAT solvers running only on a handful of cores, we implemented our solver […]
Apr, 16

Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters

MapReduce is a programming model that enables efficient massive data processing in large-scale computing environments such as supercomputers and clouds. Such large-scale computers employ GPUs to benefit from their good peak performance and high memory bandwidth. Since the performance of each job depends on the characteristics of the running application and the underlying computing environment, scheduling MapReduce tasks onto […]
Apr, 16

GP-GPU: Bridging the Gap between Modelling & Experimentation

Within the field of neural electrophysiology, there exists a divide between experimentalists and computational modellers. This is caused by the different spheres of expertise required to perform each discipline, as well as the differing resource requirements of the two parties. This paper considers several forms of hardware acceleration for implementation within a laboratory alongside time […]
Apr, 16

Parallel Lexicographic Names Construction with CUDA

The suffix array is a simpler and more compact alternative to the suffix tree, and lexicographic name construction is the fundamental building block in the suffix array construction process. This paper describes the design issues of the first data-parallel implementation of the lexicographic name construction algorithm on a commodity multiprocessor GPU using the Compute Unified Device Architecture (CUDA) platform, […]
Apr, 16

A High-Performance Multi-user Service System for Financial Analytics Based on Web Service and GPU Computation

In finance, securities such as stocks, funds, warrants and bonds are actively traded in financial markets. An abundance of market data and accurate pricing of a security can help practitioners arbitrage or hedge their positions. It can also help researchers and traders design better trading strategies. In this work, we develop a pricing and data/information […]
Apr, 16

Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing

Power dissipation is one of the most pressing limiting factors influencing the development of High Performance Computing (HPC). Toward power-efficient HPC on CPU-GPU hybrid platforms, we are investigating software methodologies to achieve optimized power utilization through algorithm design and programming techniques. In this paper we discuss power measurements of GPUs, propose a method of automatic […]
Apr, 16

Accelerating Particle Swarm Algorithm with GPGPU

This paper focuses on solving large-size optimization problems using GPGPU. Evolutionary Algorithms for solving these optimization problems suffer from the curse of dimensionality, which implies that their performance deteriorates quickly as the dimensionality of the search space increases. This difficulty makes performance studies for very high-dimensional problems very challenging. Furthermore, these […]
Apr, 15

N-body Simulation for Astronomical Collisional Systems with a New SIMD Instruction Set Extension to the x86 Architecture, Advanced Vector Extensions

We present a high-performance N-body code for astronomical collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one core of an Intel Core i7-2600 processor (8MB cache and 3.40 GHz) based on Sandy […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors