Posts
Jul, 25
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without […]
Jul, 25
Efficient Rendering of Scenes with Dynamic Lighting Using a Photons Queue and Incremental Update Algorithm
Photon mapping is a popular extension to the classic ray tracing algorithm in the field of realistic image synthesis. Moreover, it benefits from the massively parallel computational power brought by recent developments in graphics processor hardware and programming models. However, rendering scenes with dynamic lights still greatly limits performance due to the reconstruction […]
Jul, 25
Modeling of Heterogeneous Architecture with GPU to Exascale System
The High-Performance Computing (HPC) community aimed for many years at increasing performance regardless of energy consumption. However, energy is limiting the scalability of the next generation of supercomputers. Current HPC systems already consume huge amounts of power, on the order of a few megawatts (MW). Future HPC systems intend to achieve 10 to 100 […]
Jul, 25
Fast Exhaustive Search for Quadratic Systems in F2 on FPGAs – Extended Version
In 2010, Bouillaguet et al. proposed an efficient solver for polynomial systems over $\mathbb{F}_2$ that trades memory for speed. As a result, 48 quadratic equations in 48 variables can be solved on a graphics card (GPU) in 21 minutes. The research question that we would like to answer in this paper is how specifically designed […]
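To make the problem statement concrete, here is a minimal sketch of exhaustive search for a quadratic system over F_2. It is plain brute force over all 2^n assignments and is not the memory-for-speed solver of Bouillaguet et al. The polynomial encoding, the function names, and the three-variable toy system are all assumptions made for illustration.

```python
from itertools import product


def eval_quadratic(poly, x):
    """Evaluate one quadratic polynomial over F_2 at the 0/1 tuple x.

    poly = (quad_terms, lin_terms, const): quad_terms is a list of index
    pairs (i, j), lin_terms a list of indices i, const is 0 or 1.
    This encoding is hypothetical, chosen only for the sketch.
    """
    quad, lin, const = poly
    value = const
    for i, j in quad:
        value ^= x[i] & x[j]   # multiplication in F_2 is AND, addition is XOR
    for i in lin:
        value ^= x[i]
    return value


def solve_exhaustive(system, n):
    """Return every x in {0,1}^n on which all polynomials evaluate to 0."""
    return [x for x in product((0, 1), repeat=n)
            if all(eval_quadratic(p, x) == 0 for p in system)]


# Hypothetical toy system in 3 variables:
#   x0*x1 + x2     = 0
#   x0 + x1 + 1    = 0
#   x1*x2 + x0 + 1 = 0
system = [
    ([(0, 1)], [2], 0),
    ([], [0, 1], 1),
    ([(1, 2)], [0], 1),
]
solutions = solve_exhaustive(system, 3)
print(solutions)  # [(1, 0, 0)]
```

The cost of this naive loop doubles with every added variable, which is why instances such as 48 equations in 48 variables call for specialized enumeration and hardware.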
Jul, 25
High Performance Implementation of Ultrasound Color Doppler Imaging on GPU platform
The ability to detect and assess blood flow information in color Doppler imaging (CDI) plays an important role in modern ultrasound imaging systems. However, it has mainly been implemented on custom-designed hardware due to the large amount of data and computation involved. The recent trend toward programmable approaches offers the advantages of flexibility and quick […]
Jul, 25
PRAND: GPU accelerated parallel random number generation library: Using most reliable algorithms and applying parallelism of modern GPUs and CPUs
The library PRAND for pseudorandom number generation for modern CPUs and GPUs is presented. It contains both single-threaded and multi-threaded realizations of a number of modern and most reliable generators recently proposed and studied in [1,2,3,4,5] and the efficient SIMD realizations proposed in [6]. One of the useful features for using PRAND in parallel simulations […]
Jul, 24
Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU
Improving energy efficiency is an ongoing challenge in HPC because of the ever-increasing need for performance coupled with power and economic constraints. Though GPU-accelerated heterogeneous computing systems are capable of delivering impressive performance, it is necessary to explore all available power-aware technologies to meet the inevitable energy efficiency challenge. In this paper, we experimentally study […]
Jul, 24
Scheduling by Work-Stealing in Hybrid Parallel Architectures
Nowadays, parallel computing systems are based on multicore CPUs and specialized coprocessors, such as GPUs, due to the limits reached by traditional architectures. To obtain the expected performance in these systems, the workload must be distributed and redistributed efficiently through a scheduling technique such as work-stealing. This work aims […]
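As a rough illustration of the work-stealing idea mentioned above, the following sketch simulates it sequentially in rounds. It is not the scheduler studied in the paper. The function name, the round-based execution, and the "steal from the most loaded worker" policy are assumptions; real schedulers run workers concurrently and often pick victims at random.

```python
from collections import deque


def run_work_stealing(initial_queues):
    """Simulate work-stealing in rounds; return the tasks run by each worker.

    Each worker owns a deque and takes its newest task from the right end.
    A worker whose deque is empty steals the oldest task from the left end
    of the most loaded worker (an assumed victim-selection policy).
    """
    queues = [deque(q) for q in initial_queues]
    executed = [[] for _ in queues]
    while any(queues):
        for worker, own in enumerate(queues):
            if own:
                task = own.pop()               # local work: newest first
            else:
                victim = max(queues, key=len)  # most loaded worker
                if not victim:
                    continue                   # nothing left to steal
                task = victim.popleft()        # steal: oldest first
            executed[worker].append(task)
    return executed


# Hypothetical imbalanced start: worker 0 holds all eight tasks.
executed = run_work_stealing([range(8), [], []])
print(executed)  # [[7, 6, 5], [0, 2, 4], [1, 3]]
```

Taking local work and stolen work from opposite ends of the deque keeps the owner and the thieves from contending for the same tasks, which is the usual motivation for this layout.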
Jul, 24
Parallel Implementation of Texture Based Image Retrieval on The GPU
Most image processing algorithms are inherently parallel, so multithreaded processors are well suited to such applications. In huge image databases, image processing takes a very long time to run on a single-core processor because the algorithms execute in a single thread. Graphics Processing Units (GPUs) are more common in most image processing applications due to multithreaded execution […]
Jul, 24
Implementation of Filtering Beamforming Algorithms for Sonar Devices Using GPU
Beamforming is a signal processing technique used in sensor arrays to direct signal transmission or reception. A beamformer combines the input signals in the array to achieve constructive interference at particular angles (beams) and destructive interference at other angles. According to the following facts: 1) Beamforming can be computationally intensive, so real-time sonar beamforming algorithms in sonar […]
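The constructive and destructive interference described above can be sketched with a narrowband phase-shift (delay-and-sum) beamformer for a uniform linear array. This is a generic textbook formulation, not the filtering algorithms of the paper. The array size, the half-wavelength spacing, the noise-free 20-degree source, and all function names are assumptions.

```python
import cmath
import math


def steering_vector(num_sensors, spacing_wl, angle_deg):
    """Per-sensor phase factors for a plane wave arriving from angle_deg.

    spacing_wl is the sensor spacing in wavelengths (assumed uniform array).
    """
    phase = 2 * math.pi * spacing_wl * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * phase * n) for n in range(num_sensors)]


def beam_power(snapshot, spacing_wl, angle_deg):
    """Output power when the array is steered toward angle_deg."""
    sv = steering_vector(len(snapshot), spacing_wl, angle_deg)
    # Undo the expected phase at each sensor, then average: signals from the
    # steered direction add in phase, others partially cancel.
    y = sum(x * s.conjugate() for x, s in zip(snapshot, sv)) / len(snapshot)
    return abs(y) ** 2


# Hypothetical noise-free snapshot: 8 sensors, half-wavelength spacing,
# one source at 20 degrees.
snapshot = steering_vector(8, 0.5, 20.0)
angles = range(-90, 91)
powers = [beam_power(snapshot, 0.5, a) for a in angles]
best_angle = max(zip(powers, angles))[1]
print(best_angle)  # 20
```

Scanning 181 angles costs one inner product per angle per snapshot, which hints at why real-time beamforming over many beams and sensors is computationally intensive.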
Jul, 24
CDFC: Collision Detection Based on Fuzzy Clustering for Deformable Objects on GPU’s
We present a novel Collision Detection Based on Fuzzy Clustering for Deformable Objects on GPU’s (CDFC) technique to perform collision queries between rigid and/or deformable models. Our method can handle arbitrary deformations and even discontinuous ones. With our approach, we subdivide the scene into connected but totally independent parts by fuzzy clustering, and therefore, the […]
Jul, 22
Multi-core CUDA Architecture for Parallelization of Hierarchical Text Clustering
Text clustering is the problem of dividing text documents into groups such that documents in the same group are similar to one another and different from documents in other groups. Because texts generally tend to form hierarchies, text clustering is best performed with a hierarchical clustering method. An important aspect while clustering large […]