high performance computing on graphics processing units: hgpu.org

Posts

Jun, 6

A MapReduce Framework for Heterogeneous Computing Architectures

Nowadays, an increasing number of computational systems are equipped with heterogeneous compute resources, i.e., resources that follow different architectures. This applies at the level of a single chip, a single node, and even supercomputers and large-scale clusters. With their impressive price-to-performance ratio as well as power efficiency compared to traditional multicore processors, graphics processing units (GPUs) have […]

CUDA • OpenCL

Jun, 6

A comprehensive study of Dynamic Memory Management in OpenCL kernels

Traditional (sequential) applications use malloc for a variety of dynamic data structures, such as linked lists or trees. GPGPU is gaining attention and popularity because its massively parallel architecture allows for great speed improvements in programs that can be parallelised and implemented for a platform like OpenCL. Programmers who try to port their existing sequential or even […]

OpenCL

Jun, 6

A Reliable Throughput Gain on GPUs

Graphics Processing Units (GPUs) are widely employed in many applications in which high computing capabilities are required and parallelism can be fruitfully exploited. A higher number of parallel threads gives the GPU a higher throughput, but may also increase the code's neutron-induced error rate. The GPU's sensitivity depends not only on the code throughput, […]

CUDA

Jun, 6

Automating elimination of idle functions by run-time reconfiguration

A design approach is proposed to automatically identify and exploit run-time reconfiguration opportunities while optimising resource utilisation. We introduce Reconfiguration Data Flow Graph, a hierarchical graph structure enabling reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and run-time solution generation. Three applications, based on barrier option pricing, particle filter, and reverse […]

CUDA

Jun, 6

Implicit Skinning: Real-Time Skin Deformation with Contact Modeling

Geometric skinning techniques, such as smooth blending or dual quaternions, are very popular in the industry for their high performance, but fail to mimic realistic deformations. Other methods make use of physical simulation or control volume to better capture the skin behavior, yet they cannot deliver real-time feedback. In this paper, we present the first purely […]

CUDA

Jun, 6

ElastiFace: Matching and Blending Textured Faces

In this paper we present ELASTIFACE, a simple and versatile method for establishing correspondence between textured face models, either for the construction of a blend-shape facial rig or for the exploration of new characters by morphing between a set of input models. While there exists a wide variety of approaches for inter-surface mapping and mesh […]

OpenCL

Jun, 6

Accelerating Fast Fourier Transform for Wideband Channelization

Wideband channelization is a compute-intensive task with performance requirements that are arguably greater than what current multi-core CPUs can provide. To date, researchers have used dedicated hardware such as field programmable gate arrays (FPGAs) to address the performance-critical aspects of the channelizer. In this work, we assess the viability of the graphics processing unit (GPU) […]

OpenCL

Jun, 4

Efficient Execution of AMR Computations on GPU Systems

Adaptive Mesh Refinement (AMR) is a method which dynamically varies the spatio-temporal resolution of localized mesh regions in numerical simulations, based on the strength of the solution features. Due to high-resolution discretization of localized regions of interest into rectangular mesh units called patches, AMR provides low cost of computations and high degree […]

CUDA

Jun, 4

Towards shared memory consistency models for GPUs

With the widespread use of GPUs, it is important to ensure that programmers have a clear understanding of their shared memory consistency model, i.e., what values can be read when reads are issued concurrently with writes. While memory consistency has been studied for CPUs, GPUs present very different memory and concurrency systems and have not been well […]

CUDA

Jun, 4

Using RenderScript and RCUDA for Compute Intensive tasks on Mobile Devices: a Case Study

The processing power of mobile devices is continuously increasing. In this paper we perform a case study in which we assess three different programming models that can be used to leverage this processing power for compute intensive tasks. We use an imaging algorithm and compare a reference implementation of this algorithm based on OpenCV with […]

CUDA

Jun, 4

Coating Process Monitoring Using Computer Vision

The aim of this Bachelor’s Thesis was to make a prototype system for Metso Paper Inc. for monitoring a paper roll coating process. If the coating is done badly and there are faults, the process has to be redone, which lowers the company's profits since the process is costly. The work was proposed […]

CUDA

Jun, 4

Parallel Acceleration on Manycore Systems and Its Performance Analysis: OpenCL Case Study

OpenCL (Open Computing Language) is a heterogeneous programming framework for developing applications that execute across a range of device types made by different vendors [11]. It maps efficiently to both heterogeneous and homogeneous, single- or multiple-device systems consisting of CPUs, GPUs and other types of devices. OpenCL provides many benefits in the field of high-performance […]

OpenCL