Posts

Jun, 6

Development of an explicit pressure-based unstructured solver for three-dimensional incompressible flows with graphics hardware acceleration

In this research, a numerical algorithm was developed to solve the incompressible Navier-Stokes equations using explicit time stepping. The goal of this research was to develop an unsteady SIMPLER-based algorithm with lower computational overhead. The new explicit algorithm uses a four-stage Runge-Kutta scheme to update the velocities and eliminates the need for the […]
Jun, 6

Parallelization & checkpointing of GPU applications through program transformation

GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for GPUs tractable has consolidated GPUs as an alternative for accelerating general-purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, […]
Jun, 6

A MapReduce Framework for Heterogeneous Computing Architectures

Nowadays, an increasing number of computational systems are equipped with heterogeneous compute resources, i.e., resources following different architectures. This applies to the level of a single chip, a single node and even supercomputers and large-scale clusters. With their impressive price-to-performance ratio as well as power efficiency compared to traditional multicore processors, graphics processing units (GPUs) have […]
Jun, 6

A comprehensive study of Dynamic Memory Management in OpenCL kernels

Traditional (sequential) applications use malloc for a variety of dynamic data structures, like linked lists or trees. GPGPU is gaining attention and popularity because its massively-parallel architecture allows for great speed improvement for programs that can be parallelised and implemented for a platform like OpenCL. Programmers who try to port their existing sequential or even […]
Jun, 6

A Reliable Throughput Gain on GPUs

Graphics Processing Units (GPUs) are widely employed in many applications in which high computing capabilities are required and parallelism can be fruitfully exploited. A larger number of parallel threads gives the GPU a higher throughput, but may also increase the code's neutron-induced error rate. The GPU's sensitivity depends not only on the code throughput, […]
Jun, 6

Automating elimination of idle functions by run-time reconfiguration

A design approach is proposed to automatically identify and exploit run-time reconfiguration opportunities while optimising resource utilisation. We introduce Reconfiguration Data Flow Graph, a hierarchical graph structure enabling reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and run-time solution generation. Three applications, based on barrier option pricing, particle filter, and reverse […]
Jun, 6

Implicit Skinning: Real-Time Skin Deformation with Contact Modeling

Geometric skinning techniques, such as smooth blending or dual quaternions, are very popular in the industry for their high performance, but fail to mimic realistic deformations. Other methods make use of physical simulation or control volume to better capture the skin behavior, yet they cannot deliver real-time feedback. In this paper, we present the first purely […]
Jun, 6

ElastiFace: Matching and Blending Textured Faces

In this paper we present ELASTIFACE, a simple and versatile method for establishing correspondence between textured face models, either for the construction of a blend-shape facial rig or for the exploration of new characters by morphing between a set of input models. While there exists a wide variety of approaches for inter-surface mapping and mesh […]
Jun, 6

Accelerating Fast Fourier Transform for Wideband Channelization

Wideband channelization is a compute-intensive task with performance requirements that are arguably greater than what current multi-core CPUs can provide. To date, researchers have used dedicated hardware such as field programmable gate arrays (FPGAs) to address the performance-critical aspects of the channelizer. In this work, we assess the viability of the graphics processing unit (GPU) […]
Jun, 4

Efficient Execution of AMR Computations on GPU Systems

Adaptive Mesh Refinement (AMR) is a method that dynamically varies the spatio-temporal resolution of localized mesh regions in numerical simulations, based on the strength of the solution features. Due to high-resolution discretization of localized regions of interest into rectangular mesh units called patches, AMR provides low cost of computations and high degree […]
Jun, 4

Towards shared memory consistency models for GPUs

With the widespread use of GPUs, it is important to ensure that programmers have a clear understanding of their shared memory consistency model, i.e., what values can be read when reads are issued concurrently with writes. While memory consistency has been studied for CPUs, GPUs present very different memory and concurrency systems and have not been well […]
Jun, 4

Using RenderScript and RCUDA for Compute Intensive tasks on Mobile Devices: a Case Study

The processing power of mobile devices is continuously increasing. In this paper we perform a case study in which we assess three different programming models that can be used to leverage this processing power for compute-intensive tasks. We use an imaging algorithm and compare a reference implementation of this algorithm based on OpenCV with […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org