high performance computing on graphics processing units: hgpu.org

Posts

Aug, 13

An Introduction to High Performance Computing on AWS

This paper describes a range of high performance computing (HPC) applications that are running today on Amazon Web Services (AWS). You will learn best practices for cloud deployment, for cluster and job management, and for the management of third-party software. This whitepaper covers HPC use cases that include highly distributed, highly parallel grid computing applications, […]

CUDA • OpenCL

Aug, 13

Perception of Acoustical Spatial Attributes and Impression in Virtually Rendered Sound Field

The computational power available to simulate sound fields from three-dimensional numerical models has grown rapidly, for example through GPU cluster systems. We can render directivity, position, distance, and reverberation of sound sources in a practical time. Furthermore, a multichannel sound field system can be realized with low-cost digital-to-analog converter modules. Moreover, some researchers are trying to […]

CUDA

Aug, 13

Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration

Image restoration is a long-standing problem in low-level computer vision with many interesting applications. We describe a flexible learning framework to obtain simple but effective models for various image restoration problems. The proposed approach is based on the concept of nonlinear reaction diffusion, but we extend conventional nonlinear reaction diffusion models by highly parametrized linear […]

CUDA

Aug, 12

Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application

‘How can GPU acceleration be obtained as a service in a cluster?’ This question has become increasingly significant due to the inefficiency of installing GPUs on all nodes of a cluster. The research reported in this paper is motivated to address the above question by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS), […]

CUDA

Aug, 12

Accelerating IISPH: A Parallel GPGPU Solution Using CUDA

CONTEXT: Simulating realistic fluid behavior in incompressible fluids for computer graphics has been pioneered with the implicit incompressible smoothed particle hydrodynamics (IISPH) solver. The algorithm converges faster than other incompressible SPH solvers, but real-time performance (from the perspective of video games, 30 frames per second) is still an issue when the particle count increases. OBJECTIVES: This […]

CUDA

Aug, 12

GPU Pro 6: Advanced Rendering Techniques

The latest edition of this bestselling game development reference offers proven tips and techniques for the real-time rendering of special effects and visualization data that are useful for beginners and seasoned game and graphics programmers alike. Exploring recent developments in the rapidly evolving field of real-time rendering, GPU Pro 6: Advanced Rendering Techniques assembles a high-quality […]

CUDA • OpenCL • OpenGL

Aug, 12

Performance analysis of parallel gravitational N-body codes on large GPU cluster

We compare the performance of two very different parallel gravitational N-body codes for astrophysical simulations on large GPU clusters, both pioneers in their own fields as well as in certain mutual scales: NBODY6++ and Bonsai. We carry out the benchmark of the two codes by analyzing their performance, accuracy and efficiency through the modeling […]

CUDA

Aug, 12

Efficient Numerical Evaluation of Feynman Integral

The Feynman loop integral is the key ingredient of high-order radiation effects, which are essential for reliable and accurate theoretical predictions. We improve the efficiency of numerical integration in sector decomposition by implementing a quasi-Monte Carlo method on GPUs using CUDA. For demonstration, we present the results of several Feynman integrals up to two […]

CUDA
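To illustrate the quasi-Monte Carlo idea mentioned in the abstract above, here is a minimal serial sketch in Python. It is not the paper's method: it uses a Halton sequence (an assumption, since the abstract does not name the sequence), integrates a toy function over the unit square rather than a Feynman integral, and does no sector decomposition or GPU offloading. The function names `van_der_corput`, `halton_point`, and `qmc_integrate` are hypothetical.

```python
def van_der_corput(index, base):
    """Radical inverse of `index` in `base`: mirror the digits about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_point(index, bases=(2, 3)):
    """One low-discrepancy point in the unit hypercube, one coprime base per dimension."""
    return tuple(van_der_corput(index, base) for base in bases)


def qmc_integrate(f, n_points, bases=(2, 3)):
    """Estimate the integral of f over the unit hypercube by averaging over Halton points."""
    total = 0.0
    for i in range(1, n_points + 1):  # start at 1 to skip the all-zero point
        total += f(*halton_point(i, bases))
    return total / n_points


if __name__ == "__main__":
    # Toy integrand: the exact integral of x*y over [0,1]^2 is 0.25.
    print(qmc_integrate(lambda x, y: x * y, 4096))
```

Each sample point is evaluated independently of the others, which is the property that makes this kind of integration a natural fit for GPU parallelism.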

Aug, 11

Portable parallelized Blowfish via RenderScript

The recent rise in the popularity of mobile computing has brought the attention of mobile security to the forefront. As users depend more on tablets and smartphones, sensitive data is left to be secured using devices with vastly weaker resources than a typical computer. As mobile technology matures, the industry is starting to provide devices […]

Aug, 11

SINGA: Putting Deep Learning in the Hands of Multimedia Users

Recently, deep learning techniques have enjoyed success in various multimedia applications, such as image classification and multimodal data analysis. Two key factors behind deep learning’s remarkable achievement are the immense computing power and the availability of massive training datasets, which enable us to train large models to capture complex regularities of the data. There are […]

Aug, 11

Optimizing Strassen matrix multiply on GPUs

Many-core systems are designed for applications with a high degree of data parallelism. Strassen Matrix Multiply (MM) can be formulated as a depth-first (DFS) traversal of a recursion tree in which all cores work in parallel on computing each of the NxN sub-matrices, which reduces storage at the cost of large data motion to gather and […]

CUDA
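The recursion tree that the abstract above refers to can be seen in a plain serial version of Strassen's algorithm. The following Python sketch is only an illustration under stated assumptions: it handles square matrices whose size is a power of two, runs on the CPU, and visits the seven sub-products depth first, one after another. It does not reproduce the paper's GPU scheduling or data movement, and all helper names are hypothetical.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a square matrix into its four quadrants."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])


def naive_multiply(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def strassen(A, B, cutoff=1):
    """Multiply two n x n matrices (n a power of two) with seven recursive products."""
    n = len(A)
    if n <= cutoff:
        return naive_multiply(A, B)  # leaf of the recursion tree
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Each recursive call below is one child in the recursion tree (depth-first order).
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    # Gather the seven products into the four quadrants of the result.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


if __name__ == "__main__":
    print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In practice the `cutoff` would be raised so that small sub-matrices fall back to the conventional product, since the extra additions dominate at small sizes.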

Aug, 11

A Parallel Implementation of the Self Organising Map using OpenCL

The self-organising map is a machine learning algorithm used to produce low-dimensional representations of high-dimensional data. While the process is becoming more and more useful with the rise of big data, it is hindered by the sheer amount of time the algorithm takes to run serially. This project produces a parallel version […]

OpenCL
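For readers unfamiliar with the self-organising map, a single serial training step might look like the Python sketch below. This is an assumption-laden illustration, not the project's OpenCL code: it uses a Gaussian neighbourhood on a rectangular grid, and the names `best_matching_unit` and `train_step` are hypothetical.

```python
import math


def best_matching_unit(weights, sample):
    """Return the (row, col) of the neuron whose weight vector is closest to the sample."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - si) ** 2 for wi, si in zip(w, sample))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_step(weights, sample, learning_rate, radius):
    """Pull every neuron towards the sample, weighted by its grid distance to the winner."""
    br, bc = best_matching_unit(weights, sample)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
            influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
            row[c] = [wi + learning_rate * influence * (si - wi)
                      for wi, si in zip(w, sample)]
    return br, bc


if __name__ == "__main__":
    grid = [[[0.0, 0.0], [1.0, 0.0]],
            [[0.0, 1.0], [0.9, 0.9]]]
    print(train_step(grid, [1.0, 1.0], learning_rate=0.5, radius=1.0))
    print(grid)
```

Both the distance computation and the weight update are independent per neuron, which is presumably what a parallel version would exploit.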