Posts
Aug, 13
An Introduction to High Performance Computing on AWS
This paper describes a range of high performance computing (HPC) applications that are running today on Amazon Web Services (AWS). You will learn best practices for cloud deployment, for cluster and job management, and for the management of third-party software. This whitepaper covers HPC use cases that include highly distributed, highly parallel grid computing applications, […]
Aug, 13
Perception of Acoustical Spatial Attributes and Impression in Virtually Rendered Sound Field
The computational power available for simulating sound fields from three-dimensional numerical models has grown quickly, for example through GPU cluster systems. We can render the directivity, position, distance, and reverberation of sound sources in a practical time. Furthermore, a multichannel sound field system can be realized with low-cost digital-to-analog converter modules. Moreover, some researchers are trying to […]
Aug, 13
Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration
Image restoration is a long-standing problem in low-level computer vision with many interesting applications. We describe a flexible learning framework to obtain simple but effective models for various image restoration problems. The proposed approach is based on the concept of nonlinear reaction diffusion, but we extend conventional nonlinear reaction diffusion models by highly parametrized linear […]
Aug, 12
Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application
‘How can GPU acceleration be obtained as a service in a cluster?’ This question has become increasingly significant because installing GPUs on all nodes of a cluster is inefficient. The research reported in this paper addresses this question by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS), […]
Aug, 12
Accelerating IISPH: A Parallel GPGPU Solution Using CUDA
CONTEXT: The implicit incompressible smoothed particle hydrodynamics (IISPH) solver has pioneered the realistic simulation of incompressible fluids for computer graphics. The algorithm converges faster than other incompressible SPH solvers, but real-time performance (from the perspective of video games, 30 frames per second) is still an issue when the particle count increases. OBJECTIVES: This […]
Aug, 12
GPU Pro 6: Advanced Rendering Techniques
The latest edition of this bestselling game development reference offers proven tips and techniques for the real-time rendering of special effects and visualization data that are useful for beginners and seasoned game and graphics programmers alike. Exploring recent developments in the rapidly evolving field of real-time rendering, GPU Pro 6: Advanced Rendering Techniques assembles a high-quality […]
Aug, 12
Performance analysis of parallel gravitational N-body codes on large GPU cluster
We compare the performance of two very different parallel gravitational N-body codes for astrophysical simulations on large GPU clusters, both pioneers in their own fields as well as in certain mutual scales: NBODY6++ and Bonsai. We benchmark the two codes by analyzing their performance, accuracy and efficiency through the modeling […]
Aug, 12
Efficient Numerical Evaluation of Feynman Integral
Feynman loop integrals are a key ingredient of high-order radiation effects, which must be included for reliable and accurate theoretical predictions. We improve the efficiency of numerical integration in sector decomposition by implementing a quasi-Monte Carlo method on CUDA/GPU. For demonstration we present the results of several Feynman integrals up to two […]
Aug, 11
Portable parallelized Blowfish via RenderScript
The recent rise in the popularity of mobile computing has brought mobile security to the forefront. As users depend more on tablets and smartphones, sensitive data must be secured on devices with vastly weaker resources than a typical computer. As mobile technology matures, the industry is starting to provide devices […]
Aug, 11
SINGA: Putting Deep Learning in the Hands of Multimedia Users
Recently, deep learning techniques have enjoyed success in various multimedia applications, such as image classification and multimodal data analysis. Two key factors behind deep learning’s remarkable achievement are the immense computing power and the availability of massive training datasets, which enable us to train large models to capture complex regularities of the data. There are […]
Aug, 11
Optimizing Strassen matrix multiply on GPUs
Many-core systems are designed for applications with large data parallelism. Strassen Matrix Multiply (MM) can be formulated as a depth-first search (DFS) traversal of a recursion tree in which all cores work in parallel on computing each of the NxN sub-matrices, which reduces storage at the cost of large data motion to gather and […]
Aug, 11
A Parallel Implementation of the Self Organising Map using OpenCL
The self-organising map is a machine learning algorithm used to produce low-dimensional representations of high-dimensional data. While the process is becoming more and more useful with the rise of big data, it is hindered by the sheer amount of time the algorithm takes to run serially. This project produces a parallel version […]