Posts

Aug, 12

Accelerating IISPH: A Parallel GPGPU Solution Using CUDA

CONTEXT: Simulating realistic fluid behavior in incompressible fluids for computer graphics has been pioneered with the implicit incompressible smoothed particle hydrodynamics (IISPH) solver. The algorithm converges faster than other incompressible SPH solvers, but real-time performance (from the perspective of video games, 30 frames per second) is still an issue when the particle count increases. OBJECTIVES: This […]
Aug, 12

GPU Pro 6: Advanced Rendering Techniques

The latest edition of this bestselling game development reference offers proven tips and techniques for the real-time rendering of special effects and visualization data that are useful for beginners and seasoned game and graphics programmers alike. Exploring recent developments in the rapidly evolving field of real-time rendering, GPU Pro 6: Advanced Rendering Techniques assembles a high-quality […]
Aug, 12

Performance analysis of parallel gravitational N-body codes on large GPU cluster

We compare the performance of two very different parallel gravitational N-body codes for astrophysical simulations on large GPU clusters, both pioneers in their own fields as well as in certain mutual scales – NBODY6++ and Bonsai. We carry out the benchmark of the two codes by analyzing their performance, accuracy and efficiency through the modeling […]
Aug, 12

Efficient Numerical Evaluation of Feynman Integral

The Feynman loop integral is the key ingredient of high-order radiation effects, which are responsible for reliable and accurate theoretical predictions. We improve the efficiency of numerical integration in sector decomposition by implementing a quasi-Monte Carlo method combined with CUDA/GPU techniques. For demonstration we present the results of several Feynman integrals up to two […]
Aug, 11

Portable parallelized blowfish via RenderScript

The recent rise in the popularity of mobile computing has brought the attention of mobile security to the forefront. As users depend more on tablets and smartphones, sensitive data is left to be secured using devices with vastly weaker resources than a typical computer. As mobile technology matures, the industry is starting to provide devices […]
Aug, 11

SINGA: Putting Deep Learning in the Hands of Multimedia Users

Recently, deep learning techniques have enjoyed success in various multimedia applications, such as image classification and multimodal data analysis. Two key factors behind deep learning’s remarkable achievement are the immense computing power and the availability of massive training datasets, which enable us to train large models to capture complex regularities of the data. There are […]
Aug, 11

Optimizing strassen matrix multiply on GPUs

Many-core systems are basically designed for applications having large data parallelism. Strassen Matrix Multiply (MM) can be formulated as a depth-first (DFS) traversal of a recursion tree where all cores work in parallel on computing each of the NxN sub-matrices that reduces storage at the detriment of large data motion to gather and […]
Aug, 11

A Parallel Implementation of the Self Organising Map using OpenCL

The self organising map is a machine learning algorithm used to produce low dimensional representations of high dimensional data. While the process is becoming more and more useful with the rise of big data, it is hindered by the sheer amount of time the algorithm takes to run serially. This project produces a parallel version […]
Aug, 11

GPU-Disasm: A GPU-based x86 Disassembler

Static binary code analysis and reverse engineering are crucial operations for malware analysis, binary-level software protections, debugging, and patching, among many other tasks. Faster binary code analysis tools are necessary for tasks such as analyzing the multitude of new malware samples gathered every day. Binary code disassembly is a core functionality of such tools which […]
Aug, 10

Visual, Spatial and Temporal Quality in Video-Based Reconstruction of People: Achieving, Prototyping and Evaluating

Capturing, recreating and representing a high fidelity virtual representation of the dynamic human form has long been a target for a diverse range of applications including tele-presence, games, film and TV special effects. The complexity of the challenge, to achieve a lifelike, faithful and believable representation, is such that a wide range of techniques and […]
Aug, 10

Accelerating the pre-processing stages of JPEG encoder on a heterogenous system using OpenCL

Color space conversion and downsampling are among the major computationally intensive steps in typical image and video codec standards, and accelerating these steps will improve the performances of these applications significantly. In this paper, we describe the parallel implementation of the color space conversion and downsampling as pre-processing steps for the JPEG encoder in a […]
Aug, 10

Places205-VGGNet Models for Scene Recognition

VGGNets have turned out to be effective for object recognition in still images. However, directly adapting the VGGNet models trained on the ImageNet dataset does not yield good performance for scene recognition. This report describes our implementation of training the VGGNets on the large-scale Places205 dataset. Specifically, we train three VGGNet models, […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
