high performance computing on graphics processing units: hgpu.org

Posts

Nov, 3

Colour flux-tubes in static Pentaquark and Tetraquark systems

The colour fields created by the static tetraquark and pentaquark systems are computed in quenched SU(3) lattice QCD, with gauge invariant lattice operators, in a 24^3 x 48 lattice at beta=6.2. We generate our quenched configurations with GPUs, and detail the respective benchmanrks in different SU(N) groups. While at smaller distances the coulomb potential is […]

Nov, 2

A Comparison of Many-threaded Differential Evolution and Genetic Algorithms on CUDA

The recent time has seen the rise of consumer grade massively parallel environments. Powerful GPUs and multi-core processors became widely available and easy to use programming APIs such as nVidia CUDA, OpenCL, and DirectCompute simplify the development of applications that can utilize them. In this environment, the nature inspired metaheuristics can be in suitable cases […]

CUDA

Nov, 2

Multi-view Rendering Approach for Cloud-based Gaming Services

In order to render hundreds or thousands of views for multi-user games on a cloud-based gaming at interactive rates, we need a solution which is both scalable and efficient.We present a new cloud-based gaming service system which supports multiple viewpoint rendering for visualizing a 3D game scene dataset at the same time for the multi-user […]

Nov, 2

A shared file system abstraction for heterogeneous architectures

We advocate the use of high-level OS abstractions in heterogeneous systems, such as CPU-GPU hybrids. We suggest the idea of an inter-device shared file system (IDFS) for such architectures. The file system provides a unified storage space for seamless data sharing among processors and accelerators via a standard wellunderstood interface. It hides the asymmetric nature […]

Nov, 2

Analyzing Password Strength and Efficient Password Cracking

Passwords are still one of the most common means of securing computer systems. Most organizations rely on password authentication systems, and therefore, it is very important for them to enforce their users to have strong passwords. They usually try to enforce security by mandating users to follow password creation policies. They force users to follow […]

OpenCL

Nov, 2

Development of Virtual Machine Tool for Simulation and Evaluation

Machine tools play an important role in manufacturing, but it is not easy to ensure efficiency and low cost, for machine tools. A machine tool simulation and evaluation system is urgently required. This paper presents an integrated machine tool simulation and evaluation system, which incorporates 3-D motion simulation and collision detection. The evaluation system incorporates […]

OpenCL

Nov, 2

Acceleration Methods for Bayesian Network Sampling

Bayesian inference with Bayesian networks is a #P-complete problem in general. Exact Bayesian inference is feasible in practice only on small-scale Bayesian networks or networks that are dramatically simplified, such as with naive Bayes or other approximations. Stochastic sampling methods, in particular importance sampling, form one of the most prominent and efficient approximate inference techniques […]

OpenCL

Nov, 2

YaDiV-an open platform for 3D visualization and 3D segmentation of medical data

In this work, we present the concept, design and implementation of a new software to visualize and segment 3-dimensional medical data. The main goal was to create a platform that would allow to try out new approaches and ideas while staying independent from hardware and operating system, being especially useful for interdisciplinary research groups. A […]

OpenGL

Nov, 2

Hybrid GPU-CPU Adaptive Precision Ray-Triangle Intersection Tests for Robust High-Performance GPU Dosimetry Computations

Before an intervention on a nuclear site, it is essential to study different scenarios to identify the less dangerous one for the operator. Therefore, it is mandatory to dispose of an efficient dosimetry simulation code with accurate results. One classical method in radiation protection is the straight-line attenuation method with build-up factors. In the case […]

CUDA

Nov, 2

Finite Volume Errors in B_K

We discuss finite volume errors in our calculations of $B_K$ using improved staggered fermions on the MILC asqtad lattices. Using GPUs, we are now able to extrapolate using next-to-leading order (NLO) staggered SU(2) chiral perturbation theory including the finite volume corrections arising from pion loops. We find that the impact of FV fitting is very […]

CUDA

Nov, 2

Multi GPU Performance of Conjugate Gradient Solver with Staggered Fermions in Mixed Precision

GPU has a significantly higher performance in single-precision computing than that of double precision. Hence, it is important to take a maximal advantage of the single precision in the CG inverter, using the mixed precision method. We have implemented mixed precision algorithm to our multi GPU conjugate gradient solver. The single precision calculation use half […]

CUDA

Nov, 1

APHOG: A Framework for Fast Object Detection Using Histograms of Oriented Gradients

In this paper we show how it is possible to improve the efficiency of existing holistic forms of object detection by refining detection areas to smaller subsets. Although this method can be applied to any form of object detection, this paper will specifically focus on the topic of pedestrian detection in lowresolution non-stationary video footage.

CUDA

* * *

high performance computing on graphics processing units: hgpu.org

Posts

Colour flux-tubes in static Pentaquark and Tetraquark systems

A Comparison of Many-threaded Differential Evolution and Genetic Algorithms on CUDA

Multi-view Rendering Approach for Cloud-based Gaming Services

A shared file system abstraction for heterogeneous architectures

Analyzing Password Strength and Efficient Password Cracking

Development of Virtual Machine Tool for Simulation and Evaluation

Acceleration Methods for Bayesian Network Sampling

YaDiV-an open platform for 3D visualization and 3D segmentation of medical data

Hybrid GPU-CPU Adaptive Precision Ray-Triangle Intersection Tests for Robust High-Performance GPU Dosimetry Computations

Finite Volume Errors in B_K

Multi GPU Performance of Conjugate Gradient Solver with Staggered Fermions in Mixed Precision

APHOG: A Framework for Fast Object Detection Using Histograms of Oriented Gradients

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Most viewed papers (last 30 days)