high performance computing on graphics processing units: hgpu.org

Posts

Jan, 29

Solving Discrete Logarithms in Smooth-Order Groups with CUDA

This paper chronicles our experiences using CUDA to implement a parallelized variant of Pollard’s rho algorithm to solve discrete logarithms in groups with cryptographically large moduli but smooth order using commodity GPUs. We first discuss some key design constraints imposed by modern GPU architectures and the CUDA framework, and then explain how we were able […]

CUDA
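For orientation, the sequential textbook form of Pollard's rho for discrete logarithms can be sketched as below. This is a minimal illustration with Floyd cycle detection, not the parallelized CUDA variant the paper describes. It assumes the generator `g` has prime order `n`; the function name, the three-way partition, and the toy parameters in the usage note are hypothetical choices made for this sketch.

```python
import random


def pollard_rho_dlog(g, h, p, n, seed=0, max_tries=50):
    """Try to find x with g**x == h (mod p), assuming g has prime order n.

    Returns None if no usable collision is found within max_tries walks.
    """
    rng = random.Random(seed)

    def step(x, a, b):
        # Pseudo-random walk that preserves the invariant x == g**a * h**b (mod p).
        r = x % 3
        if r == 0:
            return (x * x) % p, (2 * a) % n, (2 * b) % n
        if r == 1:
            return (x * g) % p, (a + 1) % n, b
        return (x * h) % p, a, (b + 1) % n

    for _ in range(max_tries):
        a0, b0 = rng.randrange(n), rng.randrange(n)
        start = ((pow(g, a0, p) * pow(h, b0, p)) % p, a0, b0)
        tortoise = hare = start
        while True:
            tortoise = step(*tortoise)      # one step
            hare = step(*step(*hare))       # two steps
            if tortoise[0] == hare[0]:
                break
        _, a1, b1 = tortoise
        _, a2, b2 = hare
        db = (b2 - b1) % n
        if db == 0:
            continue  # degenerate collision, restart from a new random point
        # g**a1 * h**b1 == g**a2 * h**b2  implies  x == (a1 - a2) / (b2 - b1) mod n
        x = ((a1 - a2) * pow(db, -1, n)) % n
        if pow(g, x, p) == h % p:
            return x
    return None
```

As a toy usage, `pollard_rho_dlog(4, pow(4, 123, 1019), 1019, 509)` searches the order-509 subgroup generated by 4 modulo the prime 1019. Real cryptographic moduli are far larger, and the modular inverse via `pow(db, -1, n)` needs Python 3.8 or later.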

Jan, 29

Virtual Viewpoint Disparity Estimation and Convergence Check for Real-Time View Synthesis

In this paper, we propose a new method for real-time disparity estimation and intermediate view synthesis from stereoscopic images. Some 3D video systems employ both the left and right depth images for virtual view synthesis; however, we estimate only one disparity map at a virtual viewpoint. In addition, we utilize hierarchical belief propagation and convergence […]

CUDA

Jan, 29

A NPR System for Generating Floral Patterns based on L-System

Historically, decorative patterns have represented the design of art. The styles of decorative patterns are unique to different countries and cultures. Because decorative floral patterns give an impression of elegance and abundance, they are applied in many fields, including product packaging, advertising, and multimedia materials design. In this paper, we simulate […]

CUDA

Jan, 29

Toward a Practical Implementation of Exemplar-Based Noise Robust ASR

In previous work it was shown that, at least in principle, an exemplar-based approach to noise robust ASR is possible. The method, sparse representation based classification (SC), works by modelling noisy speech as a sparse linear combination of speech and noise exemplars. After recovering the sparsest possible linear combination of labelled exemplars, noise robust posterior […]

Jan, 29

Construction of Efficient Kd-Trees for Static Scenes Using Voxel-visibility Heuristic

In the ray-tracing community, the surface-area heuristic (SAH) is used as a de facto standard strategy for building high-quality kd-trees. Although widely accepted as the best kd-tree construction method, it is based only on the surface-area measure, which often fails to reflect effectively the rendering characteristics of a given scene. This paper presents new […]

CUDA
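The surface-area heuristic mentioned in the abstract is commonly written as a cost estimate for a candidate split plane. The sketch below shows that standard SAH cost only, not the voxel-visibility heuristic the paper proposes. The box representation (a pair of min and max corners) and the constants `c_trav` and `c_isect` are assumptions chosen for illustration.

```python
def surface_area(box):
    """Surface area of an axis-aligned box given as ((x0, y0, z0), (x1, y1, z1))."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(parent, left, right, n_left, n_right, c_trav=1.0, c_isect=1.0):
    """Estimated cost of splitting `parent` into `left` and `right`.

    The chance that a ray hitting the parent also hits a child is
    approximated by the ratio of their surface areas.
    """
    sa = surface_area(parent)
    p_left = surface_area(left) / sa
    p_right = surface_area(right) / sa
    return c_trav + c_isect * (p_left * n_left + p_right * n_right)


# Unit cube split in half along x, two primitives on each side.
cube = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
left = ((0.0, 0.0, 0.0), (0.5, 1.0, 1.0))
right = ((0.5, 0.0, 0.0), (1.0, 1.0, 1.0))
cost = sah_cost(cube, left, right, n_left=2, n_right=2)
```

A kd-tree builder would evaluate this cost for many candidate planes and keep the cheapest one. Because the cost depends only on surface areas and primitive counts, it ignores where rays actually travel in the scene, which is the limitation the abstract points to.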

Jan, 27

Parallelization of Myers Fast Bit-Vector Algorithm using GPGPU

Myers Fast Bit-Vector Algorithm for Approximate String Matching, hereafter referred to simply as Myers algorithm, is used to solve a string-matching problem in informatics. String-matching problems occur when one text has to be compared with another text (a matching pattern or needle) to find equalities, dissimilarities, or occurrences of this pattern in the […]

CUDA

•

OpenCL
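As background, the sequential bit-vector recurrence that such a parallelization starts from can be sketched as follows. This is a single-word formulation that relies on Python's arbitrary-size integers, so pattern length is not limited to the machine word. It is not the GPU version from the paper, and the function name and return convention are assumptions made for this sketch.

```python
def myers_search(pattern, text, k):
    """Return the end indices in `text` where `pattern` matches with at most k edits."""
    m = len(pattern)
    if m == 0:
        return []
    mask = (1 << m) - 1
    high = 1 << (m - 1)

    # peq[ch] has bit i set when pattern[i] == ch.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv = mask, 0   # positive and negative vertical deltas of the DP column
    score = m          # edit distance in the last row of the current column
    ends = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask   # positive horizontal deltas
        mh = pv & xh                    # negative horizontal deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in 0 at the top row: a match may start anywhere in the text.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends
```

For example, searching for `"abc"` in `"xxabcxx"` with `k=0` reports the single end position 4, while `k=1` also accepts the neighbouring end positions 3 and 5.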

Jan, 27

The system for visualization of synoptic objects

This work is devoted to developing tools for the visual analysis of tropical cyclones based on satellite data. The implemented system has an extensible set of algorithms for loading, processing, and visualizing data, mainly spatial scalar fields. Well-known algorithms and the author's own developments, using CUDA technology and shaders, were used in the creation […]

CUDA

Jan, 27

SMAA: Enhanced Subpixel Morphological Antialiasing

We present a new image-based, post-processing antialiasing technique, which offers practical solutions to the common, open problems of existing filter-based real-time antialiasing algorithms. Some of the new features include local contrast analysis for more reliable edge detection, and a simple and effective way to handle sharp geometric features and diagonal lines. This, along with our […]

Jan, 27

Bilateral Filtering with CUDA

This paper implements the Bilateral filter using CUDA-enhanced parallel computations. The Bilateral filter smooths images while preserving edges, in contrast to, e.g., the Gaussian filter, which smooths across edges. While delivering visually stunning results, Bilateral filtering is a costly operation. Using NVIDIA's CUDA technology, the filter can be parallelized to run on the […]

CUDA
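The edge-preserving behaviour described in the abstract comes from weighting each neighbour by both its spatial distance and its intensity difference. The sketch below shows this on a 1D signal in plain Python rather than on an image in CUDA. The function name and the default parameters are assumptions chosen for illustration.

```python
import math


def bilateral_filter_1d(signal, radius=2, sigma_space=1.0, sigma_range=0.1):
    """Smooth a 1D signal while down-weighting neighbours with different values."""
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: the same Gaussian falloff a plain Gaussian filter uses.
            w_space = math.exp(-((i - j) ** 2) / (2.0 * sigma_space ** 2))
            # Range weight: near zero when the values differ a lot (an edge).
            diff = signal[i] - signal[j]
            w_range = math.exp(-(diff ** 2) / (2.0 * sigma_range ** 2))
            w = w_space * w_range
            total += w * signal[j]
            weight_sum += w
        out.append(total / weight_sum)
    return out


step = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
edge_kept = bilateral_filter_1d(step)                      # step stays sharp
edge_blurred = bilateral_filter_1d(step, sigma_range=1e9)  # behaves like a Gaussian
```

With a very large `sigma_range` the range weight is effectively 1 everywhere, so the filter degenerates to Gaussian smoothing and blurs the step. Each output sample depends only on a small input neighbourhood, which is why the computation maps naturally onto one GPU thread per pixel.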

Jan, 27

Accelerating incoherent dedispersion

Incoherent dedispersion is a computationally intensive problem that appears frequently in pulsar and transient astronomy. For current and future transient pipelines, dedispersion can dominate the total execution time, meaning its computational speed acts as a constraint on the quality and quantity of science results. It is thus critical that the algorithm be able to take […]

CUDA

Jan, 26

Parallel particle swarm optimization using GPGPU

This work presents a parallelization method for the Particle Swarm Optimization algorithm using a low-cost architecture: a General Purpose Graphics Processing Unit (GPGPU). Strategies that better suit the architecture's main characteristics are addressed, along with success rates and convergence times for the optimization of Rastrigin's and Ackley's functions on a 30-dimensional search space, and compared […]

CUDA
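The two benchmark functions named in the abstract have standard closed forms, both with a global minimum of 0 at the origin. The sketch below defines them together with a small serial particle swarm loop for reference. It is not the paper's GPU strategy, and the swarm size, inertia weight `w`, acceleration constants `c1` and `c2`, and search bounds are assumed values for illustration.

```python
import math
import random


def rastrigin(x):
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def ackley(x):
    n = len(x)
    s1 = sum(v * v for v in x) / n
    s2 = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20.0 + math.e


def pso(f, dim, n_particles=20, iters=50, lo=-5.12, hi=5.12,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Serial global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp positions to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

The inner loop over particles is the part a GPU implementation would typically distribute across threads, since each particle's update depends only on its own state and the shared global best.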

Jan, 26

Defocus Magnification with CUDA

In photography, the application of depth-of-field can be used to make the main subject more prominent. A photographer can modify the range of depth-of-field by adjusting the aperture size. Unfortunately, due to the limitation caused by the physical diameter of the lens aperture and the area of the photodiode, the compact camera cannot control the depth-of-field […]

CUDA

