Posts
Feb, 20
Fast Exact String Matching on the GPU
We present a string-matching program that runs on the GPU. Our program, Cmatch, achieves a speedup of as much as 35x on a recent GPU over the equivalent CPU-bound version. String matching has a long history in computational biology with roots in finding similar proteins and gene sequences in a database of known sequences. The […]
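Cmatch's GPU kernel is not reproduced here, but the problem it parallelizes can be sketched with a minimal CPU baseline for exact matching. The function name and sequences below are illustrative, not from the paper:

```python
def find_all_exact(text: str, pattern: str) -> list[int]:
    """Return every index where `pattern` occurs exactly in `text`.

    A naive O(len(text) * len(pattern)) scan. A GPU matcher can
    parallelize this kind of work across thousands of threads,
    e.g. one candidate start position per thread.
    """
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

# Matching a short motif against a DNA fragment:
print(find_all_exact("ACGTACGTTACGT", "ACGT"))  # → [0, 4, 9]
```

Because every start position is tested independently, the loop has no cross-iteration dependences, which is what makes this class of problem a natural fit for the GPU.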
Feb, 20
Program Optimization Study on a 128-Core GPU
The newest generations of graphics processing unit (GPU) architecture, such as the NVIDIA GeForce 8-series, feature new interfaces that improve programmability and generality over previous GPU generations. Using NVIDIA’s Compute Unified Device Architecture (CUDA), the GPU is presented to developers as a flexible parallel architecture. This flexibility introduces the opportunity to perform a wide variety […]
Feb, 20
How GPUs Can Improve the Quality of Magnetic Resonance Imaging
In magnetic resonance imaging (MRI), non-Cartesian scan trajectories are advantageous in a wide variety of emerging applications. Advanced reconstruction algorithms that operate directly on non-Cartesian scan data using optimality criteria such as least-squares (LS) can produce significantly better images than conventional algorithms that apply a fast Fourier transform (FFT) after interpolating the scan data onto […]
Feb, 20
MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores
The CUDA programming model, which is based on an extended ANSI C language and a runtime environment, allows the programmer to explicitly specify data-parallel computation. NVIDIA developed CUDA to open the architecture of their graphics accelerators to more general applications, but did not provide an efficient mapping to execute the programming model on any […]
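One way to map the model onto a CPU, in the spirit of MCUDA's approach, is to serialize each thread block: wrap the per-thread kernel body in an explicit loop over logical thread ids. A minimal sketch (the names `vec_add_kernel` and `run_block_serialized` are illustrative, not from the paper):

```python
def vec_add_kernel(a, b, c, tid):
    # Kernel body as one GPU thread would see it: tid plays the role
    # of blockIdx.x * blockDim.x + threadIdx.x.
    c[tid] = a[tid] + b[tid]

def run_block_serialized(kernel, n_threads, *args):
    # The "thread loop": iterate the kernel body over every logical
    # thread id instead of launching n_threads hardware threads.
    for tid in range(n_threads):
        kernel(*args, tid)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = [0.0] * 4
run_block_serialized(vec_add_kernel, 4, a, b, c)
print(c)  # → [11.0, 22.0, 33.0, 44.0]
```

Handling kernels that contain barrier synchronization requires splitting the body into multiple such loops, which is where the compiler work gets interesting.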
Feb, 20
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs
In this paper we describe techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Programs developed for manycore processors typically express finer thread-level parallelism than is appropriate for multicore platforms. We describe options for implementing fine-grained threading in software, and find that reasonable restrictions on […]
Feb, 20
XMalloc: A Scalable Lock-free Dynamic Memory Allocator for Many-core Machines
There are two avenues for many-core machines to gain higher performance: increasing the number of processors, and increasing the number of vector units in one SIMD processor. A truly scalable algorithm should take advantage of both. However, most past research on scalable memory allocators scales well with the number of processors, but poorly with the […]
Feb, 20
Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications
We present automatic data layout transformation as an effective compiler performance optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures using a dense, regular grid as the primary data structure. Fluid dynamics and heat distribution, which both solve partial differential equations on a discretized representation of space, are […]
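One well-known instance of this kind of layout change is converting an array of structures (AoS) into a structure of arrays (SoA), so that neighboring threads reading the same field touch adjacent memory that the GPU can coalesce into wide accesses. A minimal sketch; the field names `pressure` and `velocity` are illustrative, and the paper's automatic transformation is more general:

```python
def aos_to_soa(cells):
    """Convert [{'pressure': p, 'velocity': v}, ...] (AoS) into
    {'pressure': [...], 'velocity': [...]} (SoA).

    In SoA form, all values of one field are contiguous, which favors
    coalesced memory access on a GPU."""
    soa = {'pressure': [], 'velocity': []}
    for cell in cells:
        soa['pressure'].append(cell['pressure'])
        soa['velocity'].append(cell['velocity'])
    return soa

grid = [{'pressure': 1.0, 'velocity': 0.5},
        {'pressure': 2.0, 'velocity': 0.25}]
print(aos_to_soa(grid))
# → {'pressure': [1.0, 2.0], 'velocity': [0.5, 0.25]}
```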
Feb, 19
Accelerating Particle Image Velocimetry Using Hybrid Architectures
High Performance Computing (HPC) applications are mapped to a cluster of multi-core processors communicating over high-speed interconnects. More computational power is harnessed with the addition of hardware accelerators such as Graphics Processing Unit (GPU) cards and Field Programmable Gate Arrays (FPGAs). Particle Image Velocimetry (PIV) is an embarrassingly parallel application that can benefit from […]
Feb, 19
Programmability: Design Costs and Payoffs using AMD GPU Streaming Languages and Traditional Multi-Core Libraries
GPGPUs and multi-core processors have come to the forefront of interest in scientific computing. Graphics processors have become programmable, allowing exploitation of their large amounts of memory bandwidth and thread level parallelism in general purpose computing. This paper explores these two architectures, the languages used to program them, and the optimizations used to maximize performance […]
Feb, 19
Decoupled Access/Execute Metaprogramming for GPU-Accelerated Systems
We describe the evaluation of several implementations of a simple image processing filter on an NVIDIA GTX 280 card. Our experimental results show that performance depends significantly on low-level details such as data layout and iteration space mapping which complicate code development and maintenance. We propose extending a CUDA or OpenCL like model with decoupled […]
Feb, 19
Compiler Support for High-level GPU Programming
We design a high-level abstraction of CUDA, called hiCUDA, using compiler directives. It simplifies the tasks in porting sequential applications to NVIDIA GPUs. This paper focuses on the design and implementation of a source-to-source compiler that translates a hiCUDA program into an equivalent CUDA program, and shows that the performance of CUDA code generated by […]
Feb, 19
High Performance Relevance Vector Machine on GPUs
The Relevance Vector Machine (RVM) algorithm has been widely utilized in many applications, such as machine learning, image pattern recognition, and compressed sensing. However, the RVM algorithm is computationally expensive. We seek to accelerate the RVM algorithm computation for time sensitive applications by utilizing massively parallel accelerators such as GPUs. In this paper, the computation […]