high performance computing on graphics processing units: hgpu.org

Posts

Mar, 3

The sparse matrix vector product on GPUs

The sparse matrix vector product (SpMV) is a paramount operation in engineering and scientific computing and, hence, has long been a subject of intense research. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devising data formats to store the sparse matrix with the ultimate aim […]

CUDA

Mar, 3

Unified – A Sharp Turn in the Latest Era of Graphic Processors

The need for high performance and realism has increased greatly in the last few decades, especially in gaming, 3D graphics and computationally demanding applications. It has compelled GPU vendors to put their best effort into improving ILP (Instruction Level Parallelism). As a result, the GPU has entered a […]

CUDA

Mar, 3

Building Correlators with Many-Core Hardware

Radio telescopes typically consist of multiple receivers whose signals are cross-correlated to filter out noise. A recent trend is to correlate in software instead of custom-built hardware, taking advantage of the flexibility that software solutions offer. Examples include e-VLBI and LOFAR. However, the data rates are usually high and the processing requirements challenging. Many-core processors […]

CUDA

Mar, 3

RankBoost Acceleration on both NVIDIA CUDA and ATI Stream Platforms

NVIDIA CUDA and ATI Stream are the two major general-purpose GPU (GPGPU) computing technologies. We implemented RankBoost, a web relevance ranking algorithm, on both NVIDIA CUDA and ATI Stream platforms to accelerate the algorithm and illustrate the differences between these two technologies. The results show that the performance of GPU programs is highly dependent on the […]

CUDA

Mar, 3

Parallel Cycle Based Logic Simulation Using Graphics Processing Units

Graphics Processing Units (GPUs) are gaining popularity for parallelization of general purpose applications. GPUs are massively parallel processors with huge performance in a small and readily available package. At the same time, the emergence of general purpose programming environments for GPUs such as CUDA shortens the learning curve of GPU programming. We present a GPU-based […]

CUDA

Mar, 3

Speeding Up Cycle Based Logic Simulation Using Graphics Processing Units

Verification has grown to dominate the cost of electronic system design, consuming about 60% of design effort. Among the available verification techniques, logic simulation remains the dominant one. Speeding up logic simulation results in great savings and shorter time-to-market. We parallelize logic simulation using Graphics Processing Units (GPUs). In the past, GPUs were special-purpose application […]

CUDA

Mar, 3

Real-time dynamic tone-mapping operator on GPU

This article presents the parallel implementation on a GPU of a real-time dynamic tone-mapping operator. The operator we describe in this article is generic and may be used by any application. However, the goal of our work is to integrate this operator into the graphic rendering process of a car driving simulator; thus, we studied […]

Mar, 3

Singular value decomposition for collaborative filtering on a GPU

Collaborative filtering predicts customers’ unknown preferences from known preferences. In collaborative filtering computations, a singular value decomposition (SVD) is needed to reduce the size of a large-scale matrix so that the computational burden of the next phase is decreased. In this application, SVD means a roughly approximated factorization of […]

CUDA

Mar, 2

7th International Workshop on OpenMP, IWOMP 2011

The International Workshop on OpenMP (IWOMP) is an annual workshop dedicated to the promotion and advancement of all aspects of parallel programming with OpenMP. It is the premier forum to present and discuss issues, trends, recent research ideas and results related to parallel programming with OpenMP. The international workshop affords an opportunity for OpenMP users […]

Mar, 2

FluoroSim: A Visual Problem-Solving Environment for Fluorescence Microscopy

Fluorescence microscopy provides a powerful method for localization of structures in biological specimens. However, aspects of the image formation process such as noise and blur from the microscope’s point-spread function combine to produce an unintuitive image transformation on the true structure of the fluorescing molecules in the specimen, hindering qualitative and quantitative analysis of even […]

CUDA • OpenGL

Mar, 2

ECC2K-130 on NVIDIA GPUs

A major cryptanalytic computation is currently underway on multiple platforms, including standard CPUs, FPGAs, PlayStations and Graphics Processing Units (GPUs), to break the Certicom ECC2K-130 challenge. This challenge is to compute an elliptic-curve discrete logarithm on a Koblitz curve over F_{2^131}. Optimizations have reduced the cost of the computation to approximately 2^77 bit operations in […]

CUDA

Mar, 2

Accelerating Statistical Static Timing Analysis Using Graphics Processing Units

In this paper, we explore the implementation of Monte Carlo based statistical static timing analysis (SSTA) on a graphics processing unit (GPU). SSTA via Monte Carlo simulations is a computationally expensive but important step required to achieve design timing closure. It provides an accurate estimate of delay variations and their impact on design yield. The […]

CUDA
