high performance computing on graphics processing units: hgpu.org

Posts

Oct, 24

Towards a Unified Sentiment Lexicon (USL) based on Graphics Processing Units (GPUs)

This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). This approach aims at aligning, unifying and expanding the set of sentiment lexicons which are available on the web in order to increase their robustness of coverage. A sentiment lexicon is a critical and essential resource for tagging subjective […]

CUDA

Oct, 24

A multi-Teraflop Constituency Parser using GPUs

Constituency parsing with rich grammars remains a computational challenge. Graphics Processing Units (GPUs) have previously been used to accelerate CKY chart evaluation, but gains over CPU parsers were modest. In this paper, we describe a collection of new techniques that enable chart evaluation at close to the GPU’s practical maximum speed (a Teraflop), or around […]

CUDA

Oct, 24

gEMpicker: A Highly Parallel GPU-Accelerated Particle Picking Tool for Cryo-Electron Microscopy

BACKGROUND: Picking images of particles in cryo-electron micrographs is an important step in solving the 3D structures of large macromolecular assemblies. However, in order to achieve sub-nanometre resolution it is often necessary to capture and process many thousands or even several millions of 2D particle images. Thus, a computational bottleneck in reaching high resolution is […]

CUDA

Oct, 24

Analysis of Genetic Expression with Microarrays using GPU Implemented Algorithms

DNA microarrays are used to simultaneously analyze the expression level of thousands of genes under multiple conditions; however, a massive amount of data is generated, making its analysis a challenge and an ideal candidate for massively parallel processing. Among the available technologies, the use of General Purpose computation on Graphics Processing Units (GPGPU) is an efficient […]

CUDA

Oct, 24

A Parallel PSO Algorithm for a Watermarking Application on a GPU

This paper presents a study of the usability, advantages, and disadvantages of the Compute Unified Device Architecture (CUDA), implementing a population-based algorithm called Particle Swarm Optimization (PSO) [5]. To test the performance of the proposed algorithm, an application that hides a watermark image is put into practice. The PSO is used […]

CUDA

Oct, 22

2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series of more than ten 69 […]

CUDA

Oct, 22

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Multiphase flows arise in many practical industrial applications, such as the oil industry, chemical and thermal engineering, bioengineering, and medicine, especially flows in tubes with a granular layer. Multiphase flows in inclined tubes are poorly studied. A numerical study of multiphase flows in inclined tubes was performed. Cases of a clear tube and a tube with granular […]

CUDA

Oct, 22

SIMD Parallel Gibbs Sampling of Probabilistic Directed Acyclic Graphs

We present a single-chain parallelization strategy for Gibbs sampling of probabilistic Directed Acyclic Graphs, where contributions from child nodes to the conditional posterior distribution of a given node are calculated concurrently. For statistical models with many independent observations, such parallelism takes a Single-Instruction-Multiple-Data form, and can be efficiently implemented using multicore parallelization and vector instructions […]

Oct, 22

Massively parallel approximate Gaussian process regression

We explore how the big-three computing paradigms, namely symmetric multi-processor (SMP), graphics processing units (GPUs), and cluster computing, can together be brought to bear on large-data Gaussian process (GP) regression problems via a careful implementation of a newly developed local approximation scheme. Our methodological contribution focuses primarily on GPU computation, as this requires the […]

CUDA

Oct, 22

Fingerprint Local Invariant Feature Extraction on GPU with CUDA

Owing to its uniqueness, immutability, acceptability, and low cost, the fingerprint is at the forefront of biometric traits. Recently, the GPU has been considered a promising parallel processing technology due to its high computing performance, commodity status, and availability. Fingerprint authentication keeps growing and involves the deployment of many image processing and computer vision algorithms. […]

CUDA

Oct, 21

QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

The multi-GPU open-source package QCDGPU has been developed for lattice Monte Carlo simulations of pure SU(N) gluodynamics in an external magnetic field at finite temperature and of the O(N) model. The code is implemented in OpenCL, has been tested on AMD and NVIDIA GPUs and on AMD and Intel CPUs, and may run on other OpenCL-compatible devices. The package contains minimal external library […]

OpenCL

Oct, 21

An OpenCL-based Implementation of H.264 Encoder

We present an accelerated implementation of a high-speed video stream encoder for the H.264 digital video codec standard. Building on parallel processing techniques for GPUs, we use OpenCL-based GPU kernel programs. We achieve a high level of CPU-GPU interoperability by having the CPU perform all input/output operations and overall stream control, while the GPU does the core encoding […]

OpenCL

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
