high performance computing on graphics processing units: hgpu.org

Posts

Jul, 13

International Conference on Parallel Computing 2013, ParCo2013

ParCo2013 continues the tradition of the international conferences on parallel computing started in Berlin, Germany in 1983, making it one of the longest-running international conferences on parallel computing. Over the years the conference has established itself as the foremost platform for exchanging know-how on the newest parallel computing strategies, technologies, methods and tools. The […]

Jul, 13

International Conference on Computational Physics, ICCP 2013

The XXXIV International Conference on Computational Physics is the premier forum for the presentation of new advances and research results in the fields of Computational Physics. The conference will bring together leading academic scientists, researchers and scholars in the domain of interest from around the world. Topics of interest for submission include, but are not […]

Jul, 13

GPU Technology Conference 2013, GTC 2013

GTC advances awareness of high performance computing, and connects the scientists, developers, graphic artists, designers, researchers, engineers, and IT managers who use GPUs to tackle enormous computational challenges. GTC 2013 will feature the latest breakthroughs and the most amazing content in GPU-enabled computing. Spanning 4 full days of world-class education delivered by some of the […]

Jul, 13

Linearised inversion with GPUs

Graphics Processing Units (GPUs) can provide considerable computational advantages over multi-core CPU nodes or distributed networks by locally accelerating certain types of floating-point operations. However, when processing and inverting exploration-scale seismic datasets we encounter two key problems: compounded disk I/O (explicit routing through the host is necessary) and the relatively small memory […]

CUDA
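The abstract names the GPU's limited memory as a key obstacle. One generic way to work within it is to apply the linearised operator and its adjoint one block of data at a time inside an iterative least-squares solver. The sketch below assumes conjugate gradients on the normal equations (CGLS), with a small dense matrix standing in for the seismic modelling operator; the chunking scheme, the function names `forward`, `adjoint` and `cgls`, and the test matrix are hypothetical illustrations, not the authors' method.

```python
# CGLS sketch for min ||L m - d||^2 where L is only touched one chunk of rows
# at a time (a stand-in for streaming data blocks through device memory).
# All names are hypothetical.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward(L, m, chunk):
    """d = L m, computed chunk by chunk of rows."""
    d = []
    for start in range(0, len(L), chunk):
        block = L[start:start + chunk]  # only this slice would sit in GPU memory
        d.extend(dot(row, m) for row in block)
    return d


def adjoint(L, r, chunk):
    """g = L^T r, accumulated chunk by chunk of rows."""
    g = [0.0] * len(L[0])
    for start in range(0, len(L), chunk):
        for row, ri in zip(L[start:start + chunk], r[start:start + chunk]):
            for j, a in enumerate(row):
                g[j] += a * ri
    return g


def cgls(L, d, n_iter, chunk=2):
    """Conjugate gradients on the normal equations L^T L m = L^T d."""
    m = [0.0] * len(L[0])
    r = list(d)               # data residual d - L m
    s = adjoint(L, r, chunk)  # L^T r
    p = list(s)
    gamma = dot(s, s)
    for _ in range(n_iter):
        if gamma == 0.0:
            break
        q = forward(L, p, chunk)
        alpha = gamma / dot(q, q)
        m = [mi + alpha * pi for mi, pi in zip(m, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = adjoint(L, r, chunk)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return m


L = [[1.0, 2.0], [3.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
d = forward(L, [1.0, 2.0], chunk=1)
print(cgls(L, d, n_iter=2))  # close to [1.0, 2.0]
```

With two unknowns, CGLS converges in two iterations in exact arithmetic, and the chunk size changes only how much of `L` is resident at once, not the result.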

Jul, 13

Real-Time Implementation of the Vertex Component Analysis Algorithm on GPUs

In this letter, we present a new parallel implementation of the vertex component analysis (VCA) algorithm for spectral unmixing of remotely sensed hyperspectral data on commodity graphics processing units. We first developed a C serial version of the VCA algorithm and three parallel versions: one using NVIDIA’s Compute Unified Device Architecture (CUDA), another using CUDA […]

CUDA
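The core of VCA is an iterative search: project the data onto a direction orthogonal to the endmembers found so far, and take the pixel with the largest absolute projection as the next endmember. The sketch below keeps only that loop; it assumes noise-free data and omits the SNR-dependent dimensionality reduction of the full algorithm. `vca_core` and its parameters are hypothetical names. The per-pixel projections (one dot product per pixel) are the kind of independent work that maps naturally onto GPU threads.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def vca_core(pixels, p, seed=0):
    """Pick p endmember pixels with VCA's projection loop (simplified sketch).

    pixels : list of spectra (each a list of band values)
    Returns the indices of the selected pixels, in selection order.
    """
    rng = random.Random(seed)
    bands = len(pixels[0])
    basis = []   # orthonormal basis of the span of endmembers found so far
    picked = []
    for _ in range(p):
        # Random direction with its components along found endmembers removed
        f = [rng.gauss(0.0, 1.0) for _ in range(bands)]
        for q in basis:
            c = dot(f, q)
            f = [fi - c * qi for fi, qi in zip(f, q)]
        # The largest |projection| is attained at a vertex of the data simplex
        idx = max(range(len(pixels)), key=lambda i: abs(dot(f, pixels[i])))
        picked.append(idx)
        # Gram-Schmidt step: add the new endmember's orthogonal part to the basis
        e = list(pixels[idx])
        for q in basis:
            c = dot(e, q)
            e = [ei - c * qi for ei, qi in zip(e, q)]
        norm = math.sqrt(dot(e, e))
        basis.append([ei / norm for ei in e])
    return picked


# Synthetic scene: 50 mixtures of 3 random 5-band endmembers, pure pixels at 7, 20, 41
gen = random.Random(1)
E = [[gen.random() for _ in range(5)] for _ in range(3)]
pixels = []
for _ in range(50):
    w = [gen.random() + 0.1 for _ in range(3)]
    s = sum(w)
    pixels.append([sum(w[k] / s * E[k][b] for k in range(3)) for b in range(5)])
pixels[7], pixels[20], pixels[41] = E[0], E[1], E[2]
print(sorted(vca_core(pixels, 3)))
```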

Jul, 13

Development of Parallel Computation Tools

In this project, boundary value problems of the electric field governed by the Laplace equation were formulated using different numerical methods such as FEM and BEM. The resulting systems of linear equations were then solved using different solving algorithms. The accuracy and complexity of FEM and BEM were compared. The space and time complexity of […]

CUDA
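To give a flavour of the FEM side of this comparison, here is a minimal one-dimensional sketch. It assumes linear elements, a piecewise-constant permittivity `eps`, and Dirichlet potentials at both ends, and solves d/dx(ε du/dx) = 0, the 1D form of the Laplace problem in a layered dielectric. The resulting tridiagonal system is solved with the Thomas algorithm. The function name and interface are hypothetical, and BEM is not shown.

```python
def fem_laplace_1d(x, eps, u_left, u_right):
    """Linear-element FEM for d/dx(eps du/dx) = 0 on nodes x with Dirichlet ends.

    x   : node coordinates, strictly increasing (len n+1)
    eps : permittivity of each element (len n), constant per element
    Returns the nodal potentials, including the two boundary values.
    """
    n = len(x) - 1
    # Element stiffness eps/h; element e couples nodes e and e+1
    k = [eps[e] / (x[e + 1] - x[e]) for e in range(n)]
    m = n - 1  # number of interior (unknown) nodes
    if m == 0:
        return [u_left, u_right]
    # Row i of the system corresponds to interior node i+1
    sub = [-k[i] for i in range(m)]           # coefficient of the left neighbour
    diag = [k[i] + k[i + 1] for i in range(m)]
    sup = [-k[i + 1] for i in range(m)]       # coefficient of the right neighbour
    rhs = [0.0] * m
    rhs[0] += k[0] * u_left                   # move known boundary values to the RHS
    rhs[-1] += k[n - 1] * u_right
    # Thomas algorithm: forward elimination ...
    for i in range(1, m):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # ... then back substitution
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - sup[i] * u[i + 1]) / diag[i]
    return [u_left] + u + [u_right]


print(fem_laplace_1d([0.0, 0.5, 1.0], [1.0, 3.0], 0.0, 1.0))  # [0.0, 0.75, 1.0]
```

For two equal layers with ε = 1 and ε = 3 between potentials 0 and 1, continuity of the flux ε du/dx puts the interface potential at 3/(1+3) = 0.75. Linear elements reproduce this exactly because the true solution is piecewise linear.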

Jul, 13

A stand-alone Finite Difference Time Domain (FDTD) simulation for Integrated Optoelectronics Laboratory

Numerical models for solving Maxwell's equations, which fully describe electromagnetic wave propagation phenomena, are of utmost importance in pre-fabrication simulation of photonic and optoelectronic devices. The Finite Difference Time Domain (FDTD) method, which is based on modeling the differential equations as difference equations in a discretized domain in both space and […]

CUDA
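The FDTD idea (replace the curl equations by staggered central differences in space and time, then leapfrog E and H) fits in a few lines in one dimension. This sketch assumes free space, a single Ez/Hy polarisation, a Courant number of 1, an additive Gaussian source, and reflecting grid ends with no absorbing boundary. It is not the stand-alone tool described in the paper, and `fdtd_1d` is a hypothetical name.

```python
import math

IMP0 = 376.730313668  # impedance of free space in ohms


def fdtd_1d(size, steps, src_pos, t0=30.0, spread=10.0):
    """1D FDTD (Yee grid) in free space with Courant number 1.

    Ez lives on integer grid points and Hy half a cell to the right; both are
    leapfrogged in time. ez[0] and hy[size-1] are never updated, so the grid
    ends act as reflecting boundaries.
    """
    ez = [0.0] * size
    hy = [0.0] * size
    for n in range(steps):
        # H update from the spatial difference of E
        for m in range(size - 1):
            hy[m] += (ez[m + 1] - ez[m]) / IMP0
        # E update from the spatial difference of H
        for m in range(1, size):
            ez[m] += (hy[m] - hy[m - 1]) * IMP0
        # Additive Gaussian source injected at one grid point
        ez[src_pos] += math.exp(-((n - t0) / spread) ** 2)
    return ez, hy


ez, hy = fdtd_1d(size=200, steps=40, src_pos=100)
print(max(abs(v) for v in ez))
```

Because each E or H update reads only its immediate neighbours, a disturbance spreads by at most one cell per time step; the same locality is what makes FDTD updates easy to parallelise with one GPU thread per cell.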

Jul, 13

Fusion of Morphological Images for Airborne Target Detection

Several track-before-detection approaches for image based aircraft detection have recently been examined in an important automated aircraft collision detection application. A particularly popular approach is a two stage processing paradigm which involves: a morphological spatial filter stage (which aims to emphasize the visual characteristics of targets) followed by a temporal or track filter stage (which […]

CUDA
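The excerpt does not say which morphological filter the spatial stage uses. One filter frequently used for small-target detection of this kind is the close-minus-open (CMO) filter, which responds to small features that are either brighter or darker than their surroundings. The sketch assumes a grayscale image stored as a list of rows, a square (2r+1) x (2r+1) structuring element, and windows clipped at the image border. All names are hypothetical.

```python
def _window_op(img, r, op):
    """Apply op (max or min) over a (2r+1)x(2r+1) window clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = op(vals)
    return out


def dilate(img, r):
    return _window_op(img, r, max)  # grayscale dilation: local maximum


def erode(img, r):
    return _window_op(img, r, min)  # grayscale erosion: local minimum


def cmo(img, r=1):
    """Close-minus-open: closing = erode(dilate), opening = dilate(erode)."""
    closing = erode(dilate(img, r), r)
    opening = dilate(erode(img, r), r)
    return [[c - o for c, o in zip(crow, orow)] for crow, orow in zip(closing, opening)]


# A single dark "target" pixel on a flat bright sky
sky = [[10] * 9 for _ in range(9)]
sky[4][4] = 0
print(cmo(sky)[4][4])
```

Flat background cancels in the closing minus the opening, so only features smaller than the structuring element survive; the temporal (track) filter stage would then operate on this response.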

Jul, 12

Towards Parallel Programming Models for Predictability

Future embedded systems for performance-demanding applications will be massively parallel. High performance tasks will be parallel programs, running on several cores, rather than single threads running on single cores. For hard real-time applications, WCETs for such tasks must be bounded. Low-level parallel programming models, based on concurrent threads, are notoriously hard to use due to […]

CUDA

•

OpenCL

Jul, 12

CUDA implementation of the algorithm for simulating the epidemic spreading over large networks

For some years now, there has been increasing interest in modeling and analyzing the spread of epidemics in both human and computer networks. The obvious advantage of a computer simulation of epidemic spread is that the answer is delivered in a short time and the number of hosts included in the simulation can approach their […]

CUDA
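The excerpt does not specify the epidemic model. A common pattern for GPU simulations of this kind is a synchronous update in which every node reads only the previous state, so each node can be handled by its own thread. The sketch below assumes a discrete-time SIR model with per-contact infection probability `beta` and recovery probability `gamma` on an adjacency list; the model choice and all names are assumptions.

```python
import random

S, I, R = 0, 1, 2  # susceptible, infected, recovered


def sir_step(adj, state, beta, gamma, rng):
    """One synchronous SIR step; every node reads only the previous state."""
    new = list(state)  # double buffer: writes never affect this step's reads
    for v, nbrs in enumerate(adj):
        if state[v] == S:
            # Each infected neighbour transmits independently with probability beta
            for u in nbrs:
                if state[u] == I and rng.random() < beta:
                    new[v] = I
                    break
        elif state[v] == I and rng.random() < gamma:
            new[v] = R
    return new


def simulate(adj, seeds, beta, gamma, steps, seed=0):
    rng = random.Random(seed)
    state = [S] * len(adj)
    for s in seeds:
        state[s] = I
    for _ in range(steps):
        state = sir_step(adj, state, beta, gamma, rng)
    return state


# Ring of 1000 hosts, each linked to its two neighbours, one initial infection
n = 1000
ring = [[(v - 1) % n, (v + 1) % n] for v in range(n)]
final = simulate(ring, seeds=[0], beta=0.3, gamma=0.1, steps=50)
print(final.count(I), final.count(R))
```

The loop over `v` in `sir_step` is the part a CUDA kernel would replace with one thread per node; a GPU version would also need a parallel random number generator instead of one shared `rng`.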

Jul, 12

High-Performance Symmetric Block Ciphers on Multicore CPU and GPUs

As data protection through encryption becomes increasingly important, encryption processing using General-Purpose computation on Graphics Processing Units (GPGPU) has attracted attention as one way to realize high-speed data protection. GPUs have evolved in recent years into powerful parallel computing devices with a high cost-performance ratio. However, […]

CUDA
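One reason block ciphers parallelise well on GPUs is that some modes of operation, such as counter (CTR) mode, encrypt every block independently. The sketch below illustrates that structure with XTEA, chosen only because it is short; the excerpt does not name the ciphers the paper evaluates, and this code is for illustration, not for protecting real data. The nonce and counter layout and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9


def xtea_encrypt_block(v0, v1, key, cycles=32):
    """Encrypt one 64-bit block (two 32-bit halves) with XTEA; key is four 32-bit words.

    Python ints do not wrap, so intermediate shifts grow past 32 bits; masking
    each sum keeps the low 32 bits, which is all that matters because + and ^
    never propagate information downward.
    """
    s = 0
    for _ in range(cycles):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK
        s = (s + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK
    return v0, v1


def ctr_xcrypt(data, key, nonce):
    """CTR mode: XOR data with E(nonce, block_index). Encryption == decryption."""
    out = bytearray(len(data))
    for start in range(0, len(data), 8):
        # Each iteration depends only on its own block index, so on a GPU one
        # thread per block could compute these keystream blocks in parallel.
        k0, k1 = xtea_encrypt_block(nonce & MASK, (start // 8) & MASK, key)
        keystream = k0.to_bytes(4, "big") + k1.to_bytes(4, "big")
        for j, b in enumerate(data[start:start + 8]):
            out[start + j] = b ^ keystream[j]
    return bytes(out)


key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
msg = b"GPU block-parallel encryption demo"
ct = ctr_xcrypt(msg, key, nonce=7)
print(ct.hex())
```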

Jul, 12

A Note on Particle Filters Applied to DSGE Models

This paper compares the properties of two particle filters – the Bootstrap Filter and the Auxiliary Particle Filter – applied to the computation of the likelihood of artificial data simulated from a basic DSGE model with nominal and real rigidities. Particle filters are compared in terms of speed, quality of the approximation of the probability […]

OpenCL
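The Bootstrap Filter estimates the likelihood by propagating particles through the state transition, averaging the measurement densities to get each period's likelihood increment, and resampling. The sketch below applies it to a scalar linear Gaussian AR(1) model rather than a DSGE model, an assumption that makes the exact Kalman-filter likelihood available for comparison. Model parameters and names are hypothetical, and the Auxiliary Particle Filter is not shown.

```python
import math
import random


def bootstrap_pf_loglik(ys, rho, sx, sy, n_particles=1000, seed=0):
    """Bootstrap particle filter log-likelihood for the assumed model
    x_t = rho * x_{t-1} + sx * e_t,   y_t = x_t + sy * u_t,   e_t, u_t ~ N(0, 1).
    """
    rng = random.Random(seed)
    sd0 = sx / math.sqrt(1.0 - rho * rho)  # stationary prior for x_0
    xs = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    norm = 1.0 / (sy * math.sqrt(2.0 * math.pi))
    loglik = 0.0
    for y in ys:
        # 1. Propagate each particle through the transition equation (the proposal)
        xs = [rho * x + rng.gauss(0.0, sx) for x in xs]
        # 2. Weight each particle by the measurement density p(y | x)
        ws = [norm * math.exp(-0.5 * ((y - x) / sy) ** 2) for x in xs]
        # 3. The mean weight estimates p(y_t | y_1..y_{t-1})
        loglik += math.log(sum(ws) / n_particles)
        # 4. Multinomial resampling in proportion to the weights
        xs = rng.choices(xs, weights=ws, k=n_particles)
    return loglik


# Simulated observations from the same model
gen = random.Random(42)
x, ys = 0.0, []
for _ in range(20):
    x = 0.9 * x + gen.gauss(0.0, 1.0)
    ys.append(x + gen.gauss(0.0, 0.5))
print(bootstrap_pf_loglik(ys, 0.9, 1.0, 0.5, n_particles=2000, seed=3))
```

Steps 1 and 2 are independent across particles, which is the part an OpenCL implementation would typically spread over work-items; the averaging and resampling steps typically need reductions or prefix sums over the weights.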

