high performance computing on graphics processing units: hgpu.org

Posts

Feb, 27

2014 3rd International Conference on Knowledge Discovery, ICKD 2014

All papers of ICKD 2014 will be published in the International Journal of Computer Theory and Engineering (IJCTE, ISSN: 1793-8201), and will be indexed by Electronic Journals Library, EBSCO, Engineering & Technology Digital Library, Google Scholar, INSPEC, Ulrich’s Periodicals Directory, Crossref, ProQuest, WorldCat, and EI (INSPEC, IET). 2014-04-05. T1. Novel Algorithms; T2. Association Rules; T3. Knowledge […]

Feb, 27

2014 3rd International Conference on Computing and Computer Vision, ICCCV 2014

All papers for ICCCV 2014 will be published in the Journal of Image and Graphics (JOIG, ISSN: 2301-3699) as one volume, and will be indexed by Ulrich’s Periodicals Directory, Google Scholar, EBSCO, Engineering & Technology Digital Library and Electronic Journals Digital Library. 2014-04-01. Machine Vision, Image Processing, and Pattern Analysis; Imaging Sensors; Color and […]

Feb, 27

Parallel dual tree traversal on multi-core and many-core architectures for astrophysical N-body simulations

In astrophysical N-body simulations, Dehnen’s algorithm, implemented in the serial falcON code and based on a dual tree traversal, is faster than serial Barnes-Hut tree-codes, but is outperformed by parallel CPU and GPU tree-codes. In this paper, we present a parallel dual tree traversal, implemented in the pfalcON code, targeting multi-core CPUs and many-core architectures (Xeon […]

CUDA

Feb, 27

Face Recognition Using OpenCL

Face recognition is the biometric identification of a human face by matching its image against a library of known faces. The algorithm used for this is the Eigenfaces algorithm. The proposed implementation uses OpenCL. OpenCL (Open Computing Language) is an open standard for general purpose parallel programming […]

OpenCL

Feb, 27

G-Heart: A GPU-based System for Electrophysiological Simulation and Multi-modality Cardiac Visualization

Cardiac electrophysiological simulation and multi-modality visualization are computationally intensive and valuable in studying the structure, mechanism, and dynamics of the heart. The existing multi-CPU based approaches can reduce the calculation time, but suffer from hardware and communication costs and are inefficient for 3D data visualization. Compared with multi-CPU, the highly parallel and multi-core properties […]

CUDA

Feb, 27

Exploitation of GPUs for the Parallelisation of Probably Parallel Legacy Code

General purpose GPUs provide massive compute power, but are notoriously difficult to program. In this paper we present a complete compilation strategy to exploit GPUs for the parallelisation of sequential legacy code. Using hybrid data dependence analysis combining static and dynamic information, our compiler automatically detects suitable parallelism and generates parallel OpenCL code from sequential […]

OpenCL

Feb, 27

Extending the Generalized Fermat Prime Number Search Beyond One Million Digits Using GPUs

Great strides have been made in recent years in the search for ever larger prime Generalized Fermat Numbers (GFN). We briefly review the history of the GFN prime search, and describe new implementations of the ‘Genefer’ software (now available as open source) using CUDA and optimised CPU assembler which have underpinned this unprecedented progress. The […]

CUDA
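
For readers unfamiliar with the objects in the Generalized Fermat excerpt above: a generalized Fermat number has the form b^(2^n) + 1. The sketch below is only an illustration of that definition and of how quickly the digit count grows. The function names are hypothetical, the Fermat probable-prime check is a textbook stand-in, and none of this is taken from the Genefer software, whose actual method the truncated excerpt does not describe.

```python
import math


def gfn(b: int, n: int) -> int:
    """Return the generalized Fermat number b^(2^n) + 1."""
    return b ** (2 ** n) + 1


def decimal_digits(b: int, n: int) -> int:
    """Estimate the decimal digit count of b^(2^n) + 1 via logarithms.

    This avoids building the huge integer; it relies on floating-point
    log10, so treat it as an estimate near exact powers of ten.
    """
    return math.floor((2 ** n) * math.log10(b)) + 1


def fermat_probable_prime(candidate: int, base: int = 3) -> bool:
    """Fermat test: every prime p not dividing `base` satisfies
    base^(p-1) = 1 (mod p). Passing does NOT prove primality."""
    return pow(base, candidate - 1, candidate) == 1


# Check the tiny candidates b^2 + 1 for a few even bases (illustrative only).
small_candidates = {b: fermat_probable_prime(gfn(b, 1)) for b in (2, 4, 6, 8, 10)}

# Digit count for an arbitrary example with more than one million digits.
# This says nothing about whether that number is prime.
million_digit_example = decimal_digits(10, 20)
```

Here `small_candidates` flags 5, 17, 37 and 101 as probable primes and rejects 65, while `decimal_digits(10, 20)` shows that an exponent of 2^20 already pushes a base-10 candidate past one million digits.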

Feb, 26

2014 3rd International Conference on Software and Computer Applications, ICSCA 2014

All papers for ICSCA 2014 will be published in the Journal of Lecture Notes on Software Engineering (LNSE, ISSN: 2301-3559) as one volume, and will be indexed by DOAJ, Electronic Journals Library, Engineering & Technology Digital Library, EBSCO, Ulrich’s Periodicals Directory, International Computer Science Digital Library (ICSDL), ProQuest and Google Scholar. Software Engineering; Artificial […]

Feb, 26

Real-Time GPU Implementation of Transverse Oscillation Vector Velocity Flow Imaging

Rapid estimation of blood velocity and visualization of complex flow patterns are important for clinical use of diagnostic ultrasound. This paper presents real-time processing for two-dimensional (2-D) vector flow imaging which utilizes an off-the-shelf graphics processing unit (GPU). In this work, Open Computing Language (OpenCL) is used to estimate 2-D vector velocity flow in vivo […]

OpenCL

Feb, 26

REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time

In this paper, we solve the problem of estimating dense and accurate depth maps from a single moving camera. A probabilistic depth measurement is carried out in real time on a per-pixel basis and the computed uncertainty is used to reject erroneous estimations and provide live feedback on the reconstruction progress. Our contribution is a […]

CUDA

Feb, 26

Comparative evaluation of platforms for parallel Ant Colony Optimization

The rapidly growing field of nature-inspired computing concerns the development and application of algorithms and methods based on biological or physical principles. This approach is particularly compelling for practitioners in high-performance computing, as natural algorithms are often inherently parallel in nature (for example, they may be based on a "swarm"-like model that uses a population […]

OpenCL

Feb, 26

Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi- and Many-Core Architectures

On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and thus non-reproducible mainly due to non-associativity of floating-point operations. We introduce a solution to compute deterministic sums of floating-point numbers efficiently and with the best possible accuracy. Our multi-level algorithm consists of two main stages: a filtering stage that uses […]

OpenCL
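
The floating-point summation excerpt above attributes non-reproducible reductions to the non-associativity of floating-point addition. A minimal sketch of that effect follows, assuming IEEE 754 double precision (Python's `float`). The helper `naive_sum` and the sample values are made up for illustration, and `math.fsum` is used only as a standard-library stand-in for an order-independent sum; it is not the multi-level algorithm the paper introduces.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation, as a sequential loop would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same four numbers, two different summation orders.
values = [1e16, 1.0, -1e16, 1.0]
forward = naive_sum(values)            # order as given
reordered = naive_sum(sorted(values))  # ascending order

# A parallel reduction may combine partial sums in any order, so the
# naive result can change from run to run.
order_dependent = forward != reordered

# math.fsum tracks exact partial sums and rounds once at the end, so its
# result does not depend on the order of the inputs.
exact_forward = math.fsum(values)
exact_reordered = math.fsum(sorted(values))
```

With these inputs the two naive sums disagree, while both `math.fsum` calls return the exact sum 2.0.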

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
