high performance computing on graphics processing units: hgpu.org

Posts

Apr, 21

A GPU-Based Enhanced Genetic Algorithm for Power-Aware Task Scheduling Problem in HPC Cloud

In this paper, we consider power-aware task scheduling (PATS) in HPC clouds. Users request virtual machines (VMs) to execute their tasks. Each task is executed on one single VM, and requires a fixed number of cores (i.e., processors), computing power (million instructions per second – MIPS) of each core, a fixed start time and non-preemption […]

CUDA

Apr, 21

Rapid Rabbit: Highly Optimized GPU Accelerated Cone-Beam CT Reconstruction

Graphical processing units (GPUs) have become widely adopted in the medical imaging community. The parallel SIMD nature of GPUs maps perfectly to many reconstruction algorithms. Because of this, it is relatively straightforward to parallelize common reconstruction algorithms (e.g. FDK backprojection). This means that significant performance improvements must come from careful memory optimizations, exploiting ASICs and […]

CUDA

Apr, 21

GACO: A GPU-based High Performance Parallel Multi-ant Colony Optimization Algorithm

As a population-based algorithm, Ant Colony Optimization (ACO) is intrinsically massively parallel, and therefore it is expected to be well-suited for implementation on GPUs (Graphics Processing Units). In this paper, we present a novel ant colony optimization algorithm (called GACO), which is based on a Compute Unified Device Architecture (CUDA) enabled GPU. In the GACO algorithm, we utilize […]

CUDA

Apr, 21

Computational cost estimates for parallel shared memory isogeometric multi-frontal solvers

In this paper we present computational cost estimates for a parallel shared memory isogeometric multi-frontal solver. The estimates show that the ideal isogeometric shared memory parallel direct solver scales as O(p^2 log(N/p)) for one-dimensional problems, O(Np^2) for two-dimensional problems, and O(N^(4/3)p^2) for three-dimensional problems, where N is the number of degrees of freedom, […]

CUDA

Apr, 19

An Automated Tool for Converting Directive Based C Code Into Parallel CUDA Code

Parallel programming has become simple and reasonable with the advent of GPGPUs. Nowadays, many programmers port their applications to GPGPUs thanks to the availability of APIs such as NVIDIA's CUDA. But writing a CUDA program is a very tricky task. Much of the industry makes extensive use of large serial C codebases, and they are […]

CUDA

Apr, 19

Collision Detection Based on Fuzzy Scene Subdivision

We present a novel approach to perform collision detection queries between rigid and/or deformable models. Our method can handle arbitrary deformations, even discontinuous ones. To do this, we subdivide the whole scene with all objects into connected but totally independent parts using a fuzzy clustering algorithm. Then, for every part, our algorithm performs a Principal […]

CUDA

Apr, 19

Architectural Support for Virtual Memory in GPUs

The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory. Indeed, processor vendors have already begun embracing heterogeneous systems with […]

Apr, 19

Parallel Circuit Simulation on Graphical Processing Unit

The high level of integration in IC design and mixed VLSI design has brought new complexity to IC design. This complexity brings new challenges for IC simulation time. There is interest in speeding up Spice [1] simulation because simulating a large IC can take several days. On average, 75% of simulation time is spent evaluating transistor […]

CUDA

Apr, 19

Local Alignment Tool Based on Hadoop Framework and GPU Architecture

With the rapid growth of next generation sequencing technologies, such as Slex, more and more data have been discovered and published. To analyze such huge data, computational performance is an important issue. Recently, many tools, such as SOAP, have been implemented on Hadoop and GPU parallel computing architectures. BLASTP is an important tool, implemented […]

CUDA

Apr, 18

The Reconstruction Toolkit (RTK), an open-source cone-beam CT reconstruction toolkit based on the Insight Toolkit (ITK)

We propose the Reconstruction Toolkit (RTK, http://www.openrtk.org), an open-source toolkit for fast cone-beam CT reconstruction, based on the Insight Toolkit (ITK) and using GPU code extracted from Plastimatch. RTK is developed by an open consortium (see affiliations) under the non-contaminating Apache 2.0 license. The quality of the platform is checked daily with regression tests in […]

CUDA • OpenCL

Apr, 18

Use of Multiple GPUs to Speedup the Execution of a Three-Dimensional Computational Model of the Innate Immune System

The development of computational systems that mimic the physiological response of organs or even the entire body is a complex task. One of the issues that makes this task extremely complex is the huge computational resources needed to execute the simulations. For this reason, the use of parallel computing is mandatory. In this work, we […]

CUDA

Apr, 18

DBMS Index for Hierarchical Data Using Nested Intervals and Residue Classes

This work describes an index based on a B+ tree and oriented toward storing trees that are encoded by the nested intervals method using a system of residue classes.

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
