high performance computing on graphics processing units: hgpu.org

Posts

Dec, 12

Bringing Parallel Performance to Python with Domain-Specific Selective Embedded Just-in-Time Specialization

Today’s productivity programmers, such as scientists who need to write code to do science, are typically forced to choose between productive and maintainable code with modest performance (e.g. Python plus native libraries such as SciPy [SciPy]) or complex, brittle, hardware-specific code that entangles application logic with performance concerns but runs two to three orders of […]

CUDA

Dec, 11

Self-Supervised Clustering for Codebook Construction: An Application to Object Localization

Approaches to object localization based on codebooks do not exploit the dependencies between appearance and geometric information present in training data. This work addresses the problem of computing a codebook tailored to the task of localization by applying regularization based on geometric information. We present a novel method, the Regularized Combined Partitional-Agglomerative clustering, which extends […]

CUDA

Dec, 11

Aquila: An Open-Source GPU-Accelerated Toolkit for Cognitive Robotics Research

This paper presents Aquila, novel open-source software developed as part of the iTalk and RobotDoC projects. The software provides many different tools and biologically inspired systems that are useful for cognitive robotics research. Aquila addresses the need for high-performance robot control by adopting the latest parallel processing paradigm based on the NVIDIA CUDA […]

CUDA

Dec, 11

Gyrokinetic Toroidal Simulations on Leading Multi- and Manycore HPC Systems

The gyrokinetic Particle-in-Cell (PIC) method is a critical computational tool enabling petascale fusion simulation research. In this work, we present novel multi- and manycore-centric optimizations to enhance performance of GTC, a PIC-based production code for studying plasma microturbulence in tokamak devices. Our optimizations encompass all six GTC sub-routines and include multi-level particle and grid decompositions […]

CUDA

Dec, 11

Accelerating Swarm Intelligence Algorithms with GPU-Computing

Swarm intelligence describes the ability of groups of social animals and insects to exhibit highly organized and complex problem-solving behaviors that allow the group as a whole to accomplish tasks which are beyond the capabilities of any individual. This phenomenon found in nature is the inspiration for swarm intelligence algorithms — systems that utilize the […]

CUDA

Dec, 11

Fast Face Detection Using Graphics Processor

Fast face detection is one of the key components of various computer vision applications. The Viola-Jones algorithm provides good, fast detection for low- and medium-resolution images. This paper proposes a new and fast approach to real-time face detection. The proposed method includes enhanced Haar-like features and uses SVM for training […]

CUDA

Dec, 11

A Dynamic Approach to Weighted Suffix Tree Construction Algorithm

At present, the weighted suffix tree is considered one of the most important data structures for analyzing molecular weighted sequences. Although a static-partitioning-based parallel algorithm exists for constructing the weighted suffix tree, it takes a significant amount of time for very long weighted DNA sequences. However, in our […]

CUDA

Dec, 11

Generalizing Execution of Vectorizable Computations by Generating Vector Oriented Byte Code

Computer simulations, which are widely used in both academia and industry, often work on very large data sets. This makes them well suited for harvesting the computing power of modern, highly parallel computing systems, such as GPUs, clusters and vector processors. The challenge lies in the fact that these systems must be programmed […]

CUDA

Dec, 11

Data analysis and 3D evolution in High Energy Physics using graphic processor

One of the main challenges in High Energy Physics (HEP) is the fast analysis of large amounts of experimental and simulated data. For example, the amount of data generated at the Large Hadron Collider (LHC) is estimated to reach 1 petabyte per year. The time taken to analyze the data and to obtain fast results depends on […]

CUDA

Dec, 11

ALICE HLT High Speed Tracking on GPU

The on-line event reconstruction in ALICE is performed by the High Level Trigger, which should process up to 2000 events per second in proton-proton collisions and up to 300 central events per second in heavy-ion collisions, corresponding to an input data stream of 30 GB/s. In order to fulfill the time requirements, a fast on-line […]

CUDA

Dec, 11

Evaluating graph coloring on GPUs

This paper evaluates features of graph coloring algorithms implemented on graphics processing units (GPUs), comparing coloring heuristics and thread decompositions. As compared to prior work on graph coloring for other parallel architectures, we find that the large number of cores and relatively high global memory bandwidth of a GPU lead to different strategies for the […]

CUDA

Dec, 10

Achieving High Throughput Sequencing with Graphics Processing Units

High throughput sequencing has become a powerful technique for genome analysis since the concept was introduced in recent years. Currently, there is a huge demand from patients with genetic diseases that cannot be met because of limited computing power. Though several software packages have been developed using the currently most efficient algorithm to deal with […]

CUDA

