Papers on hgpu.org (.txt-file)
Scientific computation for simulations on programmable graphics hardware

Scientific Computation on Graphics Processing Unit using CUDA

Scientific Computation Through a GPU

Scientific Computing on Heterogeneous Architectures

Scientific Computing on Hybrid Architectures

Scientific Computing Using Consumer Video-Gaming Hardware Devices

Scientific Computing with Python on GPUs

Scientific GPU Programming with Data-Flow Languages

Scientific Programming for Heterogeneous Systems – Bridging the Gap between Algorithms and Applications

Scientific Visualization in Astronomy: Towards the Petascale Astronomy Era

Scope for performance enhancement of CMU Sphinx by parallelising with OpenCL

Scope is all you need: Transforming LLMs for HPC Code

Scout: a data-parallel programming language for graphics processors

Seamless acceleration of Fortran intrinsics via AMD AI engines

Seamless Dynamic Runtime Reconfiguration in a Software-Defined Radio

Seamless GPU acceleration for C++ based physics with the Metal Shading Language on Apple’s M series unified chips

Searching CUDA code autotuning spaces with hardware performance counters: data from benchmarks running on various GPU architectures

Searching for a counterexample of Kurepa’s Conjecture

Searching for Concurrent Design Patterns in Video Games

Searching for sinks of Henon map using a multiple-precision GPU arithmetic library

Second Order Pre-Integrated Volume Rendering

Secret Key Cryptography Using Graphics Cards

Secure 3D graphics for virtual machines

Secure Distributed Computing on a Manycore Cloud

SecureMed: Secure Medical Computation using GPU-Accelerated Homomorphic Encryption Scheme

Securing GPU via Region-based Bounds Checking

Seeded ND medical image segmentation by cellular automaton on GPU

Seeing through the fog: an algorithm for fast and accurate touch detection in optical tabletop surfaces

Seer: Predictive Runtime Kernel Selection for Irregular Problems

Seismic Attributes Extraction Based on GPU

Seismic damage simulation for urban buildings based on high-performance GPU computing

Seismic imaging based on spectral differentiation matrix and GPU implementation

Seismic volume visualization for horizon extraction

Seismic Wave Propagation Simulation Using Accelerated Support Operator Rupture Dynamics on Multi-GPU

Seismic Wave Propagation Simulation Using Support Operator Method on multi-GPU system

Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms

Selection algorithm of graphic accelerators in heterogeneous cluster for optimization computing

Selection of Task Implementations in the Nanos++ Runtime

Self-Adapting Parallel Framework for Long-Term Object Tracking

Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures

Self-calibration of geometric and radiometric parameters for cone-beam computed tomography

self-CD: Interactive Self-collision Detection for Deformable Body Simulation Using GPUs

Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques

Self-Supervised Clustering for Codebook Construction: An Application to Object Localization

Self-Tuning Distribution of DB-Operations on Hybrid CPU/GPU Platforms

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Semantic Pose using Deep Networks Trained on Synthetic RGB-D

Semantic Segmentation of Colon Glands with Deep Convolutional Neural Networks and Total Variation Segmentation

SemCache: Semantics-aware Caching for Efficient GPU Offloading

Semi-Analytic Solutions to the Radiative Transfer Equations via Heterogeneous Computing

Semi-Global Filtering of Airborne LiDAR Data for Fast Extraction of Digital Terrain Models

Semi-Global Matching – Motivation, Developments and Applications

Separable projection integrals for higher-order correlators of the cosmic microwave sky: Acceleration by factors exceeding 100

Separate Compilation in a Language-Integrated Heterogeneous Environment

Sequence alignment with GPU: Performance and design challenges

Sequence Data Indexing Method Exploiting the Parallel Processing Resources of GPGPU

Sequence Homology Search using Fine-Grained Cycle Sharing of Idle GPUs

Sequence Parallelism: Making 4D Parallelism Possible

Sequential Code Parallelization for Multi-core Embedded Systems: A Survey of Models, Algorithms and Tools

Sequential Consistency for Heterogeneous-Race-Free: Programmer-centric Memory Models for Heterogeneous Platforms

Sequential Monte Carlo Optimisation for Air Traffic Management

Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC

Serpent encryption algorithm implementation on Compute Unified Device Architecture (CUDA)

Serverless Computing Strategies on Cloud Platforms

Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs

SESH framework: A Space Exploration Framework for GPU Application and Hardware Codesign

Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU

SGO: An ultrafast engine for atomic structure global optimization by differential evolution

SGPU 2: a runtime system for using large applications on clusters of hybrid nodes

Shader Performance Analysis on a Modern GPU Architecture

Shader-based tessellation to save memory bandwidth in a mobile multimedia processor

Shader-based visual simulation of ocean wave

SHADOW3 API: The Application Programming Interface for the ray tracing code SHADOW

Shadowfax: scaling in heterogeneous cluster systems via GPGPU assemblies

Shallow Water Simulation on GPUs for Sparse Domains

Shallow water simulations on multiple GPUs

Shape Modeling and GPU Based Image Warping

Shape Transformation of Multidimensional Density Functions using Distribution Interpolation of the Radon Transforms

SHARC: A streaming model for FPGA accelerators and its application to Saliency

Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput

Shared Sampling for Real-Time Alpha Matting

ShearLab 3D: Faithful Digital Shearlet Transforms based on Compactly Supported Shearlets

Shell: A Spatial Decomposition Data Structure for 3D Curve Traversal on Many-core Architectures

Ship Detection from SAR Imagery Using CUDA and Performance Analysis of the System

Short-time Fourier transform laser Doppler holography

Shortening design time through multiplatform simulations with a portable OpenCL golden-model: the LDPC decoder case

Shortest-Path Queries in Planar Graphs on GPU-Accelerated Architectures

ShoveRand: a model-driven framework to easily generate random numbers on GP-GPU

Shredder: GPU-Accelerated Incremental Storage and Computation

Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU

Sieve: Stratified GPU-Compute Workload Sampling

SiftCU: An Accelerated Cuda Based Implementation of SIFT

Titles: 100
open PDFs: 94
packages: 10
