Papers on hgpu.org (.txt-file)
Improved GPU Co-processor Sorting Algorithm with Barrier Synchronization
Improved Implementation of Simulation for Membrane Computing on the Graphic Processing Unit
Improved Integral Histogram Algorithm for Big Sized Images in CUDA Environment
Improved Lossless Image Compression Model Using Coefficient Based Discrete Wavelet Transform
Improved OpenCL-based Implementation of Social Field Pedestrian Model
Improved Performance of CaFE and IRIS Model Fitting Using CUDA
Improved Poisson Matting for a Real Time Tele-presence System Using GPU
Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring
Improved Real-Time Stereo on Commodity Graphics Hardware
Improved Row-Grouped CSR Format for Storing of Sparse Matrices on GPU
Improved Sequential & Parallel Designs and Implementations of the Eight Direction Prewitt Edge Detection
Improvement of the fused CUDA kernels performance prediction
Improvement Study of EEMD Decomposition Efficiency Based on CUDA Architecture
Improvements to Physically Based Cloth Simulation
Improving 3D Lattice Boltzmann Method stencil with asynchronous transfers on many-core processors
Improving accuracy for matrix multiplications on GPUs
Improving Atmospheric Model Performance on a Multi-Core Cluster System
Improving Automatic Parallel Training via Balanced Memory Workload Optimization
Improving Cache Locality for GPU-based Volume Rendering
Improving Cache Locality for Ray Casting with CUDA
Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters
Improving Communication Performance in GPU-Accelerated HPC Clusters
Improving CUDA DNA Analysis Software with Genetic Programming
Improving CUDASW++, a Parallelization of Smith-Waterman for CUDA Enabled Devices
Improving energy and power efficiency using NComputing and approaches for predicting reliability of complex computing systems
Improving Energy Efficiency of Basic Linear Algebra Routines on Heterogeneous Systems with Multiple GPUs
Improving Energy Efficiency of GPU based General-Purpose Scientific Computing through Automated Selection of Near Optimal Configurations
Improving GPGPU Concurrency with Elastic Kernels
Improving GPU particle filter with shader model 3.0 for visual tracking
Improving GPU Performance by Regrouping CPU-Memory Data
Improving GPU Performance Prediction with Data Transfer Modeling
Improving GPU Performance through Instruction Redistribution and Diversification
Improving GPU Performance via Large Warps and Two-Level Warp Scheduling
Improving GPU Performance: Reducing Memory Conflicts and Latency
Improving GPU programming models through hardware cache coherence
Improving GPU Robustness by Making Use of Faulty Parts
Improving GPU Simulations of Spiking Neural P Systems
Improving GPU Sparse Matrix-Vector Multiplication for Probabilistic Model Checking
Improving GPU-accelerated Adaptive IDW Interpolation Algorithm Using Fast kNN Search
Improving Hybrid OpenCL Performance by High Speed Networks
Improving Locality of Unstructured Mesh Algorithms on GPUs
Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLS
Improving many flavor QCD simulations using multiple GPUs
Improving Numerical Accuracy for Non-Negative Matrix Multiplication on GPUs using Recursive Algorithms
Improving numerical reproducibility and stability in large-scale numerical simulations on GPUs
Improving OpenACC compatibility within accULL
Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering
Improving OpenCL Programmability with the Heterogeneous Programming Library
Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers
Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra
Improving Performance and Energy Efficiency of GPUs through Locality Analysis
Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA
Improving performance for emergent environments parameter tuning and simulation in games using GPU
Improving Performance of Hardware Accelerators by Optimizing Data Movement: A Bioinformatics Case Study
Improving Performance of Iterative Applications through Interleaved Execution of Approximated CUDA Kernels
Improving Performance of Matrix Multiplication and FFT on GPU
Improving Performance of OpenCL on CPUs
Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow
Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations
Improving Performance Portability in OpenCL Programs
Improving processing time for visual measurements of displacements of IPMC actuators using CUDA
Improving programmability of heterogeneous many-core systems via explicit platform descriptions
Improving Resource Efficiency in Virtualized Datacenters
Improving Resource Utilization in Heterogeneous CPU-GPU Systems
Improving Scheduling Techniques in Heterogeneous Systems with Dynamic, On-Line Optimisations
Improving SIMD efficiency for parallel Monte Carlo light transport on the GPU
Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels
Improving SMT performance: an application of genetic algorithms to configure resizable caches
Improving Student Learning in Computer Science Courses by Using Virtual OpenCL Laboratory
Improving Synchronization and Data Access in Parallel Programming Models
Improving tasks throughput on accelerators using OpenCL command concurrency
Improving the Efficiency of GPU Clusters
Improving the Efficiency of OpenCL Kernels through Pipes
Improving the GPU space of computation under triangular domain problems
Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs
Improving the Neural GPU Architecture for Algorithm Learning
Improving the Performance of a Ray Tracing Algorithm Using a GPU
Improving the Performance of CA-GMRES on Multicores with Multiple GPUs
Improving the Performance of Fully Connected Neural Networks by Out-of-Place Matrix Transpose
Improving the Performance of Hyperspectral Image and Signal Processing Algorithms Using Parallel, Distributed and Specialized Hardware-Based Systems
Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network
Improving the performance of PIR Protocol in Outsourced Databases
Improving the performance of spatial raster analysis in GIS using GPU
Improving the Performance of the Contextual Spaces Re-Ranking Algorithm on Heterogeneous Systems
Improving the Performance of the Linear Systems Solvers Using CUDA
Improving the Performance of the Sparse Matrix Vector Product with GPUs
Improving the Performance, Portability, and Productivity of Hardware Accelerators
Improving the Programmability of GPU Architectures
Improving the scalability of modern applications by parallel multi-core and many-core programming
Improving the speed of neural networks on CPUs
Improving the Speed of Virtual Rear Projection: A GPU-Centric Architecture
Improving the usability of hierarchical representations for interactively labeling large image data sets
In Search of Self-Organization
In Situ Power Analysis of General Purpose Graphical Processing Units
In vivo interactive visualization of four-dimensional blood flow patterns
In-Datacenter Performance Analysis of a Tensor Processing Unit
In-Memory Data Analytics on Coupled CPU-GPU Architectures
In-memory database acceleration on FPGAs: a survey
In-memory grid files on graphics processors
In-Place Recursive Approach for All-Pairs Shortest Paths Problem Using OpenCL
Titles: 100
open PDFs: 88
packages: 11