Papers on hgpu.org
“Local Rank Differences” Image Feature Implemented on GPU

.NET High Performance Computing

10×10: A General-purpose Architectural Approach to Heterogeneity and Energy Efficiency

190 TFlops Astrophysical N-body Simulation on a Cluster of GPUs

2D and 3D level-set algorithms on GPU
2D/3D image registration on the GPU

2PARMA: Parallel Paradigms and Run-time Management Techniques for Many-Core Architectures

3-SAT on CUDA: Towards a massively parallel SAT solver

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs

3D Edge Bundling for Geographical Data Visualization

3D finite difference computation on GPUs using CUDA

3D finite element numerical integration on GPUs

3D GPU Architecture using Cache Stacking: Performance, Cost, Power and Thermal analysis

3D HAAR-LIKE ELLIPTICAL FEATURES FOR OBJECT CLASSIFICATION IN MICROSCOPY

3D Information Extraction Based on GPU

3D Modeling, Distance and Gradient Computation for Motion Planning: A Direct GPGPU Approach

3D nonrigid registration via optimal mass transport on the GPU

3D Recursive Gaussian IIR on GPU and FPGAs: A Case Study for Accelerating Bandwidth-Bounded Applications

3D Registration Based on Normalized Mutual Information: Performance of CPU vs. GPU Implementation

3D tumor localization through real-time volumetric x-ray imaging for lung cancer radiotherapy

3D vision of electromagnetic fields in antenna and microwave technique
3I: A tool for visualizing and processing in parallel 2D & 3D images

42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence

5.6: GPU enhancement of FDTD-PIC plasma-wave simulations
A (ir)regularity-aware task scheduler for heterogeneous platforms

A 3D Convex Hull Algorithm for Graphics Hardware

A 3D radiative transfer framework. VIII. OpenCL implementation

A 3D radiative transfer framework: XIII. OpenCL implementation

A 57mW embedded mixed-mode neuro-fuzzy accelerator for intelligent multi-core processor
A balanced programming model for emerging heterogeneous multicore systems

A Batched GPU Algorithm for Set Intersection

A block-asynchronous relaxation method for graphics processing units

A Braille Conversion Service Using GPU and Human Interaction by Computer Vision

A breadth-first course in multicore and manycore programming

A capabilities-aware framework for using computational accelerators in data-intensive computing

A Case Study for Petascale Applications in Astrophysics: Simulating Gamma-Ray Bursts

A Case Study of SWIM: Optimization of Memory Intensive Application on GPGPU
A case study on porting scientific applications to GPU/CUDA

A CG-based Poisson solver on a GPU-cluster
A characterization and analysis of PTX kernels

A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads

A Chunking Method for Euclidean Distance Matrix Calculation on Large Dataset Using Multi-GPU
A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines

A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures

A cluster for CS education in the manycore era

A Co-Prime Blur Scheme for Data Security in Video Surveillance

A Coarse Grain Reconfigurable Architecture for sequence alignment problems in bio-informatics
A code motion technique for accelerating general-purpose computation on the GPU

A Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators

A Code Transformation Framework for Scientific Applications on Structured Grids

A code-based analytical approach for using separate device coprocessors in computing systems
A collision detection algorithm using adaptive particle sensor
A Common GPU n-Dimensional Array for Python and C

A Comparative Analysis of GPU Implementations of Spectral Unmixing Algorithms

A comparative benchmarking of the FFT on Fermi and Evergreen GPUs
A comparative study of GPU programming models and architectures using neural networks
A Comparative Study of OpenACC Implementations

A Comparative Study of Parallel Algorithms for the Girth Problem

A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation

A Comparison of Algebraic Multigrid Preconditioners using Graphics Processing Units and Multi-Core Central Processing Units

A comparison of CPU and GPU performance for Fourier pseudospectral simulations of the Navier-Stokes, Cubic Nonlinear Schrodinger and Sine Gordon Equations

A Comparison of CPU and OpenCL Parallelization Methods for Correlation and Graph Layout Algorithms used in the Network Analysis of High Dimensional Data

A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation

A Comparison of Gradient Estimation Methods for Volume Rendering on Unstructured Meshes

A Comparison of Many-threaded Differential Evolution and Genetic Algorithms on CUDA

A Comparison of Modern GPU and CPU Architectures: And the Common Convergence of Both

A Comparison of Sequential and GPU Implementations of Iterative Methods to Compute Reachability Probabilities

A Comparison of xPU Platforms Exemplified with Ray Tracing Algorithms
A Compile-Time Managed Multi-Level Register File Hierarchy

A Compiler and Runtime for Heterogeneous Computing

A compiler for high performance computing with many-core accelerators

A compiler framework for optimization of affine loop nests for gpgpus

A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems

A Complete Description of the UnPython and Jit4GPU Framework

A complete modular resultant algorithm targeted for realization on graphics hardware

A comprehensive analysis and parallelization of an image retrieval algorithm

A Comprehensive Performance Comparison of CUDA and OpenCL

A Computational Model of Afterimages

A computationally efficient and scalable approach for privacy preserving kNN classification

A Computationally Efficient Approach for Exemplar-based Color Image Inpainting using GPU

A Computationally Efficient Parallel Kernel Regression for Image Reconstruction

A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality

A computing origami: Optimized code generation for emerging parallel platforms

A constant-space belief propagation algorithm for stereo matching

A Consumer Application for GPGPUs: Desktop Search

A Contour-Guided Deformable Image Registration Algorithm for Adaptive Radiotherapy

A control-structure splitting optimization for GPGPU
A convex formulation for color image segmentation in the context of passive emitter localization

A CPU-GPU Hybrid Runtime for the Aeminium Language

A Cross-Input Adaptive Framework for GPU Programs Optimization

A CUDA Based Implementation of an Image Authentication Algorithm

A CUDA Implementation of Independent Component Analysis in the Time-Frequency Domain

A CUDA SIMT Interpreter for Genetic Programming

A CUDA SIMT interpreter for genetic programming. Revised

A CUDA-Based Cooperative Evolutionary Multi-Swarm Optimization Applied to Engineering Problems

Titles: 100
open PDFs: 84
packages: 8
Most viewed papers (last 30 days)
- Graphics Programming on the Web WebCL Course Notes
- Use NVIDIA CUDA technology to create genetic algorithms with extensive population
- Simulating the universe with GPU-accelerated supercomputers: n-body methods, tests, and examples
- Implementations of the FFT algorithm on GPU
- Secrets from the GPU
- GPU Scripting and Code Generation with PyCUDA
- A General-Purpose GPU Reservoir Computer
- One OpenCL to Rule Them All?
- Fluid Motion Modelling Using Vortex Particle Method on GPU
- Adding GPU Computing to Computer Organization Courses
Rating
Duality based optical flow algorithms with applications
Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search
Graphics Programming on the Web WebCL Course Notes
Automatic Compilation for Heterogeneous Architectures with Single Assignment C
Mr. Scan: Extreme Scale Density-Based Clustering using a Tree-Based Network of GPGPU Nodes
Comprehensive Analysis of High-Performance Computing Methods for Filtered Back-Projection
A parallel decoding algorithm of LDPC codes using CUDA
Optimizing MapReduce for GPUs with effective shared memory usage
Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
CUDA implementation of the algorithm for simulating the epidemic spreading over large networks
Events
October 1-4, 2013 Lyon, France The 2013 International Workshop on Embedded Multicore Systems, ICPP-EMS 2013
November 13-15, 2013 Zhangjiajie, China 3rd International Workshop on Embedded Multi-core Computing and Applications, EMCA 2013
February 2-6, 2014 San Francisco, USA
February 12-14, 2014 Turin, Italy
November 11-14, 2013 San Jose, California, USA
Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: one with two AMD graphics processing units, and one with one AMD and one nVidia graphics processing unit. There is no limit on the number of runs.
The platforms are:
Node 1
- GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
- GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
- CPU: AMD Phenom II X6 1055T @ 2.8GHz
- RAM: 12GB
- HDD: 2TB, RAID-0
- OS: OpenSUSE 11.4
- SDK: AMD APP SDK 2.8
Node 2
- GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
- GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
- CPU: Intel Core i7-2600 @ 3.4GHz
- RAM: 16GB
- HDD: 2TB, RAID-0
- OS: OpenSUSE 12.2
- SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8
A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); the terminal output logs from compilation and execution will be provided to the user.
Information sent to hgpu.org will be treated according to our Privacy Policy.

