high performance computing on graphics processing units: hgpu.org

Posts

Aug, 27

Accelerated Deep Learning using Intel Xeon Phi

Deep learning, a sub-topic of machine learning inspired by biology, has recently attracted wide attention in industry and the research community. State-of-the-art applications in computer vision and speech recognition (among others) are built using deep learning algorithms. In contrast to traditional algorithms, where the developer fully instructs the application what to do, […]

Aug, 27

MemcachedGPU: Scaling-up Scale-out Key-value Stores

This paper tackles the challenges of obtaining more efficient data center computing while maintaining low latency, low cost, programmability, and the potential for workload consolidation. We introduce GNoM, a software framework enabling energy-efficient, latency bandwidth optimized UDP network and application processing on GPUs. GNoM handles the data movement and task management to facilitate the development […]

CUDA

Aug, 24

First International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC’15), 2015

With Exascale systems on the horizon at the same time that conventional von Neumann architectures are suffering from rising power densities, we are facing an era with power, energy-efficiency, and cooling as first-class constraints for scalable HPC. FPGAs can tailor the hardware to the application, avoiding the overheads of general-purpose architectures, for example through customized datapaths and memory […]

Aug, 24

Performance Evaluations of Document-Oriented Databases using GPU and Cache Structure

Document-oriented databases are popular databases in which users can store their documents in a schema-less manner and perform search queries on them. They have been widely used for web applications that process a large collection of documents because of their high scalability and rich functions. One of the major functions of document-oriented databases is a string […]

CUDA

Aug, 24

Viability of Feature Detection on Sony Xperia Z3 using OpenCL

CONTEXT: Embedded platform GPUs are reaching a level of performance comparable to desktop hardware. Therefore it becomes interesting to apply Computer Vision techniques to modern smartphones. The platform poses different challenges, as energy use and heat generation can be an issue depending on load distribution on the device. OBJECTIVES: We evaluate the viability of a feature […]

OpenCL

Aug, 24

Scheduling for new computing platforms with GPUs

More and more computers use hybrid architectures combining multi-core processors (CPUs) and hardware accelerators like GPUs (Graphics Processing Units). These hybrid parallel platforms require new scheduling strategies. This work is devoted to a characterization of this new type of scheduling problem. The most studied objective in this work is the minimization of the makespan, which […]

OpenCL
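To make the makespan objective in the entry above concrete, here is a minimal sketch of a greedy heuristic for a hybrid CPU/GPU machine. It is not taken from the listed work, whose algorithms are not shown in this excerpt. The `Task` type, the function `greedy_schedule`, and all timing values are hypothetical names and numbers introduced only for illustration, and the sketch assumes at least one processor is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cpu_time: float  # hypothetical processing time on one CPU core
    gpu_time: float  # hypothetical processing time on one GPU


def greedy_schedule(tasks, n_cpus, n_gpus):
    """Assign each task to the processor on which it would finish earliest.

    Returns (makespan, assignment), where assignment maps a task name to a
    processor label such as "cpu0" or "gpu1". This is a simple heuristic
    and does not guarantee an optimal makespan.
    """
    # Time at which each processor becomes free.
    free_at = {f"cpu{i}": 0.0 for i in range(n_cpus)}
    free_at.update({f"gpu{i}": 0.0 for i in range(n_gpus)})

    assignment = {}
    # Heuristic ordering: place tasks with the largest best-case time first.
    ordered = sorted(tasks, key=lambda t: min(t.cpu_time, t.gpu_time), reverse=True)
    for task in ordered:
        def finish(proc):
            duration = task.cpu_time if proc.startswith("cpu") else task.gpu_time
            return free_at[proc] + duration

        best = min(free_at, key=finish)  # ties go to the first processor listed
        free_at[best] = finish(best)
        assignment[task.name] = best

    # The makespan is the completion time of the last task to finish.
    return max(free_at.values()), assignment


# Illustrative (made-up) task set on one CPU core and one GPU.
tasks = [Task("a", 8, 2), Task("b", 4, 4), Task("c", 6, 1), Task("d", 3, 5)]
makespan, assignment = greedy_schedule(tasks, n_cpus=1, n_gpus=1)
print(makespan, assignment)
```

Because each task has a different running time on each processor type, minimizing the makespan means balancing load across unlike resources rather than simply sending every task to the device where it runs fastest.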

Aug, 24

Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators

Since the beginning of the 2000s, the raw performance of processors has stopped increasing exponentially. Modern graphics processing units (GPUs) have been designed as arrays of hundreds or thousands of compute units. The GPUs’ compute capacity quickly led them to be diverted from their original target to be used as accelerators for general purpose […]

CUDA • OpenCL

Aug, 24

Semi-Global Filtering of Airborne LiDAR Data for Fast Extraction of Digital Terrain Models

Automatic extraction of ground points, called filtering, is an essential step in producing Digital Terrain Models from airborne LiDAR data. Scene complexity and computational performance are two major problems that should be addressed in filtering, especially when processing large point cloud data with diverse scenes. This paper proposes a fast and intelligent algorithm called Semi-Global […]

CUDA

Aug, 21

A Parallelizing Matlab Compiler Framework and Run time for Heterogeneous Systems

Compute-intensive applications place ever-increasing data processing requirements on hardware systems. Many of these applications have only recently become feasible thanks to the increasing computing power of modern processors. The Matlab language is uniquely situated to support the description of these compute-intensive scientific applications, and consequently has been continuously improved to provide increasing computational support […]

Aug, 21

Implementing Computer Vision Functions with OpenCL on the Qualcomm Adreno 420

Computer vision algorithms are becoming increasingly important in mobile, embedded, and wearable devices and applications. These compute-intensive workloads are challenging to implement with good performance and power-efficiency. In many applications, implementing critical portions of computer vision workloads on a general-purpose graphics processing unit (GPU) is an attractive solution. Qualcomm enables programming of the Adreno GPU […]

OpenCL

Aug, 21

A CPU and GPU Heterogeneous Processing of Multimedia Data by using OpenCL

In recent times, it has become possible to parallelize many multimedia applications using multicore platforms such as CPUs and GPUs. In this paper, we propose a parallel processing approach for a multimedia application by using both the CPU and GPU. Instead of distributing the parallelizable workload to either the CPU or GPU, we distribute the […]

OpenCL

Aug, 21

Locality Aware Work-Stealing Based Scheduling in Hybrid CPU-GPU

We study work-stealing based scheduling on a cluster of nodes with CPUs and GPUs. In particular, we evaluate locality aware scheduling in the context of distributed shared memory style programming, where the user is oblivious to data placement. Our runtime maintains a distributed map of data resident on various nodes and uses it to estimate […]

CUDA
