13761

Posts

Mar, 20

The More We Share, The More We Have: Improving GPU performance through Register Sharing

Graphics Processing Units (GPUs) consisting of Streaming Multiprocessors (SMs) achieve high throughput by running a large number of threads and context switching among them to hide execution latencies. The amount of thread level parallelism that can be utilized depends on the number of resident threads on each of the SMs. The threads are typically structured […]
Mar, 20

Implementation of a Practical Distributed Calculation System with Browsers and JavaScript, and Application to Distributed Deep Learning

Deep learning can achieve outstanding results in various fields. However, it requires so significant computational power that graphics processing units (GPUs) and/or numerous computers are often required for the practical application. We have developed a new distributed calculation framework called "Sashimi" that allows any computer to be used as a distribution node only by accessing […]
Mar, 18

Fast Sparse Matrix Multiplication on GPU

Sparse matrix multiplication is an important algorithm in a wide variety of problems, including graph algorithms, simulations and linear solving to name a few. Yet, there are but a few works related to acceleration of sparse matrix multiplication on a GPU. We present a fast, novel algorithm for sparse matrix multiplication, outperforming the previous algorithm […]
Mar, 18

Local vs. Global Optimization: Operator Placement Strategies in Heterogeneous Environments

In several parts of query optimization, like join enumeration or physical operator selection, there is always the question of how much optimization is needed and how large the performance benefits are. In particular, a decision for either global optimization (e.g., during query optimization) or local optimization (during query execution) has to be taken. In this […]
Mar, 18

Portable GPU-Based Artificial Neural Networks for Accelerated Data-Driven Modeling

Artificial neural network (ANN) is widely applied as the data-driven modeling tool in hydroinformatics due to its broad applicability of handling implicit and nonlinear relationships between the input and output data. To obtain a reliable ANN model, training ANN using the data is essential, but the training is usually taking many hours for a large […]
Mar, 18

Accelerating Direction-Optimized Breadth First Search on Hybrid Architectures

Large scale-free graphs are famously difficult to process efficiently: the highly skewed vertex degree distribution makes it difficult to obtain balanced workload partitions for parallel processing. Our research instead aims to take advantage of vertex degree heterogeneity by partitioning the workload to match the strength of the individual computing elements in a hybrid architecture. This […]
Mar, 18

A Switched Dynamical System Framework for Analysis of Massively Parallel Asynchronous Numerical Algorithms

In the near future, massively parallel computing systems will be necessary to solve computation intensive applications. The key bottleneck in massively parallel implementation of numerical algorithms is the synchronization of data across processing elements (PEs) after each iteration, which results in significant idle time. Thus, there is a trend towards relaxing the synchronization and adopting […]
Mar, 18

Fast Radix Sort for Sparse Linear Algebra on GPU

Fast sorting is an important step in many parallel algorithms, which require data ranking, ordering or partitioning. Parallel sorting is a widely researched subject, and many algorithms were developed in the past. In this paper, the focus is on implementing highly efficient sorting routines for the sparse linear algebra operations, such as parallel sparse matrix […]
Mar, 14

Heterogeneous Acceleration of Volumetric JPEG 2000

We present the implementation of a volumetric JPEG 2000 codec as a real-world use case of software acceleration with GPUs and multi-core CPUs. We present a generic methodology to accelerate existing code written in C with OpenCL. Furthermore, we account for the volumetric nature of the processed data and formulate associated optimization guidelines. The resulting […]
Mar, 14

EmoNets: Multimodal deep learning approaches for emotion recognition in video

The task of the emotion recognition in the wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which […]
Mar, 14

Parallel Statistical Multi-resolution Estimation

We discuss several strategies to implement Dykstra’s projection algorithm on NVIDIA’s compute unified device architecture (CUDA). Dykstra’s algorithm is the central step in and the computationally most expensive part of statistical multi-resolution methods. It projects a given vector onto the intersection of convex sets. Compared with a CPU implementation our CUDA implementation is one order […]
Mar, 14

HELIOS-K: An Ultrafast, Open-source Opacity Calculator for Radiative Transfer

We present an ultrafast opacity calculator for application to exoplanetary atmospheres, which we name HELIOS-K. It takes a line list as an input, computes the shape of each spectral line (e.g., a Voigt profile) and provides an option for grouping an enormous number of lines into a manageable number of bins. We implement a combination […]

Recent source codes

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: