Posts
Mar, 18
Fast Sparse Matrix Multiplication on GPU
Sparse matrix multiplication is an important algorithm in a wide variety of problems, including graph algorithms, simulations, and linear solvers, to name a few. Yet only a few works address accelerating sparse matrix multiplication on a GPU. We present a fast, novel algorithm for sparse matrix multiplication, outperforming the previous algorithm […]
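The abstract does not describe the new GPU algorithm itself. For orientation only, here is a minimal sequential sketch of the textbook row-by-row (Gustavson) formulation of sparse matrix-matrix multiplication on CSR inputs. Row i of C is accumulated from the rows of B selected by the nonzeros in row i of A, and a dictionary stands in for the sparse accumulator. The function and argument names are hypothetical.

```python
def csr_spgemm(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Compute C = A @ B for CSR matrices using Gustavson's row-wise scheme.

    Each matrix is given as (row pointer, column indices, values).
    Returns (c_ptr, c_idx, c_val) with column indices sorted within each row.
    """
    c_ptr, c_idx, c_val = [0], [], []
    n_rows = len(a_ptr) - 1
    for i in range(n_rows):
        acc = {}  # sparse accumulator: column j -> partial sum of C[i, j]
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Row i of C receives a_ik times row k of B.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0) + a_ik * b_val[q]
        # Emit row i in column order and close it in the row pointer.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]] in CSR form.
print(csr_spgemm([0, 1, 3], [0, 0, 1], [1, 2, 3], [0, 1, 2], [1, 0], [4, 5]))
# → ([0, 1, 3], [1, 0, 1], [4, 15, 8]), i.e. C = [[0, 4], [15, 8]]
```

The data-dependent size of each output row is what makes this formulation hard to map onto a GPU, since the output cannot be allocated before the products are known.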
Mar, 18
Local vs. Global Optimization: Operator Placement Strategies in Heterogeneous Environments
In several parts of query optimization, such as join enumeration or physical operator selection, there is always the question of how much optimization is needed and how large the performance benefits are. In particular, a decision between global optimization (e.g., during query optimization) and local optimization (during query execution) has to be made. In this […]
Mar, 18
Portable GPU-Based Artificial Neural Networks for Accelerated Data-Driven Modeling
Artificial neural networks (ANNs) are widely applied as data-driven modeling tools in hydroinformatics due to their broad applicability in handling implicit and nonlinear relationships between input and output data. To obtain a reliable ANN model, training the ANN on the data is essential, but training usually takes many hours for a large […]
Mar, 18
Accelerating Direction-Optimized Breadth First Search on Hybrid Architectures
Large scale-free graphs are famously difficult to process efficiently: the highly skewed vertex degree distribution makes it difficult to obtain balanced workload partitions for parallel processing. Our research instead aims to take advantage of vertex degree heterogeneity by partitioning the workload to match the strength of the individual computing elements in a hybrid architecture. This […]
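Direction-optimized BFS alternates between top-down steps, where frontier vertices push to their unvisited neighbors, and bottom-up steps, where each unvisited vertex searches its neighbors for a parent already in the frontier. Bottom-up steps pay off when the frontier is large, which is common in scale-free graphs. The following is a minimal single-threaded sketch, assuming an undirected graph stored as adjacency lists. The switching rule (compare frontier edges times a tuning parameter `alpha` against edges of unvisited vertices, re-evaluated every level) is a simplified assumption, `alpha=14` is an illustrative default, and the CPU/GPU workload partitioning described in the abstract is not modeled.

```python
def direction_optimizing_bfs(adj, source, alpha=14):
    """Level-synchronous BFS that picks top-down or bottom-up per level.

    adj: list of neighbor lists (undirected graph).
    Returns the BFS level of every vertex, or -1 if unreachable.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = {source}
    depth = 0
    while frontier:
        # Work estimates: edges out of the frontier vs. edges of unvisited vertices.
        frontier_edges = sum(len(adj[v]) for v in frontier)
        unvisited_edges = sum(len(adj[v]) for v in range(n) if level[v] == -1)
        next_frontier = set()
        if frontier_edges * alpha > unvisited_edges:
            # Bottom-up: each unvisited vertex stops at the first frontier parent.
            for v in range(n):
                if level[v] == -1:
                    for u in adj[v]:
                        if u in frontier:
                            next_frontier.add(v)
                            break
        else:
            # Top-down: every frontier vertex scans all of its neighbors.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        next_frontier.add(v)
        depth += 1
        for v in next_frontier:
            level[v] = depth
        frontier = next_frontier
    return level


# Path 0-1-2-3 plus an isolated vertex 4.
print(direction_optimizing_bfs([[1], [0, 2], [1, 3], [2], []], 0))
# → [0, 1, 2, 3, -1]
```

The early `break` in the bottom-up branch is the source of the savings: a vertex with many frontier neighbors is claimed after the first hit instead of being examined once per incoming edge.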
Mar, 18
A Switched Dynamical System Framework for Analysis of Massively Parallel Asynchronous Numerical Algorithms
In the near future, massively parallel computing systems will be necessary to run computation-intensive applications. The key bottleneck in massively parallel implementations of numerical algorithms is the synchronization of data across processing elements (PEs) after each iteration, which results in significant idle time. Thus, there is a trend towards relaxing the synchronization and adopting […]
Mar, 18
Fast Radix Sort for Sparse Linear Algebra on GPU
Fast sorting is an important step in many parallel algorithms that require data ranking, ordering, or partitioning. Parallel sorting is a widely researched subject, and many algorithms have been developed over the years. In this paper, the focus is on implementing highly efficient sorting routines for sparse linear algebra operations, such as parallel sparse matrix […]
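The abstract does not spell out the sorting scheme, so the following is only a generic sketch: a stable least-significant-digit (LSD) radix sort that permutes values together with non-negative integer keys. Sorting COO nonzeros by row index before building CSR is used here as an assumed, illustrative use case. Each pass is a counting sort made of three phases (digit histogram, exclusive prefix sum, stable scatter), which are the phases GPU radix sorts typically parallelize. The function name and `bits_per_pass` parameter are hypothetical.

```python
def radix_sort_pairs(keys, values, bits_per_pass=8):
    """Stable LSD radix sort of non-negative integer keys.

    `values` are permuted together with `keys` (e.g. nonzero values
    travelling with their row indices). Returns new (keys, values) lists.
    """
    keys, values = list(keys), list(values)
    if not keys:
        return keys, values
    radix = 1 << bits_per_pass
    mask = radix - 1
    # Number of digit passes needed to cover the largest key (at least one).
    n_passes = max(1, (max(keys).bit_length() + bits_per_pass - 1) // bits_per_pass)
    for p in range(n_passes):
        shift = p * bits_per_pass
        # 1) Histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2) Exclusive prefix sum: first output slot for each digit value.
        offsets, running = [0] * radix, 0
        for d in range(radix):
            offsets[d] = running
            running += counts[d]
        # 3) Scatter in input order, which keeps equal digits stable.
        out_keys = [0] * len(keys)
        out_values = [None] * len(keys)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_keys[offsets[d]] = k
            out_values[offsets[d]] = v
            offsets[d] += 1
        keys, values = out_keys, out_values
    return keys, values


# Row indices of four COO nonzeros, with their values.
print(radix_sort_pairs([2, 0, 1, 0], ["c", "a", "b", "d"]))
# → ([0, 0, 1, 2], ['a', 'd', 'b', 'c'])
```

Stability is what makes the LSD order correct: each pass preserves the order established by the less significant digits, and it also keeps the original column order of nonzeros within a row.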
Mar, 14
Heterogeneous Acceleration of Volumetric JPEG 2000
We present the implementation of a volumetric JPEG 2000 codec as a real-world use case of software acceleration with GPUs and multi-core CPUs. We present a generic methodology to accelerate existing code written in C with OpenCL. Furthermore, we account for the volumetric nature of the processed data and formulate associated optimization guidelines. The resulting […]
Mar, 14
EmoNets: Multimodal deep learning approaches for emotion recognition in video
The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which […]
Mar, 14
Parallel Statistical Multi-resolution Estimation
We discuss several strategies to implement Dykstra’s projection algorithm on NVIDIA’s compute unified device architecture (CUDA). Dykstra’s algorithm is the central and computationally most expensive step of statistical multi-resolution methods. It projects a given vector onto the intersection of convex sets. Compared with a CPU implementation, our CUDA implementation is one order […]
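The CUDA strategies themselves are not described in the abstract, so here is only a plain sequential sketch of the textbook Dykstra iteration, assuming each convex set comes with its own projection function. Each set keeps a correction vector that is added back before its projection; without these corrections the method reduces to alternating projections, which reach some point of the intersection but not necessarily the nearest one. The half-space sets in the usage lines and all function names are illustrative assumptions.

```python
def project_halfspace(z, a, b):
    """Euclidean projection of z onto the half-space {x : a . x <= b}."""
    excess = sum(ai * zi for ai, zi in zip(a, z)) - b
    if excess <= 0:
        return list(z)  # already inside the set
    norm_sq = sum(ai * ai for ai in a)
    return [zi - excess / norm_sq * ai for zi, ai in zip(z, a)]


def dykstra(y, projections, n_iter=100):
    """Dykstra's algorithm: project y onto the intersection of convex sets.

    projections: list of functions, each projecting onto one convex set.
    """
    x = list(y)
    # One correction vector per set, initially zero.
    increments = [[0.0] * len(y) for _ in projections]
    for _ in range(n_iter):
        for i, project in enumerate(projections):
            z = [xj + pj for xj, pj in zip(x, increments[i])]
            x = project(z)
            # Remember what this projection removed, to add it back next cycle.
            increments[i] = [zj - xj for zj, xj in zip(z, x)]
    return x


sets = [
    lambda z: project_halfspace(z, [0.0, 1.0], 0.0),  # x2 <= 0
    lambda z: project_halfspace(z, [1.0, 1.0], 0.0),  # x1 + x2 <= 0
]
print(dykstra([2.0, 1.0], sets))  # → [0.5, -0.5]
```

For comparison, plain alternating projections from (2, 1) stop at (1, -1) after two steps: that point lies in both half-spaces but is farther from (2, 1) than the true projection (0.5, -0.5).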
Mar, 14
HELIOS-K: An Ultrafast, Open-source Opacity Calculator for Radiative Transfer
We present an ultrafast opacity calculator for application to exoplanetary atmospheres, which we name HELIOS-K. It takes a line list as an input, computes the shape of each spectral line (e.g., a Voigt profile) and provides an option for grouping an enormous number of lines into a manageable number of bins. We implement a combination […]
Mar, 14
Accelerating DEM simulations on GPUs by reducing the impact of warp divergences
We develop a way to accelerate discrete element method (DEM) calculations on GPUs. Taking the GPU architecture into account, we examine how warp divergences arise in contact detection and force calculations, and then present a strategy to reduce the impact of warp divergences on the runtime of the DEM force calculations.
Mar, 14
2nd International Conference on Multimedia and Communication Technologies (ICMCT2015), 2015
2015 2nd International Conference on Multimedia and Communication Technologies (ICMCT2015), September 19-20, 2015, Hong Kong. Organized by the American Society for Research (ASR). http://www.icmct.org/ Submission deadline: 2015-06-05. Topics: Hardware & Software for Multimedia Systems; Enabling Technologies for Multimedia; Multimedia Applications; Consumer Systems and Networks; Speech and Audio Processing; Image and Video Processing; Applied Signal Processing; Communication […]