Posts
Oct, 8
GPU Concurrency Choices in Graph Analytics
Graph analytics is becoming ever more ubiquitous in today’s world. However, situational dynamic changes in input graphs, such as changes in traffic and weather patterns, lead to variations in concurrency. Moreover, graph algorithms are known to have data dependent loops and fine-grain synchronization that makes them hard to scale on parallel machines. Recent trends in […]
Oct, 8
BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images
In cryo-electron microscopy (EM), molecular structures are determined from large numbers of projection images of individual particles. To harness the full power of this single-molecule information, we use the Bayesian inference of EM (BioEM) formalism. By ranking structural models using posterior probabilities calculated for individual images, BioEM in principle addresses the challenge of working with […]
Oct, 8
Rinnegan: Efficient Resource Use in Heterogeneous Architectures
Current processors provide a variety of different processing units to improve performance and power efficiency. For example, ARM’s big.LITTLE, AMD’s APUs, and Oracle’s M7 provide heterogeneous processors, on-die GPUs, and on-die accelerators. However, the performance experienced by programs using these processing units can vary widely due to contention from multiprogramming, thermal constraints and other issues. […]
Oct, 8
A Runtime Controller for OpenCL Applications on Heterogeneous System Architectures
Heterogeneous architectures nowadays are becoming very attractive in the embedded and mobile markets thanks to the possibility to exploit the best computational resource to optimize the performance per Watt figure of merit. Unfortunately, deciding the right resource to use and its operating frequency is a difficult problem that depends on the actual conditions in which […]
Oct, 4
Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have gained significant traction in the field of machine learning, particularly due to their high accuracy in visual recognition. Recent works have pushed the performance of GPU implementations of CNNs to significantly improve their classification and training times. With these improvements, many frameworks have become available for implementing CNNs on both […]
Oct, 4
Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU
Sparse matrix-vector multiplication (SpMV) is an important operation in computational science, and needs be accelerated because it often represents the dominant cost in many widely-used iterative methods and eigenvalue problems. We achieve this objective by proposing a novel SpMV algorithm based on the compressed sparse row (CSR) on the GPU. Our method dynamically assigns different […]
Oct, 4
APL on GPUs: A TAIL from the Past, Scribbled in Futhark
This paper demonstrates translation schemes by which programs written in a functional subset of APL can be compiled to code that is run efficiently on general purpose graphical processing units (GPGPUs). Furthermore, the generated programs can be straightforwardly interoperated with mainstream programming environments, such as Python, for example for purposes of visualization and user interaction. […]
Oct, 4
Explicit Fourth-Order Runge-Kutta Method on Intel Xeon Phi Coprocessor
This paper concerns an Intel Xeon Phi implementation of the explicit fourth-order Runge-Kutta method (RK4) for very sparse matrices with very short rows. Such matrices arise during Markovian modeling of computer and telecommunication networks. In this work an implementation based on Intel Math Kernel Library (Intel MKL) routines and the authors’ own implementation, both using […]
Oct, 4
Training a Feedback Loop for Hand Pose Estimation
We propose an entirely data-driven approach to estimating the 3D pose of a hand given a depth image. We show that we can correct the mistakes made by a Convolutional Neural Network trained to predict an estimate of the 3D pose by using a feedback loop. The components of this feedback loop are also Deep […]
Sep, 30
GPU-based timetable generation
Throughout an academic year, educational institutions need to generate hundreds of different timetables, this complex task demands a considerable amount of time and human resources.In the past, timetable generation was handmade, in current days as this task complexity increases, it is performed by specialized software which allows to reduce time and costs.Since nearly 10 years […]
Sep, 30
Programming Models and Tools for Many-Core Platforms
The negotiation between power consumption, performance, programmability, and portability drives all computing industry designs, in particular the mobile and embedded systems domains. Two design paradigms have proven particularly promising in this context: architectural heterogeneity and many-core processors. Parallel programming models are key to effectively harness the computational power of heterogeneous many-core SoC. This thesis presents […]
Sep, 30
Distributed Training of Deep Neuronal Networks: Theoretical and Practical Limits of Parallel Scalability
This paper presents a theoretical analysis and practical evaluation of the main bottlenecks towards a scalable distributed solution for the training of Deep Neuronal Networks (DNNs). The presented results show, that the current state of the art approach, using data-parallelized Stochastic Gradient Descent (SGD), is quickly turning into a vastly communication bound problem. In addition, […]