Posts

Mar, 3

Heuristics for the Variable Sized Bin Packing Problem Using a Hybrid P-System and CUDA Architecture

The Variable Sized Bin Packing Problem has a wide range of application areas including packing, scheduling, and manufacturing. Given a list of items and variable sized bin types, the objective is to minimize the total size of the used bins. This problem is known to be NP-hard. In this article, we present two new heuristics […]
Mar, 3

Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks

This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision numerical representations and specifically, their ability to use different representations per layer while maintaining accuracy. This flexibility provides an additional opportunity to improve performance and energy compared to conventional DNN implementations that use a single, uniform representation for all layers throughout the […]
Mar, 2

International Conference on Cloud Computing and Big Data (ICCCBD), 2016

The 2016 International Conference on Cloud Computing and Big Data (ICCCBD 2016) will be held July 5-7, 2016 in Chengdu, China, technically sponsored by the Sichuan Institute of Electronics and the Sichuan Province Computer Federation. Paper Publication: All accepted papers must be written in English and will be published in the conference proceedings by IEEE. The proceedings will […]
Mar, 1

DeepSpark: Spark-Based Deep Learning Supporting Asynchronous Updates and Caffe Compatibility

The increasing complexity of deep neural networks (DNNs) has made it challenging to exploit existing large-scale data process pipelines for handling massive data and parameters involved in DNN training. Distributed computing platforms and GPGPU-based acceleration provide a mainstream solution to this computational challenge. In this paper, we propose DeepSpark, a distributed and parallel deep learning […]
Mar, 1

Alpaka – An Abstraction Library for Parallel Kernel Acceleration

Porting applications to new hardware or programming models is a tedious and error prone process. Every help that eases these burdens is saving developer time that can then be invested into the advancement of the application itself instead of preserving the status-quo on a new platform. The Alpaka library defines and implements an abstract hierarchical […]
Mar, 1

Optimization of Molecular Dynamics Simulation Code and Applications to Biomolecular Systems

The performance of molecular dynamics (MD) software such as GROMACS is limited by the software’s ability to perform force calculations. The largest part of this is for nonbonded interactions, such as those between water molecules and between water molecules and solute. The determination of nonbonded interactions may account for over 90% of the total computation and real […]
Mar, 1

Virtualizing Deep Neural Networks for Memory-Efficient Neural Network Design

The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher’s flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize the […]
Mar, 1

GPU performance analysis of a nodal discontinuous Galerkin method for acoustic and elastic models

Finite element schemes based on discontinuous Galerkin methods possess features amenable to massively parallel computing accelerated with general purpose graphics processing units (GPUs). However, the computational performance of such schemes strongly depends on their implementation. In the past, several implementation strategies have been proposed. They are based exclusively on specialized compute kernels tuned for each […]
Feb, 25

GPU Robot Motion Planning using Semi-Infinite Nonlinear Programming

We propose a many-core GPU implementation of robotic motion planning formulated as a semi-infinite optimization program. Our approach computes the constraints and their gradients in parallel, and feeds the result to a nonlinear optimization solver running on the CPU. To ensure the continuous satisfaction of our constraints, we use polynomial approximations over time intervals. Because […]
Feb, 25

Auto-Vectorizing a Large-scale Production Unstructured-mesh CFD Application

For modern x86 based CPUs with increasingly longer vector lengths, achieving good vectorization has become very important for gaining higher performance. Using very explicit SIMD vector programming techniques has been shown to give near optimal performance, however they are difficult to implement for all classes of applications particularly ones with very irregular memory accesses and […]
Feb, 25

DIANNE: Distributed Artificial Neural Networks for the Internet of Things

Nowadays artificial neural networks are widely used to accurately classify and recognize patterns. An interesting application area is the Internet of Things (IoT), where physical things are connected to the Internet, and generate a huge amount of sensor data that can be used for a myriad of new, pervasive applications. Neural networks’ ability to comprehend […]
Feb, 25

Parallel Approaches to Shortest-Path Problems for Multilevel Heterogeneous Computing

Many graph algorithms have given solution to the problem of finding shortest paths between nodes in a graph. These problems are considered among the fundamental combinatorial optimization problems. They have many applications, such as car/robot navigation systems, traffic simulations, tramp steamer problem, courier-scheduling optimization, Internet route planners, web searching, or exploiting arbitrage opportunities in currency […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
