Posts
Mar, 1
Optimization of Molecular Dynamics Simulation Code and Applications to Biomolecular Systems
The performance of molecular dynamics (MD) software such as GROMACS is limited by the software’s ability to perform force calculations. The largest part of this is for nonbonded interactions, such as those between water molecules and between water and solute. The determination of nonbonded interactions may account for over 90% of the total computation and real […]
Mar, 1
Virtualizing Deep Neural Networks for Memory-Efficient Neural Network Design
The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher’s flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize the […]
Mar, 1
GPU performance analysis of a nodal discontinuous Galerkin method for acoustic and elastic models
Finite element schemes based on discontinuous Galerkin methods possess features amenable to massively parallel computing accelerated with general purpose graphics processing units (GPUs). However, the computational performance of such schemes strongly depends on their implementation. In the past, several implementation strategies have been proposed. They are based exclusively on specialized compute kernels tuned for each […]
Feb, 25
GPU Robot Motion Planning using Semi-Infinite Nonlinear Programming
We propose a many-core GPU implementation of robotic motion planning formulated as a semi-infinite optimization program. Our approach computes the constraints and their gradients in parallel, and feeds the result to a nonlinear optimization solver running on the CPU. To ensure the continuous satisfaction of our constraints, we use polynomial approximations over time intervals. Because […]
Feb, 25
Auto-Vectorizing a Large-scale Production Unstructured-mesh CFD Application
For modern x86-based CPUs with increasingly longer vector lengths, achieving good vectorization has become very important for gaining higher performance. Using very explicit SIMD vector programming techniques has been shown to give near-optimal performance; however, they are difficult to implement for all classes of applications, particularly ones with very irregular memory accesses and […]
Feb, 25
DIANNE: Distributed Artificial Neural Networks for the Internet of Things
Nowadays artificial neural networks are widely used to accurately classify and recognize patterns. An interesting application area is the Internet of Things (IoT), where physical things are connected to the Internet, and generate a huge amount of sensor data that can be used for a myriad of new, pervasive applications. Neural networks’ ability to comprehend […]
Feb, 25
ANTS2 package: simulation and experimental data processing for Anger camera type detectors
ANTS2 is a simulation and data processing package developed for position-sensitive detectors with Anger camera type readout. The simulation module of ANTS2 is based on the ROOT package from CERN, which is used to store the detector geometry and to perform 3D navigation. The module is capable of simulating particle sources, performing particle tracking, generating […]
Feb, 25
Parallel Approaches to Shortest-Path Problems for Multilevel Heterogeneous Computing
Many graph algorithms provide solutions to the problem of finding shortest paths between nodes in a graph. These problems are considered among the fundamental combinatorial optimization problems. They have many applications, such as car/robot navigation systems, traffic simulations, the tramp steamer problem, courier-scheduling optimization, Internet route planners, web searching, or exploiting arbitrage opportunities in currency […]
Feb, 23
The 3rd Int. Conference on Robotics and Mechatronics (ICROM), 2016
Place: Quality Hotel, Singapore, 201 Balestier Road, Singapore 329926 | Tel: (65) 6355 9988 | Fax: (65) 6255 0998. Keynote speakers: 1. Prof. Hubert Roth, Siegen University, Germany; 2. Prof. Shujiro Dohta, Okayama University of Science, Japan; 3. Prof. Wei-Hsin Liao, Chinese University of Hong Kong, Hong Kong; 4. Prof. Zhang Shanyong, Sam, Nanyang Technological University, Singapore. All […]
Feb, 23
First Int. Workshop on Pattern Recognition (IWPR 2016), 2016
Publication: Submitted and accepted papers will be published by SPIE. Indexing: Scopus, Ei Compendex, ISI, Inspec, Google Scholar. Sponsored by: University of Toyama, Japan; Hosei University, Japan; Kogakuin University, Japan; Teikyo University, Japan; North Carolina Agricultural and Technical State University, USA; Hainan University, China. Keynote speakers: Prof. Chiharu Ishii, Hosei University, Japan; Prof. Genci Capi, University […]
Feb, 23
Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL
OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. The OpenCL specification tightly binds its workflow abstraction, or "command queue", to a specific device for the entire program. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort […]
Feb, 23
VirtCL: a framework for OpenCL device abstraction and management
The interest in using multiple graphics processing units (GPUs) to accelerate applications has increased in recent years. However, the existing heterogeneous programming models (e.g., OpenCL) abstract details of GPU devices at the per-device level and require programmers to explicitly schedule their kernel tasks on a system equipped with multiple GPU devices. Unfortunately, multiple applications running […]