Posts
Jun, 17
2nd International Conference on Mechanical, Aeronautical and Automotive Engineering (ICMAA), 2015
Topics: Mechanical Engineering, Applied Mechanics, Automation, Biomechanics, Computational Fluid Dynamics, Design and Manufacturing, Energy Management, Fluid Dynamics, Fuels and Combustion, Green Manufacturing, Heat and Mass Transfer, Industrial Tribology, Instrumentation and Control, Internal Combustion Engines, Mechatronics, Micro-Machining, Modeling of Processes, Nano-Technology, Optimization of Systems, Renewable and Non-Renewable Energies, Reverse Engineering, Robotics, Solid Mechanics, Oil and […]
Jun, 17
RUMD: A general purpose molecular dynamics package optimized to utilize GPU hardware down to a few thousand particles
RUMD is a general-purpose, high-performance molecular dynamics (MD) simulation package running on graphics processing units (GPUs). RUMD addresses the challenge of utilizing the many-core nature of modern GPU hardware when simulating small to medium system sizes (roughly from a few thousand up to a hundred thousand particles). It has a performance that is comparable to […]
Jun, 17
Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA
OpenACC is an application programming interface (API) that aims to unleash the power of heterogeneous systems composed of CPUs and accelerators such as graphics processing units (GPUs) or Intel Xeon Phi coprocessors. This directive-based programming model is intended to enable developers to accelerate their application’s execution with much less effort. Coprocessors offer significant computing power […]
Jun, 17
GPU-Enabled Particle-Particle Particle-Tree Scheme for Simulating Dense Stellar Cluster System
We describe the implementation and performance of the P^3T (Particle-Particle Particle-Tree) scheme for simulating dense stellar systems. In P^3T, the force experienced by a particle is split into short-range and long-range contributions. Short-range forces are evaluated by direct summation and integrated with the fourth-order Hermite predictor-corrector method with block timesteps. For long-range forces, […]
Jun, 17
Automatic Data Layout Optimizations for GPUs
Memory optimizations have become increasingly important in order to fully exploit the computational power of modern GPUs. The data arrangement has a big impact on performance, and it is very hard for GPU programmers to identify a well-suited data layout. Classical data layout transformations include grouping together data fields that have similar access patterns, […]
Jun, 17
Layered Interpretation of Street View Images
We propose a layered street view model to encode both depth and semantic information on street view images for autonomous driving. Recently, stixels, stix-mantics, and tiered scene labeling methods have been proposed to model street view images. We propose a 4-layer street view model, a compact representation over the recently proposed stix-mantics model. Our layers […]
Jun, 16
Perfect Hashing Structures for Parallel Similarity Searches
Seed-based heuristics have proved to be efficient for studying similarity between genetic databases with billions of base pairs. This paper focuses on algorithms and data structures for the filtering phase in seed-based heuristics, with an emphasis on efficient parallel GPU/manycore implementation. We propose a 2-stage index structure which is based on neighborhood indexing and perfect […]
Jun, 16
Falcon: A Graph Manipulation Language for Heterogeneous Systems
Graph algorithms are used in several domains such as social networking, biological sciences, computational geometry, and compilers, to name a few. It has been shown that they possess enough parallelism to keep several computing resources busy – even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware […]
Jun, 16
Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs
Sparse matrix-vector multiplication (SpMV) is a widely used kernel in scientific applications as well as data analytics. Many GPU implementations of SpMV have been proposed, using different sparse matrix representations. However, no sparse matrix representation is consistently superior, and the best representation varies for sparse matrices with different sparsity patterns. In this paper we study […]
Jun, 16
GPU Predictor-Corrector Interior Point Method for Large-Scale Linear Programming
This master’s thesis concerns the implementation of a GPU-accelerated version of Mehrotra’s predictor-corrector interior point algorithm for large-scale linear programming (LP). The implementations are tested on LP problems arising in the financial industry, where there is high demand for faster LP solvers. The algorithm was implemented in C++, MATLAB and CUDA, using double precision for […]
Jun, 16
Parallelization of DIRA and CTmod using OpenMP and OpenCL
Parallelization answers the ever-growing demand for computing power by taking advantage of multi-core processor technology and modern many-core graphics compute units. Multi-core CPUs and many-core GPUs have the potential to substantially reduce the execution time of a program, but it is often a challenging task to ensure that all available hardware is […]
Jun, 14
Automatic Selection of Sparse Matrix Representation on GPUs
Sparse matrix-vector multiplication (SpMV) is a core kernel in numerous applications, ranging from physics simulation and large-scale solvers to data analytics. Many GPU implementations of SpMV have been proposed, targeting several sparse representations and aiming at maximizing overall performance. No single sparse matrix representation is uniformly superior, and the best performing representation varies for sparse […]