Posts
Jun, 19
Autotuning Tensor Contraction Computations on GPUs
We describe a framework for generating optimized GPU code for computing tensor contractions, a multidimensional generalization of matrix-matrix multiplication that arises frequently in computational science applications. Typical performance optimization strategies for such computations transform the tensors into sequences of matrix-matrix multiplications to take advantage of an optimized BLAS library, but this approach is not appropriate […]
Jun, 19
Study of Sparse-Matrix Vector Multiplication (SpMV) on Different Architectures and Libraries
With the advent of parallel processing architectures and a steep increase in the parallelism found in recent applications, GPGPUs have gained attention for their importance in executing these applications. In this document, we specifically analyze Sparse-Matrix Vector Multiplication (SpMV) across different architectures, libraries and matrix formats. The experimental platforms include but are […]
Jun, 19
Bulk GCD Computation Using a GPU to Break Weak RSA Keys
RSA is one of the most well-known public-key cryptosystems and is widely used for secure data transfer. An RSA encryption key includes a modulus n which is the product of two large prime numbers p and q. If an RSA modulus n can be decomposed into p and q, the corresponding decryption key can be computed easily from […]
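As a rough illustration of why weak keys matter, the sketch below shows the textbook observation behind GCD-based attacks: if two moduli happen to share a prime, their greatest common divisor reveals it. This is a minimal CPU sketch with toy primes, not the GPU method of the post; the function names `shared_prime_factors` and `private_exponent`, the pairwise loop, and the public exponent 7 are all assumptions made for the example.

```python
from math import gcd


def shared_prime_factors(moduli):
    """Map the index of each modulus that shares a prime with another one
    to its recovered factor pair (p, q). Hypothetical helper, pairwise O(m^2)."""
    factored = {}
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            g = gcd(moduli[i], moduli[j])
            # g == 1: no common prime; g equal to a modulus: duplicate key, skip.
            if g != 1 and g != moduli[i] and g != moduli[j]:
                factored[i] = (g, moduli[i] // g)
                factored[j] = (g, moduli[j] // g)
    return factored


def private_exponent(e, p, q):
    """Textbook RSA: d is the inverse of e modulo (p - 1)(q - 1).
    Uses three-argument pow with exponent -1 (Python 3.8+)."""
    return pow(e, -1, (p - 1) * (q - 1))


# Toy moduli (assumed values): the first two share the prime 101.
moduli = [101 * 103, 101 * 107, 109 * 113]
factors = shared_prime_factors(moduli)

# Recover the private exponent of the first key for an assumed e = 7.
p, q = factors[0]
d = private_exponent(7, p, q)
```

The pairwise loop performs one GCD per pair of moduli, so the work grows quadratically with the number of keys collected, which suggests why computing these GCDs in bulk is worth accelerating.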
Jun, 19
Parallel BTF Compression with Multi-Level Vector Quantization in OpenCL
The Bidirectional Texture Function (BTF), an effective representation of surface appearance with high visual fidelity, is becoming more and more widely used. In this paper we report on contributions to BTF data compression for multi-level vector quantization. We describe novel decompositions that improve the compression ratio by 15% in comparison with the original method, without loss of […]
Jun, 19
Accelerated dimension-independent adaptive Metropolis
This work considers black-box Bayesian inference over high-dimensional parameter spaces. The well-known adaptive Metropolis (AM) algorithm of Haario et al. (2001) is extended herein to scale asymptotically uniformly with respect to the underlying parameter dimension for Gaussian targets, by respecting the variance of the target. The resulting algorithm, referred to as the dimension-independent adaptive Metropolis (DIAM) […]
Jun, 19
Visualization of OpenCL Application Execution on CPU-GPU Systems
Evaluating the performance of parallel and heterogeneous programs and architectures can be challenging. An emulator or simulator can be used to aid the programmer. To provide guidance and feedback to the programmer, the simulator needs to present traces, reports, and debugging information in a coherent and unambiguous format. Although these outputs contain a lot of […]
Jun, 17
2nd International Conference on Mechanical, Aeronautical and Automotive Engineering (ICMAA), 2015
Topics: Mechanical Engineering, Applied Mechanics, Automation, Biomechanics, Computational Fluid Dynamics, Design and Manufacturing, Energy Management, Fluid Dynamics, Fuels and Combustion, Green Manufacturing, Heat and Mass Transfer, Industrial Tribology, Instrumentation and Control, Internal Combustion Engines, Mechatronics, Micro-Machining, Modeling of Processes, Nano-Technology, Optimization of Systems, Renewable and Non-Renewable Energies, Reverse Engineering, Robotics, Solid Mechanics, Oil and […]
Jun, 17
RUMD: A general purpose molecular dynamics package optimized to utilize GPU hardware down to a few thousand particles
RUMD is a general-purpose, high-performance molecular dynamics (MD) simulation package running on graphics processing units (GPUs). RUMD addresses the challenge of utilizing the many-core nature of modern GPU hardware when simulating small to medium system sizes (roughly from a few thousand up to a hundred thousand particles). It has a performance that is comparable to […]
Jun, 17
Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA
OpenACC is an application programming interface (API) that aims to unleash the power of heterogeneous systems composed of CPUs and accelerators such as graphics processing units (GPUs) or Intel Xeon Phi coprocessors. This directive-based programming model is intended to enable developers to accelerate their application’s execution with much less effort. Coprocessors offer significant computing power […]
Jun, 17
GPU-Enabled Particle-Particle Particle-Tree Scheme for Simulating Dense Stellar Cluster System
We describe the implementation and performance of the P^3T (Particle-Particle Particle-Tree) scheme for simulating dense stellar systems. In P^3T, the force experienced by a particle is split into short-range and long-range contributions. Short-range forces are evaluated by direct summation and integrated with the fourth-order Hermite predictor-corrector method with block timesteps. For long-range forces, […]
Jun, 17
Automatic Data Layout Optimizations for GPUs
Memory optimizations have become increasingly important in order to fully exploit the computational power of modern GPUs. The data arrangement has a big impact on performance, and it is very hard for GPU programmers to identify a well-suited data layout. Classical data layout transformations include grouping together data fields that have similar access patterns, […]
Jun, 17
Layered Interpretation of Street View Images
We propose a layered street view model to encode both depth and semantic information on street view images for autonomous driving. Recently, stixels, stix-mantics, and tiered scene labeling methods have been proposed to model street view images. We propose a 4-layer street view model, a compact representation over the recently proposed stix-mantics model. Our layers […]