Posts
Aug, 8
CUDA-Accelerated HD-ODETLAP: Lossy High Dimensional Gridded Data Compression
We present High-dimensional Overdetermined Laplacian Partial Differential Equations (HD-ODETLAP), a high dimensional lossy compression algorithm and CUDA implementation that exploits data correlations across multiple dimensions of gridded GIS data. Exploiting the GPU gives a considerable speedup. In addition, HD-ODETLAP compresses much better than JPEG2000 and 3D-SPIHT, when fixing either the average or the maximum error.
Aug, 8
Policy-based Tuning for Performance Portability and Library Co-optimization
Although modular programming is a fundamental software development practice, software reuse within contemporary GPU kernels is uncommon. For GPU software assets to be reusable across problem instances, they must be inherently flexible and tunable. To illustrate, we survey the performance-portability landscape for a suite of common GPU primitives, evaluating thousands of reasonable program variants across […]
Aug, 8
Large Scale Finite Element Analysis Using GPU Parallel Computing
In the past years, graphics processing units have become a new abundant parallel computing resource on personal computers. In this work parallel computation of a typical case in finite element analysis for solids has been practiced. The solution of 3-D linear elastic static problems with 3 degrees of freedom is fully implemented utilizing the current GPU technology. Discretization of […]
Aug, 8
Using GPU-based Computing To Accelerate Finite Element Problems
Historically, Graphics Processing Units (GPUs) have been used for offloading graphical visualization and were made popular by video games, but with the development of NVIDIA’s CUDA architecture and programming language there has been an increase in the use of GPUs in general-purpose (GPGPU) programming. Problems involving large systems of linear equations, such as […]
Aug, 7
Efficient Algorithms for Sorting on GPUs
Sorting is an important problem in computing that has a rich history of investigation by various researchers. In this thesis we focus on this vital problem. In particular, we develop a novel algorithm for sorting on Graphics Processing Units (GPUs). GPUs are multicore architectures that offer the potential of affordable parallelism. We present an efficient […]
Aug, 7
Efficient Monte Carlo sampler for detecting parametric objects in large scenes
Point processes have demonstrated efficiency and competitiveness when addressing object recognition problems in vision. However, simulating these mathematical models is a difficult task, especially on large scenes. Existing samplers suffer from average performances in terms of computation time and stability. We propose a new sampling procedure based on a Monte Carlo formalism. Our algorithm exploits […]
Aug, 7
Landau Gauge Fixing on GPUs and String Tension
We explore the performance of CUDA in performing Landau gauge fixing in Lattice SU(3), using the steepest descent method with Fourier acceleration. The code performance was tested on a Tesla C2070, Fermi architecture. We also present a study of the string tension at finite temperature in the confined phase. The string tension is extracted from […]
Aug, 7
CuBA – a CUDA implementation of BAMPS
Using CUDA as the programming language, we create a code named CuBA which is based on the CPU code "Boltzmann Approach for Many Parton Scattering (BAMPS)" developed in Frankfurt in order to study a system of many colliding particles resulting from heavy ion collisions. Furthermore, we benchmark our code with the Riemann Problem and compare the […]
Aug, 7
Swarm-NG: a CUDA Library for Parallel n-body Integrations with focus on Simulations of Planetary Systems
We present Swarm-NG, a C++ library for the efficient direct integration of many n-body systems using highly parallel Graphics Processing Units (GPUs), such as NVIDIA’s Tesla T10 and M2070 GPUs. While previous studies have demonstrated the benefit of GPUs for n-body simulations with thousands to millions of bodies, Swarm-NG focuses on many few-body systems, e.g., thousands […]
Aug, 6
Accelerating Cryptographic Primitives with GPUs
In this paper, we review the current state-of-the-art in accelerating cryptographic and other computer-security-related primitives using graphics processing units and provide a critical analysis of the appropriateness of graphics accelerators to this task. General-purpose programming of graphics processing units (GPGPUs) has garnered much attention recently in the high-performance computing community, as it offers orders-of-magnitude performance […]
Aug, 6
Coordinated system level resource management for heterogeneous many-core platforms
A challenge posed by future computer architectures is the efficient exploitation of their many and sometimes heterogeneous computational cores. This challenge is exacerbated by the multiple facilities for data movement and sharing across cores resident on such platforms. To answer the question of how systems software should treat heterogeneous resources, this dissertation describes an approach […]
Aug, 6
Comparison of OpenMP & OpenCL Parallel Processing Technologies
This paper presents a comparison of OpenMP and OpenCL based on the parallel implementation of algorithms from various fields of computer applications. The focus of our study is on the performance of benchmarks comparing OpenMP and OpenCL. We observed that the OpenCL programming model is a good option for mapping threads on different processing cores. Balancing […]