Posts
Sep, 18
Sparse Matrix Algorithms Using GPGPU
The purpose of this thesis was to benchmark and compare different representations of sparse matrices and algorithms for multiplying them with a vector, and to measure the performance differences of running the algorithms on a CPU and on GPU(s). Four different storage formats were tested: full matrix storage, coordinate storage (COO), ELLPACK (ELL), compressed sparse […]
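As a rough illustration of the coordinate (COO) format named in this excerpt, here is a minimal serial sketch of a sparse matrix-vector product. It is not the thesis code; the function name `coo_spmv` and the small example matrix are assumptions made for illustration only.

```python
def coo_spmv(rows, cols, vals, x, n_rows):
    """Multiply a COO-format sparse matrix by a dense vector x.

    rows[k], cols[k], vals[k] describe the k-th stored nonzero entry.
    (Hypothetical helper, not taken from the thesis.)
    """
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        # Each stored entry contributes v * x[c] to output row r.
        y[r] += v * x[c]
    return y


# Example 3x3 matrix (illustrative data):
# [[4, 0, 0],
#  [0, 0, 2],
#  [1, 0, 3]]
rows = [0, 1, 2, 2]
cols = [0, 2, 0, 2]
vals = [4.0, 2.0, 1.0, 3.0]
x = [1.0, 2.0, 3.0]

y = coo_spmv(rows, cols, vals, x, 3)
print(y)  # [4.0, 6.0, 10.0]
```

On a GPU, the loop over stored entries is the natural candidate for parallelization, although concurrent updates to the same output row then need atomic additions or a reduction step.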
Sep, 18
Acceleration of recovery simulation on big model using GPU
Software that calculates different scenarios of field development plays an important role in the petroleum industry. Increasing the number of cells in the simulation grid significantly slows down the calculations. To obtain accurate results, it is necessary either to spend a lot of time on the simulations (days or weeks) or to use expensive high-performance systems or supercomputers. […]
Sep, 18
A GPU-based Parallel Procedure for Nonlinear Analysis of Complex Structures Using a Coupled FEM/DEM Approach
This study reports the GPU parallelization of complex three-dimensional software for nonlinear analysis of concrete structures. It focuses on coupled thermo-mechanical analysis of complex structures. A coupled FEM/DEM approach (CDEM) is given from a fundamental theoretical viewpoint. As the modeling of a large structure by means of FEM/DEM may lead to prohibitive computation times, a […]
Sep, 18
A distributed computing approach to improve the performance of the Parallel Ocean Program (v2.1)
The Parallel Ocean Program (POP) is used in many strongly eddying ocean circulation simulations. Ideally one would like to do thousand-year-long simulations, but the current performance of POP prohibits this type of simulation. In this work, using a new distributed computing approach, two innovations to improve the performance of POP are presented. The first […]
Sep, 17
Parallel Motion Estimation Implementation for Different Block Matching Algorithms onto GPGPU
This work presents an efficient method to map Motion Estimation (ME) algorithms onto General Purpose Graphics Processing Unit (GPGPU) architectures using the CUDA programming model. Our method jointly exploits the massive parallelism available in current GPGPU devices and the parallelization potential of ME algorithms: Full Search (FS) and Diamond Search (DS). Our main goal is to […]
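For readers unfamiliar with Full Search block matching, the following is a minimal serial sketch, assuming a sum-of-absolute-differences (SAD) cost and frames stored as lists of lists. It is not the paper's CUDA mapping; the names `sad`, `full_search`, and `make_frame`, and the test frames, are hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, y, x, size, radius):
    """Test every displacement within +/- radius and keep the cheapest one."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            # Skip candidate blocks that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > h or rx + size > w:
                continue
            cost = sad(cur, ref, y, x, ry, rx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost


def make_frame(top, left):
    """Build a 5x5 zero frame with a distinctive 2x2 patch (illustrative data)."""
    frame = [[0] * 5 for _ in range(5)]
    patch = [[9, 8], [7, 6]]
    for i in range(2):
        for j in range(2):
            frame[top + i][left + j] = patch[i][j]
    return frame


ref = make_frame(1, 2)  # patch at row 1, column 2 in the reference frame
cur = make_frame(2, 2)  # same patch moved down by one row in the current frame

mv, cost = full_search(cur, ref, y=2, x=2, size=2, radius=2)
print(mv, cost)  # (-1, 0) 0
```

Each candidate displacement is evaluated independently, which is what makes Full Search a natural fit for massively parallel hardware; Diamond Search instead visits a much smaller, data-dependent set of candidates.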
Sep, 17
Kokkos: Enabling performance portability across manycore architectures
The manycore revolution in computational hardware can be characterized by increasing thread counts, decreasing memory per thread, and architecture-specific performance constraints for memory access patterns. High performance computing (HPC) on emerging manycore architectures requires codes to exploit every opportunity for thread-level parallelism and satisfy conflicting performance constraints. We developed the Kokkos C++ library to […]
Sep, 17
High Throughput Low Latency LDPC Decoding on GPU for SDR Systems
In this paper, we present a high throughput and low latency LDPC (low-density parity-check) decoder implementation on GPUs (graphics processing units). The existing GPU-based LDPC decoder implementations suffer from low throughput and long latency, which prevent them from being used in practical SDR (software-defined radio) systems. To overcome this problem, we present optimization techniques for […]
Sep, 17
Parallel Computing Methods For Particle Accelerator Design
We present methods for parallelizing the transport map construction for multi-core processors and for Graphics Processing Units (GPUs). We provide an efficient implementation of the transport map construction. We describe a method for multi-core processors using the OpenMP framework, which brings a performance improvement over the serial version of the map construction. We developed a novel […]
Sep, 17
A biomolecular electrostatics solver using Python, GPUs and boundary elements that can handle solvent-filled cavities and Stern layers
The continuum theory applied to biomolecular electrostatics leads to an implicit-solvent model governed by the Poisson-Boltzmann equation. Solvers relying on a boundary integral representation typically do not consider features like solvent-filled cavities or ion-exclusion (Stern) layers, due to the added difficulty of treating multiple boundary surfaces. This has hindered meaningful comparisons with volume-based methods, and […]
Sep, 16
Optimal Configuration of GPU Cache Memory to Maximize the Performance
GPU devices offer great performance when dealing with algorithms that require intense computational resources. A developer can configure the L1 cache memory of the latest Kepler GPU architecture with different cache sizes and cache set associativity per Streaming Multiprocessor (SM). The performance of computation-intensive algorithms can be affected by these cache parameters. In […]
Sep, 16
Run-time Image and Video Resizing Using CUDA-enabled GPUs
A recently proposed approach, called seam carving, has been widely used for content-aware resizing of images and videos with little to no perceptible distortion. Unfortunately, for high-resolution videos and large images it is not computationally feasible to do the resizing in real time using small-scale CPU systems. In this paper, we exploit the highly parallel computational capabilities […]
Sep, 16
On the Performance and Energy-efficiency of Multi-core SIMD CPUs and CUDA-enabled GPUs
This paper explores the performance and energy efficiency of CUDA-enabled GPUs and multi-core SIMD CPUs using a set of kernels and full applications. Our implementations efficiently exploit both SIMD and thread-level parallelism on multi-core CPUs and the computational capabilities of CUDA-enabled GPUs. We discuss general optimization techniques for our CPU-only and CPU-GPU platforms. To fairly […]