Posts
Nov, 20
Dataflow-driven GPU performance projection for multi-kernel transformations
Applications often have a sequence of parallel operations to be offloaded to graphics processors; each operation can become an individual GPU kernel. Developers typically explore a variety of transformations for each kernel. Furthermore, it is well known that efficient data management is critical in achieving high GPU performance and that "fusing" multiple kernels into one […]
Nov, 20
Accelerating MapReduce on a coupled CPU-GPU architecture
The work presented here is driven by two observations. First, heterogeneous architectures that integrate a CPU and a GPU on the same chip are emerging, and hold much promise for supporting power-efficient and scalable high performance computing. Second, MapReduce has emerged as a suitable framework for simplified parallel application development for many classes of applications, […]
Nov, 20
A scalable, numerically stable, high-performance tridiagonal solver using GPUs
In this paper, we present a scalable, numerically stable, high-performance tridiagonal solver. The solver is based on the SPIKE algorithm for partitioning a large matrix into small independent matrices, which can be solved in parallel. For each small matrix, our solver applies a general 1-by-1 or 2-by-2 diagonal pivoting algorithm, which is also known to […]
Nov, 20
MPC Toolbox with GPU Accelerated Optimization Algorithms
The introduction of Graphical Processing Units (GPUs) in scientific computing has shown great promise in many different fields. While GPUs are capable of very high floating point performance and memory bandwidth, their massively parallel architecture requires algorithms to be reimplemented to suit the different architecture. Interior point methods can be used to solve convex optimization […]
Nov, 20
Krylov Subspace Accelerated Algebraic Multigrid for Mimetic Finite Differences on GPUs
The topic of this thesis is GPU accelerated sparse linear algebra for subsurface reservoir modeling. Numerical techniques for reservoir simulations are described and we present the open source reservoir simulation software toolbox MRST. We discuss some of the challenges related to linear algebra and reservoir simulation. Furthermore, we discuss the possibility of GPU-accelerating the linear […]
Nov, 19
CUDA-enabled Optimisation of Technical Analysis Parameters
The optimisation of Technical Trading parameters is a computationally intensive exercise. Models comprising a modest number of Technical Indicators require many thousands of simulations to be executed over a sample period of data, with the best performing sets of parameters employed to generate future trading signals. The purpose of this research is to investigate the […]
Nov, 19
Modern GPGPU Frameworks and their Application to the Physical Core of the ASUCA Weather Prediction Model
One of today’s biggest challenges in the field of high performance computing is the efficient exploitation of the rapidly increasing parallelism at the socket level, especially when both CPU and GPU resources are to be applied – a challenge becoming very real for the physical processes of ASUCA. ASUCA is the Japan Meteorological Agency’s next-generation weather […]
Nov, 19
Parallel Search of k-Nearest Neighbors with Synchronous Operations
We present a new study of parallel algorithms for locating k-nearest neighbors (kNN) of each single query in a high dimensional (feature) space on a many-core processor or accelerator that favors synchronous operations, such as on a graphics processing unit. Exploiting the intimate relationships between two primitive operations, select and sort, we introduce a cohort […]
Nov, 19
Criticality of the XY model in complex topologies
The critical behavior of the O(2) model on dilute Levy graphs built on a 2D square lattice is analyzed. Different qualitative cases are probed, varying the exponent rho governing the dependence on the distance of the connectivity probability distribution. The mean-field regime, as well as the long-range and short-range non-mean-field regimes are investigated by means […]
Nov, 19
Accelerated molecular dynamics force evaluation on graphics processing units for thermal conductivity calculations
In this paper, we develop a highly efficient molecular dynamics code fully implemented on graphics processing units for thermal conductivity calculations using the Green-Kubo formula. We compare two different schemes for force evaluation, a previously used thread-scheme where a single thread is used for one particle and each thread calculates the total force for the […]
Nov, 18
Auto-tunable GPU BLAS (thesis)
In this paper, we present our implementation of an auto-tuning system, written in C++, which incorporates the use of OpenCL kernels. We deploy this approach on different GPU architectures, evaluating the performance of the approach. Our main focus is to easily generate tuned code that would otherwise require a large amount of empirical testing, […]
Nov, 18
Facial Recognition Using Neural Networks over GPGPU
This article introduces a parallel neural network approach implemented over Graphics Processing Units (GPUs) to solve a facial recognition problem, which consists of deciding where the face of a person in a certain image is pointing. The proposed method uses the parallel capabilities of GPUs in order to train and evaluate a neural network used […]