Posts
Feb, 1
Productive and Efficient Computational Science Through Domain-specific Abstractions
In an ideal world, scientific applications are computationally efficient, maintainable, and composable, allowing scientists to work very productively. We argue that these goals are achievable for a specific application field by choosing suitable domain-specific abstractions that encapsulate domain knowledge with a high degree of expressiveness. This thesis demonstrates the design and composition of domain-specific […]
Feb, 1
Performance Analysis and Optimization of a Distributed Processing Framework for Data Mining Accelerated with Graphics Processing Units
In this age, a huge amount of data is generated every day by human interactions with services. Discovering patterns in these data is very important for making business decisions. Due to the size of this data, processing it requires very intensive computational power. Thus, many frameworks have been developed using Central Processing Units (CPU) […]
Jan, 30
On Vectorization of Deep Convolutional Neural Networks for Vision Tasks
We have recently witnessed many ground-breaking results in machine learning and computer vision, generated by using deep convolutional neural networks (CNN). While the success mainly stems from the large volume of training data and the deep network architectures, vector processing hardware (e.g. GPUs) undisputedly plays a vital role in modern CNN implementations to support […]
Jan, 30
OpenCL Implementation of LiDAR Data Processing
When designing a safety system, the faster the response time, the quicker the system can react to hazards. As commercial interest in autonomous and assisted vehicles grows, the number one concern is safety. If the system cannot react as fast as or faster than an average human, then the public will deem it […]
Jan, 30
Different Optimization Strategies and Performance Evaluation of Reduction on Multicore CUDA Architecture
The objective of this paper is to use different optimization strategies on multicore GPU architecture. Here, for performance evaluation, we have used the parallel reduction algorithm. GPU on-chip shared memory is much faster than local and global memory. Shared memory latency is roughly 100x lower than non-cached global memory (make sure that there are no bank […]
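The shared-memory reduction pattern this excerpt alludes to can be sketched as a CUDA kernel. This is a generic illustration (kernel and variable names are ours, not from the paper) using sequential addressing, so that active threads access consecutive shared-memory words and avoid bank conflicts:

```cuda
// Each block reduces up to blockDim.x elements of `in` into one partial sum.
// Sequential addressing (stride halving) keeps shared-memory accesses
// conflict-free across the threads of a warp.
__global__ void reduce_sum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : 0.0f;   // stage data in fast on-chip memory
    __syncthreads();

    // Tree reduction in shared memory: log2(blockDim.x) steps.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = sdata[0];        // one partial sum per block
}
```

Launched as `reduce_sum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n)`, it leaves one partial sum per block; a second kernel pass or a short host-side loop combines them into the final result.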
Jan, 30
Accelerate micromagnetic simulations with GPU programming in MATLAB
A finite-difference micromagnetic simulation code written in MATLAB is presented with Graphics Processing Unit (GPU) acceleration. The high performance of the GPU is demonstrated compared to a typical Central Processing Unit (CPU) based code. The GPU-over-CPU speed-up is shown to be greater than 30 for larger problem sizes on […]
Jan, 30
Design Space Exploration of OpenCL Applications on Heterogeneous Parallel Platforms
Parallel programming is a skill software engineers can no longer do without, since multi- and many-core architectures have been widely adopted for general-purpose computing platforms. In 2006 Intel introduced the first multi-core processor on the consumer market and, at the same time, NVIDIA unveiled CUDA, a programming paradigm to exploit Graphics Processing Units (GPUs) […]
Jan, 28
maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs
This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures using a single kernel. The design combines ideas from cuda-convnet2 with the Maxas SGEMM assembly code. We only address the forward propagation (FPROP) operation of the network, but […]
Jan, 28
Accelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic
Numerical continuation methods apply predictor-corrector algorithms to track a solution path defined by a family of systems, the so-called homotopy. The systems we consider are defined by polynomials in several variables with complex coefficients. For larger dimensions and degrees, the numerical conditioning worsens and hardware double precision often becomes insufficient to reach the end of […]
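Double double arithmetic of the kind this abstract refers to is typically built from error-free transformations: a double-double value stores an unevaluated sum of two hardware doubles. A minimal sketch of double-double addition in CUDA (function names are ours, not the paper's; this is the common "sloppy" variant, not a specific library's implementation):

```cuda
// Error-free transformation (Knuth's TwoSum): after the call, s + e == a + b
// exactly, with s the rounded sum and e the rounding error.
__host__ __device__ void two_sum(double a, double b, double &s, double &e) {
    s = a + b;
    double bb = s - a;
    e = (a - (s - bb)) + (b - bb);
}

// Add two double-double numbers (ahi + alo) + (bhi + blo), where each pair
// represents a value with roughly twice the precision of a hardware double.
__host__ __device__ void dd_add(double ahi, double alo, double bhi, double blo,
                                double &rhi, double &rlo) {
    double s, e;
    two_sum(ahi, bhi, s, e);   // exact sum of the high parts
    e += alo + blo;            // fold in the low parts
    two_sum(s, e, rhi, rlo);   // renormalize so |rlo| <= ulp(rhi)/2
}
```

Because every operation is a plain double add or subtract, the same code runs on both host and device, which is what makes extended precision practical inside GPU kernels.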
Jan, 28
GPU Programming – Speeding Up the 3D Surface Generator VESTA
The novel "Volume-Enclosing Surface exTraction Algorithm" (VESTA) generates triangular isosurfaces from computed tomography volumetric images and/or three-dimensional (3D) simulation data. Here, we present various benchmarks for GPU-based code implementations of both VESTA and the current state-of-the-art Marching Cubes Algorithm (MCA). One major result of this study is that VESTA runs significantly faster than the MCA.
Jan, 28
On Longest Repeat Queries Using GPU
Repeat finding in strings has important applications in subfields such as computational biology. The challenge of finding the longest repeats covering particular string positions was recently proposed and solved by Ileri et al., using optimal O(n) time and space, where n is the string size. However, their solution can only find […]
Jan, 28
The GPU vs Phi Debate: Risk Analytics Using Many-Core Computing
The risk of reinsurance portfolios covering globally occurring natural catastrophes, such as earthquakes and hurricanes, is quantified by employing simulations. These simulations are computationally intensive and require large amounts of data to be processed. The use of many-core hardware accelerators, such as the Intel Xeon Phi and the NVIDIA Graphics Processing Unit (GPU), is desirable […]