Posts
Dec, 30
Characterization of OpenCL on a Scalable FPGA Architecture
The recent release of Altera’s SDK for OpenCL has greatly eased the development of FPGA-based systems. Research has shown performance improvements brought by OpenCL using a single FPGA device. However, to meet the objectives of high-performance computing, OpenCL needs to be evaluated using multiple FPGAs. This work has proposed a scalable FPGA architecture for […]
Dec, 30
Extending OmpSs to support CUDA and OpenCL in C, C++ and Fortran Applications
CUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both programming models provide a C-based programming language to write accelerator kernels and a host API used to glue the host and kernel parts. Although this model is a clear improvement over a low-level and ad-hoc programming model for each hardware […]
Dec, 30
A Tool for Automatic Suggestions for Irregular GPU Kernel Optimization
Future computing systems, from handhelds all the way to supercomputers, will be more parallel and more heterogeneous than today’s systems to provide more performance without an increase in power consumption. Therefore, GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular memory access patterns and control flow. The growing complexity, non-uniformity, […]
Dec, 30
Spectral classification using convolutional neural networks
There is a great need for accurate and autonomous spectral classification methods in astrophysics. This thesis is about training a convolutional neural network (ConvNet) to recognize an object class (quasar, star or galaxy) from one-dimensional spectra only. The author developed several scripts and C programs for dataset preparation, preprocessing and post-processing of the data. EBLearn library […]
Dec, 30
How to Correctly Deal With Pseudorandom Numbers in Manycore Environments – Application to GPU programming with Shoverand
Stochastic simulations are often sensitive to the source of randomness that characterizes the statistical quality of their results. Consequently, we need highly reliable Random Number Generators (RNGs) to feed such applications. Recent developments try to shrink the computation time by relying more and more on General Purpose Graphics Processing Units (GP-GPUs) to speed up stochastic simulations. Such […]
Dec, 30
To Use or Not to Use: Graphics Processing Units for Pattern Matching Algorithms
String matching is an important part of today’s computer applications, and the Aho-Corasick algorithm is one of the main string matching algorithms used to accomplish it. This paper discusses when GPUs can be used for string matching applications, using the Aho-Corasick algorithm as a benchmark. We have to identify the best unit to run […]
Dec, 26
Automatic Tuning of Local Memory Use on GPGPUs
The use of local memory is important to improve the performance of OpenCL programs. However, its use may not always benefit performance, depending on various application characteristics, and there is no simple heuristic for deciding when to use it. We develop a machine learning model to decide if the optimization is beneficial or not. We […]
Dec, 26
Accelerating Correlation Power Analysis Using Graphics Processing Units
Correlation Power Analysis (CPA) is a type of power-analysis-based side-channel attack that can be used to derive the secret key of encryption algorithms including DES (Data Encryption Standard) and AES (Advanced Encryption Standard). A typical CPA attack on unprotected AES is performed by analysing a few thousand power traces, which requires about […]
Dec, 26
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
Deep Convolutional Neural Networks (DCNNs) have recently shown state-of-the-art performance in high-level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final […]
Dec, 26
Computationally Efficient Implementation of a Hamming Code Decoder using a Graphics Processing Unit
This paper presents a computationally efficient implementation of a Hamming code decoder on a graphics processing unit (GPU) to support real-time software-defined radio (SDR), which is a software alternative for realizing wireless communication. The Hamming code algorithm is challenging to parallelize effectively on a GPU because it works on sparsely located data items with several […]
Dec, 26
Fast Convolutional Nets With fbfft: A GPU Performance Evaluation
We examine the performance profile of Convolutional Neural Network training on the current generation of NVIDIA Graphics Processing Units. We introduce two new Fast Fourier Transform convolution implementations: one based on NVIDIA’s cuFFT library, and another based on a Facebook-authored FFT implementation, fbfft, that provides significant speedups over cuFFT (over 1.5x) for whole CNNs. […]
Dec, 22
Legion: Programming Distributed Heterogeneous Architectures with Logical Regions
This thesis covers the design and implementation of Legion, a new programming model and runtime system for targeting distributed heterogeneous machine architectures. Legion introduces logical regions as a new abstraction for describing the structure and usage of program data. We describe how logical regions provide a mechanism for applications to express important properties of program […]