Posts
Sep, 3
A Real-time Coherent Dedispersion Pipeline for the Giant Metrewave Radio Telescope
A fully real-time coherent dedispersion system has been developed for the pulsar back-end at the Giant Metrewave Radio Telescope (GMRT). The dedispersion pipeline uses the single phased array voltage beam produced by the existing GMRT software back-end (GSB) to produce coherently dedispersed intensity output in real time, for the currently operational bandwidths of 16 MHz […]
Sep, 3
A Comparison of High-Level Design Tools for SoC-FPGA on Disparity Map Calculation Example
Modern SoC-FPGAs, which combine an FPGA with embedded ARM cores, are becoming popular as embedded vision system platforms. However, the design of SoC-FPGA applications still follows the traditional workflow of separate hardware and software development, which is a barrier to rapid product design and iteration on SoC-FPGAs. High-Level Synthesis (HLS) and OpenCL-based system-level design approaches provide programmers the […]
Sep, 3
OpenCL 2.0 for FPGAs using OCLAcc
Designing hardware is a time-consuming and complex process. The realization of both embedded and high-performance applications can benefit from a design process at a higher level of abstraction. This helps to reduce development time and allows the hardware design to be tested and optimized iteratively during development, as is common in software development. We present our tool, OCLAcc, […]
Sep, 3
Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU
Memory performance is critical to achieving high performance on Nvidia CUDA GPUs. Previous work has proposed specific optimizations such as thread coarsening, caching data in shared memory, and global data layout transformation. We argue that vectorization based on hyper-loop parallelism can be used as a unified technique to optimize the memory […]
Aug, 31
Partitioning Large Scale Deep Belief Networks Using Dropout
Deep learning methods have shown great promise in many practical applications, ranging from speech recognition and visual object recognition to text processing. However, most current deep learning methods suffer from scalability problems in large-scale applications, forcing researchers or users to focus on small-scale problems with fewer parameters. In this paper, we consider a well-known […]
Aug, 31
A parallel algorithm for implicit depletant simulations
We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded […]
Aug, 31
An Asynchronous Event Communication Technique for Soft Real-Time GPGPU Applications
CONTEXT. Interactive GPGPU applications require low-response-time feedback from events such as user input in order to provide a positive user experience. Communication of these events must be performed asynchronously so as not to cause significant performance penalties. OBJECTIVES. In this study, the use of CPU/GPU shared virtual memory to perform asynchronous communication is explored. […]
Aug, 31
A GPU-accelerated local search algorithm for the Correlation Clustering problem
The solution of the Correlation Clustering (CC) problem can be used as a criterion to measure the amount of balance in signed social networks, where positive (friendly) and negative (antagonistic) interactions take place. Metaheuristics have been used successfully for solving not only this problem but also other hard combinatorial optimization problems, since they can […]
Aug, 31
Dynamic Memory Allocation for OpenCL
Heterogeneous systems are computer systems that exploit multiple devices with different processor architectures to improve the computing efficiency by offloading workloads to the device that fits them best. OpenCL is a framework for building portable applications that run across different devices in heterogeneous systems. It has gained traction as a powerful tool for high-performance computing. […]
Aug, 28
Boosting Java Performance using GPGPUs
Heterogeneous programming is becoming the norm for achieving better performance by running portions of code on the most appropriate hardware resource. Currently, significant engineering effort is being invested in enabling existing programming languages to perform heterogeneous execution, mainly on GPUs. In this paper we describe Jacc, an experimental framework which allows […]
Aug, 28
VisPy: Harnessing The GPU For Fast, High-Level Visualization
The growing availability of large, multidimensional data sets has created demand for high-performance, interactive visualization tools. VisPy leverages the GPU to provide fast, interactive, and beautiful visualizations in a high-level API. Here we introduce the main features, architecture, and techniques used in VisPy.
Aug, 28
High-Speed Object Detection: Design, Study and Implementation of a Detection Framework using Channel Features and Boosting
In this thesis we design, implement and study a high-speed object detection framework. Our baseline detector uses integral channel features as the object representation and AdaBoost as the supervised learning algorithm. We suggest the implementation of two approximation techniques for speeding up the baseline detector and show their effectiveness by performing experiments on both detection quality and […]