Posts
Jan, 16
Programming on Parallel Machines: GPU, Multicore, Clusters and More
This open-source textbook on parallel programming is aimed more at the practical end of things, in that: there is very little theoretical content, such as O() analysis, maximum theoretical speedup, acyclic graphs and so on; real code is featured throughout; we use the main parallel platforms (OpenMP, CUDA and MPI) rather than languages that at this stage […]
Jan, 16
FPGA Based Acceleration of Decimal Operations
Field Programmable Gate-Arrays (FPGAs) can efficiently implement application-specific processors in non-conventional number systems, such as the decimal (Binary-Coded Decimal, or BCD) number system required for accounting accuracy in financial applications. The main purpose of this work is to show that applications requiring several decimal (BCD) operations can be accelerated by a processor implemented on […]
Jan, 16
A Modular System Architecture for Online Parallel Vision Pipelines
We present an architecture for real-time, online vision systems which enables development and use of complex vision pipelines integrating any number of algorithms. Individual algorithms are implemented using modular plugins, allowing integration of independently developed algorithms and rapid testing of new vision pipeline configurations. The architecture exploits the parallelization of graphics processing units (GPUs) and […]
Jan, 16
Fast Regularization of Matrix-Valued Images
Regularization of matrix-valued data is of importance in medical imaging, motion analysis and scene understanding. In this report we describe a novel method for efficient regularization of matrix group-valued images. Using the augmented Lagrangian framework we separate the total-variation regularization of matrix-valued images into regularization and projection steps, both of which are fast and […]
Jan, 16
Theano: Deep Learning on GPUs with Python
In this paper, we present Theano, a framework in the Python programming language for defining, optimizing and evaluating expressions involving high-level operations on tensors. Theano offers most of NumPy’s functionality, but adds automatic symbolic differentiation, GPU support, and faster expression evaluation. Theano is a general mathematical tool, but it was developed with the goal of […]
Jan, 16
A Fast Jet Finder Algorithm Using Graphic Processing Unit
A collimated emission of hadrons, usually called a jet, is the experimental counterpart of the partons (quarks and gluons), which are not observed separately. The CMS detector at the LHC is ideally designed to study jet tomography, which is an important probe for investigating the hot and dense medium formed during heavy ion collisions. Although CMS […]
Jan, 16
Data Triage and Visual Analytics for Scientific Visualization
As the speed of computers continues to increase at a very fast rate, the size of data generated from scientific simulations has now reached petabytes ($10^{15}$ bytes) and beyond. Under such circumstances, no existing techniques can be used to perform effective data analysis at full precision. To analyze large-scale data sets, visual analytics […]
Jan, 16
Feature Extraction and Visualization from Higher-Order CFD Data
Computational fluid dynamics (CFD) methods have been employed in the study of subjects such as aeroacoustics, gas dynamics, turbomachinery and viscoelastic fluids, among others. However, the need for accuracy and high performance has resulted in methods whose solutions are becoming increasingly complex. In this context, feature extraction and visualization methods play a key role, making […]
Jan, 16
Parallel AES Encryption Engines for Many-Core Processor Arrays
By exploring different granularities of data-level and task-level parallelism, we map 16 implementations of an Advanced Encryption Standard (AES) encipher with both online and offline key expansion on a fine-grained many-core system. The smallest design utilizes only 6 cores for offline key expansion and 8 cores for online key expansion, while the largest requires 107 […]
Jan, 16
Nested Intervals Tree Encoding with System of Residual Classes
This paper describes one way to represent tree-like structures in databases. The authors propose extending V. Tropashko's nested intervals model: the numbers in the intervals are represented in a system of residual classes. This avoids storing large numbers and is easily implemented with parallel-programming algorithms.
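As a rough illustration of the residual-class idea only (not the paper's encoding), the sketch below stores an integer as its remainders modulo a few small, pairwise-coprime moduli. The moduli and the function names are hypothetical choices made for this example. Addition and multiplication then act on each residue independently, which is what makes the representation easy to parallelize, and the Chinese Remainder Theorem recovers the original value as long as it is smaller than the product of the moduli.

```python
from math import prod

# Hypothetical pairwise-coprime moduli, chosen only for this sketch.
# 251 is prime, 253 = 11 * 23, 255 = 3 * 5 * 17, 256 = 2 ** 8.
MODULI = (251, 253, 255, 256)


def to_residues(n, moduli=MODULI):
    """Represent a non-negative integer by its remainder for each modulus."""
    return tuple(n % m for m in moduli)


def add_residues(a, b, moduli=MODULI):
    """Add two numbers component-wise; each component is independent."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def mul_residues(a, b, moduli=MODULI):
    """Multiply two numbers component-wise; each component is independent."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_residues(residues, moduli=MODULI):
    """Recover the integer via the Chinese Remainder Theorem.

    The result is only correct if the true value is below prod(moduli).
    """
    total = prod(moduli)
    result = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(x, -1, m) is the modular inverse (Python 3.8+).
        result += r * partial * pow(partial, -1, m)
    return result % total


a, b = 123456, 7890
product = mul_residues(to_residues(a), to_residues(b))
total = add_residues(to_residues(a), to_residues(b))
print(from_residues(product) == a * b)
print(from_residues(total) == a + b)
```

Every stored component stays below its modulus (at most 255 here), regardless of how large the represented number is, which is the sense in which large numbers need not be stored directly.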
Jan, 15
Spectral Method Characterization on FPGA and GPU Accelerators
As CPU clock frequencies plateau and the doubling of CPU cores per processor exacerbates the memory wall, hybrid core computing, utilizing CPUs augmented with FPGAs and/or GPUs, holds the promise of addressing high-performance computing demands, particularly with respect to performance, power and productivity. This paper compares the sustained performance of a complex, single precision, floating-point, […]
Jan, 15
CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction application
BACKGROUND: Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or […]