Posts
Sep, 22
A case for neuromorphic ISAs
The desire to create novel computing systems, paired with recent advances in neuroscientific understanding of the brain, has led researchers to develop neuromorphic architectures that emulate the brain. To date, such models are developed, trained, and deployed on the same substrate. However, excessive co-dependence between the substrate and the algorithm prevents portability, or at the […]
Sep, 22
Acceleration of the speed of tissue characterization algorithm for coronary plaque by employing GPGPU technique
The general-purpose computation technique on the Graphics Processing Unit (GPGPU) has recently come into the limelight. The authors have proposed the multiple k-nearest neighbor (MkNN) classifier for the tissue characterization of coronary plaque, and its characterization performance has been rated highly. The purpose of this paper is to accelerate the MkNN classifier aiming for it […]
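The data-parallel core of such a classifier is the distance computation from a test sample to every reference sample, which is where a GPGPU port gains most of its speed. The CUDA sketch below is a minimal, hypothetical illustration of that step only; the kernel name, array layout, and sizes are assumptions, not the authors' MkNN implementation, and the k smallest distances would then be selected on the host to vote on the tissue class.

```cuda
// Hypothetical sketch: one thread computes the squared Euclidean distance
// from the query sample to one reference sample. Not the authors' MkNN code.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void knnDistances(const float* refs, const float* query,
                             float* dists, int numRefs, int dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numRefs) return;
    float acc = 0.0f;
    for (int d = 0; d < dim; ++d) {
        float diff = refs[i * dim + d] - query[d];
        acc += diff * diff;              // squared Euclidean distance
    }
    dists[i] = acc;                      // k smallest entries are selected afterwards
}

int main() {
    const int numRefs = 1024, dim = 8;
    std::vector<float> hRefs(numRefs * dim, 1.0f), hQuery(dim, 0.5f), hDists(numRefs);
    float *dRefs, *dQuery, *dDists;
    cudaMalloc(&dRefs, hRefs.size() * sizeof(float));
    cudaMalloc(&dQuery, hQuery.size() * sizeof(float));
    cudaMalloc(&dDists, hDists.size() * sizeof(float));
    cudaMemcpy(dRefs, hRefs.data(), hRefs.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dQuery, hQuery.data(), hQuery.size() * sizeof(float), cudaMemcpyHostToDevice);
    knnDistances<<<(numRefs + 255) / 256, 256>>>(dRefs, dQuery, dDists, numRefs, dim);
    cudaMemcpy(hDists.data(), dDists, hDists.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("dist[0] = %f\n", hDists[0]);
    cudaFree(dRefs); cudaFree(dQuery); cudaFree(dDists);
    return 0;
}
```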
Sep, 21
Reconstructing hash reversal based proof of work schemes
Proof of work schemes use client puzzles to manage limited resources on a server and provide resilience to denial-of-service attacks. Attacks utilizing GPUs to inflate computational capacity, known as resource inflation, are a novel and powerful threat that dramatically increases the computational disparity between clients. This disparity renders proof of work schemes based […]
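As a back-of-the-envelope illustration (not taken from the paper): a hash-reversal puzzle that asks the client for an input whose hash matches a d-bit target succeeds on each trial with probability 2^{-d}, so the expected work is exponential in d, and a GPU with an s-fold hash-rate advantage over a baseline CPU solves the same puzzle roughly s times faster:

$$\mathbb{E}[\text{hash evaluations}] = 2^{d}, \qquad t_{\mathrm{GPU}} \approx \frac{2^{d}}{s \, R_{\mathrm{CPU}}},$$

where $R_{\mathrm{CPU}}$ is the client's baseline hash rate and $s$ the GPU speedup factor; a large $s$ is exactly the resource-inflation disparity the abstract describes.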
Sep, 21
Fast analysis of molecular dynamics trajectories with graphics processing units-Radial distribution function histogramming
The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate-limiting step in the calculation of the RDF is building a histogram of the distances between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple […]
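A minimal CUDA sketch of that rate-limiting step is given below, assuming a single frame already resident on the GPU: each thread takes one atom, loops over its remaining partners, and bins the pair distance with an atomic increment. The kernel name, bin width, and the absence of periodic-boundary handling and shared-memory histogram staging are simplifying assumptions, not the paper's implementation.

```cuda
// Hypothetical sketch of pair-distance histogramming for one trajectory frame.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void rdfHistogram(const float3* pos, int numAtoms,
                             unsigned int* hist, int numBins, float binWidth) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numAtoms) return;
    for (int j = i + 1; j < numAtoms; ++j) {
        float dx = pos[j].x - pos[i].x;
        float dy = pos[j].y - pos[i].y;
        float dz = pos[j].z - pos[i].z;
        float r = sqrtf(dx * dx + dy * dy + dz * dz);
        int bin = (int)(r / binWidth);
        if (bin < numBins)
            atomicAdd(&hist[bin], 1u);   // global-memory histogram update
    }
}

int main() {
    const int numAtoms = 4096, numBins = 256;
    const float binWidth = 0.1f;
    std::vector<float3> hPos(numAtoms);
    for (int i = 0; i < numAtoms; ++i)
        hPos[i] = make_float3(i * 0.01f, 0.0f, 0.0f);   // toy coordinates
    float3* dPos; unsigned int* dHist;
    cudaMalloc(&dPos, numAtoms * sizeof(float3));
    cudaMalloc(&dHist, numBins * sizeof(unsigned int));
    cudaMemcpy(dPos, hPos.data(), numAtoms * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemset(dHist, 0, numBins * sizeof(unsigned int));
    rdfHistogram<<<(numAtoms + 255) / 256, 256>>>(dPos, numAtoms, dHist, numBins, binWidth);
    std::vector<unsigned int> hHist(numBins);
    cudaMemcpy(hHist.data(), dHist, numBins * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("pairs in first bin: %u\n", hHist[0]);
    cudaFree(dPos); cudaFree(dHist);
    return 0;
}
```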
Sep, 21
Scalable, High Performance Fourier Domain Optical Coherence Tomography: Why FPGAs and Not GPGPUs
Fourier Domain Optical Coherence Tomography (FD-OCT) is an emerging biomedical imaging technology featuring ultra-high resolution and fast imaging speed. Due to the complexity of the FD-OCT algorithm, real time FD-OCT imaging demands high performance computing platforms. However, the scaling of real-time FD-OCT processing for increasing data acquisition rates and 3-dimensional (3D) imaging is quickly outpacing […]
Sep, 21
Non-deterministic parallelism considered useful
The development of distributed execution engines has greatly simplified parallel programming, by shielding developers from the gory details of programming in a distributed system, and allowing them to focus on writing sequential code [8, 11, 18]. The "sacred cow" in these systems is transparent fault tolerance, which is achieved by dividing the computation into atomic […]
Sep, 21
Parallel graduated assignment algorithm for multiple graph matching based on a common labelling
This paper presents a new parallel algorithm for multiple graph matching based on Graduated Assignment. The aim of developing this parallel algorithm is to perform multiple graph matching on a current desktop computer, but instead of executing the code on the general-purpose processor, we execute parallel code on the graphics processing unit. Our […]
Sep, 21
GPU-based cloud performance for LiDAR data processing
The goal of this paper is to compare the timing/performance results of CPU and GPU, on local and cloud platforms, for processing massive Light Detection and Ranging (LiDAR) topographic data. Locally we have used various multi-core CPU technologies as well as GPU implementations on various NVIDIA graphics cards that support CUDA, whereas a cloud […]
Sep, 21
Multicore performance optimization using partner cores
As the push for parallelism continues to increase the number of cores on a chip, system design has become incredibly complex; optimizing for performance and power efficiency is now nearly impossible for the application programmer. To assist the programmer, a variety of techniques for optimizing performance and power at runtime have been developed, but many […]
Sep, 21
Mint: realizing CUDA performance in 3D stencil methods with annotated C
We present Mint, a programming model that enables the non-expert to enjoy the performance benefits of hand-coded CUDA without becoming entangled in the details. Mint targets stencil methods, which are an important class of scientific applications. We have implemented the Mint programming model with a source-to-source translator that generates optimized CUDA C from traditional […]
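As a rough, hypothetical illustration of the target such a translator aims at (not Mint's actual output, and without its shared-memory and register optimizations), the CUDA below implements a 2D five-point stencil sweep of the kind an annotated C loop nest would be mapped to:

```cuda
// Hypothetical five-point Jacobi stencil kernel; one thread updates one interior point.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void jacobi5pt(const float* in, float* out, int nx, int ny) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // column index
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // row index
    if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;   // skip boundary
    out[j * nx + i] = 0.25f * (in[j * nx + (i - 1)] + in[j * nx + (i + 1)] +
                               in[(j - 1) * nx + i] + in[(j + 1) * nx + i]);
}

int main() {
    const int nx = 512, ny = 512;
    std::vector<float> hIn(nx * ny, 1.0f);
    float *dIn, *dOut;
    cudaMalloc(&dIn, nx * ny * sizeof(float));
    cudaMalloc(&dOut, nx * ny * sizeof(float));
    cudaMemcpy(dIn, hIn.data(), nx * ny * sizeof(float), cudaMemcpyHostToDevice);
    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    jacobi5pt<<<grid, block>>>(dIn, dOut, nx, ny);   // one sweep of the stencil
    cudaDeviceSynchronize();
    printf("stencil sweep done\n");
    cudaFree(dIn); cudaFree(dOut);
    return 0;
}
```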
Sep, 20
Scalable heterogeneous parallelism for atmospheric modeling and simulation
Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Effective use of parallelism in these new chipsets constitutes the challenge facing a new generation of large scale scientific computing applications. This study examines methods for improving the performance of two-dimensional and three-dimensional atmospheric constituent transport simulation on the […]
Sep, 20
Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
MATLAB is an array language, initially popular for rapid prototyping, but now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that affect the program’s execution time. Today’s computer systems have tremendous […]