Posts
Aug, 18
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising
Discriminative model learning for image denoising has recently been attracting considerable attention due to its favorable denoising performance. In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs) to bring recent progress in very deep architectures, learning algorithms, and regularization methods into image denoising. Specifically, residual […]
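The abstract breaks off at "residual", but the title makes the core idea explicit. Here is a minimal sketch of residual learning as the DnCNN formulation is commonly summarized, assuming additive noise: the network $\mathcal{R}$ with parameters $\Theta$ is trained to predict the noise $v$ rather than the clean image $x$, and the clean estimate is recovered by subtraction. The symbols are introduced here for illustration, with $\{(y_i, x_i)\}_{i=1}^{N}$ denoting noisy/clean training pairs.

```latex
\begin{aligned}
y &= x + v, \qquad \mathcal{R}(y;\Theta) \approx v, \qquad \hat{x} = y - \mathcal{R}(y;\Theta),\\
\ell(\Theta) &= \frac{1}{2N} \sum_{i=1}^{N} \bigl\lVert \mathcal{R}(y_i;\Theta) - (y_i - x_i) \bigr\rVert_F^{2}.
\end{aligned}
```

The regression target $y_i - x_i$ is exactly the noise in the $i$-th training pair, so the loss measures how well the network isolates the noise.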
Aug, 16
Automatic Generation of OpenCL Code for ARM Architectures
The efficient exploitation of the increasing computational capabilities of mobile devices is still a challenge. Because Systems on Chip (SoCs) are heterogeneous, harnessing their full potential requires very specific knowledge of their hardware. OpenCL is a well-known standard for cross-platform use of accelerator devices. We follow an annotation-based approach […]
Aug, 16
OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture
There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming of architectures such as the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC […]
Aug, 16
Convolutional Neural Networks for Large-Scale Bird Song Classification in Noisy Environment
This paper describes a deep learning approach based on convolutional neural networks for bird song classification, which was used in BirdCLEF 2016, a bird identification challenge based on audio recordings. The training and test sets contained about 24k and 8.5k recordings, respectively, belonging to 999 bird species. The recorded waveforms were very diverse in terms of length and […]
Aug, 16
Learning Structured Sparsity in Deep Neural Networks
High demand for computation resources severely hinders the deployment of large-scale Deep Neural Networks (DNNs) on resource-constrained devices. In this work, we propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to […]
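The abstract is truncated before the regularizer is named. SSL, as described by Wen et al., uses a group Lasso penalty: the weights are partitioned into structural groups, and the sum of the groups' (un-squared) L2 norms is penalized. Below is a minimal pure-Python sketch of the filter-wise and channel-wise terms only, assuming a 4-D convolution weight tensor stored as nested lists. The function names and the `lambda_n`/`lambda_c` weights are hypothetical, and the shape-wise and depth-wise groups are omitted.

```python
import math

# A conv-layer weight tensor stored as nested lists, indexed W[n][c][h][w]:
# n = output filter, c = input channel, (h, w) = kernel position.


def filter_group_norms(W):
    """L2 norm of each filter group W[n, :, :, :]."""
    return [
        math.sqrt(sum(v * v for channel in filt for row in channel for v in row))
        for filt in W
    ]


def channel_group_norms(W):
    """L2 norm of each channel group W[:, c, :, :]."""
    num_channels = len(W[0])
    return [
        math.sqrt(sum(v * v for filt in W for row in filt[c] for v in row))
        for c in range(num_channels)
    ]


def group_lasso_penalty(W, lambda_n, lambda_c):
    """Weighted sum of per-group L2 norms (not squared norms).

    Penalizing the un-squared norm of each group pushes the weights of a
    group toward zero together, so whole filters or channels can be pruned.
    """
    return lambda_n * sum(filter_group_norms(W)) + lambda_c * sum(channel_group_norms(W))
```

For example, with two 1x1 filters over two channels, `W = [[[[3.0]], [[4.0]]], [[[0.0]], [[0.0]]]]`, the filter norms are `[5.0, 0.0]` and the channel norms are `[3.0, 4.0]`. The all-zero second filter adds nothing to the penalty and could be removed from the layer.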
Aug, 16
Near Memory Similarity Search on Automata Processors
Embedded devices and multimedia applications today generate unprecedented volumes of data, which must be indexed and made searchable. As a result, similarity search has become a critical idiom for many modern data-intensive applications in natural language processing (NLP), vision, and robotics. At its core, similarity search is implemented using k-nearest neighbors (kNN) where computation […]
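To make the kNN step concrete, here is a minimal brute-force sketch over binary codes. The abstract does not state which distance metric is used, so Hamming distance is an assumption made for illustration, and the function names are hypothetical. Every dataset item is compared with the query, and this exhaustive compare step is what dominates the cost of brute-force kNN.

```python
import heapq


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def knn(query: int, dataset: list[int], k: int) -> list[int]:
    """Return the indices of the k codes closest to `query` in Hamming distance.

    Brute force: the query is compared with every item in the dataset.
    Ties on distance are broken by the smaller index.
    """
    return heapq.nsmallest(
        k, range(len(dataset)), key=lambda i: (hamming(query, dataset[i]), i)
    )
```

With `dataset = [0b1010, 0b1111, 0b0000, 0b1011]` and query `0b1010`, the distances are 0, 2, 2, and 1, so `knn(0b1010, dataset, 2)` returns `[0, 3]`.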
Aug, 15
3rd International Conference on Biomedical and Bioinformatics Engineering (ICBBE), 2016
Publication: After a careful review by at least two to three experts, all accepted papers for ICBBE 2016 will be published in the ACM International Conference Proceedings Series, which will be archived in the ACM Digital Library, indexed by Ei Compendex and Scopus, and submitted for review by Thomson Reuters Conference Proceedings Citation […]
Aug, 15
5th International Conference on Environment, Chemistry and Biology (ICECB), 2016
Prof. Wei Yu of The University of Auckland, New Zealand, will be our keynote speaker. Papers will be published in a volume of the journal IPCBEE (ISSN: 2010-4618), indexed by Ei Geobase (Elsevier). Submission method: email to icecb@cbees.org. Website: http://www.icecb.org
Aug, 15
2nd International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC’16), 2016
With Exascale systems on the horizon and conventional von Neumann architectures suffering from rising power densities, we are entering an era in which power, energy efficiency, and cooling are first-class constraints for scalable HPC. FPGAs can tailor the hardware to the application and avoid the overheads of general-purpose architectures, for example, through customized datapaths and memory […]
Aug, 11
A Comparison of Potential Interfaces for Batched BLAS Computations
One trend in modern high-performance computing (HPC) is to decompose a large linear algebra problem into thousands of small problems that can be solved independently. There is a clear need for a batched BLAS standard that allows users to perform thousands of small BLAS operations in parallel while making efficient use of their hardware. There […]
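To show what "batched" means at the interface level, here is a minimal pure-Python sketch of a batched GEMM. Each entry computes C_i ← alpha·A_i·B_i + beta·C_i independently. The names `gemm` and `batched_gemm` and their argument order are hypothetical and do not reproduce any of the interfaces the paper compares. A real library would dispatch the independent problems concurrently on the hardware, whereas this sketch simply loops over them.

```python
Matrix = list[list[float]]


def gemm(alpha: float, A: Matrix, B: Matrix, beta: float, C: Matrix) -> Matrix:
    """Return alpha * (A @ B) + beta * C for one small problem; inputs are not modified."""
    m, k, n = len(A), len(A[0]), len(B[0])
    if len(B) != k or len(C) != m or any(len(row) != n for row in C):
        raise ValueError("incompatible matrix shapes")
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j] for j in range(n)]
        for i in range(m)
    ]


def batched_gemm(
    alpha: float, As: list[Matrix], Bs: list[Matrix], beta: float, Cs: list[Matrix]
) -> list[Matrix]:
    """Solve every (A, B, C) problem in the batch independently.

    Problems may differ in size. Because they share no data, they could run
    concurrently; this sketch processes them one after another.
    """
    if not (len(As) == len(Bs) == len(Cs)):
        raise ValueError("batch lists must have equal length")
    return [gemm(alpha, A, B, beta, C) for A, B, C in zip(As, Bs, Cs)]
```

This sketch shares one `alpha` and `beta` across the batch and lets each problem have its own dimensions. A batched interface must decide whether scalars and sizes are fixed across the whole batch or specified per problem.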
Aug, 11
CaffePresso: An Optimized Library for Deep Learning on Embedded Accelerator-based platforms
Off-the-shelf accelerator-based embedded platforms offer a competitive, energy-efficient alternative to CPU-based systems for lightweight deep learning computations. Low-complexity classifiers used in power-constrained and performance-limited scenarios are characterized by operations on small image maps, networks only 2-3 layers deep, and few class labels. For these use cases, we consider a range of embedded systems with 5-20 W […]
Aug, 11
Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL
Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications, and the Open Computing Language (OpenCL) has been proposed to take full advantage of the computational power of such heterogeneous systems. This article introduces a parallel software decoder for Low-Density Parity-Check (LDPC) codes on an embedded heterogeneous platform using an […]
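The abstract is truncated before it names the decoding algorithm, so the following is not the paper's decoder. It is a minimal illustration of LDPC decoding in general, implementing hard-decision, Gallager-style bit flipping on a parity-check matrix H. The function names are hypothetical. `TOY_H` is a Hamming (7,4) check matrix chosen only because it is small; it is far smaller and denser than a real LDPC matrix.

```python
# Toy 3-by-7 parity-check matrix (Hamming (7,4) code), for illustration only.
TOY_H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def syndrome(H, bits):
    """Parity of each check row; all zeros means `bits` is a valid codeword."""
    return [sum(h & b for h, b in zip(row, bits)) % 2 for row in H]


def bit_flip_decode(H, received, max_iters=20):
    """Hard-decision bit-flipping decoding. Returns (bits, success).

    Each iteration flips every bit involved in the largest number of
    unsatisfied parity checks, until all checks pass or max_iters is reached.
    """
    bits = list(received)
    for _ in range(max_iters):
        s = syndrome(H, bits)
        if not any(s):
            return bits, True
        # For each bit, count the unsatisfied checks it participates in.
        votes = [sum(s[r] & H[r][j] for r in range(len(H))) for j in range(len(bits))]
        worst = max(votes)
        bits = [b ^ 1 if v == worst else b for b, v in zip(bits, votes)]
    return bits, not any(syndrome(H, bits))
```

For example, `[1, 1, 1, 0, 0, 0, 0]` satisfies every check in `TOY_H`. If bit 3 is flipped in transit, all three checks fail, bit 3 collects the most votes, and one flip restores the codeword. Syndrome computation is independent across checks, and vote counting is independent across bits. That independence is what makes decoders of this general kind a natural fit for data-parallel OpenCL kernels.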