
Posts

Aug, 11

CaffePresso: An Optimized Library for Deep Learning on Embedded Accelerator-based platforms

Off-the-shelf accelerator-based embedded platforms offer a competitive, energy-efficient alternative to CPU-based systems for lightweight deep learning computations. Low-complexity classifiers used in power-constrained and performance-limited scenarios are characterized by operations on small image maps with 2-3 deep layers and few class labels. For these use cases, we consider a range of embedded systems with 5-20 W […]
Aug, 11

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an […]
Aug, 11

9th International Conference on Machine Learning and Computing (ICMLC), 2017

Paper Publication: ICMLC 2017 proceedings will be published in the International Conference Proceedings Series by ACM, which will be archived in the ACM Digital Library, indexed by Ei Compendex and Scopus, and submitted to be reviewed by Thomson Reuters Conference Proceedings Citation Index (ISI Web of Science). Proceedings ISBN: 978-1-4503-4783-9. Submission Methods: You can […]
Aug, 11

International Conference on Bioinformatics and Computing Technologies (ICBCT), 2017

Publication: All papers accepted by this conference will be published by the International Journal of Bioscience, Biochemistry and Bioinformatics (IJBBB) / International Journal of Machine Learning and Computing (IJMLC), and will be submitted to EI (INSPEC) for inclusion. Submission: Please submit your full paper to icbct@saise.org.
Aug, 11

9th International Conference on Computer and Automation Engineering (ICCAE), 2017

Publication: All accepted papers of ICCAE 2017 will be published in the International Conference Proceedings Series by ACM (ISBN: 978-1-4503-4791-4), which will be archived in the ACM Digital Library, indexed by Ei Compendex and Scopus, and submitted to be reviewed by Thomson Reuters Conference Proceedings Citation Index (ISI Web of Science). ICCAE 2016 conference […]
Aug, 8

Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing

Three-dimensional particle-in-cell laser-plasma simulation is an important area of computational physics. Solving state-of-the-art problems requires large-scale simulation on a supercomputer using specialized codes. A growing demand for computational resources inspires research into improving efficiency and co-design for supercomputers based on many-core architectures. This paper presents first performance results of the particle-in-cell plasma simulation code […]
Aug, 8

Accelerating Computational Finance Simulations with OpenCL

Computational finance is a domain where performance is in high demand. Therefore, we investigate the suitability of two families of accelerators for computational finance simulations. Specifically, we use a scenario-based ALM (Asset Liability Management) model and design a suitable OpenCL implementation. We further improve the performance of the application by applying several typical optimization techniques […]
Aug, 8

A Comprehensive Performance Analysis of HSA and OpenCL 2.0

Heterogeneous systems that marry CPUs and GPUs in a range of configurations are quickly becoming the design paradigm for today's platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems, the GPU is treated only as an accelerator by the CPU, working as a slave to the CPU master. But […]
Aug, 8

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

A genome-wide association study (GWAS) correlates marker variation with trait variation in a sample of individuals. Each study subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here we assume that subjects are unrelated and collected at random and that trait values are normally distributed or transformed to normality. Over […]
Aug, 8

OpenCL-accelerated object classification in video streams using Spatial Pooler of Hierarchical Temporal Memory

We present a method to classify objects in video streams using a brain-inspired Hierarchical Temporal Memory (HTM) algorithm. Object classification is a challenging task where humans still significantly outperform machine learning algorithms due to their unique capabilities. We have implemented a system which achieves very promising performance in terms of recognition accuracy. Unfortunately, conducting more […]
Aug, 5

Daino: A High-level Framework for Parallel and Efficient AMR on GPUs

Adaptive Mesh Refinement (AMR) methods reduce the computational requirements of problems by increasing resolution only in areas of interest. However, in practice, efficient AMR implementations are difficult because the mesh hierarchy management must be optimized for the underlying hardware. The architectural complexity of GPUs can make efficient AMR particularly challenging on GPU-accelerated supercomputers. This paper […]
Aug, 4

Parallel experiments with RARE-BLAS

Numerical reproducibility failures arise in parallel computation because of the non-associativity of floating-point summation. Optimizations on massively parallel systems dynamically modify the floating-point operation order. Hence, numerical results may change from one run to another. We propose to ensure reproducibility by extending as far as possible the IEEE-754 correct rounding property to larger operation sequences. […]
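The non-associativity problem described above can be seen with just three doubles. The sketch below is an illustration only, not the RARE-BLAS method: it sums the same values in every order a parallel reduction might choose, once with a plain left-to-right loop and once with Python's standard `math.fsum`, which returns an accurately rounded sum and is therefore independent of operation order. The helper `naive_sum` and the sample `values` are hypothetical, chosen for demonstration.

```python
import itertools
import math


def naive_sum(values):
    """Hypothetical helper: plain left-to-right summation, one rounding per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical sample data; any values whose partial sums round differently will do.
values = [0.1, 0.2, 0.3]

# Each permutation stands in for a different reduction order chosen at run time.
orders = list(itertools.permutations(values))

# Plain summation: the rounding of intermediate results depends on the order.
naive_results = {naive_sum(order) for order in orders}

# math.fsum tracks exact partial sums and rounds only once at the end,
# so the result does not depend on the order of the inputs.
fsum_results = {math.fsum(order) for order in orders}

print("naive sums:", sorted(naive_results))
print("fsum sums: ", sorted(fsum_results))
```

Rounding once, at the end, is what makes the `fsum` result order-independent; a naive loop rounds after every addition, so the final value inherits the order of the reduction.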

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
