
Posts

Aug, 11

9th International Conference on Machine Learning and Computing (ICMLC), 2017

Paper Publication: ICMLC 2017 proceedings will be published in the International Conference Proceedings Series by ACM, which will be archived in the ACM Digital Library, indexed by Ei Compendex and Scopus, and submitted for review by Thomson Reuters Conference Proceedings Citation Index (ISI Web of Science). Proceedings ISBN: 978-1-4503-4783-9. Submission Methods: You can […]
Aug, 11

International Conference on Bioinformatics and Computing Technologies (ICBCT), 2017

Publication: All papers accepted by this conference will be published by the International Journal of Bioscience, Biochemistry and Bioinformatics (IJBBB) or the International Journal of Machine Learning and Computing (IJMLC), and will be submitted to EI (INSPEC) for inclusion. Submission: Please submit your full paper to us: icbct@saise.org
Aug, 11

9th International Conference on Computer and Automation Engineering (ICCAE), 2017

Publication: All accepted papers of ICCAE 2017 will be published in the International Conference Proceedings Series by ACM (ISBN: 978-1-4503-4791-4), which will be archived in the ACM Digital Library, indexed by Ei Compendex and Scopus, and submitted for review by Thomson Reuters Conference Proceedings Citation Index (ISI Web of Science). ICCAE 2016 conference […]
Aug, 8

A Comprehensive Performance Analysis of HSA and OpenCL 2.0

Heterogeneous systems that marry CPUs and GPUs together in a range of configurations are quickly becoming the design paradigm for today’s platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems, the GPU is only treated as an accelerator by the CPU, working as a slave to the CPU master. But […]
Aug, 8

Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing

Three-dimensional particle-in-cell laser-plasma simulation is an important area of computational physics. Solving state-of-the-art problems requires large-scale simulation on a supercomputer using specialized codes. A growing demand for computational resources inspires research in improving efficiency and co-design for supercomputers based on many-core architectures. This paper presents first performance results of the particle-in-cell plasma simulation code […]
Aug, 8

Accelerating Computational Finance Simulations with OpenCL

Computational finance is a domain where performance is in high demand. Therefore, we investigate the suitability of two families of accelerators for computational finance simulations. Specifically, we use a scenario-based ALM (Asset Liability Management) model and design a suitable OpenCL implementation. We further improve the performance of the application by applying several typical optimization techniques […]
Aug, 8

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

A genome-wide association study (GWAS) correlates marker variation with trait variation in a sample of individuals. Each study subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here we assume that subjects are unrelated and collected at random and that trait values are normally distributed or transformed to normality. Over […]
Aug, 8

OpenCL-accelerated object classification in video streams using Spatial Pooler of Hierarchical Temporal Memory

We present a method to classify objects in video streams using a brain-inspired Hierarchical Temporal Memory (HTM) algorithm. Object classification is a challenging task where humans still significantly outperform machine learning algorithms due to their unique capabilities. We have implemented a system which achieves very promising performance in terms of recognition accuracy. Unfortunately, conducting more […]
Aug, 5

Daino: A High-level Framework for Parallel and Efficient AMR on GPUs

Adaptive Mesh Refinement (AMR) methods reduce the computational requirements of problems by increasing resolution only in areas of interest. However, in practice, efficient AMR implementations are difficult because the mesh hierarchy management must be optimized for the underlying hardware. The architectural complexity of GPUs can make efficient AMR particularly challenging in GPU-accelerated supercomputers. This paper […]
Aug, 4

Parallel experiments with RARE-BLAS

Numerical reproducibility failures arise in parallel computation because of the non-associativity of floating-point summation. Optimizations on massively parallel systems dynamically modify the floating-point operation order. Hence, numerical results may change from one run to another. We propose to ensure reproducibility by extending as far as possible the IEEE-754 correct rounding property to larger operation sequences. […]
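
The non-associativity mentioned in this abstract can be seen in a few lines of Python. This is only an illustrative sketch, not the RARE-BLAS method: the input values are chosen for the demonstration, and `math.fsum` from the Python standard library stands in as one example of a summation whose result does not depend on operation order.

```python
import math

# Illustrative values (an assumption for this sketch): two large terms
# that cancel exactly, plus two small terms.
values = [1e16, 1.0, -1e16, 1.0]

# Left-to-right order: 1e16 + 1.0 rounds back to 1e16, so one of the
# small terms is lost before the large terms cancel.
left_to_right = ((values[0] + values[1]) + values[2]) + values[3]

# Reordered: the large terms cancel first, so both small terms survive.
reordered = ((values[0] + values[2]) + values[1]) + values[3]

# math.fsum tracks exact partial sums and rounds once at the end,
# so its result is independent of the order of the inputs.
correctly_rounded = math.fsum(values)
permuted = math.fsum(reversed(values))

print(left_to_right, reordered, correctly_rounded, permuted)
```

A parallel reduction that changes its operation order between runs can land on either of the first two results, which is the kind of run-to-run variation the abstract describes.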
Aug, 4

RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers with a particular focus on efficient training of recurrent neural network topologies on multiple GPUs. The source of the software package is public and freely available for academic research purposes and can be […]
Aug, 4

TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization

We have developed a task-parallel runtime system, called TREES, that is designed for high performance on CPU/GPU platforms. On platforms with multiple CPUs, Cilk’s "work-first" principle underlies how task-parallel applications can achieve performance, but work-first is a poor fit for GPUs. We build upon work-first to create the "work-together" principle that addresses the specific strengths […]


HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
