
Posts

May, 4

3rd Intl. Conference on Soft Computing and Machine Intelligence, 2016

Topics of interest for submission include, but are not limited to: Advanced Intelligent Systems; Ant Colony Optimization and Swarm Intelligence; Artificial Immune Systems; Artificial Intelligence; Artificial Life; Associative Memory; Automatic Annotation; Bioinformatics and Biological Computing; Case-Based and Temporal Reasoning. Conference schedule: November 23, 2016: registration and collection of conference materials; November 24, 2016: keynote speeches […]
May, 4

Post-Moore’s Era Supercomputing Workshop (PMES), 2016

This interdisciplinary workshop is organized to explore the scientific issues, challenges, and opportunities for supercomputing beyond the scaling limits of Moore’s Law, with the ultimate goal of keeping supercomputing at the forefront of computing technologies beyond the physical and conceptual limits of current systems. Continuing progress of supercomputing beyond the scaling limits of Moore’s Law […]
May, 4

International Conf. on System Engineering Management (ICSEM), 2016

Publication: International Journal of Modeling and Optimization (ISSN: 2010-3697), indexed by Engineering & Technology Digital Library, ProQuest, Crossref, Electronic Journals Library, DOAJ, Google Scholar, EI (INSPEC, IET). Conference schedule: 1) July 22, 2016: conference materials collection; 2) July 23, 2016: keynote speeches and oral presentations; 3) July 24, 2016: one-day tour in Beijing. Conference chairs: Prof. Dr. Chen-Huei […]
May, 4

2nd IEEE Int. Conf. on Control Science and Systems Engineering (CCSSE), 2016

The 2nd International Conference on Control Science and Systems Engineering (CCSSE 2016) will be held in Singapore on July 27-29. Control Science and Systems Engineering are entering the second golden phase. These new challenges bring with them new research questions, resulting in the need to stimulate the rapid awareness of […]
May, 3

Fast hyperbolic Radon transform represented as convolutions in log-polar coordinates

The hyperbolic Radon transform is a commonly used tool in seismic processing, for instance in seismic velocity analysis, data interpolation and for multiple removal. A direct implementation by summation of traces with different moveouts is computationally expensive for large data sets. In this paper we present a new method for fast computation of the hyperbolic […]
May, 3

TrimZero: A Torch Recurrent Module for Efficient Natural Language Processing

Deep learning frameworks supported by the CUDA parallel computing platform boost advances in machine learning research. The advantage of parallel processing largely comes from the efficiency of matrix-matrix multiplication on many CUDA-enabled graphics processing units (GPUs). Therefore, for recurrent neural networks (RNNs), the usage of a zero-filled matrix representing variable lengths of sentences for a […]
May, 3

Automatic Test Case Reduction for OpenCL

We report on an extension to the C-Reduce tool, for automatic reduction of C test cases, to handle OpenCL kernels. This enables an automated method for detecting bugs in OpenCL compilers, by generating large random kernels using the CLsmith generator, identifying kernels that yield result differences across OpenCL platforms and optimisation levels, and using our […]
May, 3

Polly-ACC: Transparent compilation to heterogeneous hardware

Programming today’s increasingly complex heterogeneous hardware is difficult, as it commonly requires the use of data-parallel languages, pragma annotations, specialized libraries, or DSL compilers. Adding explicit accelerator support into a larger code base is not only costly, but also introduces additional complexity that hinders long-term maintenance. We propose a new heterogeneous compiler that brings us […]
May, 3

Exposing Errors Related to Weak Memory in GPU Applications

We present the systematic design of a testing environment that uses stressing and fuzzing to reveal errors in GPU applications that arise due to weak memory effects. We evaluate our approach on seven GPUs spanning three Nvidia architectures, across ten CUDA applications that use fine-grained concurrency. Our results show that applications that rarely or never […]
Apr, 29

Array Program Transformation with Loo.py by Example: High-Order Finite Elements

To concisely and effectively demonstrate the capabilities of our program transformation system Loo.py, we examine a transformation path from two real-world Fortran subroutines as found in a weather model to a single high-performance computational kernel suitable for execution on modern GPU hardware. Along the transformation path, we encounter kernel fusion, vectorization, prefetching, parallelization, and algorithmic […]
Apr, 29

On the design of sparse hybrid linear solvers for modern parallel architectures

In the context of this thesis, our focus is on numerical linear algebra, more precisely on the solution of large sparse systems of linear equations. We focus on designing efficient parallel implementations of MaPHyS, a hybrid linear solver based on domain decomposition techniques. First we investigate the MPI+threads approach. In MaPHyS, the first level of parallelism […]
Apr, 29

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

There are billions of lines of sequential code inside today’s software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the available parallelism, has been a research goal for some time now. This work proposes a new approach for achieving such a goal. […]
HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors
