
Posts

Nov, 20

7th International Conference on Biomedical Engineering and Technology (ICBET), 2017

The objective of the 2017 7th International Conference on Biomedical Engineering and Technology (ICBET 2017) is to provide a platform for researchers, engineers, academicians as well as industrial professionals from all over the world to present their research results and development activities in Biomedical Engineering and Technology. 2017 7th International Conference on Biomedical Engineering and […]
Nov, 20

International Conference on High Performance Compilation, Computing and Communications (HP3C-2017), 2017

You are cordially invited to join us at the International Conference on High Performance Compilation, Computing and Communications (HP3C-2017) in Kuala Lumpur, Malaysia, during March 22-24, 2017, sponsored by the American Society for Research. With the rapid growth in computing and communications technology, the past decade has witnessed a proliferation of powerful parallel and […]
Nov, 19

Evaluation of an OpenCL-Based FPGA Platform for Particle Filter

The particle filter is a promising method for estimating the internal states of dynamical systems, and can be used for various applications such as visual tracking and mobile-robot localization. The major drawback of the particle filter is its high computational cost, which causes long computation times and high power consumption. In order to solve this problem, this paper proposes […]
Nov, 19

HIPAcc: A Domain-Specific Language and Compiler for Image Processing

Domain-Specific Languages (DSLs) provide high-level and domain-specific abstractions that allow expressive and concise algorithm descriptions. Since the description in a DSL also hides the properties of the target hardware, DSLs are a promising path to target different parallel and heterogeneous hardware from the same algorithm description. In theory, the DSL description can capture all characteristics […]
Nov, 19

How to scale distributed deep learning?

Training time on large datasets for deep neural networks is the principal workflow bottleneck in a number of important applications of deep learning, such as object classification and detection in automatic driver assistance systems (ADAS). To minimize training time, the training of a deep neural network must be scaled beyond a single machine to as […]
Nov, 19

Performance Analysis of Parallel Sorting Algorithms using GPU Computing

Sorting is a well-investigated problem in computer science. Many authors have devised sorting algorithms for the CPU (Central Processing Unit). Today, sorting on the CPU is not very efficient. To sort efficiently, the work should be parallelized. There are many ways to parallelize sorting, but at the present time GPU […]
Nov, 19

Lattice QCD simulations using the OpenACC platform

In this article we will explore the OpenACC platform for programming Graphics Processing Units (GPUs). The OpenACC platform offers a directive-based programming model for GPUs which avoids the detailed data flow control and memory management necessary in a CUDA programming environment. In the OpenACC model, programs can be written in high-level languages with […]
Nov, 16

Autotuning CUDA Compiler Parameters for Heterogeneous Applications using the OpenTuner Framework

A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. The enormous heterogeneity of parallel computing platforms justifies and motivates the development of automated optimization tools and techniques. The Algorithm Selection Problem consists in finding a combination of algorithms, or a configuration of an algorithm, that optimizes the solution of […]
Nov, 16

Efficient Communications in Training Large Scale Neural Networks

We consider the problem of how to reduce the cost of communication that is required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, like broadcasts of parameters or reductions for sub-gradient aggregations, which for large messages quickly dominate overall execution […]
Nov, 16

Data Acquisition with GPUs: The DAQ for the Muon g-2 Experiment at Fermilab

Graphical Processing Units (GPUs) have recently become a valuable computing tool for the acquisition of data at high rates and for a relatively low cost. The devices work by parallelizing the code into thousands of threads, each executing a simple process, such as identifying pulses from a waveform digitizer. The CUDA programming library can be […]
Nov, 16

Automatic code generation methods applied to numerical linear algebra in high performance computing

Parallelism in today’s computer architectures is ubiquitous, whether it be in supercomputers, workstations or on portable devices such as smartphones. Efficiently exploiting these systems for a specific application requires a multidisciplinary effort that concerns Domain Specific Languages (DSL), code generation and optimization techniques and application-specific numerical algorithms. In this PhD thesis, we present a method […]
Nov, 16

Benchmarking performance of a hybrid Xeon/Xeon Phi system for parallel computation of similarity measures between large vectors

The paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates […]

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
