Posts

Jul, 15

4th International Conference on Virtual Reality (ICVR), 2018

The 2018 4th International Conference on Virtual Reality (ICVR 2018) will be held February 24-26, 2018, in Hong Kong. ICVR 2018 will bring together top researchers from the Asia-Pacific region, North America, Europe and around the world to exchange research results and address open issues in all aspects of Virtual Reality. Publication Accepted papers […]
Jul, 15

10th International Conference on Machine Learning and Computing (ICMLC), 2018

The ICMLC 2018: 2018 10th International Conference on Machine Learning and Computing aims to bring together leading academic scientists, researchers and research scholars to exchange and share their experiences and research results on all aspects of Machine Learning and Computing. It also provides a premier interdisciplinary platform for researchers, practitioners and educators to present and […]
Jul, 14

Cooperative Kernels: GPU Multitasking for Blocking Algorithms

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that […]
Jul, 14

Multikernel Data Partitioning With Channel on OpenCL-Based FPGAs

Recently, field-programmable gate array (FPGA) vendors (such as Altera) have started to address the programmability issues of FPGAs via OpenCL SDKs. In this paper, we analyze the performance of relational database applications on FPGAs using OpenCL. In particular, we study how to improve the performance of data partitioning, which is a very important building block […]
Jul, 14

Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the […]
Jul, 14

DeepProf: Performance Analysis for Deep Learning Applications via Mining GPU Execution Patterns

Deep learning applications are computation-intensive and often employ GPUs as the underlying computing devices. Deep learning frameworks provide powerful programming interfaces, but the gap between source code and practical GPU operations makes it difficult to analyze the performance of deep learning applications. In this paper, through examining the features of GPU traces and deep learning […]
Jul, 14

A Similarity Measure for GPU Kernel Subgraph Matching

Accelerator architectures specialize in executing SIMD (single instruction, multiple data) in lockstep. Because the majority of CUDA applications are parallelized loops, control flow information can provide an in-depth characterization of a kernel. CUDAflow is a tool that statically separates CUDA binaries into basic block regions and dynamically measures instruction and basic block frequencies. CUDAflow captures […]
Jul, 5

OpenCL-Based Implementation of an FPGA Accelerator for Molecular Dynamics Simulation

Molecular dynamics (MD) simulations are very important for studying the physical properties of atoms and molecules. However, a huge amount of processing time is required to simulate a few nanoseconds of an actual experiment. Although hardware acceleration using FPGAs provides promising results, huge design time and hardware design skills are required to implement an accelerator successfully. […]
Jul, 5

Real-time colouring and filtering with graphics shaders

Despite the popularity of the Graphics Processing Unit (GPU) for general purpose computing, one should not forget about the practicality of the GPU for fast scientific visualisation. As astronomers have increasing access to three dimensional (3D) data from instruments and facilities like integral field units and radio interferometers, visualisation techniques such as volume rendering offer […]
Jul, 5

A Fast Method For Computing Principal Curvatures From Range Images

Estimation of surface curvature from range data is important for a range of tasks in computer vision and robotics, object segmentation, object recognition and robotic grasping estimation. This work presents a fast method of robustly computing accurate metric principal curvature values from noisy point clouds which was implemented on GPU. In contrast to existing readily […]
Jul, 5

A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs

Analysis of DNA samples is an important step in forensics, and the speed of analysis can impact investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5 base pairs. Current forensics approaches use 20 STR loci for analysis. The use of single nucleotide […]
Jul, 5

Parle: parallelizing stochastic gradient descent

We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4x faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We exploit the phenomenon of flat minima that has been […]

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
