Posts

Jul, 25

On Simplifying and Optimizing Programs for Heterogeneous Computing Systems

Today, with the growth of highly parallel and heterogeneous architectures, systems composed of a combination of multicore CPUs, GPUs, and accelerators are becoming more common in HPC. Although heterogeneous architectures bring considerable benefits from a performance and energy perspective, they also make application development very challenging, introducing the need for different parallel programming paradigms. Recently, […]
Jul, 25

FUX-Sim: Implementation of a fast universal simulation/reconstruction framework for X-ray systems

The availability of digital X-ray detectors, together with advances in reconstruction algorithms, creates an opportunity for bringing 3D capabilities to conventional radiology systems. The downside is that reconstruction algorithms for non-standard acquisition protocols are generally based on iterative approaches that involve a high computational burden. The development of new flexible X-ray systems could benefit from […]
Jul, 25

ParTeCL: parallel testing using OpenCL

With the growing complexity of software, the number of test cases needed for effective validation is extremely large. Executing these large test suites is expensive and time consuming, putting an enormous pressure on the software development cycle. In previous work, we proposed using Graphics Processing Units (GPUs) to accelerate test execution by running test cases […]
Jul, 25

OpenCL Library for Parallel Graph Search Algorithms

Graphs are a popular data structure to represent large amounts of data and the relationships between them. As serial hardware hits the wall in terms of computation speed, a lot of research has been done recently on parallelizing graph search algorithms such as Breadth First Search or the Single Source Shortest Path problem hence make […]
Jul, 25

Memory-Efficient Implementation of DenseNets

The DenseNet architecture is highly computationally efficient as a result of feature reuse. However, a naive DenseNet implementation can require a significant amount of GPU memory: If not properly managed, pre-activation batch normalization and contiguous convolution operations can produce feature maps that grow quadratically with network depth. In this technical report, we introduce strategies to […]
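The quadratic growth mentioned in the excerpt can be illustrated with a small count. This is a minimal sketch under stated assumptions, not the report's method: it assumes a single dense block in which layer l receives the initial feature maps plus the outputs of all earlier layers, and that a naive implementation stores a fresh concatenated copy of that input for every layer. The function names and the example values (`growth_rate=12`, `initial=24`) are hypothetical.

```python
def naive_feature_maps(depth, growth_rate, initial):
    """Feature maps stored if every layer keeps its own concatenated input.

    Assumption: layer i (0-indexed) sees `initial + i * growth_rate` maps,
    and each concatenation is written to newly allocated memory.
    """
    return sum(initial + i * growth_rate for i in range(depth))


def shared_feature_maps(depth, growth_rate, initial):
    """Feature maps stored if each layer's output is kept only once."""
    return initial + depth * growth_rate


if __name__ == "__main__":
    for depth in (4, 8, 16):
        naive = naive_feature_maps(depth, growth_rate=12, initial=24)
        shared = shared_feature_maps(depth, growth_rate=12, initial=24)
        print(f"depth={depth:2d}  naive={naive:5d}  shared={shared:4d}")
```

Under these assumptions the naive count equals `depth * initial + growth_rate * depth * (depth - 1) / 2`, which is quadratic in depth, while the shared count is linear.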
Jul, 23

International Conference on Intelligent Autonomous Systems (ICIAS), 2018

The conference will be held in Singapore during March 1-3, 2018. The theme of ICIAS2018 is “Frontier of intelligent autonomous systems”, reflecting the ever-growing interest in research, development and applications in the dynamic and exciting areas of robotics. It also provides a premier interdisciplinary platform for researchers, practitioners and educators to present and discuss […]
Jul, 23

International Conference on Robotics and Intelligent System (ICRIS), 2018

Publication: All submissions will be peer reviewed by 2-3 reviewers, and accepted papers will, after registration, be published in the International Conference Proceedings Series by ACM, which will be indexed by Ei Compendex and Scopus. Submission: ICRIS 2018 is now accepting manuscript submissions. Please submit your full paper to us: icris@academic.net
Jul, 22

Scalability Study of Deep Learning Algorithms in High Performance Computer Infrastructures

Deep learning algorithms base their success on building high learning capacity models with millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples, so that the development of more accurate algorithms is usually limited by the throughput of the computing devices on which they are trained. […]
Jul, 22

A Comparison of Support Vector Machines Training GPU-Accelerated Open Source Implementations

In recent years, GPUs have been used to accelerate computations in many computer science domains. We focused on GPU-accelerated Support Vector Machines (SVM) training with non-linear kernel functions. We searched for all available GPU-accelerated C++ open-source implementations and created an open-source C++ benchmark project. We modified all the implementations to run on actual […]
Jul, 22

GPU accelerated computation of Polarized Subsurface BRDF for Flat Particulate Layers

The BRDF of most real-world materials has two components: the surface BRDF, due to light reflecting at the surface of the material, and the subsurface BRDF, due to light entering the material and going through many scattering events inside it. Each of these events modifies the light’s path, power, and polarization state. Computing polarized subsurface BRDF […]
Jul, 22

Parallelization of an Unsteady ALE Solver with Deforming Mesh Using OpenACC

This paper presents a parallel, GPU-based, deforming mesh-enabled unsteady numerical solver for moving body problems using OpenACC. Both 2D and 3D parallel algorithms based on spring-like deforming mesh methods are proposed and then implemented through the OpenACC programming model. Furthermore, these algorithms are coupled with an unstructured mesh based, implicit time scheme integrated […]
Jul, 22

Automatically Selecting Profitable Thread Block Sizes Using Machine Learning

Graphics processing units (GPUs) provide high performance at low power consumption as long as resources are well utilized. Thread block size is one factor in determining a kernel’s occupancy, which is a metric for measuring GPU utilization. A general guideline is to find the block size that leads to the highest occupancy. However, many combinations […]
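To make the link between block size and occupancy concrete, here is a minimal sketch of a simplified occupancy estimate. It is not the paper's method: it assumes hypothetical per-multiprocessor limits (32 threads per warp, 64 resident warps, 32 resident blocks) and ignores register and shared-memory limits, which also constrain occupancy on real hardware. All names are illustrative.

```python
import math

# Assumed hardware limits for illustration only; real values depend on the GPU.
WARP_SIZE = 32
MAX_WARPS_PER_SM = 64
MAX_BLOCKS_PER_SM = 32


def occupancy(block_size):
    """Fraction of the warp slots on one multiprocessor that are occupied."""
    warps_per_block = math.ceil(block_size / WARP_SIZE)
    # Resident blocks are capped by the block limit and by the warp limit.
    blocks = min(MAX_BLOCKS_PER_SM, MAX_WARPS_PER_SM // warps_per_block)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


def best_block_sizes(candidates):
    """Return every candidate block size that reaches the highest occupancy."""
    scores = {b: occupancy(b) for b in candidates}
    best = max(scores.values())
    return [b for b, score in scores.items() if score == best]


if __name__ == "__main__":
    for size in (32, 64, 96, 256, 768, 1024):
        print(f"block size {size:4d}: occupancy {occupancy(size):.3f}")
```

In this simplified model several block sizes reach the same peak occupancy, so occupancy alone does not single out one choice.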

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
