
Posts

Jul, 20

Algorithmic Trading: A brief, computational finance case study on data centre FPGAs

FPGAs will increasingly be deployed at scale due to the growing need for power-efficient computation and improved high-level synthesis tool flows, creating a new category of device: data centre FPGAs. A method for using these FPGAs is to identify what proportion of a given workload would benefit from being implemented upon […]
Jul, 20

Lowering IrGL to CUDA

The IrGL intermediate representation is an explicitly parallel representation for irregular programs that targets GPUs. In this report, we describe IrGL constructs, examples of their use and how IrGL is compiled to CUDA by the Galois GPU compiler.
Jul, 20

THOR: A New and Flexible Global Circulation Model to Explore Planetary Atmospheres

We have designed and developed, from scratch, a global circulation model named THOR that solves the three-dimensional non-hydrostatic Euler equations. Our general approach lifts the commonly used assumptions of a shallow atmosphere and hydrostatic equilibrium. We solve the "pole problem" (where converging meridians on a sphere lead to increasingly smaller time steps near the poles) […]
Jul, 20

Scientific Computing Using Consumer Video-Gaming Hardware Devices

Commodity video-gaming hardware (consoles, graphics cards, tablets, etc.) performance has been advancing at a rapid pace owing to strong consumer demand and stiff market competition. Gaming hardware devices are currently amongst the most powerful and cost-effective computational technologies available in quantity. In this article, we evaluate a sample of current generation video-gaming hardware devices for […]
Jul, 20

Runtime Configurable Deep Neural Networks for Energy-Accuracy Trade-off

We present a novel dynamic configuration technique for deep neural networks that permits step-wise energy-accuracy trade-offs during runtime. Our configuration technique adjusts the number of channels in the network dynamically depending on response time, power, and accuracy targets. To enable this dynamic configuration technique, we co-design a new training algorithm, where the network is incrementally […]
Jul, 18

Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations

This paper illustrates how GPU computing can be used to accelerate computational fluid dynamics (CFD) simulations. For sparse linear systems arising from finite volume discretization, we evaluate and optimize the performance of Conjugate Gradient (CG) routines designed for manycore accelerators and compare against an industrial CPU-based implementation. We also investigate how the recent advances in […]
Jul, 18

IODA: an Input/Output Deep Architecture for image labeling

In this article, we propose a deep neural network (DNN) architecture called Input Output Deep Architecture (IODA) for solving the problem of image labeling. IODA directly links a whole image to a whole label map, assigning a label to each pixel using a single neural network forward step. Instead of designing a handcrafted a priori […]
Jul, 18

HPC on the Intel Xeon Phi: Homomorphic Word Searching

In this paper, the suitability of implementing parallel homomorphic word searching on Intel Xeon Phi coprocessors is evaluated for the first time. Homomorphic encryption makes it possible to produce a cryptogram that encrypts the result of applying any function to some values, even when the input values are encrypted and without access to the private key. For example, […]
Jul, 18

A Kinetic Vlasov Model for Plasma Simulation Using Discontinuous Galerkin Method on Many-Core Architectures

Advances are reported in the three pillars of computational science achieving a new capability for understanding dynamic plasma phenomena outside of local thermodynamic equilibrium. A continuum kinetic model for plasma based on the Vlasov-Maxwell system for multiple particle species is developed. Consideration is added for boundary conditions in a truncated velocity domain and supporting wall […]
Jul, 18

Heterogeneous Computing for Data Stream Mining

Graphics Processing Units are the de facto standard for accelerating data-parallel tasks in high-performance computing. They are widely used to accelerate batch machine learning algorithms. High-end discrete GPUs are characterized by a very high number of cores (thousands), high-bandwidth memory optimized for stream access, and high power requirements. Integrated GPUs are characterized […]
Jul, 16

9th International Conference on Computer and Electrical Engineering (ICCEE), 2016

Paper publication: All paper submissions will be peer reviewed and evaluated based on originality, research content, relevance to the conference, contributions, and readability. All accepted papers will be published in one of the indexed journals after proper registration and presentation. – Journal of Computers (JCP, ISSN: 1796-203X). Indexed by: Ulrich's Periodicals Directory; Google Scholar; INSPEC; etc. – […]
Jul, 16

9th International Conference on Computer Science and Information Technology (ICCSIT), 2016

Paper Publication: All papers are reviewed using a single-blind review process, and accepted papers will be published in a journal. Journal of Communications (JCM, ISSN: 1796-2021); Journal of Software (JSW, ISSN: 1796-217X); Journal of Computers (JCP, ISSN: 1796-203X); International Journal of Future Computer and Communication (IJFCC, ISSN: 2010-3751); International Journal of Computer Theory and Engineering (IJCTE, ISSN: 1793-8201); […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
