high performance computing on graphics processing units: hgpu.org

Posts

Jul, 1

4th International Conference on Information Computer Application, ICICA 2015

Submission Deadline: 2014-10-05. Publication: The ICICA 2015 conference proceedings will be published in the International Journal of Computer and Communication Engineering (ISSN: 2010-3743, www.ijcce.org), which will be indexed by Google Scholar, Engineering & Technology Digital Library, ProQuest, and Crossref. Call for Papers: Algorithms, Automated Software Engineering, Bioinformatics and Scientific Computing, Compilers and Interpreters, Computer Animation, Artificial […]

Jul, 1

3rd International Conference on System Modeling and Optimization, ICSMO 2015

Submission Deadline: 2014-09-20. Publication: The ICSMO 2015 conference proceedings will be published in the International Journal of Modeling and Optimization (ISSN: 2010-3697, www.ijmo.org), included in the Engineering & Technology Digital Library, and indexed by ProQuest, Google Scholar, and Crossref. Call for Papers: Agent Based Simulation, Analytical and Stochastic Modelling Techniques and […]

Jul, 1

6th International Conference on Computer Modeling and Simulation, ICCMS 2015

Submission Deadline: 2014-09-30. Publication: As usual, all accepted papers for ICCMS 2015 will be published in the International Journal of Computer Theory and Engineering (ISSN: 1793-8201, www.ijcte.org), and will be indexed by Electronic Journals Library, EBSCO, Engineering & Technology Digital Library, Google Scholar, INSPEC, Ulrich’s Periodicals Directory, Crossref, ProQuest, WorldCat, and EI (INSPEC, IET). Call […]

Jul, 1

Kd-tree Based N-Body Simulations with Volume-Mass Heuristic on the GPU

N-body simulations are an important class of numerical simulations used to study a wide range of physical phenomena, for which researchers demand fast and accurate implementations. Because of their computational complexity, simple brute-force methods for computing the long-distance interactions between bodies can only be used for small-scale simulations. Smarter approaches utilize neighbor lists, tree […]
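To make the cost of the brute-force approach mentioned above concrete, here is a minimal sketch of direct pairwise summation of gravitational accelerations. It is not the paper's kd-tree method; the function name `direct_accelerations`, the gravitational constant `G=1.0`, and the `softening` term are assumptions chosen for illustration. The nested loops visit every pair of bodies, so the work grows as O(N²).

```python
import math


def direct_accelerations(positions, masses, G=1.0, softening=1e-3):
    """Brute-force gravitational accelerations for 3D bodies (O(N^2) pairs).

    G and softening are illustrative assumptions, not values from the paper.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            # Vector pointing from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            # Softened squared distance avoids division by zero at close range.
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc
```

Doubling the number of bodies roughly quadruples the number of pair interactions, which is why tree-based and neighbor-list approaches are preferred at larger scales.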

OpenCL

Jul, 1

Fast Galactic Structure Finding using Graphics Processing Units

Cosmological simulations are used by astronomers to investigate large-scale structure formation and galaxy evolution. Structure finding, that is, the discovery of gravitationally-bound objects such as dark matter halos, is a crucial step in many such simulations. During recent years, advancing computational capacity has led to halo-finders needing to manage increasingly large simulations. As a […]

CUDA

Jul, 1

Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition

Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Part of the promise of DNNs is their ability to represent increasingly complex functions as the number of DNN parameters increases. This paper investigates the performance of DNN-based hybrid speech recognition systems as DNN model size and training data […]

Jul, 1

The design and verification of Mumax3

We report on the design, verification and performance of mumax3, an open-source GPU-accelerated micromagnetic simulation program. This software solves the time- and space-dependent magnetization evolution in nano- to microscale magnets using a finite-difference discretization. Its high performance and low memory requirements allow for large-scale simulations to be performed in limited time and on […]

CUDA

Jul, 1

Speedup of Micromagnetic Simulations with C++ AMP On Graphics Processing Units

A finite-difference micromagnetic solver is presented that uses C++ Accelerated Massive Parallelism (C++ AMP). The high performance of a single Graphics Processing Unit (GPU) is demonstrated in comparison with a typical CPU-based solver. The GPU-to-CPU speed-up is shown to be greater than 100 for larger problem sizes. This solver is based […]

OpenCL

Jun, 28

Performance and Efficiency Analysis of Modern Accelerators: Fine-Grained Parallelism on the Intel Xeon Phi

Supercomputers define the pinnacle of computational power and are an essential tool for solving vast scientific computational problems. They employ increasingly parallel architectures to raise their nominal peak performance and to solve larger problems. Exploiting this vast computational power is, however, difficult, and optimising for many-core architectures has become […]

Jun, 28

Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor

In this paper we present a detailed study of implementing double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon Phi Coprocessor. We discuss a DGEMM algorithm implementation running "natively" on the coprocessor, minimizing communication with the host CPU. We run DGEMM across a range of matrix sizes natively as well as using Intel Math Kernel […]

Jun, 28

Modified Bloom filter for high performance hybrid NoSQL systems

This article addresses the problems of implementing a modified Bloom filter as an additional module for mass data storage systems in supercomputers with hybrid CPU/GPU architecture. It proposes a modified filter with counters, which makes it possible to track not only data addition but also data removal. A comparative analysis has been […]
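The idea of a filter with counters can be illustrated with a small CPU-only sketch of a counting Bloom filter. This is not the article's CPU/GPU implementation; the class name `CountingBloomFilter`, the table size, the number of hashes, and the salted SHA-256 hashing scheme are all assumptions made for the example. Each slot holds a counter rather than a single bit, so removing an item can decrement the same slots that adding it incremented.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose slots are counters, so items can also be removed.

    Sizes and the hashing scheme are illustrative assumptions.
    """

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, item):
        # Derive num_hashes slot indexes by salting a SHA-256 hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for i in self._indexes(item):
            self.counters[i] += 1

    def __contains__(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.counters[i] > 0 for i in self._indexes(item))

    def remove(self, item):
        # Decrement only when the item may be present, so no counter goes negative.
        if item not in self:
            return False
        for i in self._indexes(item):
            self.counters[i] -= 1
        return True
```

As with a plain Bloom filter, membership tests can return false positives, so `remove` should only be called for items that were actually added; removing a false positive would corrupt the counters of other items.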

CUDA

Jun, 28

Implementation of the genetic algorithm by means of CUDA technology involved in travelling salesman problem

The research aimed to solve the travelling salesman problem by means of genetic algorithms. The algorithm was implemented using CUDA technology. The research focused on how much the system improves when GPUs, which can perform the operations in parallel, are used instead of classical CPUs. […]

CUDA

