Posts

Mar, 31

Design Principles for Sparse Matrix Multiplication on the GPU

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both […]
Mar, 25

Scalable Breadth-First Search on a GPU Cluster

On a GPU cluster, the ratio of high computing power to communication bandwidth makes scaling breadth-first search (BFS) on a scale-free graph extremely challenging. By separating high and low out-degree vertices, we present an implementation with scalable computation and a model for scalable communication for BFS and direction-optimized BFS. Our communication model uses global reduction […]
Mar, 25

Optimization of Hierarchical Matrix Computation on GPU

The demand for dense matrix computation in large-scale and complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, the computation of H-matrices is more […]
Mar, 25

A development of an accelerator board dedicated for multi-precision arithmetic operations and its application to Feynman loop integrals II

Evaluating a wide variety of Feynman diagrams with multi-loop integrals and physical parameters, and comparing the results with high-energy experiments, is expected to help investigate new physics beyond the Standard Model. We have been developing a direct computation method for multi-loop integrals of Feynman diagrams. One of the features of our method is that we adopt […]
Mar, 25

MALBEC: a new CUDA-C ray-tracer in General Relativity

A new CUDA-C code for tracing orbits around non-charged black holes is presented. This code is named MALBEC, and it takes advantage of graphics processing units and the CUDA platform in order to track the geodesic motion of null and timelike test particles in Schwarzschild and Kerr spacetimes. Additionally, a new general set of equations that […]
Mar, 25

Accelerating CNN inference on FPGAs: A Survey

Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for dedicated and tailored hardware support methods. Moreover, CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as […]
Mar, 22

The VOLNA-OP2 Tsunami Code (Version 1.0)

In this paper, we present the VOLNA-OP2 tsunami model and implementation: a finite-volume non-linear shallow water equations (NSWE) solver built on the OP2 domain-specific language for unstructured mesh computations. VOLNA-OP2 is unique among tsunami solvers in its support for several high-performance computing platforms: CPUs, the Intel Xeon Phi, and GPUs. This is […]
Mar, 22

FPGA in HPC: High Level Synthesis of OpenCL kernels for Molecular Dynamics

The overall goal of this thesis is to evaluate the feasibility of FPGA-based computer systems in HPC. This work is performed within ExaNeSt, an EU-funded project which aims to develop and prototype energy-efficient solutions for the production of exascale-level supercomputers. As a matter of fact, the current computer architectures need to be […]
Mar, 22

A multi-agent architecture for scheduling of high performance services in a GPU cluster

Nowadays, clusters containing multiple GPU nodes are widely used to execute high-performance computing applications. Diverse disciplines use these clusters to improve the performance of several services that consume high computational resources. The challenge of executing high-performance computing applications becomes harder when the applications are executed concurrently and each one of them may demand multiple GPU […]
Mar, 22

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use

Graphics Processing Units (GPUs) perform the majority of computations in state-of-the-art supercomputers. Programming these GPUs is often assisted using a programming model such as (amongst others) the directive-driven OpenACC. Unfortunately, OpenACC (and other similar models) are incapable of automatically targeting and distributing work across several GPUs, which decreases productivity and forces needless manual labor upon […]
Mar, 22

TBD: Benchmarking and Analyzing Deep Neural Network Training

The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference — i.e. how to efficiently execute already trained models and (ii) image classification networks as the primary benchmark for evaluation. Our primary […]
Mar, 18

International Conference on Biomedicine & Pharmacotherapy, 2018

The International Conference on Biomedicine & Pharmacotherapy will be held on August 06-07, 2018 in Osaka, Japan. The conference focuses on topics such as Biomedicine, Biomedical Statistics, Biomedical Diagnosis, Frontiers in Biomedicine, Industrial Pharmacy, Pharmacotherapy, Molecular Biomedicine, Computational Biomedicine, Tissue Engineering, Medical Devices, Biomedical Model, Personalized Medicine, Biomedical Technology, Nanotechnology, Pharmaceutical Sciences, […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors