Posts

Mar, 3

An efficient solution for hazardous geophysical flows simulation using GPUs

The movement of poorly sorted material over steep areas constitutes a hazardous environmental problem. Computational tools help in the understanding and prediction of such landslides. The main drawback is the high computational effort required for obtaining accurate numerical solutions due to the high number of cells involved in the calculation. In order to overcome this […]
Mar, 3

Energy- and cost-efficient Lattice-QCD computations using graphics processing units

Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large scale experiments like the Large Hadron Collider (LHC) at CERN and in the future at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, […]
Mar, 3

Adaptive Video Encoding Based on OpenCL Face Recognition

Video chatting is now a popular way of communication. However, a poor network connection ruins the experience as the faces are blurred. To solve this problem, the team offers a solution to preserve the clarity of faces under a limited transmission rate. In this project, the primary goal is to design a video encoder that reduces the size […]
Mar, 3

Adaptive Kinetic-Fluid Solvers for Heterogeneous Computing Architectures

This paper describes recent progress towards porting a Unified Flow Solver (UFS) to heterogeneous parallel computing. UFS is an adaptive kinetic-fluid simulation tool, which combines Adaptive Mesh Refinement (AMR) with automatic cell-by-cell selection of kinetic or fluid solvers based on continuum breakdown criteria. The main challenge of porting UFS to graphics processing units (GPUs) comes […]
Mar, 3

Counting Triangles in Large Graphs on GPU

The clustering coefficient and the transitivity ratio are concepts often used in network analysis, which creates a need for fast practical algorithms for counting triangles in large graphs. Previous research in this area focused on sequential algorithms, MapReduce parallelization, and fast approximations. In this paper we propose a parallel triangle counting algorithm for CUDA GPU. […]
Mar, 3

GPU Based Path Integral Control with Learned Dynamics

We present an algorithm which combines recent advances in model based path integral control with machine learning approaches to learning forward dynamics models. We take advantage of the parallel computing power of a GPU to quickly take a massive number of samples from a learned probabilistic dynamics model, which we use to approximate the path […]
Mar, 2

Iris Matching Algorithm on Many-Core Platforms

Biometrics matching has been widely adopted as a secure way for identification and verification purposes. However, the computation demand associated with running this algorithm on a big data set poses a great challenge to the underlying hardware platform. Even though modern processors are equipped with more cores and memory capacity, the software algorithm still requires careful […]
Mar, 2

Model-driven optimisation of memory hierarchy and multithreading on GPUs

Due to their potentially high peak performance and energy efficiency, GPUs are increasingly popular for scientific computations. However, the complexity of the architecture makes it difficult to write code that achieves high performance. Two of the most important factors in achieving high performance are the usage of the GPU memory hierarchy and the way in […]
Mar, 2

Runtime Compilation of Array-Oriented Python Programs

The Python programming language has become a popular platform for data analysis and scientific computing. To mitigate the poor performance of Python’s standard interpreter, numerically intensive computations are typically offloaded to library functions written in high-performance compiled languages such as Fortran or C. When there is no efficient library implementation available for a particular algorithm, […]
Mar, 2

Evaluating Performance Portability of OpenACC

Accelerator-based heterogeneous computing is gaining momentum in the High Performance Computing arena. However, the increased complexity of the accelerator architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle the problem. While the abstraction endowed by OpenACC offers productivity, it raises questions on its portability. This paper evaluates the performance portability obtained […]
Mar, 2

MILJS: Brand New JavaScript Libraries for Matrix Calculation and Machine Learning

MILJS is a collection of state-of-the-art, platform-independent, scalable, fast JavaScript libraries for matrix calculation and machine learning. Our core library for matrix calculation is called Sushi, which exhibits far better performance than any other leading machine learning library written in JavaScript. In particular, our matrix multiplication is 177 times faster than the fastest JavaScript benchmark. […]
Feb, 27

Accelerating Deep Convolutional Neural Networks Using Specialized Hardware

Recent breakthroughs in the development of multi-layer convolutional neural networks have led to state-of-the-art improvements in the accuracy of non-trivial recognition tasks such as large-category image classification and automatic speech recognition [1]. These many-layered neural networks are large, complex, and require substantial computing resources to train and evaluate [2]. Unfortunately, these demands come at an […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors