Posts

Mar, 2

Model-driven optimisation of memory hierarchy and multithreading on GPUs

Due to their potentially high peak performance and energy efficiency, GPUs are increasingly popular for scientific computations. However, the complexity of the architecture makes it difficult to write code that achieves high performance. Two of the most important factors in achieving high performance are the usage of the GPU memory hierarchy and the way in […]
Mar, 2

Runtime Compilation of Array-Oriented Python Programs

The Python programming language has become a popular platform for data analysis and scientific computing. To mitigate the poor performance of Python’s standard interpreter, numerically intensive computations are typically offloaded to library functions written in high-performance compiled languages such as Fortran or C. When there is no efficient library implementation available for a particular algorithm, […]
Mar, 2

Evaluating Performance Portability of OpenACC

Accelerator-based heterogeneous computing is gaining momentum in the High Performance Computing arena. However, the increased complexity of the accelerator architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle the problem. While the abstraction endowed by OpenACC offers productivity, it raises questions about its portability. This paper evaluates the performance portability obtained […]
Mar, 2

MILJS: Brand New JavaScript Libraries for Matrix Calculation and Machine Learning

MILJS is a collection of state-of-the-art, platform-independent, scalable, fast JavaScript libraries for matrix calculation and machine learning. Our core library for matrix calculation is called Sushi, and it exhibits far better performance than any other leading machine learning library written in JavaScript. In particular, our matrix multiplication is 177 times faster than the fastest JavaScript benchmark. […]
Feb, 27

Accelerating Deep Convolutional Neural Networks Using Specialized Hardware

Recent breakthroughs in the development of multi-layer convolutional neural networks have led to state-of-the-art improvements in the accuracy of non-trivial recognition tasks such as large-category image classification and automatic speech recognition [1]. These many-layered neural networks are large, complex, and require substantial computing resources to train and evaluate [2]. Unfortunately, these demands come at an […]
Feb, 27

Face Detection on CUDA

Face detection finds applications in various fields in today’s world. However, a single-threaded CPU implementation of face detection consumes a lot of time and, despite various optimization techniques, performs poorly in real time. With the advent of General Purpose GPU (GPGPU) and growing support for parallel programming languages like CUDA, it has become possible […]
Feb, 27

A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures

In order to improve system performance efficiently, a number of systems are equipped with multi-core and many-core processors (such as GPUs). Due to their discrete memory, these heterogeneous architectures comprise a distributed system within a computer. A data-flow programming model is attractive in this setting for its ease of expressing concurrency. Programmers only need to […]
Feb, 27

Simulation of the hydrogen ground state in Stochastic Electrodynamics

Stochastic electrodynamics is a classical theory which assumes that the physical vacuum consists of classical stochastic fields with average energy $\frac{1}{2}\hbar\omega$ in each mode, i.e., the zero-point Planck spectrum. While this classical theory explains many quantum phenomena related to harmonic oscillator problems, hard results on nonlinear systems are still lacking. In this work the […]
Feb, 27

GPU accelerated image reconstruction in a two-strip J-PET tomograph

We present a fast GPU implementation of the image reconstruction routine for a novel two-strip PET detector that relies solely on time-of-flight measurements.
Feb, 27

Implementation of Smith-Waterman algorithm in OpenCL for GPUs

In this paper we present an implementation of the Smith-Waterman algorithm. The implementation is done in OpenCL and targets high-end GPUs. This implementation is capable of computing similarity indexes between reference and query sequences. The implementation is designed for the sequence alignment paths calculation. In addition, it is capable of handling very long reference sequences […]
Feb, 24

Very Fast Non-Dominated Sorting

A new and very efficient parallel algorithm for the Fast Non-dominated Sorting of Pareto fronts is proposed. By decreasing its computational complexity, the proposed method allows us to increase the speedup of the best Fast and Elitist Multi-Objective Genetic Algorithm to date (NSGA-II) by more than two orders of magnitude. Formal proofs […]
Feb, 24

Raster2Mesh: Rasterization based CVT meshing

In this paper, we propose to extend high quality Centroidal Voronoi Tessellation (CVT) remeshing techniques to the case of surfaces which are not defined by triangle meshes, such as implicit surfaces. Our key observation is that rasterization routines are usually available to visualize these alternative representations, most often as OpenGL shaders efficiently producing surface samples […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org