Posts

Sep, 3

Fast GPU-based calculations in few-body quantum scattering

A fundamentally new approach to solving few-particle (many-dimensional) quantum scattering problems is described. The approach is based on a complete discretization of the few-particle continuum and on massively parallel GPU computation of the integral kernels of the scattering equations. The discretization of the continuous spectrum of a few-particle Hamiltonian is realized with a projection […]
Aug, 31

A parallel algorithm for implicit depletant simulations

We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded […]
Aug, 31

An Asynchronous Event Communication Technique for Soft Real-Time GPGPU Applications

CONTEXT. Interactive GPGPU applications require low-response-time feedback from events such as user input in order to provide a positive user experience. Communication of these events must be performed asynchronously so as not to cause significant performance penalties. OBJECTIVES. In this study the usage of CPU/GPU shared virtual memory to perform asynchronous communication is explored. […]
Aug, 31

A GPU-accelerated local search algorithm for the Correlation Clustering problem

The solution of the Correlation Clustering (CC) problem can be used as a criterion to measure the amount of balance in signed social networks, where positive (friendly) and negative (antagonistic) interactions take place. Metaheuristics have been used successfully for solving not only this problem but also other hard combinatorial optimization problems, since they can […]
Aug, 31

Dynamic Memory Allocation for OpenCL

Heterogeneous systems are computer systems that exploit multiple devices with different processor architectures to improve the computing efficiency by offloading workloads to the device that fits them best. OpenCL is a framework for building portable applications that run across different devices in heterogeneous systems. It has gained traction as a powerful tool for high-performance computing. […]
Aug, 31

Partitioning Large Scale Deep Belief Networks Using Dropout

Deep learning methods have shown great promise in many practical applications, ranging from speech recognition and visual object recognition to text processing. However, most of the current deep learning methods suffer from scalability problems for large-scale applications, forcing researchers or users to focus on small-scale problems with fewer parameters. In this paper, we consider a well-known […]
Aug, 28

Boosting Java Performance using GPGPUs

Heterogeneous programming is becoming the norm for achieving better performance by running portions of code on the most appropriate hardware resource. Currently, significant engineering effort is going into enabling existing programming languages to execute heterogeneously, mainly on GPUs. In this paper we describe Jacc, an experimental framework which allows […]
Aug, 28

VisPy: Harnessing The GPU For Fast, High-Level Visualization

The growing availability of large, multidimensional data sets has created demand for high-performance, interactive visualization tools. VisPy leverages the GPU to provide fast, interactive, and beautiful visualizations in a high-level API. Here we introduce the main features, architecture, and techniques used in VisPy.
Aug, 28

High-Speed Object Detection: Design, Study and Implementation of a Detection Framework using Channel Features and Boosting

In this thesis we design, implement and study a high-speed object detection framework. Our baseline detector uses integral channel features as the object representation and AdaBoost as the supervised learning algorithm. We suggest the implementation of two approximation techniques for speeding up the baseline detector and show their effectiveness by performing experiments on both detection quality and […]
Aug, 28

Deep Convolutional Neural Networks for Smile Recognition

This thesis describes the design and implementation of a smile detector based on deep convolutional neural networks. It starts with a summary of neural networks, the difficulties of training them and new training methods, such as Restricted Boltzmann Machines or autoencoders. It then provides a literature review of convolutional neural networks and recurrent neural networks. […]
Aug, 28

A Parallel Algorithm to Test Chordality of Graphs

We present a simple parallel algorithm to test chordality of graphs which is based on the parallel Lexicographical Breadth-First Search algorithm. In total, the algorithm takes time O(N) on an N-thread machine and it performs work O(N^2), where N is the number of vertices in a graph. Our implementation of the algorithm uses a GPU environment […]
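
For readers unfamiliar with the underlying test, here is a minimal sequential sketch in Python of the classical approach the abstract builds on: run a Lexicographic Breadth-First Search, then check that the reverse of the visiting order is a perfect elimination ordering. It is not the paper's parallel GPU implementation. The function names `lex_bfs` and `is_chordal` and the adjacency-dictionary input format are assumptions made for illustration.

```python
def lex_bfs(adj):
    """Return a LexBFS visiting order for a graph given as {vertex: set_of_neighbors}.

    Simple O(N^2) label-based variant (illustrative, not optimized).
    """
    n = len(adj)
    labels = {v: [] for v in adj}   # ranks of already visited neighbors, decreasing
    unvisited = sorted(adj)         # sorted only to make tie-breaking deterministic
    order = []
    for step in range(n):
        # Visit the unvisited vertex with the lexicographically largest label.
        v = max(unvisited, key=lambda u: labels[u])
        unvisited.remove(v)
        order.append(v)
        for u in adj[v]:
            if u in labels and u not in order:
                labels[u].append(n - step)  # earlier visits contribute larger ranks
    return order


def is_chordal(adj):
    """Check chordality: the reverse LexBFS order must be a perfect elimination ordering."""
    order = lex_bfs(adj)
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if not earlier:
            continue
        # Parent: the earlier neighbor that was visited most recently before v.
        parent = max(earlier, key=lambda u: pos[u])
        # All other earlier neighbors of v must also be adjacent to the parent.
        if any(u != parent and u not in adj[parent] for u in earlier):
            return False
    return True


# A 4-cycle has a chordless cycle; adding the chord 0-2 makes it chordal.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
cycle4_with_chord = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
```

Calling `is_chordal(cycle4)` and `is_chordal(cycle4_with_chord)` exercises both outcomes. The per-vertex check in the second loop is independent across vertices, which is the kind of work a one-thread-per-vertex scheme can distribute.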
Aug, 27

CudaChain: A Practical GPU-accelerated 2D Convex Hull Algorithm

This paper presents a practical GPU-accelerated convex hull algorithm and a novel Sorting-based Preprocessing Approach (SPA) for planar point sets. The proposed algorithm consists of two stages: (1) two rounds of preprocessing performed on the GPU and (2) the finalization of calculating the expected convex hull on the CPU. We first discard the interior points […]
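
The two-stage idea (cheaply discard interior points, then finish the hull on the remaining points) can be sketched in plain Python. This is not CudaChain or its SPA preprocessing, whose details are elided in the abstract above. It swaps in a well-known stand-in: discarding points strictly inside the quadrilateral of the four extreme points, followed by Andrew's monotone chain. The function names `discard_interior` and `convex_hull` are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def discard_interior(points):
    """Drop points strictly inside the quadrilateral of the four extreme points."""
    left = min(points)
    right = max(points)
    bottom = min(points, key=lambda p: (p[1], p[0]))
    top = max(points, key=lambda p: (p[1], p[0]))
    quad = [left, bottom, right, top]  # counter-clockwise order

    def strictly_inside(p):
        # Strictly left of every directed edge; a degenerate quad discards nothing.
        return all(cross(quad[i], quad[(i + 1) % 4], p) > 0 for i in range(4))

    return [p for p in points if not strictly_inside(p)]


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


points = [(0, 2), (2, 0), (4, 2), (2, 4), (2, 2), (1, 2), (3, 2), (2, 1)]
remaining = discard_interior(points)  # stage 1: cheap filter
hull = convex_hull(remaining)         # stage 2: finalize on the survivors
```

The filter is safe because a point strictly inside a polygon whose corners are input points cannot be a hull vertex, so running the hull on `remaining` gives the same result as running it on `points`.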


HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors
