14498

Posts

Aug, 31

A GPU-accelerated local search algorithm for the Correlation Clustering problem

The solution of the Correlation Clustering (CC) problem can be used as a criterion to measure the amount of balance in signed social networks, where positive (friendly) and negative (antagonistic) interactions take place. Metaheuristics have been used successfully for solving not only this problem, as well as other hard combinatorial optimization problems, since they can […]
Aug, 31

Dynamic Memory Allocation for OpenCL

Heterogeneous systems are computer systems that exploit multiple devices with different processor architectures to improve the computing efficiency by offloading workloads to the device that fits them best. OpenCL is a framework for building portable applications that run across different devices in heterogeneous systems. It has gained traction as a powerful tool for high-performance computing. […]
Aug, 31

Partitioning Large Scale Deep Belief Networks Using Dropout

Deep learning methods have shown great promise in many practical applications, ranging from speech recognition, visual object recognition, to text processing. However, most of the current deep learning methods suffer from scalability problems for large-scale applications, forcing researchers or users to focus on small-scale problems with fewer parameters. In this paper, we consider a well-known […]
Aug, 28

Boosting Java Performance using GPGPUs

Heterogeneous programming has started becoming the norm in order to achieve better performance by running portions of code on the most appropriate hardware resource. Currently, significant engineering efforts are undertaken in order to enable existing programming languages to perform heterogeneous execution mainly on GPUs. In this paper we describe Jacc, an experimental framework which allows […]
Aug, 28

VisPy: Harnessing The GPU For Fast, High-Level Visualization

The growing availability of large, multidimensional data sets has created demand for high-performance, interactive visualization tools. VisPy leverages the GPU to provide fast, interactive, and beautiful visualizations in a high-level API. Here we introduce the main features, architecture, and techniques used in VisPy.
Aug, 28

High-Speed Object Detection: Design, Study and Implementation of a Detection Framework using Channel Features and Boosting

In this thesis we design, implement and study a high-speed object detection framework. Our baseline detector uses integral channel features as object representation and AdaBoost as supervised learning algorithm. We suggest the implementation of two approximation techniques for speeding up the baseline detector and show their effectiveness by performing experiments on both detection quality and […]
Aug, 28

Deep Convolutional Neural Networks for Smile Recognition

This thesis describes the design and implementation of a smile detector based on deep convolutional neural networks. It starts with a summary of neural networks, the difficulties of training them and new training methods, such as Restricted Boltzmann Machines or autoencoders. It then provides a literature review of convolutional neural networks and recurrent neural networks. […]
Aug, 28

A Parallel Algorithm to Test Chordality of Graphs

We present a simple parallel algorithm to test chordality of graphs which is based on the parallel Lexicographical Breadth-First Search algorithm. In total, the algorithm takes time O(N) on N-threads machine and it performs work O(N^2), where N is the number of vertices in a graph. Our implementation of the algorithm uses a GPU environment […]
Aug, 27

CudaChain: A Practical GPU-accelerated 2D Convex Hull Algorithm

This paper presents a practical GPU-accelerated convex hull algorithm and a novel Sorting-based Preprocessing Approach (SPA) for planar point sets. The proposed algorithm consists of two stages: (1) two rounds of preprocessing performed on the GPU and (2) the finalization of calculating the expected convex hull on the CPU. We first discard the interior points […]
Aug, 27

gScan: Accelerating Graham Scan on the GPU

This paper presents a fast implementation of the Graham scan on the GPU. The proposed algorithm is composed of two stages: (1) two rounds of preprocessing performed on the GPU and (2) the finalization of finding the convex hull on the CPU. We first discard the interior points that locate inside a quadrilateral formed by […]
Aug, 27

Adaptive Multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

The study of disordered spin systems through Monte Carlo simulations has proven to be a hard task due to the adverse energy landscape present at the low temperature regime, making it difficult for the simulation to escape from a local minimum. Replica based algorithms such as the Exchange Monte Carlo (also known as parallel tempering) […]
Aug, 27

Accelerated Deep Learning using Intel Xeon Phi

Deep learning, a sub-topic of machine learning inspired by biology, have achieved wide attention in the industry and research community recently. State-of-the-art applications in the area of computer vision and speech recognition (among others) are built using deep learning algorithms. In contrast to traditional algorithms, where the developer fully instructs the application what to do, […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org