## Design, Implementation and Performance Evaluation of a Stochastic Gradient Descent Algorithm on CUDA

Stochastic Gradient Descent, a stochastic optimization of Gradient Descent, is an algorithm that is used in different topics, like for example for linear regression or logistic regression. After the Netflix prize, SGD start to be used also in recommender systems to compute matrix factorization. Considering the large amounts of data that this kind of system […]

November 29, 2015 by hgpu

Segmentation of histopathology sections is an ubiquitous requirement in digital pathology and due to the large variability of biological tissue, machine learning techniques have shown superior performance over standard image processing methods. As part of the GlaS@MICCAI2015 colon gland segmentation challenge, we present a learning-based algorithm to segment glands in tissue of benign and malignant […]

November 29, 2015 by hgpu

Today, there is no one who disagrees on how important data is in every industry especially in enterprise market. More recently, the key point that decides the survival of a business is the management of their big data, which is defined by the 3V’s: Volume, Velocity, and Variety [1]. While the rate of data generation […]

November 25, 2015 by hgpu

We present a parallel algorithm for finding the shortest path whose total weight is smaller than a pre-determined value. The passage times over the edges are assumed to be positive integers. In each step the processing elements are not analyzing the entire graph. Instead they are focusing on a subset of vertices called active vertices. […]

November 24, 2015 by hgpu

This paper describes a method for accelerating large scale Artificial Neural Networks (ANN) training using multi-GPUs by reducing the forward and backward passes to matrix multiplication. We propose an out-of-core multi-GPU matrix multiplication and integrate the algorithm with the ANN training. The experiments demonstrate that our matrix multiplication algorithm achieves linear speedup on multiple inhomogeneous […]

November 20, 2015 by hgpu

Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store […]

November 20, 2015 by hgpu

We present a novel, GPU-accelerated per-pixel inverse rendering (IR) optimization algorithm based on Particle Swarm Optimization (PSO), IRPSO. IRPSO estimates the per-pixel scene attributes including reflectance properties of a 3D model, and is fast enough to do in situ visualization of the optimization in real-time. We utilize the GPU framebuffer as a computational domain, where […]

November 20, 2015 by hgpu

This paper focuses on the design and implementing of GPU-accelerated Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm. The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the spatial points distribution pattern and achieve more accurate predictions than those by IDW. In this paper, we first […]

November 13, 2015 by hgpu

In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards, on dual Intel Xeon CPUs and the Intel Xeon Phi (Knights Corner) platforms. All of our […]

November 12, 2015 by hgpu

In our study we implemented and compared seven sequential and parallel sorting algorithms: bitonic sort, multistep bitonic sort, adaptive bitonic sort, merge sort, quicksort, radix sort and sample sort. Sequential algorithms were implemented on a central processing unit using C++, whereas parallel algorithms were implemented on a graphics processing unit using CUDA platform. We chose […]

November 12, 2015 by hgpu

Accelerators like Intel Xeon Phi aim to fulfill the computational requirements of modern applications. A particular interest to us are those applications that are based on Stencil Computations. Stencils are finite-difference algorithms used in many scientific and engineering applications for solving large-scale and high-dimension partial differential equations. Programmability on massively parallel architectures of such kernels […]

November 11, 2015 by hgpu

In such an exciting age of information explosion, huge amount of visual data are produced continuously 24 hours, 7 days in both daily life and scientific research. Processing and storage of such a huge amount of data forms big challenges. Use of supercomputers tackles the need-for-speed challenge partially, but is blocked by its high cost […]

November 10, 2015 by hgpu