15012

Posts

Nov, 29

A Problem-Based Learning Approach to GPU Computing

Compared to CPUs, modern GPUs exhibit a high ratio of computing performance per watt, and so current supercomputer designs often include multiple racks of GPUs in order to achieve high teraflop counts at minimal energy cost. GPU programming is thus becoming increasingly important, and yet it remains a challenging task. This paper describes a course […]
Nov, 29

Orchestrating Multiple Data-Parallel Kernels on Multiple Devices

Traditionally, programmers and software tools have focused on mapping a single data-parallel kernel onto a heterogeneous computing system consisting of multiple general-purpose processors (CPUS) and graphics processing units (GPUs). These methodologies break down as application complexity grows to contain multiple communicating data-parallel kernels. This paper introduces MKMD, an automatic system for mapping multiple kernels across […]
Nov, 29

Reordering GPU Kernel Launches to Enable Efficient Concurrent Execution

Contemporary GPUs allow concurrent execution of small computational kernels in order to prevent idling of GPU resources. Despite the potential concurrency between independent kernels, the order in which kernels are issued to the GPU will significantly influence the application performance. A technique for deriving suitable kernel launch orders is therefore presented, with the aim of […]
Nov, 29

Design, Implementation and Performance Evaluation of a Stochastic Gradient Descent Algorithm on CUDA

Stochastic Gradient Descent, a stochastic optimization of Gradient Descent, is an algorithm that is used in different topics, like for example for linear regression or logistic regression. After the Netflix prize, SGD start to be used also in recommender systems to compute matrix factorization. Considering the large amounts of data that this kind of system […]
Nov, 29

Semantic Segmentation of Colon Glands with Deep Convolutional Neural Networks and Total Variation Segmentation

Segmentation of histopathology sections is an ubiquitous requirement in digital pathology and due to the large variability of biological tissue, machine learning techniques have shown superior performance over standard image processing methods. As part of the GlaS@MICCAI2015 colon gland segmentation challenge, we present a learning-based algorithm to segment glands in tissue of benign and malignant […]
Nov, 25

Acceleration of Agent-Based Pandemic Modeling on Multiple GPUs

Epidemiology computation models are crucial for the assessment and control of public health crises. Agent-based simulations of pandemic influenza are useful for forecasting the infectious disease spreading in order to help public health policy makers during emergencies. In such emergencies decisions are required for public health preparedness in cycles of less than a day, and […]
Nov, 25

Efficient Resource Sharing Through GPU Virtualization on Accelerated High Performance Computing Systems

The High Performance Computing (HPC) field is witnessing a widespread adoption of Graphics Processing Units (GPUs) as co-processors for conventional homogeneous clusters. The adoption of prevalent Single-Program Multiple-Data (SPMD) programming paradigm for GPU-based parallel processing brings in the challenge of resource underutilization, with the asymmetrical processor/co-processor distribution. In other words, under SPMD, balanced CPU/GPU distribution […]
Nov, 25

Optimization of a Machine Learning Algorithm on the Heterogeneous system using OpenCL

Today, there is no one who disagrees on how important data is in every industry especially in enterprise market. More recently, the key point that decides the survival of a business is the management of their big data, which is defined by the 3V’s: Volume, Velocity, and Variety [1]. While the rate of data generation […]
Nov, 25

GPU-based Acceleration of Deep Convolutional Neural Networks on Mobile Platforms

Mobile applications running on wearable devices and smartphones can greatly benefit from accurate and scalable deep CNN-based machine learning algorithms. While mobile CPU performance does not match the intensive computational requirement of deep CNNs, the embedded GPU which already exists in many mobile platforms can be leveraged for acceleration of CNN computations on the local […]
Nov, 25

Pulsar Acceleration Searches on the GPU for the Square Kilometre Array

Pulsar acceleration searches are methods for recovering signals from radio telescopes, that may otherwise be lost due to the effect of orbital acceleration in binary systems. The vast amount of data that will be produced by next generation instruments such as the Square Kilometre Array (SKA) necessitates real-time acceleration searches, which in turn requires the […]
Nov, 24

Learning Representation for Scene Understanding: Epitomes, CRFs, and CNNs

Scene understanding, such as image classification and semantic image segmentation, has been a challenging problem in computer vision. The difficulties mainly come from the feature representation, i.e., how to find a good representation for images. Instead of improving over hand-crafted features such as SIFT or HoG, we focus on learning image representations by generative and […]
Nov, 24

A parallel algorithm for the constrained shortest path problem on lattice graphs

We present a parallel algorithm for finding the shortest path whose total weight is smaller than a pre-determined value. The passage times over the edges are assumed to be positive integers. In each step the processing elements are not analyzing the entire graph. Instead they are focusing on a subset of vertices called active vertices. […]

Recent source codes

* * *

* * *

HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors

Contact us: