Oct, 4

Deep Dynamic Neural Networks for Gesture Segmentation and Recognition

The purpose of this paper is to describe a novel method called Deep Dynamic Neural Networks (DDNN) for Track 3 of the Chalearn Looking at People 2014 challenge [1]. A generalised semi-supervised hierarchical dynamic framework is proposed for simultaneous gesture segmentation and recognition, taking both skeleton and depth images as input modules. First, Deep Belief […]
Oct, 4

GPU Accelerated Radio Wave Propagation Modeling Using Ray Tracing

Radar producers, most of which are in the defense industry, need a radar environment simulator to test their products during development. Such a simulator lets them avoid costly field tests. To develop a radar environment simulator, radio wave propagation must be modeled. However, this is a computationally expensive and time-consuming […]
Oct, 4

Parallel Shortest Path Algorithm for Voronoi Diagrams with Generalized Distance Functions

Voronoi diagrams are fundamental data structures in computational geometry with applications in many areas. Recent soft object simulation algorithms for real-time physics engines require the computation of Voronoi diagrams over 3D images with non-Euclidean distances. In this case, the computation must be performed over a graph, where the edges encode the required distance information. […]
Oct, 4

Teaching Parallel Programming Using Java

This paper presents an overview of the "Applied Parallel Computing" course taught to final year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared and distributed memory concurrent systems. A unique aspect of the course was that […]
Oct, 4

A massively parallel algorithm for constructing the BWT of large string sets

We present a new scalable, lightweight algorithm to incrementally construct the BWT and FM-index of large string sets such as those produced by Next Generation Sequencing. The algorithm is designed for massive parallelism and can effectively exploit the combination of low-capacity, high-bandwidth memory and slower external system memory typical of GPU-accelerated systems. […]
Oct, 3

Fast Automatic Heuristic Construction Using Active Learning

Building effective optimization heuristics is a challenging task which often takes developers several months if not years to complete. Predictive modelling has recently emerged as a promising solution, automatically constructing heuristics from training data. However, obtaining this data can take months per platform. This is becoming an ever more critical problem and if no solution […]
Oct, 3

Microarchitectural Performance Characterization of Irregular GPU Kernels

GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular memory access patterns and control flow. However, relatively little is known about the behavior of irregular GPU codes, and there has been minimal effort to quantify the ways in which they differ from regular GPGPU applications. We examine the behavior of […]
Oct, 3

Stable large-scale solver for Ginzburg-Landau equations for superconductors

Understanding the interaction of vortices with inclusions in type-II superconductors is a major outstanding challenge both for fundamental science and energy applications. At application-relevant scales, the long-range interactions between a dense configuration of vortices and the dependence of their behavior on external parameters, such as temperature and an applied magnetic field, are all important to […]
Oct, 3

A fast GPU-based Monte Carlo simulation of proton transport with detailed modeling of non-elastic interactions

Purpose: Very fast Monte Carlo (MC) simulations of proton transport have been implemented recently on GPUs. However, these usually use simplified models for non-elastic (NE) proton-nucleus interactions. Our primary goal is to build a GPU-based proton transport MC with detailed modeling of elastic and NE collisions. Methods: Using CUDA, we implemented GPU kernels for these […]
Oct, 3

A stencil-based implementation of Parareal in the C++ domain specific embedded language STELLA

In view of the rapid rise of the number of cores in modern supercomputers, time-parallel methods that introduce concurrency along the temporal axis are becoming increasingly popular. For the solution of time-dependent partial differential equations, these methods can add another direction for concurrency on top of spatial parallelization. The paper presents an implementation of the […]
Oct, 2

Fast Estimation of Gaussian Mixture Model Parameters on GPU using CUDA

Gaussian Mixture Models (GMMs) are widely used by scientists, e.g. in statistics toolkits and data mining procedures. To estimate the parameters of a GMM, Maximum Likelihood (ML) training is often used, more precisely the Expectation-Maximization (EM) algorithm. Nowadays, many tasks work with huge datasets, which makes the estimation process time-consuming […]
Sep, 30

Parallel QuadTree Encoding of Large-Scale Raster Geospatial Data on Multicore CPUs and GPGPUs

Global remote sensing and large-scale environment modeling have generated vast amounts of raster geospatial images. To gain a better understanding of this data, researchers are interested in performing spatial queries over them, and the computation of those queries’ results is greatly facilitated by the existence of spatial indices. Additionally, though there have been major advances […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes, which together offer three AMD graphics processing units and one nVidia graphics processing unit. There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). Terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
