Posts

May, 5

Coherent Photon Mapping on the Intel MIC Architecture

Photon mapping is a global illumination algorithm composed of two steps: photon tracing and photon searching. During the photon searching step, each shading point needs to search the photon-tree to find the k neighbouring photons for reflected radiance estimation. As the number of shading points and the size of the photon-tree are very large, the photon searching […]
May, 5

GPU Accelerated Real-Time Collision Handling in Virtual Disassembly

Previous collision detection methods for virtual disassembly mainly detect collisions at discrete time intervals and use oriented bounding boxes to speed up the process. However, these discrete methods cannot guarantee that no penetration occurs while the components are moving. Meanwhile, because some of the components are embedded into each other, these components cannot be separated in the subsequent […]
May, 5

Workload Aware Algorithms for Heterogeneous Platforms

Algorithms that aim to run simultaneously on a heterogeneous collection of devices on a commodity platform have been a recent research focus. On such platforms, individual devices can have very different architectures, clock rates, and execution models. Hence, one of the fundamental challenges in designing and implementing such algorithms is to identify load balancing mechanisms […]
May, 5

PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

This paper proposes a novel approach to reduce the computational cost of evaluation of convolutional neural networks, a factor that has hindered their deployment in low-power devices such as mobile phones. Our method is inspired by the loop perforation technique from source code optimization and accelerates the evaluation of bottleneck convolutional layers by exploiting the […]
May, 3

IPMACC: Translating OpenACC API to OpenCL

In this paper, we introduce IPMACC, a framework for executing OpenACC for C applications over the OpenCL runtime. We use our framework to compare the performance of OpenACC and OpenCL. OpenACC API abstractions remove low-level control from programmers' hands. To understand the low-level OpenCL optimizations that are not applicable in OpenACC, we compare highly-optimized OpenCL and […]
May, 3

Efficient Implementation of Bi-directional Path Tracer on GPU

Most implementations of photo-realistic image rendering use standard unidirectional path tracing, which gives fast and accurate results for scenes without caustics or hard cases. These hard cases are usually solved by a bidirectional path tracing algorithm. However, due to the complexity of bidirectional path tracing algorithms, their implementations almost exclusively target sequential CPUs. […]
May, 3

Fine-Grained Synchronizations and Dataflow Programming on GPUs

The last decade has witnessed the emergence of many-core platforms, especially graphics processing units (GPUs). With the exponential growth of cores in GPUs, utilizing them efficiently becomes a challenge. The data-parallel programming model assumes a single instruction stream for multiple concurrent threads (SIMT); therefore little support is offered to enforce thread ordering and […]
May, 3

Massively Parallel kNN using CUDA on Spam-Classification

Email Spam-classification is a fundamental, unseen element of everyday life. As email communication becomes more prolific, and email systems become more robust, it becomes increasingly necessary for Spam-classification systems to run accurately and efficiently while remaining all but invisible to the user. We propose a massively parallel implementation of Spam-classification using the k-Nearest Neighbors (kNN) […]
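For readers unfamiliar with kNN, the classification rule named in this abstract can be sketched sequentially as below. This is only a plain reference version under stated assumptions, not the authors' CUDA implementation: the function name `knn_classify`, the two-dimensional feature vectors, and the labels are all hypothetical and chosen for illustration.

```python
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; uses squared Euclidean distance.
    """
    # Distance from the query to every training point, paired with its label.
    dists = [
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels.
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors (assumption: two numeric features per message).
train_points = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1), (0.85, 0.7)]
train_labels = ["spam", "spam", "ham", "ham", "spam"]

label_a = knn_classify(train_points, train_labels, (0.7, 0.9), k=3)
label_b = knn_classify(train_points, train_labels, (0.15, 0.15), k=3)
```

The distance computation for each training point is independent of the others, which is what makes the method a natural candidate for a massively parallel implementation.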
May, 3

PyTransit: Fast and Easy Exoplanet Transit Modelling in Python

We present a fast and user-friendly exoplanet transit light curve modelling package PyTransit, implementing optimised versions of the Giménez and the Mandel & Agol transit models. The package offers an object-oriented Python interface to access the two models implemented natively in Fortran with OpenMP parallelisation. A partial OpenCL version of the quadratic Mandel-Agol model […]
Apr, 27

Parallel Genetic Algorithms on a GPU to Solve the Travelling Salesman Problem

The implementation of parallel genetic algorithms on a graphics processing unit (GPU) to solve instances of the Travelling Salesman Problem is presented. Two versions of parallel genetic algorithms are implemented, a Parallel Genetic Algorithm with Islands Model and a Parallel Genetic Algorithm with Elite Island; the two versions were executed on a GPU. In both cases, each […]
Apr, 27

Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of […]
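As background for the CSR format mentioned in this abstract, a minimal sequential SpMV over CSR arrays might look like the sketch below. It is a baseline row-by-row loop, not the speculative segmented sum algorithm the paper proposes; the names `csr_spmv`, `row_ptr`, `col_idx`, and `values` are conventional but chosen here for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row i with x over its nonzeros only.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix (row 1 is empty):
# A = [[1, 0, 2],
#      [0, 0, 0],
#      [0, 3, 4]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 1.0, 1.0]

result = csr_spmv(row_ptr, col_idx, values, x)
```

Because rows can hold very different numbers of nonzeros, assigning one row per thread in this simple scheme tends to give uneven work, which is one reason more elaborate CSR-based SpMV algorithms are studied.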
Apr, 27

GPU Accelerated framework for financial nested simulations

In this thesis we present a state-of-the-art approach to accelerate Monte Carlo valuations of embedded options. Due to regulations and improved risk management, nested simulations (scenarios in scenarios) are becoming increasingly important for institutional investors such as insurance companies, pension funds and housing corporations. Preferably one wishes to use a framework in which multiple related problems […]

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
