Feb, 2

Performance Analysis and Optimization of Hermite Methods on NVIDIA GPUs Using CUDA

In this thesis we present what is, to our knowledge, the first implementation and performance analysis of Hermite methods on GPU-accelerated systems. We give analytic background for Hermite methods; give implementations of the Hermite methods on traditional CPU systems as well as on GPUs; give the reader background on basic CUDA programming for GPUs; discuss performance […]
Feb, 2

Reliable Initialization of GPU-enabled Parallel Stochastic Simulations Using Mersenne Twister for Graphics Processors

Parallel stochastic simulations tend to exploit more and more computing power, and they are now also being developed for General-Purpose Graphics Processing Units (GP-GPUs). Consequently, they need reliable random sources to feed their applications. We present a survey of the current Pseudorandom Number Generators (PRNGs) available on GPUs. We give a particular focus to […]
Feb, 2

Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model

We present a library for parallel block-sparse matrix-matrix multiplication on distributed memory clusters. The library is based on the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)]. By using this model, we, acting as matrix library developers, do not have to deal explicitly with the distribution of work and data or with communication between computational nodes […]
Feb, 2

Montblanc: GPU accelerated Radio Interferometer Measurement Equations in support of Bayesian Inference for Radio Observations

We present Montblanc, a GPU implementation of the Radio interferometer measurement equation (RIME) in support of the Bayesian inference for radio observations (BIRO) technique. BIRO uses Bayesian inference to select sky models that best match the visibilities observed by a radio interferometer. To accomplish this, BIRO evaluates the RIME multiple times, varying sky model parameters […]
Feb, 1

Optimized Data Transfers Based on the OpenCL Event Management Mechanism

In standard OpenCL programming, hosts such as CPUs are supposed to control their compute devices such as GPUs. Since compute devices are dedicated to kernel computation, only hosts can execute several kinds of data transfers such as inter-node communication and file access. These data transfers require one host to simultaneously play two or more roles […]
Feb, 1

In-Memory Data Analytics on Coupled CPU-GPU Architectures

In the big data era, in-memory data analytics is an effective means of achieving high performance data processing and realizing the value of data in a timely manner. Efforts in this direction have been spent on various aspects, including in-memory algorithmic designs and system optimizations. In this paper, we propose to develop the next-generation in-memory […]
Feb, 1

Mascar: Speeding up GPU Warps by Reducing Memory Pitstops

With the prevalence of GPUs as throughput engines for data-parallel workloads, the landscape of GPU computing is changing significantly. Non-graphics workloads with high memory intensity and irregular access patterns are frequently targeted for acceleration on GPUs. While GPUs provide large numbers of compute resources, the resources needed for memory-intensive workloads are scarcer. […]
Feb, 1

Productive and Efficient Computational Science Through Domain-specific Abstractions

In an ideal world, scientific applications are computationally efficient, maintainable and composable and allow scientists to work very productively. We argue that these goals are achievable for a specific application field by choosing suitable domain-specific abstractions that encapsulate domain knowledge with a high degree of expressiveness. This thesis demonstrates the design and composition of domain-specific […]
Feb, 1

Performance Analysis and Optimization of a Distributed Processing Framework for Data Mining Accelerated with Graphics Processing Units

Today, huge amounts of data are generated every day by human interactions with services. Discovering patterns in these data is very important for making business decisions. Because of their size, processing these data requires very high computational power. Thus, many frameworks have been developed using Central Processing Units (CPU) […]
Jan, 30

On Vectorization of Deep Convolutional Neural Networks for Vision Tasks

We have recently witnessed many ground-breaking results in machine learning and computer vision, generated by using deep convolutional neural networks (CNNs). While the success mainly stems from the large volume of training data and the deep network architectures, the vector processing hardware (e.g. GPU) undisputedly plays a vital role in modern CNN implementations to support […]
Jan, 30

OpenCL Implementation of LiDAR Data Processing

When designing a safety system, the faster the response time, the better the system reacts to hazards. As commercial interest in autonomous and assisted vehicles grows, the number one concern is safety. If the system cannot react as fast as or faster than an average human, then the public will deem it […]
Jan, 30

Different Optimization Strategies and Performance Evaluation of Reduction on Multicore CUDA Architecture

The objective of this paper is to apply different optimization strategies on a multicore GPU architecture. For performance evaluation, we use the parallel reduction algorithm. GPU on-chip shared memory is much faster than local and global memory: shared memory latency is roughly 100x lower than non-cached global memory (make sure that there are no bank […]
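
Parallel reduction on a GPU is commonly organized as a tree: each thread block loads a tile into shared memory, then repeatedly halves the number of active threads, with thread `tid` adding the element `stride` positions away into its own slot. The following is a minimal sequential Python emulation of that pattern, not the paper's code; it assumes a power-of-two block size, and the names `BLOCK_SIZE`, `block_reduce` and `reduce_sum` are hypothetical. In a real CUDA kernel the inner loop runs across threads, with a `__syncthreads()` barrier between steps. The halving-stride ("sequential addressing") variant shown here is one of the standard optimization strategies, because the active threads access contiguous shared-memory words and so avoid bank conflicts.

```python
from typing import List, Sequence

# Hypothetical block size; a power of two so the stride can halve down to 1.
BLOCK_SIZE = 256


def block_reduce(data: Sequence[float], offset: int, block_size: int = BLOCK_SIZE) -> float:
    """Emulate one thread block reducing data[offset:offset + block_size]."""
    # Load phase: "thread" tid copies one element into the simulated shared
    # memory array, padding with 0.0 past the end of the input.
    shared: List[float] = [
        data[offset + tid] if offset + tid < len(data) else 0.0
        for tid in range(block_size)
    ]
    # Tree phase with sequential addressing: active threads tid < stride add
    # the element `stride` positions away. On a GPU, __syncthreads() would
    # separate consecutive steps.
    stride = block_size // 2
    while stride > 0:
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    # Thread 0 now holds the block's partial sum.
    return shared[0]


def reduce_sum(data: Sequence[float], block_size: int = BLOCK_SIZE) -> float:
    """Combine the per-block partial sums, as a second pass or the host would."""
    return sum(
        block_reduce(data, offset, block_size)
        for offset in range(0, len(data), block_size)
    )


if __name__ == "__main__":
    values = list(range(1, 1001))
    print(reduce_sum(values))  # sum of 1..1000
```

On the device, the per-block partial sums would be written to global memory and combined by a second kernel launch or on the host; the sketch folds that step into `reduce_sum`.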

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of computing time per run on two nodes equipped with AMD and nVidia graphics processing units. There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL projects should be uploaded via the User dashboard (see the instructions and example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
