Feb, 5

Fast Fourier Transforms over Prime Fields of Large Characteristic and their Implementation on Graphics Processing Units

Prime field arithmetic plays a central role in computer algebra and supports computation in Galois fields which are essential to coding theory and cryptography algorithms. The prime fields that are used in computer algebra systems, in particular in the implementation of modular methods, are often of small characteristic, that is, based on prime numbers that […]
Feb, 5

Clustering Throughput Optimization on the GPU

Large datasets in astronomy and geoscience often require clustering and visualizations of phenomena at different densities and scales in order to generate scientific insight. We examine the problem of maximizing clustering throughput for concurrent dataset clustering in spatial dimensions. We introduce a novel hybrid approach that uses GPUs in conjunction with multicore CPUs for algorithmic […]
Feb, 5

GraviDy: a GPU modular, parallel N-body integrator

A wide variety of outstanding problems in astrophysics involve the motion of a large number of particles ($N \gtrsim 10^{6}$) under the force of gravity. These include the global evolution of globular clusters, tidal disruptions of stars by a massive black hole, the formation of protoplanets and the detection of sources of gravitational radiation. The direct-summation […]
Feb, 2

Analysis and implementation of a BLAST-Like algorithm for MIC architectures

Sequence alignment is becoming increasingly important in our current day and age, and with the rise of coprocessors, it is important to adapt sequence alignment algorithms to the new architecture. SIMD parallelization has previously been used to implement alignment algorithms efficiently, such as SWIPE, described by Rognes in 2011. The Intel Xeon […]
Feb, 2

MPI-GPU parallelism in iterative eigensolvers for block-tridiagonal matrices

We consider the computation of a few eigenpairs of a generalized eigenvalue problem $Ax = \lambda Bx$ with block-tridiagonal matrices, not necessarily symmetric, in the context of Krylov methods. In this kind of computation, it is often necessary to solve a linear system of equations in each iteration of the eigensolver, for instance when B […]
Feb, 2

Efficient Neural Network Acceleration on GPGPU using Content Addressable Memory

Recently, neural networks have been demonstrated to be effective models for image processing, video segmentation, speech recognition, computer vision and gaming. However, high energy computation and low performance are the primary bottlenecks of running the neural networks. In this paper, we propose an energy/performance-efficient network acceleration technique on General Purpose GPU (GPGPU) architecture which utilizes […]
Feb, 2

Optimum Application Deployment Technology for Heterogeneous IaaS Cloud

Recently, cloud systems composed of heterogeneous hardware have become more common as a way to exploit advances in hardware performance. However, programming applications for heterogeneous hardware to achieve high performance requires considerable technical skill and is difficult for users. Therefore, to achieve high performance easily, this paper proposes a PaaS which analyzes application logic and offloads computations to GPU […]
Feb, 2

Autotuning GPU Kernels via Static and Predictive Analysis

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting GPU models […]
Jan, 31

CFP: Fifth International Workshop on OpenCL (IWOCL 2017) – EXTENDED

Now in its fifth year, the International Workshop on OpenCL (IWOCL) will be hosted by The University of Toronto, Canada, at the Bahen Centre on May 16th-18th 2017. May 16th sees two activities: an Advanced Hands On OpenCL tutorial and a SYCL workshop, while May 17th and 18th will include a mix of keynotes, […]
Jan, 26

Parallel Implementations of the Cholesky Decomposition on CPUs and GPUs

As Central Processing Units (CPUs) and Graphical Processing Units (GPUs) get progressively better, different approaches and designs for implementing algorithms with high data load must be studied and compared. This work compares several different algorithm designs and parallelization APIs (such as OpenMP, OpenCL and CUDA) for both CPU and GPU platforms. We used the Cholesky […]
Jan, 26

Accelerating Workloads on FPGAs via OpenCL: A Case Study with OpenDwarfs

For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance comes at the expense of programmability. FPGA developers use hardware design languages (HDLs) to implement […]
Jan, 26

Deep Convolutional Network evaluation on the Intel Xeon Phi: Where Subword Parallelism meets Many-Core

With a sharp decline in camera cost and size along with superior computing power available at increasingly low prices, computer vision applications are becoming ever present in our daily lives. Research shows that Convolutional Neural Networks (ConvNet) can outperform all other methods for computer vision tasks (such as object detection) in terms of accuracy and […]

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
