
Posts

Feb, 5

Critical Comparison of the Classification Ability of Deep Convolutional Neural Network Frameworks with Support Vector Machine Techniques in the Image Classification Process

Recently, a number of new image classification models have been developed to diversify the number of options available to prospective machine learning classifiers, such as Deep Learning. This is particularly important in the field of medical image classification as a misdiagnosis could have a severe impact on the patient. However, an assessment on the level […]
Feb, 5

Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters

Apache Spark is an in-memory data processing system that supports both SQL queries and advanced analytics over large data sets. In this paper, we present our design and implementation of Spark-GPU that enables Spark to utilize GPU’s massively parallel processing ability to achieve both high performance and high throughput. Spark-GPU transforms a general-purpose data processing […]
Feb, 5

Fast Fourier Transforms over Prime Fields of Large Characteristic and their Implementation on Graphics Processing Units

Prime field arithmetic plays a central role in computer algebra and supports computation in Galois fields which are essential to coding theory and cryptography algorithms. The prime fields that are used in computer algebra systems, in particular in the implementation of modular methods, are often of small characteristic, that is, based on prime numbers that […]
Feb, 5

Clustering Throughput Optimization on the GPU

Large datasets in astronomy and geoscience often require clustering and visualizations of phenomena at different densities and scales in order to generate scientific insight. We examine the problem of maximizing clustering throughput for concurrent dataset clustering in spatial dimensions. We introduce a novel hybrid approach that uses GPUs in conjunction with multicore CPUs for algorithmic […]
Feb, 5

GraviDy: a GPU modular, parallel N-body integrator

A wide variety of outstanding problems in astrophysics involve the motion of a large number of particles ($N \gtrsim 10^{6}$) under the force of gravity. These include the global evolution of globular clusters, tidal disruptions of stars by a massive black hole, the formation of protoplanets and the detection of sources of gravitational radiation. The direct-summation […]
Feb, 2

Analysis and implementation of a BLAST-Like algorithm for MIC architectures

Sequence alignment is becoming increasingly important in our current day and age, and with the rise of coprocessors, it is important to adapt sequence alignment algorithms to the new architecture. Parallelization using SIMD technology has previously been achieved that implement alignment algorithms efficiently such as SWIPE, described by Rognes in 2011. The Intel Xeon […]
Feb, 2

MPI-GPU parallelism in iterative eigensolvers for block-tridiagonal matrices

We consider the computation of a few eigenpairs of a generalized eigenvalue problem $Ax = \lambda Bx$ with block-tridiagonal matrices, not necessarily symmetric, in the context of Krylov methods. In this kind of computation, it is often necessary to solve a linear system of equations in each iteration of the eigensolver, for instance when B […]
Feb, 2

Efficient Neural Network Acceleration on GPGPU using Content Addressable Memory

Recently, neural networks have been demonstrated to be effective models for image processing, video segmentation, speech recognition, computer vision and gaming. However, high energy computation and low performance are the primary bottlenecks of running the neural networks. In this paper, we propose an energy/performance-efficient network acceleration technique on General Purpose GPU (GPGPU) architecture which utilizes […]
Feb, 2

Optimum Application Deployment Technology for Heterogeneous IaaS Cloud

Recently, cloud systems composed of heterogeneous hardware have become more common as a way to exploit advances in hardware performance. However, programming applications for heterogeneous hardware to achieve high performance requires considerable technical skill and is difficult for users. Therefore, to achieve high performance easily, this paper proposes a PaaS which analyzes application logics and offloads computations to GPU […]
Feb, 2

Autotuning GPU Kernels via Static and Predictive Analysis

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting GPU models […]
Jan, 31

CFP: Fifth International Workshop on OpenCL (IWOCL 2017) – EXTENDED

Now in its fifth year, the International Workshop on OpenCL (IWOCL) will be hosted by The University of Toronto, Canada, at the Bahen Centre on May 16th-18th 2017. May 16th sees two activities: an Advanced Hands On OpenCL tutorial and a SYCL workshop, while May 17th and 18th will include a mix of keynotes, […]
Jan, 26

Parallel Implementations of the Cholesky Decomposition on CPUs and GPUs

As Central Processing Units (CPUs) and Graphical Processing Units (GPUs) get progressively better, different approaches and designs for implementing algorithms with high data load must be studied and compared. This work compares several different algorithm designs and parallelization APIs (such as OpenMP, OpenCL and CUDA) for both CPU and GPU platforms. We used the Cholesky […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
