
Posts

Apr, 27

Implementation and performance analysis of the AXPY, DOT, and SpMV functions on Intel Xeon Phi and NVIDIA Tesla using OpenCL

The present work is an analysis of the performance of the AXPY, DOT, and SpMV functions using OpenCL. The code was tested on the NVIDIA Tesla S2050 GPU and the Intel Xeon Phi 3120A coprocessor. Due to the nature of the AXPY function, only two versions were implemented, the routine to be executed by the CPU and […]
Apr, 25

Algorithm 9xx: Sparse QR Factorization on the GPU

Sparse matrix factorization involves a mix of regular and irregular computation, which is a particular challenge when trying to obtain high performance on the highly parallel general-purpose computing cores available on graphics processing units (GPUs). We present a sparse multifrontal QR factorization method that meets this challenge, and is up to eleven times faster than a […]
Apr, 25

A Study of Scheduling a Neuro-imaging Application On a Heterogeneous CPU-GPU Cluster

The ever-increasing complexity of scientific applications has led to the utilization of new HPC paradigms such as Graphics Processing Units (GPUs). However, modifying applications to run on GPUs is challenging. Furthermore, the speedup achieved by using GPUs has added a huge heterogeneity to HPC clusters. In this dissertation, we enabled NPAIRS, a neuro-imaging application, to […]
Apr, 25

Direct Communication Methods for Distributed GPUs

Today, GPUs and other parallel accelerators are widely used in high performance computing, due to their high computational power and high performance per watt. Still, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. Often, a data transfer between […]
Apr, 25

Flexible Software Profiling of GPU Architectures

To aid application characterization and architecture design space exploration, researchers and engineers have developed a wide range of tools for CPUs, including simulators, profilers, and binary instrumentation tools. With the advent of GPU computing, GPU manufacturers have developed similar tools leveraging hardware profiling and debugging hooks. To date, these tools are largely limited by the […]
Apr, 25

A two-fluid finite-volume solver based on OpenCL

In this paper, we propose a new, very simple numerical method for solving liquid-gas compressible flows on two-dimensional Cartesian meshes. To achieve high performance, the scheme is tested on recent multi-core processors and Graphics Processing Units (GPUs), using the OpenCL environment. We describe how to install and to run the code CLBUBBLE for computing […]
Apr, 25

2nd International conference on Networks and Information Security (ICNIS 2015), 2015

Submission Deadline: 2015-07-10. Topics: Communications, Information and Network Security: Access control; Anti-malware; Anonymity; Applied cryptography; Authentication and authorization; Biometric security; Data and system integrity; Database security; Distributed systems security; Electronic commerce; Fraud control; Grid security; Information hiding and watermarking; Intellectual property protection; Intrusion detection; Key management and key recovery; Language-based security; Operating system security; Network […]
Apr, 25

7th International Conference on Graphic and Image Processing (ICGIP 2015), 2015

Submission Deadline: 2015-07-10. Topics: Image acquisition; Detection and Estimation of Signal Parameters; Image processing; Signal Identification; Medical image processing; Nonlinear Signals and Systems; Pattern recognition and analysis; Time-Frequency Signal Analysis; Visualization; Signal Reconstruction; Image coding and compression; Spectral Analysis; Face Recognition; Filter Design and Structures; Super-resolution imaging; FIR Filters; Image segmentation; IIR Filters; Face recognition […]
Apr, 25

2nd International Conference on Robotics and Computer Vision (ICRCV 2015), 2015

Submission Deadline: 2015-07-10 Topics: • Evolutionary Robotics • Distributed Sensor Networks • Robot Surgery • Search and Rescue Robots • Biorobotics • Humanoid Robotics • Autonomous Vehicles • Entertainment Robots • Rehabilitation Robotics • Micro/Nano Robotics • Underwater Robots • Service Robotics • Sensors and Early Vision • Color and Texture • Segmentation and Grouping […]
Apr, 23

A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as the algebraic multigrid method (AMG), breadth-first search, and the shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse […]
Apr, 23

Multi-swarm PSO algorithm for the Quadratic Assignment Problem: a massive parallel implementation on the OpenCL platform

This paper presents a multi-swarm PSO algorithm for the Quadratic Assignment Problem (QAP) implemented on the OpenCL platform. Our work was motivated by the results of time efficiency tests performed on a single-swarm algorithm implementation, which clearly showed that the benefits of a parallel execution platform can be fully exploited if the processed population is large. The described […]
Apr, 23

A High-resolution approach for Tsunami impact simulation on graphics processing units

Having learned a great deal about the problem and also the solutions over the course of this project, it is the opinion of the author that the method undertaken within this report is unsatisfactory for delivering performance enhancement over alternative approaches. Firstly, the domain transfers result in reduced performance. For larger simulations these prove to […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
