Posts
Apr, 8
A Hardware Multithreaded SpMV Kernel for the Convey HC-2ex
Applications exhibiting irregular behavior through poor memory locality have been a constant challenge for high-performance computing. Architectures supporting hardware multithreading (e.g. Tera MTA and Cray XMT) have been shown to deliver superior performance on such applications by masking memory latency. FPGAs have outperformed traditional architectures on applications that exhibit very large spatial locality and where […]
Apr, 8
Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability
Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches for partitioning work require offline training and generate fixed partitions that fail to respond to fluctuations in device performance that occur at run time. We present a novel dynamic approach to work partitioning that […]
Apr, 8
Development of methods for the processing of mining images using genetic algorithms
In this paper we describe the extension of the FOTOM system's capabilities with respect to the segmentation of specific mining images. We focus on methods that are inherently resistant to the noise present in the experimental pit at VSB Technical University. Here, we describe procedures employing proven active contours and evolutionary algorithms for recognizing points of interest in the […]
Apr, 8
Highly Scalable Multiplication for Distributed Sparse Multivariate Polynomials on Many-core Systems
We present a highly scalable algorithm for multiplying sparse multivariate polynomials represented in a distributed format. This algorithm targets not only shared-memory multicore computers, but also computer clusters or specialized hardware attached to a host computer, such as graphics processing units or many-core coprocessors. The scalability on the large number of […]
Apr, 7
Atomic-free Irregular Computations on GPUs
Atomic instructions are a key ingredient of codes that operate on irregular data structures like trees and graphs. It is well known that atomics can be expensive, especially on massively parallel GPUs, and are often on the critical path of a program. In this paper, we present two high-level methods to eliminate atomics in irregular […]
Apr, 7
Exploring complex quantum systems with a hybrid CPU-GPU computing platform
One of the most striking features of quantum mechanics is that the resources required to find the states of a composite system grow exponentially with the size of the system. This is also the origin of the two main bottlenecks in numerical studies of complex quantum systems, namely (i) diagonalizations of big matrices and […]
Apr, 7
Speed up Large Integer Multiplication Using Fourier Transforms and CUDA Technology
Multiplying large integers is an operation that has many applications in computational science. Many cryptographic algorithms require operations on very large subsets of the integer numbers. Using Fast Fourier Transforms (FFT) and a Graphics Processing Unit (GPU), we can speed up integer multiplication and obtain an effective multiplication algorithm. CUDA technology is used to perform FFT on […]
Apr, 7
Optimizing Sparse Matrix-Matrix Multiplication for the GPU
Sparse matrix-matrix multiplication (SpMM) is a key operation in numerous areas from information to the physical sciences. Implementing SpMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpMM operation into three, […]
Apr, 7
A new CUDA-based GPU implementation of the two-dimensional Athena code
We present a new version of the Athena code, which solves the magnetohydrodynamic equations in two-dimensional space. This new implementation, which we have named Athena-GPU, uses the CUDA architecture to allow the code to execute on a graphics processing unit (GPU). The Athena-GPU code is an unofficial, modified version of the Athena code which was originally designed for Central […]
Apr, 6
23rd Annual International Conference on Computer Science and Software Engineering, CASCON 2013
CASCON 2013 is the 23rd annual international conference hosted by CAS Research, IBM Canada Software Lab. Using the motto, “Innovation that matters”, this conference provides an exciting forum for exchanging ideas and experience in the ever-expanding and critical fields of software engineering and computing. The theme of this year, “Ecosystem of Engagement”, highlights the confluence […]
Apr, 6
OpenCL C++
With the success of programming models such as Khronos’ OpenCL, heterogeneous computing is going mainstream. However, these models are low-level, even when considered as systems programming models. For example, OpenCL is effectively an extended subset of C99, limited to the type-unsafe procedural abstraction that C has provided for more than 30 years. Computer […]
Apr, 6
A Performance Study of Zero Crossing Rate (ZCR) on Graphics Processors (GPUs) Using CUDA
The ability to harness the power of the Graphics Processing Unit (GPU) enables us to show dramatic increases in computing performance using a parallel computing platform and programming model such as NVIDIA CUDA. Compute Unified Device Architecture (CUDA) is NVIDIA's graphics programming API for performing General Purpose Graphics Processing Unit Programming (GPGPU). The General Purpose […]