Posts
Jun, 3
Design Tools for Accelerating Development and Usage of Multi-Core Computing Platforms
Multicore computing technologies are critical to high performance embedded systems. Such technologies are advancing rapidly in terms of the diversity of available multicore platforms, and the scale and heterogeneity of computing resources available on multicore-equipped devices. However, development of high performance signal processing software for multicore computing platforms is a complex process. Due to this […]
Jun, 3
Revisiting Edge and Node Parallelism for Dynamic GPU Graph Analytics
Betweenness Centrality is a widely used graph analytic that has applications such as finding influential people in social networks, analyzing power grids, and studying protein interactions. However, its complexity makes its exact computation infeasible for large graphs of interest. Furthermore, networks tend to change over time, invalidating previously calculated results and encouraging new analyses regarding […]
Jun, 3
Visualization Tool for GPGPU Programming
The running times of some sequential programs could be greatly reduced by converting their parallelizable, time-dominant code and running it on a massively parallel processor architecture. Example application areas include bioinformatics, molecular dynamics, video and image processing, signal and audio processing, medical imaging, and cryptography. A low-cost, low-power parallel computing platform for […]
Jun, 3
Cofactorization on Graphics Processing Units
We show how the cofactorization step, a compute-intensive part of the relation collection phase of the number field sieve (NFS), can be farmed out to a graphics processing unit. Our implementation on a GTX 580 GPU, which is integrated with a state-of-the-art NFS implementation, can serve as a cryptanalytic co-processor for several Intel i7-3770K quad-core […]
Jun, 2
Loo.py: transformation-based code generation for GPUs and CPUs
Today’s highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets […]
Jun, 2
Integrated Modelling of Hydrodynamic Processes, Faecal Indicator Organisms and Related Parameters with Improved Accuracy using Parallel (GPU) Computing
Environmental problems and issues are not limited by artificial boundaries created by man. Usually there are different teams or individuals working on the catchments, estuaries, rivers and coastal basins in different countries using different parameters and formulations for various processes. However, the system is a natural one and as such no boundaries exist. When a […]
Jun, 2
Accelerating NTRU based Homomorphic Encryption using GPUs
In this work we introduce a large polynomial arithmetic library optimized for Nvidia GPUs to support fully homomorphic encryption schemes. To realize the large polynomial arithmetic library we convert the polynomial with large coefficients using the Chinese Remainder Theorem into many polynomials with small coefficients, and then carry out modular multiplications in the residue space […]
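The residue-space idea in this excerpt can be illustrated with a minimal CPU-only sketch. It is not the paper's GPU library: the moduli, the function names (`to_residues`, `polymul_mod`, `crt_combine`, `crt_polymul`), and the schoolbook multiplication are all assumptions chosen for illustration.

```python
from math import prod

# Hypothetical pairwise-coprime moduli (three small primes), chosen for illustration only.
MODULI = [10007, 10009, 10037]

def to_residues(poly, moduli):
    """Split one large-coefficient polynomial into one small-coefficient polynomial per modulus."""
    return [[c % m for c in poly] for m in moduli]

def polymul_mod(a, b, m):
    """Schoolbook polynomial product with every coefficient reduced modulo m."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % m
    return out

def crt_combine(residues, moduli):
    """Recombine per-modulus coefficients into coefficients modulo the product of the moduli."""
    big_m = prod(moduli)
    result = [0] * len(residues[0])
    for res, m in zip(residues, moduli):
        partial = big_m // m
        inverse = pow(partial, -1, m)  # modular inverse (Python 3.8+)
        for k, c in enumerate(res):
            result[k] = (result[k] + c * partial * inverse) % big_m
    return result

def crt_polymul(a, b, moduli=MODULI):
    """Multiply two polynomials by working independently in each residue ring."""
    ra = to_residues(a, moduli)
    rb = to_residues(b, moduli)
    products = [polymul_mod(x, y, m) for x, y, m in zip(ra, rb, moduli)]
    return crt_combine(products, moduli)
```

Each per-modulus product is independent of the others, which is what makes the approach a natural fit for parallel hardware. The recombined result equals the true product only when every true coefficient is smaller than the product of the moduli.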
Jun, 2
Multi-target DPA attacks: Pushing DPA beyond the limits of a desktop computer
Following the pioneering CRYPTO ’99 paper by Kocher et al., differential power analysis (DPA) was initially geared around low-cost computations performed using standard desktop equipment with minimal reliance on device-specific assumptions. In subsequent years, the scope was broadened by, e.g., making explicit use of (approximate) power models. An important practical incentive of doing so is to […]
Jun, 2
Region Templates: Data Representation and Management for Large-Scale Image Analysis
Distributed memory machines equipped with CPUs and GPUs (hybrid computing nodes) are hard to program because of the multiple layers of memory and heterogeneous computing configurations. In this paper, we introduce a region template abstraction for the efficient management of common data types used in analysis of large datasets of high resolution images on clusters […]
Jun, 1
Efficient Implementation of Hyperspectral Anomaly Detection Techniques on GPUs and Multicore Processors
Anomaly detection is an important task for hyperspectral data exploitation. Although many algorithms have been developed for this purpose in recent years, due to the large dimensionality of hyperspectral image data, fast anomaly detection remains a challenging task. In this work, we exploit the computational power of commodity graphics processing units (GPUs) and multicore processors […]
Jun, 1
A Performance Model for the Communication in Fast Multipole Methods on HPC Platforms
Exascale systems are predicted to have approximately one billion cores, assuming gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the current parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. There is therefore an urgent […]
Jun, 1
An implementation of a reordering approach for increasing the product of diagonal entries in a sparse matrix
We present implementation details of a reordering strategy for permuting elements whose absolute value is large to the diagonal of a sparse matrix. This algorithm, based on work by Duff and Koster [9], is a critical component of the SPIKE-based preconditioner provided by the Spike::GPU library [2]. We discuss the four stages required to implement […]