
Posts

Feb, 3

Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors

The QR decomposition with column pivoting (QRP) of a matrix is widely used in applications to reveal numerical rank. The performance of the LAPACK implementation (DGEQP3) of the Householder QRP algorithm is limited by the Level 2 BLAS operations required to update the column norms. In this paper, we propose an implementation of the QRP algorithm using […]
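To show where the column-norm updates arise, here is a minimal pure-Python sketch of Householder QR with column pivoting. It is not DGEQP3 and not the authors' column cyclic implementation. The function name `qrp` and the list-of-rows matrix layout are assumptions for illustration. After each Householder step, the squared norm of every trailing column is downdated by subtracting the square of its new entry in row k. LAPACK additionally recomputes a norm when cancellation makes the downdated value unreliable, and this sketch omits that safeguard.

```python
import math

def qrp(A):
    """Householder QR with column pivoting (illustrative sketch).

    A is a list of m rows of length n. Returns (R, perm), where R is the
    m x n upper-triangular factor and perm is the column permutation, so
    that A[:, perm] = Q R for some orthogonal Q (Q is not formed here).
    """
    m, n = len(A), len(A[0])
    R = [list(map(float, row)) for row in A]
    perm = list(range(n))
    # Squared norms of the not-yet-factored part of each column.
    norms = [sum(R[i][j] ** 2 for i in range(m)) for j in range(n)]
    for k in range(min(m, n)):
        # Pivot: move the column with the largest remaining norm to position k.
        p = max(range(k, n), key=lambda j: norms[j])
        if p != k:
            for row in R:
                row[k], row[p] = row[p], row[k]
            perm[k], perm[p] = perm[p], perm[k]
            norms[k], norms[p] = norms[p], norms[k]
        # Reflector H = I - 2 v v^T / (v^T v) maps R[k:, k] onto alpha * e1.
        x = [R[i][k] for i in range(k, m)]
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha == 0.0:
            continue  # remaining part of this column is already zero
        if x[0] > 0:
            alpha = -alpha  # sign choice avoids cancellation in v[0]
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        for j in range(k, n):
            f = 2.0 * sum(v[i - k] * R[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                R[i][j] -= f * v[i - k]
        for i in range(k + 1, m):
            R[i][k] = 0.0  # store exact zeros below the diagonal
        # Norm downdate: H preserves each trailing column's norm over rows
        # k..m-1, so dropping row k leaves norms[j] - R[k][j]**2.
        for j in range(k + 1, n):
            norms[j] = max(norms[j] - R[k][j] ** 2, 0.0)
    return R, perm
```

On a rank-2 input such as [[1, 2, 3], [4, 5, 6], [7, 8, 9]], the last diagonal entry of R comes out at roundoff level. That small entry is what "rank revealing" refers to.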
Feb, 3

Hybrid CPU-GPU Distributed Framework for Large Scale Mobile Networks Simulation

Most existing packet-level simulation tools are designed for experiments that model small- to medium-scale networks. The main reason for this limitation is the amount of computing power and memory available in a quasi-mono-process simulation environment. To enable efficient packet-level simulation of large-scale scenarios, we introduce a new CPU-GPU co-simulation framework […]
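Packet-level simulators are typically built around a discrete-event core: a priority queue of timestamped events processed in time order. The sketch below is a minimal single-process version of such a core. The class name and the two-hop link model with a fixed per-hop delay are hypothetical. The sketch says nothing about how the authors' CPU-GPU framework distributes events.

```python
import heapq
import itertools

class Simulator:
    """Minimal discrete-event scheduler; queue entries are (time, seq, callback, args)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker: FIFO order for equal timestamps

    def schedule(self, delay, callback, *args):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), callback, args))

    def run(self, until=float("inf")):
        # Pop events in timestamp order and advance the simulated clock.
        while self._queue and self._queue[0][0] <= until:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)

# Hypothetical example: packets hop node 0 -> 1 -> 2 with a fixed 1.5 ms per-hop delay.
HOP_DELAY = 1.5e-3
log = []

def receive(sim, node, pkt):
    log.append((round(sim.now, 6), node, pkt))
    if node < 2:
        sim.schedule(HOP_DELAY, receive, sim, node + 1, pkt)

sim = Simulator()
for i in range(3):
    sim.schedule(i * 1e-3, receive, sim, 0, f"pkt{i}")  # the source emits every 1 ms
sim.run()
```

The sequence counter in each entry ensures that the heap never compares two callbacks when timestamps tie.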
Feb, 3

JPEG 2000 Wireless Image Transmission System using Encryption Domain Authentication

In this paper, we propose a wireless high-resolution video transmission system with encryption and authentication. The proposed system is based on JPEG 2000 coding. We implement the JPEG 2000 coder either on a GPU in CUDA, an integrated development environment for GPUs, or with a JPEG 2000 codec LSI. Moreover, the authentication system can check the […]
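JPEG 2000 Part 1 specifies a reversible integer 5/3 wavelet, computed by lifting, for lossless coding. This transform is one of the stages a GPU coder would parallelize. The following one-dimensional sketch shows the predict and update steps and the exact integer inverse. The function names `fwd53` and `inv53` are illustrative, and the sketch assumes an even-length signal with whole-sample symmetric extension at the borders. It is not the authors' CUDA implementation.

```python
def fwd53(x):
    """Forward reversible 5/3 lifting on an even-length integer list.
    Returns (s, d): low-pass and high-pass coefficients."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("sketch assumes an even length >= 2")
    half = n // 2
    d = []
    for i in range(half):
        # Predict step; symmetric extension maps x[n] to x[n-2].
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    s = []
    for i in range(half):
        # Update step; symmetric extension maps d[-1] to d[0].
        left = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d

def inv53(s, d):
    """Inverse lifting: undo the update, then undo the predict."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x
```

Python's `//` is floor division, which matches the floor operations in the lifting steps. Because each step is undone in reverse order with identical rounding, reconstruction is exact.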
Feb, 3

Fast and Maliciously Secure Two-Party Computation Using the GPU

We describe and implement a maliciously secure protocol for secure two-party computation, based on Yao’s garbled circuit and an efficient OT extension, in a parallel computational model. The implementation is done using CUDA and yields the fastest results for maliciously secure two-party computation in a realistic and practical setting by using a simple consumer-grade […]
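For readers unfamiliar with Yao's construction, the sketch below garbles a single AND gate in the textbook way. Each wire gets two random labels. Each table row encrypts the output label under a hash of one pair of input labels, and the rows are shuffled. SHA-256 as the pad generator and an all-zero validity tag are assumptions for illustration. The sketch is semi-honest only: it has none of the paper's malicious-security machinery or OT extension, and it omits standard optimizations such as point-and-permute.

```python
import hashlib
import os
import random

LABEL_BYTES = 16
TAG = bytes(LABEL_BYTES)  # all-zero tag marks a correctly decrypted row

def _pad(label_a, label_b):
    # 32-byte pad derived from the two input-wire labels.
    return hashlib.sha256(label_a + label_b).digest()

def _xor(u, v):
    return bytes(p ^ q for p, q in zip(u, v))

def garble_and():
    """Garbler side: wire labels (list index = bit value) and a shuffled table."""
    wa = [os.urandom(LABEL_BYTES) for _ in range(2)]
    wb = [os.urandom(LABEL_BYTES) for _ in range(2)]
    wc = [os.urandom(LABEL_BYTES) for _ in range(2)]
    table = [_xor(wc[x & y] + TAG, _pad(wa[x], wb[y])) for x in (0, 1) for y in (0, 1)]
    random.SystemRandom().shuffle(table)  # hide which row matches which inputs
    return wa, wb, wc, table

def evaluate(table, label_a, label_b):
    """Evaluator side: one label per input wire yields one output label."""
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(row, pad)
        if plain[LABEL_BYTES:] == TAG:
            return plain[:LABEL_BYTES]
    raise ValueError("no row decrypted; labels do not belong to this gate")
```

The evaluator learns only the output label, not the bit it encodes. In a full protocol the evaluator obtains its own input labels through oblivious transfer.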
Feb, 2

Software Reliability Enhancements for GPU Applications

As highly parallel accelerators become more important in high-performance computing, so does the need to ensure their reliable operation. In applications where precision and correctness are a necessity, bit-level reliable operation is required. While mechanisms for error detection and correction exist, their cost-effective implementation in massively parallel accelerators is still an […]
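One well-known low-cost error-detection mechanism for dense linear algebra is checksum-based algorithm-based fault tolerance (ABFT), due to Huang and Abraham. The abstract does not say whether the paper uses it, so treat this as a swapped-in illustration. For C = AB, the column sums of C must equal (1^T A)B and the row sums must equal A(B 1). A single corrupted entry therefore shows up as one mismatching row and one mismatching column, which locates it. The function names and the absolute tolerance are assumptions.

```python
def matmul(A, B):
    """Plain dense product of list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def abft_check(A, B, C, tol=1e-9):
    """Return (bad_rows, bad_cols) whose checksums disagree with C = A B.
    For floating-point data, tol should scale with the magnitude of the entries."""
    # Expected column sums of C: (1^T A) B.
    colsum_a = [sum(col) for col in zip(*A)]
    exp_col = [sum(a * b for a, b in zip(colsum_a, col)) for col in zip(*B)]
    # Expected row sums of C: A (B 1).
    rowsum_b = [sum(row) for row in B]
    exp_row = [sum(a * b for a, b in zip(row, rowsum_b)) for row in A]
    bad_cols = [j for j, col in enumerate(zip(*C)) if abs(sum(col) - exp_col[j]) > tol]
    bad_rows = [i for i, row in enumerate(C) if abs(sum(row) - exp_row[i]) > tol]
    return bad_rows, bad_cols
```

The checksums cost O(n^2) extra work against the O(n^3) product, which is why this family of schemes is attractive on accelerators.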
Feb, 2

Portable Performance on Heterogeneous Architectures

Trends in both consumer and high performance computing are bringing not only more cores, but also increased heterogeneity among the computational resources within a single machine. In many machines, one of the greatest computational resources is now their graphics coprocessors (GPUs), not just their primary CPUs. But GPU programming and memory models differ dramatically from […]
Feb, 2

Heterogeneous GPU and CPU acceleration of a finite volume compressible flow solver for multiblock structured grids

The main objective of this project is to investigate the application of heterogeneous acceleration to a finite volume compressible flow solver for multiblock structured grids. Provided as Fortran source code, the ROTORMBMGS computational fluid dynamics program currently uses domain decomposition and message passing to distribute computation across multiple computers. Winning awards for scaling performance, there is […]
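The domain-decomposition pattern mentioned above can be illustrated on a much simpler problem than compressible flow. The sketch uses first-order upwind finite-volume advection, u_t + a u_x = 0 with a > 0, on a periodic 1D grid split into blocks. Each block needs one ghost cell from its left neighbor per step. Copying that cell stands in for the message passing a real multiblock code would perform. The function names and the choice of scalar advection are assumptions, and this is not the ROTORMBMGS scheme.

```python
def upwind_block(u, ghost_left, c):
    """One upwind update of a block; c = a*dt/dx is the CFL number (0 < c <= 1)."""
    out = []
    left = ghost_left
    for ui in u:
        # u_i^{n+1} = u_i - (dt/dx) * (F_{i+1/2} - F_{i-1/2}) with upwind flux F_{i+1/2} = a*u_i
        out.append(ui - c * (ui - left))
        left = ui
    return out

def step_multiblock(blocks, c):
    # "Halo exchange": block b receives the last cell of block b-1 (b = 0 wraps periodically).
    ghosts = [blocks[b - 1][-1] for b in range(len(blocks))]
    return [upwind_block(blk, g, c) for blk, g in zip(blocks, ghosts)]

def step_single(u, c):
    # Single-block reference: the periodic ghost is the domain's own last cell.
    return upwind_block(u, u[-1], c)
```

Because every cell sees the same left neighbor in both versions, the decomposed run reproduces the single-block result bit for bit. The periodic upwind scheme also conserves the total of u up to roundoff.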
Feb, 2

Improving GPGPU Concurrency with Elastic Kernels

Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs actually do not scale to utilize all available resources, with over 30% of resources going unused on average for programs of the Parboil2 […]
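Part of the scaling problem is that a CUDA kernel's work decomposition is usually tied to its launch grid. A common way to decouple the two is to let a smaller physical grid iterate over the logical blocks with a stride equal to the physical grid size, a block-level analogue of the grid-stride loop. The sketch below only simulates that mapping sequentially and checks that every logical block runs exactly once for any physical grid size. `run_elastic` is a hypothetical helper, and this is not necessarily the transformation the paper uses.

```python
from collections import Counter

def run_elastic(kernel, logical_blocks, physical_blocks):
    """Sequentially simulate a launch of `physical_blocks` blocks that together
    cover block IDs 0 .. logical_blocks-1."""
    for pb in range(physical_blocks):
        # Physical block pb handles logical blocks pb, pb + P, pb + 2P, ...
        for lb in range(pb, logical_blocks, physical_blocks):
            kernel(lb)

# Example: 10 logical blocks executed by only 4 physical blocks.
visits = Counter()
run_elastic(lambda lb: visits.update([lb]), logical_blocks=10, physical_blocks=4)
```

Shrinking the physical grid this way frees SM resources for other concurrent kernels without changing what the kernel computes.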
Feb, 2

XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures

Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and accelerators, like GPUs. Programming such nodes is typically based on a combination of OpenMP and CUDA/OpenCL codes; scheduling relies on a static partitioning and cost model. We present the XKaapi runtime system for data-flow task programming on multi-CPU and multi-GPU architectures, which supports […]
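In data-flow task programming, each task declares which data it reads and writes. The runtime then derives dependencies from submission order: read-after-write, write-after-read, and write-after-write. The sketch below derives that graph and produces one valid execution order with Kahn's algorithm. The function names and the assumption of unique task names are hypothetical, and this is not the XKaapi API. A real runtime would instead dispatch ready tasks concurrently to CPU cores and GPUs.

```python
from collections import defaultdict, deque

def build_deps(tasks):
    """tasks: list of (name, reads, writes) in submission order (names unique).
    Returns {name: set of task names it must wait for}."""
    last_writer = {}              # datum -> last task that wrote it
    readers = defaultdict(list)   # datum -> tasks that read it since that write
    deps = {}
    for name, reads, writes in tasks:
        d = set()
        for x in reads:
            if x in last_writer:
                d.add(last_writer[x])   # read-after-write (true dependency)
        for x in writes:
            if x in last_writer:
                d.add(last_writer[x])   # write-after-write
            d.update(readers[x])        # write-after-read
        deps[name] = d
        for x in reads:
            readers[x].append(name)
        for x in writes:
            last_writer[x] = name
            readers[x] = []             # later readers are tracked against this write
    return deps

def topo_order(tasks, deps):
    """Kahn's algorithm; ready tasks are taken in submission order."""
    indeg = {name: len(deps[name]) for name, _, _ in tasks}
    succ = defaultdict(list)
    for name, _, _ in tasks:
        for p in deps[name]:
            succ[p].append(name)
    ready = deque(name for name, _, _ in tasks if indeg[name] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for s in succ[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order
```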
Feb, 1

Embedding OpenCL in GHC Haskell

OpenCL defines a computation model for data-parallel code, supporting compilation to a variety of platforms, including both conventional x86 CPUs and commodity graphics hardware. OpenCL consists of both a programming language for writing data parallel code, called kernels, and an API, written in C, for interacting with the OpenCL platform and invoking OpenCL kernels. We […]
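The OpenCL execution model runs a kernel once per work-item of an index space (NDRange) that is partitioned into work-groups. With a zero global offset, get_global_id(0) equals get_group_id(0) * get_local_size(0) + get_local_id(0). The Python sketch below emulates that indexing sequentially for a one-dimensional vector-add kernel. It only illustrates the execution model, not the embedding described in the paper, and `launch_1d` is a hypothetical name.

```python
def launch_1d(kernel, global_size, local_size, *args):
    """Sequentially emulate a 1-D NDRange launch. As in OpenCL 1.x,
    global_size must be a multiple of local_size."""
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            global_id = group_id * local_size + local_id  # get_global_id(0), zero offset
            kernel(global_id, *args)

def vector_add(gid, a, b, out):
    # Kernel body: each work-item computes one element.
    out[gid] = a[gid] + b[gid]

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
out = [0] * len(a)
launch_1d(vector_add, len(a), 4, a, b, out)
```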
Feb, 1

Efficient Exploitation of Heterogeneous Platforms for Vertebra Detection in X-Ray Images

Back problems are often related to an abnormal condition of the spine. In this context, conventional X-Ray radiography is the most common modality used in emergency rooms since it is relatively inexpensive and fast. In this paper, we are interested in a method for detecting and extracting vertebrae on X-Ray images. In a medical context, […]
Jan, 31

Validation of the PyGBe code for Poisson-Boltzmann equation with boundary element methods

The PyGBe code solves the linearized Poisson-Boltzmann equation using a boundary-integral formulation. We use a boundary element method with a collocation approach, and solve it via a Krylov-subspace method. To do this efficiently, the matrix-vector multiplications in the Krylov iterations are accelerated with a treecode, achieving O(N log N) complexity. The code presents a Python […]
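The solver structure described here is a Krylov method that needs only matrix-vector products. It can be sketched with an unrestarted GMRES, one Krylov method suited to the non-symmetric systems that collocation produces, that takes the product as a callable. In PyGBe that product would be the treecode-accelerated boundary-integral operator; here a small dense matrix stands in for it. The function and parameter names are illustrative assumptions, and the sketch omits restarts and preconditioning.

```python
import math

def gmres(matvec, b, tol=1e-10, max_iter=50):
    """Unrestarted GMRES with x0 = 0. matvec is any callable v -> A v."""
    n = len(b)
    beta = math.sqrt(sum(v * v for v in b))
    if beta == 0.0:
        return [0.0] * n
    V = [[v / beta for v in b]]   # orthonormal Krylov basis
    H = []                        # H[j]: column j of the rotated (upper-triangular) Hessenberg
    cs, sn = [], []               # Givens rotation coefficients
    g = [beta]                    # rotated right-hand side; |g[-1]| is the residual norm
    for j in range(min(max_iter, n)):
        w = matvec(V[j])
        h = []
        for i in range(j + 1):    # Arnoldi step with modified Gram-Schmidt
            hij = sum(a * c for a, c in zip(w, V[i]))
            h.append(hij)
            w = [a - hij * c for a, c in zip(w, V[i])]
        hnext = math.sqrt(sum(a * a for a in w))
        for i in range(j):        # apply earlier rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[j], hnext)   # new rotation eliminates hnext
        c, s = h[j] / r, hnext / r
        cs.append(c)
        sn.append(s)
        h[j] = r
        g.append(-s * g[j])
        g[j] = c * g[j]
        H.append(h)
        if abs(g[j + 1]) < tol * beta or hnext == 0.0:
            break
        V.append([a / hnext for a in w])
    k = len(H)                    # back substitution: R y = g[:k]
    y = [0.0] * k
    for i in range(k - 1, -1, -1):
        y[i] = (g[i] - sum(H[m][i] * y[m] for m in range(i + 1, k))) / H[i][i]
    return [sum(V[m][row] * y[m] for m in range(k)) for row in range(n)]
```

Because the solver only calls `matvec`, replacing the dense product with an O(N log N) treecode leaves the Krylov iteration itself unchanged.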

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org