Posts

Oct, 12

Dandelion: a Compiler and Runtime for Heterogeneous Systems

Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with different programming abstractions and runtimes, programming them remains extremely challenging. Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems […]
Oct, 10

A Parallel Intermediate Representation for Embedded Languages

This thesis presents a parallel intermediate representation for embedded languages called PIRE, and its incorporation into the Feldspar language. The original Feldspar backend translates the parallel loops of Feldspar to ordinary for loops, meaning that they are not actually parallel in the generated code. We create an alternate backend for the Feldspar project, where the […]
Oct, 10

CUDA-Accelerated ODETLAP: A Parallel Lossy Compression Implementation

We present an implementation of Overdetermined Laplacian Partial Differential Equations (ODETLAP) that uses CUDA directly. This lossy compression technique approximates a solution to an overdetermined system of equations in order to reconstruct gridded, correlated data. ODETLAP can be used to compress a dataset or to reconstruct missing data. Parallelism in CUDA provides speed performance improvements […]
Oct, 10

GALAMOST: GPU-accelerated large-scale molecular simulation toolkit

A new molecular simulation toolkit composed of some lately developed force fields and specified models is presented to study the self-assembly, phase transition, and other properties of polymeric systems at mesoscopic scale by utilizing the computational power of GPUs. In addition, the hierarchical self-assembly of soft anisotropic particles and the problems related to polymerization can […]
Oct, 10

Direct deconvolution of radio synthesis images using L1 minimisation

We introduce an algorithm for the deconvolution of radio synthesis images that accounts for the non-coplanar-baseline effect, allows multiscale reconstruction onto arbitrarily positioned pixel grids, and allows the antenna elements to have directional dependent gains. Using numerical L1-minimisation techniques established in the application of compressive sensing to radio astronomy, we directly solve the deconvolution equation […]
Oct, 10

Accounting for Secondary Uncertainty: Efficient Computation of Portfolio Risk Measures on Multi and Many Core Architectures

Aggregate Risk Analysis is a computationally intensive and data-intensive problem, which makes the application of high-performance computing techniques interesting. In this paper, the design and implementation of a parallel Aggregate Risk Analysis algorithm on multi-core CPU and many-core GPU platforms are explored. The efficient computation of key risk measures, including Probable Maximum Loss […]
Oct, 9

Scalable Fast Multipole Methods on Heterogeneous Architecture

The N-body problem appears in many computational physics simulations. At each time step the computation involves an all-pairs sum whose complexity is quadratic, followed by an update of particle positions. This cost means that it is not practical to solve such dynamic N-body problems on large scale. To improve this situation, we use both algorithmic […]
Oct, 9

The GASPI API specification and its implementation GPI 2.0

Gaspi (Global Address Space Programming Interface) is an API specification for Partitioned Global Address Spaces. The Gaspi API is focused on three key objectives: scalability, flexibility and failure tolerance. Gaspi uses one-sided RDMA driven communication in combination with remote completion in a PGAS environment. As such, Gaspi aims to initiate a paradigm shift from bulk-synchronous […]
Oct, 9

UPC on MIC: Early Experiences with Native and Symmetric Modes

Intel Many Integrated Core (MIC) architecture is steadily being adopted in clusters owing to its high compute throughput and power efficiency. The current generation MIC coprocessor, Xeon Phi, provides a highly multi-threaded environment with support for multiple programming models. While regular programming models such as MPI/OpenMP have started utilizing systems with MIC coprocessors, it is […]
Oct, 9

Performance Analysis of a Large Memory Application on Multiple Architectures

The Graph500 Breadth-First Search benchmark has emerged as a well-documented PGAS-style application that both scales to large data set sizes and has documented implementations on multiple platforms over multiple years. This paper analyzes the reported performance and extracts insight into what are the leading performance limitations in such systems and how they scale with system […]
Oct, 9

High-Order Algorithms for Compressible Reacting Flow with Complex Chemistry

In this paper we describe a numerical algorithm for integrating the multicomponent, reacting, compressible Navier-Stokes equations, targeted for direct numerical simulation of combustion phenomena. The algorithm addresses two shortcomings of previous methods. First, it incorporates an eighth-order narrow stencil approximation of diffusive terms that reduces the communication compared to existing methods and removes the need […]
Oct, 8

Enabling the use of Heterogeneous Computing for Bioinformatics

The huge amount of information in the encoded sequence of DNA and increasing interest in uncovering new discoveries has spurred interest in accelerating the DNA sequencing and alignment processes. The use of heterogeneous systems, which use different types of computational units, has seen a new light in high performance computing in recent years; however, expertise […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
