Posts

Oct, 8

Porting Large HPC Applications to GPU Clusters: The Codes GENE and VERTEX

We have developed GPU versions of two major high-performance-computing (HPC) applications originating from two different scientific domains. GENE is a plasma microturbulence code which is employed for simulations of nuclear fusion plasmas. VERTEX is a neutrino-radiation hydrodynamics code for "first-principles" simulations of core-collapse supernova explosions. The codes are considered state of the art in their […]
Oct, 7

Advanced 2D Rasterization on Modern CPUs

The graphics processing unit (GPU) has become part of our everyday life through desktop computers and portable devices (tablets, mobile phones, etc.). Because of the dedicated hardware, visualization has been significantly accelerated, and today’s software uses only the GPU for rasterization. Besides the graphical devices, the central processing unit (CPU) has also made remarkable progress. […]
Oct, 7

Performance evaluation of CUDA programming for machining simulation

5-axis milling simulations in CAM software are mainly used to detect collisions between the tool and the part. They are very limited in terms of surface topography investigations to validate machining strategies as well as machining parameters such as chordal deviation, scallop height and tool feed. Z-buffer or N-Buffer machining simulations provide more precise simulations […]
Oct, 7

GPU Accelerated Conjunction Assessment with Applications to Formation Flight and Space Debris Tracking

The primary purpose of conjunction assessment (CA) is to prevent the collision of objects in space. Typical collision scenarios involve satellites with space debris or a formation of satellites with each other. Users performing orbit propagation and CA on very large scales must judiciously moderate force model fidelity and/or acutely limit the number of objects […]
Oct, 7

Vectorized OpenCL implementation of numerical integration for higher order finite elements

In our work we analyze computational aspects of the problem of numerical integration in finite element calculations and consider an OpenCL implementation of related algorithms for processors with wide vector registers. As a platform for testing the implementation we choose the PowerXCell processor, being an example of the Cell Broadband Engine (CellBE) architecture. Although the […]
Oct, 7

Numerical integration on GPUs for higher order finite elements

The paper considers the problem of implementation on graphics processors of numerical integration routines for higher order finite element approximations. The design of suitable GPU kernels is investigated in the context of general purpose integration procedures, as well as particular example applications. The most important characteristic of the problem investigated is the large variation of […]
Oct, 5

Measurements of performance of hardware and general purpose classical molecular dynamics simulation software

This note presents different measurements of hardware and software performance in classical molecular dynamics (CMD) simulations from 2001 through 2010, obtained from published literature and the internet. Opinion articles by CMD researchers point out that tools developed during that decade to set up CMD simulations barely increased human productivity. Massively parallel hardware and CMD software running […]
Oct, 5

Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs

General purpose computing on GPUs (GPGPU) can enable significant performance and energy improvements for certain classes of applications. However, current GPGPU programming models, such as CUDA and OpenCL, are accessible only to systems experts through low-level C/C++ APIs. In contrast, large numbers of programmers use high-level languages, such as Java, due to their productivity advantages […]
Oct, 5

Parametric GPU Code Generation for Affine Loop Programs

Partitioning a parallel computation into finitely sized chunks for effective mapping onto a parallel machine is a critical concern for source-to-source compilation. In the context of OpenCL and CUDA, this translates to the definition of a uniform hyper-rectangular partitioning of the parallel execution space where each partition is subject to a fine-grained distribution of resources […]
Oct, 5

Clock Math – A System for Solving SLEs Exactly

In this paper, we present a GPU-accelerated hybrid system that solves ill-conditioned systems of linear equations (SLEs) exactly. Here, "exactly" means without rounding errors, which is achieved by using integer arithmetic. First, we scale floating-point numbers up to integers, then we solve dozens of SLEs in different modular arithmetics, and then we assemble the sub-solutions using the Chinese remainder […]
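
The modular approach summarized in this abstract can be illustrated with a minimal CPU-only sketch. It is not the paper's implementation: the function names (`solve_mod`, `crt`, `solve_exact`), the choice of primes, and the use of Cramer numerators are all assumptions made for illustration. The sketch starts from integer inputs (the scaling step is assumed to have been done already) and assumes the product of the primes exceeds twice the largest determinant or numerator in absolute value.

```python
from fractions import Fraction

def solve_mod(A, b, p):
    """Gauss-Jordan elimination over Z_p. Returns (det mod p, x mod p),
    or None if the matrix is singular modulo p."""
    n = len(A)
    M = [[v % p for v in row] + [bi % p] for row, bi in zip(A, b)]
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det % p  # a row swap flips the sign of the determinant
        det = det * M[col][col] % p
        inv = pow(M[col][col], -1, p)  # modular inverse (Python 3.8+)
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % p for vr, vc in zip(M[r], M[col])]
    return det, [M[r][n] for r in range(n)]

def crt(residues, moduli):
    """Chinese remainder theorem for pairwise coprime moduli.
    Returns (x, m) with x in [0, m) and m the product of the moduli."""
    x, m = 0, 1
    for r, p in zip(residues, moduli):
        t = (r - x) * pow(m, -1, p) % p  # choose t so that x + m*t = r (mod p)
        x += m * t
        m *= p
    return x, m

def symmetric(x, m):
    """Map a residue in [0, m) to the symmetric range around zero."""
    return x - m if x > m // 2 else x

def solve_exact(A, b, primes):
    """Solve the integer system A x = b exactly as a list of Fractions."""
    used, dets, nums = [], [], [[] for _ in b]
    for p in primes:
        res = solve_mod(A, b, p)
        if res is None:
            continue  # this prime divides the determinant; skip it
        det, x = res
        used.append(p)
        dets.append(det)
        for i, xi in enumerate(x):
            nums[i].append(det * xi % p)  # Cramer numerator det * x_i mod p
    if not used:
        raise ValueError("matrix is singular modulo every supplied prime")
    det, m = crt(dets, used)
    det = symmetric(det, m)
    return [Fraction(symmetric(crt(n, used)[0], m), det) for n in nums]
```

For example, `solve_exact([[2, 1], [1, 3]], [3, 5], [101, 103, 107])` reconstructs the rational solution from three independent modular solves. Each modular solve is independent of the others, which is what makes this kind of scheme a natural fit for parallel hardware.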
Oct, 5

GPU Based Generation and Real-Time Rendering of Semi-Procedural Terrain Using Features

Generation and real-time rendering of terrain is a complex and multifaceted problem. Besides the obvious trade-offs between performance and quality, many different generation and rendering solutions exist. Different choices in implementation will result in very different visuals, usability and tools for generation. In this thesis, a fast and intuitive terrain generation method based on sketching […]
Oct, 4

Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler

OpenACC is a programming standard designed to simplify heterogeneous parallel programming by using directives. Because OpenACC can generate OpenCL and CUDA code, and the CAPS HMPP compiler supports running OpenCL on Intel Knights Corner, it is attractive to use OpenACC on hardware with different underlying microarchitectures. This paper studies how realistic it is to […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
