Posts

Sep, 4

Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems

This paper describes a hybrid multicore/GPU solver for the incompressible Navier-Stokes equations with constant coefficients, discretized by the finite difference method. By applying the prediction-projection method, the Navier-Stokes equations are transformed into a combination of Helmholtz-like and Poisson equations for which we describe efficient solvers. As an extension of our previous paper [1], this paper […]
Sep, 4

CUDA method for the FDTD simulation by GPU

Computing hardware has advanced over several decades, especially graphics processors, which not only handle graphics work but also solve scientific problems. These processors are suitable for parallel computation as an alternative to expensive high-end devices. Many research groups have implemented parallel computations using MPI with multiple CPUs to solve […]
Sep, 3

New efficient integral algorithms for quantum chemistry

The contents of this thesis are centered on the development of new efficient algorithms for molecular integral evaluation in quantum chemistry, as well as new design and implementation strategies for such algorithms aimed at maximizing their performance and the utilization of modern hardware. This thesis introduces the K4+MIRROR algorithm for 2-electron repulsion integrals, a new […]
Sep, 3

GPU-accelerated Database Systems: Survey and Open Challenges

The vast amount of processing power and memory bandwidth provided by modern graphics cards makes them an interesting platform for data-intensive applications. Unsurprisingly, the database research community identified GPUs as effective co-processors for data processing several years ago. In the past years, there were many approaches to make use of GPUs at different levels of […]
Sep, 3

Detection of retransmissions in 10G Ethernet using GPUs

Traffic analysis is an essential part of capacity planning, quality of service assurance and reinforcement of security in current telecommunication networks. As network speed increases, so does traffic volume, and the analysis of large traffic traces is computationally intensive. This document presents flow extraction software that obtains TCP flow records at […]
Sep, 3

Searching for a counterexample of Kurepa’s Conjecture

Kurepa’s conjecture states that there is no odd prime p which divides !p = 0! + 1! + … + (p-1)!. We search for a counterexample of this conjecture for all p < 10^10. We introduce new optimization techniques and perform the computation using graphics processing units (GPUs). Additionally, we consider the generalized Kurepa’s left factorial given as !^k n = (0!)^k + (1!)^k + … + ((n-1)!)^k and show that for all integers […]
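
As a rough illustration of the quantity being tested, the left factorial !p = 0! + 1! + … + (p-1)! can be reduced modulo p with a running factorial, so no large integers are ever formed. The sketch below is a plain CPU version for small bounds only; it does not reproduce the paper's optimization techniques or its GPU implementation, and the function names and the bound used in the usage note are assumptions made for illustration.

```python
def left_factorial_mod(p: int) -> int:
    """Return !p mod p, where !p = 0! + 1! + ... + (p-1)!."""
    total = 0
    fact = 1  # holds i! mod p, starting from 0! = 1
    for i in range(p):
        total = (total + fact) % p
        fact = (fact * (i + 1)) % p  # advance to (i+1)! mod p
    return total


def odd_primes_below(limit: int) -> list[int]:
    """Sieve of Eratosthenes returning the odd primes smaller than limit."""
    is_prime = [True] * max(limit, 2)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for m in range(n * n, limit, n):
                is_prime[m] = False
    return [n for n in range(3, limit) if is_prime[n]]


def find_counterexamples(limit: int) -> list[int]:
    """Odd primes p < limit with p dividing !p (hypothetical helper)."""
    return [p for p in odd_primes_below(limit) if left_factorial_mod(p) == 0]
```

Calling `find_counterexamples(2000)` checks every odd prime below 2000. Each prime costs O(p) modular multiplications, which hints at why a search up to the bound quoted in the abstract calls for both algorithmic optimizations and parallel hardware.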
Sep, 3

Performance Portability Study of Linear Algebra Kernels in OpenCL

The performance portability of OpenCL kernel implementations for common memory bandwidth limited linear algebra operations across different hardware generations of the same vendor as well as across vendors is studied. Certain combinations of kernel implementations and work sizes are found to exhibit good performance across compute kernels, hardware generations, and, to a lesser degree, vendors. […]
Sep, 2

Directive-Based Compilers for GPUs

General Purpose Graphics Computing Units can be effectively used for enhancing the performance of many contemporary scientific applications. However, programming GPUs using machine-specific notations like CUDA or OpenCL can be complex and time consuming. In addition, the resulting programs are typically fine-tuned for a particular target device. A promising alternative is to program in a […]
Sep, 2

LightPlay: Efficient Replay with GPUs

Previous deterministic replay systems reduce the runtime overhead by either relying on hardware support or by relaxing the determinism requirements for replay. We propose LightPlay that fulfills stricter determinism requirements with low overhead without requiring hardware or OS support. LightPlay guarantees that the memory state after each instruction instance in a replay run is the […]
Sep, 2

Determining the difficulty of accelerating problems on a GPU

General-purpose computation on graphics processing units (GPGPU) has great potential to accelerate many scientific models and algorithms. However, some problems are considerably more difficult to accelerate than others, and it may be challenging for those new to GPGPU to ascertain the difficulty of accelerating a particular problem. Through what was learned in the acceleration of […]
Sep, 2

Optimistic Parallelism on GPUs

We present speculative parallelization techniques that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. While the first two phases enable exploitation of data parallelism, […]
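
The five phases named above can be walked through in a small sequential simulation. This is only a sketch under strong assumptions, not the authors' technique: the loop body (`out[w] = out[r] + 1`), the index arrays `reads` and `writes`, and the recovery policy (re-run everything from the first misspeculated iteration onward) are all hypothetical choices made to keep the example short, and the "parallel" computation phase is simulated with an ordinary list comprehension.

```python
def run_sequential(data, reads, writes):
    """Reference semantics: iteration i does out[writes[i]] = out[reads[i]] + 1."""
    out = list(data)
    for r, w in zip(reads, writes):
        out[w] = out[r] + 1
    return out


def run_speculative(data, reads, writes):
    """Simulate the five phases on the same loop (illustrative only)."""
    out = list(data)
    n = len(reads)

    # 1. Scheduling: assign each iteration to its own (simulated) thread.
    schedule = range(n)

    # 2. Computation: every iteration reads the unmodified array and
    #    buffers its result instead of writing it back.
    buffered = [out[reads[i]] + 1 for i in schedule]

    # 3. Misspeculation check: an iteration misspeculated if it read a
    #    location that an earlier iteration writes (cross-iteration dependence).
    written = set()
    first_bad = n
    for i in schedule:
        if reads[i] in written:
            first_bad = i
            break
        written.add(writes[i])

    # 4. Result committing: commit the safe prefix in iteration order.
    for i in range(first_bad):
        out[writes[i]] = buffered[i]

    # 5. Misspeculation recovery: re-execute the remaining iterations
    #    sequentially against the committed state.
    for i in range(first_bad, n):
        out[writes[i]] = out[reads[i]] + 1
    return out
```

For any index arrays, `run_speculative` returns the same array as `run_sequential`. When no iteration reads a location written by an earlier one, phase 5 does no work and the whole loop is committed from the buffered results.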
Sep, 2

Heterogeneous Computing on Mixed Unstructured Grids with PyFR

PyFR is an open-source high-order accurate computational fluid dynamics solver for mixed unstructured grids that can target a range of hardware platforms from a single codebase. In this paper we demonstrate the ability of PyFR to perform high-order accurate unsteady simulations of flow on mixed unstructured grids using heterogeneous multi-node hardware. Specifically, after benchmarking single-node […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
