## Posts

Sep, 5

### HISQ inverter on Intel Xeon Phi and NVIDIA GPUs

The runtime of a Lattice QCD simulation is dominated by a small kernel, which calculates the product of a vector by a sparse matrix known as the "Dslash" operator. Therefore, this kernel is frequently optimized for various HPC architectures. In this contribution we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA […]
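As a rough illustration of the kind of kernel the abstract describes, the sketch below multiplies a sparse matrix by a vector using the generic compressed sparse row (CSR) layout. It is not the HISQ Dslash stencil from the paper. The function name `csr_matvec` and the small test matrix are assumptions made for this example.

```python
from typing import List, Sequence


def csr_matvec(values: Sequence[float],
               col_idx: Sequence[int],
               row_ptr: Sequence[int],
               x: Sequence[float]) -> List[float]:
    """Multiply a CSR-encoded sparse matrix by a dense vector x.

    values  : non-zero entries, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the entries of row i
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Only the stored (non-zero) entries of this row are visited.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[2, 0, 1],
#  [0, 3, 0],
#  [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
result = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Each output row depends only on its own stored entries, which is why this kind of product parallelizes well across many cores or GPU threads.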

Sep, 4

### Analysis of KECCAK Tree Hashing on GPU Architectures

In an effort to provide security and data integrity, hashing algorithms have been designed to consume an input of any length to produce a fixed length output. KECCAK was selected by NIST to become the next Secure Hashing Algorithm (SHA-3) after nearly five years of competition. In addition to providing a sequential operating mode, there […]
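The fixed-length property mentioned above can be seen with a minimal sketch that uses the SHA-3 implementation in Python's standard `hashlib` module. This covers only the ordinary sequential mode, not the tree hashing mode the paper analyses. The helper name `sha3_digest` and the sample inputs are assumptions made for this example.

```python
import hashlib


def sha3_digest(data: bytes) -> bytes:
    """Return the SHA3-256 digest of `data` (sequential mode only)."""
    return hashlib.sha3_256(data).digest()


# Inputs of very different lengths (hypothetical sample data).
inputs = [b"", b"abc", b"x" * 100_000]
digests = [sha3_digest(item) for item in inputs]

# Every digest is 32 bytes (256 bits), whatever the input length.
digest_lengths = [len(d) for d in digests]
```

Inputs ranging from empty to 100,000 bytes all produce a 32-byte digest, and distinct inputs produce distinct digests here.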

Sep, 4

### Acceleration of stereo-matching on multi-core CPU and GPU

This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism-enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa research project. This research project focuses on the conception of a new clothes folding […]

Sep, 4

### A uniform approach for programming distributed heterogeneous computing systems

Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are becoming increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms, making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous […]

Sep, 4

### Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems

This paper describes a hybrid multicore/GPU solver for the incompressible Navier-Stokes equations with constant coefficients, discretized by the finite difference method. By applying the prediction-projection method, the Navier-Stokes equations are transformed into a combination of Helmholtz-like and Poisson equations for which we describe efficient solvers. As an extension of our previous paper [1], this paper […]

Sep, 4

### CUDA method for the FDTD simulation by GPU

The technology of computational devices has developed over several decades, especially graphics processors, which not only handle graphics work but also compute scientific problems. These processors are suitable for parallel computation as an alternative to expensive high-end devices. Many research groups have implemented parallel computations using the MPI method with multiple CPUs to solve […]

Sep, 3

### New efficient integral algorithms for quantum chemistry

The contents of this thesis are centered on the development of new efficient algorithms for molecular integral evaluation in quantum chemistry, as well as new design and implementation strategies for such algorithms aimed at maximizing their performance and the utilization of modern hardware. This thesis introduces the K4+MIRROR algorithm for 2-electron repulsion integrals, a new […]

Sep, 3

### GPU-accelerated Database Systems: Survey and Open Challenges

The vast amount of processing power and memory bandwidth provided by modern graphics cards make them an interesting platform for data-intensive applications. Unsurprisingly, the database research community identified GPUs as effective co-processors for data processing several years ago. In the past years, there were many approaches to make use of GPUs at different levels of […]

Sep, 3

### Detection of retransmissions in 10G Ethernet using GPUs

Traffic analysis is an essential part of capacity planning, quality of service assurance and reinforcement of security in current telecommunication networks. As the network speed increases, so does the traffic volume, and the analysis of large traffic traces is computationally intensive. This document presents flow extraction software that allows obtaining TCP flow records at […]

Sep, 3

### Searching for a counterexample of Kurepa’s Conjecture

Kurepa’s conjecture states that there is no odd prime $p$ which divides $!p = 0! + 1! + \dots + (p-1)!$. We search for a counterexample of this conjecture for all $p < 10^{10}$. We introduce new optimization techniques and perform the computation using graphics processing units (GPUs). Additionally, we consider the generalized Kurepa’s left factorial given as $!^k n = (0!)^k + (1!)^k + \dots + ((n-1)!)^k$ and show that for all integers […]
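The search described above can be sketched in a few lines, assuming a plain CPU loop rather than the optimized GPU techniques the paper introduces. The left factorial is accumulated modulo p, so the numbers stay small. The function names `left_factorial_mod`, `odd_primes_below` and `counterexamples_below` are hypothetical.

```python
from typing import List


def left_factorial_mod(p: int) -> int:
    """Return (0! + 1! + ... + (p-1)!) mod p using modular arithmetic."""
    total = 0
    fact = 1  # holds i! mod p, starting with 0! = 1
    for i in range(p):
        total = (total + fact) % p
        fact = (fact * (i + 1)) % p
    return total


def odd_primes_below(limit: int) -> List[int]:
    """Sieve of Eratosthenes, returning the odd primes smaller than limit."""
    if limit < 4:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return [n for n in range(3, limit) if is_prime[n]]


def counterexamples_below(limit: int) -> List[int]:
    """Odd primes p < limit that divide the left factorial !p."""
    return [p for p in odd_primes_below(limit) if left_factorial_mod(p) == 0]


found = counterexamples_below(1000)
```

This naive version costs on the order of p multiplications per prime, so it is only practical for small bounds. The prime 2 is excluded because !2 = 0! + 1! = 2 is divisible by 2, which is why the conjecture is stated for odd primes.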

Sep, 3

### Performance Portability Study of Linear Algebra Kernels in OpenCL

The performance portability of OpenCL kernel implementations for common memory bandwidth limited linear algebra operations across different hardware generations of the same vendor as well as across vendors is studied. Certain combinations of kernel implementations and work sizes are found to exhibit good performance across compute kernels, hardware generations, and, to a lesser degree, vendors. […]

Sep, 2

### Directive-Based Compilers for GPUs

General Purpose Graphics Computing Units can be effectively used for enhancing the performance of many contemporary scientific applications. However, programming GPUs using machine-specific notations like CUDA or OpenCL can be complex and time-consuming. In addition, the resulting programs are typically fine-tuned for a particular target device. A promising alternative is to program in a […]