Posts
Sep, 5
Research on double negative materials by using FDTD method based on GPUs
In recent years, the finite-difference time-domain (FDTD) method has been widely used in the simulation of metamaterials. Because the FDTD method is well suited to parallel computing, we apply it on Fermi-architecture graphics processing units (GPUs) to perform electromagnetic simulations of double negative materials in this paper. Finally, both […]
Sep, 5
HISQ inverter on Intel Xeon Phi and NVIDIA GPUs
The runtime of a Lattice QCD simulation is dominated by a small kernel, which calculates the product of a vector by a sparse matrix known as the "Dslash" operator. Therefore, this kernel is frequently optimized for various HPC architectures. In this contribution we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA […]
Sep, 4
Analysis of KECCAK Tree Hashing on GPU Architectures
In an effort to provide security and data integrity, hashing algorithms have been designed to consume an input of any length and produce a fixed-length output. KECCAK was selected by NIST to become the next Secure Hash Algorithm (SHA-3) after nearly five years of competition. In addition to providing a sequential operating mode, there […]
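As a small illustration of the fixed-length property mentioned in this excerpt, the sketch below hashes inputs of very different sizes with SHA3-256 from Python's standard `hashlib` module. It shows only the ordinary sequential mode, not the tree hashing mode or the GPU implementation the post analyzes; the helper name `sha3_256_hex` is chosen here for illustration.

```python
import hashlib


def sha3_256_hex(data: bytes) -> str:
    """Return the SHA3-256 digest of `data` as a hex string (illustrative helper)."""
    return hashlib.sha3_256(data).hexdigest()


if __name__ == "__main__":
    # Inputs of very different lengths all map to a 256-bit (64 hex character) digest.
    for msg in (b"", b"abc", b"x" * 10_000):
        digest = sha3_256_hex(msg)
        print(len(msg), len(digest), digest)
```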
Sep, 4
Acceleration of stereo-matching on multi-core CPU and GPU
This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism-enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa research project. This research project focuses on the conception of a new clothes folding […]
Sep, 4
A uniform approach for programming distributed heterogeneous computing systems
Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are getting increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous […]
Sep, 4
Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems
This paper describes a hybrid multicore/GPU solver for the incompressible Navier-Stokes equations with constant coefficients, discretized by the finite difference method. By applying the prediction-projection method, the Navier-Stokes equations are transformed into a combination of Helmholtz-like and Poisson equations for which we describe efficient solvers. As an extension of our previous paper [1], this paper […]
Sep, 4
CUDA method for the FDTD simulation by GPU
Computational hardware has advanced over several decades, especially graphics processors, which now handle not only graphics workloads but also scientific computing problems. Such processors are suitable for parallel computation and offer an alternative to expensive high-end devices. Many research groups have implemented parallel computations using the MPI method with multiple CPUs to solve […]
Sep, 3
New efficient integral algorithms for quantum chemistry
The contents of this thesis are centered on the development of new efficient algorithms for molecular integral evaluation in quantum chemistry, as well as new design and implementation strategies for such algorithms aimed at maximizing their performance and the utilization of modern hardware. This thesis introduces the K4+MIRROR algorithm for 2-electron repulsion integrals, a new […]
Sep, 3
GPU-accelerated Database Systems: Survey and Open Challenges
The vast amount of processing power and memory bandwidth provided by modern graphics cards makes them an interesting platform for data-intensive applications. Unsurprisingly, the database research community identified GPUs as effective co-processors for data processing several years ago. In recent years, there have been many approaches to make use of GPUs at different levels of […]
Sep, 3
Detection of retransmissions in 10G Ethernet using GPUs
Traffic analysis is an essential part of capacity planning, quality of service assurance and reinforcement of security in current telecommunication networks. As network speed increases, so does traffic volume, and the analysis of large traffic traces is computationally intensive. This document presents flow extraction software that obtains TCP flow records at […]
Sep, 3
Searching for a counterexample of Kurepa’s Conjecture
Kurepa’s conjecture states that there is no odd prime p which divides !p = 0! + 1! + … + (p-1)!. We search for a counterexample of this conjecture for all p < 10^10. We introduce new optimization techniques and perform the computation using graphics processing units (GPUs). Additionally, we consider the generalized Kurepa’s left factorial given as !^k n = (0!)^k + (1!)^k + … + ((n-1)!)^k and show that for all integers […]
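To make the definition in this excerpt concrete, here is a minimal CPU sketch that evaluates the left factorial !p = 0! + 1! + … + (p-1)! modulo p for small odd primes, along with the generalized form where each factorial is raised to a power k. It is a naive loop for illustration only, not the optimized GPU search the post describes; the function names and the small search limit are assumptions made here.

```python
def left_factorial_mod(n: int, m: int, k: int = 1) -> int:
    """Return (0!)^k + (1!)^k + ... + ((n-1)!)^k modulo m."""
    total = 0
    fact = 1  # running value of i! modulo m, starting at 0! = 1
    for i in range(n):
        if i > 0:
            fact = (fact * i) % m
        total = (total + pow(fact, k, m)) % m
    return total


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def kurepa_counterexamples(limit: int) -> list:
    """Return the odd primes p < limit that divide !p (none are expected)."""
    return [p for p in range(3, limit, 2)
            if is_prime(p) and left_factorial_mod(p, p) == 0]


if __name__ == "__main__":
    print(kurepa_counterexamples(200))
```

Each prime costs O(p) multiplications here, so this direct approach is only practical for small limits.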
Sep, 3
Performance Portability Study of Linear Algebra Kernels in OpenCL
The performance portability of OpenCL kernel implementations for common memory bandwidth limited linear algebra operations across different hardware generations of the same vendor as well as across vendors is studied. Certain combinations of kernel implementations and work sizes are found to exhibit good performance across compute kernels, hardware generations, and, to a lesser degree, vendors. […]

