Posts

Nov, 2

Multi GPU Performance of Conjugate Gradient Solver with Staggered Fermions in Mixed Precision

GPUs have significantly higher performance in single precision than in double precision. Hence, it is important to take maximal advantage of single precision in the CG inverter by using the mixed-precision method. We have implemented a mixed-precision algorithm in our multi-GPU conjugate gradient solver. The single precision calculation uses half […]
Nov, 1

APHOG: A Framework for Fast Object Detection Using Histograms of Oriented Gradients

In this paper we show how it is possible to improve the efficiency of existing holistic forms of object detection by refining detection areas to smaller subsets. Although this method can be applied to any form of object detection, this paper will specifically focus on the topic of pedestrian detection in low-resolution non-stationary video footage.
Nov, 1

A Preliminary Review of Literature on Parallel Constraint Solving

With the ubiquity of multicore computing, and the likely expansion of it, it seems irresponsible for constraints researchers to ignore the implications of it. Therefore, the authors have recently begun investigating the literature in constraints on exploitation of parallel systems for constraint solving. We have been compiling an incomplete, biased, and ill-written review of this […]
Nov, 1

Exploring Fine-Grained Task-based Execution on Multi-GPU Systems

Using multi-GPU systems, including GPU clusters, is gaining popularity in scientific computing. However, when using multiple GPUs concurrently, the conventional data parallel GPU programming paradigms, e.g., CUDA, cannot satisfactorily address certain issues, such as load balancing, GPU resource utilization, overlapping fine grained computation with communication, etc. In this paper, we present a fine-grained task-based execution […]
Nov, 1

Using DRBL to Deploy MPICH2 and CUDA on Green Computing

In this paper, an energy-efficient architecture for building a GPU and CPU cluster using DRBL is proposed. This architecture helps administrators not only to quickly deploy and manage a GPU and CPU cluster environment, but also to bring the benefit of energy efficiency to scientific computing. The experiment simulates 3 cases to prove energy efficiency. We […]
Nov, 1

Quantum chemical many-body theory on heterogeneous nodes

The iterative solution of the coupled-cluster with single and double excitations (CCSD) equations is a very time-consuming component of the "gold standard" in quantum chemistry, the CCSD(T) method. In an effort to accelerate accurate quantum mechanical calculations, we explore two implementation strategies for the iterative solution of the CC equations on graphics processing units (GPUs). […]
Nov, 1

An Implementation of Differential Evolution for Independent Tasks Scheduling on GPU

Differential evolution is an efficient meta-heuristic optimization method with a solid record of real-world applications. In this paper, we present a simple and efficient implementation of differential evolution using the massively parallel CUDA architecture. We demonstrate the speedup and improvements obtained by the parallelization of this intelligent algorithm on the problem of scheduling of […]
Nov, 1

A Real-Time Computer Vision Library for Heterogeneous Processing Environments

With a variety of processing technologies available today, using a combination of different technologies often provides the best performance for a particular task. However, unifying multiple processors with different instruction sets can be a very ad hoc and difficult process. The Open Component Portability Infrastructure (OpenCPI) provides a platform that simplifies programming heterogeneous processing applications […]
Nov, 1

Fast computation of scattering maps of nanostructures using graphical processing units

Scattering maps from strained or disordered nanostructures around a Bragg reflection can be either computed quickly using approximations and a (fast) Fourier transform or obtained using individual atomic positions. In this article, it is shown that it is possible to compute up to 4*10^10 reflections*atoms*s^-1 using a single graphics card, and the manner in which […]
Nov, 1

Architecture Comparisons between Nvidia and ATI GPUs: Computation Parallelism and Data Communications

In recent years, modern graphics processing units have been widely adopted in high-performance computing to solve large-scale computation problems. The leading GPU manufacturers Nvidia and ATI have introduced a series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects on processor cores […]
Nov, 1

Extremely large scale simulation of a Kardar-Parisi-Zhang model using graphics cards

The octahedron model introduced recently has been implemented onto graphics cards, which permits extremely large scale simulations via binary lattice gases and bit coded algorithms. We confirm scaling behavior belonging to the 2d Kardar-Parisi-Zhang universality class and find a surface growth exponent: beta=0.2415(15) on 2^17 x 2^17 systems, ruling out beta=1/4 suggested by field theory. […]
Oct, 31

Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs

Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities, stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics and for maintaining […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org