
Posts

Oct, 1

Compute Distance Matrices with GPU

Given a data matrix whose rows are objects and whose columns are variables, researchers often want to compute all pairwise distances among the objects. Because of the design of the NVIDIA GPU architecture, CUDA code can easily handle data matrices whose numbers of rows and columns are multiples of sixteen. The present […]
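The multiple-of-sixteen constraint can be illustrated with a small CPU-side sketch. One common way to meet such a constraint is zero-padding: extra zero columns add nothing to squared differences, so Euclidean distances between real objects are unchanged, and distances involving padded rows are simply discarded. This is not the authors' CUDA code; the tile width `TILE = 16`, the choice of Euclidean distance, and the helper names (`round_up`, `pad_matrix`, `distance_matrix`, `padded_distance_matrix`) are hypothetical, chosen only for illustration.

```python
import math

TILE = 16  # assumed tile width, matching the "multiples of sixteen" in the abstract


def round_up(n, multiple=TILE):
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def pad_matrix(data, multiple=TILE):
    """Zero-pad rows and columns so both dimensions are multiples of `multiple`."""
    n_rows, n_cols = len(data), len(data[0])
    cols = round_up(n_cols, multiple)
    rows = round_up(n_rows, multiple)
    padded = [list(row) + [0.0] * (cols - n_cols) for row in data]
    padded += [[0.0] * cols for _ in range(rows - n_rows)]
    return padded


def distance_matrix(data):
    """All pairwise Euclidean distances between the rows of `data`."""
    n = len(data)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
            dist[i][j] = dist[j][i] = d
    return dist


def padded_distance_matrix(data):
    """Compute distances on the padded matrix, then keep only the real objects."""
    n = len(data)
    full = distance_matrix(pad_matrix(data))
    return [row[:n] for row in full[:n]]
```

For example, `padded_distance_matrix([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]])` returns `[[0.0, 5.0], [5.0, 0.0]]`, the same result as the unpadded computation. On a GPU, the padded shape lets every thread block work on a full 16×16 tile without boundary checks, at the cost of some wasted arithmetic on the padding.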
Oct, 1

Synthesizing Structured Traversals from Attribute Grammars

We examine how to automatically decompose a program into structured parallel traversals over trees. In our system, programs are declaratively specified as attribute grammars and parallel traversals are defined by a compiler designed to optimize them for both GPUs and multicore CPUs. Our synthesizer automatically finds a correct schedule of the attribute grammar as structured […]
Sep, 30

CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs

With the growing prevalence of general-purpose computation on GPUs, shared memory programming models were proposed to ease the pain of GPU programming. However, as workloads become more intensive, it is desirable to port GPU programs to more scalable distributed memory environments, such as multi-GPU systems. To achieve this, programs need to be re-written with […]
Sep, 30

Nonperturbative Quantum Field Theory in Astrophysics

The extreme electromagnetic or gravitational fields associated with some astrophysical objects can give rise to macroscopic effects arising from the physics of the quantum vacuum. Therefore, these objects are incredible laboratories for exploring the physics of quantum field theories. In this dissertation, we explore this idea in three astrophysical scenarios.
Sep, 30

ARVO-CL: The OpenCL version of the ARVO package – An efficient tool for computing the accessible surface area and the excluded volume of proteins via analytical equations

The introduction of Graphics Processing Units (GPUs) and GPU computing in recent years has opened up possibilities for simple parallelization of programs. In this update, we present the modernized version of the ARVO program [J. Busa, J. Dzurina, E. Hayryan, S. Hayryan, C.-K. Hu, J. Plavka, I. Pokorny, J. Skivanek, M.-C. Wu, Comput. Phys. Comm. 165 (2005) 59]. […]
Sep, 30

Real-Time Computer Vision with OpenCV

Computer vision is a rapidly growing field devoted to analyzing, modifying, and understanding images at a high level. Its objective is to determine what is happening in front of a camera and use that understanding to control a computer or robotic system, or to provide people with new images that are more informative.
Sep, 30

Performance characterization of data-intensive kernels on AMD Fusion architectures

The cost of data movement over the PCI Express bus is one of the biggest performance bottlenecks for accelerating data-intensive applications on traditional discrete GPU architectures. To address this bottleneck, AMD Fusion introduces a fused architecture that tightly integrates the CPU and GPU onto the same die and connects them with a high-speed, on-chip, memory […]
Sep, 29

25th International Conference on Parallel Computational Fluid Dynamics, ParCFD 2013

As in past years, ParCFD 2013 will include contributed and invited papers. The conference program will mainly consist of contributed lectures in all scientific and technical areas of the conference. ParCFD 2013 topics include, but are not limited to: Complex 3D Flow, Flows with Moving Interfaces, Fluid-Structure Interaction, Aerodynamics, Hydrodynamics, Turbulence, Multi-Disciplinary Design Optimization, Acoustics, Atmospheric & […]
Sep, 28

Optimising Unstructured Mesh Computational Fluid Dynamics Applications on Multicores via Machine Learning and Code Transformation

We show that case-based reasoning (CBR) and deterministic code analysis can be successfully used in optimizing compilers of unstructured mesh applications to obtain better execution times. With the recent shift of CPU architectures towards SIMD capabilities, and of GPU architectures towards general purpose computing, it is no longer clear what optimizations are optimal given a […]
Sep, 28

A Hybrid Parallel Algorithm for Computing and Tracking Level Set Topology

The contour tree is a topological abstraction of a scalar field that captures evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that […]
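To make "evolution in level set connectivity" concrete, here is a simplified, serial sketch of one ingredient commonly used when building contour trees: the merge (join) tree of sublevel sets, computed by sweeping vertices in increasing order and tracking connected components with union-find. It is not the hybrid parallel algorithm described in the abstract. The 1D domain, the tie-breaking by index, and the function name `sublevel_merge_events` are assumptions for illustration; a full contour tree would also need the split tree and a merge step.

```python
def sublevel_merge_events(values):
    """Return (minima, saddles) of the sublevel-set merge tree of a 1D field.

    Vertices are swept in increasing order of (value, index); ties are broken
    by index, a simple form of symbolic perturbation (assumed here).
    """
    n = len(values)
    parent = list(range(n))  # union-find forest over vertex indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    processed = [False] * n
    minima, saddles = [], []
    for v in sorted(range(n), key=lambda i: (values[i], i)):
        # Components of already-swept neighbours that v touches.
        roots = {find(u) for u in (v - 1, v + 1) if 0 <= u < n and processed[u]}
        if not roots:
            minima.append(v)   # a new sublevel-set component is born at v
        elif len(roots) == 2:
            saddles.append(v)  # two components merge at v
        for r in roots:
            parent[r] = v      # v becomes the root of the merged component
        processed[v] = True
    return minima, saddles
```

For the field `[3, 1, 4, 0, 5, 2, 6]`, the call returns `([3, 1, 5], [2, 4])`: components are born at the three local minima (indices 3, 1, 5, in sweep order) and merge at the two interior maxima (indices 2 and 4).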
Sep, 28

Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs

The race for Exascale computing has naturally led the current technologies to converge to multi-CPU/multi-GPU computers, based on thousands of CPUs and GPUs interconnected by PCI-Express buses or interconnection networks. To exploit this high computing power, programmers have to solve the issue of scheduling parallel programs on hybrid architectures. And, since the performance of a […]
Sep, 28

A fast Texture-by-numbers synthesis method based on texture optimization

The framework of Texture-by-numbers (TBN) synthesizes images of globally varying patterns with intuitive user control. Previous TBN synthesis methods have difficulty achieving high-quality synthesis results and efficiency simultaneously. This paper proposes a fast TBN synthesis method based on texture optimization, which uses global optimization to solve the controllable non-homogeneous texture synthesis problem. Our algorithm produces […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors