
Posts

Sep, 2

Towards a functional run-time for dense NLA domain

We investigate the use of functional programming to develop a numerical linear algebra run-time, i.e., a framework in which the solvers can be adapted easily to different contexts and task parallelism can be attained (semi-)automatically. We follow a bottom-up strategy, where the first step is the design and implementation of a framework layer, composed […]
Sep, 2

A Stochastic-based Optimized Schwarz Method for the Gravimetry Equations on GPU Clusters

By giving another way to see beneath the Earth, gravimetry refines geophysical exploration. In this paper, we evaluate the gravimetry field in the Chicxulub crater area, located between the Yucatan region and the Gulf of Mexico, which shows strong gravimetry and magnetic anomalies. High-order finite element analysis is considered with input data arising […]
Sep, 2

Implementation Details of GPU-based Out-of-Core Many-Lights Rendering

In this document, we provide implementation details of the GPU-based out-of-core many-lights rendering method. First, we introduce the organization of out-of-core data and the graph data used for data management. Then, we introduce the algorithm used in the data preparation step. Finally, we give the details of the out-of-core shading step.
Aug, 31

A Scalable, Efficient Scheme for Evaluation of Stencil Computations over Unstructured Meshes

Stencil computations are a common class of operations that appear in many computational scientific and engineering applications. Stencil computations often benefit from compile-time analysis, exploiting data-locality, and parallelism. Post-processing of discontinuous Galerkin (dG) simulation solutions with B-spline kernels is an example of a numerical method which requires evaluating computationally intensive stencil operations over a mesh. […]
Aug, 31

Bitcoin and The Age of Bespoke Silicon

Recently, the Bitcoin cryptocurrency has been an international sensation. This paper tells the story of Bitcoin hardware: how a group of early adopters self-organized and financed the creation of an entirely new industry, leading to the development of machines, including ASICs, that had orders of magnitude better performance than what Dell, Intel, NVidia, AMD or Xilinx […]
Aug, 31

Particle Swarm Optimization of Model Parameters: Simulation of Deep Reactive Ion Etching by the Continuous Cellular Automaton

As a widespread form of Deep Reactive Ion Etching (DRIE), the Bosch process alternates etching and passivation cycles, typically leading to characteristic scalloping patterns on the sidewalls. Measurements of the etch depth per cycle l_d and undercut length per cycle l_u show a strong dependence of the undercut ratio l_u / l_d on the trench […]
Aug, 31

Computing High Resolution Explicit Corridor Maps using Parallel Technologies

This work investigates the approximate construction of Explicit Corridor Maps (ECMs). An ECM is a type of Navigation Mesh: a geometrical structure describing the walkable space of an environment that is used to speed up the path-finding and crowd simulation operations occurring in the environment. Additional geometrical routines that take advantage of the GPGPU model are […]
Aug, 31

Accelerating Text Mining Workloads in a MapReduce-based Distributed GPU Environment

Scientific computations have been using GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data intensive, and the ease of deployment of algorithms is an important factor in developing […]
Aug, 30

A Feedback Approach to Task Partitioning in Heterogeneous Architectures

Today's personal computers are based on complex architectures, often with multiple high-performance computational units for various dedicated purposes. The General Purpose GPU is one such example, where Graphics Processing Units are being used for more general-purpose computing. In this paper, we target such architectures and focus on load balancing and task partitioning […]
Aug, 30

Real-Time GPU Path Tracing

In this paper, we present a simple, yet efficient implementation of the path tracing algorithm for GPUs. A reformulation of Russian Roulette is used to achieve high SIMT utilization, which leads to real-time performance in Kajiya’s classic scene, using a single GPU. We apply our scheme to larger scenes in the Brigade system, an experimental […]
Aug, 30

Evolutionary Algorithm for Optimizing Parameters of GPGPU-based Image Segmentation

The use of digital microscopy allows diagnosis through automated quantitative and qualitative analysis of digital images. Often, the first step in evaluating the samples is determining the number and location of cell nuclei. For this purpose, we have developed a GPGPU-based data-parallel region growing algorithm that is as accurate as the already […]
Aug, 30

Performance Portability Strategies for Computational Fluid Dynamics (CFD) Applications on HPC Systems

Achieving high computational performance on large-scale high performance computing (HPC) systems demands optimizations to exploit hardware characteristics. Various optimizations and research strategies are implemented to improve performance with emphasis on single or multiple hardware characteristics. Among these approaches, the domain-specific approach involving domain expertise shows high potential in achieving high performance and maintaining performance […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
