
Posts

Sep, 4

Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor

With its ease of programming, flexibility, and efficiency, MapReduce has become one of the most popular frameworks for building big-data applications. MapReduce was originally designed for distributed computing, and has been extended to various architectures, e.g., multi-core CPUs, GPUs and FPGAs. In this work, we focus on optimizing the MapReduce framework on Xeon Phi, which is the […]
Sep, 4

Accelerating a Cloud-Based Software GNSS Receiver

In this paper we discuss ways to reduce the execution time of a software Global Navigation Satellite System (GNSS) receiver that is meant for offline operation in a cloud environment. Client devices record satellite signals they receive, and send them to the cloud, to be processed by this software. The goal of this project is […]
Sep, 2

Accurate and Efficient Filtering using Anisotropic Filter Decomposition

Efficient filtering remains an important challenge in computer graphics, particularly when filters are spatially-varying, have large extent, and/or exhibit complex anisotropic profiles. We present an efficient filtering approach for these difficult cases based on anisotropic filter decomposition (IFD). By decomposing complex filters into linear combinations of simpler, displaced isotropic kernels, and precomputing a compact prefiltered […]
Sep, 2

Oncilla: A GAS Runtime for Efficient Resource Allocation and Data Movement in Accelerated Clusters

Accelerated and in-core implementations of Big Data applications typically require large amounts of host and accelerator memory as well as efficient mechanisms for transferring data to and from accelerators in heterogeneous clusters. Scheduling for heterogeneous CPU and GPU clusters has been investigated in depth in the high-performance computing (HPC) and cloud computing arenas, but there […]
Sep, 2

Towards a functional run-time for dense NLA domain

We investigate the use of functional programming to develop a numerical linear algebra run-time; i.e. a framework where the solvers can be adapted easily to different contexts and task parallelism can be attained (semi-) automatically. We follow a bottom up strategy, where the first step is the design and implementation of a framework layer, composed […]
Sep, 2

A Stochastic-based Optimized Schwarz Method for the Gravimetry Equations on GPU Clusters

By giving another way to see beneath the Earth, gravimetry refines geophysical exploration. In this paper, we evaluate the gravimetry field in the Chicxulub crater area, located between the Yucatan region and the Gulf of Mexico, which shows strong gravimetry and magnetic anomalies. High-order finite element analysis is considered with input data arising […]
Sep, 2

Implementation Details of GPU-based Out-of-Core Many-Lights Rendering

In this document, we provide implementation details of the GPU-based out-of-core many-lights rendering method. First, we introduce the organization of out-of-core data and the graph data used for data management. Then, we introduce the algorithm used in the data preparation step. Finally, we give the details of the out-of-core shading step.
Aug, 31

A Scalable, Efficient Scheme for Evaluation of Stencil Computations over Unstructured Meshes

Stencil computations are a common class of operations that appear in many computational scientific and engineering applications. Stencil computations often benefit from compile-time analysis, exploiting data-locality, and parallelism. Post-processing of discontinuous Galerkin (dG) simulation solutions with B-spline kernels is an example of a numerical method which requires evaluating computationally intensive stencil operations over a mesh. […]
Aug, 31

Bitcoin and The Age of Bespoke Silicon

Recently, the Bitcoin cryptocurrency has been an international sensation. This paper tells the story of Bitcoin hardware: how a group of early-adopters self-organized and financed the creation of an entire new industry, leading to the development of machines, including ASICs, that had orders of magnitude better performance than what Dell, Intel, NVidia, AMD or Xilinx […]
Aug, 31

Particle Swarm Optimization of Model Parameters: Simulation of Deep Reactive Ion Etching by the Continuous Cellular Automaton

As a widespread form of Deep Reactive Ion Etching (DRIE), the Bosch process alternates etching and passivation cycles, typically leading to characteristic scalloping patterns on the sidewalls. Measurements of the etch depth per cycle l_d and undercut length per cycle l_u show a strong dependence of the undercut ratio l_u / l_d on the trench […]
Aug, 31

Computing High Resolution Explicit Corridor Maps using Parallel Technologies

This work investigates the approximated construction of Explicit Corridor Maps (ECMs). An ECM is a type of Navigation Mesh: a geometrical structure describing the walkable space of an environment that is used to speed up the path-finding and crowd simulation operations occurring in the environment. Additional geometrical routines that take advantage of the GPGPU model are […]
Aug, 31

Accelerating Text Mining Workloads in a MapReduce-based Distributed GPU Environment

Scientific computations have been using GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data intensive, and the ease of deployment of algorithms is an important factor in developing […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
