Posts

Feb, 9

Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels

Compute-intensive data-analytic (CIDA) applications have become a major component of many different business domains, as well as scientific computing applications. These algorithms stem from domains as diverse as web analysis and social networks, machine learning and data mining, text analysis, bio-informatics, astronomy image analysis, business analytics, large scale graph algorithms, image/video processing and recognition, some […]
Feb, 9

Distributed multi-node, multi-GPU, heterogeneous system for 3D image reconstruction in Electrical Capacitance Tomography – network performance and application analysis

3D ECT poses many challenging computational problems, as image reconstruction requires the execution of many basic linear algebra operations, especially when the solutions are based on the Finite Element Method. In order to reach real-time reconstruction, a 3D ECT computational subsystem has to be able to transform capacitance data into an image in fractions of […]
Feb, 9

A multi-lane traffic simulation model via continuous cellular automata

Traffic models based on cellular automata have high computational efficiency because of their simplicity in describing unrealistic vehicular behavior and the versatility of cellular automata to be implemented on parallel processing. On the other hand, the other microscopic traffic models such as car-following models are computationally more expensive, but they have more realistic driver behaviors […]
Feb, 9

Practical Patient-Specific Cardiac Blood Flow Simulations Using SPH

While recent developments in the field of ventricular blood flow simulations have pushed modeling to increasingly high levels of accuracy, there has been a steep cost in computation time. Current state-of-the-art simulators take days to run, which is impractical for use in a clinical setting. In this paper, we describe novel adaptations of the SPH […]
Feb, 9

Enabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce

Selective, embedded, just-in-time specialization (SEJITS) is a technique for optimizing embedded domain-specific languages through the use of specializers, or code modules developed by expert programmers that target particular accelerators such as multicore processors and GPUs via just-in-time compilation. We extend SEJITS to exploit inter-machine parallelism by targeting clusters of machines via MapReduce. Our work enables […]
Feb, 9

Fast 3D Wavelet Transform on Multicore and Manycore Computing Platforms

The three-dimensional wavelet transform (3D-DWT) has attracted the attention of the research community, especially in areas such as video watermarking, compression of volumetric medical data, multispectral image coding, 3D model coding and video coding. In this work, we present several strategies to speed up the 3D-DWT computation through multicore processing. An in-depth analysis about […]
Feb, 9

Document Stream Clustering using GPUs

The Web is constantly generating streams of textual information in the form of news articles and tweets. In order for Information Retrieval systems to make sense of all this data, partitional clustering algorithms are used to create groups of similar documents. Traditional clustering algorithms, like K-means, are not well suited for stream processing where the […]
Feb, 9

Effectiveness of program transformations and compilers for directive-based GPU programming models

Accelerator devices like general-purpose graphics processing units (GPGPUs) play an important role in enhancing the performance of many contemporary scientific applications. However, programming GPUs using languages like C for CUDA or OpenCL requires a relatively high investment of time, and the resulting programs are often fine-tuned to perform well only on a particular device. […]
Feb, 9

Using Hybrid Shared and Distributed Caching for Mixed-Coherency GPU Workloads

Current GPU computing models support a mixture of coherent and incoherent classes of memory operations. Workloads using these models typically have working sets too large to fit in an economical SRAM structure. Still, GPU architectures have last-level caches to primarily fulfill two functions: eliminate redundant DRAM accesses servicing requests from different L1 caches to the […]
Feb, 9

GPU-based Monte Carlo radiotherapy dose calculation using phase-space sources

A novel phase-space source implementation has been designed for GPU-based Monte Carlo dose calculation engines. Due to the parallelized nature of GPU hardware, it is essential to simultaneously transport particles of the same type and similar energies but separated spatially to yield a high efficiency. We present three methods for phase-space implementation that have been […]
Feb, 8

Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers

The ease of programming offered by the CUDA programming model has attracted many programmers to try the platform for accelerating non-graphics applications. Cryptography, being no exception, has also seen its share of exploration efforts, especially for block ciphers. In this contribution we present a detailed walk-through of effective mapping of HC-128 and HC-256 stream […]
Feb, 7

Efficient Wave Propagation in Discontinuous Media and Complex Geometry for Many-core Architectures

We present an accelerated numerical solver for the scalar wave equation using one and two GPUs. We consider complex geometry and study accuracy when performing the computation in both single and double precision. The method uses a high-order accurate approximation of the derivatives using summation-by-parts operators. The boundary conditions are imposed using the simultaneous approximation […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org