Posts

Apr, 12

TDDFT in massively parallel computer architectures: the OCTOPUS project

OCTOPUS is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this article we present the ongoing efforts for the parallelisation of OCTOPUS. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has a great […]
Apr, 12

Bonsai: A GPU Tree-Code

We present a gravitational hierarchical N-body code that is designed to run efficiently on Graphics Processing Units (GPUs). All parts of the algorithm are executed on the GPU which eliminates the need for data transfer between the Central Processing Unit (CPU) and the GPU. Our tests indicate that the gravitational tree-code outperforms tuned CPU code […]
Apr, 11

Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods

Algebraic multigrid methods for large, sparse linear systems are a necessity in many computational simulations, yet parallel algorithms for such solvers are generally decomposed into coarse-grained tasks suitable for distributed computers with traditional processing cores. However, accelerating multigrid on massively parallel throughput-oriented processors, such as the GPU, demands algorithms with abundant fine-grained parallelism. In this […]
Apr, 11

RGEM: A Responsive GPGPU Execution Model for Runtime Engines

General-purpose computing on graphics processing units, also known as GPGPU, is a burgeoning technique to enhance the computation of parallel programs. Applying this technique to real-time applications, however, requires additional support for timeliness of execution. In particular, the non-preemptive nature of GPGPU, associated with copying data to/from the device memory and launching code onto the […]
Apr, 11

Modular Arithmetic for Solving Linear Equations on the GPU

Solving systems of linear algebraic equations is a frequent task in numerical mathematics. Difficulties often arise when the matrix is ill-conditioned. Solution stability cannot be ensured for large dense systems of linear equations. Rounding error during the numerical computation cannot be tolerated. Methods have been developed that minimize the influence […]
Apr, 11

Real-time Visualization of Streaming Text with Force-Based Dynamic System

Streamit lets users explore visualizations of text streams without prior knowledge of the data. It incorporates incoming documents from a continuous source into an existing visualization context with automatic grouping and separation based on document similarities. A powerful user interface allows in-depth data analysis.
Apr, 11

The 2012 International Conference on Network Computing and Information Security and the 2012 International Conference on Multimedia and Signal Processing, NCIS’12 – CMSP’12

The 2012 International Conference on Network Computing and Information Security (NCIS’12) and the 2012 International Conference on Multimedia and Signal Processing (CMSP’12) will be jointly held in Shanghai, China, on December 7-9, 2012. NCIS’12 – CMSP’12 aims to provide a high-level international forum for scientists and researchers to present the state of the art of Network […]
Apr, 10

21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2013

Parallel, Distributed, and Network-Based Processing has undergone impressive change over recent years. New architectures and applications have rapidly become the central focus of the discipline. These changes are often a result of cross-fertilisation of parallel and distributed technologies with other rapidly evolving technologies such as telecommunications and multimedia. It is of paramount importance to review […]
Apr, 10

Improving Atmospheric Model Performance on a Multi-Core Cluster System

Numerical models have been used extensively in the last decades to understand and predict weather phenomena and the climate. In general, models are classified according to their operation domain: global (entire Earth) and regional (country, state, etc). Global models have spatial resolution of about 0.2 to 1.5 degrees of latitude and therefore cannot represent very […]
Apr, 10

Dynamic Programming with CUDA – Part II

This module is largely stand-alone. It is "Part II" only in the sense that it does not contain the overview of dynamic programming seen in Part I, and does not recapitulate the introduction to CUDA. We will continue to refer the reader to various NVIDIA references where appropriate, particularly the NVIDIA CUDA C Programming Guide, […]
Apr, 10

A Comparative Study of Parallel Algorithms for the Girth Problem

In this paper we introduce efficient parallel algorithms for finding the girth in a graph or digraph, where girth is the length of a shortest cycle. We empirically compare our algorithms by using two common APIs for parallel programming in C++, which are OpenMP for multiple CPUs and CUDA for multi-core GPUs. We conclude that […]
Apr, 10

Hadoop+Aparapi: Making heterogenous MapReduce programming easier

Lately, programmers have started to take advantage of GPU capabilities of cloud-based machines. Using the GPUs can decrease the number of nodes required to perform the computation by increasing the productivity per node. We combine Hadoop, a widely-used MapReduce framework, with Aparapi, a new Java-to-OpenCL conversion tool from AMD. We propose an easy-to-use API which […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
