Posts

Mar, 14

High Performance Non-Blocking Collective Communication for Next Generation Infiniband Clusters

The emergence of multi-/many-core architectures, accelerators and high-speed networks, along with the continued reduction in hardware costs, makes it possible to design highly capable supercomputers that offer sustained petaflop performance. However, merely using modern compute architectures and high-speed networks is not sufficient to achieve exascale science. Parallel applications typically involve explicit communication between processes to exchange […]
Mar, 14

Fast Exact Hyper-Graph Matching with Dynamic Programming for Spatio-Temporal Data

Graphs and hyper-graphs are frequently used to recognize complex and often non-rigid patterns in computer vision, either through graph matching or through point-set matching with graphs. Most formulations resort to minimizing a difficult energy function containing geometric or structural terms, frequently coupled with data-attached terms involving appearance information. Traditional methods solve the minimization […]
Mar, 14

Initial condition for efficient mapping of level set algorithms on many-core architectures

In this paper, we investigate the effect of adding more small curves to the initial condition, which determines the required number of iterations of a fast level set (LS) evolution. As a result, we discovered two new theorems and developed a proof concerning the worst-case number of required iterations. Furthermore, we found […]
Mar, 13

Visualizing Trends on Twitter

Owing to its popularity, Twitter has become an increasingly valuable source of real-time, user-generated information about interesting events in our world. This thesis presents TwitGeo, a system to explore and visualize trending topics on Twitter. It features an interactive map that summarizes trends across different geographical regions. Powered by a novel GPU-based datastore, this system performs […]
Mar, 13

The Flocking Based and GPU Accelerated Internet Traffic Classification

Internet traffic classification has drawn mainstream attention because of its political, economic, and legal impact on the appropriate use, pricing, and management of the Internet. Nowadays, both the research and operational communities prefer to classify network traffic with approaches based on the statistics of traffic flow features due to […]
Mar, 12

Fast hydrodynamics on heterogenous many-core hardware

In this chapter, we present details of a heterogeneous and massively parallel GPU library implementation, in CUDA C/C++, of a nonlinear free surface water wave model [15]. We describe how flexible-order finite difference approximations to the partial differential equations of the model can be prototyped using components provided in an in-house library. In […]
Mar, 12

Development of High-Performance Software Components for Emerging Architectures

Massively parallel processors, such as graphical processing units (GPUs), have in recent years proven to be effective for a vast number of scientific applications. Today, most desktop computers are equipped with one or more powerful GPUs, offering heterogeneous high-performance computing to a broad range of scientific researchers and software developers. Though GPUs are […]
Mar, 12

2014 7th International Conference on Advanced Computer Theory and Engineering, ICACTE 2014

Submission Deadline: 2014-06-05. Publication: All accepted papers of ICACTE 2014 will be published in the conference proceedings under an ISBN reference by ASME Press and included in the ASME Digital Library; the publisher will submit the proceedings for review by Ei Compendex, ISI Proceedings and other major indexing services. Call […]
Mar, 12

Configuration and Benchmarks of Peer-to-Peer Communication over Gigabit Ethernet and InfiniBand in a Cluster with Intel Xeon Phi Coprocessors

Intel Xeon Phi coprocessors allow symmetric heterogeneous clustering models, in which MPI processes are run fully on coprocessors, as opposed to offload-based clustering. These symmetric models are attractive, because they allow effortless porting of CPU-based applications to clusters with manycore computing accelerators. However, with the default software configuration and without specialized networking hardware, peer-to-peer communication […]
Mar, 12

Locality optimization on a NUMA architecture for hybrid LU factorization

We study the impact of non-uniform memory access (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular, we illustrate how an appropriate placement of threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We […]
Mar, 12

Reduced Vlasov-Maxwell simulations

In this paper, we review two different numerical methods for Vlasov-Maxwell simulations. The first method is based on a coupling between a Discontinuous Galerkin (DG) Maxwell solver and a Particle-In-Cell (PIC) Vlasov solver. The second method uses only a DG approach for both the Vlasov and Maxwell equations. The Vlasov equation is first reduced to a […]
Mar, 12

Genetically Improved CUDA kernels for StereoCamera

Genetic Programming (GP) may dramatically increase the performance of software written by domain experts. GP and autotuning are used to optimise and refactor legacy GPGPU C code for modern parallel graphics hardware and software. Speedups of more than six times on recent nVidia GPU cards are reported compared to the original kernel on the […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
