Posts
Mar, 14
Initial condition for efficient mapping of level set algorithms on many-core architectures
In this paper, we investigated the effect of adding more small curves to the initial condition which determines the required number of iterations of a fast level set (LS) evolution. As a result, we discovered two new theorems and developed a proof on the worst case of the required number of iterations. Furthermore, we found […]
Mar, 13
Visualizing Trends on Twitter
With its popularity, Twitter has become an increasingly valuable source of real-time, user-generated information about interesting events in our world. This thesis presents TwitGeo, a system to explore and visualize trending topics on Twitter. It features an interactive map that summarizes trends across different geographical regions. Powered by a novel GPU-based datastore, this system performs […]
Mar, 13
The Flocking Based and GPU Accelerated Internet Traffic Classification
Mainstream attention has been drawn to the issue of Internet traffic classification due to its political, economic, and legal impacts on appropriate use, pricing, and management of the Internet. Nowadays, both the research and operational communities prefer to classify network traffic through approaches that are based on the statistics of traffic flow features due to […]
Mar, 12
Fast hydrodynamics on heterogenous many-core hardware
In this chapter, we present details of a heterogenous and massively parallel GPU library implementation in CUDA C/C++ of a nonlinear free surface water wave model [15]. We describe how flexible-order finite difference approximations to the partial differential equations of the model can be prototyped using library components provided in an in-house library. In […]
Mar, 12
Development of High-Performance Software Components for Emerging Architectures
Massively parallel processors, such as graphical processing units (GPUs), have in recent years proven to be effective for a vast amount of scientific applications. Today, most desktop computers are equipped with one or more powerful GPUs, offering heterogeneous high-performance computing to a broad range of scientific researchers and software developers. Though GPUs are […]
Mar, 12
2014 7th International Conference on Advanced Computer Theory and Engineering, ICACTE 2014
Submission Deadline: 2014-06-05 Publication: All accepted papers of ICACTE 2014 will be published in the conference proceedings, under an ISBN reference by ASME Press, which will be included in the ASME Digital Library, and the publisher will send the proceedings to be reviewed by Ei Compendex, ISI Proceedings and other major indexing services. Call […]
Mar, 12
Configuration and Benchmarks of Peer-to-Peer Communication over Gigabit Ethernet and InfiniBand in a Cluster with Intel Xeon Phi Coprocessors
Intel Xeon Phi coprocessors allow symmetric heterogeneous clustering models, in which MPI processes are run fully on coprocessors, as opposed to offload-based clustering. These symmetric models are attractive, because they allow effortless porting of CPU-based applications to clusters with manycore computing accelerators. However, with the default software configuration and without specialized networking hardware, peer-to-peer communication […]
Mar, 12
Locality optimization on a NUMA architecture for hybrid LU factorization
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We […]
Mar, 12
Reduced Vlasov-Maxwell simulations
In this paper we review two different numerical methods for Vlasov-Maxwell simulations. The first method is based on a coupling between a Discontinuous Galerkin (DG) Maxwell solver and a Particle-In-Cell (PIC) Vlasov solver. The second method only uses a DG approach for the Vlasov and Maxwell equations. The Vlasov equation is first reduced to a […]
Mar, 12
Genetically Improved CUDA kernels for StereoCamera
Genetic Programming (GP) may dramatically increase the performance of software written by domain experts. GP and autotuning are used to optimise and refactor legacy GPGPU C code for modern parallel graphics hardware and software. Speed ups of more than six times on recent nVidia GPU cards are reported compared to the original kernel on the […]
Mar, 12
Efficient Preconditioned Conjugate Gradient Parallelization on GPU
We present a performance analysis of a parallel implementation of both conjugate gradient and preconditioned conjugate gradient solvers using graphics processing units with the CUDA parallel programming model. The solvers were optimized for a fast solution of sparse systems of equations arising from Finite Element Analysis (FEA) of electromagnetic phenomena. The preconditioners were Incomplete Cholesky factorization […]
Mar, 12
MaxSSmap: A GPU program for short read mapping with the maximum scoring subsequence
Exact short read mapping to whole genomes with the Smith-Waterman algorithm is computationally expensive yet highly accurate when aligning reads with mismatches and gaps. We introduce a GPU program called MaxSSmap with the aim of achieving comparable accuracy to Smith-Waterman but with faster runtimes. Similar to mainstream approaches, MaxSSmap identifies a local region of the […]