Posts
Jul, 18
Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations
This paper illustrates how GPU computing can be used to accelerate computational fluid dynamics (CFD) simulations. For sparse linear systems arising from finite volume discretization, we evaluate and optimize the performance of Conjugate Gradient (CG) routines designed for manycore accelerators and compare against an industrial CPU-based implementation. We also investigate how the recent advances in […]
Jul, 18
IODA: an Input/Output Deep Architecture for image labeling
In this article, we propose a deep neural network (DNN) architecture called Input Output Deep Architecture (IODA) for solving the problem of image labeling. IODA directly links a whole image to a whole label map, assigning a label to each pixel using a single neural network forward step. Instead of designing a handcrafted a priori […]
Jul, 18
HPC on the Intel Xeon Phi: Homomorphic Word Searching
In this paper, the suitability of implementing parallel homomorphic word searching on Intel Xeon Phi coprocessors is evaluated for the first time. Homomorphic encryption makes it possible to produce a cryptogram that encrypts the result of applying any function to some values, even when the input values are encrypted and without access to the private key. For example, […]
Jul, 18
A Kinetic Vlasov Model for Plasma Simulation Using Discontinuous Galerkin Method on Many-Core Architectures
Advances are reported in the three pillars of computational science achieving a new capability for understanding dynamic plasma phenomena outside of local thermodynamic equilibrium. A continuum kinetic model for plasma based on the Vlasov-Maxwell system for multiple particle species is developed. Consideration is added for boundary conditions in a truncated velocity domain and supporting wall […]
Jul, 18
Heterogeneous Computing for Data Stream Mining
Graphics processing units (GPUs) are the de facto standard for accelerating data-parallel tasks in high-performance computing. They are widely used to accelerate batch machine learning algorithms. High-end discrete GPUs are characterized by a very high number of cores (thousands), high-bandwidth memory optimized for stream access, and high power requirements. Integrated GPUs are characterized […]
Jul, 16
9th International Conference on Computer and Electrical Engineering (ICCEE), 2016
Paper publication: All paper submissions will be peer reviewed and evaluated based on originality, research content, relevance to the conference, contributions, and readability. All accepted papers will be published in one of the indexed journals after proper registration and presentation. – Journal of Computers (JCP, ISSN: 1796-203X), indexed by: ULRICH’s Periodicals Directory; Google Scholar; INSPEC; etc. – […]
Jul, 16
9th International Conference on Computer Science and Information Technology (ICCSIT), 2016
Paper Publication: All papers are reviewed using a single-blind review process, and accepted papers will be published in a journal: Journal of Communications (JCM, ISSN: 1796-2021); Journal of Software (JSW, ISSN: 1796-217X); Journal of Computers (JCP, ISSN: 1796-203X); International Journal of Future Computer and Communication (IJFCC, ISSN: 2010-3751); International Journal of Computer Theory and Engineering (IJCTE, ISSN: 1793-8201); […]
Jul, 16
2nd International Conference on Signal Processing (ICOSP), 2016
Paper Publication: Accepted papers of ICOSP 2016 may be published in the International Journal of Signal Processing Systems (IJSPS).
Jul, 16
2nd International Conference on Mechanical Engineering and Electrical Systems (ICMES), 2016
Publication: All accepted papers will be published in a volume of MATEC Web of Conferences (ISSN: 2261-236X), indexed by Ei Compendex, Inspec, DOAJ, CPCI (Web of Science), and Scopus. Submission Methods: Full Paper (publication and oral presentation); Abstract (oral presentation only). Electronic Submission System (.pdf): https://www.easychair.org/conferences/?conf=icmes2016
Jul, 16
Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU
Singular Value QR (SVQR) can orthonormalize a set of dense vectors with minimal communication (one global reduction between the parallel processing units, with BLAS-3 performing most of the local computation). As a result, compared to other orthogonalization schemes, SVQR obtains superior performance on many current computers, where communication has become […]
Jul, 16
An investigation of GPU-based stiff chemical kinetics integration methods
A fifth-order implicit Runge-Kutta method and two fourth-order exponential integration methods equipped with Krylov subspace approximations were implemented for the GPU and paired with the analytical chemical kinetic Jacobian software pyJac. The performance of each algorithm was evaluated by integrating thermochemical state data sampled from stochastic partially stirred reactor simulations and compared with the commonly […]
Jul, 16
Finite Element Integration with Quadrature on the GPU
We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call thread transposition to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal single precision peak flop rate of 1.5 TF/s and a memory bandwidth of 192 GB/s, we achieve close to 300 […]

