
Posts

Oct, 11

IEEE International Conference on Big Data Analysis (ICBDA), 2016

Dear Scholars and Researchers, warmest greetings from ICBDA 2016! This is the conference committee of the 2016 IEEE International Conference on Big Data Analysis (ICBDA 2016). We are very pleased to tell you that ICBDA 2016 will be held in Hangzhou, China, during March 12-14, 2016. Publication: After a careful reviewing process, all accepted papers after proper registration and […]
Oct, 11

6th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB), 2016

General Introduction
• Index: All the accepted papers will be published in the volume of MATEC Web of Conferences (ISSN: 2261-236X), which is indexed by Ei Compendex, Inspec, DOAJ, CPCI (Web of Science) and Scopus.
• Famous professors as Keynote Speakers: Prof. Orawan Siriratpiriya, Chulalongkorn University (ARRIC), Thailand (the oldest university considered the most prestigious […]
Oct, 11

Meta-programming and Multi-stage Programming for GPGPUs

GPGPUs and other accelerators are becoming a mainstream asset for high-performance computing. Raising the programmability of such hardware is essential to enable users to discover, master and subsequently use accelerators in day-to-day simulations. Furthermore, tools for high-level programming of parallel architectures are becoming a great way to simplify the exploitation of such systems. For this […]
Oct, 11

GPU Accelarated Multi-Block Lattice Boltzmann Solver for Viscous Flow Problems

We developed a lattice Boltzmann solver for low Reynolds number flow problems, then modified it to run on a Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA), a parallel computing platform and programming model created by NVIDIA. Comparison of the results that we obtained on Graphical […]
Oct, 11

Performance Analysis of an Astrophysical Simulation Code on the Intel Xeon Phi Architecture

We have developed the astrophysical simulation code XFLAT to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both CPU and Xeon Phi co-processors based on the Intel Many Integrated Core Architecture (MIC). We analyze the performance of XFLAT […]
Oct, 11

Accelerating the D3Q19 Lattice Boltzmann Model with OpenACC and MPI

Multi-GPU implementations of the Lattice Boltzmann method are of practical interest as they allow the study of turbulent flows in large-scale simulations at high Reynolds numbers. Although programming GPUs, and in general power-efficient accelerators, typically guarantees high performance, the lack of portability in their low-level programming models implies significant effort for maintainability and porting of […]
Oct, 11

GPU acceleration of preconditioned solvers for ill-conditioned linear systems

In this work we study implementations of deflation and preconditioning techniques for solving ill-conditioned linear systems using iterative methods. Solving such systems can be time-consuming because of jumps in the coefficients caused by large differences in material properties. We have developed implementations of the iterative methods with these preconditioning techniques on […]
Oct, 8

Introducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit

In this article, we introduce CURRENNT, an open-source parallel implementation of deep recurrent neural networks (RNNs) supporting graphics processing units (GPUs) through NVIDIA’s Compute Unified Device Architecture (CUDA). CURRENNT supports uni- and bidirectional RNNs with Long Short-Term Memory (LSTM) memory cells, which overcome the vanishing gradient problem. To our knowledge, CURRENNT is the first publicly […]
Oct, 8

GPU-Based Computation of 2D Least Median of Squares with Applications to Fast and Robust Line Detection

The 2D Least Median of Squares (LMS) is a popular tool in robust regression because of its high breakdown point: up to half of the input data can be contaminated with outliers without affecting the accuracy of the LMS estimator. The complexity of 2D LMS estimation has been shown to be $\Omega(n^2)$ where $n$ is […]
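To make the LMS criterion concrete, here is a minimal CPU sketch in Python. It is not the GPU algorithm from the paper and not an exact LMS solver: it assumes a common approximation in which only lines through pairs of input points are tried, and it measures vertical residuals. The function name `lms_line_pairs` and the sample data are hypothetical.

```python
from itertools import combinations
from statistics import median


def lms_line_pairs(points):
    """Approximate 2D LMS fit (illustrative assumption, not the paper's method).

    Tries the line through every pair of points and keeps the one with the
    smallest median of squared vertical residuals over all points.
    Returns (median_squared_residual, slope, intercept).
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical candidates are skipped in this slope-intercept sketch
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        med = median((y - (slope * x + intercept)) ** 2 for x, y in points)
        if best is None or med < best[0]:
            best = (med, slope, intercept)
    return best


# Seven inliers on y = 2x + 1 plus three gross outliers (30% contamination).
points = [(x, 2 * x + 1) for x in range(7)] + [(1, 40), (3, -25), (5, 60)]
med, slope, intercept = lms_line_pairs(points)
```

Because more than half of the points lie exactly on the line, the median squared residual of that line is zero, so the outliers do not pull the fit. The pairwise search already costs quadratic time in the number of candidates, each evaluated against all points, which hints at why a parallel implementation is attractive.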
Oct, 8

Kinematic Modelling of Disc Galaxies using Graphics Processing Units

With large-scale Integral Field Spectroscopy (IFS) surveys of thousands of galaxies currently under way or planned, the astronomical community is in need of methods, techniques and tools that will allow the analysis of huge amounts of data. We focus on the kinematic modelling of disc galaxies and investigate the potential use of massively parallel architectures, such […]
Oct, 8

Solving the Quadratic Assignment Problem on heterogeneous environment (CPUs and GPUs) with the application of Level 2 Reformulation and Linearization Technique

The Quadratic Assignment Problem (QAP) is a classic, widely studied combinatorial optimization problem, classified as NP-hard. The problem consists of assigning N facilities to N locations in a one-to-one relation, aiming to minimize the cost of displacement between the facilities. The application of the Reformulation and Linearization Technique (RLT) to the QAP […]
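As an illustration of the problem statement only, the following Python sketch evaluates the usual flow-times-distance QAP objective and finds the optimum by brute force over all permutations. It does not implement RLT or the CPU/GPU method of the paper; the `flow` and `dist` matrices and both function names are hypothetical.

```python
from itertools import permutations


def qap_cost(flow, dist, assignment):
    """Cost of placing facility i at location assignment[i].

    Assumes the standard objective: sum over facility pairs of
    flow[i][j] * dist[assignment[i]][assignment[j]].
    """
    n = len(assignment)
    return sum(
        flow[i][j] * dist[assignment[i]][assignment[j]]
        for i in range(n)
        for j in range(n)
    )


def qap_brute_force(flow, dist):
    """Exhaustive search over all N! one-to-one assignments (tiny N only)."""
    n = len(flow)
    return min(permutations(range(n)), key=lambda p: qap_cost(flow, dist, p))


# Hypothetical 3-facility instance with symmetric flows and distances.
flow = [[0, 5, 2],
        [5, 0, 3],
        [2, 3, 0]]
dist = [[0, 1, 4],
        [1, 0, 2],
        [4, 2, 0]]

best = qap_brute_force(flow, dist)
best_cost = qap_cost(flow, dist, best)
```

The factorial growth of the search space is what makes exhaustive search impractical beyond very small N and motivates bounding techniques such as RLT.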
Oct, 8

Exploiting Task-Parallelism on GPU Clusters via OmpSs and rCUDA Virtualization

OmpSs is a task-parallel programming model consisting of a reduced collection of OpenMP-like directives, a front-end compiler, and a runtime system. This directive-based programming interface helps developers accelerate their application’s execution, e.g. in a cluster equipped with graphics processing units (GPUs), with a low programming effort. On the other hand, the virtualization package rCUDA provides […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
