Posts
May, 16
A Fast and Rigorously Parallel Surface Voxelization Technique for GPU-Accelerated CFD Simulations
This paper presents a fast surface voxelization technique for the mapping of tessellated triangular surface meshes to uniform and structured grids that provide a basis for CFD simulations with the lattice Boltzmann method (LBM). The core algorithm is optimized for massively parallel execution on graphics processing units (GPUs) and is based on a unique dissection […]
May, 16
Efficient Resource Scheduling for Big Data Processing on Accelerator-based Heterogeneous Systems
Accelerators are becoming widespread in heterogeneous processing, performing computation tasks across a wide range of applications. In this paper, we examine the heterogeneity in modern computing systems, particularly how to achieve a good level of resource utilization and fairness when multiple tasks with different load and computation ratios are […]
May, 16
Multi-GPU Support on Single Node Using Directive-Based Programming Model
Existing studies show that using a single GPU can yield significant performance gains. We should be able to achieve further speedup if we use more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential and are often considered leading candidates for porting complex scientific applications. Unfortunately […]
May, 16
Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core – MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices and data access patterns, […]
May, 16
Using Butterfly-Patterned Partial Sums to Optimize GPU Memory Accesses for Drawing from Discrete Distributions
We describe a technique for drawing values from discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses. From this table, complete […]
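For context, the baseline that the abstract contrasts against (a complete table of partial sums, searched once per draw) can be sketched as follows. This is a minimal CPU-side illustration in Python, not the paper's butterfly-patterned method or its GPU implementation; the function names `build_partial_sums` and `draw` are hypothetical.

```python
import bisect
import itertools

def build_partial_sums(weights):
    """Return the complete table of running sums of the relative probabilities."""
    return list(itertools.accumulate(weights))

def draw(partial_sums, u):
    """Map a uniform variate u in [0, 1) to an index via binary search.

    The weights need not be normalized: u is scaled by the total weight,
    which is the last entry of the table.
    """
    target = u * partial_sums[-1]
    index = bisect.bisect_right(partial_sums, target)
    # Clamp as a guard against floating-point rounding at the upper end.
    return min(index, len(partial_sums) - 1)

# Relative probabilities 1:2:3:4 give the table [1, 3, 6, 10].
table = build_partial_sums([1, 2, 3, 4])
samples = [draw(table, u) for u in (0.0, 0.25, 0.5, 0.95)]
```

Building the full table is the step the abstract says the butterfly-patterned form avoids; the search over the table is shown here only to make the baseline concrete.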
May, 15
Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves
Deep neural networks (DNNs) show very strong performance on many machine learning problems, but they are very sensitive to the setting of their hyperparameters. Automated hyperparameter optimization methods have recently been shown to yield settings competitive with those found by human experts, but their widespread adoption is hampered by the fact that they require more […]
May, 15
MRCUDA: MapReduce Acceleration Framework Based on GPU
The GPU programming model for general-purpose computing is complex and difficult to maintain. This paper designs and implements a MapReduce acceleration framework named MRCUDA. MRCUDA has four loosely coupled stages (Pre-Processing, Map, Group and Reduce), which support flexible configurations for different applications. In order to take full advantage […]
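The four stages named in the abstract can be illustrated with a small word-count pipeline. This is a plain-Python, CPU-only sketch of the general Pre-Processing, Map, Group and Reduce flow; it does not reproduce MRCUDA's API or its GPU kernels, and every function name below is hypothetical.

```python
from collections import defaultdict

def pre_process(text):
    """Pre-Processing: split raw input into non-empty records (lines)."""
    return [line for line in text.splitlines() if line.strip()]

def map_stage(records):
    """Map: emit a (key, value) pair for every word in every record."""
    pairs = []
    for record in records:
        for word in record.lower().split():
            pairs.append((word, 1))
    return pairs

def group_stage(pairs):
    """Group: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_stage(groups):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def run(text):
    """Chain the four stages; each one only depends on the previous output."""
    return reduce_stage(group_stage(map_stage(pre_process(text))))

counts = run("a b a\n\nb c")
```

Because each stage consumes only the output of the one before it, a stage can be swapped or reconfigured independently, which is one reading of the "loosely coupled" design the abstract mentions.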
May, 15
The 3D Flow Field Around an Embedded Planet
Understanding the 3D flow topology around a planet embedded in its natal disk is crucial to the study of planet formation. 3D modifications to the well-studied 2D flow topology have the potential to resolve longstanding problems in both planet migration and accretion. We present a detailed analysis of the 3D isothermal flow field around a […]
May, 15
Adaptive discrete cosine transform-based image compression method on a heterogeneous system platform using Open Computing Language
The discrete cosine transform (DCT) is one of the major operations in image compression standards, and it requires intensive and complex computations. Recent computer systems and handheld devices are equipped with high-capability computing devices such as a general-purpose graphics processing unit (GPGPU) in addition to the traditional multicore CPU. We develop an optimized parallel implementation […]
May, 15
Power, Energy and Speed of Embedded and Server Multi-Cores applied to Distributed Simulation of Spiking Neural Networks: ARM in NVIDIA Tegra vs Intel Xeon quad-cores
This short note compares the instantaneous power, total energy consumption, execution time and energetic cost per synaptic event of a spiking neural network simulator (DPSNN-STDP) distributed over MPI processes when executed on either an embedded platform (based on a dual socket quad-core ARM platform) or a server platform (INTEL-based quad-core dual socket platform). […]
May, 15
5th International Conference on Computer and Communication Devices (ICCCD), 2015
Publication: All accepted papers will be published in one of the indexed Journals after being selected. * International Journal of Future Computer and Communication (IJFCC, ISSN: 2010-3751) Abstracting/ Indexing: Google Scholar, Engineering & Technology Digital Library, and Crossref, DOAJ, Electronic Journals Library, EI (INSPEC, IET). * International Journal of Computer and Communication Engineering (IJCCE, ISSN: […]
May, 15
5th International Conference on Robotics and Automation Sciences (ICRAS, former ICSIA), 2015
Publication: All the papers of ICRAS 2015 will be indexed by Ei Compendex and ISI. Topics: AREA 1: Intelligent Control Systems and Optimization • Genetic Algorithms • Fuzzy Control • Decision Support Systems • Machine Learning in Control Applications • Knowledge-based Systems Applications • Hybrid Learning Systems • Distributed Control Systems • Evolutionary Computation […]