Posts
Nov, 28
International Conference on Advances in Image Processing (ICAIP), 2018
The International Conference on Advances in Image Processing (ICAIP) is organised mainly by the International Academy of Computing Technology (IACT). ICAIP aims to be an international forum where researchers, scientists, scholars and students can share their experiences, ideas and research results related to advances in image processing. Publication: All accepted and presented papers […]
Nov, 26
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Deep learning frameworks have been widely deployed on GPU servers for deep learning applications in both academia and industry. The training of deep neural networks (DNNs) involves many standard processes or algorithms, such as convolution and stochastic gradient descent (SGD), but the performance of different frameworks can differ even when running the […]
Nov, 26
High Performance Streaming Smith-Waterman Implementation with Implicit Synchronization on Intel FPGA using OpenCL
The Smith-Waterman algorithm is widely used in bioinformatics and is often used as a benchmark of FPGA performance. Here we present our highly optimized Smith-Waterman implementation on Intel FPGAs using OpenCL. Our implementation is both faster and more efficient than other current Smith-Waterman implementations, obtaining a theoretical performance of 214 GCUPS. Moreover, due to the […]
Nov, 26
Computing the distance between two finite element solutions defined on different 3D meshes on a GPU
This article introduces a new method to efficiently compute the distance (i.e., L^p norm of the difference) between two functions supported by two different meshes of the same 3D domain. The functions that we consider are typically finite element solutions discretized in different function spaces supported by meshes that are potentially completely unrelated. Our method […]
Nov, 26
Efficient Target and Application Specific Selection and Ordering of Compiler Passes
Programmers usually rely on one of the optimization-level flags shipped with their compiler to compile their source code. Those flags represent fixed compiler pass sequences, and therefore in some situations better performance and/or other metrics such as code size can be achieved by using compiler sequences […]
Nov, 26
GPU Pro 7: Advanced Rendering Techniques
The latest edition of this bestselling game development reference offers proven tips and techniques for the real-time rendering of special effects and visualization data that are useful for beginners and seasoned game and graphics programmers alike. Exploring recent developments in the rapidly evolving field of real-time rendering, GPU Pro 7: Advanced Rendering Techniques assembles a […]
Nov, 21
A survey on graphic processing unit computing for large-scale data mining
General purpose computation using Graphic Processing Units (GPUs) is a well-established research area focusing on high-performance computing solutions for massively parallelizable and time-consuming problems. Classical methodologies in machine learning and data mining cannot handle processing of massive and high-speed volumes of information in the context of the big data era. GPUs have successfully improved the […]
Nov, 21
Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR
Given their massively parallel computing capabilities, heterogeneous architectures comprised of CPUs and accelerators have been increasingly used to speed up scientific and engineering applications. Nevertheless, programming such architectures is a challenging task for most non-expert programmers, as typical accelerator programming languages (e.g. CUDA and OpenCL) demand a thorough understanding of the underlying hardware to enable an […]
Nov, 21
Bitmap Filter: Speeding up Exact Set Similarity Joins with Bitwise Operations
The Exact Set Similarity Join problem aims to find all similar sets between two collections of sets, with respect to a threshold and a similarity function such as overlap, Jaccard, Dice or cosine. The naive approach verifies all pairs of sets and is often considered impractical due to the high number of combinations. So, Exact […]
Nov, 21
GPU Parallelization for Unstructured Sparse Matrix Problems with OpenMP 4.5 and OpenACC
The effective use of parallelized hardware is an important goal of today’s computer developments. Nvidia GPUs are an important footing in this context. While CUDA-implemented algorithms focus on detailed, optimized use of GPU elements, pragma-directive parallelization targets GPU computation for a broader community. In this paper we focus on the implementation of […]
Nov, 21
Unified Deep Learning with CPU, GPU, and FPGA Technologies
Deep learning and complex machine learning have quickly become one of the most important computationally intensive applications for a wide variety of fields. The combination of large data sets, high-performance computational capabilities, and evolving and improving algorithms has enabled many successful applications which were previously difficult or impossible to consider. This paper explores the challenges […]
Nov, 16
Hydra: a C++11 framework for data analysis in massively parallel platforms
Hydra is a header-only, templated and C++11-compliant framework designed to perform the typical bottleneck calculations found in common HEP data analyses on massively parallel platforms. The framework is implemented on top of the C++11 Standard Library and a variadic version of the Thrust library and is designed to run on Linux systems, using OpenMP, CUDA […]