Posts
Oct, 8
cuDNN: Efficient Primitives for Deep Learning
We present a library that provides optimized implementations for deep learning primitives. Deep learning workloads are computationally intensive, and optimizing the kernels of deep learning workloads is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized for new processors, which makes maintaining codebases difficult over time. Similar issues have long been addressed in […]
Oct, 8
Movement Tracking in Terrain Conditions Accelerated with CUDA
The paper presents a solution to the problem of movement tracking in images acquired from video cameras monitoring outdoor terrain. The solution is resistant to adverse factors such as fluttering leaves, waving grass, smoke or fog, and the movement of clouds. The presented solution is based on well-known image processing methods; nevertheless, the key was […]
Oct, 8
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators
KBLAS is a new open source high performance library that provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs. Since performance of dense matrix-vector multiplication is hindered by the overhead of memory accesses, a double-buffering optimization technique is employed to overlap data motion with computation. After identifying a proper set […]
Oct, 8
A Framework for the Volumetric Integration of Depth Images
Volumetric models have become a popular representation for 3D scenes in recent years. One of the breakthroughs leading to their popularity was KinectFusion, where the focus is on 3D reconstruction using RGB-D sensors. However, monocular SLAM has since also been tackled with very similar approaches. Representing the reconstruction volumetrically as a truncated signed distance function […]
Oct, 8
A new ray-tracing scheme for 3D diffuse radiation transfer on highly parallel architectures
We present a new numerical scheme to solve the transfer of diffuse radiation on three-dimensional mesh grids which is efficient on processors with highly parallel architecture such as recently popular GPUs and CPUs with multi- and many-core architectures. The scheme is based on the ray-tracing method and the computational cost is proportional to N_m^{5/3} where […]
Oct, 8
Time Complexity Reduction on GPUs
This article addresses the construction of parallel algorithms and the evaluation of results in terms of the complexity reduction obtained through the massive use of parallelism, as opposed to treating speedups as the guiding criterion for designing parallel algorithms. It is shown that, for a simple problem of searching a vector, this is the more fruitful approach.
Oct, 6
International Conference on Computer and Information Technology, ICCIT 2015
Submission Deadline: 2015-02-10 Publications: Accepted papers will be published in one of the following journals with ISSN. *International Journal of Computer Theory and Engineering (IJCTE) (ISSN: 1793-8201) Abstracting/Indexing: Index Copernicus, Electronic Journals Library, EBSCO, Engineering & Technology Digital Library, Google Scholar, Ulrich’s Periodicals Directory, Crossref, ProQuest, WorldCat, and EI (INSPEC, IET), Cabell’s Directories. *International […]
Oct, 6
Using Graphics Processing Unit to Accelerate Database Query Execution
One of the major problems in database management systems is handling large amounts of data while providing short response times. The problem is not only storing records properly but also processing them efficiently. In the meantime, GPUs have developed computational power many times greater than that offered by comparable CPUs. In our research […]
Oct, 6
Real-time Multi-view Depth Generation Using CUDA Multi-GPU
In this paper, we propose a real-time multi-view depth generation method using compute unified device architecture (CUDA) multi-graphics processing units (GPU). The objective is to generate multi-view depth maps in real-time. We employ eight color cameras and three depth cameras. After capturing multi-view color and depth data, we warp the depth information to color camera […]
Oct, 6
Accelerating NTRU Encryption with Graphics Processing Units
Lattice-based cryptography is attractive for its quantum computing resistance and efficient encryption/decryption process. However, the Big Data issue has perplexed most lattice-based cryptographic systems since the overall processing is slowed down too much. This paper intends to analyze one of the major lattice-based cryptographic systems, Nth-degree truncated polynomial ring (NTRU), and accelerate […]
Oct, 6
Load Balancing in Data Warehouse – Evolution and Perspectives
The problem of load balancing is one of the crucial issues in distributed data warehouse systems. In this article, original load balancing algorithms are presented: the Adaptive Load Balancing Algorithms for Queries (ALBQ) and the algorithms that use grammars and learning machines in managing the ETL process. These two algorithms base the load balancing on […]
Oct, 6
Embedding GPU Computations in Hadoop
As the size of high-performance applications increases, four major challenges, namely heterogeneity, programmability, fault resilience, and energy efficiency, have arisen in the underlying distributed systems. To tackle all of them without sacrificing performance, traditional approaches to resource utilization, task scheduling, and programming paradigms should be reconsidered. While Hadoop has handled data-intensive applications well […]