Posts
Oct, 10
Code Refinement of Stencil Codes
A straightforward implementation of an algorithm in a general-purpose programming language usually does not deliver peak performance: compilers often fail to automatically tune the code for certain hardware peculiarities like memory hierarchy or vector execution units. Manually tuning the code is, first, error-prone and time-consuming and, second, taints the code by exposing those […]
Oct, 10
Parallel implementation of linear repetitive processes identification using subspace algorithms
This paper presents a new parallel approach to identification of linear repetitive processes based on subspace algorithms. Parallel realizations of these algorithms are tested on various graphic cards that use NVIDIA CUDA technology. The paper describes implementation of subspace identification algorithms and their parallel speedup, efficiency, throughput, and delay. The parallel approach to the identification […]
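The speedup and efficiency metrics named in this abstract have standard textbook definitions, sketched below. The function names and the example timings are hypothetical and are not taken from the paper, which may define or measure these quantities differently.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial run time to parallel run time (same units for both)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_units):
    """Speedup divided by the number of processing units used."""
    return speedup(t_serial, t_parallel) / n_units


# Hypothetical timings: 8.0 s serially, 2.0 s on 8 processing units.
s = speedup(8.0, 2.0)
e = efficiency(8.0, 2.0, 8)
```

An efficiency well below 1 indicates that adding processing units is yielding diminishing returns for that problem size.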
Oct, 10
Accelerating Protein Coordinate Conversion using GPUs
For modeling proteins in conformational states, two methods of representation are used: internal coordinates and Cartesian coordinates. Each of these representations contains a large amount of structural and simulation information. Different processing steps require one or the other representation. Our goal is to rapidly translate between these coordinate spaces so that a scientist can choose […]
Oct, 10
FDTD on Distributed Heterogeneous Multi-GPU Systems
Finite-Difference Time-Domain (FDTD) is a popular technique for modeling computational electrodynamics, and is used within many research areas, such as the development of antennas, ultrasound imaging, and seismic wave propagation. Simulating large domains can, however, be very compute- and memory-demanding, which has motivated the use of cluster computing, and lately also the use of […]
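For readers unfamiliar with FDTD, the core of the method is a leapfrog update of the electric and magnetic fields on a staggered grid. The following is a minimal one-dimensional, single-threaded sketch in normalized units. It is not the paper's multi-GPU implementation, and the function name, grid size, source shape, and coefficient are illustrative assumptions.

```python
import math


def fdtd_1d(n_cells=200, n_steps=100, courant=0.5):
    """Leapfrog update of 1D E and H fields in normalized units (illustrative only)."""
    ez = [0.0] * n_cells  # electric field samples
    hy = [0.0] * n_cells  # magnetic field samples, staggered half a cell
    source = n_cells // 2
    for step in range(n_steps):
        # Update H from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update E from the spatial difference of the freshly updated H.
        # ez[0] is never touched, so the left edge acts as a fixed zero boundary.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Additive Gaussian pulse source (hypothetical parameters).
        ez[source] += math.exp(-((step - 30) / 10.0) ** 2)
    return ez, hy


ez, hy = fdtd_1d()
```

Each cell update reads only its immediate neighbors, which is why the method maps well onto GPUs and why a large domain can be split across devices with only boundary cells exchanged between them.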
Oct, 8
cuDNN: Efficient Primitives for Deep Learning
We present a library that provides optimized implementations for deep learning primitives. Deep learning workloads are computationally intensive, and optimizing the kernels of deep learning workloads is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized for new processors, which makes maintaining codebases difficult over time. Similar issues have long been addressed in […]
Oct, 8
Movement Tracking in Terrain Conditions Accelerated with CUDA
The paper presents a solution to the problem of movement tracking in images acquired from video cameras monitoring outdoor terrain. The solution is resistant to such adverse factors as fluttering leaves, waving grass, smoke or fog, movement of clouds, etc. The presented solution is based on well-known image processing methods; nevertheless, the key was […]
Oct, 8
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators
KBLAS is a new open source high performance library that provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs. Since performance of dense matrix-vector multiplication is hindered by the overhead of memory accesses, a double-buffering optimization technique is employed to overlap data motion with computation. After identifying a proper set […]
Oct, 8
A Framework for the Volumetric Integration of Depth Images
Volumetric models have become a popular representation for 3D scenes in recent years. One of the breakthroughs leading to their popularity was KinectFusion, where the focus is on 3D reconstruction using RGB-D sensors. However, monocular SLAM has since also been tackled with very similar approaches. Representing the reconstruction volumetrically as a truncated signed distance function […]
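As background on the truncated signed distance function (TSDF) representation mentioned here, a common way to fuse depth observations into a voxel is a weighted running average, in the style popularized by KinectFusion. The sketch below shows that generic per-voxel update only. It is not the framework's actual API, and the function name, parameter names, and unit weight per observation are assumptions.

```python
def integrate_voxel(tsdf, weight, sdf_obs, trunc, max_weight=100.0):
    """Fuse one signed-distance observation into a voxel (generic sketch).

    tsdf, weight: current voxel state; sdf_obs: observed signed distance
    along the camera ray; trunc: truncation band width, in the same units.
    """
    # Voxels far behind the observed surface are occluded: leave unchanged.
    if sdf_obs < -trunc:
        return tsdf, weight
    # Normalize by the truncation band and clamp far-in-front values to 1.
    d = min(1.0, sdf_obs / trunc)
    # Weighted running average, with one unit of weight per observation.
    new_weight = weight + 1.0
    new_tsdf = (tsdf * weight + d) / new_weight
    # Cap the weight so the model can still adapt to later observations.
    return new_tsdf, min(new_weight, max_weight)


# Two observations of the same voxel, slightly in front of and then
# slightly behind the surface (hypothetical values).
state = integrate_voxel(0.0, 0.0, 0.05, 0.1)
state = integrate_voxel(state[0], state[1], -0.05, 0.1)
```

Because every voxel is updated independently, this step parallelizes naturally with one thread per voxel.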
Oct, 8
A new ray-tracing scheme for 3D diffuse radiation transfer on highly parallel architectures
We present a new numerical scheme to solve the transfer of diffuse radiation on three-dimensional mesh grids which is efficient on processors with highly parallel architecture such as recently popular GPUs and CPUs with multi- and many-core architectures. The scheme is based on the ray-tracing method and the computational cost is proportional to $N_m^{5/3}$ where […]
Oct, 8
Redução de Complexidade de Tempo em GPUs (Time Complexity Reduction on GPUs)
This article addresses the construction of parallel algorithms and the evaluation of results based on the complexity reduction obtained through the massive use of parallelism, as opposed to treating speedups as the guiding criterion for constructing parallel algorithms. It is shown that, in a simple problem of searching a vector, it is more advantageous.
Oct, 6
International Conference on Computer and Information Technology, ICCIT 2015
Submission Deadline: 2015-02-10 Publications: Accepted papers will be published in one of the following journals with ISSN. *International Journal of Computer Theory and Engineering (IJCTE) (ISSN: 1793-8201) Abstracting/Indexing: Index Copernicus, Electronic Journals Library, EBSCO, Engineering & Technology Digital Library, Google Scholar, Ulrich’s Periodicals Directory, Crossref, ProQuest, WorldCat, and EI (INSPEC, IET), Cabell’s Directories. *International […]
Oct, 6
Using Graphics Processing Unit to Accelerate Database Query Execution
One of the major problems in database management systems is handling large amounts of data while providing short response times. The problem is not only the proper manner of storing records but also an efficient way of processing them. In the meantime, GPUs have developed computational power many times greater than that offered by comparable CPUs. In our research […]