Posts
Oct, 28
Matrix inversion speed up with CUDA
In this project, several mathematical algorithms are developed to obtain a matrix inversion method that combines CUDA’s parallel architecture with MATLAB and is actually faster than MATLAB’s built-in matrix inverse function. This matrix inversion method is intended to be used for image reconstruction as a faster alternative to iterative methods with a comparable […]
Oct, 28
Parallel Random Numbers: As Easy as 1, 2, 3
Most pseudorandom number generators (PRNGs) scale poorly to massively parallel high-performance computation because they are designed as sequentially dependent state transformations. We demonstrate that independent, keyed transformations of counters produce a large alternative class of PRNGs with excellent statistical properties (long period, no discernable structure or correlation). These counter-based PRNGs are ideally suited to modern […]
Oct, 28
Programming Massively Parallel Processors with CUDA (audio course)
Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number […]
Oct, 28
Implementing Domain-Specific Languages for Heterogeneous Parallel Computing
Domain-specific languages offer a solution to the performance and the productivity issues in heterogeneous computing systems. The Delite compiler framework simplifies the process of building embedded parallel DSLs. DSL developers can implement domain-specific operations by extending the DSL framework, which provides static optimizations and code generation for heterogeneous hardware. The Delite runtime automatically schedules and […]
Oct, 28
Dax: Data Analysis at Extreme
Experts agree that the exascale machine will comprise processors that contain many cores, which in turn will necessitate a much higher degree of concurrency. Software will require a minimum of 1,000 times more concurrency. Most parallel analysis and visualization algorithms today work by partitioning data and running mostly serial algorithms concurrently on each data […]
Oct, 28
Parallel Computing the Longest Common Subsequence (LCS) on GPUs: Efficiency and Language Suitability
Sequence alignment is one of the most widely used tools in bioinformatics for finding the resemblance among many sequences such as DNA, RNA, and amino acid sequences. The longest common subsequence (LCS) of biological sequences is an essential and effective technique in sequence alignment. To solve the LCS problem, we resort to a dynamic programming approach. Due to the growth […]
Oct, 27
Techniques to maximize memory bandwidth on the Rigel compute accelerator
The Rigel compute accelerator has been developed to explore alternative architectures for massively parallel processor chips. Currently, GPUs that use wide SIMD are the primary implementations in this space. Many applications targeted to this space are performance limited by the memory wall, so comparing the memory system performance of Rigel and GPUs is desirable. Memory […]
Oct, 27
Efficient Simulation of Ocean and Land Scenes Based on Digital Earth
Efficient and realistic simulation of ocean and land scenes is one of the hot and difficult problems in computer graphics. Most recent simulations of ocean and land scenes are based on a plane and cover a limited region. They consider neither the curvature of the earth nor the edge between ocean and land, can’t […]
Oct, 27
Sample distribution shadow maps
This paper introduces Sample Distribution Shadow Maps (SDSMs), a new algorithm for hard and soft-edged shadows that greatly reduces undersampling, oversampling, and geometric aliasing errors compared to other shadow map techniques. SDSMs fall into the space between scene-dependent, variable-performance shadow algorithms and scene-independent, fixed-performance shadow algorithms. They provide a fully automated solution to shadow map […]
Oct, 27
Self-calibration of geometric and radiometric parameters for cone-beam computed tomography
Thanks to the advances in parallel processing hardware, iterative algorithms for cone-beam reconstruction are now available with computation times acceptable for clinical use. At the same time, they are able to accommodate more accurately the physical effects underlying the X-ray imaging process. Many parameters are involved, which need to be precisely calibrated in order […]
Oct, 27
Development of a volume rendering system using 3D texture compression techniques on general-purpose personal computers
In this paper, we present the development of a high-speed volume rendering system that combines 3D texture compression and parallel programming techniques for rendering multiple high-resolution 3D images obtained with medical or industrial CT. The 3D texture compression algorithm (DXT5) provides extremely high efficiency since it reduces the memory consumption to 1/4 of the original […]
Oct, 27
The CUDA implementation of the method of lines for the curvature dependent flows
We study the use of a GPU for the numerical approximation of the curvature dependent flows of graphs – the mean-curvature flow and the Willmore flow. Both problems are often applied in image processing where fast solvers are required. We approximate these problems using the complementary finite volume method combined with the method of lines. […]