Jul, 11

Large Scale GPU Accelerated PPMLR-MHD Simulations for Space Weather Forecast

PPMLR-MHD is a new magnetohydrodynamics (MHD) model used to simulate the interactions of the solar wind with the magnetosphere, which have been proven to be the key element of the space weather cause-and-effect chain from the Sun to Earth. Compared to existing MHD methods, PPMLR-MHD offers high-order spatial accuracy and […]
Jul, 11

Deep Learning for Mortgage Risk

This paper analyzes multi-period mortgage risk at loan and pool levels using an unprecedented dataset of over 120 million prime and subprime mortgages originated across the United States between 1995 and 2014, which includes the individual characteristics of each loan, monthly updates on loan performance over the life of a loan, and a number of […]
Jul, 11

Fast Predictive Image Registration

We present a method to predict image deformations based on patch-wise image appearance. Specifically, we design a patch-based deep encoder-decoder network which learns the pixel/voxel-wise mapping between image appearance and registration parameters. Our approach can predict general deformation parameterizations; however, we focus on the large deformation diffeomorphic metric mapping (LDDMM) registration model. By predicting the […]
Jul, 11

[Serbian] The Methods and Procedures for Accelerating Operations and Queries in Large Database Systems and Data Warehouse (Big Data Systems)

The research topic of this doctoral thesis is the possibility of establishing a model for big data systems with corresponding software-hardware architectures to support sensor networks and IoT devices. The developed model is based on energy-efficient, heterogeneous, massively parallelised SoC hardware platforms, with the support of software application architecture (such as OpenCL) for […]
Jul, 8

Torchnet: An Open-Source Platform for (Deep) Learning Research

Torch 7 is a scientific computing platform that supports both CPU and GPU computation, has a light-weight wrapper in a simple scripting language, and provides fast implementations of common algebraic operations. It has become one of the main frameworks for research in (deep) machine learning. Torch does not, however, provide abstractions and boilerplate code for […]
Jul, 8

Using the pyMIC Offload Module in PyFR

PyFR is an open-source high-order accurate computational fluid dynamics solver for unstructured grids. It is designed to efficiently solve the compressible Navier-Stokes equations on a range of hardware platforms, including GPUs and CPUs. In this paper we will describe how the Python Offload Infrastructure for the Intel Many Integrated Core Architecture (pyMIC) was used to […]
Jul, 8

TTC: A Tensor Transposition Compiler for Multiple Architectures

We consider the problem of transposing tensors of arbitrary dimension and describe TTC, an open source domain-specific parallel compiler. TTC generates optimized parallel C++/CUDA C code that achieves a significant fraction of the system’s peak memory bandwidth. TTC exhibits high performance across multiple architectures, including modern AVX-based systems (e.g., Intel Haswell, AMD Steamroller), Intel’s Knights Corner […]
Jul, 8

GPU Based Detection of Topological Changes in Voronoi Diagrams

Voronoi diagrams are an important tool with theoretical and practical applications in a large number of fields. We present a new procedure, implemented as a set of CUDA kernels, which detects, in a general and efficient way, topological changes in dynamic Voronoi diagrams whose generating points move in time. The solution that […]
Jul, 8

Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation

Graphics Processing Units (GPUs) are used as general purpose parallel accelerators in a wide range of applications. They are found in most computing systems, and mobile devices are no exception. The recent availability of programming APIs such as OpenCL for mobile GPUs promises to open up new types of applications on these devices. However, producing […]
Jul, 8

A Survey of Techniques for Designing and Managing CPU Register File

The processor register file (RF) is an important microarchitectural component used for storing operands and results of instructions. The design and operation of the RF have a crucial impact on the performance, energy efficiency and reliability of the processor; hence, several techniques have recently been proposed to manage the RF in modern processors. In this paper, we present […]
Jul, 5

1st International Workshop on Theoretical Approaches to Performance Evaluation, Modeling and Simulation (TAPEMS), 2016

Performance, together with energy efficiency as one aspect of it, has become a key issue in both high performance and embedded computing. The objective of the 1st TAPEMS International Workshop on Theoretical Approaches to Performance Evaluation, Modeling and Simulation is to bring together researchers and practitioners from academia and industry to discuss current advances and trends in […]
Jul, 5

International Conference on Intelligent Computing and Applications (ICICA), 2017

Publication: Submissions will be peer reviewed and evaluated based on originality, relevance to the conference, contributions, and presentation. Accepted papers of ICICA 2017 will be collected in one of the following publications. A: Conference Proceedings. Indexing/Abstracting: DBLP, ProQuest, INSPEC, CNKI, EI Compendex, Scopus, etc. B: Journal of Computers (ISSN: 1796-203X). Indexing/Abstracting: DBLP, EBSCO, DOAJ, ProQuest, INSPEC, […]
