Applied Mathematics and Computational Science Program
For more information visit: https://cemse.kaust.edu.sa/amcs
Recent Submissions

Weak-strong uniqueness for Maxwell-Stefan systems (arXiv, 2021-10-11) [Preprint] The weak-strong uniqueness for Maxwell-Stefan systems and some generalized systems is proved. The corresponding parabolic cross-diffusion equations are considered in a bounded domain with no-flux boundary conditions. The key points of the proofs are various inequalities for the relative entropy associated with the systems and the analysis of the spectrum of a quadratic form capturing the frictional dissipation. The latter task is complicated by the singular nature of the diffusion matrix. This difficulty is addressed by proving its positive definiteness on a subspace and using the Bott-Duffin matrix inverse. The generalized Maxwell-Stefan systems are shown to cover several known cross-diffusion systems for the description of tumor growth and physical vapor deposition processes.

Efficient importance sampling for large sums of independent and identically distributed random variables (Statistics and Computing, Springer Science and Business Media LLC, 2021-10-11) [Article] We discuss estimating the probability that the sum of nonnegative independent and identically distributed random variables falls below a given threshold, i.e., P(∑_{i=1}^N X_i ≤ γ), via importance sampling (IS). We are particularly interested in the rare-event regime when N is large and/or γ is small. Exponential twisting is a popular technique for similar problems that, in most cases, compares favorably to other estimators. However, it has some limitations: (i) it assumes knowledge of the moment-generating function of X_i, and (ii) sampling under the new IS PDF is not straightforward and might be expensive. The aim of this work is to propose an alternative IS PDF that approximately yields, for certain classes of distributions and in the rare-event regime, at least the same performance as the exponential twisting technique and, at the same time, does not introduce serious limitations. The first class includes distributions whose probability density functions (PDFs) are asymptotically equivalent, as x → 0, to b x^p, for p > -1 and b > 0. For this class of distributions, the Gamma IS PDF with appropriately chosen parameters retrieves approximately, in the rare-event regime corresponding to small values of γ and/or large values of N, the same performance as the estimator based on the exponential twisting technique. In the second class, we consider the Lognormal setting, whose PDF at zero vanishes faster than any polynomial, and we show numerically that a Gamma IS PDF with optimized parameters clearly outperforms the exponential twisting IS PDF. Numerical experiments validate the efficiency of the proposed estimator in delivering a highly accurate estimate in the regime of large N and/or small γ.
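As a minimal illustration of the idea (not the paper's estimator), the sketch below estimates P(∑ X_i ≤ γ) for Uniform(0,1) summands, a member of the first class with b = 1 and p = 0, using a Gamma proposal with shape 1 (an exponential with mean γ/N); the shape and scale choices here are illustrative assumptions:

```python
import math
import random

def is_estimate(N, gamma, n_samples, rng):
    """Estimate P(sum_{i=1}^N X_i <= gamma) for X_i ~ Uniform(0,1) with
    importance sampling under a Gamma(shape=1, scale=gamma/N) proposal,
    i.e. an exponential with mean gamma/N per component."""
    scale = gamma / N
    total = 0.0
    for _ in range(n_samples):
        x = [rng.expovariate(1.0 / scale) for _ in range(N)]
        if sum(x) <= gamma and max(x) <= 1.0:
            # likelihood ratio: uniform target density (1 on [0,1]^N)
            # divided by the product of exponential proposal densities
            log_w = sum(math.log(scale) + xi / scale for xi in x)
            total += math.exp(log_w)
    return total / n_samples

rng = random.Random(0)
N, gamma = 10, 0.5
est = is_estimate(N, gamma, 50_000, rng)
exact = gamma**N / math.factorial(N)  # P(sum of N uniforms <= gamma), gamma <= 1
```

The exact value here is about 2.7e-10, far below what naive Monte Carlo could resolve with 50,000 samples; the IS estimate lands within a few percent because the likelihood ratio is bounded on the rare event.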

Statistical learning for fluid flows: Sparse Fourier divergence-free approximations (Physics of Fluids, AIP Publishing, 2021-09-27) [Article] We reconstruct the velocity field of incompressible flows given a finite set of measurements. For the spatial approximation, we introduce the Sparse Fourier divergence-free approximation based on a discrete L2 projection. Within this physics-informed type of statistical learning framework, we adaptively build a sparse set of Fourier basis functions with corresponding coefficients by solving a sequence of minimization problems where the set of basis functions is augmented greedily at each optimization problem. We regularize our minimization problems with the seminorm of the fractional Sobolev space in a Tikhonov fashion. In the Fourier setting, the incompressibility (divergence-free) constraint becomes a finite set of linear algebraic equations. We couple our spatial approximation with the truncated singular value decomposition of the flow measurements for temporal compression. Our computational framework thus combines supervised and unsupervised learning techniques. We assess the capabilities of our method in various numerical examples arising in fluid mechanics.
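To make the divergence-free constraint concrete: for a Fourier mode with wavenumber k, incompressibility reduces to the linear equation k · ĉ_k = 0 on the coefficient vector ĉ_k, which can be enforced by an orthogonal projection. A minimal sketch of that projection (a generic Leray-type projection, not the paper's full greedy framework):

```python
def project_div_free(k, c):
    """Project a Fourier coefficient vector c for wavenumber k onto the
    divergence-free subspace, i.e. enforce k . c = 0 by removing the
    component of c along k."""
    k2 = sum(ki * ki for ki in k)
    if k2 == 0:
        return list(c)  # the mean (k = 0) mode is unconstrained
    dot = sum(ki * ci for ki, ci in zip(k, c))
    return [ci - ki * dot / k2 for ki, ci in zip(k, c)]

k = (1.0, 2.0)
c = (0.3, -0.7)
cp = project_div_free(k, c)
```

After projection the coefficient is exactly orthogonal to k, and projecting again changes nothing (idempotence).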

DTi2Vec: Drug–target interaction prediction using network embedding and ensemble learning (Journal of Cheminformatics, Springer Science and Business Media LLC, 2021-09-22) [Article] Drug–target interaction (DTI) prediction is a crucial step in drug discovery and repositioning as it reduces experimental validation costs if done right. Thus, developing in silico methods to predict potential DTIs has become a competitive research niche, with one of its main focuses being improving the prediction accuracy. Using machine learning (ML) models for this task, specifically network-based approaches, is effective and has shown great advantages over the other computational methods. However, ML model development involves upstream hand-crafted feature extraction and other processes that impact prediction accuracy. Thus, network-based representation learning techniques that provide automated feature extraction, combined with traditional ML classifiers dealing with downstream link prediction tasks, may be better-suited paradigms. Here, we present such a method, DTi2Vec, which identifies DTIs using network representation learning and ensemble learning techniques. DTi2Vec constructs the heterogeneous network, and then it automatically generates features for each drug and target using a node embedding technique. DTi2Vec demonstrated its ability in drug–target link prediction compared to several state-of-the-art network-based methods, using four benchmark datasets and large-scale data compiled from DrugBank. DTi2Vec showed a statistically significant increase in the prediction performances in terms of AUPR. We verified the "novel" predicted DTIs using several databases and scientific literature. DTi2Vec is a simple yet effective method that provides high DTI prediction performance while being scalable and efficient in computation, translating into a powerful drug repositioning tool.

On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods (arXiv, 2021-09-22) [Preprint] In this study, we demonstrate that the norm test and inner product/orthogonality test presented in [1] are equivalent in terms of the convergence rates associated with Stochastic Gradient Descent (SGD) methods if ε² = θ² + ν² with specific choices of θ and ν. Here, ε controls the relative statistical error of the norm of the gradient, while θ and ν control the relative statistical error of the gradient in the direction of the gradient and in the direction orthogonal to the gradient, respectively. Furthermore, we demonstrate that the inner product/orthogonality test can be as inexpensive as the norm test in the best-case scenario if θ and ν are optimally selected, but the inner product/orthogonality test will never be more computationally affordable than the norm test if ε² = θ² + ν². Finally, we present two stochastic optimization problems to illustrate our results.
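A minimal sketch of the norm test for adaptive batch sizing (assuming a toy per-sample quadratic loss, not any setup from the paper): the batch is grown until the sample variance of the per-sample gradients, divided by the batch size, is at most ε² times the squared norm of the mean gradient.

```python
import random

def norm_test_step(x, data, batch, eps, rng):
    """One SGD gradient evaluation with the norm test: grow the batch until
    the estimated gradient variance is at most eps^2 * ||mean gradient||^2.
    Toy per-sample loss: f_i(x) = 0.5 * (x - a_i)^2, so grad_i = x - a_i."""
    while True:
        sample = rng.sample(data, batch)
        grads = [x - a for a in sample]
        g = sum(grads) / batch
        var = sum((gi - g) ** 2 for gi in grads) / (batch - 1)
        if var / batch <= eps**2 * g * g or batch >= len(data):
            return g, batch
        batch = min(2 * batch, len(data))

rng = random.Random(1)
data = [rng.gauss(3.0, 1.0) for _ in range(4096)]
x, batch, lr = 0.0, 2, 0.5
for _ in range(60):
    g, batch = norm_test_step(x, data, batch, 0.5, rng)
    x -= lr * g
```

As x approaches the minimizer the mean gradient shrinks, so the test forces progressively larger batches, which is the behavior that makes these tests interesting to compare.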

Three-Dimensional Electromagnetic Void Space (Physical Review Letters, American Physical Society (APS), 2021-09-17) [Article] Electromagnetic void space is a medium that, while geometrically occupying a finite volume of space, is optically equivalent to an infinitesimal point, in which electromagnetic waves do not experience any phase accumulation. Here, we report the first realization of three-dimensional (3D) electromagnetic void space by an all-dielectric photonic crystal possessing vanishing permittivity and permeability simultaneously. The 3D electromagnetic void space offers distinctive functionalities inaccessible to its 2D or acoustic counterparts because of the fundamental changes in topology, which come from the ascension of dimensionality as well as the transverse nature of electromagnetic waves. In particular, we demonstrate, both theoretically and experimentally, that the transmission through such a 3D void space is unaffected by its inner boundaries, but highly sensitive to the outer boundaries. This enables many applications such as the impurity "anti-doping" effect, outer-boundary-controlled switching, and 3D perfect wave steering. Our work paves a road toward the 3D exotic optics of an optically infinitesimal point.

Maximum principle preserving space and time flux limiting for Diagonally Implicit Runge-Kutta discretizations of scalar convection-diffusion equations (arXiv, 2021-09-17) [Preprint] We provide a framework for high-order discretizations of nonlinear scalar convection-diffusion equations that satisfy a discrete maximum principle. The resulting schemes can have arbitrarily high order accuracy in time and space, and can be stable and maximum-principle-preserving (MPP) with no step size restriction. The schemes are based on a two-tiered limiting strategy, starting with a high-order limiter-based method that may have small oscillations or maximum-principle violations, followed by an additional limiting step that removes these violations while preserving high order accuracy. The desirable properties of the resulting schemes are demonstrated through several numerical examples.
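The core of any such limiting step can be sketched generically (this is a standard FCT-style blend, not the paper's specific limiter): given a bound-preserving low-order update and a high-order update, each cell takes the largest admissible fraction of the high-order correction that keeps the value inside the prescribed bounds.

```python
def mpp_limit(u_low, u_high, u_min, u_max):
    """Blend a bound-preserving low-order update with a high-order one:
    for each cell take the largest theta in [0, 1] such that
    u = u_low + theta * (u_high - u_low) stays within [u_min, u_max]."""
    out = []
    for lo, hi in zip(u_low, u_high):
        d = hi - lo
        theta = 1.0
        if d > 0 and lo + d > u_max:
            theta = (u_max - lo) / d
        elif d < 0 and lo + d < u_min:
            theta = (u_min - lo) / d
        out.append(lo + max(0.0, min(1.0, theta)) * d)
    return out

u_low = [0.0, 0.2, 0.9, 1.0]        # satisfies the bounds [0, 1]
u_high = [-0.05, 0.25, 1.1, 0.98]   # more accurate but overshoots
u = mpp_limit(u_low, u_high, 0.0, 1.0)
```

Cells whose high-order value already respects the bounds are left untouched (theta = 1), so accuracy is preserved wherever no violation occurs.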

Minimizing Depth of Decision Trees with Hypotheses (Springer International Publishing, 2021-09-16) [Conference Paper] In this paper, we consider decision trees that use both conventional queries based on one attribute each and queries based on hypotheses about the values of all attributes. Such decision trees are similar to ones studied in exact learning, where membership and equivalence queries are allowed. We present dynamic programming algorithms for minimization of the depth of such decision trees and discuss results of computer experiments on various data sets and randomly generated Boolean functions.
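A minimal sketch of the dynamic programming idea, restricted to conventional one-attribute queries only (the hypothesis queries of the paper are omitted, and the toy data set is an assumption): recurse on subsets of consistent rows, memoizing the minimum depth of each subproblem.

```python
from functools import lru_cache

# toy data set: (attribute tuple, label) for majority-of-three
ROWS = [
    ((0, 0, 0), 0), ((0, 0, 1), 0), ((0, 1, 0), 0), ((0, 1, 1), 1),
    ((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
]

@lru_cache(maxsize=None)
def min_depth(rows):
    """Minimum depth of a decision tree with one-attribute queries that
    classifies the subset of ROWS given by the frozenset of indices."""
    labels = {ROWS[i][1] for i in rows}
    if len(labels) <= 1:
        return 0  # all rows share a label: a leaf suffices
    best = len(ROWS[0][0])  # depth never exceeds the number of attributes
    for a in range(len(ROWS[0][0])):
        branches = {}
        for i in rows:
            branches.setdefault(ROWS[i][0][a], []).append(i)
        if len(branches) < 2:
            continue  # this attribute does not split the subset
        best = min(best,
                   1 + max(min_depth(frozenset(b)) for b in branches.values()))
    return best

depth = min_depth(frozenset(range(len(ROWS))))
```

For majority-of-three the minimum depth is 3: two agreeing votes decide the outcome, but when the first two queried attributes disagree, a third query is unavoidable.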

H2Opus: A distributed-memory multi-GPU software package for nonlocal operators (arXiv, 2021-09-12) [Preprint] Hierarchical $\mathcal{H}^2$-matrices are asymptotically optimal representations for the discretizations of nonlocal operators such as those arising in integral equations or from kernel functions. Their $O(N)$ complexity in both memory and operator application makes them particularly suited for large-scale problems. As a result, there is a need for software that provides support for distributed operations on these matrices to allow large-scale problems to be represented. In this paper, we present high-performance, distributed-memory, GPU-accelerated algorithms and implementations for matrix-vector multiplication and matrix recompression of hierarchical matrices in the $\mathcal{H}^2$ format. The algorithms are a new module of H2Opus, a performance-oriented package that supports a broad variety of $\mathcal{H}^2$-matrix operations on CPUs and GPUs. Performance in the distributed GPU setting is achieved by marshaling the tree data of the hierarchical matrix representation to allow batched kernels to be executed on the individual GPUs. MPI is used for interprocess communication. We optimize the communication data volume and hide much of the communication cost with local compute phases of the algorithms. Results show near-ideal scalability up to 1024 NVIDIA V100 GPUs on Summit, with performance exceeding 2.3 TFlop/s/GPU for the matrix-vector multiplication and 670 GFlop/s/GPU for matrix compression, which involves batched QR and SVD operations. We illustrate the flexibility and efficiency of the library by solving a 2D variable-diffusivity integral fractional diffusion problem with an algebraic multigrid-preconditioned Krylov solver and demonstrate scalability up to problems with 16M degrees of freedom on 64 GPUs.
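The structural idea that makes hierarchical matrix-vector products cheap can be shown on the smallest possible case (a generic 2x2 block matrix with rank-1 off-diagonal blocks; this is an illustration of low-rank block arithmetic, not H2Opus code): applying an off-diagonal block U V^T to a vector costs O(n) instead of O(n^2).

```python
def matvec_dense(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def lowrank_block_matvec(D, U, V, x):
    """y = A x for a 2x2 block matrix whose diagonal blocks D[0], D[1] are
    dense and whose off-diagonal blocks are rank-1: A12 = U[0] V[0]^T and
    A21 = U[1] V[1]^T."""
    n = len(x) // 2
    x1, x2 = x[:n], x[n:]
    y1 = matvec_dense(D[0], x1)
    y2 = matvec_dense(D[1], x2)
    s12 = sum(v * xi for v, xi in zip(V[0], x2))  # scalar V[0]^T x2
    s21 = sum(v * xi for v, xi in zip(V[1], x1))  # scalar V[1]^T x1
    y1 = [yi + u * s12 for yi, u in zip(y1, U[0])]
    y2 = [yi + u * s21 for yi, u in zip(y2, U[1])]
    return y1 + y2

D = ([[2.0, 1.0], [1.0, 3.0]], [[4.0, 0.0], [0.0, 5.0]])
U = ([1.0, 2.0], [1.0, 1.0])
V = ([3.0, 1.0], [2.0, 2.0])
x = [1.0, 2.0, 3.0, 4.0]
y = lowrank_block_matvec(D, U, V, x)

# dense matrix assembled from the same blocks, for comparison
A = [[2.0, 1.0, 3.0, 1.0],
     [1.0, 3.0, 6.0, 2.0],
     [2.0, 2.0, 4.0, 0.0],
     [2.0, 2.0, 0.0, 5.0]]
y_dense = matvec_dense(A, x)
```

The $\mathcal{H}^2$ format nests this idea recursively across a tree of blocks and shares bases between levels, which is what the batched GPU kernels in the paper exploit.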

Particle approximation of one-dimensional Mean-Field Games with local interactions (arXiv, 2021-09-06) [Preprint] We study a particle approximation for one-dimensional first-order Mean-Field Games (MFGs) with local interactions and planning conditions. Our problem comprises a Hamilton-Jacobi equation coupled with a transport equation. As we deal with the planning problem, we prescribe initial and terminal distributions for the transport equation. The particle approximation builds on a semi-discrete variational problem. First, we address the existence and uniqueness of a solution to the semi-discrete variational problem. Next, we show that our discretization preserves some previously identified conserved quantities. Finally, we prove that the approximation by particle systems preserves displacement convexity. We use this last property to establish uniform estimates for the discrete problem. We illustrate our results for the discrete problem with numerical examples.

A duality approach to a price formation MFG model (arXiv, 2021-09-04) [Preprint] We study the connection between the Aubry-Mather theory and a mean-field game (MFG) price-formation model. We introduce a framework for Mather measures that is suited for constrained time-dependent problems in R. Then, we propose a variational problem on a space of measures, from which we obtain a duality relation involving the MFG problem examined in [36].

An O(N) algorithm for computing expectation of N-dimensional truncated multivariate normal distribution I: fundamentals (Advances in Computational Mathematics, Springer Science and Business Media LLC, 2021-09-01) [Article] In this paper, we present the fundamentals of a hierarchical algorithm for computing the N-dimensional integral Φ(a,b;A) = ∫_a^b H(x) f(x|A) dx representing the expectation of a function H(X), where f(x|A) is the truncated multivariate normal (TMVN) distribution with zero mean, x is the vector of integration variables for the N-dimensional random vector X, A is the inverse of the covariance matrix Σ, and a and b are constant vectors. The algorithm assumes that H(x) is "low-rank" and is designed for properly clustered X so that the matrix A has "low-rank" blocks and "low-dimensional" features. We demonstrate the divide-and-conquer idea when A is a symmetric positive definite tridiagonal matrix and present the necessary building blocks and rigorous potential theory-based algorithm analysis when A is given by the exponential covariance model. The algorithm's overall complexity is O(N) for N-dimensional problems, with a prefactor determined by the rank of the off-diagonal matrix blocks and the number of effective variables. Very high accuracy results for N as large as 2048 are obtained on a desktop computer with 16 GB of memory using the fast Fourier transform (FFT) and nonuniform FFT to validate the analysis. The current paper focuses on the ideas using simple yet representative examples where the off-diagonal matrix blocks are rank 1 and the number of effective variables is bounded by 2, to allow concise notations and easier explanation. In a subsequent paper, we discuss the generalization of the current scheme using the sparse grid technique for higher-rank problems and demonstrate how all the moments of kth order or less (a total of O(N^k) integrals) can be computed using O(N^k) operations for k ≥ 2 and O(N log N) operations for k = 1.

Quantifying uncertainty with a derivative tracking SDE model and application to wind power forecast data (Statistics and Computing, Springer, 2021-09-01) [Article] We develop a data-driven methodology based on parametric Itô Stochastic Differential Equations (SDEs) to capture the real asymmetric dynamics of forecast errors, including the uncertainty of the forecast at time zero. Our SDE framework features time-derivative tracking of the forecast, a time-varying mean-reversion parameter, and an improved state-dependent diffusion term. Proofs of the existence, strong uniqueness, and boundedness of the SDE solutions are shown by imposing conditions on the time-varying mean-reversion parameter. We develop the structure of the drift term based on sound mathematical theory. A truncation procedure regularizes the prediction function to ensure that the trajectories do not reach the boundaries almost surely in finite time. Inference based on approximate likelihood, constructed through the moment-matching technique both in the original forecast error space and in the Lamperti space, is performed through numerical optimization procedures. We propose a fixed-point likelihood optimization approach in the Lamperti space. Another novel contribution is the characterization of the uncertainty of the forecast at time zero, which turns out to be crucial in practice. We extend the model specification by considering the length of the unknown time interval preceding the first time a forecast is provided through an additional parameter in the density of the initial transition. All the procedures are agnostic of the forecasting technology, and they enable comparisons between different forecast providers. We apply our SDE framework to model historical Uruguayan normalized wind power production and forecast data between April and December 2019. Sharp empirical confidence bands of wind power production forecast error are obtained for the best-selected model.
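The derivative-tracking idea can be sketched with an Euler-Maruyama simulation of a simplified mean-reverting model (constant mean-reversion rate, constant diffusion, no truncation or Lamperti transform; the toy forecast and all parameter values are assumptions, not the paper's calibrated model):

```python
import math
import random

def simulate_tracking_sde(forecast, dforecast, x0, theta, sigma, dt, rng):
    """Euler-Maruyama simulation of a process that tracks the forecast
    derivative while reverting to the forecast itself:
        dX_t = (p'(t) + theta * (p(t) - X_t)) dt + sigma dW_t
    This is a simplified, constant-parameter sketch of a derivative
    tracking SDE, not the paper's full state-dependent model."""
    x, path = x0, [x0]
    for n in range(len(forecast) - 1):
        t = n * dt
        drift = dforecast(t) + theta * (forecast[n] - x)
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

rng = random.Random(2)
dt, T = 0.01, 5.0
steps = int(T / dt)
p = lambda t: 0.5 + 0.3 * math.sin(t)   # toy normalized power forecast
dp = lambda t: 0.3 * math.cos(t)        # its time derivative
forecast = [p(n * dt) for n in range(steps + 1)]
path = simulate_tracking_sde(forecast, dp, forecast[0],
                             theta=5.0, sigma=0.05, dt=dt, rng=rng)
```

Including the derivative p'(t) in the drift removes the systematic lag that a plain mean-reverting process exhibits when the forecast itself is moving.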

Generalized parallel tempering on Bayesian inverse problems (Statistics and Computing, Springer Science and Business Media LLC, 2021-08-30) [Article] In the current work we present two generalizations of the Parallel Tempering (PT) algorithm in the context of discrete-time Markov chain Monte Carlo methods for Bayesian inverse problems. These generalizations use state-dependent swapping rates, inspired by the so-called continuous-time Infinite Swapping algorithm presented in Plattner et al. (J Chem Phys 135(13):134111, 2011). We analyze the reversibility and ergodicity properties of our generalized PT algorithms. Numerical results on sampling from different target distributions show that the proposed methods significantly improve sampling efficiency over more traditional sampling algorithms such as Random Walk Metropolis, preconditioned Crank-Nicolson, and (standard) Parallel Tempering.
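For reference, the baseline the paper generalizes can be sketched in a few lines (standard two-chain Parallel Tempering with fixed swap proposals on an assumed bimodal toy target, not the state-dependent scheme of the paper):

```python
import math
import random

def log_target(x):
    """Bimodal toy target: mixture of N(-3, 0.5^2) and N(3, 0.5^2),
    up to an additive constant."""
    return math.log(math.exp(-0.5 * ((x + 3) / 0.5) ** 2)
                    + math.exp(-0.5 * ((x - 3) / 0.5) ** 2))

def parallel_tempering(n_iter, betas, step, rng):
    """Two-chain PT: random-walk Metropolis at inverse temperatures
    betas[0] (cold) and betas[1] (hot), with a swap proposal each sweep."""
    xs = [0.0 for _ in betas]
    samples = []
    for _ in range(n_iter):
        for i, b in enumerate(betas):
            prop = xs[i] + step * rng.gauss(0.0, 1.0)
            if math.log(rng.random()) < b * (log_target(prop) - log_target(xs[i])):
                xs[i] = prop
        # Metropolis swap: accept with prob min(1, exp(log_a))
        log_a = (betas[0] - betas[1]) * (log_target(xs[1]) - log_target(xs[0]))
        if math.log(rng.random()) < log_a:
            xs[0], xs[1] = xs[1], xs[0]
        samples.append(xs[0])  # record the cold chain
    return samples

rng = random.Random(3)
samples = parallel_tempering(20_000, betas=[1.0, 0.1], step=1.5, rng=rng)
left = sum(1 for x in samples if x < 0) / len(samples)
```

The hot chain sees a flattened target and crosses the barrier freely, and swaps carry that mobility down to the cold chain, so both modes are visited; a single Random Walk Metropolis chain at β = 1 would typically stay trapped in one mode.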

Space-Fractional Diffusion with Variable Order and Diffusivity: Discretization and Direct Solution Strategies (arXiv, 2021-08-29) [Preprint] We consider the multidimensional space-fractional diffusion equations with spatially varying diffusivity and fractional order. Significant computational challenges are encountered when solving these equations due both to the kernel singularity in the fractional integral operator and to the resulting dense discretized operators, which quickly become prohibitively expensive to handle because of their memory and arithmetic complexities. In this work, we present a singularity-aware discretization scheme that regularizes the singular integrals through a singularity subtraction technique adapted to the spatial variability of diffusivity and fractional order. This regularization strategy is conveniently formulated as a sparse matrix correction that is added to the dense operator, and is applicable to different formulations of fractional diffusion equations. We also present a block low-rank representation to handle the dense matrix representations, by exploiting the ability to approximate blocks of the resulting formally dense matrix by low-rank factorizations. A Cholesky factorization solver operates directly on this representation using the low-rank blocks as its atomic computational tiles, and achieves high performance on multicore hardware. Numerical results show that the singularity treatment is robust, substantially reduces discretization errors, and attains the first-order convergence rate allowed by the regularity of the solutions. They also show that considerable savings are obtained in storage ($O(N^{1.5})$) and computational cost ($O(N^2)$) compared to dense factorizations. This translates to orders-of-magnitude savings in memory and time on multidimensional problems, and shows that the proposed methods offer practical tools for tackling large nonlocal fractional diffusion simulations.
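Singularity subtraction itself is a classical quadrature device, and a one-dimensional sketch conveys the idea (a generic textbook example with an assumed integrand, not the paper's operator): subtract the value at the singular point so the remaining integrand vanishes there, and integrate the subtracted singular part analytically.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return h * s

def singular_integral_subtracted(f, alpha, n):
    """Approximate I = int_0^1 f(x) x^(-alpha) dx, 0 < alpha < 1, by
    singularity subtraction:
        I = int_0^1 (f(x) - f(0)) x^(-alpha) dx + f(0) / (1 - alpha),
    where the second term is the analytic integral of f(0) x^(-alpha)."""
    def g(x):
        if x == 0.0:
            return 0.0  # (f(x) - f(0)) x^(-alpha) -> 0 as x -> 0
        return (f(x) - f(0.0)) * x ** (-alpha)
    return trapezoid(g, 0.0, 1.0, n) + f(0.0) / (1.0 - alpha)

alpha = 0.5
# reference for int_0^1 exp(x) x^(-1/2) dx via the substitution x = t^2,
# which yields the smooth integral int_0^1 2 exp(t^2) dt
ref = trapezoid(lambda t: 2.0 * math.exp(t * t), 0.0, 1.0, 200_000)
approx = singular_integral_subtracted(math.exp, alpha, 4000)
```

Without the subtraction the integrand is unbounded at x = 0 and standard quadrature cannot even be evaluated there; with it, the regularized integrand is continuous and the scheme converges. The paper's contribution is adapting this correction to spatially varying α and diffusivity and packaging it as a sparse matrix correction.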

Flow-Guided Video Inpainting with Scene Templates (arXiv, 2021-08-29) [Preprint] We consider the problem of filling in missing spatiotemporal regions of a video. We provide a novel flow-based solution by introducing a generative model of images in relation to the scene (without missing regions) and mappings from the scene to images. We use the model to jointly infer the scene template, a 2D representation of the scene, and the mappings. This ensures consistency of the frame-to-frame flows generated to the underlying scene, reducing geometric distortions in flow-based inpainting. The template is mapped to the missing regions in the video by a new L2-L1 interpolation scheme, creating crisp inpaintings and reducing common blur and distortion artifacts. We show on two benchmark datasets that our approach outperforms the state of the art quantitatively and in user studies.

H2OPUS-TLR: High Performance Tile Low Rank Symmetric Factorizations using Adaptive Randomized Approximation (arXiv, 2021-08-26) [Preprint] Tile low rank (TLR) representations of dense matrices partition them into blocks of roughly uniform size, where each off-diagonal tile is compressed and stored as its own low rank factorization. They offer an attractive representation for many data-sparse dense operators that appear in practical applications, where substantial compression and a much smaller memory footprint can be achieved. TLR matrices are a compromise between the simplicity of a regular perfectly-strided data structure and the optimal complexity of the unbalanced trees of hierarchically low rank matrices, and provide a convenient performance-tuning parameter through their tile size that can be proportioned to take into account the cache size where the tiles reside in the memory hierarchy. There are currently no high-performance algorithms that can generate Cholesky and $LDL^T$ factorizations of TLR matrices, particularly on GPUs. The difficulties in achieving high performance when factoring TLR matrices come from the expensive compression operations that must be performed during the factorization process and the adaptive rank distribution of the tiles that causes an irregular work pattern for the processing cores. In this work, we develop a dynamic batching operation and combine it with batched adaptive randomized approximations to achieve high performance both on GPUs and CPUs. Our implementation attains over 1.2 TFLOP/s in double precision on the V100 GPU, and is limited by the performance of batched GEMM operations. The Cholesky factorization of a covariance matrix of size $N = 131K$ arising in spatial statistics can be factored to an accuracy $\epsilon = 10^{-2}$ in just a few seconds. We believe the proposed GEMM-centric algorithm allows the method to be readily ported to newer hardware such as the tensor cores that are optimized for small GEMM operations.
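The randomized approximation at the heart of such compression kernels can be sketched on a tiny example (a generic Halko-Martinsson-Tropp-style range finder, not the library's batched implementation; the test matrix is an assumption): multiply the tile by a Gaussian test matrix to sample its range, orthonormalize, and project.

```python
import random

def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def gram_schmidt(Y):
    """Orthonormalize the columns of Y (classical Gram-Schmidt)."""
    cols = transpose(Y)
    Q = []
    for c in cols:
        for q in Q:
            d = sum(x * y for x, y in zip(q, c))
            c = [x - d * y for x, y in zip(c, q)]
        nrm = sum(x * x for x in c) ** 0.5
        if nrm > 1e-12:
            Q.append([x / nrm for x in c])
    return transpose(Q)

def randomized_lowrank(A, rank, rng):
    """Randomized approximation A ~ Q (Q^T A): sample the range of A with
    a Gaussian test matrix, orthonormalize, then project."""
    n = len(A[0])
    omega = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n)]
    Q = gram_schmidt(matmul(A, omega))   # orthonormal basis of sampled range
    B = matmul(transpose(Q), A)
    return Q, B

rng = random.Random(4)
u1, v1 = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0, 2.0]
u2, v2 = [0.0, 1.0, 0.0, 1.0], [2.0, 1.0, 0.0, 1.0]
A = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(4)] for i in range(4)]
Q, B = randomized_lowrank(A, 2, rng)     # A has exact rank 2
A_approx = matmul(Q, B)
```

For a tile of exact rank r, sampling with r Gaussian vectors captures its range almost surely, so the reconstruction is exact up to rounding; the "adaptive" part of the paper's scheme grows the sample until a tolerance is met.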

Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method (Springer International Publishing, 2021-08-25) [Conference Paper] One of the main challenges for ground-based optical astronomy is to compensate for atmospheric turbulence in near real-time. The goal is to obtain images as close as possible to the diffraction limit of the telescope. This challenge is addressed on the latest generation of giant optical telescopes by deploying multi-conjugate adaptive optics (MCAO) systems performing predictive tomography of the turbulence and multilayer compensation. Such complex systems require a high fidelity estimate of the turbulence profile above the telescope, to be updated regularly during operations as turbulence conditions evolve. In this paper, we modify the traditional Levenberg-Marquardt (LM) algorithm by considering stochastically chosen subsystems of the full problem to identify the required parameters efficiently, while coping with the real-time challenge. While LM operates on the full set of data samples, the resulting Stochastic LM (SLM) method randomly selects subsamples to compute corresponding approximate gradients and Hessians. Hence, SLM reduces the algorithmic complexity per iteration and shortens the overall time to solution, while maintaining LM's numerical robustness. We present a new convergence analysis for SLM, implement the algorithm with optimized GPU kernels, and deploy it on shared-memory systems with multiple GPU accelerators. We assess SLM in the adaptive optics system configurations in the context of the MCAO-Assisted Visible Imager & Spectrograph (MAVIS) instrument for the Very Large Telescope (VLT). We demonstrate the performance superiority of SLM over the traditional LM algorithm and the classical stochastic first-order methods. At the scale of VLT AO, SLM finishes the optimization process and accurately retrieves the parameters (e.g., turbulence strength and wind speed profiles) in less than a second using up to eight NVIDIA A100 GPUs, which permits high-acuity real-time throughput over a night of observations.
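The subsampling idea behind SLM can be sketched on a one-parameter toy least-squares problem (an illustrative stochastic Gauss-Newton/LM step on an assumed model y = exp(a x), with hypothetical parameters, not the MAVIS tomography problem): each iteration forms the gradient and Gauss-Newton Hessian from a random subsample of the residuals.

```python
import math
import random

def stochastic_lm(xs, ys, a0, n_iter, batch, lam, rng):
    """Stochastic Levenberg-Marquardt sketch for the toy model y = exp(a*x):
    each iteration builds the gradient J^T r and Gauss-Newton Hessian J^T J
    from a random subsample of residuals instead of the full data set, then
    takes a damped step."""
    a = a0
    for _ in range(n_iter):
        idx = rng.sample(range(len(xs)), batch)
        g = h = 0.0
        for i in idx:
            pred = math.exp(a * xs[i])
            r = pred - ys[i]         # residual on sample i
            J = xs[i] * pred         # d(pred)/da, the 1x1 Jacobian entry
            g += J * r
            h += J * J
        a -= g / (h + lam)           # LM-damped (regularized) Newton step
    return a

rng = random.Random(5)
a_true = 0.7
xs = [rng.uniform(0.0, 2.0) for _ in range(500)]
ys = [math.exp(a_true * x) for x in xs]
a_est = stochastic_lm(xs, ys, a0=0.0, n_iter=200, batch=32, lam=1e-3, rng=rng)
```

Each iteration touches only 32 of the 500 residuals, yet the damped curvature information still drives the parameter to the minimizer, which is the per-iteration cost saving SLM exploits at telescope scale.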

Hazard assessment of oil spills along the main shipping lane in the Red Sea (Scientific Reports, Springer Science and Business Media LLC, 2021-08-23) [Article] This study investigates the risk from oil spills along the main shipping lane in the Red Sea based upon oil spill model trajectories forced by the outputs of validated high-resolution regional metocean data. Following the intra-annual variations in the metocean conditions, the results are presented by classifying the basin into three regions: northern, central, and southern Red Sea. The maximum distance traveled by the slick is presented for 1, 2, 5, 10, 14, and 20 days after the commencement of a spill. Different measures of hazard assessment in terms of the concentration of beached oil, alongside the corresponding probability maps, are also analyzed. The volume fractions of beached, dispersed, and evaporated oil 20 days after the commencement of a spill are quantified. The Red Sea general circulation is characterized by rich mesoscale eddies, which appear to be the most prevailing dynamics in oil transport in the basin. Two case events are analyzed to closely examine the effects of the mesoscale circulations on the fate of spilled oil. The results of this study provide a comprehensive assessment of oil spill hazards in the Red Sea along its main shipping lane and identify the areas at high risk that require timely mitigation strategies.