Student Presentations – MSSISS 2024

Student Presentations

Oral Presentation 1: Applications

March 28th 9:15 AM – 12:00 PM @ East Conference Room

Aaron Kaye

Economics, Business Economics & Public Policy (Joint)

The Personalization Paradox: Welfare Effects of Personalized Recommendations in Two-Sided Digital Markets

Abstract

In many online markets, platforms engage in platform design by choosing product recommendation systems and selectively emphasizing certain product characteristics. I analyze the welfare effects of personalized recommendations in the context of the online market for hotel rooms using clickstream data from Expedia Group. This paper highlights a tradeoff between match quality and price competition. Personalized recommendations can improve consumer welfare through the “long-tail effect,” where consumers find products that better match their tastes. However, sellers, facing demand from better-matched consumers, may be incentivized to increase prices. To understand the welfare effects of personalized recommendations, I develop a structural model of consumer demand, product recommendation systems, and hotel pricing behavior. The structural model accounts for the fact that prices impact demand directly through consumers’ disutility of price and indirectly through positioning by the recommendation system. I find that ignoring seller price adjustments would cause considerable differences in the estimated impact of personalization. Without price adjustments, personalization would increase consumer surplus by 2.3% of total booking revenue (∼$0.9 billion). However, once sellers update prices, personalization would lead to a welfare loss, with consumer surplus decreasing by 5% of booking revenue (∼$2 billion).

Abigail Kappelman

Epidemiology

Establishing survey methodology to capture long term population level Patient Reported Outcomes (PROs) in a surgical population

Abstract

Background: Understanding long-term patient-reported outcomes (PROs) following surgery requires an efficacious survey methodology. We utilized data from a surgical registry to establish a survey frame to gather long-term PROs, providing a scientific route to assess generalizability, even within the context of lower overall response rates in a surgical population. Methods: Leveraging a statewide hernia surgery registry, we conducted a one-year post-operative survey using three PRO measures (PROMs): the Ventral Hernia Recurrence Inventory, PROMIS Pain Intensity 3a, and HerQLes. The registry collects basic contact information, socioeconomic characteristics, and clinical variables potentially associated with PROs. We employed a responsive survey design approach across four cohorts, varying invitation and reminder methods and incentive offers. Outcomes included: contact rate and response rate, calculated using American Association for Public Opinion Research (AAPOR) formulas; item non-response (%); and the impact of the number of reminders and incentive offer on response rates, respondent characteristics, and item non-response (%). Each cohort was exposed to a unique set of design features to investigate associations between design features and response rates and to determine survey cost effectiveness. Further, differences in characteristics of respondents and non-respondents were investigated using registry data, and adjustment methods, grounded in rich auxiliary data, were developed and evaluated. Results: Of 7,062 representative patients who received hernia surgery between January 2020 and March 2022, 6,068 were eligible for survey participation. Contact was achieved with 5,645 (contact rate 93.02%) and 1,816 responded to the survey across all four cohorts (overall response rate 29.93%). Response rates by cohort were 42.34%, 32.48%, 25.19%, and 25.89%, with overall low item non-response (%). Response rates increased with the number of reminders, but with diminishing returns over time; offering a postpaid incentive over no incentive did not significantly improve overall response rates; and item non-response was not associated with incentive offer. We identified targeted phone call reminders as a cost-effective strategy. Respondents were comparable to the survey population after weighting for available registry covariates. Conclusion: We illustrate a strategy to maximize response rate with known cost components and to evaluate representativeness of long-term PROs using a sample-based registry, targeted multi-mode contact methods, and weighting adjustment methods.

Alicia Dominguez

Biostatistics

Investigating the Impact of Winner’s Curse on Polygenic Risk Scores

Abstract

Polygenic risk scores (PRS) are an increasingly used tool to predict genetic risk for many complex traits. PRS quantify the cumulative effect of risk alleles at several genetic markers by using estimates from genome-wide association studies (GWAS) as weights. GWAS are a powerful approach to identify genetic variants associated with traits of interest; however, this study design has limitations that can bias or limit the utility of downstream analyses. One form of bias GWAS encounter is Winner’s Curse, a phenomenon where the estimated effect sizes of significant variants tend to be larger in magnitude than their true values. Using overestimated effect sizes can limit PRS utility by inflating the weights of non-causal variants, but this has not been studied in detail. In this project, I evaluate the impact of Winner’s Curse on PRS and assess whether adjusting for it improves PRS performance. For this analysis, we use simulated GWAS summary statistics and genotype data for a million markers in linkage disequilibrium (LD) under varying sample size, heritability, and polygenicity parameters. We obtain three sets of PRS calculated with varying numbers of markers using the clumping and p-value thresholding method. We compare the performance of PRS calculated with the original summary statistics to PRS calculated with summary statistics adjusted for Winner’s Curse. We assess Empirical Bayes and FDR Inverse Quantile Transformation as methods to correct for Winner’s Curse. We found that when more markers are included in the original PRS, the variance of PRS estimates dramatically increases. However, the variance of the adjusted PRS is much better controlled than that of the original PRS when more than 100 markers are used. Additionally, we find that the adjusted PRS becomes increasingly more correlated with the true genetic predictor than the original PRS as the number of markers used in the calculations increases. For PRS calculated with more than 100 markers, the Empirical Bayes method performs best at preserving the correlation between the true genetic predictor and PRS estimates. Some improvement in performance can be attributed to shrinkage of the effect size estimates of non-causal variants that are added to the PRS. In this analysis, we demonstrate that methods that account for Winner’s Curse improve PRS performance under several simulation scenarios. We observe large improvements in variance and in correlation with the true genetic predictor, especially when more than 100 markers are included in the construction of the PRS.
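As a toy illustration of the shrinkage idea above, the sketch below builds an unadjusted PRS and an Empirical Bayes-adjusted PRS from simulated summary statistics, using a simple normal-normal shrinkage rule. All quantities (beta_hat, se, genotypes) are fabricated assumptions; this is not the presenter's simulation pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GWAS summary statistics: estimated effects and standard errors
n_markers, n_people = 500, 200
beta_hat = rng.normal(0.0, 0.05, n_markers)              # estimated per-allele effects
se = np.full(n_markers, 0.02)                            # standard errors
genotypes = rng.binomial(2, 0.3, (n_people, n_markers))  # allele counts (0/1/2)

# Empirical Bayes shrinkage under a normal-normal model:
# beta_hat_j ~ N(beta_j, se_j^2), beta_j ~ N(0, tau^2);
# tau^2 is estimated by method of moments.
tau2 = max(np.mean(beta_hat**2 - se**2), 0.0)
beta_eb = beta_hat * tau2 / (tau2 + se**2)               # posterior mean shrinks toward 0

# Polygenic risk scores: weighted sums of allele counts
prs_raw = genotypes @ beta_hat
prs_adj = genotypes @ beta_eb

print("variance of unadjusted PRS:", prs_raw.var())
print("variance of EB-adjusted PRS:", prs_adj.var())
```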

Basil Isaac

Economics

Opioid Shortages and Overdose Deaths

Abstract

Since 2015, the United States has faced an average of 130 new drug shortages every year. Drug shortages have cascading effects on patients’ health, causing delays in treatment, the use of inferior alternatives, and an increased risk of medication errors. In addition to healthcare practitioners, policymakers have also expressed concern about shortages. However, most existing research on the harms of drug shortages consists of case studies or single-center reports. This paper is one of the first to examine the effects of a drug shortage using state-level variation in the incidence of a shortage. Specifically, it quantifies the effect of a 16-month shortage of oxymorphone, an opioid analgesic, in 2012 on patient errors leading to death. Oxymorphone is listed in Schedule II of controlled substances by the Drug Enforcement Administration (DEA). An unanticipated shortage of oxymorphone necessitates an abrupt switch to other opioids, which can cause medication errors. Furthermore, patients using oxymorphone may switch to illicit use of other opioids, such as heroin, which carry a higher risk of death. Although oxymorphone accounted for less than 1% of opioid fills in the early 2010s, this paper demonstrates that its shortage led to a significant increase in drug-overdose-related deaths. In this paper, I create a geographically varying measure of the incidence of the oxymorphone shortage using administrative data on legal shipments of the drug. There is considerable variation in the incidence of the shortage: some states experienced no decline in their supply of oxymorphone, while others experienced a 50% reduction relative to the period before the shortage. To quantify the effect of the shortage on errors leading to death, I use the Underlying Cause of Death mortality files published by the National Vital Statistics System. Specifically, I use deaths related to an overdose of opioids, hallucinogens, or other drugs. I construct a shift-share instrument to address the potential endogeneity between the volume of oxymorphone shipments and drug-overdose deaths, exploiting features of this specific shortage episode to create the instrumental variable. In 2012, there were two primary manufacturers of the drug, and the shortage was precipitated by manufacturing issues in the factory belonging to one of them. The shift-share instrument is a state’s projected supply of oxymorphone during the shortage period, holding fixed the share of each manufacturer’s total shipments that went to the state. I demonstrate that pharmaceutical supply chains exhibit stickiness, which yields strong first-stage effects for the instrumental variable. Estimates from the regression using the instrumental variable along with state and time fixed effects demonstrate that an unanticipated reduction in the supply of oxymorphone leads to a statistically significant increase in the number of drug-overdose-related deaths. Back-of-the-envelope calculations suggest that the oxymorphone shortage led to a 4% increase in deaths related to drug overdoses. This is one of the few papers examining the effect of a drug shortage at the national level, and the first to use a shift-share instrumental variable to address the endogeneity concern that arises from using geographical variation in the incidence of a shortage. In addition to serving as a template for research into the effects of other shortages, the results here also caution against the DEA’s use of restrictive production quotas, which have recently precipitated other opioid shortages.
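For intuition, the sketch below constructs a shift-share instrument of the kind described above on made-up shipment data: pre-shortage state shares of each manufacturer's shipments are held fixed and interacted with national shipments during the shortage. All numbers, states, and column names are hypothetical and for illustration only.

```python
import pandas as pd

# Hypothetical pre-shortage shipments (manufacturer x state), arbitrary units
pre = pd.DataFrame(
    {"state": ["MI", "MI", "OH", "OH", "TX", "TX"],
     "manufacturer": ["A", "B", "A", "B", "A", "B"],
     "shipments": [100.0, 50.0, 80.0, 120.0, 60.0, 40.0]}
)

# Hypothetical national shipments by manufacturer during the shortage period;
# manufacturer A's factory problems cut its output sharply.
shortage_national = {"A": 120.0, "B": 200.0}

# Shares: fraction of each manufacturer's pre-shortage shipments going to each state
pre["mfr_total"] = pre.groupby("manufacturer")["shipments"].transform("sum")
pre["share"] = pre["shipments"] / pre["mfr_total"]

# Shift-share instrument: projected state supply holding pre-period shares fixed
pre["projected"] = pre["share"] * pre["manufacturer"].map(shortage_national)
instrument = pre.groupby("state")["projected"].sum()
print(instrument)
```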

Cameron Pratt

Astronomy

Separating Weak Astrophysical Signals using Multi-Channel Convolutional Neural Networks

Abstract

A major task throughout many disciplines of science is to measure a signal coming from a source of interest that is mixed with other components. Many component separation techniques have been developed to isolate the desired signals; however, these often rely on strong assumptions that may not always be valid. Recently, scientists have turned to machine learning algorithms in hopes of achieving more reliable results. I will present a convolutional neural network that can separate signals given multi-channel images. Moreover, I apply this technique to observations of the Universe in the microwave bands with the intent of detecting weak signals coming from clusters of galaxies. Similar methods can be and have been applied to other areas of research, such as the separation of tumors from healthy tissue in biomedical imaging, and may be useful in many different contexts.

Elizabeth Trinh

Management and Organizations, Ross

The Busy Bee Effect: How and Why Self-imposed Busyness Affects Work-related Outcomes

Abstract

While conventional wisdom from wellness experts and work-life balance advocates, as well as prior research, often casts busyness in a negative light, many people engage in busyness—often by their own choice. To explore this puzzle, we execute a multi-study investigation. In Study 1, we conducted a qualitative study of 39 entrepreneurs to understand their experiences of busyness. Through our inductive analysis, we identify the phenomenon of ‘self-imposed busyness,’ which we define as the self-initiated filling of one’s time with work-related tasks or activities. Building on this, we develop a conceptual model to investigate the dual-edged effects of self-imposed busyness. We then test it across two studies of creative independent workers and management students. Our findings reveal that, on one hand, self-imposed busyness can increase anxiety, which in turn decreases well-being. On the other hand, self-imposed busyness can boost cognitive work engagement, which leads to improvements in well-being. Our theory and findings challenge the prevailing view that busyness is entirely detrimental, providing valuable insights into how and why self-imposed busyness can yield both positive and negative outcomes.

Jingyang Rui

Political Science

How Crisis Responsibility Attribution Shapes Authoritarian Co-optation Strategy: Evidence from China’s COVID Testing Resource Allocation

Abstract

Authoritarian governments often employ the strategy of preferential resource distribution to gain favour with certain civilian groups during external crises like natural disasters or economic downturns. However, how do they adjust their co-optation strategy if the crisis is perceived to be caused by their own actions? This study updates existing co-optation theories by proposing that when crisis responsibility becomes attributable to the government, it will maintain the distribution of genuine benefits to the civilian group with the highest rent-extraction value, while shifting from offering performative benefits to offering genuine benefits to the “ideologically favoured” group in the regime’s ideological narrative in order to boost its legitimacy. Our theory is empirically supported by an analysis of the preferential allocation of COVID-19 testing resources across different economic classes in N city, a Chinese metropolis and economic powerhouse with high income disparity, before and after the sudden outbreak of popular protests against the zero-COVID policies in October 2022. Using a novel dataset of the queuing status at more than 7,000 COVID testing sites in N city and a spatial border analysis approach across N city’s 4,698 neighbourhoods, we compare testing site density and government responsiveness to site crowdedness across testing sites in rich, middle-class, and poor neighbourhoods. We find that prior to the protest, in which government policies were viewed as a greater threat than the pandemic itself, the government offered genuine benefits to the rich, performative benefits to the poor, and the lowest level of resources to the middle class. After the protest broke out, the government provided genuine benefits to both the rich and the poor, without raising benefits to the middle class. These results are supplemented by a text analysis of N city government documents on COVID-19, showing the government’s inclination to publicize their resource preferences to the poor both before and after the outbreak of the protest.

Micaela Rodriguez

Psychology

Lonely or Just Alone? Beliefs Shape the Experience of Being Alone

Abstract

This talk suggests that current efforts to combat loneliness may inadvertently exacerbate it by negatively influencing people’s beliefs about being alone. I present evidence from novel and diverse methods that (i) the media portrays being alone as harmful, (ii) such portrayals negatively impact people’s beliefs, and (iii) such beliefs predict increases in loneliness over time. I use a combination of archival, experimental, and experience sampling data.

Minseo Kim

Electrical Engineering and Computer Science

Diffusion Model for Undersampled MRI Reconstruction

Abstract

Undersampled MRI is a critical technique to speed up MRI scans by capturing only a portion of the k-space data needed for image reconstruction. This approach enables shorter scan times while still generating diagnostically useful images. By reducing patient discomfort and motion-related artifacts caused by lengthy scans, undersampled MRI holds great potential to improve the overall patient experience. However, undersampling can result in a loss of image detail and introduce unwanted distortions, making accurate reconstruction quite challenging. Recently, machine learning methods, especially diffusion models, have gained significant attention in the field of MRI reconstruction and have shown promising outcomes across various imaging tasks. We apply the score-based diffusion model with diffusion posterior sampling to better solve this medical imaging inverse problem. We demonstrate the effectiveness of the method on a fastMRI dataset with over 3,000 images. This talk will mainly focus on the following key aspects: a brief introduction to diffusion probabilistic models (the forward (i.e., training) and reverse (i.e., sampling) processes), score modeling for training images, and the application of diffusion models to undersampled MRI images.

Peijun Wu

Biostatistics

Statistical identification of cell type-specific spatially variable genes in spatial transcriptomics

Abstract

Spatially resolved transcriptomics, enabled by a diverse range of technologies, facilitates in-depth exploration of transcriptomic landscapes, extending from individual cellular domains to broader tissue contexts. However, the interpretation of abundant gene expression data derived from techniques quantifying averaged expression per spot is frequently complicated by the heterogeneity in cellular compositions, leading to significant computational and statistical challenges. The spatial heterogeneity of gene expression within specific cell types, influenced by functionality, microenvironments, and intercellular communication, further adds to this complexity. Evident in distinct brain regions, these spatial variations in gene expression play critical roles in cellular differentiation, tissue organization, and disease progression, and in identifying potential new therapeutic targets, underscoring the importance of better analytical methods to interpret these spatially resolved transcriptomics data. To tackle these limitations, we introduce Celina (CELl type-specific spatIal patterN Analysis in spatial transcriptomics), a statistical method developed to identify genes exhibiting cell type-specific spatial expression patterns. By employing a spatially varying coefficient model, Celina examines one gene at a time and accurately models each gene’s spatial expression pattern in relation to the distribution of cell types across tissue locations. Not only does Celina maintain calibrated type I error control, but it also shows a significant increase in detection power across a spectrum of technical platforms. In applications to seven spatial transcriptomics datasets, including a mouse cerebellum Slide-seq dataset, Celina identified 5 Purkinje-specific spatial genes and 12 granular-specific spatial genes, thereby revealing spatial heterogeneity and diverse functional lobules among these cell types in the mouse cerebellum. Thus, Celina offers a significant advance in the reliable interpretation of spatial transcriptomic data, contributing an innovative dimension to our understanding of cellular heterogeneity.

Tim White

Statistics

Sequential Monte Carlo for detecting and deblending objects in astronomical images

Abstract

Many of the objects imaged by the forthcoming generation of astronomical surveys will overlap visually. These objects are known as blends. Distinguishing and characterizing blended light sources is a challenging task, as there is inherent ambiguity in the type, position, and properties of each source. We propose SMC-Deblender, a novel approach to probabilistic astronomical cataloging based on sequential Monte Carlo (SMC). Given an image, SMC-Deblender evaluates catalogs with various source counts by partitioning the SMC particles into blocks. With this technique, we demonstrate that SMC can be a viable alternative to existing deblending methods based on Markov chain Monte Carlo and variational inference. In experiments with ambiguous synthetic images of crowded starfields, SMC-Deblender accurately detects and deblends sources, a task which proves infeasible for Source Extractor, a widely used non-probabilistic cataloging program.

Oral Presentation 2: Methods/Theory

March 28th 9:15 AM – 12:00 PM @ West Conference Room

Chih-Yu Chang

Statistics

Generalized Least Square-Based Aggregation for Regression Problems

Abstract

Bagging is a noteworthy technique for improving the performance of prediction models in ensemble learning. The literature underscores the importance of effectively managing the correlation between the aggregated prediction models for the success of bagging. For instance, random forests, a widely used prediction model, tackle this issue by achieving decorrelation through the random selection of features when aggregating multiple single-tree models. This study presents an innovative approach to boost predictive performance within the context of bagging, by addressing the correlation via the concept of generalized least squares. Both theoretical analysis and numerical experiments provide evidence in favor of the proposed method, positioning it as a promising avenue for advancing bagging techniques.
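As a rough illustration of the idea (not the presenter's estimator), the sketch below aggregates bagged regression trees with generalized least squares-style weights: held-out residuals estimate the between-learner error covariance, and the resulting weights down-weight correlated, high-variance learners relative to simple averaging. The dataset, model choices, and covariance estimate are all assumptions made for the example; in practice one would estimate the covariance on data not used for evaluation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X, y = make_regression(n_samples=600, n_features=10, noise=10.0, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=1)

# Bagged base learners: trees fit on bootstrap resamples of the training set
B = 20
trees = []
for b in range(B):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    trees.append(DecisionTreeRegressor(max_depth=5, random_state=b).fit(X_tr[idx], y_tr[idx]))

# Residuals of each learner on a held-out set estimate the error covariance
P_val = np.column_stack([t.predict(X_val) for t in trees])   # n_val x B
R = P_val - y_val[:, None]
Sigma = np.cov(R, rowvar=False) + 1e-6 * np.eye(B)           # regularized covariance

# GLS-style weights: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
ones = np.ones(B)
w = np.linalg.solve(Sigma, ones)
w /= ones @ w

y_avg = P_val.mean(axis=1)    # plain bagging average
y_gls = P_val @ w             # GLS-weighted aggregation
print("MSE (average):", np.mean((y_avg - y_val) ** 2))
print("MSE (GLS weights):", np.mean((y_gls - y_val) ** 2))
```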

Dat Do

Statistics

Dendrogram of latent mixing measures: Learning hierarchy and model selection for finite mixture models

Abstract

We present a new way to summarize and perform model selection for mixture models via the hierarchical clustering tree (dendrogram) of the over-fitted latent mixing measure. Our proposed method bridges agglomerative hierarchical clustering with a mixture model-based approach. The dendrogram’s construction is derived from the theory of convergence of the mixing measures, so we can (1) consistently select the true number of components and (2) recover a good convergence rate for parameter estimation from the tree. Theoretically, it explicates the choice of the optimal number of clusters in hierarchical clustering. Methodologically, the dendrogram reveals more information on the hierarchy of subpopulations compared to traditional ways of summarizing mixture models. Several simulation studies are carried out to support our theory. We also illustrate the methodology via single-cell RNA sequence data.

Gabriel Durham

Statistics

Comparing Multilevel Adaptive Interventions in Clustered SMARTs using Longitudinal Outcomes: With Application to Health Policy

Abstract

Researchers and policymakers can conceptualize many health policies as adaptive interventions, which incorporate adjustments of recommended action(s) at each decision point based on previous outcomes or actions. Health policy interventions often involve intervening at a system-level (e.g., a primary care clinic) with the intent to modify behavior of individuals within the system (e.g., doctors in a clinic). Policy scientists (e.g., implementation scientists) can use clustered, sequential, multiple assignment, randomized trials (SMART) to compare such “multilevel adaptive interventions” on a nested, end-of-study outcome. However, existing methods are not suitable when the primary outcome in a clustered SMART is nested and longitudinal; e.g., repeated outcome measures nested within each clinician and clinicians nested within sequentially-randomized clinics. In this manuscript, we propose a three-level marginal mean modeling and estimation approach for comparing multilevel adaptive interventions in a clustered SMART. This methodology accommodates both the cross-temporal within-unit correlation in the longitudinal outcome and the inter-unit correlation within each cluster. We illustrate our methods using data from two clustered, health-policy SMARTs: the first aims to improve guideline concordant opioid prescribing in non-cancer primary care clinics in Wisconsin and Michigan; and the second aims to improve the adoption of evidence-based mental health treatments in high schools across Michigan.

Hrithik Ravi

Electrical and Computer Engineering

Beyond Cross-Entropy Loss: Multiclass Implicit Regularization for Exponentially-Tailed PERM Losses

Abstract

The study of implicit regularization effects has so far shed light on the generalization properties of different optimization algorithms for different loss functions. However, there is a notable gap in the existing literature: most of the focus is on the binary classification setting, with very limited work on multiclass classification problems. In our work, focusing on linearly separable datasets, we bridge this gap and prove implicit regularization results for a broader class of multiclass loss functions.

Jesse Wheeler

Statistics

Arima2: An R Package for Likelihood Based Inference for ARIMA Modeling

Abstract

Autoregressive moving average (ARMA) models are frequently used to analyze time series data. Because these models are so widely used in scientific studies, any improvement in parameter estimation can be considered a significant advancement in computational statistics. Despite the popularity of these models, algorithms for fitting ARMA models have weaknesses that are not well known. We provide a summary of parameter estimation via maximum likelihood and discuss common pitfalls that may lead to sub-optimal parameter estimates. We propose a random restart algorithm for parameter estimation that frequently yields higher likelihoods than traditional maximum likelihood estimation procedures. The random restart algorithm is implemented in an R package called “arima2”. Through a series of simulation studies, we demonstrate the efficacy of our proposed algorithm.
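The sketch below illustrates the random-restart idea in Python with statsmodels rather than the arima2 R package itself: the same ARIMA model is refit from several randomly perturbed starting values and the fit with the highest log-likelihood is kept. It is a minimal sketch of the concept, not the package's algorithm; the simulated series and perturbation scale are arbitrary choices.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import arma_generate_sample

np.random.seed(2)
# Simulated ARMA(2,1) series (lag-polynomial sign convention of statsmodels)
y = arma_generate_sample(ar=[1, -0.6, 0.2], ma=[1, 0.4], nsample=500)

model = ARIMA(y, order=(2, 0, 1))
best = model.fit()                      # fit from the default starting values

# Random restarts: perturb the starting values and keep the best log-likelihood
rng = np.random.default_rng(0)
for _ in range(10):
    start = best.params + rng.normal(0.0, 0.2, size=len(best.params))
    try:
        candidate = model.fit(start_params=start)
    except Exception:
        continue                        # skip starting values that fail
    if candidate.llf > best.llf:
        best = candidate

print("best log-likelihood:", best.llf)
print("parameter estimates:", best.params)
```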

Kaiwen Hou

Columbia Business School

Geometry-Aware Normalizing Wasserstein Flows for Optimal Causal Inference

Abstract

This paper introduces a novel approach to causal inference, leveraging the power of continuous normalizing flows (CNFs) within the parametric submodel framework. Traditional Targeted Maximum Likelihood Estimation (TMLE), while effective, depends heavily on accurately defined propensity models and perturbation directions, a requirement often challenging to fulfill. Our method integrates CNFs, known for their ability to model complex distributions via differential equations, to enhance the geometric sensitivity of these parametric submodels. This integration is particularly significant in TMLE, allowing for a more nuanced approach that minimizes the Cramér-Rao bound, shifting from an a priori distribution $p_0$ to an empirically-informed distribution $p_1$. The paper extends the typical application of CNFs by incorporating Wasserstein gradient flows into Fokker-Planck equations. In particular, the versatility of CNFs is harnessed to impose geometric structures based on prior objectives, transcending regularization and enhancing model robustness, especially in the context of optimal transport theory. Applied to optimal causal inference, our approach focuses on the disparity between sample and population distributions, a common source of bias in parameter estimation. Utilizing the concepts of optimal transport and Wasserstein gradient flows, we aim to develop methodologies for causal inference that are minimally variant in finite-sample settings and compare favorably against asymptotically-optimal traditional methods like TMLE and AIPW. The proposed framework, through the lens of Wasserstein gradient flows, minimizes the variance of efficient influence functions under distribution $p_t$. Preliminary experiments demonstrate the efficacy of our approach, indicating lower mean-squared errors compared to naive flows, and underscoring the potential of geometry-aware normalizing Wasserstein flows in enhancing statistical modeling and inference.

Subha Maity

Statistics

An Investigation of Representation and Allocation Harms in Contrastive Learning

Abstract

The effect of underrepresentation on the performance of minority groups is known to be a serious problem in supervised learning settings; however, it has been underexplored so far in the context of self-supervised learning (SSL). In this paper, we demonstrate that contrastive learning (CL), a popular variant of SSL, tends to collapse representations of minority groups with certain majority groups. We refer to this phenomenon as representation harm and demonstrate it on image and text datasets using the corresponding popular CL methods. Furthermore, our causal mediation analysis of allocation harm on a downstream classification task reveals that representation harm is partly responsible for it, thus emphasizing the importance of studying and mitigating representation harm. Finally, we provide a theoretical explanation for representation harm using a stochastic block model that leads to a representational neural collapse in a contrastive learning setting.

Rachel Newton

Electrical and Computer Engineering

Optimality of POD for Data-Driven LQR With Low-Rank Structures

Abstract

The optimal state-feedback gain for the Linear Quadratic Regulator (LQR) problem is computationally costly to compute for high-order systems. Reduced-order models (ROMs) can be used to compute feedback gains with reduced computational cost. However, the performance of this common practice is not fully understood. This letter studies this practice in the context of data-driven LQR problems. We show that, for a class of LQR problems with low-rank structures, the controllers designed via their ROM, based on the Proper Orthogonal Decomposition (POD), are indeed optimal. Experimental results not only validate our theory but also demonstrate that even with moderate perturbations on the low-rank structure, the incurred suboptimality is mild.

Robert Malinas

Electrical and Computer Engineering

Community Detection in High-Dimensional Graph Ensembles

Abstract

Detecting communities in high-dimensional graphs can be achieved by applying random matrix theory where the adjacency matrix of the graph is modeled by a Stochastic Block Model (SBM). However, the SBM makes an unrealistic assumption that the edge probabilities are homogeneous within communities, i.e., the edges occur with the same probabilities. The Degree-Corrected SBM is a generalization of the SBM that allows these edge probabilities to be different, but existing results from random matrix theory are not directly applicable to this heterogeneous model. In this paper, we derive a transformation of the adjacency matrix that eliminates this heterogeneity and preserves the relevant eigenstructure for community detection. We propose a test based on the extreme eigenvalues of this transformed matrix and (1) provide a method for controlling the significance level, (2) formulate a conjecture that the test achieves power one for all positive significance levels in the limit as the number of nodes approaches infinity, and (3) provide empirical evidence and theory supporting these claims.

Shihao Wu

Statistics

A General Latent Embedding Approach for Modeling High-dimensional Hyperlinks

Abstract

Hyperlinks encompass polyadic interactions among entities beyond dyadic relations. Despite the growing research interest in hyperlink modeling, most existing methodologies have significant limitations, including a heavy reliance on uniform restrictions of hyperlink orders and the inability to account for repeated observations of identical hyperlinks. We introduce a novel and general latent embedding approach that tackles these challenges through the integration of latent embeddings, vertex degree heterogeneity parameters, and an order-adjusting parameter. Theoretically, we investigate identification conditions for the latent embeddings and associated parameters and establish convergence rates of their estimators along with asymptotic normality. Computationally, we employ a universal singular value thresholding initialization and a projected gradient ascent algorithm for parameter estimation. A comprehensive simulation study is performed to demonstrate the effectiveness of the algorithms and validate the theoretical findings. Moreover, an application involving a co-citation hypergraph network is used to further illustrate the advantages of the proposed method.

Oral Presentation 3: Methods/Theory

March 28th 1:00 PM – 3:00 PM @ Amphitheatre

Mengqi Lin

Statistics

Controlling the False Discovery Proportion in Observational Studies with Hidden Bias

Abstract

We propose an approach to exploratory data analysis in matched observational studies. We consider the setting where a single intervention is thought to potentially impact multiple outcome variables, and the researcher would like to investigate which of these causal hypotheses come to bear while accounting not only for the possibility of false discoveries, but also the possibility that the study is plagued by unmeasured confounding. For any candidate set of rejected hypotheses, our method provides sensitivity intervals for the false discovery proportion (FDP), the proportion of rejected hypotheses that are actually true. For a set containing $L$ outcomes, the method describes how much unmeasured confounding would need to exist for us to believe that the proportion of true hypotheses is $0/L$, $1/L$, $\ldots$, all the way to $L/L$. Moreover, the resulting confidence statements are valid simultaneously over all possible choices for the rejected set, allowing the researcher to look in an ad hoc manner for promising subsets of outcomes that maintain a large estimated fraction of correct discoveries even if a large degree of unmeasured confounding is present. The approach is particularly well suited to sensitivity analysis, as conclusions that some fraction of outcomes were affected by the treatment exhibit larger robustness to unmeasured confounding than the conclusion that any particular outcome was affected. In principle, the method requires solving a series of quadratically constrained integer programs. That said, we show not only that a solution can be obtained in reasonable run time, but also that one can avoid running the integer program altogether with high probability in large samples. We illustrate the practical utility of the method through simulation studies and a data example.

Unique Subedi

Statistics

Online Infinite-Dimensional Regression: Learning Linear Operators

Abstract

We consider the problem of learning linear operators under squared loss between two infinite-dimensional Hilbert spaces in the online setting. We show that the class of linear operators with uniformly bounded p-Schatten norm is online learnable for any p ∈ [1, ∞). On the other hand, we prove an impossibility result by showing that the class of uniformly bounded linear operators with respect to the operator norm is not online learnable. Moreover, we show a separation between sequential uniform convergence and online learnability by identifying a class of bounded linear operators that is online learnable but for which uniform convergence does not hold. Finally, we prove that the impossibility result and the separation between uniform convergence and learnability also hold in the batch setting.

Vincenzo Loffredo

Statistics

Nonparametric Velocity Field Modeling

Abstract

In this paper we propose a practical approach to analyzing longitudinal data based on Gaussian Processes. We introduce the concept of modeling the velocity and the velocity field of a time series. This approach allows for a broader application of Gaussian Processes in longitudinal settings, without increasing the computational complexity of the problem and without losing interpretability of the results. We show that this new class of models performs comparably to existing ones in standard settings, and provides improvements in inference under misspecification, such as mixtures and non-stationarity. Our motivation for this model comes from the LongROAD study, which analyzes the effects of aging on the driving abilities of the participants.

Yao Song

Biostatistics

Multi-Objective Tree-based Reinforcement Learning for Estimating Tolerant Dynamic Treatment Regimes

Abstract

A dynamic treatment regime (DTR) is a sequence of treatment decision rules, one per stage of intervention, that dictates individualized treatments based on evolving treatment and covariate history. It provides a means for operationalizing a clinical decision support system and fits well into a broader paradigm of personalized medicine. However, many real-world problems involve multiple objectives, and decision rules may differ for different objectives when trade-offs are present. Furthermore, there may be more than one feasible decision that leads to an empirically sufficient optimization result. In this study, we propose the concept of a tolerant regime, which gives a set of individualized feasible decision(s) at each stage under a pre-specified tolerance rate. We present a multi-objective tree-based reinforcement learning (MOT-RL) method to directly estimate the tolerant DTR (tDTR) that optimizes multiple objectives in a multi-stage multi-treatment setting. At each stage, MOT-RL constructs an unsupervised decision tree by first modeling the mean of counterfactual outcomes for each objective via semiparametric regression models and then maximizing a purity measure constructed by scalarizing the augmented inverse probability weighted estimators (AIPWE) of all objectives. The proposed method is implemented in a backward inductive manner through multiple decision stages, and it delivers the optimal DTR as well as the tDTR depending on the decision-maker’s preferences. MOT-RL is robust, efficient, easy to interpret, and flexible to different problem settings. With the proposed method, we identify a two-stage chemotherapy regime that simultaneously maximizes the relief of disease burden and prolongs the survival of prostate cancer patients.

Yuliang Xu

Biostatistics

Bayesian Image regression with Soft-thresholded Conditional Autoregressive prior

Abstract

For regression problems with a brain imaging component, Bayesian models are one of the most popular choices due to their flexibility and uncertainty quantification features. However, they can be computationally challenging for high-dimensional problems, and the correlation structure of the imaging component is usually pre-specified and may not reflect the underlying true structure adequately. To overcome these challenges in computation and correlation accuracy, we develop a general and scalable variational inference method for regression models with large-scale imaging data. We first propose a soft-thresholded conditional autoregressive (ST-CAR) prior for the sparse-mean model, where the correlation structure of the imaging component can be learned through the ST-CAR prior. Next, we apply the ST-CAR prior to scalar-on-image and image-on-scalar regression models as two examples, and develop coordinate ascent variational inference (CAVI) and stochastic subsampling variational inference (SSVI) algorithms for these two models. We perform simulations to show that the ST-CAR prior outperforms existing methods in terms of selecting active regions with complex correlation patterns, and to demonstrate that CAVI and SSVI have superior computational performance over existing methods. We apply the proposed method to the ABCD study as a real data example.

Zeyu Sun

Electrical and Computer Engineering

Minimum-Risk Recalibration of Classifiers

Abstract

Recalibrating probabilistic classifiers is vital for enhancing the reliability and accuracy of predictive models. Despite the development of numerous recalibration algorithms, there is still a lack of a comprehensive theory that integrates calibration and sharpness (which is essential for maintaining predictive power). In this paper, we introduce the concept of minimum-risk recalibration within the framework of mean-squared-error (MSE) decomposition, offering a principled approach for evaluating and recalibrating probabilistic classifiers. Using this framework, we analyze the uniform-mass binning (UMB) recalibration method and establish a finite-sample risk upper bound of order $\tilde{O}(B/n + 1/B^2)$, where $B$ is the number of bins and $n$ is the sample size. By balancing calibration and sharpness, we further determine that the optimal number of bins for UMB scales with $n^{1/3}$, resulting in a risk bound of approximately $O(n^{-2/3})$. Additionally, we tackle the challenge of label shift by proposing a two-stage approach that adjusts the recalibration function using limited labeled data from the target domain. Our results show that transferring a calibrated classifier requires significantly fewer target samples compared to recalibrating from scratch. We validate our theoretical findings through numerical simulations, which confirm the tightness of the proposed bounds, the optimal number of bins, and the effectiveness of label shift adaptation.
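A minimal sketch of uniform-mass binning with the $B \propto n^{1/3}$ scaling described above, on simulated scores; the data-generating process and the specific recalibration map are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical calibration set: predicted scores and binary labels from a
# miscalibrated classifier (scores perturbed away from the true probabilities)
n = 2000
p_true = rng.uniform(0.05, 0.95, n)
labels = rng.binomial(1, p_true)
scores = np.clip(p_true + rng.normal(0, 0.15, n), 0.01, 0.99)

# Uniform-mass binning: B ~ n^(1/3) bins with (roughly) equal counts
B = int(round(n ** (1 / 3)))
edges = np.quantile(scores, np.linspace(0, 1, B + 1))
edges[0], edges[-1] = 0.0, 1.0

bin_ids = np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, B - 1)
bin_means = np.array([labels[bin_ids == b].mean() for b in range(B)])

def recalibrate(new_scores):
    """Map raw scores to the empirical positive rate of their bin."""
    ids = np.clip(np.searchsorted(edges, new_scores, side="right") - 1, 0, B - 1)
    return bin_means[ids]

print("number of bins:", B)
print("recalibrated scores:", recalibrate(np.array([0.1, 0.5, 0.9])))
```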

Oral Presentation 4: Applications

March 29th 9:00 AM – 12:00 PM @ East Conference Room

Cheoljoon Jeong

Industrial and Operations Engineering

Calibration of Building Energy Computer Models via Bias-Corrected Iteratively Reweighted Least Squares Method

Abstract

As the building sector contributes approximately three-quarters of the U.S. electricity load, analyzing buildings’ energy consumption patterns and establishing effective operational strategies for them become of great importance. To achieve those goals, a physics-based building energy model (BEM), which can simulate a building’s energy demand under various weather conditions and operational scenarios, has been developed. To obtain accurate simulation outputs, it is necessary to calibrate some parameters required for the BEM’s pre-configuration. The BEM calibration is usually accomplished by matching the simulated energy use with the measured one. However, even with the best efforts to calibrate the BEM, a systematic discrepancy between the two quantities is often observed, preventing precise estimation of the energy demand. Such discrepancy is referred to as bias in this study. We present a new calibration approach that models the discrepancy to correct the relationship between the simulated and measured energy use. We show that our bias correction can improve predictive performance. Additionally, we observe heterogeneous variance in the electricity loads, especially in the afternoon hours, which often reduces prediction accuracy and increases uncertainty. To address this issue, we incorporate heterogeneous weights into the least squares loss function. To implement the bias-correction procedure with the weighted least squares formulation, we propose a newly devised iteratively reweighted least squares algorithm. The effectiveness of the proposed calibration methodology is evaluated with a real-world dataset collected from a residential building in Texas.
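To illustrate the reweighting idea in isolation (not the presenters' bias-corrected algorithm), the sketch below runs a generic iteratively reweighted least squares loop on simulated data whose variance grows with the covariate: each iteration estimates a crude variance function from squared residuals and refits with inverse-variance weights. The data and the linear variance model are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical hourly data: response variance grows with the covariate,
# mimicking larger load variability in afternoon hours
n = 1000
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 3.0 * x + rng.normal(0, 0.2 + 1.5 * x, n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]          # ordinary least squares start
for _ in range(10):
    resid = y - X @ beta
    # Estimate a simple variance function by regressing squared residuals
    # on the covariate, then reweight by the inverse estimated variance
    var_fit = np.linalg.lstsq(X, resid ** 2, rcond=None)[0]
    w = 1.0 / np.clip(X @ var_fit, 1e-3, None)
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]

print("IRLS coefficients:", beta)
```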

Hannah Van Wyk

Epidemiology

Inferring undetected transmission dynamics prior to a dengue outbreak using a hidden Markov model

Abstract

An infectious disease outbreak investigation typically consists of retrospectively determining information regarding the outbreak such as the timing of the primary case (the first case of the outbreak, whether detected or not) and transmission dynamics that occurred prior to the outbreak. However, information on the primary case is often hard to obtain, especially in scenarios where the disease has a high asymptomatic ratio. In these cases, the outbreak investigation is conducted based on knowledge of the index case, or the first detected case. Around half of dengue infections are asymptomatic, making it unlikely that the primary case is detected. Therefore, individuals with asymptomatic infections can begin a chain of transmission that goes undetected until the outbreak is intractable. We use a 2019 dengue outbreak that occurred in a rural town in Northern Ecuador as a case study to investigate potential undetected transmission dynamics prior to the outbreak. The outbreak was preceded by 4 candidate index cases occurring between 3 months and 2 weeks prior to the outbreak, which began in mid-May. Using a hidden Markov model, we estimate the most likely date of the primary case. We found that the most likely date was highly dependent on the assumed case reporting fraction and that lower reporting fractions had a wider range of possible primary case dates. For higher reporting fractions, the most likely primary case occurred closer to the outbreak (May 6th for the 40% reporting fraction; 95% confidence interval April 12th through May 12th) and for lower reporting fractions, they occurred earlier (April 23rd for the 5% reporting fraction; 95% confidence interval: January 3rd through May 3rd). Of the 4 candidate index cases, the May 2nd case was the most likely. Our modeling approach can be used to retrospectively determine undetected transmission dynamics in other infectious disease outbreaks.

Lillian Rountree

Biostatistics

Is Public Trust a Predictor of COVID outcomes?

Abstract

Background: There is evidence that the level of trust individuals have in government, science and society impacts their compliance with life-saving public health measures like vaccination, particularly during disease outbreaks such as the COVID-19 pandemic. The World Values Survey is an international research program that, since 1981, has collected data on countries’ cultural values, making it a crucial resource for understanding societal levels of trust. However, current analyses connecting the social and cultural data provided by the World Values Survey to COVID-19 outcomes remain simplistic, particularly regarding trust and confidence. Methods: From the data collected in 64 countries on 32 questions from Wave 7 of the World Values Survey, we use dimension reduction and clustering methods to construct composite trust and confidence scores for use in a regression model predicting COVID-19 outcomes, such as vaccination rate and excess deaths per million. We then cluster these countries to identify groups of countries with similar levels (high, low, middling) of trust. In this ecological regression, we also include potential confounders for COVID-19 outcomes, including GDP and other measures of development. We also investigate the longitudinal aspect of the data, studying the effect of past levels of trust and how these levels vary over time. Results: Unadjusted results suggest that higher levels of trust—particularly trust in government, the WHO, and the police—are associated with lower excess deaths per million during the years of the pandemic and higher total levels of COVID-19 vaccination. Further analysis, particularly with the inclusion of confounders, will offer greater insight into the specific aspects of trust that most impact a range of COVID-19 outcomes. Significance: Quantifying the impact of institutional and societal trust on disease outcomes is important for improving public health interventions and mitigating the negative effects of outbreaks. The ecological nature of the analysis, while revealing important patterns, suggests the need for more individually resolved data that can be used in future modeling of transmission dynamics of infectious disease.
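The sketch below mirrors the analysis pipeline described above on synthetic country-level data: standardize the survey items, take the first principal component as a composite trust score, cluster countries by that score, and run an ecological regression with a confounder. The data and variable names are fabricated for illustration and are not World Values Survey data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)

# Hypothetical country-level data: 64 countries x 32 trust/confidence items
n_countries, n_items = 64, 32
items = rng.normal(0, 1, (n_countries, n_items))
gdp = rng.normal(0, 1, n_countries)                        # confounder
outcome = -0.5 * items.mean(axis=1) + 0.3 * gdp + rng.normal(0, 1, n_countries)

# Composite trust score: first principal component of the standardized items
Z = StandardScaler().fit_transform(items)
trust_score = PCA(n_components=1).fit_transform(Z).ravel()

# Group countries by trust level
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(trust_score.reshape(-1, 1))

# Ecological regression of the outcome on the composite score plus a confounder
X = np.column_stack([trust_score, gdp])
reg = LinearRegression().fit(X, outcome)
print("coefficients (trust score, gdp):", reg.coef_)
print("cluster sizes:", np.bincount(groups))
```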

Mingyan Yu

Biostatistics

Joint modeling of longitudinal data and survival outcome via threshold regression to study the association between individual-level biomarker variabilities and survival outcomes

Abstract

Longitudinal biomarker data and health outcomes are regularly collected in numerous epidemiological studies for studying how biomarker trajectories predict health outcomes, which informs possible effective health interventions. Many existing methods that connect longitudinal trajectories with health outcomes focus mainly on mean profiles, treating variabilities as nuisance parameters. However, these variabilities may carry substantial information related to the health outcomes. In this project, we develop a Bayesian joint modeling approach to study the association between the mean trajectories, along with variabilities, in a longitudinal biomarker and survival times. To model the longitudinal biomarker data, we adopt the linear mixed effects model and allow individuals to have their own variabilities. Following that, we model the survival times by incorporating random effects and variabilities from the longitudinal piece as predictors through threshold regression. Threshold regression, also known as the “first-hitting-time model”, is a more general stochastic process approach for modeling time-to-event outcomes that allows for non-proportional hazards. We demonstrate the behavior of the proposed joint model through simulations. We apply the proposed joint model to data from the Study of Women’s Health Across the Nation (SWAN) and reveal that higher mean values and larger variabilities of follicle-stimulating hormone (FSH) are associated with an earlier age at the final menstrual period.

Mukai Wang

Biostatistics

Analysis of Microbiome Differential Abundance by Pooling Tobit Models

Abstract

Motivation: Microbiome differential abundance analysis (DAA) is commonly used to identify microbiome species associated with different disease conditions. Many statistical and computational methods tailored for microbiome metagenomics data have been proposed for DAA. However, controlling FDR while maintaining high statistical power remains challenging. The compositionality and sparseness of metagenomics data are two main challenges for DAA. The identification of a reliable normalization factor and an accurate interpretation of zeros are the key to a robust DAA method. Results: We offer two new perspectives to solving the two challenges. First, we demonstrate a procedure to find a subset of reference microbiome taxa that are not differentially abundant (DA). The procedure is justified based on mathematical relationships between relative abundance and absolute abundance under the assumption that fewer than half of all the taxa are DA. We can find DA taxa based on the count ratio between individual taxa and the sum of reference taxa. Second, we consider the zero counts as left censored and introduce the tobit model for log count ratios between a single taxon and the sum of multiple taxa. We combine these two ideas to propose analysis of microbiome differential abundance by pooling tobit models (ADAPT). Through simulation studies and real data analysis, we show that our method has more consistent control of false discovery rates than competitors while displaying competitive statistical power. Availability and Implementation: The R package ADAPT can be installed from Github at https://github.com/mkbwang/ADAPT. The source codes for simulation studies and real data analysis are available at https://github.com/mkbwang/ADAPT example.
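Since the key building block above is a tobit model for left-censored log count ratios, the sketch below fits a standalone left-censored tobit regression by maximum likelihood on simulated data. It is a generic Python illustration, not the ADAPT R package; the censoring limit, covariate, and true parameters are hypothetical.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(6)

# Hypothetical log count ratios, left-censored at a detection limit c
n = 500
x = rng.normal(0, 1, n)
latent = -1.0 + 0.8 * x + rng.normal(0, 1.0, n)
c = -1.5
y = np.maximum(latent, c)
censored = latent <= c

def neg_loglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * x
    # Uncensored observations contribute a normal density; censored ones
    # contribute the probability of falling at or below the limit
    ll_obs = stats.norm.logpdf(y[~censored], mu[~censored], sigma)
    ll_cen = stats.norm.logcdf((c - mu[censored]) / sigma)
    return -(ll_obs.sum() + ll_cen.sum())

res = optimize.minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print("estimates (intercept, slope, sigma):", res.x[0], res.x[1], np.exp(res.x[2]))
```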

Prayag Chatha

Statistics

Neural Posterior Estimation for Simulation-based Inference in Epidemic Modeling

Abstract

Stochastic epidemic models with individual-based transmission are useful for capturing heterogeneous mixing and infection risks across a population. While simulating an epidemic from these models is straightforward, they tend to exhibit complex dynamics that give rise to intractable likelihoods, making exact Bayesian inference of key transmission parameters difficult. Approximate Bayesian Computation (ABC) leverages simulated copies of data to construct a nonparametric estimate of the true posterior, but ABC can be prohibitively slow for high-dimensional posteriors. In contrast, Neural Posterior Estimation (NPE) involves training a neural network on simulated data to efficiently learn a Gaussian approximation to the posterior. Our work is the first application of NPE to epidemiology, and we hypothesize that neural networks can automatically summarize high-dimensional epidemic data. We compare the NPE with ABC as inference procedures using popular compartmental disease models such as the SIR model. We conclude with a real-world case study of inferring spatially variegated infection risks in a pneumonia outbreak at a long-term acute care hospital.
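For context on the simulation-based inference being compared, here is a minimal rejection-ABC sketch with a discrete-time (chain-binomial) stochastic SIR simulator and hand-picked summary statistics. The simulator, summaries, priors, and acceptance rule are all illustrative assumptions; the abstract's NPE approach and individual-based transmission models are not shown.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_sir(beta, gamma, n=500, i0=5, t_max=60):
    """Discrete-time chain-binomial SIR; returns daily new infections."""
    s, i = n - i0, i0
    new_cases = []
    for _ in range(t_max):
        p_inf = 1.0 - np.exp(-beta * i / n)
        inf = rng.binomial(s, p_inf)
        rec = rng.binomial(i, 1.0 - np.exp(-gamma))
        s, i = s - inf, i + inf - rec
        new_cases.append(inf)
    return np.array(new_cases)

# "Observed" epidemic generated from known parameters
obs = simulate_sir(beta=0.35, gamma=0.15)

def summary(x):
    return np.array([x.sum(), x.argmax(), x.max()])   # size, peak time, peak height

# Rejection ABC: simulate from the prior and keep the draws whose summaries
# are closest to the observed summaries
draws, dists = [], []
for _ in range(5000):
    beta, gamma = rng.uniform(0.05, 1.0), rng.uniform(0.05, 0.5)
    draws.append((beta, gamma))
    dists.append(np.linalg.norm(summary(simulate_sir(beta, gamma)) - summary(obs)))

draws, dists = np.array(draws), np.array(dists)
keep = dists <= np.quantile(dists, 0.02)               # keep the closest 2% of draws
print("approximate posterior mean (beta, gamma):", draws[keep].mean(axis=0))
```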

Timothy Raxworthy

Michigan Program in Survey Methodology

Comparison of School Dropout Proportions for Evaluating Environmental Education Programs in Madagascar

Abstract

This study’s focus is to estimate the probability of a primary school student dropping out across selected schools in the Ifanadiana region of Madagascar. Using data collected in August 2023, we fit a marginal logistic model using generalized estimating equations (GEE) to model our binary outcome of a student dropping out as a function of the number of school years repeated, the environmental education (EE) program for each student, and the student’s age and gender. Our study presents the results of our fixed effect coefficients and the predicted probabilities across all students included in our data. Overall, we found evidence to support that certain EE programs reduce the log odds of a student dropping out for those individuals included in our data. We explain our method for building our model, discuss our interpretation of the results, and make recommendations based on our findings.
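Below is a minimal sketch of a marginal logistic GEE of the kind described, fit with statsmodels on synthetic student-level data clustered by school. The variable names (dropout, years_repeated, ee_program, age, female, school) and the data-generating process are assumptions for illustration, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)

# Hypothetical student-level data clustered within schools
n_schools, n_per = 20, 30
df = pd.DataFrame({
    "school": np.repeat(np.arange(n_schools), n_per),
    "years_repeated": rng.poisson(0.5, n_schools * n_per),
    "ee_program": rng.integers(0, 2, n_schools * n_per),   # 1 = exposed to an EE program
    "age": rng.integers(6, 15, n_schools * n_per),
    "female": rng.integers(0, 2, n_schools * n_per),
})
school_effect = rng.normal(0, 0.5, n_schools)[df["school"]]
logit = (-2.0 + 0.6 * df["years_repeated"] - 0.8 * df["ee_program"]
         + 0.1 * (df["age"] - 10) + school_effect)
df["dropout"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Marginal logistic model with an exchangeable working correlation within schools
model = smf.gee(
    "dropout ~ years_repeated + ee_program + age + female",
    groups="school", data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())
```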

Zheng Li

Biostatistics

VINTAGE: A unified framework integrating gene expression mapping studies with genome-wide association studies for detecting and deciphering gene-trait associations

Abstract

Integrative analysis of genome-wide association studies (GWASs) and gene expression mapping studies has the potential to better elucidate the molecular mechanisms underlying disease etiology. Here, we present VINTAGE, an optimal and unified statistical framework for such integrative analysis that aims to identify genes associated with a trait of interest. VINTAGE unifies the widely applied SKAT and TWAS methods into the same analytic framework, bridged by the local genetic correlation, and includes both methods as special cases; it explicitly models and quantifies the amount of information contributed by the gene expression mapping study, achieves robust power performance across a range of local genetic correlation values between gene expression and the trait, enables testing of the role of gene expression in mediating the gene-trait association, and is computationally fast. We illustrate the benefits of VINTAGE through comprehensive simulations and applications to eighteen complex traits from UK Biobank. In the real data applications, we leveraged eQTL summary statistics from eQTLGen and GWAS summary statistics from UK Biobank. VINTAGE improves the power for detecting gene-trait associations by an average of 8% compared to existing approaches, improves the power for testing a mediation effect of gene expression on trait by an average of 231%, and quantifies the amount of genetic effect on trait that is mediated through gene expression.

Oral Presentation 5: Methods/Theory

March 29th 9:00 AM – 12:00 PM @ West Conference Room

Andrej Leban

Statistics

Approaching an unknown communication system by latent space exploration and causal inference

Abstract

This paper proposes a methodology for discovering meaningful properties in data by exploring the latent space of unsupervised deep generative models. We combine manipulation of individual latent variables to extreme values with methods inspired by causal inference into an approach we call causal disentanglement with extreme values (CDEV) and show that this method yields insights for model interpretability. With this, we can test what properties of unknown data the model encodes as meaningful, using it to glean insight into the communication system of sperm whales (Physeter macrocephalus), one of the most intriguing and understudied animal communication systems. The network architecture used has been shown to learn meaningful representations of speech; here, it is used as a learning mechanism to decipher the properties of another vocal communication system, one for which we have no ground truth. The proposed methodology suggests that sperm whales encode information using the number of clicks in a sequence, the regularity of their timing, and audio properties such as the spectral mean and the acoustic regularity of the sequences. Some of these findings are consistent with existing hypotheses, while others are proposed for the first time. We also argue that our models uncover rules that govern the structure of units in the communication system and apply them while generating innovative data not shown during training. This paper suggests that interpreting the outputs of deep neural networks with causal inference methodology can be a viable strategy for approaching data about which little is known, and presents another case of how deep learning can limit the hypothesis space. Finally, the proposed approach can be extended to other architectures and datasets.

Daniele Bracale

Statistics

Distribution Shift Estimation in Strategic Classification

Abstract

In many prediction problems, the predictive model affects the distribution of the (prediction) target. This phenomenon is known as performativity and is often caused by the strategic behavior of agents in the problem environment. For example, spammers will change the content of their messages to evade spam filters. One of the main barriers to the broader adoption and study of performative prediction in machine learning practice is that practitioners are generally unaware of how their predictions affect the population. To overcome this barrier, we develop methods for learning the distribution map that encodes the long-term impacts of predictive models on the population.

Hu Sun

Statistics

Conformalized Tensor Completion with Riemannian Optimization

Abstract

Tensor completion is a technique that estimates the values of tensor entries where data is missing; applications are commonly seen, for instance, in video in-painting, network analysis, and recommender systems. Despite encouraging progress on tensor completion methodology, uncertainty quantification for tensor completion estimators is lacking in the literature. In this paper, we attempt to fill this gap by utilizing the framework of conformal prediction with weighted exchangeability, under which the problem of quantifying the uncertainty of the tensor completion estimator is converted to estimating the missing probability of each tensor entry. Under the assumption that the missing probability tensor has low tensor-train rank, we propose to estimate the probability tensor from a single tensor instance with fast Riemannian optimization. Theoretical guarantees for the convergence as well as the coverage of the resulting conformal intervals are provided. We validate the efficacy of the proposed framework with both numerical experiments and an application to a global total electron content (TEC) imputation problem.
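For reference, the weighted split-conformal step that such an approach builds on can be sketched as follows. This is a minimal illustration only: the function and argument names are hypothetical, and the observation-probability estimates would come from the low tensor-train rank Riemannian estimation described above, which is not reproduced here.

import numpy as np

def weighted_conformal_interval(cal_scores, cal_prob_obs, test_prob_obs,
                                point_prediction, alpha=0.1):
    # Weighted split-conformal interval for one missing entry (a sketch).
    # cal_scores: nonconformity scores |y - yhat| on held-out observed entries.
    # cal_prob_obs / test_prob_obs: estimated probabilities of being observed.
    # Under weighted exchangeability, calibration entries are reweighted by the
    # likelihood ratio between the missing-entry and observed-entry laws.
    w_cal = (1.0 - cal_prob_obs) / cal_prob_obs
    w_test = (1.0 - test_prob_obs) / test_prob_obs
    p_cal = w_cal / (w_cal.sum() + w_test)            # normalized weights
    order = np.argsort(cal_scores)
    cum = np.cumsum(p_cal[order])
    idx = np.searchsorted(cum, 1.0 - alpha)           # weighted (1 - alpha) quantile;
    q = np.inf if idx >= len(cal_scores) else cal_scores[order][idx]  # unreached mass sits at +inf
    return point_prediction - q, point_prediction + q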

Jiacong Du

Biostatistics

Doubly robust causal inference in high dimension: combining non-probability samples with designed surveys

Abstract

Causal inference on the average treatment effect (ATE) using non-probability samples, such as electronic health records (EHR), presents challenges due to the possibility of sample selection bias and high-dimensional covariates. This introduces the need to consider a selection model in addition to the treatment and outcome models that are typical ingredients of a causal inference problem. We propose a doubly robust (DR) ATE estimator that integrates internal data from a large non-probability sample with an external probability sample from designed surveys, considering possibly high-dimensional confounders and variables that influence selection. We introduce a novel penalized estimating equation for nuisance parameters by minimizing the squared asymptotic bias of the DR estimator. Our approach allows us to make inferences on the ATE in high-dimensional settings by ignoring the variability in estimating nuisance parameters, which is not guaranteed in conventional likelihood approaches due to non-differentiable L1-type penalties. We provide a consistent variance estimator for the DR estimator. Simulation studies demonstrate the double robustness of our DR estimator under misspecification of either the outcome model or the selection and treatment models, as well as the validity of statistical inference under penalized estimation. We apply our method to integrate EHR data from the Michigan Genomics Initiative with an external probability sample.

Kaiwen Hou

Columbia Business School

Constrained Learning for Causal Inference and Semiparametric Statistics

Abstract

A fundamental problem in causal inference is the accurate estimation of the average treatment effect (ATE). Existing methods such as Augmented Inverse Probability Weighting (AIPW) and Targeted Maximum Likelihood Estimation (TMLE) are asymptotically optimal. Although these methods are asymptotically equivalent, they exhibit significant differences in finite-sample performance, numerical stability, and complexity, which raises questions about their relative practical utility. In response, we develop the Constrained Learner (C-Learner), which is a new asymptotically optimal method for estimating the ATE. C-Learner is flexible and conceptually very simple: it directly encodes the condition for asymptotic optimality of the estimator as a constraint for learning outcome models, which are then used in a plug-in estimator for the ATE. C-Learner can thus leverage tools and advances from constrained optimization to learn these outcome models. In practice, we find that C-Learner performs comparably to or better than other asymptotically optimal methods. These attributes collectively position C-Learner as a compelling new tool for researchers and practitioners of causal inference.

Kevin Christian Wibisono

Statistics

Estimation of Non-Randomized Heterogeneous Treatment Effects in the Presence of Unobserved Confounding Variables

Abstract

Non-randomized treatment effect models, in which the treatment assignment depends on some covariates being above or below some threshold, are widely used in fields like econometrics, political science and epidemiology. Treatment effect estimation in such models is generally done using a local approach which only considers observations from a small neighborhood of the threshold. In numerous situations, however, researchers are equally (or more) interested in observations further away from the threshold. Moreover, most methods rely on the assumption that the treatment effect for each observation is the same, which is often unrealistic. In this paper, we present a new method for estimating non-randomized heterogeneous treatment effects which takes into account all observations regardless of their distance from the threshold. We show that our method is capable of estimating the average treatment effect on the treated (ATT) at a parametric rate. We then apply our method to simulated and real data sets and compare our results with those from existing approaches. We conclude this paper with possible extensions of our method.

Meng Hsuan “Rex” Hsieh

Business Economics, Ross

Revisiting the analysis of matched-pair and stratified experiments in the presence of attrition

Abstract

In this paper, we revisit some common recommendations regarding the analysis of matched-pair and stratified experimental designs in the presence of attrition. Our main objective is to clarify a number of well-known claims about the practice of dropping pairs with an attrited unit when analyzing matched-pair designs. Contradictory advice appears in the literature about whether dropping pairs is beneficial or harmful, and stratifying into larger groups has been recommended as a resolution to the issue. To address these claims, we derive the estimands obtained from the difference-in-means estimator in a matched-pair design both when the observations from pairs with an attrited unit are retained and when they are dropped. We find limited evidence to support the claim that dropping pairs helps recover the average treatment effect, but we find that it may help recover a convex-weighted average of conditional average treatment effects. We report similar findings for stratified designs when studying the estimands obtained from a regression of outcomes on treatment with and without strata fixed effects.

Seamus Somerstep

Statistics

Algorithmic fairness in performative policy learning: overcoming conflicting fairness definitions

Abstract

In many prediction problems, the predictive model affects the distribution of the prediction target. This phenomenon is known as performativity, and it is often caused by the behavior of individuals with vested interests in the outcome of the predictive model. Although performativity is generally problematic because it manifests as distribution shifts, we develop algorithmic fairness practices that leverage performativity to achieve stronger group fairness guarantees in social classification problems (compared to what is achievable in non-performative settings). In particular, we leverage the policymaker’s ability to steer the population to remedy inequities in the long term. A crucial benefit of this approach is that it is possible to resolve the incompatibilities between conflicting group fairness definitions.

Shushu Zhang

Statistics

Expected Shortfall Regression via Optimization

Abstract

The expected shortfall (ES) is defined as the average over the tail above (or below) a certain quantile of a response distribution, and it provides a comprehensive summary of the tail distribution. ES regression captures the heterogeneous covariate-response relationship and describes covariate effects on the tails of the response distribution, which is of particular interest in various applications. Based on the critical observation that the superquantile regression of Rockafellar, Royset and Miranda (2014) is not the solution to the ES regression, we propose and validate a novel optimization-based approach to linear ES regression, named the i-Rock approach, which does not require specifying the conditional quantile models. We provide a prototype implementation of the i-Rock approach with initial ES estimators based on binning techniques, and show the consistency and asymptotic normality of the resulting i-Rock estimator. The i-Rock approach achieves heterogeneity-adaptive weights automatically and therefore often offers efficiency gains over other existing linear ES regression approaches in the literature.
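For readers less familiar with the target functional, the display below states the standard definition of the (lower-tail) conditional expected shortfall and the linear model it is typically paired with; the notation ($\tau$, $Q_\tau$, $\beta_\tau$) is mine and only fixes ideas, it does not restate the talk's exact formulation.

\[
Q_\tau(Y \mid X = x) = \inf\{\, y : F_{Y \mid X = x}(y) \ge \tau \,\}, \qquad
\mathrm{ES}_\tau(Y \mid X = x) = \mathbb{E}\bigl[\, Y \mid Y \le Q_\tau(Y \mid X = x),\, X = x \,\bigr],
\]

with the upper-tail version obtained by conditioning on $Y \ge Q_\tau(Y \mid X = x)$ instead. Linear ES regression then posits $\mathrm{ES}_\tau(Y \mid X = x) = x^\top \beta_\tau$ and seeks to estimate $\beta_\tau$.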

Thomas Coons

Mechanical Engineering

Adaptive Covariance Estimation for Multi-fidelity Monte Carlo

Abstract

Multi-fidelity variance-reduction techniques (e.g., multi-fidelity Monte Carlo [1], approximate control variates [2,3], and multilevel BLUEs [4]) have seen considerable attention in recent years, in many cases providing orders-of-magnitude computational savings in estimating statistics of a high-fidelity model. These methods require the covariance matrix across model fidelities, which is usually estimated via pilot sampling or reinforcement-learning techniques [5] in conjunction with the sample covariance formula. Depending on the model ensemble available, this covariance estimation can be costly or inaccurate, leading to suboptimal estimators. Furthermore, most multi-fidelity estimators are not designed with an outer design optimization loop in mind, where covariance information and thus estimator properties may vary substantially from design to design. In this work, we leverage uncertainty information in a parameterization of the covariance matrix to adaptively guide pilot sampling as the outer optimization loop converges. In doing so, the overall multi-fidelity optimization process can converge more efficiently. We demonstrate this through applications to multi-fidelity optimal experimental design.

References:

[1] B. Peherstorfer, K. Willcox, and M. Gunzburger, “Optimal Model Management for Multifidelity Monte Carlo Estimation,” SIAM J. Sci. Comput., vol. 38, no. 5, pp. A3163–A3194, Jan. 2016, doi: 10.1137/15M1046472.

[2] G. F. Bomarito, P. E. Leser, J. E. Warner, and W. P. Leser, “On the optimization of approximate control variates with parametrically defined estimators,” Journal of Computational Physics, vol. 451, p. 110882, Feb. 2022, doi: 10.1016/j.jcp.2021.110882.

[3] A. A. Gorodetsky, G. Geraci, M. S. Eldred, and J. D. Jakeman, “A generalized approximate control variate framework for multifidelity uncertainty quantification,” Journal of Computational Physics, vol. 408, p. 109257, 2020, doi: 10.1016/j.jcp.2020.109257.

[4] D. Schaden and E. Ullmann, “On Multilevel Best Linear Unbiased Estimators,” SIAM/ASA J. Uncertainty Quantification, vol. 8, no. 2, pp. 601–635, Jan. 2020, doi: 10.1137/19M1263534.

[5] Y. Xu, V. Keshavarzzadeh, R. M. Kirby, and A. Narayan, “A Bandit-Learning Approach to Multifidelity Approximation,” SIAM J. Sci. Comput., vol. 44, no. 1, pp. A150–A175, Feb. 2022, doi: 10.1137/21M1408312.

Victor Verma

Statistics

Optimal Extreme Event Prediction in Heavy-Tailed Time Series

Abstract

A problem that arises in many areas is predicting whether a time series will exceed a high threshold. One example is solar flare forecasting, which can be done by predicting when a quantity called the X-ray flux will surpass a threshold. We define a predictor to be optimal if it maximizes the precision, the probability of the event of interest occurring given that an alarm has been raised. We prove that in the general case, the optimal predictor is a ratio of two conditional densities. For several time series models, such as MA($\infty$) and AR($d$) models, we obtain a simple, closed-form expression for the optimal predictor. This leads to new methodology for optimal prediction of extreme events in heavy-tailed time series. We establish the asymptotic optimality of the resulting predictors as the training set size goes to infinity using results on uniform laws of large numbers for empirical processes of ergodic time series. Under the assumption of regularly varying tails, we also obtain theoretical expressions for the asymptotic precisions of the optimal predictors as the extreme-event threshold rises. The performance of the optimal predictors and their approximations is demonstrated with simulation studies and the methodology is applied to solar flare forecasting based on the time series of X-ray fluxes obtained from the GOES satellites.
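One way to see the density-ratio structure of such an optimal predictor, sketched here under an assumed stationarity condition and with precision compared at a fixed alarm rate (notation mine, not necessarily the talk's exact formulation): writing $X$ for the observed past and $\{Y > u\}$ for the extreme event, Bayes' rule gives

\[
\mathbb{P}(Y > u \mid X = x) \;=\; \mathbb{P}(Y > u)\, \frac{f_{X \mid Y > u}(x)}{f_X(x)},
\]

so ranking pasts by this density ratio is equivalent to ranking them by exceedance probability, and a Neyman-Pearson-type argument shows that raising alarms on the top-ranked pasts maximizes precision among predictors with the same alarm rate.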

Oral Presentation 6: Methods/Theory

March 29th 2:00 PM – 4:00 PM @ Amphitheatre

Vinod Raman

Statistics

Revisiting the Learnability of Apple Tasting

Abstract

In online binary classification under apple tasting feedback, the learner only observes the true label if it predicts “1”. We revisit this classical partial-feedback setting, first studied by \cite{helmbold2000apple}, and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to provide a tight quantitative characterization of apple tasting in the agnostic setting, closing an open question posed by \cite{helmbold2000apple}. In addition, we give a new combinatorial parameter, called the Effective width, that tightly quantifies the minimax expected number of mistakes in the realizable setting. As a corollary, we use the Effective width to establish a trichotomy of the minimax expected number of mistakes in the realizable setting: the expected number of mistakes of any learner under apple tasting feedback can be Θ(1), Θ(√T), or Θ(T). This is in contrast to the full-information realizable setting, where only Θ(1) and Θ(T) are possible.

Wenshan Yu

Survey and Data Science

Using Principal Stratification to Detect Mode Effects in a Longitudinal Setting

Abstract

Longitudinal studies serve the purpose of measuring changes over time; however, the validity of such estimates can be threatened when the modes of data collection vary across periods, as different modes can result in different levels of measurement error. This study provides a general framework to accommodate different mixed-mode designs and thus has the potential to support mode comparisons across studies or waves. Borrowing from the causal inference literature, we treat the mode of data collection as the treatment. We employ a potential outcome framework to multiply impute the potential response status of cases if assigned to another mode, along with the associated potential outcomes. After imputation, we construct principal strata based on the observed and the predicted response status of each case to adjust for whether a participant is able to respond via a certain mode when making inference about mode effects. Next, we estimate mode effects within each principal stratum. We then combine these estimates across both the principal strata and the imputed datasets for inference. This analytical strategy is applied to the Health and Retirement Study 2016 and 2018 core surveys.

Yuan Zhong

Biostatistics

Deep kernel learning based Gaussian processes for Bayesian image regression analysis

Abstract

Regression models are widely used in neuroimaging studies to learn complex associations between clinical variables and image data. Gaussian processes (GPs) are among the most popular Bayesian nonparametric methods and are widely used as priors for the unknown functions in those models. However, many existing GP methods require pre-specifying the functional form of the kernel, which often limits flexibility in model fitting and creates computational bottlenecks for large-scale datasets. To address these challenges, we develop a scalable Bayesian kernel learning framework for GP priors in various image regression models. Our approach leverages deep neural networks (DNNs) to perform low-rank approximations of GP kernel functions via spectral decomposition. With Bayesian kernel learning techniques, we achieve improved accuracy in parameter estimation and variable selection in image regression models. We establish large prior support and posterior consistency of the kernel estimation. Through extensive simulations, we demonstrate that our model outperforms other competitive methods. We illustrate the proposed method by analyzing multiple neuroimaging datasets from different medical studies.

Yumeng Wang

Statistics

Post-Selection Inference for Smoothed Quantile Regression

Abstract

Quantile regression is a powerful technique for estimating the conditional quantiles of a response variable, which provides robust estimates for heavy-tailed responses or outliers without assuming a specific parametric distribution. However, modeling conditional quantiles in high-dimensional data and conducting post-selection inference for the quantile effects of selected covariates pose computational and efficiency challenges. The computational challenges arise from the non-differentiable quantile loss function, while the efficiency challenges arise from the data discarded during model selection. To address these challenges, we have developed a new approach for post-selection inference after modeling conditional quantiles with smoothed quantile regression. Our approach is fast to compute, and it no longer discards data during model selection, circumventing the disadvantages of a non-differentiable loss. In addition, we study the asymptotic properties of our pivot and demonstrate the effectiveness of our method in practical data analysis.

Zhiwei Xu

Statistics

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data

Abstract

Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting — achieving a perfect fit to training data with near-random performance on test data — before transitioning (“grokking”) to near-optimal generalization later in training. In this work, we show that both of these phenomena provably occur in two-layer ReLU networks trained by GD on XOR cluster data where a constant fraction of the training labels are flipped. In this setting, we show that after the first step of GD, the network achieves 100% training accuracy, perfectly fitting the noisy labels in the training data, but achieves near-random test accuracy. At a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a “grokking” phenomenon. This provides the first theoretical result of benign overfitting in neural network classification when the data distribution is not linearly separable. Our proofs rely on analyzing the feature learning process under GD, which reveals that the network implements a non-generalizable linear classifier after one step and gradually learns generalizable features in later steps.

Zijin Zhang

Ross

Costly Quantity-vs-Quality Sampling in Newsvendor

Abstract

While data might be readily available in some applications, there are many cases where collecting data is very expensive and time consuming. Previous research has mainly focused on improving decision-making with data, leaving a gap in understanding the optimal quantity and quality of data needed. Our paper studies the data-driven variant of one of the most widely used operations models, the newsvendor model, in which the retailer sells products with uncertain demand. Effective ordering decisions can only be made when the retailer collects insightful data to learn about the demand distribution. However, data are not always accurate, and collecting them is costly, with the cost depending on the sample size (quantity) and sample noise (quality). We aim to understand how the quantity and quality of data impact the retailer’s newsvendor profit and to develop data collection policies that enhance profit margins. In this paper, we provide a novel denoised-SAA approach that minimizes the in-sample newsvendor loss given a noisy dataset, and prove several bounds on the loss in terms of data quantity and quality. Based on our in-depth theoretical analysis, we develop a series of single and adaptive data sampling policies with analytical performance guarantees. In an extensive set of computational experiments, we show that these policies perform well in realistic settings.

Poster Session

March 28th 4:30 PM – 6:30 PM @ Assembly Hall

01

Benjamin Osafo Agyare

Statistics

Flexible Kernel-Based Expectile Regression with Inference for Dependent Data: An Application to Heritability Studies.

Abstract

Conventional regression methods, often based on least squares, predominantly focus on learning the conditional mean function, neglecting the heterogeneity observed in much contemporary data. This heterogeneity, stemming from mean/variance relationships or covariate-dependent error distributions, however, presents opportunities for investigating conditional relationships in the tails of the distribution, especially in domains where extreme outcomes are of interest. For instance, in studying the growth and development of children in relation to nutrition, the distribution of, say, height may differ for extremely malnourished children in a way that is not entirely captured through study of the conditional mean. We propose a framework utilizing expectiles and kernel regression within Reproducing Kernel Hilbert Space to assess the conditional distribution of an outcome of interest based on covariates in a way that accommodates non-linearity, non-additivity, non-independence, and heterogeneous covariate effects. We present a computationally efficient algorithm that computes the entire solution path, spanning from the least to the most regularized models, at specified expectiles using the over-relaxed Alternating Direction Method of Multipliers Algorithm. Furthermore, we demonstrate that rigorous inference is achievable through cross-fitting and (in some settings) tools from robust inference. Illustrating its performance, we conduct extensive numerical studies and apply this approach to a heritability analysis, investigating the relationship between growth curves of anthropometry measures (e.g. height) of children and those of their parents.
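For orientation, the building block here is presumably the asymmetric squared (expectile) loss introduced by Newey and Powell (1987), combined with an RKHS penalty; the display below is only a schematic of such a penalized criterion in my own notation ($\tau$, $\mathcal{H}_K$, $\lambda$), and the paper's actual estimator additionally handles dependence and heterogeneous covariate effects as described above.

\[
\rho_\tau(u) \;=\; \bigl|\tau - \mathbf{1}\{u < 0\}\bigr|\, u^2, \qquad
\hat f_\tau \;=\; \operatorname*{arg\,min}_{f \in \mathcal{H}_K} \; \frac{1}{n}\sum_{i=1}^n \rho_\tau\bigl(y_i - f(x_i)\bigr) \;+\; \lambda\, \|f\|_{\mathcal{H}_K}^2,
\]

where the solution path is traced over $\lambda$ at each expectile level $\tau$.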

02

Carlos Dario Cristiano Botia

SRC-Survey Methodology

Extension of Fay-Herriot models to the estimation of the Quarterly Labor Market Survey (PNAD) in Brazil.

Abstract

This document investigates the use of small-area estimation models to enhance the accuracy of unemployment rate predictions in Brazil from 2012 to 2019. By comparing direct estimations with those derived from the Fay-Herriot (FH) and space-time Fay-Herriot (FHET) models, the study reveals a decline in unemployment rates, especially in the Southeast region of Brazil. Furthermore, it was found that the FHET model provides reduced variability and smoother Coefficients of Variation (CVs), indicating higher precision in unemployment rate estimations. The seminar will conclude with a discussion of potential future research into latent Markov models as a tool for further improving small-area estimations. These findings have significant implications for policymakers and researchers focused on regional economic dynamics analysis.

03

Easton Huch

Statistics

Data Integration Methods for the Analysis of Micro-randomized Trials

Abstract

Existing statistical methods for the analysis of micro-randomized trials (MRTs) are designed to estimate causal excursion effects using data from a single MRT. In practice, however, researchers can often find related MRTs employing similar adaptive interventions, which begs the question: Can we leverage the additional data from these trials to improve statistical efficiency? This project provides an affirmative answer to this question. We develop four related statistical methods that allow researchers to pool data across MRTs, including asymptotic standard errors derived via stacked estimating equations. We also show how to combine our methods via a generalization of precision weighting that allows for correlation between estimates; we show that this method possesses an asymptotic optimality property among linear unbiased meta-estimators. We demonstrate the statistical gains from our methods in simulation and in a case study involving two closely related MRTs in the area of smoking cessation.

04

Eduardo Ochoa Rivera

Statistics

Optimal Thresholding Linear Bandit

Abstract

This project delves into the Thresholding Bandit Problem (TBP) within the context of fixed confidence in stochastic linear bandits. The objective is for the learner to identify the set of arms whose mean rewards are above a given threshold, achieving a predetermined level of precision and confidence, all while minimizing the sample size. Our extension of the framework to the linear bandit case involves establishing a lower bound for sample complexity. Furthermore, we introduce an algorithm and prove that it is asymptotically optimal almost surely and in expectation.

05

Gabriel Patron

Statistics

Structured Background Estimation for Astronomical Images with Deep Generative Models

Abstract

Measuring the flux of astronomical objects in images is a crucial endeavor. A traditional image model treats every pixel as a sum of components, including a foreground term (e.g., stars and galaxies) and a background term; foreground flux estimation is therefore susceptible to background mismodeling. Backgrounds can, however, be complicated: spatially variable, or structured, for example due to the presence of dust filaments and clouds around objects of interest. In this work, we use deep generative models, which excel at image generation and infilling, to approximate the posterior distribution of the background. In particular, we explore conditional variational autoencoders as a model of the background. Our approach, unlike an existing method called Local Pixelwise Infilling (LPI), is global, meaning that it does not require local parameter estimation for every object. It also improves reconstruction performance on both synthetic and real data relative to LPI.

06

Jessica Aldous

Biostatistics

Reconstructing individual patient data from extracted overall and cause-specific survival curves for secondary analysis of published competing risks data in prostate cancer

Abstract

Background: Meta-analysis of competing risks data from published studies can be of interest for studying different time-to-event outcomes. For instance, randomized controlled trials that explore treatments like androgen deprivation therapy often fail to report their impact on other-cause mortality in prostate cancer, despite other-cause mortality being a critical component in informing treatment decisions. Secondary analyses of these trials often rely on inconsistently reported summary statistics and require assumptions about the correlation structure between outcomes. Meta-analysis of individual patient data (IPD) is considered the gold-standard approach, but IPD are often not available. Methods: We present an algorithm to reconstruct IPD for multiple competing events from published Kaplan-Meier and cumulative incidence curves. First, survival and cause-specific mortality curves are matched at pre-specified times. Then, iterative estimation of the number of events on the defined intervals is performed by multiplying the number at risk by a function of the survival probabilities. Iterations continue until the estimated number at risk agrees with the risk table. Results: In a simulation study, we explore our algorithm’s accuracy in reproducing summary statistics and survival curves from the original data. We demonstrate the utility of our algorithm by performing a meta-analysis investigating the impact of androgen deprivation therapy duration on other-cause mortality in prostate cancer patients. Conclusion: Flexible tools like our algorithm extend the utility of published studies and further their contribution to medical research.

07

Jun Chen

Statistics/Math

Enhancing Computational Efficiency in Dimension Reduction for Data under Populations with Expectation/Variance Relationship: Improved GLM-based Outer Product Canonical Gradient (OPCG)

Abstract

Reducing the dimension of the covariate (X) in a manner that preserves the regression relationship with the response (Y) plays a crucial role in the analysis of high-dimensional data. Methods for recovering the multi-index mean relationship when the response is categorical are not widely studied. Here we integrate the Generalized Linear Model (GLM) framework with Sufficient Dimension Reduction (SDR) and build on the Outer Product Canonical Gradient (OPCG) algorithm to achieve dimension reduction and recover the multi-index structure in X. Because the existing OPCG algorithm involves local smoothing estimation at every data point, it is not computationally efficient for large-scale data sets. To reduce this computational burden while retaining statistical accuracy, we estimate the OPCG using “support points” that optimally quantize a multivariate distribution using the notion of energy distance. We show that our approach greatly reduces the computational cost of fitting while incurring only a small loss in estimation accuracy. We illustrate the approach with a study of educational attainment in the Dogon of Mali, where the improved OPCG algorithm not only recovers the main structure of the original data but also finds the central subspace more efficiently.

08

Katherine Ahn

Statistics

Scalable Kernel Inverse Regression for Dimension Reduction Regression

Abstract

Sufficient Dimension Reduction (SDR) is a class of methods that reduces the dimension of high-dimensional covariates (X) while preserving the conditional distribution of the response (Y) given the covariates. A key step of our method is non-parametric estimation of the reduced eigenvector subspace of the moment statistic M = Cov(E[X|Y]). Here we consider computationally efficient kernel-based estimates of M that can accommodate multivariate, longitudinal, and partially observed data. We study the computational and statistical performance of coarse-grained algorithms for estimation of M inspired by optimal design and quadrature theory. Due to the nested moment structure of M, heuristics for bandwidth selection borrowed from other types of nonparametric regression are not always applicable in this setting. Simulation studies suggest that the tuning-free approach works well for dimension reduction. We illustrate the approach through a study of educational attainment in a longitudinal cohort of young people in Mali, West Africa.
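As a computational baseline for the moment statistic, here is the classical slicing estimator of M = Cov(E[X|Y]) for a scalar response (sliced inverse regression); the function name and defaults are illustrative, and the kernel-weighted, coarse-grained estimator described above would replace the hard slices with kernel weights and quadrature-style support points.

import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    # Classical sliced-inverse-regression estimate of M = Cov(E[X | Y]),
    # shown as a slicing-based baseline for a scalar response.
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(cov))   # whitening factor
    Z = (X - mu) @ L                             # standardized covariates
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)     # slice on the ranks of y
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)                  # slice mean of standardized X
        M += (len(idx) / n) * np.outer(m, m)     # weighted covariance of slice means
    evals, evecs = np.linalg.eigh(M)             # eigen-decomposition of M
    B = L @ evecs[:, ::-1][:, :n_dirs]           # top directions, back on the X scale
    return B, evals[::-1]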

09

Kevin Christian

Statistics

On the Role of Unstructured Training Data in Transformers’ In-Context Learning Capabilities

Abstract

Transformers have exhibited impressive in-context learning (ICL) capabilities: they can generate predictions for new query inputs based on sequences of inputs and outputs (i.e., prompts) without parameter updates. Efforts to provide theoretical explanations for the emergence of these abilities have primarily focused on the structured data setting, where input-output pairings in the training data are known. This scenario can enable simplified transformers (e.g., ones comprising a single attention layer without the softmax activation) to achieve notable ICL performance. However, transformers are primarily trained on unstructured data that rarely include such input-output pairings. To better understand how ICL emerges, we propose to study transformers that are trained on unstructured data, namely data that lack prior knowledge of input-output pairings. This new setting elucidates the pivotal role of softmax attention in the robust ICL abilities of transformers, particularly those with a single attention layer. We posit that the significance of the softmax activation partially stems from the equivalence of softmax-based attention models with mixtures of experts, facilitating the implicit inference of input-output pairings in the test prompts. Additionally, a probing analysis reveals where these pairings are learned within the model. While subsequent layers predictably encode more information about these pairings, we find that even the first attention layer contains a significant amount of pairing information.

10

Lingxuan Kong

Biostatistics

Empirical Bayesian modeling framework for semi-competing risks data with application to evaluate health outcomes of kidney transplant

Abstract

End-stage renal disease (ESRD) had increasing prevalence from 2000 to 2019, putting thousands of patients on costly dialysis each month. As a final treatment for patients on long-term dialysis, kidney transplantation performs significantly and substantially better than dialysis in quality of life, and the benefits of transplantation increase over time. However, due to the limited supply of kidney donors and the complex region-level donor allocation system, patients need to stay on dialysis until a donor is available. The uncertainty of kidney donor availability and rough predictions of possible health outcomes make patients hesitate over whether to receive a kidney transplant, which leads to 20% of valuable donor kidneys being wasted and to excess deaths on dialysis. To adequately model the complex associations between health outcomes, treatment availability, and patient characteristics, we propose a new shared-frailty multi-state modeling framework that incorporates multiple levels of random effects and time-varying effects to improve prediction accuracy. Compared to current multi-state models, our model relaxes the Markov assumption when estimating transition probabilities and estimates possible correlations among transition processes. An empirical Bayesian estimation algorithm is proposed to achieve better estimation efficiency and consistency of risk factors’ effects under multiple scenarios. The framework also accounts for regional differences in kidney donor availability and transplant quality, as well as time-varying effects of risk factors. The high prediction accuracy of our framework allows it to provide better guidance to transplant centers and ESRD patients.

11

Matt Raymond

Electrical and Computer Engineering

Joint Optimization of Piecewise Linear Ensembles

Abstract

Despite recent advances in neural networks for structured data (e.g. natural language and images), ensemble methods remain the state-of-the-art for many unstructured or tabular datasets. However, ensemble methods are frequently optimized using greedy or uniform weighting schemes. Furthermore, popular weak learners such as decision trees greedily learn boundaries and partition weights. Unfortunately, such greedy optimization schemes may result in suboptimal solutions, especially for objective functions that include non-trivial regularization terms. In this paper, we propose JOPLEn, a convex framework for the Joint Optimization of Piecewise Linear Ensembles. Given an ensemble of partitions, JOPLEn jointly fits a linear model to each cell in a partition. Furthermore, JOPLEn is easily extended to the multitask setting, allowing all tasks to be jointly optimized. Using proximal gradients, JOPLEn can utilize arbitrary convex penalties, including sparsity-promoting penalties such as the ℓ1-norm and ℓ∞,1 group norm. We investigate the performance of JOPLEn on single-task regression, single-task feature selection, and multitask feature selection. Besides improving regression performance, JOPLEn can easily extend linear multitask feature selection approaches such as Dirty LASSO to the nonlinear setting. We anticipate that JOPLEn will provide a principled method for improving performance and sparsity for many existing ensemble methods, especially those with complex regularization constraints.

12

Soham Bakshi

Statistics

Selective Inference for Time-Varying Effect Moderation

Abstract

The scientific community is increasingly focused on developing data analysis techniques to enhance mobile health interventions. A crucial aspect of this endeavor involves assessing the time-varying causal effect moderators. Effect modification, a scenario where the impact of treatment on outcomes varies based on other covariates, plays a significant role in decision-making processes. When there are tens or hundreds of covariates, it becomes necessary to use the observed data to select a simpler model for effect modification and then make valid statistical inference. To achieve this, the Lasso method is employed for selecting a lower complexity model for effect modification. Compared to a full model consisting of all the covariates, the selected model is much more interpretable. To ensure valid post-selection inference of our models, we take the conditional approach and construct an asymptotically valid pivot that is uniformly distributed when conditioned on the selection event.

13

Tiffany Parise

Electrical and Computer Engineering

Fairness via Robust Machine Learning

Abstract

Machine learning models are increasingly deployed to aid decisions with significant societal impact. Defining and assessing the degree of fairness of these models, therefore, is both important and urgent. One thread of research in Machine Learning (ML) aims to quantify the fairness of ML models using probabilistic metrics. To ascertain the fairness of a given model, many popular fairness metrics measure the difference in predictive power of that model across different subgroups of a population – typically, where one subgroup has historically been marginalized. A separate thread of research aims to construct robust ML models. Intuitively, robustness may be understood as the ability of a model to perform well even in the presence of noisy data. Typically, robust models are trained by intentionally introducing perturbations in the data. Our work aims to connect these two threads of research. We hypothesize that models trained to be robust are naturally more fair than those trained using standard empirical risk minimization. To what extent are fairness and robustness related? Do some notions of fairness and robustness have a stronger correlation than others? We investigate these questions empirically by setting up experiments to measure the relationship between these concepts. To study trade-offs between robustness, fairness, and nominal accuracy, we use a probabilistically robust learning framework (Robey et al., 2022) to train classifiers with varying levels of robustness on real-world datasets. We then use widely-used statistical metrics (Barocas et al., 2019) to evaluate the fairness of these models. Preliminary results indicate that probabilistically robust learning reduces nominal accuracy but increases fairness with respect to the evaluated metrics. The significance of such a trade-off would be the conceptualization of fairness in terms of robustness and the ability to increase model fairness without explicitly optimizing for fairness.

14

Yichao Chen

Statistics

Modeling Hypergraphs Using Non-symmetric Determinantal Point Processes

Abstract

Conventional statistical network modeling typically focuses on interactions between pairs of individuals. However, in many real-world applications, interactions often involve multiple entities. To bridge this gap, we propose a latent space model for hypergraphs based on a non-symmetric determinantal point process (DPP). Unlike existing hypergraph models that are driven solely by either similarity or diversity among nodes, our adjusted non-symmetric DPP structure allows for both repulsive and attractive interactions between nodes, as well as accounting for the popularity of each node. This approach significantly enhances the model’s flexibility. Our model also accommodates various types of hypergraphs without limitations on the cardinality and multiplicity of hyperedges. For parameter estimation, we employ the Adam optimizer in conjunction with maximum likelihood estimation. We establish the consistency and asymptotic normality of these maximum likelihood estimators; the proof is non-trivial due to the unique configuration of the parameter space. Simulation studies support the effectiveness of our method. Moreover, we apply our model to two real-world datasets, demonstrating its practical utility and its ability to provide insightful embeddings.
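For context, the standard L-ensemble form of a DPP over a ground set of nodes is presumably the building block here (notation mine; the specific non-symmetric parameterization is particular to the paper): for a kernel matrix $L$ indexed by the nodes, the probability of observing a hyperedge $S$ is

\[
\mathbb{P}(S) \;=\; \frac{\det(L_S)}{\det(L + I)},
\]

where $L_S$ is the principal submatrix of $L$ indexed by $S$. Allowing $L$ to be non-symmetric (subject to nonnegative principal minors so that the probabilities remain valid) is what admits both repulsive and attractive interactions between nodes.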

15

Yidan Xu

Statistics

Wasserstein Sensitivity Analysis

Abstract

We propose a new sensitivity analysis framework for partial identification of a wide range of causal estimands, including the conditional average treatment effect (CATE) and the average treatment effect (ATE). Rather than imposing pointwise or distributional assumptions on the unobserved confounders, the method constrains the distance between the counterfactual conditional distributions and the observed ones in the p-Wasserstein space. We show that the optimal sensitivity interval corresponds to the unique solution of a Wasserstein Distributionally Robust Optimization (WDRO) problem. The dual form of the problem admits two nested convex optimizations, which can be solved efficiently with only empirical measures. We establish consistency of the estimated bounds and the rate of convergence. Lastly, we demonstrate our method on simulated and real data examples, in comparison with existing sensitivity analysis models.

16

Yilun Zhu

Electrical Engineering and Computer Science

Mixture Proportion Estimation Beyond Irreducibility

Abstract

The task of mixture proportion estimation (MPE) is to estimate the weight of a component distribution in a mixture, given observations from both the component and the mixture. Previous work on MPE adopts the \emph{irreducibility} assumption, which ensures identifiability of the mixture proportion. In this paper, we propose a more general sufficient condition that accommodates several settings of interest where irreducibility does not hold. We further present a resampling-based meta-algorithm that takes any existing MPE algorithm designed to work under irreducibility and adapts it to work under our more general condition. Our approach empirically exhibits improved estimation performance relative to baseline methods and to a recently proposed regrouping-based algorithm.
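For readers new to the problem, the standard MPE setup can be written as follows (notation mine): the observed mixture is

\[
F \;=\; \kappa\, H \;+\; (1 - \kappa)\, G, \qquad \kappa \in [0, 1],
\]

where samples are available from $F$ and from the known component $H$, while $G$ and $\kappa$ are unknown, and the goal is to estimate $\kappa$. The classical irreducibility assumption requires that $G$ itself cannot be decomposed as $\gamma H + (1-\gamma) G'$ for any $\gamma > 0$, which identifies $\kappa$ as the largest weight consistent with the data; the condition proposed in this work relaxes that requirement.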

17

Yue Yu

Statistics

On parameter estimation with Sinkhorn Divergence

Abstract

Without regularization, the generalization error of optimal transport (OT) objects, such as the Wasserstein distance, suffers from the curse of dimensionality: the rate of convergence slows down rapidly with the dimension d, which hinders their utility. In this work, we consider parametric estimation by minimizing the Sinkhorn divergence. We first prove that our estimator achieves √n-consistency and asymptotic normality, enabling the construction of confidence intervals via bootstrap methods. Additionally, we extend our analysis to a two-sample counterpart estimator, proving its √(nm/(n+m))-consistency and asymptotic normality. A key practical advantage of our methodology is its compatibility with existing Generative Adversarial Network (GAN)-based methods, requiring only minimal modifications for implementation in spatio-temporal training scenarios.
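To make the estimation target concrete, here is a minimal numpy sketch of a debiased Sinkhorn divergence between two empirical measures with uniform weights; the function names, squared-Euclidean cost, and fixed iteration count are illustrative choices, and conventions differ on whether the entropic term is included in the cost (this sketch reports the transport cost of the entropic plan).

import numpy as np

def sinkhorn_cost(x, y, eps=0.1, n_iter=300):
    # Entropic OT between uniform empirical measures on point clouds x, y (arrays of shape (n, d), (m, d)).
    n, m = len(x), len(y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost matrix
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                              # Sinkhorn scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                      # entropic transport plan
    return np.sum(P * C)

def sinkhorn_divergence(x, y, eps=0.1):
    # Debiased divergence S_eps(x, y) = OT_eps(x, y) - (OT_eps(x, x) + OT_eps(y, y)) / 2.
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))

# A parametric fit would then choose theta to minimize sinkhorn_divergence(model_samples(theta), data).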

18

Yuxuan Ke

Statistics

Spectral Solar Irradiance Missing Data Imputation with Temporally-Smoothed Matrix Factorization-Based Method

Abstract

The solar spectral irradiance (SSI), which is the solar energy received at the top of the Earth’s atmosphere at a given wavelength, is an important quantity in geophysical research. However, a significant amount of missing data can occur in the measurement process. A typical example is the missingness caused by instrument downtime, when there are no observed data at all. With this specific missing pattern, commonly used imputation methods such as linear interpolation cannot precisely predict the variability, especially when the missing band is wide. Prevalent matrix completion methods such as low-rank pursuit cannot effectively recover these missing bands either, because they ignore temporal smoothness and the solar irradiance’s 11-year cycle driven by the Sun’s periodic magnetic activity. In this project, we propose a matrix factorization-based imputation algorithm called SoftImpute with Projected Auto-regressive regularization (SIPA) that can effectively recover downtime missingness. SIPA consists of two parts: matrix low-rank pursuit and temporal smoothness preservation through an AR-like penalty. We design an efficient alternating algorithm to estimate the AR coefficients and solve for the factorized imputed matrices. A projection in the AR penalty term prevents the downtime missingness from disturbing non-downtime entries. In extensive numerical studies, we show that SIPA effectively imputes downtime missingness and outperforms competing methods.
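For context, the low-rank-pursuit half of such an algorithm is essentially SoftImpute (Mazumder, Hastie, and Tibshirani, 2010), sketched below with illustrative defaults; the AR-type temporal-smoothness penalty and its projection, which are the contribution described above, are not reproduced here.

import numpy as np

def soft_impute(X, observed, lam=1.0, n_iter=100):
    # Plain SoftImpute: alternate a soft-thresholded SVD with re-imposing the
    # observed entries. X may hold arbitrary values where observed is False.
    Z = np.where(observed, X, 0.0)                  # start with zeros at missing entries
    low_rank = Z
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                # soft-threshold the singular values
        low_rank = (U * s) @ Vt                     # nuclear-norm-penalized fit
        Z = np.where(observed, X, low_rank)         # keep data, impute the rest
    return low_rank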

19

Ziming Zhou

Electrical Engineering and Computer Science

Unveiling the Neural Tapestry: Introducing Principal Component Regression to Model Sensitivity, Underfitting, and Overfitting Diagnosis in Handwritten Digit Classification

Abstract

Statistical methods have been widely introduced to unveil the neural tapestry of the neural network prediction process. To eliminate potential multicollinearity and overfitting in existing neural network interpretation methods, this work introduces principal component regression (PCR) to decipher the intricate correlations between neural network performance, feature characteristics, and model structure. Through the application of PCR for feature extraction, our analysis underscores the significant impact of both low- and high-dimensional Principal Component Analysis (PCA) features on neural network performance. Notably, these features play distinct roles, with low-dimensional features primarily influencing the global shape of predictions and high-dimensional features affecting local noise. Furthermore, our investigation unveils a correlation between model underfitting and a more pronounced prediction probability distribution, as indicated by a lower estimated parameter of the symmetric beta distribution. On the other hand, the study identifies a link between model overfitting and the number of significant PCA features within the high-dimensional range. This association suggests a heightened sensitivity of the model to local details and noise, signifying a potential source of overfitting. In summary, our work introduces PCR as a tool to enhance the interpretability of neural network predictions, shedding light on new opportunities to diagnose model sensitivity, underfitting, and overfitting in 2D image predictions.
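For readers unfamiliar with the core tool, principal component regression simply regresses the response on the leading PCA scores of the inputs; a minimal scikit-learn sketch follows, where the component count is an arbitrary placeholder rather than the value used in this work.

from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

def fit_pcr(X, y, n_components=20):
    # Principal component regression: PCA feature extraction followed by
    # ordinary least squares on the retained component scores.
    model = make_pipeline(PCA(n_components=n_components), LinearRegression())
    model.fit(X, y)
    return model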

20

Ziyu Zhou

Statistics

Forecast Phenology in Near-term with Mechanistic and Data-driven Models

Abstract

Predicting phenology, which is the timing of important biological events, is crucial during climate change. Changes in phenological events have significant impacts on ecosystem functioning and human activities, including carbon sequestration, tourism, and agriculture. Diverse models have been used to predict phenology, ranging from process-based to data-driven, each with their pros and cons. While process-based models are informed by and contribute to ecological knowledge, data-driven models can achieve high predictive accuracy at a cost of interpretability. In this study, we compare these two types of models to understand their capabilities in predicting phenology under climate change. We focused on phenology of temperate deciduous forests in the Appalachian and Cumberland Plateau regions from 2000 to 2021. Across 100 randomly selected sites, we extracted a land surface phenology metric, start of the season (SOS), from MODIS Aqua and Terra Vegetation Indices 16-Day L3 Global 500m datasets. We retrieved daily climatic predictors, such as temperature, precipitation, and day length, from the Daymet dataset. We trained three process-based models (Thermal Time (AA), Alternating Time (AT), and Parallel (PA)) and a linear regression model to predict SOS with climatic variables. We evaluated the two types of models from three aspects: 1) root mean square error (RMSE) to evaluate short-term predictive accuracy both in- and out-of-sample, 2) uncertainty of model parameters and their correlation to evaluate parameter stability and identifiability, 3) predictive distribution with simulated climatic variables to evaluate the ability to generate realistic predictions. We showed that linear regression, a simple data-driven model, demonstrated higher out-of-sample accuracy in short-term predictions. Process-based models, although comparable in predictive accuracy, might suffer from parameter identifiability issues. When projecting into the future, linear regression is more likely to generate ecologically unrealistic predictions. Our findings underscore the complexity of phenology modeling and the need for integrative approaches to accurately predict phenology under climate change.
