Fine-tuning heat stress algorithms to optimise global predictions of mass coral bleaching

Increasingly severe marine heatwaves under climate change threaten the persistence of many marine ecosystems. Mass coral bleaching events, caused by periods of anomalously warm sea surface temperatures (SST), have led to catastrophic levels of coral mortality globally. Remotely monitoring and forecasting such biotic responses to heat stress is key for effective marine ecosystem management. The Degree Heating Week (DHW) metric, designed to monitor coral bleaching risk, reflects the duration and intensity of heat stress events, and is computed by accumulating SST anomalies (HotSpot) relative to a stress threshold over a 12-week moving window. Despite significant improvements in the underlying SST datasets, corresponding revisions of the HotSpot threshold and accumulation window are still lacking. Here, we fine-tune the operational DHW algorithm to optimise coral bleaching predictions using the 5km satellite-based SSTs (CoralTemp v3.1) and a global coral bleaching dataset (37,871 observations, National Oceanic and Atmospheric Administration). After developing 234 test DHW algorithms with different combinations of HotSpot threshold and accumulation window, we compared their bleaching-prediction ability using spatiotemporal Bayesian hierarchical models and sensitivity-specificity analyses. Peak DHW performance was reached using HotSpot thresholds less than or equal to Maximum Monthly Mean SST and accumulation windows of 4 – 8 weeks. This new configuration correctly predicted up to an additional 310 bleaching observations compared to the operational DHW algorithm, an improved hit rate of 7.9 %. Given the detrimental impacts of marine heatwaves across ecosystems, heat stress algorithms could also be fine-tuned for other biological systems, improving scientific accuracy, and enabling ecosystem governance.


Introduction 36
Anthropocene marine heatwaves are becoming increasingly intense, more frequent and longer lasting disturbances are essential to assess impact, it is also very costly. Accurate monitoring of ecosystem 44 stress remotely and at scale is therefore crucial for effectively managing marine ecosystems and 45 accurately predicting the impacts of climate change on marine biota. While satellite-based remote 46 monitoring and forecasting programmes have been implemented across various biological contexts, 47 we focus this study specifically on remote monitoring and forecasting of coral bleaching. Coral reefs 48 are highly productive ecosystems that provide habitat to over a million marine species and essential 49 ecosystem services (e.g., coastal protection, food, fisheries and tourism livelihoods) to hundreds of 50 millions of people, estimated to be worth over 350,000 USD/ha/yr globally (Costanza et al. 2014; 51 Ferrario et al. 2014). These ecosystems are increasingly faced with mass coral bleaching and mortality 52 events (Hughes et al. 2017). The process of coral bleaching involves a breakdown in the symbiosis 53 between coral hosts and their endosymbiotic phototrophic algae, and can ultimately lead to full or 54 partial colony mortality (Brown 1997) and sub-lethal effects such as reduced growth (Edmunds 2005). 55 Coral bleaching is a stress response with a variety of triggers (e.g., 2003anomalous temperature, both 56 high and low; anomalous increases in the level of light; anomalous levels of salinity, both high and 57 low; reduction in water quality; and diseases; Skirving et al. 2018 Over the past two decades, the National Oceanic and Atmospheric Administration's (NOAA) Coral 63 Reef Watch (CRW) programme has developed a suite of tools for monitoring coral bleaching risk 64 using satellite-based sea surface temperature (SST) products. Specifically, the Degree Heating Week 65 (DHW) metric is used as an indicator of heat stress levels sufficient to induce coral bleaching. DHW 66 is computed as the accumulation of positive temperature anomalies (HotSpot) above a hypothesised 67 coral bleaching stress temperature (i.e., 1⁰C above the Maximum of Monthly Means SST climatology 68 -MMM) over the previous 12 weeks (Liu et al. 2003; Skirving et al. 2020). The DHW algorithm was 69 designed in the 1990s, and the HotSpot threshold of 1⁰C above MMM and accumulation window of 12 70 weeks were chosen based on field and experimental evidence from Panama and the Caribbean (Glynn 71 and D'Croz 1990; Jokiel and Coles 1990). Reflecting the technological advancements in remote-72 sensing capabilities since then, the SST and DHW products have increased in spatial resolution (50 73 km to 5 km) and temporal resolution (twice weekly to daily) (Liu et al. 2014). Despite these 74 improvements, there has not yet been a corresponding revision of the HotSpot threshold and 75 accumulation window used in the operational DHW algorithm. 76 Alternate DHW algorithms have been applied to evaluate associations between heat stress and coral 77 bleaching, mostly at local or regional scales (Weeks et  This study seeks to offer a potential revision to the operational NOAA DHW metric with a view to 119 improving its ability to predict mass coral bleaching. This will require a suitable methodology that is 120 robust to spatiotemporal correlated uncertainties and runs with reasonable computational speed. Here, 121 we apply an alternative approach to modelling bleaching-environment relationships based on 122 Integrated Nested Laplace Approximation (INLA), which explicitly solves spatial and temporal 123 uncertainties with much greater computational speed than MCMC (Rue et al. 2009). We aim to 124 optimise two DHW algorithm parameters, the HotSpot threshold (from MMM -4 to + 4⁰C) and the 125 accumulation window (from 2 to 52 weeks) to improve coral bleaching predictions globally whilst 126 still addressing the common issue of spatial and temporal dependencies. We achieved this by 127 combining recently developed Bayesian hierarchical modelling techniques using INLA with a 128 streamlined parallel-computing workflow on a high-performance computing cluster called "The 129 Rocket". This allowed hundreds of spatiotemporal INLA models to be run in a short time frame (i.e., 130 hours instead of weeks as would be the case using MCMC). 131

Data & Methods 132
Coral Bleaching Data 133 The optimisation study presented here was based on a global dataset of 37,871 bleaching survey surveys, line-intercept transects, belt transects, quadrats, radius plots, rapid visual assessments (e.g., 137 manta tows), ad hoc estimates, and interviews with stakeholders. Since data were collected by 138 hundreds of observers globally over several decades, data collection protocols for these different 139 general methods are not standardised. 140 The original dataset underwent four layers of filtering a priori to ensure its suitability this for 141 analyses. 1) Data were first filtered for errors. This excluded observations that did not have a recorded 142 month or year, as well as observations in which the coordinates provided did not correspond with a 143 coral reef location (5,562 observations excluded). 2) Data were removed if the survey date fell outside 144 the period of peak thermal exposure for that year. As, for the purpose of this study, we are only 145 interested in coral bleaching that results from thermal stress (i.e., not bleaching due to cold-stress, 146 nutrient enrichment etc.), instances of bleaching that cannot be linked to the period of peak thermal 147 exposure may not accurately reflect the status of heat-induced bleaching for that year and location. 148 We we summarised these as representative minimum and maximum percentages. Then, the absence of 155 ecologically significant bleaching was defined as having a maximum estimation of 10% bleaching or 156 less, while the presence of ecologically significant bleaching was defined as having a minimum 157 estimation of 20% or greater. Observations in which the maximum estimation exceeded 10% while 158 the minimum estimation remained below 20% were filtered out to reduce the chance of 159 misrepresenting bleaching and no-bleaching observations ( Fig. S1) (1,452 observations excluded). 4) 160 Lastly, to account for spatiotemporal patchiness a priori, we only retained years which had greater 161 than 100 independent observations, had a qualitatively even global distribution, and were not 162 temporally isolated (i.e., the proceeding years also needed to meet the previous two criteria). This dataset are associated with mass bleaching events, but despite our filtration process, some bleaching 171 observations will inevitably result from small scale local heat stress and other non-heat related factors.

172
Since the models presented in this study are based solely on large scale accumulated heat stress, the 173 model predictions we present reflect the mechanism of mass coral bleaching which is referred to from 174 here on. window. Each daily HotSpot used in the summation is divided by seven a priori, such that 194 As an example, consider a 12-week window ending on April 1 st for a specific survey location. This We computed a total of 234 test DHW metrics (DHWtest), each with unique combinations of HotSpot 200 thresholds (9 levels, from -4 to + 4⁰C relative to MMM) and accumulation windows (26 levels, from 201 2 to 52 weeks). Unlike the operational metric, HotSpots for DHWtest were calculated relative to the 202 MMM after an adjustment for the specific threshold in question (3). In the operational metric only 203 HotSpots > 1⁰C are accumulated, however, in the test metrics all positive HotSpots are accumulated. 204 Therefore, values of DHWtest are numerically different than DHWop but are conceptually the same (see 205 Figure 6). Time series of daily DHWtest were computed as the accumulation of HotSpots (4), where i 206 is the date, n is the earliest date of the accumulation window, and j is the length of the accumulation 207 window in days minus one, such that 208

Statistical Approach 210
The time unit used in the following models is the calendar year. As coral bleaching is more likely at 211 higher levels of heat stress ( current study since no such instances were present in the dataset. The relative performance of DHW 219 metrics for predicting mass coral bleaching were assessed systematically using the following 220 conceptual framework. 221 1) For each DHW metric, the association with coral bleaching was tested using a spatiotemporal 222 Generalised Linear Model (GLM) with a Bernoulli error structure using INLA. 223 2) Sensitivity-specificity analysis was performed on this GLM to optimise predictions, tally 224 model successes and failures, and provide metrics for model comparisons. 225 3) The first two steps were repeated for all DHWtest metrics and DHWop, resulting in 235 226 separate GLMs and sensitivity-specificity analyses, each run in parallel on separate Intel 227 Xeon E5-2699 processors via the high-performance computing cluster "The Rocket". 228 4) Model comparisons were used to determine the best-performing models and hence the 229 optimal HotSpot threshold and accumulation window for predicting coral bleaching globally 230 using DHWs. 231

Model formulation 232
We have adopted a spatiotemporal Bayesian modelling approach to predict mass coral bleaching 233 based on DHWs using the R-INLA package (http://www.r-inla.org) (Rue et al. 2009). Compared to 234 more commonly used frequentist approaches, Bayesian inference allows uncertainty to be more easily 235 interpreted. Moreover, using R-INLA over other Bayesian tools (e.g., Monte Carlo Markov Chains) 236 provides the opportunity to resolve spatiotemporal correlation explicitly and more rapidly (Rue et al. 237 2009). 238 Observations of mass coral bleaching are often spatiotemporally correlated due to large-scale climatic 239 drivers. While basic linear regressions applied to such data ignore these dependencies and lead to 240 pseudoreplication (Hulbert 1984), R-INLA circumvents these issues. In each time point, spatial 241 dependencies are dealt with by implementing the Matérn correlation across a Gaussian Markov 242 random field (GMRF), essentially a map of spatially correlated uncertainty. This is achieved using 243 stochastic partial differential equations (SPDE) solved on a Delaunay triangulation mesh of the study 244 area. The parameters (Ω) that determine the Matérn correlation are the range (rrange at which 245 spatial correlation diminishes) and error (σ). Weakly informative prior estimates of these parameters 246 (r0 and σ0) are recommended when implementing the Matérn correlation (Fuglstad et al. 2019).

247
Temporal dependencies among these GMRFs are dealt with by imposing a first order autoregressive 248 process (AR1), defined by the AR1 parameter (ρ) (9). This allows for correlation in model residuals 249 through time avoiding pseudoreplication. 250 To test the effect of DHW metrics on coral bleaching, a triangular mesh (Fig. 2) was defined with a 251 maximum triangle edge length of 600 km and a low-resolution convex hull (convex = -0.03) around 252 the study sites to avoid boundary effects (1,790 nodes). This mesh was repeated for each year in the 253 time series (26,400 nodes). The probability of coral bleaching for a given observation (CBt,i) in a 254 given year (t) and location (i) was assumed to follow a Bernoulli distribution (πt,i) using the logit-link 255 function for binary data. Bleaching was modelled as a function of the DHW metric in question (fixed  Deviance Information Criterion -DIC) (Nakagawa and Freckleton 2008). Thus, to address patchiness 287 beyond basic filtering, we performed a simulation test (Fig. S3 & Fig. S4). In summary, patchiness 288 did not have an important effect on estimated model parameters (Fig. S5), validating the broader 289 model comparison methods and results of the main study. Full details are described in the 290 Supplementary Materials. 291

Sensitivity-Specificity Analysis 292
To optimise binary predictions from each Bernoulli GLM, sensitivity-specificity analyses were correctly predicted, was also computed at the optimal cut-off level for each model. 304

Model Comparisons 305
Model comparisons were based on the Bayesian DIC and two key metrics from the sensitivity-306 specificity analysis: AUC and hit rate. DIC is a measure of overall model parsimony (Zuur and Ieno 307 2017), but is based on both the DHW fixed effect and the spatiotemporal random effect. Therefore, 308 AUC and bootstrapped confidence intervals were used as the preferred model comparison metric, as 309 this evaluates the overall performance of a binary classifier relative to a perfect predicting model 310 (Robin et al. 2011), based on the fixed effect only. Hit rate is an additional metric that allows easy 311 interpretation of model success. 312

Model Comparisons 314
For predicting coral bleaching based on DHWtest, we identify (1) a group of worst performing models, 315 (2) a group of better performing models, and (3) a suite of best performing models. (1) Poor GLM 316 performance was associated with DHWtest metrics computed on HotSpot thresholds ≥ MMM + 2⁰C or 317 accumulation windows ≥ 22 weeks. This was evident by low AUC values < 0.7 and high DIC values 318 > 7000 (Fig. 3, right and upper regions).
(3) Finer determination of the best 321 models of this subset was made possible by incorporating sensitivity-specificity uncertainty into 322 model comparisons (Fig. 4, 95% bootstrapped confidence intervals). A performance-optima 323 relationship was apparent between AUC and the HotSpot threshold and accumulation window, 324 whereby peak GLM performance was reached when DHW accumulation windows were 4 -8 weeks 325 (Fig. 4). When DHW accumulation windows were outside this range (2 weeks or ≥ 10 weeks), 326 corresponding AUC was significantly lower than the AUC of the best performing GLMs (Fig. 4, blue  327 shaded region). Notably, of all the GLMs that used the same accumulation window (grey and white 328 band groupings, Fig. 4), those models applying lower HotSpot thresholds performed better in terms of 329 AUC and DIC. The 8-week accumulation window resulted in the best overall fit of AUC and DIC 330 combined (max DIC = 6812). In summary, the suite of best-performing models (in terms of bleaching 331 prediction) applied DHWtest metrics based on HotSpot thresholds ≤ MMM and accumulation windows 332 of 4 -8 weeks.  the best GLM is shown as a blue shaded region. Note the DHWop algorithm is slightly different than 347 the DHWtest algorithm (Equation 1-4). 348

Best Model -Validation 349
The GLM based on the DHWtest metric with HotSpot threshold of MMM + 0⁰C and accumulation 350 window of 8 weeks (DHWtest-0C-8wk), was a representative of the suite of best-performing models. The 351 probability of bleaching output from this model (based on DHWtest-0C-8wk and unmeasured 352 spatiotemporally correlated factors) closely matched the observational bleaching record (Fig. 5A).

353
Both the fixed effect (DHWtest-0C-8wk) and the random effect (spatiotemporal uncertainty) provided 354 important contributions to predictions of coral bleaching (Fig. S7). The sensitivity-specificity analysis 355 reflected the high performance for this model, with an AUC value of 0.783 (Fig. 5B). The range 356 parameter (r) of GMRFs showed that drivers of bleaching other than DHWtest-0C-8wk were spatially 357 correlated up to 697 km (Fig. S6), consistent with the spatial scale of climatic and weather systems. 358 The AR1 parameter (ρ) of 0.62 indicated moderate temporal correlation of uncertainty in predicted 359 coral bleaching (i.e., drivers other than DHWtest-0C-8wk), meaning that the uncertainty in bleaching 360 predictions in one year is affected by that of the previous year by a factor of 0.62 (Fig. S6). This can 361 be seen visually on maps of temporally correlated GMRFs (Fig. S7). Sensitivity-specificity analysis is shown for the same GLM without spatiotemporal uncertainty. 368 Sensitivity is defined as the proportion of correctly classified bleaching observations (true positives), 369 and specificity as the proportion of correctly classified no-bleaching observations (true negatives). 370 Area Under the Curve (AUC) and bootstrapped 95% confidence intervals (shown in brackets) reflect 371 the distance to a perfect predicting model (AUC = 1). 372

Best Model -Understanding Heat Stress 373
Even though lowering the HotSpot threshold and reducing the accumulation window improved 374 predictions of mass coral bleaching (Fig. 3, Fig. 4), the DHWop metric still categorised bleaching 375 observations well. DHWop values were greater for bleaching records than for non-bleaching records 376 (Fig. 6). Of the 517 highest heat stress records (> 95 th percentile: > 9.0⁰C-weeks), 78% were 377 associated with coral bleaching observations, highlighting the importance of heat stress as a proximate 378 cause of coral bleaching. Such levels of heat stress relate to NOAA CRW Bleaching Alert Level 2. 379 However, in comparison to DHWop, the test metric DHWtest-0C-8wk showed a higher distribution of heat 380 stress values overall, but lower extremes values (Fig. 6). This is due to a lower HotSpot threshold and 381 shorter accumulation window, respectively. This was characterised by fewer DHW values of zero (1 382 vs. 27%), a higher mean (5.2 vs. 2.5⁰C-weeks), a higher 95 th percentile (9.9 vs. 9.0⁰C-weeks), but a 383 lower 99 th percentile (11.3 vs. 12.5⁰C-weeks). The number of bleaching observations associated with 384 a heat stress of zero was 6 for DHWtest-0C-8wk and 122 for DHWop. Given that DHWtest-0C-8wk had a 385 lower HotSpot threshold, fewer bleaching observations are associated with heat stress values of zero. 386 In other words, reducing the HotSpot threshold increased our ability to predict coral bleaching 387 associated with weak marine heatwaves. however, there has not yet been a corresponding revision of the HotSpot threshold and accumulation 403 window used in this algorithm. Here, we developed 234 different DHW algorithm variants each with 404 a different HotSpot threshold and accumulation window. We assess the performance of these DHWtest 405 metrics for predicting mass coral bleaching globally. Compared to DHWop, it was possible to improve 406 the coral bleaching hit-rate by up to 7.9% by using different HotSpot thresholds and accumulation 407 windows, equating to an additional 310 correctly predicted bleached reefs out of a total of 3895 (also 408 linked to an increased false negative rate of 3%). Simply reducing the HotSpot threshold to MMM (or 409 < MMM) rather than MMM + 1⁰C, resulted in up to 6.8% increases in hit rate, whilst using an 410 accumulation window of 8 weeks instead of 12 weeks maximised this hit rate. Such improvements 411 were also reflected in model comparison metrics from sensitivity-specificity analyses (increased AUC shifting throughout biotic and abiotic marine systems and that rates of adaptation to future 419 environmental conditions are yet unknown, the concepts addressed in this study likely need to be 420 revisited in the future at semi-regular intervals to ensure that the DHW product remains as accurate as 421 possible. 422

Complexities of coral bleaching 423
Coral bleaching is a stress response whereby photosynthetic algal symbionts are lost from the coral 424 host tissues, resulting in the white coral skeleton becoming progressively more visible (Brown 1997;425 Douglas 2003). Given the complexity of this host-symbiont relationship, survey metrics such as 'coral 426 bleaching extent' provide limited information from which to infer biological causes. Coral Here we have refined the ability of a common heat stress metric to predict mass coral bleaching. 442 Ideally, such an optimisation study would be based on coral bleaching data that relate to only heat 443 stress related mechanisms. By filtering the dataset as described, we did our best to achieve this, 444 however, bleaching observations in the dataset may inevitably have been caused by other biotic or 445 abiotic factors, contributing to the noise in our results. Bleaching observations from surveys may also 446 be subject to other inaccuracies such as the assumption that sampling only part of a reef is 447 representative of the entire reef. Despite these points, the model comparisons performed in this study 448 remain valid as model biases were applied to all models equally. Given these facts, the AUC and hit 449 rate from sensitivity-specificity analyses are unlikely to reflect the absolute accuracy of DHW 450 metrics, but rather allow comparisons of relative accuracy to determine optimal HotSpot thresholds 451 and accumulation windows. The optimisation study presented here was performed on a global coral 452 bleaching dataset. For scientists and practitioners aiming to assess global patterns in coral bleaching, 453 we have shown that bleaching predictions can be improved by computing DHW metrics using an 454 optimal HotSpot threshold of the MMM + 0⁰C and accumulation window of 8 weeks. This is just one example of a region that could benefit from a specific regional DHW optimisation. 475 Notably, the methods applied in this study would be easily adapted to develop such regional DHW 476 products. 477

Future outlook 478
Optimising heat stress metrics for specific purposes could also be useful for other marine systems. rapid response actions to such events, it would guide marine protected area design (i.e., focus on 497 conserving thermal refugia) and inform future projections of marine systems and related policy 498 recommendations. 499

Conclusion 500
The Anthropocene is characterised by shifting baselines of biological communities, loss of 501 biodiversity, and increasingly severe and frequent climatic disturbances. Thus, there is growing need 502 to understand and be able to predict climatic and anthropogenic disturbances on habitats, particularly 503 those that provide key ecosystem services to socioecological systems. Here, we have fine-tuned a 504 commonly used heat stress algorithm to a specific purpose (i.e., predicting mass coral bleaching), and 505 have shown that simple changes (compared to the operational algorithm) can result in a considerable 506 improvement in prediction success. The philosophy behind this optimisation study was to remove 507 prior expectations, run the models, and allow the data to reveal the optimal HotSpot threshold and 508 accumulation window for predicting mass coral bleaching globally. In this case, coral bleaching 509 observations were correctly predicted up to 7.9% more often just by reducing the HotSpot threshold 510 and accumulation window of the DHWtest metric. Broadly, improving bleaching prediction success of 511 the operational DHW metric can support stakeholders and end-users such as coral reef managers, 512 inform the design of MPA networks (e.g., including thermal refugia), and provide more accurate 513 information which can lead to better conservation and restoration decision-making (e.g., shifting 514 valuable coral nurseries during heatwaves, assisting with decisions on when to relocate acquarium-515 grown corals to the reef, etc.). Fine-tuning DHWs also has potential for other specific systems, such 516 as predicting planktonic shifts and associated impacts to higher trophic levels. Increasingly under 517 climate change, marine heatwaves are shaping species populations, biological food webs and even