Behind the Model: Nowcasting

At a glance

Up-to-date information is essential to public health decision-making. However, reporting delays can pose challenges to determining recent trends. Nowcasting methods address this challenge by adjusting incomplete data based on historical reporting patterns to estimate current trends for disease metrics—improving situational awareness and aiding decision-making.

What is nowcasting?

Public health decision-makers, practitioners, and the general public need up-to-date disease information to guide their decisions. However, disease tracking systems are subject to lags in data reporting—meaning available data may not accurately represent the current situation. It can be challenging to know whether declines in recently reported data are due to true declines or to delays in reporting. Nowcasting addresses this challenge by using models to produce real-time estimates of key metrics based on currently available but incomplete reported data and historical reporting patterns, potentially revealing changes in disease transmission dynamics before they would otherwise be detected. Nowcasting methods can be applied to many different metrics including case counts, emergency department visits, hospitalizations, or deaths.

Using nowcasts to inform decision-making

Nowcasts give decision-makers and the general public timely, real-time estimates of disease surveillance metrics, which may guide resource allocation decisions and inform public health guidance. These metrics can include measures of disease burden and disease incidence, such as hospitalization rates, death rates, and case counts. Nowcast models produce estimates along with bounds of uncertainty to indicate the range of possible values. A common measure of uncertainty used in nowcasts is the prediction interval, which represents the most plausible range of values based on the available data and prior information from other studies.

Example applications

One of CFA's Insight Net sites, epiENGAGE, collaborated with the Massachusetts Department of Public Health (MDPH) to successfully apply nowcasting techniques. MDPH public health experts explored better ways to estimate and communicate the true burden and trajectory of respiratory emergency department (ED) visits captured within the National Syndromic Surveillance Program (NSSP) to enable rapid response and accurate public reporting. Working closely with MDPH, epiENGAGE modeling experts analyzed ED data from past respiratory seasons to estimate baseline reporting completeness for respiratory diseases. The team then created a model that generates a nowcast estimate, with a prediction interval, for respiratory ED visits for the most recent week of the current respiratory season. This robust indicator for the total burden of respiratory ED visits allows decision-makers to evaluate the best interventions to protect public health and deploy a rapid response to respiratory outbreaks. These data are also available on MDPH's public respiratory dashboard.

In addition to providing situational awareness, nowcasted data can also be used as inputs into additional analyses. A common metric derived from nowcasted data is the time-varying reproduction number, Rt. Rt indicates whether an outbreak is growing (Rt > 1), declining (Rt < 1), or not changing (Rt = 1). CFA currently uses nowcasting methods to inform estimates of epidemic trends for COVID-19 and influenza within the United States based on Rt, using NSSP data. Without nowcasting, recent data would almost always appear to be decreasing, because reporting delays leave the most recent data incomplete. The final reported dataset may show trends that are very different from those in the preliminary, incomplete data, but this final dataset might not be available for weeks (Figure 2). One approach to address this challenge could be to drop the most recent days from the analysis; however, this would lead to out-of-date information and estimation of past (rather than present) trends. Nowcasting enabled Rt to serve as a real-time leading indicator for increases in COVID-19 during summer 2024, as highlighted in CFA's collaboration with the New Mexico Department of Public Health.

Nowcasting

Overview

Broadly, nowcasting models produce estimates of a disease metric, such as hospitalizations or ED visits, that account for incomplete reporting of recent events. To do this, they use information about the delay between when an event occurs and when it is reported. This delay can be measured by taking snapshots from disease databases over time to identify both the event date and reporting date from the data (Figure 1). These reporting delays are used to estimate how likely it is for an event to be reported on a given day, given how long it has been since the event occurred. This information is then combined with the observed data, to adjust recent reports to account for events which have not yet been reported. For example, if we know that only 30% of ED visits are reported on the same day as the actual visit, then we can estimate the final number of ED visits for that day by adding the expected 70% that has not yet been reported. However, trends are often not this straightforward, and delays can vary over time and between locations. Nowcasting models can help characterize and account for these variations, and they also provide an estimate of real-time uncertainty in the final reported counts.
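The 30% example above can be written as a simple scaling calculation. The following is a minimal sketch, not the method used by CFA: the function name and all numbers are hypothetical, and it assumes the fraction of events reported by each delay is known exactly and does not change over time. It also produces point estimates only, with no uncertainty.

```python
def scale_by_completeness(observed, completeness):
    """Scale recent counts up by the expected fraction reported so far.

    observed: counts by event date, oldest first, most recent last.
    completeness: expected fraction reported, indexed by delay in days
        (index 0 = same day). Hypothetical values, assumed known and constant.
    """
    nowcast = []
    last = len(observed) - 1
    for i, count in enumerate(observed):
        delay = last - i  # days elapsed since the event date
        # Dates older than the longest tracked delay are treated as complete.
        fraction = completeness[delay] if delay < len(completeness) else 1.0
        nowcast.append(count / fraction)
    return nowcast


# Hypothetical completeness: 30% same day, 60% after one day, 90% after two.
completeness = [0.3, 0.6, 0.9]
observed = [100, 90, 60, 30]  # illustrative counts; they appear to decline
estimates = scale_by_completeness(observed, completeness)
```

In this illustrative case the raw counts fall from 100 to 30, but each scaled estimate is approximately 100, so the apparent decline is entirely explained by reporting delays.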

Diagram illustrating a general nowcasting approach.  Nowcasts produce estimates based on the current reported data and the distribution of historical delays in reporting.
Figure 1. Illustration of a general nowcasting approach. Initially, when recent epidemiological data (in this example, emergency department (ED) visits) are received, the data appear to be trending downward. However, in the coming days, as more reports are received and the data are updated, an increasing trend is observed. This is due to delays in reporting, which nowcasting aims to correct for. Nowcasting approaches do this by estimating reporting delay distributions from snapshots of archived datasets over time, and then using these distributions to correct the most recent estimates.

Diagram illustrating a general nowcasting approach. On the day that the most recent data are received, the trend in ED visits appears to be decreasing. The next day, more reports come in and the data from the past days are updated. The following day, even more reports come in, and the trend now appears to be increasing. Using snapshots of epidemiological datasets over time, we can measure how long it usually takes for reports to be recorded and then estimate a statistical distribution of this delay. Nowcasts produce estimates based on the current reported data and this distribution of historical delays in reporting.

Plot of ED visit date against counts of ED visits diagnosed with COVID-19. One set of points show preliminary data. Another set of points show the complete reported dataset.
Figure 2. Reporting delays can mean recent data are not completely observed, as seen in this time series of emergency department (ED) visits in New Mexico. When plotting visits reported in July, daily incident visits from late June and early July initially appear to decrease due to reporting delays. However, once the data are more complete in September, no decreasing trend in daily incident ED visits is observed during this period. Nowcasts produced using data reported up to July 2 were able to correctly account for these reporting delays and produce accurate estimates of the complete data before they were fully reported.

Data

The type and amount of data required to produce nowcasts depend on the metric being nowcast and the context. However, at a minimum, the following requirements typically apply:

  • A time series for the metric of interest (e.g., hospitalizations or ED visits), including date of event (e.g., hospitalization date or ED visit date)
  • Information on the extent of underreporting, reporting delays and their changes over time and across different locations. A key source of this information is comprehensive archives of the event date and report date for each case, so that we can track how reporting records change over time. Access to archived reporting data is a key component to producing nowcasts (see What challenges do we encounter in producing nowcasts?).
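As an illustration of the second requirement, the sketch below shows how a record-level archive with an event date and a report date for each case could be summarized into an empirical reporting delay distribution. The function names and the records are hypothetical. A real analysis would restrict this calculation to event dates old enough to be fully reported, since including recent dates biases the distribution toward short delays.

```python
from collections import Counter
from datetime import date


def delay_counts(records):
    """Count reports by (event date, delay in days) from a line list."""
    counts = Counter()
    for event_date, report_date in records:
        delay = (report_date - event_date).days
        if delay < 0:
            continue  # skip records reported before the event (likely data errors)
        counts[(event_date, delay)] += 1
    return counts


def delay_distribution(counts, max_delay):
    """Empirical proportion of reports arriving at each delay, 0..max_delay."""
    by_delay = [0] * (max_delay + 1)
    for (_, delay), n in counts.items():
        if delay <= max_delay:
            by_delay[delay] += n
    total = sum(by_delay)
    return [n / total for n in by_delay]


# Hypothetical line list: (event date, report date) for each case.
records = [
    (date(2024, 7, 1), date(2024, 7, 1)),
    (date(2024, 7, 1), date(2024, 7, 2)),
    (date(2024, 7, 1), date(2024, 7, 3)),
    (date(2024, 7, 2), date(2024, 7, 2)),
    (date(2024, 7, 2), date(2024, 7, 4)),
]
counts = delay_counts(records)
pmf = delay_distribution(counts, max_delay=2)
```

Without the report date for each record, neither the counts by delay nor the distribution can be computed, which is why archived reporting data matter so much for nowcasting.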

Underlying Models

Models used to produce nowcasts can be mechanistic or statistical. Mechanistic models explicitly model biological components of the underlying disease system, whereas statistical models aim to describe the statistical relationship between the variables of interest. In either case, these models generally adjust for the lag between infection and observation, incomplete observation of recent infection events, and day-of-week reporting effects, in addition to uncertainty within these adjustments. Currently, the primary application of nowcasting used at CDC is to generate nowcasts as part of estimation of the time-varying reproductive number, Rt. Nowcasting is a key component of this analytical pipeline, as it allows us to estimate Rt in real time rather than excluding recent data due to incomplete reporting. Below are the details of two models used by CDC which incorporate nowcasting: EpiNow2 and a generalized additive model approach.

EpiNow2 is a model and R package used by CDC to produce nowcasts and estimate the time-varying reproductive number, Rt, among other transmission metrics. It uses Bayesian inference to fit the model to data. This method estimates both an unobserved time series of incident infections and Rt from a surveillance dataset (e.g., cases, hospitalizations, deaths, ED visits, or other forms of disease monitoring). There are several key components of the model that adjust for delays and underreporting (the nowcasting component of the model), while allowing for Rt estimation from infection data.

Modeling delays between infection and reporting

Rt reflects transmission on date t. However, in almost all situations, the available data to input into a nowcasting model are not counts of infection, but another more easily observable metric such as hospitalizations, ED visits, or reported cases. In order to infer infections from these data sources, models of the distribution of each delay are required (Figure 3). For example, there is a delay between infection and showing symptoms (the incubation period). A proportion of these symptomatic cases will visit the emergency department (ED) but there is a delay between symptom onset and ED visit. Then, there is another delay between an ED visit and it being reported to disease monitoring systems. Accounting for delays between (unobserved) new infections and (observed) ED visits as well as nowcasting ED visits based on delays in reporting are both essential to producing accurate measures of Rt and epidemic trends.
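The chain of delays described above can be illustrated by passing a series of infections forward through each delay distribution in turn. This is a minimal sketch of the forward direction only: real models work backward from observed ED visits to unobserved infections and carry uncertainty through each step. The function name, the delay distributions, and the proportion of cases visiting the ED are all hypothetical.

```python
def convolve_delay(series, delay_pmf):
    """Expected downstream events given upstream events and a delay distribution.

    delay_pmf[d] is the probability that the delay is exactly d days.
    Events that would fall after the end of the series are dropped.
    """
    out = [0.0] * len(series)
    for t, n in enumerate(series):
        for d, p in enumerate(delay_pmf):
            if t + d < len(out):
                out[t + d] += n * p
    return out


infections = [100, 0, 0, 0, 0, 0]   # illustrative: 100 infections on day 0
incubation_pmf = [0.0, 0.5, 0.5]    # hypothetical infection-to-symptom delay
onset_to_ed_pmf = [0.5, 0.5]        # hypothetical symptom-to-ED-visit delay
ed_fraction = 0.1                   # hypothetical share of cases visiting the ED

onsets = convolve_delay(infections, incubation_pmf)
expected_ed = [ed_fraction * x for x in convolve_delay(onsets, onset_to_ed_pmf)]
```

Here infections on day 0 produce expected ED visits spread across days 1 to 3, which shows why ED visit counts lag behind, and are smoother than, the underlying infections.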

Diagram showing the timeline of infection to reporting for COVID-19.
Figure 3. Diagram showing delays of days to weeks between phases of a person’s COVID-19 illness; an infection takes days to develop symptoms, then days until a possible emergency room visit, then days to weeks of delays before the visit is reported to CDC.

Accounting for incomplete recent data

Because of reporting delays, not all recent infections may yet be captured, so recent data will be "right truncated," or incomplete. To correct for this, the model includes a truncation distribution that adjusts the results. The parameters of this truncation distribution are estimated by adjusting the most recent reported data using the truncation distribution and iteratively comparing the results to archived snapshots of previously reported data.
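To show how archived snapshots carry information about truncation, the sketch below compares counts as they appeared in earlier snapshots with the counts eventually reported. This is a simplified empirical average under stated assumptions, not the Bayesian estimation of a truncation distribution described above. The function name and all counts are hypothetical, and days are integers for simplicity.

```python
def completeness_by_delay(snapshots, final_counts):
    """Average fraction of the final count that was visible at each delay.

    snapshots: {snapshot_day: {event_day: count reported as of snapshot_day}}
    final_counts: {event_day: count once reporting is complete}
    """
    ratios = {}
    for snapshot_day, counts in snapshots.items():
        for event_day, count in counts.items():
            final = final_counts.get(event_day)
            if not final:
                continue  # skip event days with no final count or a final count of zero
            delay = snapshot_day - event_day
            ratios.setdefault(delay, []).append(count / final)
    return {d: sum(r) / len(r) for d, r in sorted(ratios.items())}


# Hypothetical archive: what each daily snapshot showed for each event day.
snapshots = {
    1: {1: 30},
    2: {1: 70, 2: 20},
    3: {1: 100, 2: 50, 3: 9},
}
final_counts = {1: 100, 2: 100, 3: 30}
completeness = completeness_by_delay(snapshots, final_counts)
```

The result maps each delay to the average share of events reported by that delay, which is the quantity needed to scale up the most recent, right-truncated counts.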

Calculating Rt from nowcasted infections over time

The generation interval is the time between someone becoming infected and infecting someone else. While the generation interval for a given transmission event is unknown, the distribution of generation intervals can be estimated from a series of case data or derived from literature. This distribution is used to estimate Rt from inferred nowcasted infections.
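One common way to relate infections, the generation interval distribution, and Rt is to divide new infections on day t by a weighted sum of earlier infections, with weights given by the generation interval distribution. The sketch below uses this ratio as an illustration only. It is an assumption about the general form of the calculation, not the estimation procedure used in EpiNow2, which fits Rt with Bayesian inference and reports uncertainty. The function name and all values are hypothetical.

```python
def naive_rt(infections, generation_pmf):
    """Ratio of new infections to generation-interval-weighted past infections.

    generation_pmf[s - 1] is the probability that the generation interval
    is exactly s days. Returns None where the ratio cannot be computed:
    too little history, or zero weighted past infections.
    """
    rt = []
    for t in range(len(infections)):
        # Weighted count of earlier infections that could be transmitting now.
        pressure = sum(
            infections[t - s] * w
            for s, w in enumerate(generation_pmf, start=1)
            if t - s >= 0
        )
        full_history = t >= len(generation_pmf)
        rt.append(infections[t] / pressure if full_history and pressure > 0 else None)
    return rt


generation_pmf = [0.5, 0.5]        # hypothetical: 1 or 2 days, equally likely
infections = [10, 10, 10, 20, 30]  # illustrative nowcasted infections
rt = naive_rt(infections, generation_pmf)
```

With these illustrative inputs, the ratio is 1 while infections are flat and rises to 2 once infections start growing. If the most recent infection counts were left uncorrected for reporting delays, the same calculation would be biased downward for the latest days.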

In some situations in which models must be implemented quickly, statistical models offer more simplicity and flexibility than mechanistic or semi-mechanistic approaches. Generalized additive models (GAMs) are statistical models that extend generalized linear models; here, the delay distribution is defined by a function fit to the data using statistical smoothing. This approach has been applied to produce nowcasts in the United Kingdom, and CFA is now working with this approach for nowcasting hospitalizations for respiratory infections.

A key feature of this model is that it is hierarchical, meaning that it allows for site-specific variation, which helps control for differing reporting patterns between locations. Hierarchical models can even model effects of sub-groups within a larger group, which means that they can account for differences between specific health facilities or smaller geographic locations. This can be useful when delay distributions and their associated uncertainty vary between health facilities, counties, or states.

What challenges do we encounter in producing nowcasts?

For nowcasting models to be effective, they need to accurately capture trends in delays. However, delays can vary depending on the surveillance metric, disease/pathogen and epidemiological context. Delays may also vary over time and between locations. For example, reporting delays may vary due to resource constraints or changes in reporting policies. Some changes in delays can be easily modeled and are predictable. These include day of week and holiday effects, and differences in reporting between population groups or health facilities. However, other delays may be unpredictable and are challenging to account for within the nowcasting model. These include one-off events such as cyber attacks or software updates that impact data storage and sharing, or events that can impact hospital operations or staffing, such as natural disasters. When hierarchical models are used that estimate location-specific effects, we can reduce the impact of these one-off events by drawing information from correlated locations that do have reliable data.

In addition, the data reporting structure can change over time as the surveillance system and epidemic develop. As a result, nowcasting approaches may need to modify the structure of the historical data to match current data structure (for example, changes to variable names or introductions of new variables). Ensuring accurate modifications requires close collaboration with data collectors and data managers.

Nowcasts often benefit from access to data archived at different time points to understand reporting timelines and characterize lags and delay distributions. However, the infrastructure required to store and archive surveillance data is not always available. For example, many systems overwrite previous records as data become more complete, to save digital storage space. Even if data revisions are archived, they are not always shared publicly. Close collaboration with data collectors, data managers, and data engineers is key to addressing these issues. Despite the challenges, there have been promising developments. As well-maintained data archives are increasingly made available to public health practitioners, researchers, and the public, methods for applied nowcasting in epidemiology will continue to improve and become more widely adopted.