Pre-SLS HSLC Environmental Climatology Update

On October 5, I traveled to Norman, OK and presented an update of my HSLC environmental parameter climatology to Steve Weiss, Dr. Israel Jirak, and Andy Dean from the Storm Prediction Center. They provided feedback on my project and made recommendations for me going forward, including both methodology suggestions and theoretical considerations. One of their main concerns involved the formulations of the SHERB and SHERBE and their lack of a moisture parameter. Given that the parameters were designed for all significant HSLC events, not just HSLC tornadoes, I feel that it is a justifiable exclusion at this time. However, when we begin to focus our efforts on discriminating between significant tornadoes and significant winds, a moisture parameter will be critical.

Andy from the SPC also recently provided me with data for all significant events across the U.S. from 2006 through 2011 in addition to data for all nulls (defined as a severe thunderstorm or tornado warning issued during a convective day in which no severe reports were gathered in the respective CWA) across the U.S. between Oct. 2006 and Dec. 2011. This has provided me with a test dataset for the entire U.S. across all environments, including HSLC.

So far, I have focused on evaluating the performance of the SHERB and SHERBE using the new verification dataset. There are some noteworthy differences between the two datasets and our methods when testing versus when developing the parameter:

  1. For our development null dataset, we used spatial interpolation within GEMPAK to gather archived SPC Mesoanalysis data for the previous hour. In this verification dataset, the SPC provided us with the data for the nearest grid point, which is consistent with the significant reports database. However, previous testing showed this difference to be inconsequential on average. Additional tests will be conducted to verify that this remains the case.
  2. When developing the SHERB and SHERBE, we used only one report per CWA per hour (ORPC), while in testing, we have so far used all significant reports and all nulls. The latter method should reveal whether the parameters are weighted toward widespread significant events or are potentially more useful for picking out isolated significant events. The original method may provide more utility for the SHERB and SHERBE, as I would guess that most other diagnostic parameters should light up when a widespread significant event is expected. However, this may not be the case. If anyone has comments on this, I would be happy to read them. Regardless, we plan to test with the ORPC method as well, to determine whether this choice led to some of the discrepancies we have identified in our results.
  3. With our development dataset, we considered HSLC “events,” in which we went through each event CWA-by-CWA and assessed how many reports met our HSLC criteria. If over half of the reports met the criteria of <= 500 J/kg of SBCAPE and >= 35 kts of 0-6 km shear, the reports for that day were included in our dataset (and then subject to the ORPC filter). In testing, we have used a strict cutoff at our HSLC criteria; in other words, all HSLC reports must be associated with data points consisting of <= 500 J/kg of SBCAPE and >= 35 kts of 0-6 km shear. This was done because we do not have the non-significant severe reports in our new dataset, though the method could be employed just using significant severe reports.
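To make the strict HSLC cutoff and the ORPC filter concrete, here is a minimal Python sketch of both. The record fields (`sbcape`, `shear_0_6km`, `cwa`, `hour`) are hypothetical placeholders for illustration, not the actual schema of the SPC dataset:

```python
def is_hslc(rec, cape_max=500.0, shear_min=35.0):
    """Strict HSLC cutoff: SBCAPE <= 500 J/kg and 0-6 km shear >= 35 kt."""
    return rec["sbcape"] <= cape_max and rec["shear_0_6km"] >= shear_min

def orpc_filter(reports):
    """One report per CWA per hour: keep only the first report seen for
    each (CWA, hour) pair, mirroring the development-dataset method."""
    seen = set()
    kept = []
    for rec in reports:
        key = (rec["cwa"], rec["hour"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept

# Illustrative records only.
reports = [
    {"cwa": "RAH", "hour": 3, "sbcape": 400.0, "shear_0_6km": 40.0},
    {"cwa": "RAH", "hour": 3, "sbcape": 450.0, "shear_0_6km": 38.0},
    {"cwa": "RAH", "hour": 4, "sbcape": 800.0, "shear_0_6km": 45.0},
]
hslc = [r for r in reports if is_hslc(r)]  # drops the 800 J/kg report
unique = orpc_filter(hslc)                 # drops the duplicate hour-3 report
```

Under the development method, the event-level "over half of reports" test would run before the ORPC filter; the strict cutoff above is the simpler per-report test used with the new dataset.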

Given that we received the data just last week, our analyses have been limited. It does appear that, even with the differing methods described above, the SHERB and SHERBE outperform existing composite parameters in our CSTAR domain in discriminating all significant events against nulls (see below). However, they do not perform as well when discriminating just significant tornadoes against nulls. We plan to investigate thoroughly why this is the case and whether it can be attributed to the differences in datasets and methods or to other issues that must be addressed.

Further, we have started to look at regional comparisons of the skill of our parameters against other composite parameters. So far, we have noted that the SHERB and/or SHERBE outperform all other composite parameters in discriminating significant HSLC severe events from nulls across a substantial portion of the U.S. The plot below shows the best-performing parameter for each of our 11 subjectively defined regions. We plan to investigate in more detail why the SHERB and/or SHERBE struggle in some regions following SLS.

Over the next several weeks, we intend to exhaust the potential of the new dataset provided to us by the SPC. First, however, we must identify the primary cause of the differences between the results we are getting with the new dataset and what we found with the development dataset: which of the above possibilities contributes most significantly to these differences, and which results are more representative of the problem we are trying to address? Following this step, we will compile a climatology of HSLC events across the entire U.S., focusing on regional, diurnal, and annual trends. Then, we will determine whether we can further improve the parameters we have developed through modifications to the formulation or alternate combinations of parameters, with a focus on improving the skill in our region. Finally, once we are convinced that our parameter-based work is sufficiently thorough, we will transition into an idealized simulation framework in order to address lingering questions regarding the convective-scale features of HSLC events.

If anyone has any comments, suggestions, or questions, feel free to let me know.


2 Responses to Pre-SLS HSLC Environmental Climatology Update

  1. Jonathan Blaes @ WFO RAH says:

    Hey Keith,

    Thanks for the update. I have a couple of questions.

    I have a question about the null definition used in the development phase – it was defined as a severe thunderstorm or tornado warning issued during a convective day in which no severe reports were gathered in the respective CWA. Does this definition require that the environment is consistent with your defined HSLC environment? Also, if a WFO issues ten warnings and just one verifies in their CWA, will the nine warnings be included as nulls or since there was one event, will there be zero nulls? If so does this keep the overall number of nulls depressed?

    I am not very familiar with the True Skill Statistic; can you comment on why the SHERBE and SHERB have a Gaussian distribution with a peak at around 1 and then tail off quickly while the other parameters decrease at a reduced slope?

    Will the addition of a greater number of significant events in the testing database provide an opportunity to differentiate the potential for isolated versus more widespread cases?

    Thanks, JB

  2. Keith Sherburn says:


    Good questions. I’ll address them in order…

    1) The majority of the nulls in my development dataset (about 90%) were HSLC using the 500 J/kg and 35 kt thresholds. I kept about ten marginal cases (e.g., SBCAPE up to 650 J/kg and 0-6 km shear between 25-35 kts) in order to maintain consistency with our reports dataset, in which we kept HSLC “events,” where over half of the reports for a CWA had to meet our HSLC criteria for the event to be included for that WFO. In other words, I wanted to keep some non-HSLC nulls in order to prevent biasing the original comparisons due to the existing non-HSLC reports.

    2) Yes, the latter: there would be zero warnings from that day included in the nulls.

    3) In terms of depressing the number of nulls, I guess it depends on what a given individual would consider a null. If a null is defined as a false alarm warning, then yes, the method that we’ve used will cut a substantial fraction of nulls out of the dataset. However, considering every false alarm warning rather than doing it our way could have some negative implications. I would guess that oftentimes, verified and unverified warnings lie in close proximity to one another. Given that the archived SPC Mesoanalysis data are hourly and at a 40 km grid spacing, it is conceivable that you could be sampling the same point for a null and an event. Our method ensures that you will not have this issue, and it should paint an overall better picture of when you are likely to get a non-severe convective episode versus when there is a higher potential for significant severe weather.
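A minimal sketch of the CWA-day null-counting rule described above, using hypothetical field names rather than the real dataset's schema:

```python
def count_null_days(cwa_days):
    """Count CWA convective days that qualify as nulls: warnings were
    issued but no severe reports were gathered in that CWA. A day with
    even one verifying report contributes zero nulls, regardless of how
    many individual warnings went unverified."""
    return sum(1 for d in cwa_days
               if d["n_warnings"] > 0 and d["n_reports"] == 0)

# Ten warnings with one verifying report: zero nulls for that CWA-day.
days = [
    {"cwa": "RAH", "n_warnings": 10, "n_reports": 1},
    {"cwa": "GSP", "n_warnings": 4, "n_reports": 0},
]
n_nulls = count_null_days(days)
```

Counting every false-alarm warning instead would amount to dropping the `n_reports == 0` condition and summing `n_warnings` directly, with the double-sampling risk described above.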

    4) There are a couple of reasons for the differing shapes of the TSS curves for the SHERB/SHERBE vs. other parameters. Let me use the SCP as an example. Given its components, the SCP has a wide range of operationally possible values. Meanwhile, its curve is fairly flat regardless of the threshold value, indicating that the skill is approximately uniform across the entire range of thresholds plotted. In other words, there is no one value that clearly discriminates between the HSLC significant events and nulls. Thus, even though the TSS is high, is the parameter really operationally viable? I would argue that it's not, because there is no clear value that separates significant events from nulls.

    On the other hand, take the SHERB for our domain. First, the SHERB’s components limit its realistic range from 0 to ~4 in the most extreme cases. Thus, there is naturally a smaller range than you have with the SCP (which could conceivably range from 0 to 30 or higher). Second, as you mention, it has a fairly clear peak TSS near 1. This tells me, with pretty high confidence, that I should use a value of 1 to discriminate between HSLC significant events and nulls.

    Or, to put it a different way: If you used a SHERB threshold of 1.5, for example, your TSS would be lower because your POD would be lower, though your FAR would be about the same as it is using a threshold of 1. If you used a value of 0.5, your POD would be very high, but your FAR would also rise substantially. With the SCP, your POD and FAR fall at about the same rate as the threshold increases, so your skill remains steady.
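For readers unfamiliar with the True Skill Statistic (TSS = POD - POFD, also known as the Peirce skill score), here is a minimal Python sketch of the threshold sweep behind these curves; the parameter values are purely illustrative:

```python
def tss(event_values, null_values, threshold):
    """True Skill Statistic (POD - POFD) for a simple threshold
    forecast: predict 'significant' when the parameter >= threshold."""
    hits = sum(v >= threshold for v in event_values)
    misses = len(event_values) - hits
    false_alarms = sum(v >= threshold for v in null_values)
    correct_nulls = len(null_values) - false_alarms
    pod = hits / (hits + misses)                          # probability of detection
    pofd = false_alarms / (false_alarms + correct_nulls)  # prob. of false detection
    return pod - pofd

# Illustrative values only: events cluster above 1, nulls below it,
# so skill peaks sharply at a threshold of 1 (SHERB-like behavior).
events = [1.2, 1.5, 0.9, 1.1]
nulls = [0.4, 0.6, 0.8, 1.05]
curve = {t: tss(events, nulls, t) for t in (0.5, 1.0, 1.5)}
```

In this toy example the skill peaks at a threshold of 1 and drops on either side, the behavior described above for the SHERB; an SCP-like parameter would instead return similar TSS values across the whole sweep.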

    5) Yes — with our new dataset, we should be able to test the parameters’ (both the composite parameters and the individual parameters in the relational database) ability at discriminating between less widespread events (say, 5 significant events) and nulls. This could certainly augment our previous analysis, which, given our methods, was primarily designed to more readily identify environments in which *a* significant report was more likely. It did not necessarily attempt to indicate how widespread the threat would be.

    Also, this is not related to any of the questions you asked, but I wanted to note (and I will be editing the post later to address this) that the above analysis with the new dataset included EF1 tornadoes. The EF1 tornadoes were not included in the dataset when we originally developed the SHERB/SHERBE, and we feel that this is likely a major contributing factor to the decrease in skill, particularly when discriminating tornadoes from nulls. When just comparing the *significant* (EF2+) tornadoes against nulls, the SHERB/SHERBE have the highest skill of any composite parameter for our CSTAR region, as they did in our development dataset. Further, the margin between the SHERB/SHERBE and other composite parameters when discriminating all significant events from nulls in our CSTAR region increases when excluding EF1 tornadoes. This raises an additional question that I intend to address in future work: Why do the SHERB/SHERBE struggle with EF1 tornadoes?
