Monday, February 1, 2021

47. Calculating the monthly anomalies using MRTs

Over the next few posts I intend to return to my investigation of the temperature records in Europe by extending my analysis to EU countries other than Belgium and the Netherlands, which were studied in Post 40 and Post 41 respectively. Central to all these studies is the concept of the temperature anomaly. In many of my previous posts (including Post 4) I have explained how anomalies are calculated. However, because the data for each country or region tend to have different temporal distributions, I have sometimes had to use slightly different methods to determine the anomalies for some countries compared to others. So, as this is the start of a new year, I thought this would be a good time to outline exactly how the anomalies are derived, and what drives the decisions to change the methodology from time to time.

The process I use is broadly similar to that used by the main climate science groups (i.e. NOAA, NASA-GISS, Hadley-CRU and Berkeley Earth). However, there are a number of differences. Given how strongly the analysis process can affect the resulting temperature trend, it seems important to describe explicitly what I do and why I do it, not least so that it can be easily referenced in the future.

When studying climate change it is essential that we are able to compare temperature data from different epochs and different locations. There are essentially two problems here. The first is that not all temperature records are of the same length (in terms of time). The second is that the temperatures from different regions can have massively different mean values and ranges of temperature fluctuation.  

For example, Europe has over 100 temperature records that predate 1850 and have over 1800 months of temperature data. The whole of the Southern Hemisphere has two (Rio de Janeiro and Hobart). 

When it comes to temperature ranges and mean values, there are similar extremes. Station records from near the equator (such as Manaus) can have very high mean temperatures of +27 °C with the monthly mean only varying by about ±1 °C across the seasons (such as they are). In contrast, at the South Pole the mean temperature is about -48 °C and the variation of the monthly mean over the year can be ±15 °C or more.

If all temperature records were of the same length, and if all regions of the planet had the same number of stations, these differences would not matter. But because records have different lengths and are often clustered in certain regions, they do. To see how, consider the following.

i) The mathematical basis of the temperature anomaly

In an ideal world we could calculate the change in temperature just by averaging all the temperatures from all the different stations for a given month. If this average was different from month to month then that would be evidence of climate change. We can represent this mathematically as follows.

If Ti(m) is the mean temperature of station i for month m, and Mi is the mean temperature of station i over all time, then

$$T_i(m) = M_i + \varepsilon_i(m) \qquad (47.1)$$

where εi(m) is the variation of the monthly mean temperature for station i for each different month m (see Post 5 for more explanation of temperature anomalies, weather and climate). The index m has a different integer value for each month of data in the temperature record for that station. So, if the temperature record has 1200 months of data, m will take values from 1 to 1200. 
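As a minimal illustration of Eq. 47.1, the following Python sketch uses made-up temperatures for a single hypothetical station. It computes Mi as the mean over all months in the record and εi(m) as what is left over once Mi is removed. By construction these residuals average to zero over the record, and adding Mi back recovers the original temperatures.

```python
# Hypothetical monthly mean temperatures (°C) for one station, m = 1..6.
temps = [3.0, 4.5, 8.0, 12.5, 16.0, 18.0]

# M_i: the long-term mean of the station over all months in its record.
M = sum(temps) / len(temps)

# eps_i(m): how far each month sits from M_i (Eq. 47.1).
anomalies = [t - M for t in temps]

# Adding M_i back to each eps_i(m) recovers T_i(m).
reconstructed = [M + e for e in anomalies]

print(f"M = {M:.4f}")
print("anomalies:", [round(e, 4) for e in anomalies])
```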

The term εi(m) in Eq. 47.1 is the temperature anomaly. It is the amount by which the temperature in station record i varies from a reference value, usually taken to be the long term mean temperature, Mi. Now consider what happens when we sum the temperatures for month m from all i station records.

$$\sum_i T_i(m) = \sum_i M_i + \sum_i \varepsilon_i(m) \qquad (47.2)$$

If we now calculate the average of each term we get the result

$$\langle T(m) \rangle = \langle M \rangle + \langle \varepsilon(m) \rangle \qquad (47.3)$$

where <T(m)> is the mean of Ti(m) averaged over all stations i that have data for month m, repeated for each month m. Similarly, <ε(m)> is the mean of εi(m) and <M> is the mean of Mi, each averaged over the same set of stations for month m.

In an ideal world where all temperature records have valid data in every month m, the term <M> is just a constant that does not vary with m. It then follows that the change in the mean temperature <T(m)> from one month to the next will be the same as the change in the mean anomaly <ε(m)>. So, in an ideal world (where all temperature records are of the same length), averaging the temperatures from all the station records would give us the climate change over time.

But we don't live in an ideal world, and not all temperature records are of the same length. This means that <M> will generally differ from month to month, depending on which of the temperature records have data for that month and which don't. It will therefore contribute to (and possibly dominate) the temperature trend over time, so we can't use the average of all temperatures <T(m)> to determine the temperature change over time. Instead we have to use the average of the anomalies for each month, <ε(m)>. That in turn means we need a reliable way of calculating the anomalies.
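To make the argument concrete, here is a toy Python sketch with two hypothetical stations, neither of which has any real temperature change. Because the cold station only reports for the second half of the period, <M> changes partway through. As a result, the average raw temperature <T(m)> shows a spurious drop of 5 °C, while the average anomaly <ε(m)> stays flat. The station values are invented purely for illustration.

```python
# Two hypothetical stations with NO real temperature change.
# None marks a month with no data.
warm = [15.0] * 10              # long record, mean 15 °C
cold = [None] * 5 + [5.0] * 5   # short record starting at m = 6, mean 5 °C
records = [warm, cold]

def station_mean(record):
    """M_i: mean of the valid months in one record."""
    valid = [t for t in record if t is not None]
    return sum(valid) / len(valid)

def average_over_stations(series_list, m):
    """Average month m over all stations that have data for it."""
    vals = [s[m] for s in series_list if s[m] is not None]
    return sum(vals) / len(vals) if vals else None

# Anomalies eps_i(m) = T_i(m) - M_i, keeping the gaps as None.
anoms = []
for rec in records:
    M = station_mean(rec)
    anoms.append([None if t is None else t - M for t in rec])

raw_mean = [average_over_stations(records, m) for m in range(10)]
anom_mean = [average_over_stations(anoms, m) for m in range(10)]

print("<T(m)>:  ", raw_mean)   # drops from 15 to 10 when the cold station appears
print("<eps(m)>:", anom_mean)  # stays at 0 throughout
```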

ii) Defining the temperature anomaly

The anomalies are the amount by which the temperatures in a given temperature record deviate from a reference value. That reference value is usually taken to be a mean of the monthly temperature data over a particular time interval. Ideally that time interval should be as long as possible so that it is as accurate as possible. However, there are two problems here. 

The first is that, if the temperature records have a strong trend, either upwards or downwards, and you use different time periods to calculate the reference mean temperatures of the different records, this will distort the mean temperature trend, particularly if the different temperature records have differing amounts of data. So you really need to use the same time period for all temperature records when calculating their reference temperatures. That means that you generally can't use extremely long time periods that include all the data from each record, but instead must use shorter time periods (usually 20 or 30 years) for which as many station records as possible have sufficient data. And if a particular temperature record has no data or insufficient data in that time period (say 1961-1990) when many other records do, then that record will have to be excluded from the calculation of the mean temperature trend. If a significant number of long records are excluded in this way, the calculation can be repeated using other reference periods in order to include them, but that is a subsidiary problem. Essentially you will end up with multiple mean temperature trends, which can then be merged into a single trend through a weighted average based on the number of station records incorporated into each one.
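The final step described above, merging several mean trends into one using the number of contributing stations as weights, can be sketched in Python as follows. This is a minimal sketch, not the exact procedure used in these posts. The function name and sample values are hypothetical. The sketch also assumes the trends have already been put on a common baseline, because anomalies calculated over different reference periods will generally be offset from one another.

```python
def merge_trends(trends, counts):
    """Weighted average of several mean-anomaly series, month by month.

    trends: list of equal-length lists of mean anomalies (°C), one per
            reference period, with None where a series has no value.
    counts: number of station records behind each series, used as weights.
    """
    n_months = len(trends[0])
    merged = []
    for m in range(n_months):
        num = 0.0
        den = 0
        for series, w in zip(trends, counts):
            if series[m] is not None:
                num += w * series[m]
                den += w
        merged.append(num / den if den else None)
    return merged

# Hypothetical example: one trend built from 30 stations and another from
# 10 stations, assumed to share a common baseline.
trend_a = [0.10, 0.20, 0.30, None]
trend_b = [0.30, 0.20, None, 0.50]
merged = merge_trends([trend_a, trend_b], [30, 10])
print([None if v is None else round(v, 3) for v in merged])
```

Months covered by only one series simply take that series' value. Where both series have data, the 30-station trend carries three times the weight of the 10-station one.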

The second problem is that as you move away from the equator, the seasonal variation of the mean temperature increases. As I pointed out above, near the equator the monthly mean temperature may vary by as little as ±1 °C over the year. However, at latitudes of 50°N (i.e. in Europe or North America) it can vary by more than ±10 °C. This means that it is more accurate to calculate a separate mean temperature for each calendar month (so twelve in total), rather than just one overall. Otherwise the anomalies would be dominated by the seasonal cycle rather than by any long-term change. In previous posts I have referred to these mean temperatures as the monthly reference temperatures or MRTs.
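The following Python sketch illustrates the point with a synthetic station. All of its values are invented: a 9 °C mean, a ±8 °C seasonal cycle and a small deterministic "weather" term of up to ±1 °C. Subtracting a single overall mean leaves the whole seasonal cycle in the anomalies. Subtracting twelve MRTs leaves only the weather.

```python
import math
from statistics import pstdev

YEARS = 10

def temp(year, month):
    """Hypothetical monthly mean (°C) for month = 0..11 of a given year."""
    seasonal = 8.0 * math.sin(2 * math.pi * month / 12)
    weather = 0.5 * ((7 * year + 3 * month) % 5 - 2)   # -1.0 .. +1.0
    return 9.0 + seasonal + weather

data = {(y, mo): temp(y, mo) for y in range(YEARS) for mo in range(12)}

# Option 1: one reference value for the whole record.
overall = sum(data.values()) / len(data)
single_anoms = [t - overall for t in data.values()]

# Option 2: twelve monthly reference temperatures (MRTs).
mrt = [sum(data[(y, mo)] for y in range(YEARS)) / YEARS for mo in range(12)]
mrt_anoms = [t - mrt[mo] for (y, mo), t in data.items()]

std_single = pstdev(single_anoms)
std_monthly = pstdev(mrt_anoms)
print(f"spread with one mean: {std_single:.2f} °C")   # dominated by the seasons
print(f"spread with 12 MRTs:  {std_monthly:.2f} °C")  # only the weather remains
```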

iii) Choosing a time-frame for the MRTs

In order to determine the temperature trend for a country or region you need to average the temperature anomalies from all of its temperature records. So you first need to calculate the anomalies for each station record, and to do that you first need to calculate the monthly reference temperatures or MRTs for all twelve months of the year for that record. However, in order to do this with the minimum statistical error you need to consider a number of factors that may influence how you choose your reference period for the MRTs.

The first thing you need to ensure is that the MRTs are calculated over the same time-frame for each temperature record. The reason for this is the same as that set out in (ii) above. If the temperature records are of differing lengths, then calculating the MRTs for each one over a different time-frame will have the same effect as using different values of <M> for each month in Eq. 47.3: it will distort the mean temperature trend.

Next, you want the time-frame you use to calculate the MRTs to be as long as possible so that the MRTs are as accurate as possible. Unfortunately, the majority of temperature records tend to be fairly recent, with less than 40 years of data. So the longer the time-frame, the fewer records will have enough data within it to qualify, and the mean temperature trend will lose accuracy because too few temperature records are included in the averaging process.

So in order to incorporate as much of the available temperature data as possible into the final mean temperature trend you need to reduce the time-frame, but not so much that the MRTs are no longer sufficiently accurate. This is basically a compromise between maximizing the time-frame used for the MRT calculations and maximizing the amount of data that can then be used in the mean temperature trend calculation. The most effective way to do this is to create a frequency histogram over the months m, where the value for each month is the sum of the record lengths of all the stations that have valid data for that month m. So if temperature record i has Li months of data, and δi(m) is a binary function for station i that takes the value 1 if month m in record i has valid data and 0 if it does not, then the monthly data frequency function f(m) will be

$$f(m) = \sum_i L_i \, \delta_i(m) \qquad (47.4)$$
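Eq. 47.4 translates directly into a few lines of Python. In this minimal sketch each record is represented as the set of month indices for which it has valid data. That representation, the function name and the three stations are all hypothetical.

```python
def data_frequency(records, n_months):
    """f(m) = sum_i L_i * delta_i(m)  (Eq. 47.4).

    records:  dict mapping a station name to the set of month indices
              (0 .. n_months-1) for which that station has valid data.
    Returns a list f with one entry per month on the common time axis.
    """
    f = [0] * n_months
    for valid_months in records.values():
        L = len(valid_months)       # L_i: months of data in record i
        for m in valid_months:      # delta_i(m) = 1 only for these months
            f[m] += L
    return f

# Hypothetical stations on a 10-month axis.
records = {
    "long":   set(range(0, 10)),    # 10 months of data
    "recent": set(range(6, 10)),    # 4 months of data
    "gappy":  {2, 3, 5, 7},         # 4 months of data with gaps
}
print(data_frequency(records, 10))  # → [10, 10, 14, 14, 10, 14, 14, 18, 14, 14]
```

Long records add a large weight to every month they cover, so the peak of f(m) falls where long and short records overlap most.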

An example of such a frequency function is shown below in Fig. 47.1 for data from the Netherlands. The peak in the distribution indicates where a time-frame chosen to determine the MRTs is likely to generate the maximum amount of anomaly temperature data for inclusion in the mean temperature trend.



 Fig. 47.1: The amount of temperature data available from the Netherlands when each month is included in the MRT.
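The optimal time-frame can be read from Fig. 47.1 by eye. As a hypothetical alternative, the following Python sketch scans every possible window of fixed width and picks the one with the largest total f. It assumes f(m) has first been summed into yearly totals. The function name and sample numbers are invented, and this is not necessarily how the time-frames in these posts were chosen.

```python
def best_window(years, f_by_year, width=30):
    """Return (first_year, last_year, total) for the `width`-year window
    with the largest summed f, or None if the record is too short."""
    best = None
    for i in range(len(years) - width + 1):
        total = sum(f_by_year[i:i + width])
        if best is None or total > best[2]:
            best = (years[i], years[i + width - 1], total)
    return best

# Hypothetical yearly totals of f with a peak in the mid 1950s,
# using a 3-year window to keep the example small.
years = list(range(1950, 1960))
f_by_year = [1, 2, 3, 5, 8, 9, 7, 4, 2, 1]
print(best_window(years, f_by_year, width=3))
```

A purely data-driven choice like this ignores the other two considerations discussed next: the trend within the window and missing data.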

 

The data in Fig. 47.1 above suggests that the optimal 30-year time-frame that allows the most data to be included in the final trend is likely to be around 1975-2010. In fact the time-frame 1976-2005 was eventually chosen (see Post 41). However, there are two other considerations that need to be taken into account before finally settling on a time-frame. 

Ideally you want the overall temperature trend within your time-frame to be close to zero, as this improves the accuracy of the MRT calculations. Unfortunately that is rarely possible. This is because most temperature data tend to be fairly recent, as illustrated in Fig. 47.1, but recent data also tend to exhibit the greatest warming trend.

The final problem is the issue of missing or incomplete data. Even some of the best temperature records will have several months of missing data within your chosen MRT time-frame. The way to address this is to set minimum thresholds for the number of months of data that need to be present within the MRT time-frame in order for data from that station to be included in the mean temperature trend.

Based on the above conditions I have generally used the following criteria to determine the MRTs.

1) Select a 30-year time-frame in which the most data are available. Failing that, use a 20-year time-frame.

2) For each of the 12 monthly MRTs, only calculate the MRT if at least 40% of the data are available within the time-frame (i.e. 12 out of 30 years). For a 20-year time-frame increase this to 60% (i.e. 12 out of 20 years).

3) If the MRT cannot be determined for any of the 12 months in a given record, then all data for that month of the year from that record is excluded for all years.

iv) Calculating the MRTs and the anomalies

The MRT for each month of a given record is calculated by first determining the time-frame for its calculation as set out above in (iii), and then averaging all the available temperature readings for that month within the time-frame. 
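A minimal Python sketch of this calculation, including criteria 2 and 3 from (iii), is shown below. It is not the exact code behind these posts. The function name and the (year, month) dictionary layout are assumptions. Any time-frame shorter than 30 years is treated with the 60% threshold. Integer arithmetic is used for the threshold test to avoid floating-point edge cases at exactly 12 years.

```python
def compute_mrts(record, start, end):
    """Monthly reference temperatures for one station.

    record: dict mapping (year, month) -> monthly mean temperature (°C),
            month = 1..12; missing months are simply absent.
    start, end: first and last year of the MRT time-frame (inclusive).
    Returns 12 values; None means that calendar month fails the data
    threshold and is excluded from the record for all years.
    """
    n_years = end - start + 1
    # Criterion 2: at least 40% of years for a 30-year frame, 60% otherwise.
    min_pct = 40 if n_years >= 30 else 60
    mrts = []
    for month in range(1, 13):
        vals = [record[(y, month)] for y in range(start, end + 1)
                if (y, month) in record]
        if vals and len(vals) * 100 >= min_pct * n_years:
            mrts.append(sum(vals) / len(vals))   # average within the frame
        else:
            mrts.append(None)                    # criterion 3: drop the month
    return mrts

# Hypothetical record for 1976-2005: February has only 11 years of data,
# March exactly 12, and every other month has all 30.
rec = {}
for y in range(1976, 2006):
    for mo in range(1, 13):
        if mo == 2 and y >= 1976 + 11:
            continue
        if mo == 3 and y >= 1976 + 12:
            continue
        rec[(y, mo)] = float(mo)
mrts = compute_mrts(rec, 1976, 2005)
print(mrts[:3])
```

With 11 years February falls just short of the 12-year minimum and is excluded. March, with exactly 12 years, qualifies.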

To see how this works in practice, consider the temperature record from Volken (Berkeley Earth ID: 92832) in the Netherlands. We have already seen in Fig. 47.1 above that the optimal time-frame for calculating the MRTs in the Netherlands is around 1976-2005. This will be the same for all station records in the region. The mean monthly temperatures for Volken are shown in Fig. 47.2 below, with the data in the MRT reference period (1976-2005) shown in yellow.

 

Fig. 47.2: The raw monthly temperature data for Volken with the data in the MRT time-frame highlighted in yellow.

 

We next find the mean temperature values for each of the twelve months January-December using the yellow data in Fig. 47.2 above. These mean temperatures are listed in Table 47.1 below. These are the MRT values described above.

 

Table 47.1
Month        Mean Temperature or MRT (°C)
January      2.4818
February     2.9020
March        6.0305
April        8.6293
May          12.9136
June         15.5573
July         17.5369
August       17.4261
September    14.3748
October      10.6354
November     6.0696
December     3.4268



The MRT values are then subtracted from the raw data in Fig. 47.2 to yield the anomalies for each month. This is shown in Fig. 47.3 below, where the MRTs are plotted in green (repeated for every year) and the anomalies are in red. It therefore follows that adding the red curve in Fig. 47.3 to the green curve will recreate the original data in Fig. 47.2.

 

Fig. 47.3: The MRT values and the temperature anomalies for Volken.
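The subtraction step can be sketched in Python using the Volken MRTs from Table 47.1. The two raw readings below are hypothetical and are not real Volken data. They are only there to show the arithmetic and the round trip back to the original values. The function name and dictionary layout are also assumptions.

```python
# MRTs for Volken from Table 47.1 (°C), January..December.
MRT = [2.4818, 2.9020, 6.0305, 8.6293, 12.9136, 15.5573,
       17.5369, 17.4261, 14.3748, 10.6354, 6.0696, 3.4268]

def anomalies(record, mrts):
    """Subtract the MRT for the matching calendar month from each reading.

    record maps (year, month) -> temperature (°C), month = 1..12.
    Months whose MRT is None (excluded under criterion 3) are dropped.
    """
    return {(y, mo): t - mrts[mo - 1]
            for (y, mo), t in record.items()
            if mrts[mo - 1] is not None}

# Hypothetical readings (NOT real Volken data), for illustration only.
raw = {(2010, 1): 0.5, (2010, 7): 19.0}
anom = anomalies(raw, MRT)
for key, a in sorted(anom.items()):
    print(key, round(a, 4))

# Adding the MRT back to each anomaly recovers the original reading.
recovered = {(y, mo): a + MRT[mo - 1] for (y, mo), a in anom.items()}
```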


In the next few posts I will look at the temperature data for a number of different countries in Europe. In each case the temperature anomalies for each station will be calculated using the method outlined here. The most significant differences in the method from country to country will be in the choice of MRT time-frame (this will normally be 1961-1990 but may be later or earlier) and its length (normally 30 years, but sometimes 20 years will be required due to a lack of available data). In each case this will be indicated, as will the reasons for the choices.

