Here are some notes on my attempts to replicate Juckes’ CVM calculations, together with a script.

I can replicate some reconstructions very closely – e.g. Esper and Jones within less than a tenth of a degree of the archived CVM – but other replications, including the Union reconstruction, are not as close. In each case, I checked the closeness of the CVM replication by calculating the correlation to the archived CVM series and the range of discrepancies. There are a few other puzzles listed below, including Juckes’ use of the “unadjusted” Mannian PC1. Maybe Juckes will be prepared to clarify some of the problems that I encountered; but, if not, maybe others can solve the Juckesian conundra. The script contains two functions: juckes.cvm, which calculates CVM reconstructions using a network of proxies and a target period; and verification.stats, which generates calibration r2 and RE and verification r2, RE and CE statistics.
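The script itself is in R, but for readers who want the gist of the CVM step without unpacking it, here is a minimal sketch in Python: standardize each proxy over a common base period, take the simple average, then match mean and variance to the instrumental target. The function and variable names are mine, not Juckes’ (or the script’s):

```python
import numpy as np

def cvm_reconstruction(proxies, target, years, base=(1000, 1980), calib=(1856, 1980)):
    """Sketch of a composite-plus-variance-matching (CVM) reconstruction.

    proxies : (n_years, n_proxies) array of proxy values on `years`
    target  : instrumental series on `years` (only calibration-era values are used)
    """
    in_base = (years >= base[0]) & (years <= base[1])
    in_cal = (years >= calib[0]) & (years <= calib[1])
    # 1. scale: standardize each proxy over the base period
    z = (proxies - proxies[in_base].mean(axis=0)) / proxies[in_base].std(axis=0)
    # 2. average: the simple mean of the standardized proxies (the "composite")
    comp = z.mean(axis=1)
    # 3. re-scale: match mean and variance to the target over the calibration period
    recon = (comp - comp[in_cal].mean()) / comp[in_cal].std()
    return recon * target[in_cal].std() + target[in_cal].mean()
```

By construction, the result has exactly the instrumental mean and variance over the calibration period – which is why closeness of replication turns almost entirely on which proxies went in and in which orientation.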

For nearly all cases, I first did a calibration on 1856-1980 following Juckes and, in addition, did a calibration on 1902-1980 with verification on 1856-1901 following MBH. I presented results last December at AGU on verification statistics for the archived reconstructions, noting the high calibration r2 statistics, negligible verification r2 statistics and calibration DW statistics typically in the danger zone.
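For reference, here is how I mean these statistics – a minimal Python sketch rather than the R script itself (names are mine). RE benchmarks the verification-period errors against the calibration-period mean; CE benchmarks against the verification-period mean:

```python
import numpy as np

def verification_stats(obs_cal, est_cal, obs_ver, est_ver):
    """Calibration/verification r2 plus the RE and CE skill scores.
    Both RE and CE equal 1 for a perfect reconstruction and go negative
    when the reconstruction does worse than the relevant benchmark mean."""
    r2 = lambda a, b: np.corrcoef(a, b)[0, 1] ** 2
    sse = np.sum((obs_ver - est_ver) ** 2)
    return {
        "cal_r2": r2(obs_cal, est_cal),
        "ver_r2": r2(obs_ver, est_ver),
        "RE": 1 - sse / np.sum((obs_ver - obs_cal.mean()) ** 2),
        "CE": 1 - sse / np.sum((obs_ver - obs_ver.mean()) ** 2),
    }
```

Note that CE is always less than or equal to RE whenever the verification mean differs from the calibration mean – which is one reason RE is the Team’s statistic of choice.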

Juckes archived his data in zipped directories. The files are not particularly large and it would be convenient if they were directly readable without having to manually unzip them. You also need to be familiar with the ncdf format to start using the series. These are not big obstacles, but I don’t understand why the series couldn’t have been saved in text files. To save readers the labor of doing these collations, I’ve archived proxy and CVM reconstruction data from Juckes’ website in *.txt files so that interested readers can verify analyses without having to fool around with Juckes’ zip files or download ncdf packages.

**Esper CVM**

The Esper CVM reconstruction is based on only 5 series – of which two are foxtail sites within about one inch of one another. These two series are said to be “independent”, although they obviously aren’t. (In some chronologies, sites as close as these two foxtail sites would be included in the same chronology – e.g. Tornetrask, Yamal.) Both foxtail sites recur in the Union reconstruction, again as “independent” sites. Here I got a very close replication – four 9s correlation and within 0.01 deg. I’m not sure why the replication wasn’t exact – one should be able to get exact replication at this stage. However, the closeness of the replication indicated that the emulation method was correct, i.e. scale on 1000-1980; average; and re-scale on the 1856-1980 target. The Esper CVM, calibrated on 1856-1980, had a high r2 in 1902-1980 and ~0 in 1856-1901. The RE statistic (for 1856-1901, here as later in this note) was 0.11 (0.06 for the 1902-1980 calibration). In my script, I’ve shown results for calibration on 1856-1980 for each case – the results are typically high calibration r2, ~0 verification r2 and RE values a little higher than for calibration on 1902-1980. However, to summarize the discussion, results in the paragraphs below will be based on 1902-1980 calibration, reserving 1856-1901 as a verification period (readers can consult the script for further details).

**Jones CVM**

This uses 6 series, including 3 SH series. Again I got a pretty much exact replication, although I’d like to get an exact one. The calibration r2 was 0.20; the verification r2 was ~0 and the verification RE was -0.60. I’ve discussed some particular problems with this reconstruction – the ad hoc Tornetrask “adjustment”, which is carried into the Union reconstruction, and the questionable 11th century Polar Urals record, also carried into the Union reconstruction.

**Moberg CVM**

Here my emulation had a three 9s correlation, but the scale was somewhat off for unknown reasons, leading to discrepancies of up to 0.05 deg C. Juckes uses 10 of 18 proxies, excluding the Sargasso Sea proxy (which seems like one of the best SST proxies) and the Yakutia/Indigirka proxy – that’s the one that Juckes challenged me to show was a proxy. Once again, a characteristic pattern of statistics – high calibration r2 (0.37), negligible verification r2 (0.0004) and low RE (0.11).

**Hegerl CVM**

Again my emulation had a four 9s correlation, but a slight scale discrepancy led to occasional discrepancies of over 0.12 deg C. This reconstruction only goes to 1960 (because of the divergence problem), is heavily smoothed and is a second-generation cherry-picking of proxies from prior studies. It has the characteristic high calibration r2 (0.37) and negligible verification r2 (0.002), but a highish RE (0.34 for the 1856-1980 calibration and 0.48 for the 1902-1980 calibration). This higher RE doesn’t mean that it’s “right” – merely that, as readers can surely see, the RE statistic is sensitive to tuning of the change in mean in the series.

**The Union CVM**

This is another second-generation exercise in which proxies are selected from prior studies, which themselves already contained a significant element of data mining and data snooping. The use of two foxtail series and the Yamal substitution are symptomatic. Here I got a worse replication than in the cases discussed above: only a 0.95 correlation, with discrepancies ranging as high as 0.20 deg C. This is rather puzzling in view of the other replications. The calibration r2 was 0.49, while the verification r2 was 0.001; the verification RE was 0.22. Update – Jean S observed that Juckes flipped the Chesapeake Mg/Ca series. Re-doing the calculation with a flipped version of this series improves the emulation correlation to 0.98 and reduces the discrepancies a little, but something is still off. Interestingly, the verification RE increased to 0.42 merely by flipping the Chesapeake series. Since there’s a rather well-established physical connection between temperature and Mg/Ca ratios, the flipping seems a little opportunistic.

**Mannian CVMs**

Here we start getting into some replication puzzles. I had some difficulty sorting out these reconstructions. Juckes twitted me for taking a long time to figure out his rms normalization – and indeed that did take a long time, as the methodology was nowhere mentioned in the text and has not previously, to my knowledge, been used in paleoclimate studies. If I have difficulty figuring these things out, I would assume that other readers would as well.

A first complication for CVM analysis is that one does not know from the archive the exact orientation of any PC series as used in Juckes’ CVM reconstructions. For example, the unfixed PC1 (series 20) and the fixed PC1 (series 32) in Juckes’ archive are oppositely oriented. Some series are flipped for the reconstruction. I don’t disagree with flipping PC series when there is a natural interpretation justifying one orientation. To replicate the CVM reconstructions, you have to figure out which orientation Juckes used for each PC series. I ended up doing a reverse engineering (of a type that Jean S and I frequently do with Mannian mysteries) in which I regressed the reconstruction against the candidate proxies and used the sign of each coefficient as an input to the CVM emulation. This worked pretty well for accomplishing an emulation, although the signing process needs to be examined. On an earlier occasion, I asked Juckes to provide exact references for which PC series (including data citations) were used for which reconstruction. Juckes was very unhelpful.
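The reverse-engineering step is simple enough to sketch (this is a schematic of the idea, not my actual script): regress the archived reconstruction on the candidate proxies over their common period and read off the sign of each coefficient as the implied orientation.

```python
import numpy as np

def infer_orientations(recon, candidates):
    """Regress an archived reconstruction on candidate proxy series and
    return the sign (+1/-1) of each regression coefficient, i.e. the
    orientation in which each candidate apparently entered the composite."""
    X = np.column_stack([np.ones(len(recon)), candidates])  # add intercept
    beta, *_ = np.linalg.lstsq(X, recon, rcond=None)
    return np.sign(beta[1:])
```

For a CVM composite of a handful of series, the fit is nearly exact, so the coefficient signs are well determined – which is why the regression correlations reported below are diagnostic of which version of a PC went in.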

**Base Case:** Series #3 appears to be the Mannian CVM version used in Juckes’ spaghetti graph. Here there was a real surprise – although one should never be surprised when the Team is involved. When I reverse-engineered series 3 against all the candidate series, including both fixed and unfixed proxies, I got a much higher correlation using the unfixed PC1 (0.9999922) – one of the best correlations that I got, combined with a close range. So it appears highly likely that the Mannian CVM was constructed with the unfixed PC1. For this reconstruction, the calibration r2 is 0.29; the verification r2 is 0.008; the RE is a surprisingly low 0.002. (Jean S has recently noticed that the MBH99 reconstruction has a low RE statistic.) Update: Juckes does say that the spaghetti graph uses the unadjusted PC1. In MM05 (EE), as discussed in another thread, we observed that the key issue in the MBH99 North American network was simply the dominance of bristlecones through having so many of them in the longer data set, rather than the artificial dominance in MBH98 through the mathematical artifice of the Mannian PC methods.

**French Extrapolation**: This is a totally irrelevant and inconsequential variation. Here Juckes considers the impact of extrapolating a French series. This variation uses the unfixed PC1 plus extension by persistence of this one series. Results are very similar to the Base Case, both in terms of closeness of replication and verification statistics. The French series is really indistinguishable from adding a series of white noise in these reconstructions.

**The “Adjusted PC1”** – The Juckes SI stated that the “pc” reconstruction used the “unadjusted first proxy PC”. This does not appear to be what was done. The reverse engineering through regression of the reconstruction on the candidate proxies indicates that the adjusted PC1 was used. However, there’s a very interesting twist. In this reconstruction, Juckes used the adjusted PC1 in its inverse (downward-pointing) orientation. With this particular combination, I get a correlation of 0.996 and a range of about 0.05 deg – which is about what I’ve been able to get for the other of Juckes’ CVM versions of Mann. So I feel confident that this is what Juckes did. The calibration r2 is low (0.13); the verification r2 is its usual ~0 and the RE is negative. While these statistics aren’t very good, they aren’t much different from those for the Jones reconstruction.

As a caveat: I do not accept that the Mannian “adjustment” to the PC1 is “correct” in any sense as this adjustment is a can of worms which (somewhat surprisingly) I’ve not discussed on the blog. I will try to do so as Jean S is interested in this issue and others should be.

**Mannian PCs – Juckes Edition:** Juckes re-calculated PCs on a slightly reduced version of the MBH AD1000 network and reported reconstructions for several cases. The first variation (suffix “mbh”) uses re-calculated Mannian PCs (despite this method being condemned by all parties), using 25 proxies instead of the 27 in the MBH network. The two exclusions are two bristlecone series which end in the 1970s – Methuselah Walk (lower border) and Upper Timber Creek. Inconsistently, Juckes uses the Methuselah Walk series (ca535) in its upside-down version in one of the two versions used in Moberg (Moberg inadvertently used two versions from the same site). These PCs are rather similar to Mannian PCs – Juckes uses an “unfixed” version of the PC1. This version has a calibration r2 of 0.28, a verification r2 of 0.008 and an RE of -0.001.

**Mannian PCs – Juckes Variation** – The “mbhx” reconstruction results from another pointless variation. Here Juckes uses Mannian PCs with a calibration of 125 years instead of 79 years. I would presume that the bias would be less in such circumstances. Since the methodology is condemned, I see little purpose in analyzing this variation of the condemned methodology. The variation has virtually identical statistical performance to the prior case: calibration r2 of 0.30, verification r2 of 0.007, verification RE of 0.001.

**Correlation PCs on slightly reduced network (“std”)** – The “std” reconstruction uses the archived “std” PCs, which are equivalent to correlation PCs (the chronologies divided by their standard deviation). Juckes uses the slightly reduced network of 25 series. In Juckes’ AD1400 network, there are versions for the network without closing extrapolations and for the network with closing extrapolations. Although the formalism is carried over to the AD1000 network, PC series 16-20 are *identical* to PC series 1-15 and do not show results with the expanded network with closing fills. The purpose of having both versions on file is unclear. In our EE article, we observed that the emphasis on bristlecones in the AD1000 network arises simply from longevity, without a mathematical artifice as in MBH98, so one wouldn’t expect big differences between versions, but there is some difference. Statistics are similar – the calibration r2 for the CVM was 0.30; the verification r2 improved substantially – from 0.001 to 0.1 – and the verification RE improved slightly to a still-low 0.04.

**Covariance PCs on reduced network (“cen”)** – Again I got a decent replication of the “cen” series by using the “cen” PCs without “adjustment” other than flipping PC series. Results were similar – calibration r2 of 0.27; verification r2 a very “strong” 0.016 and RE of 0.012.

So where does that leave us? First, there are some puzzling replication anomalies, most of which should be resolvable. The most serious questions are whether the Mannian CVM in the spaghetti graph uses the unfixed PC1 and whether the “pc” variation unintentionally used an inverted PC1. The more serious issue pertains to the handling of verification statistics. Obviously there’s been lots of publicity and discussion of the failure of the verification r2 statistic. Mann told the NAS panel that he did not calculate the verification r2 statistic as that would be a “foolish and incorrect thing” to do (although there is indisputable evidence that he did calculate it). Wahl and Ammann argued that correlation was not a relevant statistic for the low-frequency variability of interest to paleoclimate and advocated exclusive use of the RE statistic (a statistic curiously not reported by Juckes). Why didn’t Juckes report verification r2 statistics? Did Juckes, like Mann, calculate verification r2 statistics, not like the results and not report them? Or was there a little data snooping even in the calculation of verification statistics – with Juckes attempting to avoid an unpleasant answer by not calculating the statistic?

## 34 Comments

Link to AGU says http://www.climateaudit.org/http/www.climateaudit.org/pdf/agu05.ppt when it should say http://data.climateaudit.org/pdf/agu05.ppt

Sorry, Steve, I’m confused. Is the “fixed” PC1 the same as the “adjusted” PC1? And is this “adjustment” the purported correction for CO2 fertilisation in MBH99?

#2. yes. I really should write up this adjustment sometime. It’s good for a chuckle.

Steve,

If Dr. Juckes has archived his code, why don’t you analyze that to answer some of these questions? Unless it’s the typical academia uncommented mess, then I understand why it’s just easier to reverse engineer it.

Thanks, Steve! This is exactly the reason why I haven’t played with Juckes’ data yet. Now I will 🙂

Steve, I wish you would put up a post on this – in fact, I’ve asked you to do so several times. The explanation in MBH99 is mysterious to me, and it would be interesting to see just what they did. More importantly, however, the HT argue that the bcp growth spurt was “handled” in MBH99 with this adjustment, and a rebuttal, or even a discussion of the issue is a missing brick in the wall at CA.

Jean S: You can also use this script I just wrote to download & parse his data into vectors/matrices. You might want to delete the last line so it doesn’t continually re-download the file. I designed it for “turnkey operation” and so that it wouldn’t leave clutter behind, but it’s inconvenient for actual development and experimentation.

Let me know if you have any problems with it.

#6 James, I’m curious about it too. The “correction” is supposed to be the difference between filtered versions of PC1 and the reconstruction of Jacoby and D’Arrigo (1989). However, there are a few oddities:

1) The referred reconstruction starts around 1630… but the used reconstruction starts in 1400. So I’m guessing Mann used his own AD1400 NOAMER set to get the reference series.

2) The scaling. I just can’t figure out the scaling of PC1; it is not even zero mean (neither in the full period nor in the calibration period).

3) The correction itself. It sure does not look like the difference between two low-pass filtered series. More like a piecewise linear graph.

Moreover, I’d like to hear the explanation (from Mann) why the “correction” was only applied in the AD1000 step… The AD1400 PC1 closely follows the AD1000 PC1, so shouldn’t the correction be used also in the AD1400 (and later) steps?! Maybe, just maybe, this has something to do with RE being negative or positive 😉 You may also want to check the submission version of MBH99 (here), it has a different explanation of the procedure.

Nicholas, the link is not working for me.

Re adjusting PCs: The only adjustment done in our work is changing sign, as described in the manuscript.

re #9: Ah, that’s what I thought. So for CVM you NEED to adjust the series such that the correlation with the instrumental series is positive. I see, so the word “standardised” in Appendix A1 includes not only zero mean and unit variance in the calibration period, but also flipping of the negatively correlated series. Did you “remember” to make that correction also for the noise series when creating Table 3?

Martin, in case you missed it, I still have a few open things here.

That would increase values near 0.6.

re #12. Yes, very much to be expected – and it tells us that Martin “forgot”. You can derive the r values for the CVM in terms of the proxy r’s in closed form (assuming that standardization is done in the calibration period):

r_CVM = cov(X,T)/(std(X)·std(T)) = 1/(n·std(X)) · Σ_k cov(P_k,T)/(std(P_k)·std(T)) = 1/(n·std(X)) · Σ_k r_k,

where the P_k are the proxies, T is the instrumental series, n is the number of proxies, and X is the CVM reconstruction before variance matching (each P_k is standardized to unit variance in the calibration period, which is what makes the middle equality hold). So a negative proxy correlation decreases the CVM correlation, and that’s why you need to flip the series. Of course, the same should be done for the reference noise series.

Martin, would you care to comment/correct your Table 3, or is this an additional thing to Willis’ comment to the journal discussion?
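[The identity in #13 is easy to confirm numerically. A quick synthetic check – assuming, as in the derivation, that each proxy is standardized to zero mean and unit variance over the period used:]

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200                                # number of proxies, number of years
T = rng.standard_normal(m)                   # "instrumental" target
P = 0.3 * T + rng.standard_normal((n, m))    # noisy proxies sharing a common signal
# standardize each proxy (zero mean, unit variance), as the derivation assumes
P = (P - P.mean(axis=1, keepdims=True)) / P.std(axis=1, keepdims=True)
X = P.mean(axis=0)                           # composite before variance matching
r_cvm = np.corrcoef(X, T)[0, 1]
r_k = np.array([np.corrcoef(p, T)[0, 1] for p in P])
identity = r_k.sum() / (n * X.std())         # 1/(n*std(X)) * sum of proxy r's
# r_cvm and identity agree to machine precision
```

Flip the sign of one proxy and r_cvm drops by exactly 2·r_k/(n·std(X)) for that proxy – the mechanism behind the Table 3 question.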

Jean S, oops, sorry, I mis-spelled the link. It’s here.

If you’re in Windows you’ll need to install the ncdf plug-in for R from here. If you’re in Linux you’ll need to install the “netcdf” package for your OS, then run R as root and execute the command install.packages("ncdf").

#14. Nicholas, those are very pretty functions. Thanks, Steve

Mr. McIntyre. Thanks. I just realized I didn’t name them very well. “ncdf_to_matrix” should more properly be called “ncdf_to_vectors”. You need to call vector_2d_to_matrix on the element [[4]] to turn it into an actual matrix. I named that function before I understood that a matrix was something other than a set of nested vectors. I’ll rename the function in my script and re-upload it, to avoid too much confusion.

Steve, I’m not fluent in R (so I’m not sure if you already did this), but I think “mob.ches” (4th last in the collection) should be inverted in Union.

Nicholas, thanks!

#17. No, I didn’t do this. Maybe that’s why my emulation is a little off. I’ll try it. I wonder why Juckes inverted it. I don’t suppose that he’ll answer. While he’s shown great energy for things like his aborted gotcha on the MM03 diagram, he hasn’t shown much interest in actually responding to questions.

re #18: The reason and the consequence are in #12/#13…

I re-did my Juckes Union emulation flipping the Chesapeake series – Juckes definitely inverted this series according to the reverse-engineering regression. The replication is improved – the correlation improves from 0.95 to 0.98 – but the discrepancies, while reduced, are still high.

Here’s something noteworthy for considering the infamous RE statistic: in this case, merely by flipping the Chesapeake series, the RE statistic improved from 0.22 to 0.42.

Now the Chesapeake series is a Mg/Ca series and we’ve discussed Mg/Ca proxies on the blog recently in connection with ocean sediments. They are actually pretty interesting proxies as there is a plausible physical relationship between temperature and Mg/Ca levels in foraminifera – I’m not sure whether the relationship is as marked in the Chesapeake Bay situation.

And doesn’t it seem a bit opportunistic to flip a series where there seems to be an actual physical relationship between the proxy and temperature?

I entirely agree with Jean S’ observation about the impact of flipping on significance level.

Steve M,

I apologize for not having yet read Juckes, et al, but I do have a question regarding the various flipped proxies. Isn’t the Mg/Ca actually a temperature series reconstructed from the Mg/Ca ratios? Or is it strictly the ratios? In either case, the flipping of the series seems to suggest that the selected temperature proxy negatively tracks the submerged global signal that the team is attempting to extract. I share your bewilderment as to the rationale for inverting a temperature proxy to maximize the r2.

In another thread Dr. Juckes mentioned that the global signal they are hunting for may be far lower in magnitude than the local or instrumental noise in a given reconstruction. Assuming arguendo that this hypothesis is valid, why aren’t there more proxies rather than fewer? More proxies would boost the signal-to-noise ratio. Also, it seems like good science to test this hypothesis on a broader dataset. If the limited number of millennial proxies is a limiting factor, then the reconstructionists should test the hypothesis going back maybe 500 years.

#21

Yes, more proxies help if the proxy noise vectors are independent. There is something interesting in this point as well. If you use INVR (as I understand it), more proxies means more degrees of freedom for the regression, and the result is a reconstruction that is overfitted to the calibration temperature. For CVM there shouldn’t be this problem. But CVM makes no sense anyway… Appendix A2 says that:

IOW, it is not optimal for overfitters.

As Martin has now admitted that the flipping of the Chesapeake series was a mistake, and we know that this has a non-negligible effect on the statistics (e.g., RE 0.42 → 0.22), I think it would be appropriate for the authors to make a correction with updated figures and statistics to the CPD site.

Interestingly, this “small mistake” has a standard error (the RMSE between the CVM Union with the unflipped and flipped Chesapeake series) of about 0.05 K in the calibration period – one third of the standard error of the whole reconstruction!

Juckes agrees that flipping the Chesapeake series was an “error” in the CVM reconstruction. Jean S asked Juckes to identify the step in his computer code where the flip took place. Thus far, Juckes has failed to respond, and based on past experience, I think that Juckes is unlikely to. I’ve taken a try and couldn’t locate it at a first pass. Maybe one of the computer specialists can locate the step. If so, I’d appreciate it.

Of course, Juckes should just tell us how he flipped the Chesapeake series. But since he won’t, it should be interesting to see how he did it.

Juckes archived his code at http://home.badc.rl.ac.uk/mjuckes/mitrie_files/software/mitrie_python.tar .

A warning: the code is essentially unannotated, so it’s not easy to follow. My guess is that the flip takes place either in the program mitrie_regress.py or do_regress.py, but I’m just guessing. It could even have taken place outside a program.

Is it in the do_hockeystick.py function? Kidding.

I found something in the do_regress.py that does some flipping:

The term sum( self.predictor[k,-self.lcal:]*self.predictand ) will have the same sign as the correlation of the proxy to the target. But on its face, this only flips for MBH and doesn’t appear to apply to the Union reconstruction. I can’t tell why it would fail to flip the “adjusted” PC in the suffix “pc” reconstruction. Maybe it has to do with how the range is defined. Martin, are you there? The flipping in the Union CVM is still unexplained – other than that it was an “error”. OK, but how did the error get there?
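For what it’s worth, the quoted term works as a flipping criterion only if both series are centered: over the calibration window, sum(x*y) is then proportional to cov(x, y), which has the sign of the correlation. A sketch of the criterion (my own code, mirroring the quoted term – not Juckes’):

```python
import numpy as np

def flip_factor(predictor, predictand):
    """Return +1/-1 according to sign(sum(predictor*predictand)) over the
    calibration window, as in the quoted do_regress.py term. If both
    series are centered, this is the sign of their covariance, hence of
    the proxy's correlation with the target; if not, the mean terms can
    swamp the covariance and the "flip" becomes arbitrary."""
    return 1.0 if np.sum(predictor * predictand) >= 0 else -1.0
```

So if either series enters un-centered at this step, the criterion need not match the correlation sign – which might be one route by which an inadvertent flip could creep in.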

Now that I think of it, it would be interesting to check whether the Chesapeake series was flipped in the Moberg CVM.

Something else I just noticed in the do_regress code: Juckes smooths the proxy before doing the regression. In R, you can do regressions using the function lm and return all sorts of diagnostics. For anyone planning to do statistical work, it seems like a much more sensible language than Python, where Juckes appears to have written functions from scratch to do elementary operations.

Could have been that someone else wrote it for him. Either way, this does show the, errr, ‘idiocy’ of using Python over R. Why reinvent the wheel?

#25 bender

“Is it in the do_hockeystick.py function?”

Are you sure you didn’t mean

“Is it in the do_hockeystick_CENSORED.py function?

KevinUK

Steve: It appears to smooth only if “self.time_res” is set, and if so prints out a message. Do you know that time_res is set when this function is called? Does the message get printed? I assume time_res stands for “time residuals”?

Python is, as I understand it, an object oriented/string processing language. You can use it for anything, but I don’t think I would select it for mathematical applications. However, if one was an expert in it and no other languages, I can see why one might.

I used to use GAUSS a lot for some econometric work – it is just a relatively high-level programming language and is not specifically a statistics package. But it was more flexible than the, then, alternative RATS. And is still useful in many situations that the current package, EViews, can’t handle. But GAUSS was targeted at the statistical market in that it is optimised for matrix algebra and contained a lot of prewritten optimisation packages – so it was great for maximum likelihood. You use what you know, but some tools are better than others. I, for example, would never contemplate running a panel regression in anything other than a dedicated statistics package – that’s just asking for trouble – but basic regressions can even be run in Excel these days (although this is surely a travesty of a similar order to death by Powerpoint).

Hmmm, expert at Python vs. not expert at statistical innovation.

I know which approach I would choose, if I were a climatologist in that boat and didn’t want my hide tanned during review.

Speaking of stats, does anyone have any pointers to material on the net that covers the types of validation stats discussed here? I’m afraid my Stats 101 course notes are somewhat incomplete…

#31 SCA also is widely used for TSA. Have you had a look at that?