Reply on RC1

My main issue with the research as written is that it does not incorporate the recent literature on the subject of modelling ENSO. Recent advances relating to the paleoclimate component of CMIP6 have been included, but there is a raft of publications looking at ENSO in the historical and future scenarios that are not considered. This is particularly noticeable with the discussion of Yeh et al (2009); there are certainly more recent works looking at future changes in ENSO flavour. These are reviewed in the upcoming IPCC AR6 and I strongly advise waiting until that is published before completing your revisions. -This is a fair point, and we will include a number of references to more recent publications on ENSO in the historical and future scenarios, such as Frederiksen et al. (2020), Freund et al. (2020), Beobide-Arsuaga et al. (2021) and Jiang et al. (2021).

Specific comments:
L22. You give the date for the whole mPWP here, but never explicitly mention that PlioMIP is aimed at an interglacial within this period. KM5c is mentioned in passing later on, but without dates. -We will include the specific years of the KM5c time slice in the Introduction and explicitly mention why this time slice was chosen.
L68. (or longer periods) -> (so longer periods) -Will be corrected.
L85. Brown et al (2020) did not look at future scenarios. Rather they used the idealised warming scenarios of the CMIP DECK. -True, will be corrected.
L123. Interpolating variables onto a common grid prior to analysis is not best practice. This would act to smooth out spatial variations and lop-off extremes. I do not expect you to re-perform all of your analysis, as I suspect that it will make little difference to your conclusions. You may want to mention why this is the case in your methods section. Try to avoid this approach in future -it should be performed at the last possible moment, as part of the ensemble averaging. -Thanks for mentioning that. We do not expect this to affect the results greatly, as the majority of the models use an ocean resolution that is close to 1°x1°. Only IPSLCM6A-LR uses a 1/3° ocean resolution in the tropics, where the smoothing out of the extremes could play a role. We already clearly state in the methods section that we use the regridded data, and we will add some discussion of why we do not expect this to impact our conclusions.
Table 1. Why are you not using the model acronyms that are part of the CMIP controlled vocabulary? Will this not prevent your study coming up on Google Scholar searches and the like? -We chose to follow the PlioMIP1/2 naming conventions to be consistent with other PlioMIP2 studies. However, we will include the CMIP vocabulary in the Table (where it differs from the PlioMIP naming).
L155. Factor 3.0 -> factor of 3.0 -Will be corrected.
L165. I believe that "standardised" should be "normalised" here.
L182. Please mention if the monthly SST anomalies are detrended prior to the PCA. -We did not detrend the SST anomalies prior to the PCA (although L185 states that we did). We redid the PCA with a linear trend removed from the SST anomalies and found no significant changes (the difference is of the order of 1e-3 for EOFs of order 1). The difference in the percentage of variance explained is of the order of 0.1%. We will include the new (detrended) EOFs in the figures of the revised manuscript.
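As an illustration of the check described in the L182 reply, the following is a minimal sketch of comparing the leading EOF with and without linear detrending. It uses synthetic data rather than the model output; the array sizes, the imposed trend, and the helper names `detrend_linear` and `leading_eof` are assumptions made for this example, and no area weighting is applied.

```python
import numpy as np

def detrend_linear(x):
    """Remove the time mean and a least-squares linear trend along axis 0."""
    t = np.arange(x.shape[0], dtype=float)
    t -= t.mean()
    centred = x - x.mean(axis=0)
    # Least-squares slope at each grid point: sum(t * x') / sum(t^2)
    slope = (t[:, None] * centred).sum(axis=0) / (t ** 2).sum()
    return centred - t[:, None] * slope

def leading_eof(x):
    """Return the leading EOF (unit norm) and its explained-variance fraction."""
    anomalies = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    return vt[0], s[0] ** 2 / (s ** 2).sum()

# Synthetic stand-in for monthly SST anomalies with shape (time, space)
rng = np.random.default_rng(0)
n_time, n_space = 1200, 50
pattern = np.sin(np.linspace(0.0, np.pi, n_space))
pc = rng.standard_normal(n_time)
sst = np.outer(pc, pattern) + 0.1 * rng.standard_normal((n_time, n_space))
sst += 1e-4 * np.arange(n_time)[:, None]  # weak, spatially uniform trend (assumed)

eof_raw, var_raw = leading_eof(sst)
eof_det, var_det = leading_eof(detrend_linear(sst))

# The sign of an EOF is arbitrary, so align the two before comparing
if eof_raw @ eof_det < 0:
    eof_det = -eof_det

max_diff = np.abs(eof_raw - eof_det).max()
print(f"max EOF difference: {max_diff:.2e}")
print(f"explained variance: {var_raw:.4f} (raw) vs {var_det:.4f} (detrended)")
```

When the trend is small relative to the interannual variability, as assumed here, the two EOFs should differ only marginally.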
L230. GISS-E2.1-G was not in PlioMIP1 -rather that was GISS-E2-R. Please justify why you consider these to be iterations of the same model, rather than different generations as other studies often do. -Thank you for the clarification, we will remove the mention in L234.
Earth3.3, GISS2.1G, HadCM3, IPSLCM5A, IPSLCM6A, MIROC4m, and MRI2.3), in many cases these peaks do not exceed the threshold for statistical significance. For example, IPSLCM5A has its maximum Eoi400 peak at 9 years, but this specific peak does not exceed the 90%-CI. Consequently, it is difficult to draw robust conclusions on the significant spectral changes per ensemble member. We therefore prefer to keep the methodology of binning in the 1.5-10 year period range (Fig. 4), focussing on those peaks that are significant according to our analysis. This peak counting procedure is not particularly meaningful when performed for one model, as the number of significant peaks can be low (as low as 1 peak above the 99%-CI for EC-Earth3.3's Eoi400 spectrum), which again makes it difficult to draw robust conclusions.
L274. Please provide more explanation about the word "normalised" -is the information about the ENSO amplitude (in °C) contained within the EOF or the PC? -Here it means that the EOFs are scaled to be positive in the Nino3.4 region. We will clarify this. In addition, the EOF patterns are scaled with their standard deviation, which removes the ENSO amplitude. This is done in order to compare the spatial pattern only.
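The two scaling steps in the L274 reply can be sketched as follows. This is a hedged illustration, not the code used for the manuscript: the function name `normalise_eof` is hypothetical, the standard deviation is assumed to be the unweighted spatial standard deviation of the pattern, and the Nino3.4 box is taken as 5°S-5°N, 190°E-240°E.

```python
import numpy as np

def normalise_eof(eof, lat, lon, box=(-5.0, 5.0, 190.0, 240.0)):
    """Make the EOF positive in the box, then divide by its spatial std.

    eof: 2-D array (lat, lon); lon is assumed to be in the 0-360 convention.
    box: (lat_min, lat_max, lon_min, lon_max), default Nino3.4.
    """
    lat_min, lat_max, lon_min, lon_max = box
    in_box = ((lat[:, None] >= lat_min) & (lat[:, None] <= lat_max)
              & (lon[None, :] >= lon_min) & (lon[None, :] <= lon_max))
    # Step 1: fix the arbitrary sign so the box mean is positive
    if eof[in_box].mean() < 0:
        eof = -eof
    # Step 2: divide by the spatial standard deviation (assumed unweighted),
    # which removes the amplitude and leaves only the pattern
    return eof / eof.std()

# Synthetic pattern with the "wrong" sign and an arbitrary amplitude
lat = np.linspace(-20.0, 20.0, 41)
lon = np.arange(120.0, 291.0, 1.0)
eof = (-3.0 * np.exp(-(lat[:, None] / 8.0) ** 2)
       * np.exp(-((lon[None, :] - 215.0) / 30.0) ** 2))

scaled = normalise_eof(eof, lat, lon)
rescaled = normalise_eof(2.0 * eof, lat, lon)
print(scaled.std(), np.allclose(scaled, rescaled))
```

Because the amplitude is divided out, doubling the input pattern gives the same normalised field, so only the spatial structure is compared between models.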
L282. Cite Fig. S3 to support this. -OK, we will do this.
L289. Please rephrase to only use word "region" to have a single meaning. -We will use 'area' here instead.
L325. This sentence reads as if it encompasses the warmer E. Pac coastal temperatures. These are instead a feature of insufficient ocean model resolution to capture the coastal upwelling. -We will correct this: 'along the east Pacific coast' -> 'in the east Pacific'. Furthermore, we will rephrase the next sentence: 'The 'cold bias' in the east Pacific can be expected since, firstly, the pre-industrial simulations are compared with historical observations and, secondly, the models have insufficient resolution to reproduce the cold conditions of coastal upwelling systems, such as the
L333. Choosing a red-green color scheme is unhelpful to readers who are colorblind. -We will make sure to correct this.
L340. The alphabetic indicators have not been introduced earlier. Why do you start at P? Please add letters to Fig 7. -The alphabetic indicators (letters) are included in the circles in Fig. 7d. We will enlarge them slightly for increased visibility. We started at 'P' instead of 'A' in order to avoid confusion with the subfigure labels.
L352. Warming trends (up to average year of 1970) are less than 1°C globally, let alone tropical Pacific. Put nuance in your expectation. -We will compute the average equatorial Pacific SST difference between the pre-industrial simulations and the HadISST observations instead of providing an estimate now. Do note that the HadISST data range we have chosen to include (1920-2020) does not cover the full pre-industrial period and may thus show relatively larger warming than when including the full historical period.
L358. Please be consistent with your longitude names. Fig 7, showing these boxes, goes 0--360 not -180--180. -We will make sure to be consistent in the full manuscript.
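For the longitude consistency raised at L358, converting between the two conventions is a one-line operation. The sketch below is illustrative only; the helper names `to_0_360` and `to_pm180` are hypothetical, and the example longitudes (170°W and 120°W) are not taken from the manuscript's boxes.

```python
import numpy as np

def to_0_360(lon):
    """Map longitudes in degrees east to the range [0, 360)."""
    return np.mod(np.asarray(lon, dtype=float), 360.0)

def to_pm180(lon):
    """Map longitudes in degrees east to the range [-180, 180)."""
    return np.mod(np.asarray(lon, dtype=float) + 180.0, 360.0) - 180.0

west_edges = np.array([-170.0, -120.0])   # 170W and 120W
print(to_0_360(west_edges))               # 190 and 240 degrees east
print(to_pm180(to_0_360(west_edges)))     # back to -170 and -120
```

Note that with this definition 180°E maps to -180 in the second convention, so box edges on the date line should be checked explicitly.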
L358. Is there a reference to choose these regions? You later discuss how these regions are inappropriate for 2 models. Maybe using max and min in two larger regions would be more helpful? -We did not choose these regions based on a reference. We chose the two regions based on the ensemble mean and HadISST equatorial SSTs as shown in Figure 8, such that we expect most models to have their minimum and maximum SSTs in one of the regions. In fact, the regions are a poor choice only for MRI2.3, because its pre-industrial equatorial Pacific SSTs are an outlier compared to both the ensemble mean and the HadISST result. We could indeed use larger regions, but this would also smooth out the min and max values.
L375. I feel that it is worth stressing that Brown et al include many of the models used here. -Agreed, will include this.
Sect 3.2.3 Any lines of best fit would not pass through the origin in either Fig 10a or 10b. What are the implications for that on your interpretation? Are you expecting an external condition to cause a roughly 25% amplitude reduction and then the zonal gradient to control the deviations from that? -To clarify: Fig10a and b show the ENSO