Benchmarking and Assessment


This page documents the progress of the Benchmarking and Assessment Working Group, led by Kate Willett.

Members (as of 31/5/13):

Kate Willett (UKMO Hadley Centre, UK) (Chair)
Claude Williams (NCDC, USA)
Ian Jolliffe (Exeter Climate Systems, University of Exeter, UK)
Robert Lund (Department of Mathematical Sciences, Clemson University, USA)
Lisa Alexander (Climate Change Research Centre, University of New South Wales, Australia)
Stefan Brönnimann (University of Bern, Switzerland)
Lucie A. Vincent (Climate Research Division, Environment Canada, Canada)
Steve Easterbrook (Department of Computer Science, University of Toronto, Canada)
Victor Venema (Meteorologisches Institut, University of Bonn, Germany)
David Berry (National Oceanography Centre, Southampton, UK)
Rachel Warren (College of Engineering, Mathematics and Physical Sciences, University of Exeter, UK)
Giuseppina Lopardo (Istituto Nazionale di Ricerca Metrologica (INRiM), Italy)
Renate Auchmann (Oeschger Center for Climate Change Research & Institute of Geography, University of Bern, Switzerland)
Enric Aguilar (Centre for Climate Change, Universitat Rovira i Virgili, Spain)
Matt Menne (NCDC, USA)
Colin Gallagher (Department of Mathematical Sciences, Clemson University, USA)
Zeke Hausfather (Berkeley Earth, USA)
Thordis Thorarinsdottir (Statistical Analysis, Pattern Recognition, and Image Analysis (SAMBA), Norwegian Computing Centre, Norway)

Ex-officio:
Peter Thorne (NERSC, Norway)

Purpose:

To facilitate the use of a robust, independent, and useful common benchmarking and assessment system for temperature data-product creation methodologies, aiding product intercomparison and uncertainty quantification.


Blogsite for discussion of ideas/thoughts/work in progress:

http://surftempbenchmarking.blogspot.com
This blogsite is open to all and constructive comments are welcome.


RECENT POSTS:


2013/08 Benchmarking Workshop Agenda and Report
2013/05 Call for regional inhomogeneity info

2012/01 Mailing list on homogenisation of climate data
2012/01 New article: Benchmarking homogenization algorithm...
2012/01 Benchmarking of USHCN

2011/12 Metadata
2011/11 Team Validation - thoughts from the Homogenisation...
2011/11 Team Corruption - thoughts from the Homogenisation...
2011/11 Team Creation - thoughts from the Homogenisation Me...
2011/11 2011 Progress Report Now Published
2011/08 Team Validation
2011/07 Benchmark for real-world problems
2011/07 Another radiosonde benchmarking paper
2011/07 Generating inhomogeneous worlds
2011/07 Benchmarking temperature networks
2011/06 Homogenization seminar
2011/06 If I had but one analog I could create ...
2011/06 Big questions with which to test homogenisation al...
2011/03 Creating the Benchmark 'Truths'
2011/02 Assessing the Benchmarks
2011/02 Review paper references
2011/02 My first time using blog...
2011/01 Homogenization aspects that scare me
2011/01 Kate's Pseudo-worlds work
2011/01 Benchmarking and Assessment Open Comment - January...

RELATED PRESENTATIONS AND BLOG POSTS:



July 2013 - Kate Willett's presentation at the Benchmarking workshop, NCDC, USA: presentation

June 2013 - Kate Willett's presentation at the 12th International Meeting on Statistical Climatology (IMSC), Jeju, South Korea: An overview of benchmarking data homogenisation procedures for the ISTI

November 2012 - Kate Willett's presentation at the 5th ACRE Meeting, Toulouse, France: presentation

June 2012 - Peter Thorne's presentation at the Earth Temperature Network workshop, Edinburgh, UK: presentation

May 2012 - Kate Willett's NCDC visit with Claude Williams and Robert Lund: presentation
         - Kate Willett's Clemson University/Robert Lund visit: All things CLIMATE DATA and our maths and statistics headaches

December 2011 - Ian Jolliffe's presentation at the 5th International Verification Methods Workshop, Melbourne, Australia: Benchmarking and Assessment (Verification) of Homogenisation Algorithms for the International Surface Temperature Initiative (ISTI); report.

October 2011 - Steve Easterbrook's presentation at the WCRP Open Science Conference, Denver, CO, USA: Benchmarking and Assessment of Homogenisation Algorithms for the International Surface Temperature Initiative (ISTI). See Steve Easterbrook's blog.
             - Kate Willett's presentation at the COST HOME 7th Seminar for Homogenisation and Quality Control of Climate Databases, Budapest, Hungary: Creating a Global Benchmark Cycle for the International Surface Temperature Initiative. Meeting report (see blog too).

May 2011 - Kate Willett's presentation for MARCDAT-III, Frascati, Italy: Is it good enough? Benchmarking homogenisation algorithms and cross-cutting with efforts for land observations

April 2011 - Kate Willett's poster for EGU 2011: Robust Benchmarking of Homogenisation Algorithms for the Surface Temperature Initiative

February 2011 - Kate Willett's informal presentation at the National Climatic Data Center (NC, USA): Devising a Benchmarking System for Homogenisation Methods of Climate Data-Products


Working Group Documents:

ISTI/Benchmarking Glossary - open for editing

White Paper 9 formed the basis for breakout group discussion at the Exeter meeting. Discussion outcomes are summarised in the final session.


Outline for the planned Homogenisation Review Paper to be written by the working group members


Terms of Reference agreed by the Benchmarking and Assessment Working Group (12/12/13)
Terms of Reference agreed by the Benchmarking and Assessment Working Group (15/6/11)

Working draft of Benchmarking and Assessment Paper describing the methodological background to benchmarking and assessment

October 2011 Progress Report of the Benchmarking and Assessment working group submitted to and accepted by the Steering Committee 10/11/2011

October 2012 (submitted Feb 2013) Progress Report of the Benchmarking and Assessment working group submitted to and accepted by the Steering Committee xx/xx/2013

October 2013 (submitted Nov 2013) Progress Report of the Benchmarking and Assessment working group submitted to and accepted by the Steering Committee xx/xx/2013
 

Objectives and Timelines:


Activity: Advocacy of the benchmarks and support for users
Details: All group members should encourage use of the benchmarks and provide support where necessary.
Owner: Benchmarking and Assessment working group; Steering Committee
Due date: Ongoing

Activity: Up-to-date reference list of work on inhomogeneities in surface temperatures on the website (www.surfacetemperatures.org/benchmarking-and-assessment-working-group)
Details: Ongoing throughout, but will have formed the basis for defining the error model spread.
Owner: Benchmarking and Assessment working group, led by Kate Willett
Due date: Ongoing

Activity: Benchmarking and Assessment working group Terms of Reference
Details: These will fit in with the Implementation Plan and the Steering Committee Terms of Reference.
Owner: Benchmarking and Assessment working group
Due date: June 2011 (Completed)

Activity: Benchmarking position paper submitted for peer review
Details: A descriptive paper presenting background concepts and methods for the creation of the benchmark programme, co-authored by the working group.
Owner: Benchmarking and Assessment working group, led by Kate Willett
Due date: August 2013

Activity: Analog-known-worlds proof of concept
Details: Create software to produce analog-known-worlds at a proof-of-concept scale and submit a methods paper.
Owner: Team Creation, led by Robert Lund and Kate Willett
Due date: August 2013

Activity: Analog-known-worlds global-scale production
Details: Produce analog-known-worlds for as many ISTI land meteorological databank stage 3 stations as possible.
Owner: Team Creation; code probably run and data hosted by Kate Willett
Due date: August 2013

Activity: Analog-error-worlds concepts finalised
Details: Decide the number and type of error models to create (including how to ensure that these remain blind tests for each cycle for some period of time).
Owner: Team Corruption, led by Victor Venema/Claude Williams
Due date: July 2013

Activity: Validation/Assessment concepts finalised
Details: Decide on the number and type of tests with which to perform validation.
Owner: Team Validation, led by Ian Jolliffe
Due date: August 2013

Activity: Analog-error-worlds proof of concept
Details: Create software to produce analog-error-worlds at a proof-of-concept scale and submit a methods paper (if desired).
Owner: Team Corruption, led by Claude Williams
Due date: October 2013

Activity: Working group meet-up/code sprint
Details: Attempt to bring together as many members as possible, possibly as a networked code sprint with USA and Europe (Australia?) hubs.
Owner: All; organisation kickstarted by Kate Willett in April 2013
Due date: July 2013

Activity: Validation/Assessment proof of concept
Details: Create software and a score system/intercomparison table to run the validation at a proof-of-concept scale and submit a methods paper (if desired).
Owner: Team Validation, led by Ian Jolliffe
Due date: October 2013

Activity: Analog-error-worlds global-scale production
Details: Produce analog-error-worlds from the analog-known-worlds, ready for distribution.
Owner: Team Corruption, led by Claude Williams
Due date: November 2013

Activity: Benchmark cycle official release of analog-error-worlds
Details: Release the first official benchmarks and publicise widely.
Owner: All, led by Kate Willett
Due date: November/December 2013

Activity: Validation/Assessment global-scale production
Details: Produce software and a framework ready for running at the global scale (automated or manual?).
Owner: Team Validation, led by Ian Jolliffe
Due date: End 2013

Activity: Benchmark platform design
Details: Create a webpage showing step-by-step 'How to benchmark', with appropriate links to data, validation and intercomparison tables, and with registration so that feedback can be provided and contact maintained.
Owner: All, led by Kate Willett
Due date: December 2014 (ideally earlier, but it is more important to get the benchmarks created first)

Activity: Benchmark cycle release of analog-known-worlds answers
Details: Publish the analog-known-worlds underlying the analog-error-world benchmarks.
Owner: All, led by Kate Willett
Due date: June 2016

Activity: Workshop to discuss results of benchmarking
Details: To include the Benchmarking and Assessment working group and all analysts who submitted.
Owner: Benchmarking and Assessment working group
Due date: June 2016, ready for late 2016?

Activity: Summary paper submitted to a peer-reviewed journal
Details: To include assessment of cycle 1 and recommendations for cycle 2.
Owner: Benchmarking and Assessment working group
Due date: 2017

Activity: Begin cycle 2 - creation and release of benchmarks, monthly and daily
Owner: Benchmarking and Assessment working group
Due date: 2017
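The three-team workflow above (Team Creation producing clean analog-known-worlds, Team Corruption inserting known inhomogeneities to make analog-error-worlds, and Team Validation scoring algorithms against the hidden truth) can be illustrated with a deliberately simplified sketch. Everything here is an illustrative assumption — the synthetic series, the single mean-shift error model, and the naive one-break detector are toy stand-ins, not the working group's actual software or error models.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Team Creation": a clean analog-known-world, here just a white-noise
# monthly anomaly series (real analog worlds preserve realistic climate
# variability and station covariance).
n_months = 480
truth = rng.normal(loc=0.0, scale=0.5, size=n_months)

# "Team Corruption": insert an undocumented inhomogeneity - a simple
# mean shift at a known (but hidden from analysts) changepoint.
break_point = 240
shift = 1.0
corrupted = truth.copy()
corrupted[break_point:] += shift

def detect_single_break(series, min_seg=24):
    """Toy detector: pick the split point that maximises the absolute
    difference in segment means (a crude changepoint estimator)."""
    scores = [abs(series[:k].mean() - series[k:].mean())
              for k in range(min_seg, len(series) - min_seg)]
    return min_seg + int(np.argmax(scores))

detected = detect_single_break(corrupted)

# "Team Validation": because the truth is known, the detection can be
# scored objectively, e.g. by the changepoint location error in months.
detection_error = abs(detected - break_point)
print(f"true break at {break_point}, detected at {detected}, "
      f"error = {detection_error} months")
```

The blind-test element in the table corresponds to withholding `truth` and `break_point` from analysts for some period; only submitted homogenised series are scored, and the answers are published at the end of the cycle.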



Reference Literature:

Papers on benchmarking:

Peter Thorne et al.'s overview of ISTI including the need for benchmarking:
Thorne, P., Willett, K. M., et al., 2011: Guiding the creation of a comprehensive surface temperature resource for 21st century climate science. BAMS, 92 (11), ES40-ES47, doi: 10.1175/2011BAMS3124.1

Kate Willett's work on 'pseudo-worlds' - a set of benchmarks for homogenisation of daily Tmax and Tmin
- please leave comments on the blogsite thread
Example plots to be uploaded shortly

Holly Titchner et al.'s work on radiosonde error models for validating homogenisation:

Titchner, H. A., Thorne, P. W., McCarthy, M. P., et al., 2009: Critically reassessing tropospheric temperature trends from radiosondes using realistic validation experiments. Journal of Climate, 22, 465-485.
 

Claude Williams et al.'s work on homogenising USHCN with benchmarking of the methods:

Williams, C. N., Jr., M. J. Menne, and P. Thorne, 2012: Benchmarking the performance of pairwise homogenization of surface temperatures in the United States. J. Geophys. Res., 117, D05116, doi:10.1029/2011JD016761.

Victor Venema et al.'s work on benchmarking the COST HOME homogenisation algorithms:
Venema, V., O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C.N. Williams, M. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban, Th. Brandsma, 2012: Benchmarking homogenization algorithms for monthly data. Climate of the Past, 8, 89-115.

____________________________________

Papers on known sources of inhomogeneity:

Harrison, R. G., 2010: Natural ventilation effects on temperature within Stevenson screens. Quarterly Journal of the Royal Meteorological Society, 136, 253-259, DOI:10.1002/qj.537.

Harrison, R. G., 2011: Lag-time effects on a naturally ventilated large thermometer screen. Quarterly Journal of the Royal Meteorological Society, 137, 402-408, DOI:10.1002/qj.745.

Lopardo, G., F. Bertiglia, S. Curci, G. Roggero and A. Merlone, 2013: Comparative analysis of the influence of solar radiation screen ageing on temperature measurements by means of weather stations. International Journal of Climatology, DOI:10.1002/joc.3765.

---------------------------------------------------------------

Papers on homogenisation:

Begert, M., Zenklusen, E., Haberli, C., et al., 2008: An automated procedure to detect discontinuities; performance assessment and application to a large European climate data set. Meteorologische Zeitschrift, 17 (5), 663-672.

DeGaetano, A. T., 2006: Attributes of several methods for detecting discontinuities in mean temperature series. Journal of Climate, 19 (5), 838-853.

Ducré-Robitaille, J.-F., Vincent, L. A. & Boulet, G., 2003: Comparison of techniques for detection of discontinuities in temperature series. International Journal of Climatology, 23, 1087-1101.

Easterling, D. R. & Peterson, T. C., 1995: The effect of artificial discontinuities on recent trends in minimum and maximum temperatures. Atmospheric Research, 37, 19-26 (International Minimax Workshop on Asymmetric Change of Daily Temperature Range, College Park, MD, 1993).

Menne, M. J. & Williams, C. N., 2005: Detection of undocumented changepoints using multiple test statistics and composite reference series. Journal of Climate, 18, 4271-4286.

Peterson, T. C., Easterling, D. R., Karl, T. R., et al., 1998: Homogeneity adjustments of in situ atmospheric climate data: a review. International Journal of Climatology, 18, 1493-1517.

Trewin, B., 2010: Exposure, instrumentation, and observing practice effects on land temperature measurements. WIREs Climate Change, 1, 490-505.

Vincent, L. A., 1998: A technique for the identification of inhomogeneities in Canadian temperature series. Journal of Climate, 11, 1094-1104.

Wang, X. L., Wen, Q. H., and Wu, Y., 2007: Penalized maximal t test for detecting undocumented mean change in climate data series. Journal of Applied Meteorology and Climatology, 46, 916-931, DOI:10.1175/JAM2504.1.

Wang, X. L., 2008a: Accounting for autocorrelation in detecting mean-shifts in climate data series using the penalized maximal t or F test. Journal of Applied Meteorology and Climatology, 47, 2423-2444, DOI:10.1175/2008JAMC1741.1.

Wang, X. L., 2008b: Penalized maximal F test for detecting undocumented mean-shift without trend change. Journal of Atmospheric and Oceanic Technology, 25, 368-384, DOI:10.1175/2007JTECHA982.1.

Wang, X. L., Chen, H., Wu, Y., et al., 2010: New techniques for detection and adjustment of shifts in daily precipitation data series. Journal of Applied Meteorology and Climatology (accepted).



Links to Related Projects:

www.homogenisation.org - website for the COST HOME action on homogenisation
Last modified by Kate Willett: Jul 19th 2011

