The NIST algorithm, as it stood in 2013, is described in two papers by Pintar and colleagues in the ITS9 conference proceedings: Paper 1, Paper 2
For some existing datasets, a range of documentation is available which may prove useful. Homepages for the current global datasets of land surface air temperature are available at:
This page contains links to a variety of papers and resources which may prove useful to participants in the SAMSI/IMAGe/ISTI summer program event held in July 2014. This meeting benefitted from substantial support from SAMSI and IMAGe.
The basic data holdings have been collated as part of the ISTI databank activities and can be found at ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/monthly/stage3/. Data are available in either ASCII or CF-compliant netCDF. A variety of software options exist for reading netCDF files, including Python, R and Java. The essential attributes of these data to be aware of are:
Many further details on these data holdings are available on the databank pages. Note that the databank is currently at its beta 4 release (June 4th), and a full first version release will follow before the workshop. That release will alter some records compared to beta 4, but formats and essential data and metadata characteristics will be unchanged.
Benchmark analogs for at least a subset of the holdings will be available by the start of the workshop. There is a discussion paper on this at http://www.geosci-instrum-method-data-syst-discuss.net/4/235/2014/gid-4-235-2014.html. These benchmarks will exactly mimic the data holdings in terms of data formats and data completeness. Links to these resources will be provided when available.
There also exists a set of US benchmarks that were used in a previous study. If available they will be linked here.
The COST HOME project applied benchmarking to a broad suite of algorithms. Its homepage is at http://www.homogenisation.org/v_02_15/
Pairwise algorithm DEVELOPMENT breaks returned
Note that these are very much development-stage break locations and magnitudes, shared here for the convenience of participants. They should not be taken to imply anything about future products at this stage. Much work remains to be done.
and the combined elements sorted by station & change date.
1 - the size of the adjustment
2 - element
4 - station ID
6 & 7 - earlier segment begin/end date (end date == changepoint date)
9 & 10 - later segment begin/end date
16 - (next to the last) number of stations used to determine adjustment
NOTE: a value > 100 indicates an adjustment based upon a station history change date + pairwise inhomogeneity detection. A value == 100 indicates an adjustment found only by checking the station history date (no PHA detection).
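The field list above can be turned into a small reader. The following is a minimal sketch only: it assumes each record is one whitespace-delimited line, that the field numbers above are 1-based, and that dates can be kept as plain strings (the date format is not stated here). The names `BreakRecord`, `parse_break_line` and `detection_source` are hypothetical, the sample line is made up for illustration, and the reading of a station count below 100 as "pairwise detection only" is an inference from the NOTE rather than something stated on this page.

```python
from typing import NamedTuple


class BreakRecord(NamedTuple):
    adjustment: float    # field 1: size of the adjustment
    element: str         # field 2: element
    station_id: str      # field 4: station ID
    earlier_begin: str   # field 6: earlier segment begin date
    earlier_end: str     # field 7: earlier segment end date (changepoint date)
    later_begin: str     # field 9: later segment begin date
    later_end: str       # field 10: later segment end date
    n_stations: int      # field 16: number of stations used for the adjustment


def parse_break_line(line: str) -> BreakRecord:
    """Parse one record, assuming whitespace-delimited, 1-based fields."""
    fields = line.split()
    if len(fields) < 16:
        raise ValueError(f"expected at least 16 fields, got {len(fields)}")
    return BreakRecord(
        adjustment=float(fields[0]),
        element=fields[1],
        station_id=fields[3],
        earlier_begin=fields[5],
        earlier_end=fields[6],
        later_begin=fields[8],
        later_end=fields[9],
        n_stations=int(fields[15]),
    )


def detection_source(n_stations: int) -> str:
    """Interpret field 16 following the NOTE above."""
    if n_stations > 100:
        return "history+pairwise"
    if n_stations == 100:
        return "history-only"
    # Assumption: values below 100 mean pairwise detection without history.
    return "pairwise-only"


if __name__ == "__main__":
    # Hypothetical record; "x" marks fields not described on this page.
    sample = ("-0.42 TMAX x USW00024157 x 195001 197206 x "
              "197207 201312 x x x x x 112 x")
    rec = parse_break_line(sample)
    print(rec.station_id, rec.earlier_end, detection_source(rec.n_stations))
```

If the files turn out to be fixed-width rather than whitespace-delimited, the `split()` call would need to be replaced with column slices.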
The following case studies each use a target station with relatively good data over its period of record, together with a list of its neighbors within 500 km.
link: http://tinyurl.com/samsi-workshop-cases (The zip file is also provided at the bottom of the page)
Each file is one station (ID is the name of the text file). At the top is the metadata of the target station, and below is the metadata for all the neighbors. The format of the metadata follows the ISTI Metadata format that can be found here. For the neighboring stations, the distance from the target is also shown near the end of the line.
The current stations that have been chosen are as follows:
CA002400600 CAMBRIDGE_BAY_ARPT (24 neighbors)
CA003031093 CALGARY_INTL (1,352 neighbors)
CA005063075 WALKER_LAKE (164 neighbors)
USW00024157 SPOKANE_INTL_AP (2,034 neighbors)
USW00023185 RENO_TAHOE_INTL_AP (1,312 neighbors)
SZ000001940 BASEL_BINNINGEN (365 neighbors)
MA000067083 ANTANANARIVO/IVATO (7 neighbors)
Seth has reformatted the data in a way that may be easier for some users. Location is here
Each zipfile contains:
network.[region]: the original file listing all stations
within a given distance (500 km) of the target station.
station.data: id, lat, lon, elevation, and first and last timesteps for
each station in the region
timeseries: data files for each station in the region. The data files
are named with the station ID, and each line in the file has year,
month, tmin, tmax, and tmean. -99.99 indicates missing.
synoptic: data files for each timestep over the entire region. The data
files are named by date (YYYYMM), and each line in the file has station
ID, tmin, tmax, and tmean. -99.99 indicates missing.
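As a rough illustration of the two line layouts described above, the sketch below parses a timeseries line (year, month, tmin, tmax, tmean) and a synoptic line (station ID, tmin, tmax, tmean), mapping the -99.99 missing flag to `None`. It assumes the columns are whitespace-separated, which is not stated above; the function names and the sample lines are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Tuple

MISSING = -99.99  # missing-value flag given in the file description


def _value(token: str) -> Optional[float]:
    """Convert one temperature token, returning None for the missing flag."""
    v = float(token)
    return None if math.isclose(v, MISSING, abs_tol=1e-6) else v


def parse_timeseries_line(line: str):
    """One line of a per-station file: year, month, tmin, tmax, tmean."""
    year, month, tmin, tmax, tmean = line.split()
    return int(year), int(month), _value(tmin), _value(tmax), _value(tmean)


def parse_synoptic_line(line: str):
    """One line of a per-timestep (YYYYMM) file: station ID, tmin, tmax, tmean."""
    station_id, tmin, tmax, tmean = line.split()
    return station_id, _value(tmin), _value(tmax), _value(tmean)


def read_timeseries(lines: Iterable[str]) -> List[Tuple]:
    """Parse every non-blank line from an open file or a list of strings."""
    return [parse_timeseries_line(ln) for ln in lines if ln.strip()]


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    rows = read_timeseries([
        "1950 1 -12.30 -2.10 -7.20",
        "1950 2 -99.99 -99.99 -99.99",
    ])
    print(rows)
    print(parse_synoptic_line("USW00024157 -5.00 3.50 -0.75"))
```

To read a real file, pass an open file handle, for example `read_timeseries(open(path))`, where `path` points at one of the files in the `timeseries` directory.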
Climate dataset algorithms
Statistical algorithms library
Relevant papers repository (with links to OA versions where available)
Exposure, instrumentation, and observing practice effects on land temperature measurements, Blair Trewin, WIRES Climate Change, 2010, 1, DOI: 10.1002/wcc.46. pdf
http://www.met.hu/en/omsz/kiadvanyok/idojaras/index.php?id=82 contains links to several methods papers produced as part of COST HOME, including:
HOMER : a homogenization software – methods and applications
Olivier Mestre, Peter Domonkos, Franck Picard, Ingeborg Auer, Stéphane Robin, Emilie Lebarbier, Reinhard Böhm, Enric Aguilar, Jose Guijarro, Gregor Vertachnik, Matija Klancar, Brigitte Dubuisson, and Petr Stepanek
Domonkos, P. 2011: Adapted Caussinus-Mestre Algorithm for Networks of Temperature series (ACMANT). Int. J. Geosci, 2, 293-309, doi: 10.4236/ijg.2011.23032.
Caussinus, H. and Mestre, O.: Detection and correction of artificial shifts in climate series. Appl. Statist., 53, part 3, 405-425, DOI: 10.1111/j.1467-9876.2004.05155.x, 2004.
Szentimrey, T.: Development of MASH homogenization procedure for daily data. Proceedings of the fifth seminar for homogenization and quality control in climatological databases. Budapest, Hungary, 2006; WCDMP-No. 71, 123-130, 2008.
Menne, M. J., Williams, C. N. jr., and Vose, R. S.: The U.S. historical climatology network monthly temperature data, version 2. Bull. Am. Meteorol. Soc., 90, no.7, 993-1007, doi: 10.1175/2008BAMS2613.1, 2009.
Blair Trewin. A daily homogenized temperature data set for Australia. pdf
See also related technical document freely available at http://cawcr.gov.au/publications/technicalreports/CTR_049.pdf
Mestre, Olivier, Christine Gruber, Clémentine Prieur, Henri Caussinus, Sylvie Jourdain, 2011: SPLIDHOM: A Method for Homogenization of Daily Temperature Observations. J. Appl. Meteor. Climatol., 50, 2343–2358.
Venema et al. 2012 discusses benchmarking results for a range of algorithms: OA at http://www.clim-past.net/8/89/2012/cp-8-89-2012.html
Williams et al., 2012 discusses results of applying the US benchmarks to USHCN: Available at ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2/monthly/algorithm-uncertainty/williams-menne-thorne-2012.pdf
Matthews et al., 2012 discusses outcomes of a SAMSI sponsored workshop on uncertainty quantification in climate data record construction. OA at http://journals.ametsoc.org/doi/pdf/10.1175/BAMS-D-12-00042.1
Morice et al., 2012 outlines the uncertainty quantification for HadCRUT4. Available at http://hadobs.metoffice.com/hadcrut4/HadCRUT4_accepted.pdf
There are also examples for other datasets that are not directly land surface air temperature, but whose principles may be transferable:
Mears et al., 2011 discuss uncertainty estimates for upper air temperatures from the satellite Microwave Sounding Unit. Available at http://images.remss.com/papers/rsspubs/Mears_JGR_2011_MSU_AMSU_Uncertainty.pdf . See also http://www.remss.com/measurements/upper-air-temperature#Uncertainty
Thorne et al., 2011 discusses uncertainty estimation, benchmarking and conditional probability recombination of estimation for radiosonde temperatures.
Kennedy et al., 2011 discuss in depth uncertainties in SST measurements. See http://hadobs.metoffice.com/hadsst3/part_1_figinline.pdf and http://hadobs.metoffice.com/hadsst3/part_2_figinline.pdf