Databank structure and availability
The databank is currently being finalized for a version 1 release of monthly temperature holdings, following the acceptance of the methods paper. The current version of the databank (beta4) is available from ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/. The merged monthly databank product is available from ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/monthly/stage3/ and there is a readme describing the merge method here. Any data hosted on the databank server are made available without restriction. Data are, where possible, available in the following four stages:
Analysts wishing to create products from the databank should run their algorithms on at least the recommended merge, both in the most recent version and in the frozen version 1.0.0 (which will be maintained when superseded). This frozen version will form the basis for the analogs created as part of the benchmarking and assessment.
Below is summarized information on the upcoming release. Earlier beta releases can be found at the previous versions page.
Summary of the databank
The first release will consist of a recommended version of the merge of over 50 source decks of data submitted to the databank to date. These sources range from a single station record to compilations of tens of thousands of stations. Most consist of 'raw' data (although unknown processing may have occurred, so it is safer to consider these 'basic' data holdings). Where we know sources have been quality controlled or homogenized, this is explicitly documented in the flags added in Stage 2. Before proceeding to the merge process, some prioritization of the sources is required. The prioritization for the recommended merge places GHCN-D raw, which is the de facto stage 3 daily deck, as top priority. This ensures vertical coherency between daily and monthly holdings, allowing a degree of data archeology to finer temporal scales. Remaining decks are ordered based upon a combination of their provenance, whether the data have been quality controlled or adjusted, and station record length. Priority is given to records that have better provenance (where stage 0 exists), that are 'raw' and that are longer. Some nuancing of the ordering has been performed based upon expert judgement.
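As an illustration only, the ordering described above can be sketched as a sort key. The SourceDeck fields, the deck name "ghcnd-raw" and the strict lexicographic ordering are assumptions made for this sketch; the actual prioritization combines the criteria and is further adjusted by expert judgement, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class SourceDeck:
    # All fields are hypothetical stand-ins for the criteria named in the text.
    name: str
    has_stage0: bool      # better provenance: original (stage 0) records exist
    is_raw: bool          # not known to be quality controlled or homogenized
    typical_years: float  # typical station record length in the deck


def priority_key(deck):
    # False sorts before True, so each "not ..." term puts the preferred decks first.
    return (
        deck.name != "ghcnd-raw",  # GHCN-D raw is always top priority
        not deck.has_stage0,       # then decks with stage 0 provenance
        not deck.is_raw,           # then 'raw' decks ahead of adjusted ones
        -deck.typical_years,       # then longer records
    )


def order_decks(decks):
    # Return a new list with the highest priority deck first.
    return sorted(decks, key=priority_key)
```

The first deck in the returned list would act as the starting master deck, with the remaining decks merged in order.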
The merge is performed pairwise between the master deck and each candidate deck until no data decks remain to be considered.
Between the master deck and each source deck the merge is a two-step process. First, geographical metadata (location, height, station name and station record start date) are compared for each candidate station against all master stations. Stations which are sufficiently similar are flagged for further consideration. If no stations are sufficiently similar, the candidate station is deemed unique and added to the master database. If stations were flagged as potential metadata matches, then data similarity over any overlap periods is considered. Here two decisions are possible: it is the same station and a merge is performed; or it is unique and added to the master holdings as a new station. At all points a third option exists, which is to withhold the candidate station. This could arise from any of: metadata conflicts; data redundancy; or a lack of sufficient confidence to assign the station as either unique or a merge. Effectively this automated process tries to replicate what a human analyst would do, which is not feasible by hand given the several million pairwise comparisons required. The process is first run on max / min records because artefacts may introduce different errors into these two elements. Averages are then calculated and all sources with average temperatures are presented again for a second run through the merge algorithm. For some stations, particularly early in the record, solely monthly average values (calculated in an unknown manner) are available.
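The decision flow described above can be sketched roughly as follows. This is a simplified illustration, not the databank's merge code: the Station class, the hard thresholds (max_km, max_dz, same_below, unique_above) and the use of a mean absolute difference over the overlap are all assumptions, and the comparison of station name and record start date is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Station:
    name: str
    lat: float
    lon: float
    elev_m: float
    data: dict = field(default_factory=dict)  # {(year, month): temperature}


def distance_km(a, b):
    # Great-circle distance between two stations (haversine formula).
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dlon = math.radians(b.lon - a.lon)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def metadata_match(a, b, max_km=10.0, max_dz=100.0):
    # Step 1: hypothetical thresholds on location and height only.
    return distance_km(a, b) <= max_km and abs(a.elev_m - b.elev_m) <= max_dz


def mean_abs_diff(a, b):
    # Step 2: data similarity over the overlap period; None if no overlap.
    common = a.data.keys() & b.data.keys()
    if not common:
        return None
    return sum(abs(a.data[k] - b.data[k]) for k in common) / len(common)


def decide(candidate, master, same_below=0.5, unique_above=2.0):
    matches = [m for m in master if metadata_match(candidate, m)]
    if not matches:
        return "unique", None
    scored = [(mean_abs_diff(candidate, m), m) for m in matches]
    scored = [(d, m) for d, m in scored if d is not None]
    if not scored:
        return "withhold", None  # assumption: no overlap means too little confidence
    best_diff, best = min(scored, key=lambda pair: pair[0])
    if best_diff <= same_below:
        return "merge", best
    if best_diff >= unique_above:
        return "unique", None
    return "withhold", None


def merge_deck(master, candidates):
    # Merge one candidate deck into the master list; return withheld stations.
    withheld = []
    for cand in candidates:
        action, target = decide(cand, master)
        if action == "merge":
            for key, value in cand.data.items():
                target.data.setdefault(key, value)  # master values take precedence
        elif action == "unique":
            master.append(cand)
        else:
            withheld.append(cand)
    return withheld
```

Calling merge_deck once per source deck, in priority order, mirrors the pairwise procedure between the master deck and each candidate deck.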
The first release will consist of just under 32,000 stations. Some graphical summaries are presented below comparing to the GHCNMv3 product currently used as the basis for the land surface temperature products released by NOAA NCDC and NASA GISS.
The merged product is being released as a recommended product and a number of variants. These variants allow analysts to understand the sensitivity of their results to reasonable choices as to how to do the merge.
The code to convert the stage 1 data decks to a common stage 2 format and to create the merged stage 3 holdings is freely available. This is unsupported code made freely available without restriction. Compilers are made available where this is necessary.
The databank follows the version control protocols set out for GHCNv3. Namely:
The formal designation is glsd.x.y.z[optionally -betan].yyyymmdd
This versioning format facilitates documentation and communication of updates and modifications that occur as a normal part of the envisaged life-cycle of the databank. The frequency of regular updates to the databank has yet to be formally determined.
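For analysts who need to record which release they used, a designation of this form can be split into its parts with a regular expression. This is a minimal sketch under stated assumptions: the function name parse_designation and the returned dictionary layout are hypothetical, and the meaning of each of the x, y and z fields is not interpreted here.

```python
import re
from datetime import datetime

# glsd.x.y.z[-betan].yyyymmdd, with the beta suffix optional.
DESIGNATION = re.compile(r"^glsd\.(\d+)\.(\d+)\.(\d+)(?:-beta(\d+))?\.(\d{8})$")


def parse_designation(text):
    # Hypothetical helper: split a designation string into its fields.
    match = DESIGNATION.match(text)
    if match is None:
        raise ValueError(f"not a valid designation: {text!r}")
    x, y, z, beta, stamp = match.groups()
    return {
        "version": (int(x), int(y), int(z)),
        "beta": int(beta) if beta is not None else None,
        # strptime also rejects impossible dates such as 20131340.
        "date": datetime.strptime(stamp, "%Y%m%d").date(),
    }
```

For example, parse_designation("glsd.1.0.0-beta4.20130101") would yield version (1, 0, 0) and beta 4; the date in this example string is made up for illustration.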
How to submit data to the databank
Have data or a lead to data? Please email email@example.com
Want help in approaching data holders? Please feel free to use the data request cover letter.
Want to know what data is required and how to submit it? Please follow the data submission guidelines.
Databank effort publicity
A poster for presentation at the World Climate Research Program's Open Science Conference outlines the state of databank development as of October 2011.
A presentation was given at the 9th International Temperature Symposium while the databank was still under development, so some aspects have been superseded.
A poster describing the beta release was presented at the GFCS User Conference that preceded the WMO Congress Extraordinary Session in October 2012 (the original here is in higher resolution but third party hosted).
Other data sources
Marine surface data, necessary to characterize truly global changes, are available through http://icoads.noaa.gov/.
Databank Working Group Membership (as at 1/2/14)
Jay Lawrimore: NOAA National Climatic Data Center
WMO Region I:
Albert Mhanda (ACMAD, Niger)
WMO Region II:
Vyacheslav Razuvaev, Russian Research Institute of Hydrometeorological Information
Kenji Kamiguchi: Japan Meteorological Agency
Vlad Shaimardanov, Russian Research Institute of Hydrometeorological Information
WMO Region III
Matilde Rusticucci: Univ. of Buenos Aires, Argentina
Madeleine Renom: Universidad de la Republica, Montevideo, Uruguay
Waldenio Gambi Almeida: Instituto Nacional de Pesquisas Espaciais, Centro de Previsão de Tempo e Estudos Climáticos, Brazil
WMO Region IV
Matthew Menne: NOAA National Climatic Data Center
Byron Gleason: NOAA National Climatic Data Center
Jared Rennie: CICS NC, North Carolina State University
Steve Worley: National Center for Atmospheric Research
Colin Morice: UK Met Office
John Christy: University of Alabama Huntsville
WMO Region V
Meghan Flannery: Australian Bureau of Meteorology
WMO Region VI
Albert Klein-Tank: KNMI
Jeremy Tandy: UK Met Office
David Lister: CRU, UEA, UK
Peter Thorne: NERSC, Bergen, Norway
Working Group Documentation
Terms of reference (previous terms of reference)
Terms of reference for databank merge task team