Adding a Benchmark Dataset
The following tutorial builds on the First Steps tutorial by describing how additional datasets may be added to our sample benchmark comparison. We will add a Surface Upward Shortwave Radiation dataset from the central archive of the Baseline Surface Radiation Network (BSRN) at the World Radiation Monitoring Center (WRMC). We have provided a file in the appropriate format here. We suggest that you create a directory called WRMC.BSRN inside the rsus directory and place the downloaded file inside it. The relevant part of the tree is shown here:
DATA/
├── albedo
│   └── CERES
│       └── albedo_0.5x0.5.nc
└── rsus
    ├── CERES
    │   └── rsus_0.5x0.5.nc
    └── WRMC.BSRN
        └── rsus.nc
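If you prefer to script this step, the directory creation and file placement can be sketched as follows. This is a minimal illustration that assumes $ILAMB_ROOT is set in your environment and that the downloaded rsus.nc sits in your current directory; adjust the paths to match your setup.

```python
# Create the WRMC.BSRN directory under DATA/rsus and move the downloaded
# file into it. The ILAMB_ROOT fallback and the filename rsus.nc are
# assumptions for this sketch; adjust them to your own layout.
import os
import shutil

ilamb_root = os.environ.get("ILAMB_ROOT", ".")
target = os.path.join(ilamb_root, "DATA", "rsus", "WRMC.BSRN")
os.makedirs(target, exist_ok=True)

# Move the downloaded file into place, if it is present here.
if os.path.exists("rsus.nc"):
    shutil.move("rsus.nc", os.path.join(target, "rsus.nc"))
```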
To add this dataset to our benchmarks, we only need to add a new entry to sample.cfg under the h2 heading which corresponds to Surface Upward Shortwave Radiation. Here we show only the portion of the configuration file which pertains to this variable, with the new dataset added:
[h2: Surface Upward SW Radiation]
variable = "rsus"
[CERES]
source = "DATA/rsus/CERES/rsus_0.5x0.5.nc"
[WRMC.BSRN]
source = "DATA/rsus/WRMC.BSRN/rsus.nc"
Now if we execute the ilamb-run
script as before:
ilamb-run --config sample.cfg --model_root $ILAMB_ROOT/MODELS/ --regions global
we will see the following output to the screen:
Searching for model results in /home/ncf/sandbox/ILAMB_sample//MODELS/
CLM40cn
Parsing config file sample.cfg...
SurfaceUpwardSWRadiation/CERES Initialized
SurfaceUpwardSWRadiation/WRMC.BSRN Initialized
Albedo/CERES Initialized
Running model-confrontation pairs...
SurfaceUpwardSWRadiation/CERES CLM40cn UsingCachedData
SurfaceUpwardSWRadiation/WRMC.BSRN CLM40cn Completed 1.0 s
Albedo/CERES CLM40cn UsingCachedData
Finishing post-processing which requires collectives...
SurfaceUpwardSWRadiation/CERES CLM40cn Completed 6.4 s
SurfaceUpwardSWRadiation/WRMC.BSRN CLM40cn Completed 6.3 s
Albedo/CERES CLM40cn Completed 6.8 s
Completed in 29.0 s
You will notice that on running the script again, we did not have to perform the analysis step for the confrontations we ran previously. When a model-confrontation pair is run, we save the analysis information in a netCDF4 file. If this file is detected in the setup process, then we will use the results from the file and skip the analysis step. The plotting, however, is repeated.
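The caching behavior described above can be pictured with a small sketch. The file naming and directory layout below are hypothetical illustrations of the idea, not ILAMB's actual conventions.

```python
# Hypothetical sketch of the caching check: if a netCDF4 result file for a
# model-confrontation pair already exists, the analysis step is skipped and
# only the plotting is redone. The path scheme here is illustrative only.
import os

def needs_analysis(build_dir, confrontation, model):
    """Return True when no cached result file exists for this pair."""
    cache = os.path.join(build_dir, confrontation, "%s.nc" % model)
    return not os.path.exists(cache)

# With no build directory present, the analysis must be performed.
print(needs_analysis("_build", "SurfaceUpwardSWRadiation/CERES", "CLM40cn"))
```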
You will also notice that the new rsus
dataset we added ran much
more quickly than the CERES dataset. This is because the new dataset
is only defined at 55 specific sites as opposed to the whole globe at
half degree resolution. Despite the difference in these datasets, the
interface into the system (that is, the configuration file entry) is
the same. This represents an element of our design philosophy: the
benchmark datasets should contain sufficient information so that the
appropriate commensurate information from the model may be
extracted. When we open the WRMC.BSRN
dataset, we detect that the
desired variable is defined over datasites. From this we can then
automatically sample the model results, extracting information from
the appropriate gridcells.
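The extraction described above can be sketched as a nearest-gridcell lookup. This is a simplified illustration of the idea, not ILAMB's actual implementation; the function name and the synthetic field are inventions for this example.

```python
# Simplified sketch: sample a gridded model field at benchmark site
# locations by selecting the nearest grid cell center to each site.
import numpy as np

def sample_at_sites(lat_grid, lon_grid, field, site_lats, site_lons):
    """Return field values at the grid cells nearest to each site."""
    ilat = np.abs(lat_grid[:, None] - site_lats[None, :]).argmin(axis=0)
    ilon = np.abs(lon_grid[:, None] - site_lons[None, :]).argmin(axis=0)
    return field[ilat, ilon]

# Toy example: 0.5-degree cell centers and a synthetic field equal to latitude.
lats = np.arange(-89.75, 90.0, 0.5)
lons = np.arange(-179.75, 180.0, 0.5)
field = np.add.outer(lats, np.zeros_like(lons))

# A site at (10.1, 20.2) picks up the value of its nearest cell center.
vals = sample_at_sites(lats, lons, field, np.array([10.1]), np.array([20.2]))
```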
Weighting Datasets
To view the results of the new dataset, look inside the _build
directory and open a file called index.html
in your favorite web
browser. You should see a webpage entitled ILAMB Benchmark Results
and a series of three tabs, the middle of which is entitled Results
Table. If you click on the row of the table which bears the name
Surface Upward SW Radiation you will see that the row expands to
reveal how individual datasets contributed to the overall score for
this variable. Here we reproduce this portion of the table.
| Dataset | CLM40cn |
|---|---|
| Surface Upward SW Radiation | 0.77 |
| CERES (50.0%) | 0.79 |
| WRMC.BSRN (50.0%) | 0.74 |
The values you get for scores may vary from this table, as our scoring methodology is in flux while we develop and hone it. The main point here is that we have weighted each dataset equally, as seen in the percentages listed after each dataset name. While this is a reasonable default, as you add datasets it is unlikely that you will have equal confidence in their quality. To address this, we provide a method of weighting datasets in the configuration file. For the sake of demonstration, let us assume that we are five times as confident in the CERES data. We can express this by modifying the relevant section of the configuration file:
[h2: Surface Upward SW Radiation]
variable = "rsus"
[CERES]
source = "DATA/rsus/CERES/rsus_0.5x0.5.nc"
weight = 5
[WRMC.BSRN]
source = "DATA/rsus/WRMC.BSRN/rsus.nc"
weight = 1
and then running the script as before. This will run quickly, as a mere change of weights does not require reanalysis. Once the run is complete, reopen or reload _build/index.html and navigate to the same section of the results table. You should see the change in weight reflected in the percentages as well as in the overall score for the variable.
| Dataset | CLM40cn |
|---|---|
| Surface Upward SW Radiation | 0.78 |
| CERES (83.3%) | 0.79 |
| WRMC.BSRN (16.7%) | 0.74 |
You may notice that if you apply the weighting by hand to the scores printed in the table, you appear to get a different result. This is because the HTML table output is rounded for display purposes, while the scores are computed and weighted in full precision.
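To make the arithmetic concrete, here is the weighted average computed from the displayed (rounded) scores. Because ILAMB's true scores carry more digits than the table shows, its overall value can differ from this hand computation in the last decimal.

```python
# Weighted average of the per-dataset scores from the table above,
# using the 5:1 weights set in the configuration file.
scores  = {"CERES": 0.79, "WRMC.BSRN": 0.74}
weights = {"CERES": 5.0,  "WRMC.BSRN": 1.0}

total   = sum(weights.values())
overall = sum(scores[k] * weights[k] for k in scores) / total
print(round(overall, 2))  # 0.78
```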