Custom Confrontations¶
The Confrontation object we described in the previous tutorial is implemented as a Python class. This tutorial will assume you have some familiarity with Python and classes. We will try to make the concepts easy to follow, but if you find that you need to learn about class basics, we recommend the official Python documentation on classes.
In this tutorial we will explain the implementation of a custom Confrontation by way of example. We will detail the code we use in the ILAMB system for benchmarking the global net ecosystem carbon balance. The generic Confrontation will not work in this case because:
1. There is no variable in the model outputs which directly compares to the benchmark datasets; the variable nbp must be integrated over the globe for it to be comparable.
2. The analysis we want to perform is different from our standard mean state analysis. We will compare the bias and RMSE of the integrated quantity, but we would also like to view the accumulation of carbon over the time period.
So we have some special-purpose code to write. We will present the implementation bit by bit and explain each function and section. However, if you are following along and implementing the class as you read, we recommend you look at the original source, which may be found on our GitHub site, because the indentation gets shifted in the generation of this document. We will also omit the documentation strings and imports here to keep the code short.
The Constructor¶
The first thing we will do is define a new class ConfNBP which will be derived from the Confrontation base class. This means that all the methods and member data of the Confrontation class will be part of ConfNBP automatically. This is helpful, as it means that the developer only needs to rewrite the functions that must behave differently in their benchmark. So we define our class by writing:
class ConfNBP(Confrontation):

    def __init__(self, **keywords):

        # Ugly, but this is how we call the Confrontation constructor
        super(ConfNBP, self).__init__(**keywords)

        # Now we overwrite some things which are different here
        self.regions = ['global']
We place this class in a file which bears the name of the class itself, ConfNBP.py. The __init__ function is what is known as the constructor. A class can be thought of as a template, and the constructor is the function which runs when a new instance is created. If we were to type:
a = Confrontation()
b = Confrontation()
we would be creating two instances (a and b) of the Confrontation class, and the constructor would run separately for each of them. The constructor for the Confrontation class takes in keywords as arguments. This means that instead of requiring users to place arguments in a defined order, we allow them to specify arguments by their names. We did this in the previous tutorial, when we initialized a Confrontation in the following way:
c = Confrontation(source   = os.environ["ILAMB_ROOT"] + "/DATA/rsus/CERES/rsus_0.5x0.5.nc",
                  name     = "CERES",
                  variable = "rsus")
The keywords we used here were source, name, and variable. You have a lot of control over what a Confrontation does via these keywords; a full list of them is available in the documentation. For the most part, we want to use the Confrontation constructor as it is, and so we could just leave the __init__ function unimplemented. However, one of the keywords of the Confrontation constructor is not valid in our benchmark: the regions keyword. This keyword lets a user specify a list of GFED regions over which the analysis will be performed. In the case of our ConfNBP, this is not a valid option because the benchmark data is integrated over the globe.
For this reason, we implement our own __init__ function in which we manually call the constructor of the Confrontation class. This is handled by using the Python built-in super, which gives us access to the parent class of our ConfNBP object and allows us to call its constructor explicitly. After this constructor has run, we simply overwrite the value of the regions member data with the only valid value.
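With the constructor in place, a ConfNBP is created just like a Confrontation. The snippet below is only an illustration: the file path and name are hypothetical and do not refer to an actual dataset location, but it shows that regions ends up as ['global'] no matter what the user passes.

# Hypothetical usage: the source path and name below are illustrative only
c = ConfNBP(source   = os.environ["ILAMB_ROOT"] + "/DATA/nbp/SomeBenchmark/nbp.nc",
            name     = "SomeBenchmark",
            variable = "nbp")
print(c.regions)   # ['global'], even if a regions keyword had been passed in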
Staging Data¶
We need to implement our own stageData functionality as models do not provide us with the integrated nbp directly. We will go over its implementation here in pieces. First, we get the observational dataset:
def stageData(self, m):

    # get the observational data
    obs = Variable(filename       = self.source,
                   variable_name  = self.variable,
                   alternate_vars = self.alternate_vars)
You will see first that the function signature has a self argument; this will be true of all member functions of our class. It is a special argument which is used to access the member data of the class itself. The second argument is m, which is an instance of a ModelResult. Just below, we use the Variable constructor to extract the source data for this benchmark using member data of our class. The member data source refers to the full path of the benchmark dataset, variable is the name of the variable to extract from within it, and alternate_vars is a list of alternate variable names which we can accept. By convention, we use the name obs to refer to the returned Variable. Next, we need to extract the same data from the model:
    # the model data needs to be integrated over the globe
    mod = m.extractTimeSeries(self.variable,
                              alt_vars = self.alternate_vars)
    mod = mod.integrateInSpace().convert(obs.unit)
    obs,mod = il.MakeComparable(obs,mod,clip_ref=True)

    # the sign convention is backwards
    obs.data *= -1.
    mod.data *= -1.

    return obs,mod
Here we use the ModelResult instance m to extract the variable and immediately integrate it over all space. We also ensure that the units match the observations and, again by convention, refer to the result in a variable we call mod. Then we use the function MakeComparable to ensure that both the obs and mod variables are on the same time frame, trimming away the non-overlapping times. Finally, we multiply the data associated with the observations and models by negative one because of an unwanted sign convention.
The main concept of the stageData function is that you are passed a ModelResult and you need to return two Variables which represent comparable quantities from the observational and model datasets. The ILAMB system does not care how you arrive at these quantities. Here we have used more of the ILAMB package to create the quantities we wish to compare; however, you may prefer to use other tools or even interface with more complex methods of extracting the relevant information. The ILAMB package simply defines an interface which makes the results of such data manipulation usable in a consistent system.
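To make this contract concrete, here is a minimal sketch of an alternative stageData. The helpers load_my_benchmark and load_my_model are made-up names standing in for whatever tools you prefer, and the exact Variable keyword arguments shown should be checked against the Variable documentation; the only point is the shape of the function: take a ModelResult, return two comparable Variables.

def stageData(self, m):
    # Hypothetical helpers: obtain time and data arrays sharing a common
    # unit using whatever I/O or postprocessing tools you like.
    t, obs_data, unit = load_my_benchmark(self.source)          # made-up helper
    t, mod_data       = load_my_model(m, self.variable, unit)   # made-up helper

    # Wrap the arrays as ILAMB Variables so the rest of the system can use them
    obs = Variable(data = obs_data, unit = unit, time = t, name = self.variable)
    mod = Variable(data = mod_data, unit = unit, time = t, name = self.variable)
    return obs, mod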
Confront¶
We also need to implement our own confront functionality. This is because most of our mean state analysis is not relevant for our benchmark, and we would like to study the accumulation of carbon, which is not part of the standard procedure. As before, we will break up the confront function we implemented and explain it in sections:
def confront(self, m):

    # Grab the data
    obs,mod = self.stageData(m)
As with the stageData function, the confront function takes in a ModelResult instance m and immediately calls the stageData function we just implemented. The observational dataset and model result are returned as Variables, named obs and mod respectively. For both datasets, we want to study the accumulated amount of carbon over the time period:
    obs_sum = obs.accumulateInTime().convert("Pg")
    mod_sum = mod.accumulateInTime().convert("Pg")
as well as compare the mean values over the time period:
    obs_mean = obs.integrateInTime(mean=True)
    mod_mean = mod.integrateInTime(mean=True)
and then the bias and RMSE:
    bias = obs.bias(mod)
    rmse = obs.rmse(mod)
The functions accumulateInTime, convert, integrateInTime, bias, and rmse are all member functions of the Variable class. You can see that this keeps the analysis clean, short, and human readable. This handles the majority of the analysis which we want to perform in this confrontation. However, the ILAMB system is geared towards determining a score from the analysis results. In this case, we will score a model based on the bias and the RMSE in the following way:
\[S_{\text{bias}} = e^{-\left| \frac{\int \left(obs(t) - mod(t)\right)\ dt }{\int obs(t)\ dt } \right|}\]

\[S_{\text{RMSE}} = e^{-\sqrt{ \frac{\int \left(obs(t) - mod(t)\right)^2\ dt }{\int obs(t)^2\ dt } }}\]
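As a quick, standalone sanity check of these formulas (the snippet below is not part of ConfNBP; it uses synthetic arrays, and on a uniform time grid the time step cancels in both ratios so the integrals can be approximated with plain sums), a perfect model gives both scores a value of one, and the scores decay toward zero as the mismatch grows:

import numpy as np

# Standalone check of the score formulas on synthetic data
t   = np.linspace(0., 10., 100)
obs = np.sin(t) + 2.0     # synthetic "observation"
mod = obs + 0.1           # synthetic "model" with a small constant offset

bias_score = np.exp(-np.abs(np.sum(obs - mod) / np.sum(obs)))
rmse_score = np.exp(-np.sqrt(np.sum((obs - mod)**2) / np.sum(obs**2)))
print(bias_score, rmse_score)   # both near 1; an identical model would give exactly 1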
Within confront, this scoring is accomplished in the following way:
    obs_L1 = obs.integrateInTime()
    dif_L1 = deepcopy(obs)
    dif_L1.data -= mod.data
    dif_L1 = dif_L1.integrateInTime()
    bias_score = Variable(name = "Bias Score global",
                          unit = "1",
                          data = np.exp(-np.abs(dif_L1.data/obs_L1.data)))
for the bias score and:
    obs_L2 = deepcopy(obs)
    obs_L2.data *= obs_L2.data
    obs_L2 = obs_L2.integrateInTime()
    dif_L2 = deepcopy(obs)
    dif_L2.data = (dif_L2.data-mod.data)**2
    dif_L2 = dif_L2.integrateInTime()
    rmse_score = Variable(name = "RMSE Score global",
                          unit = "1",
                          data = np.exp(-np.sqrt(dif_L2.data/obs_L2.data)))
for the RMSE score. The code here is a bit uglier than the previous sections and reflects ways in which the Variable object still needs to grow. At this point the analysis results are finished and we are ready to save them into result files. First, we will rename the variables in the following way:
    obs     .name = "spaceint_of_nbp_over_global"
    mod     .name = "spaceint_of_nbp_over_global"
    obs_sum .name = "accumulate_of_nbp_over_global"
    mod_sum .name = "accumulate_of_nbp_over_global"
    obs_mean.name = "Period Mean global"
    mod_mean.name = "Period Mean global"
    bias    .name = "Bias global"
    rmse    .name = "RMSE global"
We rename the variables because the ILAMB plotting and HTML generation engine is built to recognize certain keywords in the variable name and subsequently render the appropriate plots. Since our obs and mod variables represent spatial integrals of nbp, we name them with the keyword spaceint. The accumulate keyword will also cause a plot to be automatically generated and placed in a predetermined location in the HTML output. This feature makes the presentation of results trivial. The scalar quantities are also renamed such that their names reflect the table headings of the HTML output.
Finally, we dump these variables into netCDF4 files. The first file corresponds to the current model being analyzed. We open a dataset which will be saved into a logical path with a descriptive name. The Variable class has support for simply asking that an instance be dumped into an open dataset; any dimension information and units are recorded automatically:
    results = Dataset("%s/%s_%s.nc" % (self.output_path,self.name,m.name),mode="w")
    results.setncatts({"name" :m.name, "color":m.color})
    mod       .toNetCDF4(results)
    mod_sum   .toNetCDF4(results)
    mod_mean  .toNetCDF4(results)
    bias      .toNetCDF4(results)
    rmse      .toNetCDF4(results)
    bias_score.toNetCDF4(results)
    rmse_score.toNetCDF4(results)
    results.close()
We also write out information from the benchmark dataset. However, since confrontations can be run in parallel, only the confrontation which is flagged as the master needs to write this output:
    if self.master:
        results = Dataset("%s/%s_Benchmark.nc" % (self.output_path,self.name),mode="w")
        results.setncatts({"name" :"Benchmark", "color":np.asarray([0.5,0.5,0.5])})
        obs     .toNetCDF4(results)
        obs_sum .toNetCDF4(results)
        obs_mean.toNetCDF4(results)
        results.close()
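If you would like to verify what was written, the result files are ordinary netCDF4 files and can be opened directly. The example below is only illustrative: the file name is hypothetical, standing in for whatever the output_path/name_model.nc pattern above produces.

from netCDF4 import Dataset

# Illustrative only: open one of the result files written above and list
# the variables that were stored by the toNetCDF4 calls.
with Dataset("./ConfNBP_MyModel.nc") as results:   # hypothetical file name
    print(results.ncattrs())          # global attributes such as name and color
    print(list(results.variables))    # the renamed variables written above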
That is it¶
While more involved than simply adding a dataset or model result to the analysis, that is all we need to implement for our custom confrontation. As you can see, we managed to encapsulate all of the relevant code into one file which interfaces seamlessly with the rest of the ILAMB system. In the case of ConfNBP.py, we have included it in the main repository for the ILAMB package. However, users may create their own confrontations and host/maintain them separately for use with the system. We see this as a first step towards a more general framework for community-driven benchmarking.