This section provides a step-by-step guide to running the PACTA for
Supervisors analysis using the pacta.multi.loanbook
package. It includes information on the structure of the workflow, the
required functions, and the interpretation of the results.
The PACTA for Supervisors analysis consists of four main steps:
config.yml file to generate the
production-based alignment analysis.The following diagram illustrates the structure of the workflow:
Fig. 1: Structure of the Workflow
As the diagram shows, there is a logical sequence to how to run the
functions. For any of the functions to work, the previous functions must
have been run already and their outputs must be accessible as inputs to
the next functions. If you want to keep different versions of the
calculations, i.e. you want to avoid overwriting past outputs, you will
have to (1) ensure that each run is done with a new value for the
corresponding output directory set in the config.yml and
(2) that the relevant function refers to the appropriate directories of
upstream outputs. For example, if you want to run the analysis twice and
keep both results, all dir_* entries of the
config.yml should remain identical for both runs, except
for the dir_analysis entry, which should be different for
each run.
The following sub sections will provide detailed information on each
of the steps of the analysis, starting with a brief explanation of the
setup, as each of the functions will require the path to the
config.yml file as an input argument.
If you run PACTA for Supervisors interactively or from a script you
may have prepared, you will likely want to load the
pacta.multi.loanbook package and save the path to the
config.yml file in a variable first:
This allows you passing the relevant config information easily to each of the four main functions.
The first step of the analysis is to prepare your input data sets for
the requirements of the analysis. Your ABCD data will need to be
prepared and you can optionally use a custom sector split, that will
also need to be prepared. The relevant function is
prepare_abcd(), which takes configurations from the
config.yml that you have prepared. The function will store
intermediary files in the directory that you have indicated as the value
corresponding to the key dir_prepared_abcd in the
config.yml. This step only has to be run once for an
analysis. You can run this function as follows:
prepare_abcd() functionThe prepare_abcd() function has a number of options that
can be set in the config.yml file. These options
include:
remove_inactive_companies:
whether or not inactive companies should be removed from the ABCD data
(For more information on the options available, see the relevant
section on preparing the ABCD in the
vignette("config_yml").)sector_split: if
and how a company sector split should be applied in the calculations
(For more information on the options available, see the relevant
section on the sector split in the
vignette("config_yml")). Additionally, see the
documentation of the sector split methodology in
vignette("sector_split").If you want to use the sector split, you can specify which company
identifiers the split should be applied on by providing a CSV file with
the company identifiers in the split_company_ids.csv file
in the input directory. The file should contain the columns
company_id and name_company to identify the
relevant companies. Before deciding to apply the sector split, it is
strongly recommended to read the documentation on the sector split in
vignette("sector_split") first.
The next step in the analysis is to run the matching process.
Assuming you have prepared the raw loan books as explained in the
section on preparing the input data sets, you can now use the
match_loanbooks() function. This will read the raw loan
books from your inputs and attempt to match them to the prepared ABCD
data from the previous step. The function will store matched loan book
files in a directory that you have indicated as the value corresponding
to the key dir_matched_loanbooks in the
config.yml. You can run this function as follows:
After the matching process is complete, you will need to do some manual matching. This means that you will need to manually inspect the suggested matches that the tool has found and decide which ones to keep or to remove. This is especially important when using text based matching, as there is no guarantee that similar company names as identified by the algorithms will actually refer to the same companies in the raw loan books and the ABCD. Thus, a manual validation step is crucial in the analysis, as the quality of the matches will determine the quality of the results of any further calculations.
The manual matching process is not automated and will require some
time and effort on your part. You can find the matched loan books in the
directory that you have specified as the
dir_matched_loanbooks parameter in the
config.yml file. The matched loan books will be stored
in CSV files, one for each raw loan book. You can open these files in a
spreadsheet program to verify the matches. Importantly, you will need to
make a copy for each of the matched loan book files in the same
directory and rename that copy by adding the suffix _manual
to the file name. The following steps of the analysis expect this
pattern, so it is important to follow this naming convention.
You can find more detailed information about the matching process in the training material on the PACTA for Banks website in the section “PACTA for Banks Training Webinar 2” and in the corresponding slide deck.
config.yml file, to see if you can improve the match
success rate. The match success rate can be obtained based on the
manually validated matched loan books and the raw loan books as
described in the
next section on prioritization and diagnostics.match_loanbooks() functionThe match_loanbooks() function has a number of options
that can be set in the config.yml file. These options
include:
params_match_name:
multiple options to specify the approach to matching the raw loan book
with the ABCD relevant
section on matching in the vignette("config_yml")).
Note that these parameters are all based on the
r2dii.match::match_name function and pass the parameters
directly to that function. For more information on the options
available, see the documentation
of the r2dii.match package. This also covers matching based on
unique identifiers, which is the most reliable way to match companies,
but requires that both the raw loan books and the ABCD contain such
identifiers.manual_sector_classification:
whether to use a manually prepared sector classification system for
matching the loan books to in-scope PACTA sectors, see the relevant
section on matching in the vignette("config_yml")), or
not. If there is no need to use a manually prepared sector
classification file, the sector classification systems provided in
r2dii.data::sector_classifications can be used, which
currently cover the following sector classifications: GICS, ISIC, NACE,
NAICS, PSIC, SIC. If it is not possible to map the loans in your loan
books to any of these systems, you can prepare your own mapping file
that follows the same structure as the sector classification files in
r2dii.data::sector_classifications and use the config file
to instruct the code to use this file for matching. Note that this will
only be a promising approach if the classifications you are using are
sufficiently granular to map to PACTA sectors without excessive
ambiguity.There are two ways to appropriately handle misclassified loans that are identified as in-scope in the raw data set but are then not matched.
loans_to_remove.csv to the input
directory. This file should include the columns id_loan and
group_id to indicate the precise mis-classified loan and
the loan book in which it was found. This combination of loan and loan
book will then be excluded from the match success calculation.The reason why it is a good idea to either correct mis-classified loans or disregard them in the calculation of the match success rate is that a mis-classified loan cannot possibly be matched in a given sector. Therefore, no amount of work would be sufficient to improve the sector match success rate, because it is calculated against an incorrect baseline. Technically, the user is not forced to correct misclassifications, and there may be a limit to how much time should be spent on this, but it is recommended to at least correct large mis-classified loans.
If you want to apply the sector split to the loan books, you should keep all relevant sectors in the matched loan book, instead of only one sector. This is because the sector split will be applied to the matched loan books, and the sector split will be based on the sectors in the matched loan books. If you only keep one sector in the matched loan books, the sector split will not be applied correctly and may wrongly appear to reduce overall matched financial exposure. The sector split will be applied to the matched loan books in the next step of the analysis.
The next step is to prioritize the manually verified matched loan books and analyze their coverage, both relative to the raw loan book inputs (the “match success rate”) and to the production capacity in the wider economy (the “loan book production coverage”). Prioritizing the loan books means that you will only keep the best identified match for each loan and use that in the following steps of the analysis.
You will probably want to check the status of your loan book and
production coverage several times, as it is rare to get to the desired
level of matching in one iteration (see the corresponding “Coverage
Diagnostics” section in the next chapter for more details on how to
interpret the coverage values). This means you may want to repeat the
previous step (matching the loan books,
likely using different parameters for different iterations) and this
step (prioritizing the matched loan books and analyzing their match
success rate) a number of times to reach the best possible outcome. To
prioritize your matched loan books and calculate the coverage
diagnostics, you will use the prioritise_and_diagnose()
function. This call will store matched prioritized loan book files and
coverage diagnostics in a directory that you have indicated as the value
corresponding to the key
dir_prioritized_loanbooks_and_diagnostics in the
config.yml. You can run the function as follows:
prioritise_and_diagnose() functionThe prioritise_and_diagnose() function has a number of
options that can be set in the config.yml file. These
options include:
priority: the
option to set a specific order for prioritizing the matches. This is an
option that is passed directly to the
r2dii.match::prioritize function. NULL is a
valid default value and is usually a setting that works well, at least
as a starting point. For more information, see the relevant
section on the prioritization of matched loan books in the
vignette("config_yml") or the documentation
of the r2dii.match::prioritize() function here.by_group: by
which variables to group the loan books to produce grouped results of
the analysis. This parameter is used across multiple steps of the
analysis, both in the diagnostics and in the analysis. This is because
it slices and/or aggregates the loan books such that the analysis will
produce results along the indicated dimension. If no by_group
parameter is passed (i.e. NULL), all loan
books will be aggregated. Otherwise, loan books can either be kept
separate (group_id) or
grouped by any other variable that is provided in each of the raw loan
books. Although by_group is
considered a project parameter mainly relevant to the main section of
the analysis it does affect the split of the prioritzed loan books and
how their coverage metrics are returned, so it is good to be aware of
this parameter at this point. See the relevant
section on the by_group parameter in the documentation of
the config.yml file.The final step is running the analysis based on the parameters you
have set in the config.yml file. This entails both a
standard PACTA for Banks analysis and the calculation of the net
aggregate alignment metric. For both parts of the analysis, outputs will
be stored in the sub-directories ../standard/ (for standard
PACTA for Banks results) and ../aggregated/ for the net
aggregate alignment metric directory - below the directory that you have
indicated as the value corresponding to the key
dir_analysis in the config.yml. Outputs in
these sub directories will comprise tabular outputs and plots. To run
the analysis on all of your previously matched and prioritized loan
books, you will use the analyse() function as follows:
analysis() function and the overall
analysisThe analysis() function has a number of options that can
be set in the config.yml file. These options include:
scenario_source:
which source should be used for allocating climate transition scenario
pathways to the companies and loan books. This refers to the relevant
scenario publication and usually contains the name and the year of the
publication, e.g.: "weo_2023"
or "geco_2023".scenario_select:
which scenario should be used for reference in the net aggregate
alignment metric. This must be a scenario that is included in the scenario_source
indicated above.region_select:
which region to use as a reference for the analysis. This will filter
the underlying production capacity to assets in the relevant region and
will measure alignment against the scenario trajectory for the relevant
region. It must therefore be a region, for which scenario data is
available in the source selected above. Note that usually, "global"
is also a valid region.start_year: the
start year of the analysis. This must be a year that is available both
in the ABCD data and for which the scenario data has been prepared. The
loan book data is assumed to be a snapshot of the end of the same
year.time_frame: the
time frame of the analysis, which refers to the number of forward
looking years after the start year that are to be considered in the
alignment analysis. Usually this time frame is set to 5 years.
Specifically, it must be a time frame for which scenario data values and
ABCD data values are available for all sectors that are to be analyzed.
There are not many cases, in which it is expected to change the time
frame to something else than its default value of 5 years.by_group: by
which variables to group the loan books to produce grouped results of
the analysis. This parameter is used across multiple steps of the
analysis, both in the diagnostics and in the analysis. This is because
it slices and/or aggregates the loan books such that the analysis will
produce results along the indicated dimension. If no by_group
parameter is passed (i.e. NULL), all loan
books will be aggregated. Otherwise, loan books can either be kept
separate (group_id) or
grouped by any other variable that is provided in each of the raw loan
books.All these options are documented in more detail the section
on project parameters in the
vignette("config_yml").
Usually, it will be interesting to run the analysis for more than one
by_group
value, possibly also for multiple combinations of the other
parameters. You will therefore have to run the analysis as many times as
there are combinations of interest that you wish to generate results
for. If you do this, you should take into account that running the same
pacta.multi.loanbooks function multiple times with
different parameters will overwrite the results of the previous run if
you do not use new output directories for each run. This is why it is
recommended to set
up a new output directory in the config.yml file for
each run of the analysis, if you want to keep the results of multiple
runs so that you can compare the outcomes based on different parameters.
The last chapter “Advanced
Use Cases” describes how you could go about that process in more
detail. However, it is recommended going through the standard process of
the analysis completely once, before approaching more advanced use
cases.
PREVIOUS CHAPTER: Preparatory Steps
NEXT CHAPTER: Interpretation of Results