updating readme documentation

Former-commit-id: 64fe286c4ab46b7255d8b6eeea3f6eba8b618858
Michael T. Kelbaugh
2020-03-13 17:47:06 -04:00
parent 02da9529b7
commit 683d47ef27
5 changed files with 100 additions and 39 deletions


@@ -54,22 +54,18 @@ Log out and back in for this to take effect.
## [Visualization](viz/README.md)
SIMoN stores all of the data outputs from the models as documents in a Mongo database (the `simon_mongodb` container).
Use the scripts in the `viz` directory to create a choropleth map visualization of the model data.
```
cd viz/
./export.sh <model_name> <year> <doc_name>.json
pip install -r requirements.txt
python plot.py --data <doc_name>.json
```
A new HTML file will be created. Open this file in a web browser to display the Bokeh visualization.
![precipitation](viz/demo/2035_precipitation.png)
## [Architecture](broker/README.md)
SIMoN is written in Python, and uses Docker to manage its models and their integration. Each model runs in its own Docker container. An additional container hosts the system's centralized broker, which orchestrates model runs and shares data among models using a ZeroMQ publish-subscribe messaging pattern.
The Docker containers used for the broker and the models are built from the [Ubuntu 18.04](https://hub.docker.com/_/ubuntu/) image, with the [Python 3.6](https://packages.ubuntu.com/bionic-updates/python3-dev) package layered on top. The container used for the database is built from a [MongoDB image](https://hub.docker.com/_/mongo/).


@@ -2,7 +2,7 @@
## Description
SIMoN is written in Python, and uses Docker to manage its models and their integration. In order to increase flexibility and scalability, each model runs in discrete iterations (called increment steps) within its own Docker container. An additional container hosts the system's centralized Broker, which orchestrates model runs by receiving each model's data outputs via a ZeroMQ publish-subscribe messaging pattern, then redirecting the data to any models that request it with their input schemas. The models can then use this data as their inputs for the next incremental step in the system's synchronized run.
Upon the initialization of a SIMoN run, the broker publishes status messages to the models. Each model connects to the broker, bootstraps on the initialization data provided in its `config` directory, publishes its output data, then waits for other models to do the same. Once all models have received their necessary data inputs from the published data outputs of other models (from the previous iteration), they will perform their next iteration. In this way, models will run in tandem. Once the final iteration has completed, the Broker and the model containers will close down.
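This publish-subscribe exchange can be pictured with a minimal `pyzmq` sketch. The socket addresses, port numbers, and message contents below are illustrative assumptions, not the Broker's actual protocol:
```
import zmq

# a model's side of the exchange, greatly simplified
context = zmq.Context()

from_broker = context.socket(zmq.SUB)             # receive status messages and published data
from_broker.connect("tcp://broker:5555")          # hypothetical broker address
from_broker.setsockopt_string(zmq.SUBSCRIBE, "")

to_broker = context.socket(zmq.PUB)               # publish this model's outputs
to_broker.connect("tcp://broker:5556")

inputs = from_broker.recv_json()                  # wait for the data this model depends on
outputs = {"incstep": inputs["incstep"] + 1}      # placeholder for the model's real calculation
to_broker.send_json(outputs)                      # other models can now consume these outputs
```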
@@ -13,3 +13,10 @@ The Inner Wrappers are interfaces tailored to each model, and support the models
## Configuration
Adjust parameters in the `build/config.json` file. A sample configuration sketch follows the list below.
* `mongo_port` is the port that the MongoDB container will use. The default Mongo port is 27017.
* `boot_timer` is the number of seconds that the broker will wait for all models to initialize, before it sends the shutdown signal. Extend this time if your models take longer to load and process their configuration data in the custom `configure()` method in their inner wrappers.
* `watchdog_timer` is the number of seconds that the broker will wait to receive a status message from a model, before it sends the shutdown signal. If a model crashes, the broker will wait for this number of seconds before stopping the SIMoN run.
* `max_incstep` is the number of increments that the SIMoN run should perform before closing down.
* `initial_year` is the year corresponding to the configuration data (increment step 0).
* `models` lists the ID / unique name of each model that will be included in the SIMoN run.
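For illustration, a `build/config.json` along these lines would run the five example models; the numeric values below are placeholders, not recommended settings:
```
{
    "mongo_port": 27017,
    "boot_timer": 60,
    "watchdog_timer": 120,
    "max_incstep": 10,
    "initial_year": 2016,
    "models": ["population", "power_demand", "power_supply", "water_demand", "gfdl_cm3"]
}
```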


@@ -1,24 +1,41 @@
# SIMoN Granularity Graph Tool
Copyright 2020 The Johns Hopkins University Applied Physics Laboratory
Licensed under the MIT License
## Description
This tool constructs the granularity graphs used for data translation in the SIMoN software application.
## Granularities
A key difficulty in combining models is resolving their data dependencies and discrepancies. By using the SIMoN software, a modeler is able to join models with disparate geographic definitions together in various combinations, allowing models to run together and exchange data that have heterogeneous definitions of geography.
SIMoN currently integrates models of population, power systems, water systems, and climate change. These domains each have their own hierarchies of geography, which include political, topographical, regulatory, and latitude-longitude grid boundaries.
In order to translate data from its models across granularities, SIMoN uses shapefiles to define rigorous geographies in a partially ordered set of geographic partitions (e.g., counties, watersheds, power regions, and latitude-longitude squares). The sample shapefiles provided in the `graphs/shapefiles` directory were clipped to the land boundary of the contiguous United States, in order to have consistent scope. The geometries were compressed / simplified using a distance-based method (the Douglas-Peucker algorithm) with a tolerance of 1 kilometer. They use [EPSG:3085](https://epsg.io/3085-1901) NAD83(HARN) / Texas Centric Albers Equal Area as their coordinate reference system.
SIMoN creates a corresponding directed acyclic network graph representing all the granularities, their corresponding entities, and their relationships to each other. The individual models feed each other updated data inputs at synchronized time intervals, and traverse the network graph to translate their data from one granularity to another. A sample granularity graph is provided, but modelers can extend it or create a graph of their own, by modifying and using the `graphs/build.py` script.
The granularities in the provided granularity graph are:
* `usa48` (a single region for the contiguous United States)
* `state` (49 regions: the lower 48 states plus Washington, DC)
* `county` (3108 counties, including Washington, DC)
* `nerc` (22 North American Electric Reliability Corporation regions)
* `huc8` (2119 HUC 8 regions)
* `latlon` (209 latitude-longitude grid squares)
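As a toy illustration of such a graph (the real instance graph also carries the individual entities within each granularity), the granularity names above can be arranged in a small directed acyclic graph using the `networkx` package; the edge structure shown here is an assumption made for the sketch, not the provided graph:
```
import networkx as nx

# illustrative edges only: each edge points from a coarser granularity to a finer one
G = nx.DiGraph()
G.add_edges_from([
    ("usa48", "state"),
    ("state", "county"),
    ("usa48", "nerc"),
    ("usa48", "huc8"),
    ("usa48", "latlon"),
])

assert nx.is_directed_acyclic_graph(G)
print(nx.shortest_path(G, "usa48", "county"))   # a translation route from national to county data
```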
## Aggregators and Disaggregators
The modeler can choose transformation functions, called aggregators and disaggregators, to translate data between compatible geographic definitions in various ways. These aggregators and disaggregators must conform to a set of mathematical axioms, including a partial inverse property, which are designed to create a provable notion of data consistency and reduce the possibility of self-propagating errors.
Aggregators are functions used to combine data from sibling nodes to their parent node. Disaggregators are functions used to distribute data from a node to its children. A rough sketch of both kinds of functions appears after the lists below.
Aggregators:
* `simple_sum`: the values of the sibling nodes are added together. The sum is the new value of their parent.
* `simple_average`: the parent's new value is the mean of the children's values.
* `weighted_average`: the parent's new value is the mean of the children's values, weighted by each child's geographic area.
Disaggregators:
* `distribute_identically`: the parent node's value is assigned to each one of its children.
* `distribute_uniformly`: the parent node's value is divided evenly among each of its children.
* `distribute_by_area`: each child node is assigned a portion of the parent's value, proportional to the child's geographic area.
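The sketch below shows roughly what these functions compute, written against plain dictionaries of `{region_id: value}` and `{region_id: area}`; it illustrates the definitions above and is not the code SIMoN actually runs:
```
def simple_sum(child_values):
    # the parent's new value is the sum of its children's values
    return sum(child_values.values())

def weighted_average(child_values, child_areas):
    # the parent's new value is the area-weighted mean of its children's values
    total_area = sum(child_areas.values())
    return sum(child_values[c] * child_areas[c] for c in child_values) / total_area

def distribute_identically(parent_value, children):
    # every child receives the parent's value unchanged
    return {c: parent_value for c in children}

def distribute_by_area(parent_value, child_areas):
    # every child receives a share of the parent's value proportional to its area
    total_area = sum(child_areas.values())
    return {c: parent_value * child_areas[c] / total_area for c in child_areas}
```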
## Basic Usage
This tool uses shapefiles to generate two JSON graphs: an abstract graph and an instance graph.
@@ -29,7 +46,7 @@ This will start the `simon-graph` Docker container, and create an abstract graph
## Advanced Usage
Adjust these parameters in the `graphs/config.json` file (a sample sketch follows the list):
* `projection` is the EPSG coordinate reference system code that all of the shapefile polygons will be translated to, in order to ensure consistency. For the most precise results, use the original EPSG of the shapefiles.
* `scale_factor` divides the area of each shapefile's polygons by a scalar, in order to use better units. For example, the provided shapefiles have length units of meters and area units of square meters. The default scale_factor is 1 million, in order to translate the area unit of the provided shapefiles from square meters to square kilometers. Change the scale factor to 1 to preserve the original units.
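As an illustration, a `graphs/config.json` matching the provided shapefiles might look like the sketch below (EPSG:3085, areas converted to square kilometers); the exact key formats are assumptions:
```
{
    "projection": 3085,
    "scale_factor": 1000000
}
```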


@@ -2,7 +2,7 @@
## Description
The SIMoN framework is designed to be extensible and flexible, providing tools for modelers to integrate new models, domains, and the corresponding geographic definitions easily. It currently connects predictive resource models from several different domains, including climate, energy, water, and population. These SIMoN models are low fidelity, designed as proxies for larger, more sophisticated models to be developed by the community.
![example models](models_diagram.png)
@@ -15,9 +15,11 @@ The SIMoN framework is designed to be extensible and flexible, providing tools f
* water_demand
* gfdl_cm3
For more details on these models, see "Example models" below.
To use a different set of models, see the instructions on how to "Add a new model" and "Remove a model" below.
2. Optionally, adjust the models' output schemas, in order to change the granularity of their output data. Open the JSON file in a model's `schemas/output` directory with a text editor. Each variable in the schema has a `granularity` property. Change the `value` field of this property to one of these recognized [granularities](../graphs/README.md) (all lowercase):
* usa48
* state
* county
@@ -25,42 +27,75 @@ The SIMoN framework is designed to be extensible and flexible, providing tools f
* huc8
* latlon
You can also adjust the values of the `agg` and `dagg` properties to use different [aggregators and disaggregators](../graphs/README.md) to perform granularity translations.
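For orientation only, a single variable's entry in an output schema might look roughly like the fragment below; the exact nesting in the real schema files may differ, so treat this purely as a sketch of where the `granularity`, `agg`, and `dagg` properties appear:
```
"population": {
    "granularity": {"value": "county"},
    "agg": "simple_sum",
    "dagg": "distribute_by_area"
}
```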
## Add a new model
1. Choose the set of models that you want to run together in the SIMoN framework. Note their interdependencies carefully, and make sure that each model has a source for all of its necessary data inputs. Each model's dependencies must be specified in its `schemas/inputs` directory. Sample models are provided in the `examples` directory, where each model has its own directory. For example, the sample `power_supply` model relies on the `power_demand` model, and the `power_demand` and `water_demand` models both rely on the `population` model. The `population` and `gfdl_cm3` models do not rely on any other models, and can each be run independently.
2. In the `models` directory, make a copy of the `template` directory, which serves as a blueprint for new models. Rename the `template` directory to the ID (unique name) of your new model. This will be the new model's dedicated directory.
3. Within this new directory are several required directories and files that need to be modified:
* `src/` stores the model's source code
* `inner_wrapper.py`
* This file receives input data from other models, performs operations on it, and returns the output data that will be sent to other models.
* You must replace the template name with the model's ID (its unique name).
* You must implement the `configure()` and `increment()` abstract methods (a rough sketch follows this list).
* `configure()` simply loads the initialization data from the `config` directory.
* `increment()` performs the model's calculations by calling any of the function(s) defined in its custom modules (e.g., `my_module.py`).
* `my_module.py`
* any additional code that your model uses
* `schemas/input/` stores JSON schemas that incoming JSON data messages must validate against. SIMoN uses the `jsonschema` Python package to validate the data messages against the schemas. There should be one input schema JSON file for each of the other models that this model receives data from. Adjust the `granularity` property in the input schema so that the input data that arrives in the model's inner wrapper will be in the granularity that is needed for your custom `my_module` functions to work.
* `*.json`
* granularity: specifies the granularity of input data that this model needs. SIMoN will translate incoming data to this granularity before sending it to the model's inner wrapper. If your inner wrapper needs the data to be in a different granularity in order to work with it, adjust the granularity value in the input schema accordingly.
* `schemas/output/` stores JSON schemas that outgoing JSON data messages must validate against. SIMoN uses the `jsonschema` Python package to validate the data messages against the schemas.
* `*.json`
* granularity: specifies the granularity of data that this model will output. SIMoN will translate outgoing data to this granularity after receiving it from the model's inner wrapper.
* `config/` stores JSON objects with the initial data and parameters needed to bootstrap the model and run its first time step.
* `*.json`
4. Once you have a complete set of models where all dependencies are satisfied, add the unique name of each of the models to the "models" list in `broker/config.json`.
5. Create an entry for each model in the "services" section in `build/docker-compose.yml` and specify the path to each model's directory.
```
model_name_1:
build: ../models/examples/model_name_1/
volumes:
- ../models/examples/model_name_1:/opt:ro
```
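The fragment below sketches the `inner_wrapper.py` edits described in step 3. The base class name, method signatures, and file names are placeholders, since the real interface is defined by the `template` directory:
```
import json
import my_module                        # the model's own calculation code (see step 3)

class MyNewModel(InnerWrapper):         # "InnerWrapper" stands in for the template's base class
    def configure(self):
        # bootstrap on the initialization data shipped in the config directory
        with open("config/initial_data.json") as f:
            self.state = json.load(f)

    def increment(self, inputs):
        # one increment step: update this model's state from other models' outputs
        self.state = my_module.step(self.state, inputs)
        return self.state
```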
## Remove a model
1. Before removing a model from SIMoN, make sure that no other models rely on it for their dependencies. For example, the `gfdl_cm3` model can safely be removed because no other models depend on it for their data inputs. However, the `power_demand` model cannot be removed without also removing the `power_supply` model, which relies on `power_demand` as an input.
2. Remove the name of the model from the "models" list in `broker/config.json`.
3. Remove the entry for the model in the "services" section of `build/docker-compose.yml`.
4. The model will no longer be included in future SIMoN runs. Note, however, that the model's dedicated directory is left intact, so that it can be added back in easily.
5. To add the model back into SIMoN, simply repeat steps 2 and 3 from "Add a new model."
## Example models
### Population (Holt's linear fit)
The population model uses Holt's linear regression from the `statsmodels` Python package to predict population per county. It extrapolates US Census Bureau population data from 2000 to 2016 into the future, making a population prediction for each future year. The model gives more weight to the most recent historical data, so the population change from 2015 to 2016 is more significant than the change between 2000 and 2001.
Config (initialization) data: historical population per county (US Census Bureau, [2000-2010](https://www.census.gov/data/datasets/time-series/demo/popest/intercensal-2000-2010-counties.html), [2010-2016](https://www.census.gov/data/datasets/time-series/demo/popest/2010s-counties-total.html), version published in 2016).
Input from other models: none.
Output: a dictionary that maps each county FIPS code to its population.
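A rough sketch of that forecast for a single county, using the `statsmodels` package; the population figures are placeholders rather than census data:
```
import pandas as pd
from statsmodels.tsa.holtwinters import Holt

# placeholder population history for one county, 2000-2016 (17 annual values)
history = pd.Series([43671, 44246, 45102, 45894, 46510, 47322, 48011, 48750, 49480,
                     50106, 50756, 51328, 51987, 52544, 53099, 53745, 54312])

fit = Holt(history).fit()      # exponential smoothing weights recent years more heavily
print(fit.forecast(5))         # population predictions for the next five years
```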
### Power Demand
The power demand model aggregates county population to state population, then multiplies these values by the corresponding state consumption per capita, returning power demand per state.
Config (initialization) data: historical population (2016) per county and state consumption per capita ([US Energy Information Administration](https://www.eia.gov/electricity/data/state/)).
Input from other models: output from the population model.
Output: a dictionary that maps each county FIPS code to its power demand per capita, in megawatt hours (MWh).
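Roughly, the calculation looks like the sketch below; the variable names and numbers are illustrative, and the first two digits of a county FIPS code identify its state:
```
# placeholder inputs: county population (from the population model) and
# per-capita consumption in MWh (from the EIA config data)
county_population = {"48201": 4600000, "48113": 2600000, "06037": 10100000}
state_consumption_per_capita = {"48": 14.7, "06": 6.8}

state_population = {}
for fips, pop in county_population.items():
    state_population[fips[:2]] = state_population.get(fips[:2], 0) + pop

power_demand_mwh = {state: pop * state_consumption_per_capita[state]
                    for state, pop in state_population.items()}
```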
### Power Supply
The power supply model calculates power supply in the contiguous United States by assuming that power demand is met in equilibrium (supply = demand). It aggregates the counties' demand to the state level by aligning every county FIPS code to its corresponding state code. This aggregate is then compared to the state power supply profiles, and the ratio of the two is used as a scaling factor for each county. This scaling factor is then multiplied by the county-level demand to determine power supply per county.
Config (initialization) data: historical population (2016) per county and state energy profiles ([US Energy Information Administration](https://www.eia.gov/electricity/data/state/)).
Input from other models: output from the power demand model.
Output: a dictionary that maps each county FIPS code to its power supply, in megawatt hours (MWh).
### Water Demand
The water demand model calculates water consumption per capita per year by taking the USGS value for irrigation and thermoelectric total consumptive use of fresh water (in Mgal/d) and subtracting the value for thermoelectric recirculating total consumptive use of fresh water (in Mgal/d). It then divides the result by the county's total population and multiplies this daily value by 365 to get the value per year.
Config (initialization) data: historical population (2016) per county and water use per county ([United States Geological Survey](https://www.sciencebase.gov/catalog/item/get/5af3311be4b0da30c1b245d8), 2015).
Input from other models: output from the population model.
Output: a dictionary that maps each county FIPS code to its water demand, in millions of gallons (Mgal) per year.
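A sketch of that arithmetic for a single county; the numbers are placeholders and the USGS field names are abbreviated:
```
# placeholder USGS values for one county, in Mgal/d
irrigation_thermoelectric_consumptive_fresh = 12.4
thermoelectric_recirculating_consumptive_fresh = 3.1
county_population = 55000

net_consumptive_mgal_per_day = (irrigation_thermoelectric_consumptive_fresh
                                - thermoelectric_recirculating_consumptive_fresh)
per_capita_mgal_per_year = net_consumptive_mgal_per_day / county_population * 365
```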
### GFDL CM3
The [GFDL CM3](https://www.gfdl.noaa.gov/coupled-physical-model-cm3/) climate model, published by the National Oceanic and Atmospheric Administration ([NOAA](https://www.gfdl.noaa.gov/about/)), uses representative concentration pathways to determine atmospheric conditions and consequent effects on various areas including temperature, precipitation, and evaporation. This model does not perform any of the actual calculations, but simply retrieves pre-calculated data from the config file.
Config (initialization) data: RCP data for temperature, precipitation, and evaporation ([NOAA](ftp://nomads.gfdl.noaa.gov/CMIP5/output1/NOAA-GFDL/GFDL-CM3)).
Input from other models: none.
Output: a dictionary that maps each latitude-longitude grid square to its evaporation (mm) and precipitation (mm) values, plus a single scalar value for global temperature (Celsius).


@@ -1,5 +1,11 @@
# SIMoN Visualization
At each timestep, each model outputs its data for that timestep as a dictionary. The dictionary maps geographic IDs to values for the corresponding region. The particular IDs will depend on the geographic granularity. This dictionary is stored as a JSON document in the Mongo database (the `simon_mongodb` container). Since the data maps each region to a particular value, it can be easily visualized on a choropleth map.
You can retrieve a document and save it as a JSON file using the `export.sh` bash script in the `viz` directory.
Once you've retrieved a document and saved it as a JSON file, plot the data on a choropleth map using the `plot.py` script in the `viz` directory. (Just make sure to pip install `requirements.txt` first.)