readme updates

Former-commit-id: 43dd31813b8165ffae41dbea0a6013a5ab26bbaf
2026-01-10 07:18:05 -05:00 · 2020-03-16 13:55:30 -04:00
parent b239363936
commit a45b2ac377
4 changed files with 67 additions and 74 deletions
--- a/README.md
+++ b/README.md
@@ -19,15 +19,10 @@ To address this challenge, the SIMoN modeling framework integrates independently
 ## Requirements

 Supported operating systems:
- - Linux
- - macOS
+ - Linux and macOS
+   - install [Docker](https://docs.docker.com/install/) and [Docker Compose](https://docs.docker.com/compose/install/)
 - Windows 10
-
-Software:
- - [Python](https://www.python.org/downloads/) >= 3.6
- - [Docker](https://docs.docker.com/install/) >= 18.09.6
- - [Docker Compose](https://docs.docker.com/compose/install/) >= 1.23.2
- - [Docker Desktop for Windows](https://hub.docker.com/editions/community/docker-ce-desktop-windows/)
+   - install [Docker Desktop for Windows](https://hub.docker.com/editions/community/docker-ce-desktop-windows/)

 ## Setup

--- a/broker/README.md
+++ b/broker/README.md
@@ -14,11 +14,11 @@ The Inner Wrappers are interfaces tailored to each model, and support the models

 Adjust parameters in the `build/config.json` file.

-    * `mongo_port` is the port that the MongoDB container will use. The default Mongo port is 27017.
-    * `boot_timer` is the number of seconds that the broker will wait for all models to initialize, before it sends the shutdown signal. Try extending this time if models will take longer to load and process their configuration data in the custom `configure()` method in their inner wrapper.
-    * `watchdog_timer` is the number of seconds that the broker will wait to receive a status message from a model, before it sends the shutdown signal. If a model crashes, the broker will wait for this number of seconds before stopping the SIMoN run.
-    * `max_incstep` is the number of increments that the SIMoN run should perform before closing down.
-    * `initial_year` is the year corresponding to the configuration data (increment step 0).*
-    * `models` lists the ID / unique name of each model that will be included in the SIMoN run.
+  * `mongo_port` is the port that the MongoDB container will use. The default Mongo port is 27017.
+  * `boot_timer` is the number of seconds that the broker will wait for all models to initialize, before it sends the shutdown signal. Try extending this time if models will take longer to load and process their configuration data in the custom `configure()` method in their inner wrapper.
+  * `watchdog_timer` is the number of seconds that the broker will wait to receive a status message from a model, before it sends the shutdown signal. If a model crashes, the broker will wait for this number of seconds before stopping the SIMoN run.
+  * `max_incstep` is the number of increments that the SIMoN run should perform before closing down.
+  * `initial_year` is the year corresponding to the configuration data (increment step 0).
+  * `models` lists the ID / unique name of each model that will be included in the SIMoN run.

-* Because SIMoN runs predictive models, each increment step, and the data published at that increment step, corresponds to a point in time in the future. Currently, each increment corresponds to a year, since the data in the example models is annual. The `initial_year` parameter is used to specify the initial year assigned to increment step 0, and translate each subsequent increment step to its corresponding year. For models that do not have annual data, the reported "year" can be ignored by the user. In the future, SIMoN may be expanded to support multiple definitions of time (such as year, month, and fiscal quarter), just like it currently supports multiple definitions of geography (such as state, county, and watershed).
+Because SIMoN runs predictive models, each increment step, and the data published at that increment step, corresponds to a point in time in the future. Currently, each increment corresponds to a year, since the data in the example models is annual. The `initial_year` parameter specifies the year assigned to the initial increment step (step 0), and is used to translate each subsequent increment step to its corresponding year. For models that do not have annual data, the reported "year" can be ignored by the user. In the future, SIMoN may be expanded to support multiple definitions of time (such as year, month, and fiscal quarter), just like it currently supports multiple definitions of geography.
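The translation from increment step to year described in the paragraph above can be sketched as follows. This is a minimal illustration, not the broker's actual code: the function name `increment_to_year` and the sample config values are hypothetical, and it assumes one year per increment, as the example models do.

```python
def increment_to_year(initial_year: int, incstep: int) -> int:
    """Map an increment step to a calendar year, assuming one year per increment."""
    if incstep < 0:
        raise ValueError("increment step must be non-negative")
    return initial_year + incstep


# Hypothetical values, standing in for entries in the broker's config file.
config = {"initial_year": 2016, "max_incstep": 3}

# Step 0 is the configuration data; each later step advances the year by one.
years = [
    increment_to_year(config["initial_year"], step)
    for step in range(config["max_incstep"] + 1)
]
print(years)
```

For models whose data is not annual, the value returned here is only a label and can be ignored, as noted above.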
--- a/graphs/README.md
+++ b/graphs/README.md
@@ -1,40 +1,6 @@
 # SIMoN Granularity Graph Tool

-This tool constructs the granularity graphs used for data translation in the SIMoN software application.
-
-## Granularities
-
-A key difficulty in combining models is resolving their data dependencies and discrepancies. By using the SIMoN software, a modeler is able to join models with disparate geographic definitions together in various combinations, allowing models to run together and exchange data that have heterogeneous definitions of geography.
-
-SIMoN currently integrates models of population, power systems, water systems, and climate change. These domains each have their own hierarchies of geography, which include political, topographical, regulatory, and latitude-longitude grid boundaries.
-
-In order to translate data from its models across granularities, SIMoN uses shapefiles to define rigorous geographies in a partially ordered set of geographic partitions (e.g., counties, watersheds, power regions, and latitude-longitude squares). The sample shapefiles provided in the `graphs/shapefiles` directory were clipped to the land boundary of the contiguous United States, in order to have consistent scope. The geometries were compressed / simplified using a distance-based method (the Douglas-Peucker algorithm) with a tolerance of 1 kilometer. They use [EPSG:3085](https://epsg.io/3085-1901) NAD83(HARN) / Texas Centric Albers Equal Area as their coordinate reference system.
-
-SIMoN creates a corresponding directed acyclic network graph representing all the granularities, their corresponding entities, and their relationships to each other. The individual models feed each other updated data inputs at synchronized time intervals, and traverse the network graph to translate their data from one granularity to another. A sample granularity graph is provided, but modelers can extend it or create a graph of their own, by modifying and using the `graphs/build.py` script.
-
-The granularities in the provided granularity graph are:
-    * `usa48` (a single region for the contiguous United States)
-    * `state` (49 regions: the lower 48 states plus Washington, DC)
-    * `county` (3108 counties, including Washington, DC)
-    * `nerc` (22 North American Electric Reliability Corporation regions)
-    * `huc8` (2119 HUC 8 regions)
-    * `latlon` (209 latitude-longitude grid squares)
-
-## Aggregators and Disaggregators
-
-The modeler can choose transformation functions, called aggregators and disaggregators, to translate data between compatible geographic definitions in various ways. These aggregators and disaggregators must conform to a set of mathematical axioms, including a partial inverse property, which are designed to create a provable notion of data consistency and reduce the possibility of self-propagating errors.
-
-Aggregators are functions used to combine data from sibling nodes to their parent node. Disaggregators are functions used to distribute data from a node to its children.
-
-Aggregators:
-* `simple_sum`: the values of the sibling nodes are added together. The sum is the new value of their parent.
-* `simple_average`: the parent's new value is the mean of the children's values.
-* `weighted_average`: the parent's new value is the mean of the children's values, weighted by each child's geographic area.
-
-Disaggregators:
-* `distribute_identically`: the parent node's value is assigned to each one of its children.
-* `distribute_uniformly`: the parent node's value is divided evenly among each of its children.
-* `distribute_by_area`: each child node is assigned a portion of the parent's value, proportional to the child's geographic area.
+This tool constructs the granularity graphs used for data translation in the SIMoN software application. By using SIMoN, a modeler is able to join models with disparate geographic definitions together in various combinations, allowing models to run together and exchange data that have heterogeneous definitions of geography.

 ## Basic Usage

@@ -44,26 +10,58 @@ Run `make graph` from the top-level `simon` directory.

 This will start the `simon-graph` Docker container, and create an abstract graph / instance graph pair in the `graphs/out` directory. The container will exit once the graph pair has been built. To use the generated graphs for the next SIMoN run, rename the abstract graph to abstract-graph.geojson, and the instance graph to instance-graph.geojson.

+## Granularities
+
+SIMoN currently integrates models of population, power systems, water systems, and climate change. These domains each have their own hierarchies of geography, which include political, topographical, regulatory, and latitude-longitude grid boundaries.
+
+In order to translate data from its models across granularities, SIMoN uses shapefiles to define rigorous geographies in a partially ordered set of geographic partitions (e.g., states, counties, watersheds, power regions, and latitude-longitude grid squares). The sample shapefiles provided in the `graphs/shapefiles` directory were clipped to the land boundary of the contiguous United States, in order to have consistent scope. Their geometries were compressed / simplified using a distance-based method (the Douglas-Peucker algorithm) with a tolerance of 1 kilometer. They use [EPSG:3085](https://epsg.io/3085-1901) NAD83(HARN) / Texas Centric Albers Equal Area as their coordinate reference system.
+
+SIMoN creates a corresponding directed acyclic network graph representing all the granularities, their corresponding entities, and their relationships to each other. The individual models feed each other updated data inputs at synchronized time intervals, and traverse the network graph to translate their data from one granularity to another. A sample granularity graph is provided, but modelers can extend it or create a graph of their own, by modifying and using the `graphs/build.py` script.
+
+The granularities in the provided granularity graph are:
+  * `usa48` (a single region for the contiguous United States)
+  * `state` (49 regions: the lower 48 states plus Washington, DC)
+  * `county` (3108 counties, including Washington, DC)
+  * `nerc` (22 North American Electric Reliability Corporation regions)
+  * `huc8` (2119 HUC 8 regions)
+  * `latlon` (209 latitude-longitude grid squares)
+
+## Aggregators and Disaggregators
+
+The modeler can choose transformation functions, called aggregators and disaggregators, to translate data between compatible geographic definitions in various ways. These aggregators and disaggregators must conform to a set of mathematical axioms, including a partial inverse property, which are designed to create a provable notion of data consistency and reduce the possibility of self-propagating errors.
+
+Aggregators are functions used to combine data from sibling vertices in the granularity graph into a value for their parent vertex. Disaggregators are functions used to distribute data from a vertex to its children.
+
+Aggregators:
+* `simple_sum`: the parent vertex's new value is the sum of its children's values.
+* `simple_average`: the parent's new value is the mean of the children's values.
+* `weighted_average`: the parent's new value is the mean of the children's values, weighted by each child's geographic area.
+
+Disaggregators:
+* `distribute_identically`: the parent vertex's value is assigned to each one of its children.
+* `distribute_uniformly`: the parent vertex's value is divided evenly among its children.
+* `distribute_by_area`: each child vertex is assigned a portion of the parent's value, proportional to the child's geographic area.
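The aggregators and disaggregators listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not SIMoN's implementation: the function signatures, the `areas` dictionary, and the vertex names `A` and `B` are assumptions. The last lines show one plausible reading of the partial inverse property, namely that disaggregating a parent's value and then aggregating the result recovers the original value.

```python
from typing import Dict


def simple_sum(children: Dict[str, float]) -> float:
    """Parent value is the sum of the children's values."""
    return sum(children.values())


def simple_average(children: Dict[str, float]) -> float:
    """Parent value is the unweighted mean of the children's values."""
    return sum(children.values()) / len(children)


def weighted_average(children: Dict[str, float], areas: Dict[str, float]) -> float:
    """Parent value is the mean of the children's values, weighted by area."""
    total_area = sum(areas[name] for name in children)
    return sum(children[name] * areas[name] for name in children) / total_area


def distribute_identically(parent: float, areas: Dict[str, float]) -> Dict[str, float]:
    """Every child receives the parent's value unchanged."""
    return {name: parent for name in areas}


def distribute_uniformly(parent: float, areas: Dict[str, float]) -> Dict[str, float]:
    """The parent's value is split evenly among the children."""
    return {name: parent / len(areas) for name in areas}


def distribute_by_area(parent: float, areas: Dict[str, float]) -> Dict[str, float]:
    """Each child receives a share proportional to its area."""
    total_area = sum(areas.values())
    return {name: parent * area / total_area for name, area in areas.items()}


# Hypothetical child vertices and their areas.
areas = {"A": 30.0, "B": 10.0}

# Extensive quantity (e.g. a count): split by area, then sum back up.
by_area = distribute_by_area(100.0, areas)
round_trip_sum = simple_sum(by_area)

# Intensive quantity (e.g. a rate): copy to children, then area-weighted mean.
identical = distribute_identically(20.0, areas)
round_trip_avg = weighted_average(identical, areas)
```

Pairing `distribute_by_area` with `simple_sum`, or `distribute_identically` with `weighted_average`, keeps the round trip consistent; mixing, say, `distribute_identically` with `simple_sum` would inflate the parent's value by the number of children.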
+
 ## Advanced Usage

 Adjust these parameters in the `graphs/config.json` file:

 * `projection` is the EPSG coordinate reference system code that all of the shapefile polygons will be translated to, in order to ensure consistency. For the most precise results, use the original EPSG of the shapefiles.
-* `scale_factor` divides the area of each shapefile's polygons by a scalar, in order to use better units. For example, the provided shapefiles have length units of meters and area units of square meters. The default scale_factor is 1 million, in order to translate the area unit of the provided shapefiles from square meters to square kilometers. Change the scale factor to 1 to preserve the original units.
-* `minimum_intersection_area` sets the minimum area of an instance wedge node (a node that results from intersecting nodes from two different branches of the granularity graph). Because of precision errors, a minimum intersection area of 0 could result in the creation of many tiny, spurious nodes that clutter the instance graph. The default minimum intersection area is set to 1 length unit, where length unit is the length unit of the shapefiles after any scaling from the `scale_factor`.
-* `abstract_edges` is the list of edges in the abstract graph, where each edge is represented by a tuple in the form [source, target]. Adjust the items in this list to create a new abstract graph. The `build.py` script will generate the corresponding instace graph by finding the corresponding shapefiles in the `graphs/shapefiles` directory.
-* `save_shapes` specifies whether to create a third file that saves the large polygon shapes of the instance graph nodes.
-* `tag` is the suffix attached to the abstract graph and instance graph filenames.
+* `scale_factor` divides the area of each shapefile's polygons by a scalar, in order to convert units. For example, the provided shapefiles have length units of meters and area units of square meters. The default `scale_factor` is 1 million, which converts the area unit of the provided shapefiles from square meters to square kilometers. Set `scale_factor` to 1 to preserve the original units.
+* `minimum_intersection_area` sets the minimum area of an instance wedge vertex (a vertex that results from intersecting vertices from two different branches of the granularity graph). Because of precision errors, a minimum intersection area of 0 could result in the creation of many tiny, spurious vertices that clutter the instance graph. The default minimum intersection area is 1, expressed in the area unit of the shapefiles *after* any scaling from the `scale_factor` has been performed.
+* `abstract_edges` is the list of edges in the abstract graph, where each edge is represented by a tuple in the form [source, target]. Adjust the items in this list to create a new abstract graph. The `build.py` script will generate the corresponding instance graph by finding the matching shapefiles in the `graphs/shapefiles` directory. Each of the vertices implicitly defined by these edges must have a shapefile with the same name and the `.shp` extension.
+* `save_shapes` specifies whether to create an additional, much larger instance graph file, which saves the polygon shapes of the instance graph vertices.
+* `tag` is a label / suffix attached to the abstract graph and instance graph filenames.

 Both JSON graphs have 3 key attributes:
-* `nodes` maps to a list of the graph's vertices
-* `links` maps to a list of the graph's edges
-* `graph` maps to a dictionary of the graph's metadata
-    * `id` is a UUID for the abstract-instance graph pair, that both the abstract graph and the corresponding instance graph share
-    * `projection` is the coordinate reference system that the shapefile geometries are defined on
-    * `granularities` are the granularities that the graphs connect
-    * `min_intersect_area` is the minimum area of a wedge node in the instance graph (a node made by intersecting disparate geographic granularities)
-    * `nodes` is the number of vertices in the graph
-    * `links` is the number of edges in the graph
-    * `counts` is the number of vertices in the graph, categorized by granularity
-    * `areas` is the total area of each granularity's scope, that is, the sum of all the node areas of each granularity. Ideally, these areas should be equal so that the graph will have a consistent scope.
+* `nodes` maps to a list of the graph's vertices.
+* `links` maps to a list of the graph's edges.
+* `graph` maps to a dictionary of the graph's metadata.
+    * `id` is a UUID for the abstract-instance graph pair, shared by both the abstract graph and its corresponding instance graph.
+    * `projection` is the coordinate reference system that the shapefile geometries are defined on.
+    * `granularities` are the granularities that the graphs connect.
+    * `min_intersect_area` is the minimum area of a wedge vertex in the instance graph (a vertex made by intersecting disparate geographic granularities).
+    * `nodes` is the number of vertices in the graph.
+    * `links` is the number of edges in the graph.
+    * `counts` is the number of vertices in the graph, categorized by granularity.
+    * `areas` is the total area of each granularity's scope, that is, the sum of all the vertex areas of each granularity. Ideally, these areas should be equal so that the graph will have a consistent scope.
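A generated graph can be sanity-checked against the metadata attributes listed above. The sketch below is a hedged illustration rather than part of SIMoN: the helper names, the 1% tolerance, and the sample document are hypothetical, and it assumes that `areas` is a dictionary mapping each granularity name to its total area.

```python
import json
import math


def counts_match(graph_doc: dict) -> bool:
    """Check that the metadata totals agree with the actual vertex and edge lists."""
    meta = graph_doc["graph"]
    return (
        meta["nodes"] == len(graph_doc["nodes"])
        and meta["links"] == len(graph_doc["links"])
    )


def scope_is_consistent(graph_doc: dict, rel_tol: float = 0.01) -> bool:
    """Check that every granularity covers roughly the same total area."""
    areas = graph_doc["graph"]["areas"]  # assumed: {granularity: total area}
    reference = max(areas.values())
    return all(math.isclose(a, reference, rel_tol=rel_tol) for a in areas.values())


# Hypothetical, heavily abbreviated graph document for illustration.
sample = json.loads("""
{
  "nodes": [{"id": "usa48"}, {"id": "state"}],
  "links": [{"source": "usa48", "target": "state"}],
  "graph": {
    "granularities": ["usa48", "state"],
    "nodes": 2,
    "links": 1,
    "areas": {"usa48": 1000.0, "state": 999.5}
  }
}
""")

print(counts_match(sample), scope_is_consistent(sample))
```

A relative tolerance is used because clipping and simplifying the shapefiles makes exact equality of areas unlikely; the right threshold depends on the shapefiles in use.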
--- a/models/README.md
+++ b/models/README.md
@@ -19,7 +19,7 @@ The SIMoN framework is designed to be extensible and flexible, providing tools f

    To use a different set of models, see the instructions on how to "Add a new model" and "Remove a model" below.

-2. Optionally, adjust the models' output schemas, in order to change the granularity of their output data. Open the JSON file in a model's `schemas/output` directory with a text editor. Each variable in the schema has a `granularity` property. Change the `value` field of this property to one of these recognized [granularities](../graphs/README.md) (all lowercase):
+2. Optionally, adjust the models' output schemas, in order to change the granularity of their output data. Open the JSON file in a model's `schemas/output` directory with a text editor. Each variable in the schema has a `granularity` property. Change the `value` field of this property to one of these recognized [granularities](../graphs/README.md#granularities) (all lowercase):
    * usa48
    * state
    * county
@@ -27,7 +27,7 @@ The SIMoN framework is designed to be extensible and flexible, providing tools f
    * huc8
    * latlon

-You can also adjust the values of the `agg` and `dagg` properties to use different [aggregators and disaggregators](../graphs/README.md) to perform granularity translations.
+You can also adjust the values of the `agg` and `dagg` properties to use different [aggregators and disaggregators](../graphs/README.md#aggregators-and-disaggregators) to perform granularity translations.

 ## Add a new model

@@ -45,11 +45,11 @@ You can also adjust the values of the `agg` and `dagg` properties to use differe
            * any additional code that your model uses
    * `schemas/input/` stores JSON schemas that incoming JSON data messages must validate against. SIMoN uses the `jsonschema` Python package to validate the data messages against the schemas. There should be one input schema JSON file for each of the other models that this model receives data from. Adjust the `granularity` property in the input schema so that the input data that arrives in the model's inner wrapper will be in the granularity that is needed for your custom `my_module` functions to work.
 	* `*.json`
-        * granularity: specifies the granularity of input data that this model needs. SIMoN will translate incoming data to this granularity before sending it to the model's inner wrapper. If your inner wrapper needs the data to be in a different granularity in order to work with it, adjust the granularity value in the input schema accordingly.
+        * granularity: specifies the granularity of input data that this model needs. The model's outer wrapper will translate incoming data to this granularity before sending it to the model's inner wrapper. If your inner wrapper needs the data to be in a different granularity in order to work with it, adjust the granularity value in the input schema accordingly.
    * `schemas/output/` stores JSON schemas that outgoing JSON data messages must validate against. SIMoN uses the `jsonschema` Python package to validate the data messages against the schemas.
        * `*.json`
-        * granularity: specifies the granularity of data that this model will output. SIMoN will translate outgoing data to this granularity after receiving it from the model's inner wrapper.
-    * `config/` stores JSON objects with the initial data and parameters needed to bootstrap the model and run its first time step.
+        * granularity: specifies the granularity of data that this model will output. The model's outer wrapper will translate outgoing data to this granularity after receiving it from the model's inner wrapper.
+    * `config/` stores JSON objects with the initial data and parameters needed to bootstrap the model and perform the initial increment step.
        * `*.json`
 4.  Once you have a complete set of models where all dependencies are satisfied, add the unique name of each of the models to the "models" list in `broker/config.json`.
 5.  Create an entry for each model in the "services" section in `build/docker-compose.yml` and specify the path to each model's directory.
@@ -63,10 +63,10 @@ You can also adjust the values of the `agg` and `dagg` properties to use differe
 ## Remove a model

 1.  Before removing a model from SIMoN, make sure that no other models rely on it for their dependencies. For example, the `gfdl_cm3` model can safely be removed because no other models depend on it for their data inputs. However, the `power_demand` model cannot be removed without also removing the `power_supply` model, which relies on `power_demand` as an input.
-2.  Remove the name of the model from the "models" list in `broker/config.json`.
-3.  Remove the entry for the model in the "services" section of `build/docker-compose.yml`.
+2.  Remove the name of the model from the `models` list in `broker/config.json`.
+3.  Remove the entry for the model in the `services` section of `build/docker-compose.yml`.
 4.  The model will no longer be included in future SIMoN runs. Note, however, that the model's dedicated directory is left intact, so that it can be added back in easily.
-5.  To add the model back into SIMoN, simply repeat steps 2 and 3 from "Add a new model."
+5.  To add the model back into SIMoN, simply repeat steps 4 and 5 from "Add a new model."

 ## Example models