Make ACCESS-NRI Intake Catalog more portable to other computing environments (sort-of) #371
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| .. _external: | ||
|
|
||
| Non-Gadi use | ||
| ============ | ||
|
|
||
| ACCESS-NRI Intake Catalog is designed for managing the collection of Intake-ESM datastores on the | ||
| `NCI Gadi <https://nci.org.au/our-systems/hpc-systems>`_ environment. | ||
| However, it can be modified to manage Intake-ESM datastores stored on other systems. | ||
|
|
||
| .. warning:: | ||
| Adapting ACCESS-NRI Intake Catalog to run on another computer system is an advanced task that requires code | ||
| modification, the ability to install this code on your target machine, and an intimate knowledge of the | ||
| file system on your target system. It is not recommended for the faint-hearted! | ||
|
|
||
| To adapt ACCESS-NRI Intake Catalog to run on a non-Gadi system, check out a copy of the source code and consider | ||
| making the following modifications: | ||
|
|
||
| 1. The constants in the top level :code:`__init__.py` file describe where the input data are stored, the patterns | ||
| and regular expressions used to match related file paths, and the location of the final catalog file. | ||
| These will need to be updated to reflect the target file system structure. | ||
|
|
||
| a. Gadi stores data in a file system that is arrayed by projects. Projects are denoted by an alphanumeric code of | ||
| one to two letters, followed by one to two digits, e.g., :code:`hh5`, :code:`io10`, etc. Data is then stored | ||
| at targets like :code:`/g/data/<project code>/`. | ||
|
|
||
| The :code:`catalog-build` command, which invokes the function :code:`cli.build`, has a sequence of calls to determine | ||
| the projects that are involved in a catalog build, and checks that the build user has access to those project | ||
| storage locations (this code section currently starts at line 473 of :code:`cli.py`). | ||
| The build will be aborted if these checks fail. Therefore, if your storage does not | ||
| use a similar directory structure to Gadi (that is, a group of directories all situated at one root location), you may need | ||
| to modify or remove these calls to achieve a successful catalog build. | ||
|
|
||
| 2. The existing YAML files in :code:`config/` refer to the datastores/raw data stored on Gadi. You will need to | ||
| remove these, and replace them with similarly-structured YAML files denoting your own data setup. (Note that the | ||
| contents of :code:`config/metadata-sources` are archival copies of live experiment :ref:`metadata`; | ||
| you will not need to replace these on your system.) | ||
|
|
||
| 3. The command-line scripts in :code:`bin/` contain PBS commands and file paths specific to Gadi. You will need | ||
| to modify these scripts to reflect your computing system. | ||
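
The project-code convention and the storage check described in item 1 can be sketched roughly as follows. This is only an illustration, not the code in :code:`cli.py`: the regular expression and root are copied from the constants added in this PR, while the helper names `extract_project` and `missing_projects` are hypothetical, and the real build checks may differ.

```python
import os
import re
from pathlib import Path

# Values copied from the constants added to __init__.py in this PR.
STORAGE_ROOT = "/g/data"
STORAGE_LOCATION_REGEX = r"^/g/data/(?P<proj>[a-z]{1,2}[0-9]{1,2})/.*?$"


def extract_project(path):
    """Return the project code embedded in a storage path, or None if there is none."""
    match = re.match(STORAGE_LOCATION_REGEX, path)
    return match.group("proj") if match else None


def missing_projects(paths, root=STORAGE_ROOT):
    """Return the sorted project codes whose directory under root is not readable.

    Hypothetical helper: a build could abort if this list is non-empty.
    """
    projects = {extract_project(p) for p in paths} - {None}
    return sorted(p for p in projects if not os.access(Path(root) / p, os.R_OK))
```

On a system whose storage is not laid out as a set of project directories under a single root, both the regular expression and a check of this kind would need to be adapted or dropped.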
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -8,4 +8,16 @@ | |||||||||||
| __version__ = _version.get_versions()["version"] | ||||||||||||
|
|
||||||||||||
| CATALOG_LOCATION = "/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml" | ||||||||||||
|
Collaborator
There's a trick we use for managing config files which is common in Django/Web Development which might be relevant/useful here:

    import os
    CATALOG_LOCATION = os.environ.get("CATALOG_LOCATION", "/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml")

Basically, it lets us override the default with an environment variable, if found, falling back to the hardcoded default. It strikes me that it might be useful here, although we'd probably want to prefix all the environment variable names as a namespacing strategy, e.g.:

    CATALOG_LOCATION = os.environ.get(
        "ACCESS_NRI_INTAKE_CATALOG_LOCATION", "/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml"
    )
||||||||||||
| """Location for 'live' master catalog YAML.""" | ||||||||||||
|
|
||||||||||||
| USER_CATALOG_LOCATION = str(Path.home() / ".access_nri_intake_catalog/catalog.yaml") | ||||||||||||
|
Collaborator
Suggested change
|
||||||||||||
| """Location where user can place a master catalog YAML to override standard 'live' version.""" | ||||||||||||
|
|
||||||||||||
| STORAGE_ROOT = "/g/data" | ||||||||||||
|
Collaborator
Suggested change
|
||||||||||||
| """Root storage location for catalog experiments""" | ||||||||||||
|
|
||||||||||||
| STORAGE_FLAG_PATTERN = r"gdata/[a-z]{1,2}[0-9]{1,2}" | ||||||||||||
|
Collaborator
Suggested change
|
||||||||||||
| """Pattern for matching 'storage flags' - related to Gadi file access system""" | ||||||||||||
|
|
||||||||||||
| STORAGE_LOCATION_REGEX = r"^/g/data/(?P<proj>[a-z]{1,2}[0-9]{1,2})/.*?$" | ||||||||||||
|
Collaborator
Suggested change
|
||||||||||||
| """Regular expression for matching the file path to experiments, and extracting a project ID""" | ||||||||||||
We should then be able to just update all this to reflect that the defaults can be overridden with environment variables.