Updated Interpolation for ROMS Rutgers #1090

Open
mgharamti wants to merge 3 commits into NCAR:main from mgharamti:roms_interp
@mgharamti
Contributor

@mgharamti mgharamti commented Apr 1, 2026

Description:

This PR updates the interpolation routine in the ROMS_rutgers interface. The changes, which are confined to model_mod, now use the regular-grid option of quad_utils (with non-uniform spacing), which is consistent with the ROMS_rutgers grid. The previous implementation incorrectly used the irregular-grid option, which made the interface much more memory intensive because it required storing large static data.
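
To illustrate why a regular grid with non-uniform spacing needs so little static storage, here is a minimal Python sketch. It is not the quad_utils API: `find_cell`, `bilinear`, and the sample coordinates are hypothetical. It assumes a rectilinear grid, where longitude depends only on the first index and latitude only on the second, so two 1D coordinate arrays and one binary search per axis are enough to find the enclosing cell. An irregular grid generally needs the full 2D coordinate arrays instead.

```python
from bisect import bisect_right

def find_cell(coords, x):
    """Return index i with coords[i] <= x <= coords[i+1], or None if x is outside.

    coords must be strictly increasing; the spacing may be non-uniform.
    """
    if x < coords[0] or x > coords[-1]:
        return None
    # bisect_right returns the first index whose value exceeds x.
    i = bisect_right(coords, x) - 1
    # Clamp so that x == coords[-1] falls in the last cell.
    return min(i, len(coords) - 2)

def bilinear(lons, lats, field, lon, lat):
    """Bilinear interpolation of field[i][j] (i: lon index, j: lat index)."""
    i = find_cell(lons, lon)
    j = find_cell(lats, lat)
    if i is None or j is None:
        return None
    # Fractional position inside the cell along each axis.
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    bottom = (1 - tx) * field[i][j] + tx * field[i + 1][j]
    top = (1 - tx) * field[i][j + 1] + tx * field[i + 1][j + 1]
    return (1 - ty) * bottom + ty * top

# Hypothetical sample grid with non-uniform spacing in both directions.
lons = [0.0, 1.0, 3.0, 6.0]
lats = [10.0, 10.5, 12.0]
# A field that is linear in lon and lat, so bilinear interpolation is exact.
field = [[lo + la for la in lats] for lo in lons]
value = bilinear(lons, lats, field, 2.0, 11.0)
```

In this sketch the static data is `len(lons) + len(lats)` numbers rather than two arrays of `len(lons) * len(lats)` numbers each.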

Other changes include:

  • Getting rid of the staggered grid lat and lon arrays. New helper functions are used to find the location from the rho grid.
  • Similarly, the U and V masks are removed. New helper functions are used instead.
  • Turning on state_buffer_io in the input.nml as it leads to better filter performance
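
As a rough illustration of the helper-function idea in the list above, the following Python sketch derives U- and V-point locations and masks from rho-grid arrays on the fly instead of storing them. The function names are hypothetical and are not the ones used in this PR. The sketch assumes the usual Arakawa C-grid staggering (a U point midway between two rho points adjacent in the first index, a V point midway between two adjacent in the second), that plain averaging is adequate (no longitude wrap-around), and that a staggered point counts as water only when both neighboring rho points are water.

```python
def u_location(lon_rho, lat_rho, i, j):
    """Location of U point (i, j): midway between rho points (i, j) and (i+1, j)."""
    lon = 0.5 * (lon_rho[i][j] + lon_rho[i + 1][j])
    lat = 0.5 * (lat_rho[i][j] + lat_rho[i + 1][j])
    return lon, lat

def v_location(lon_rho, lat_rho, i, j):
    """Location of V point (i, j): midway between rho points (i, j) and (i, j+1)."""
    lon = 0.5 * (lon_rho[i][j] + lon_rho[i][j + 1])
    lat = 0.5 * (lat_rho[i][j] + lat_rho[i][j + 1])
    return lon, lat

def u_is_wet(mask_rho, i, j):
    """U point is water only if both neighboring rho points are water (mask == 1)."""
    return mask_rho[i][j] == 1 and mask_rho[i + 1][j] == 1

def v_is_wet(mask_rho, i, j):
    """V point is water only if both neighboring rho points are water (mask == 1)."""
    return mask_rho[i][j] == 1 and mask_rho[i][j + 1] == 1

# Hypothetical 3 x 2 rho grid; rho point (1, 1) is land.
lon_rho = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
lat_rho = [[10.0, 10.5], [10.0, 10.5], [10.0, 10.5]]
mask_rho = [[1, 1], [1, 0], [1, 1]]
```

Computing these values on demand trades a few arithmetic operations per lookup for not having to keep four extra 2D coordinate arrays and two extra masks in memory.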

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

No

Tests

I have tested the new changes to model_mod with different processor (and observation) counts. The code change does not alter the results; it makes running filter much more efficient and reduces the computational resources required. This might be crucial for running filter on HPC systems that are not as big as Derecho.

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

All of my tests are available on Derecho @ /glade/derecho/scratch/gharamti/inacawo/DART/models/ROMS_rutgers

Use grid interpolation options from quad_utils
that are consistent with the ROMS grid. Previously,
it used the irregular grid option. Now, it uses
the regular one with non-uniform spacing.

lat and lon arrays for staggered grid are now removed.
Helper functions are used instead to compute the staggered
lat and lon using the rho grid.

U and V masks have also been removed. The rho mask is used.

The model_interpolate code is modified to use the
right quad_utils routines.
Turns on state_buffer_io to reduce memory use
when running with a high-resolution grid.

Also, turned on vertical localization and tested
different radii.
- integer :: varid, lstatus
- integer :: qstatus, vstatus
+ integer :: varid, lstatus, vstatus
+ integer :: qstat(ens_size)
Member

This is not a problem with this code, but I opened issue #1095: only
quad_lon_lat_evaluate_ir_array checks whether missing_r8 is allowed in the state; quad_lon_lat_evaluate_ii_array does not.

qstatus -> qstat(ens_size) when using ir

@hkershaw-brown
Member

Just having a look at the test data in:
/glade/derecho/scratch/gharamti/inacawo/DART/models/ROMS_rutgers/work

I'm not sure if this is the case used in the memory measurements. If it is, it would be worth changing the ensemble manager to use a round-robin layout.

I noticed that the ensemble manager is using the default:
&ensemble_manager_nml
/

which has the first n tasks read the ensemble members.

Layout 2 is round robin, so the ensemble member readers are spread across the nodes.

&ensemble_manager_nml
layout = 2
tasks_per_node = 128 ! Note this should match your tasks per node in #PBS -l ... :ncpus=128
/
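
To see what the two layouts do to the placement of the reader tasks, consider this small Python sketch. It is a simplified model, not DART's ensemble manager code: `reader_nodes` is a hypothetical helper, and the ensemble size and node count are made up. It assumes that MPI ranks are packed onto nodes in order, that the default layout puts the ensemble-member readers on the first n ranks, and that layout 2 places reader k on node k mod (number of nodes).

```python
from collections import Counter

def reader_nodes(ens_size, num_nodes, tasks_per_node, round_robin):
    """Return the node index that hosts each of the ens_size reader tasks."""
    if round_robin:
        # Assumed layout = 2 behavior: cycle the readers over the nodes.
        return [k % num_nodes for k in range(ens_size)]
    # Assumed default behavior: readers are the first ens_size ranks,
    # and ranks fill one node completely before moving to the next.
    return [k // tasks_per_node for k in range(ens_size)]

# Hypothetical case: 80 members, 3 nodes, 128 tasks per node.
default = Counter(reader_nodes(80, 3, 128, round_robin=False))
spread = Counter(reader_nodes(80, 3, 128, round_robin=True))
```

Under these assumptions the default layout puts all 80 readers on the first node, while round robin gives the three nodes 27, 27, and 26 readers, which spreads the memory needed for reading members more evenly.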

@mgharamti
Contributor Author

@hkershaw-brown, the test data in /glade/derecho/scratch/gharamti/inacawo/DART/models/ROMS_rutgers/work is what I used for the memory and timing tests. By the way, I was finally able to run filter with 3 nodes. As suggested during stand-up, I turned off member writing for the preassim stage, and the DA cycle finished. I think this is as far as I can tune the interface at this point.
I didn't see any tangible benefit from switching the PE layout to 2. The performance was similar to layout 1.

@hkershaw-brown
Member

Note from stand-up today: ROMS does have curvilinear and rotated grids, so this optimized model_mod.f90 should be a compile-time choice.
