Stream: MARBL

Topic: roms driver


Dafydd Stephenson (Dec 05 2023 at 23:57):

@Keith Lindsay @Michael Levy
As an update, log_20231205.txt is output from ROMS, with print statements on the MARBL end showing the various contributions to Jint_Ntot and print statements on the ROMS end showing the tracer and interior tendency values at each level (for NO3, NH4, DON, DONR) before and after the call to interior_tendency_compute.

The Jint_Ntot contributions are:

nfix=-0.1303031320222460E-012
no3=0.2114891178390254E-004
nh4=0.1116407491877611E-004
don=-0.3932067222230340E-005
donr=0.3553308574834122E-008
zooc=-0.1175304232702107E-005
autoc=-0.2722524839079615E-004
denitrif=0.0000000000000000E+000
sed_denitrif=0.1342360290901569E-007
PON_sed_loss=0.4047631781123396E-009

for a total of 0.246E-008 (exceeding the threshold of 0.434E-012)

Michael Levy (Dec 06 2023 at 04:24):

@Dafydd Stephenson is this the same column that had a bunch of zeros before, and now you've gotten non-zero values but they still violate the threshold criteria?

Keith Lindsay (Dec 07 2023 at 20:49):

@Dafydd Stephenson , what MARBL code base are you running with?
If it has mods from MARBL repo, has the MARBL test suite been run on the code base?
I'm asking to help separate between MARBL code issues and ROMS/MARBL driver issues.

Dafydd Stephenson (Dec 07 2023 at 20:52):

Hi Keith, sorry I didn't post an update straight away, but I found out late yesterday that the issue did indeed stem from being on the wrong MARBL branch. Good catch! I will be pushing a first version of the driver in this example configuration to GitHub today or tomorrow.

Dafydd Stephenson (Dec 07 2023 at 20:58):

Michael Levy said:

Dafydd Stephenson is this the same column that had a bunch of zeros before, and now you've gotten non-zero values but they still violate the threshold criteria?

And yes Mike, this is the same column. We caught an issue where halo effects were leaking in during the handover to MARBL and patched it, but the error continued afterwards (though it has now been caught).

Dafydd Stephenson (Jan 26 2024 at 00:05):

I've updated the driver to include MARBL saved state I/O (it was previously just populating the saved_state components of the MARBL instance with 0), but this is causing a crash in marbl_co2calc_mod/drtsafe. The model runs for about 11 model days before the warnings start. There are a lot more WARNINGs before the excerpt below, but they're effectively the same thing at different levels:

 Task 7) MARBL WARNING (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) it = 4
 Task 7) Message from (i,j) (4,4) at level 10
 Task 7) MARBL WARNING (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) x1,f =  0.4340482E-008-0.3731634E-004
 Task 7) Message from (i,j) (4,4) at level 10
 Task 7) MARBL WARNING (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) x2,f =  0.6879200E-005-0.2497997E-002
 Task 7) Message from (i,j) (4,4) at level 10
 Task 7) MARBL ERROR (marbl_co2calc_mod:drtsafe): bounding bracket for pH solution not found
 Task 7) Message from (i,j) (4,4) at level 10
 Task 7) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) dic =  0.2371843E+004
 Task 7) Message from (i,j) (4,4) at level 10
 Task 7) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) ta =  0.2902752E+004
 Task 7) Message from (i,j) (4,4) at level 10
 Task 7) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) pt =  0.0000000E+000
 Task 7) Message from (i,j) (4,4) at level 10
 Task 7) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) sit =  0.0000000E+000
 Task 7) Message from (i,j) (4,4) at level 10
 Task 7) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) temp =  0.1578623E+002
 Task 7) Message from (i,j) (4,4) at level 10
 Task 7) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) salt =  0.3543604E+002
 Task 7) MARBL ERROR (marbl_co2calc_mod:comp_htotal): Error reported from drtsafe
 Task 7) MARBL ERROR (marbl_co2calc_mod:marbl_co2calc_interior): Error reported from comp_htotal()
 Task 7) MARBL ERROR (marbl_interior_tendency_mod:compute_carbonate_chemistry): Error reported from marbl_co2calc_interior() with dic
 Task 7) MARBL ERROR (marbl_interior_tendency_mod:marbl_interior_tendency_compute): Error reported from compute_carbonate_chemistry()
 Task 7) MARBL ERROR (marbl_interface:interior_tendency_compute): Error reported from marbl_interior_tendency_compute()
 Task 7) MARBL ERROR (MARBL_tracers_column_physics): Error reported from marbl_instance%interior_tendency_compute()

(The driver can be seen at https://github.com/dafyddstephenson/ROMS_MARBL_BATS/blob/saved_state/code/marbl_driver.F if needed)

Dafydd Stephenson (Jan 29 2024 at 22:57):

Some other things I've noticed are:

Digging further again, I've removed the variables from the restart files (ROMS behaviour is then to initialise them to 0). This leads to an abort during the first call to interior_tendency_compute.

Interestingly, if I go to the point in the code where the MARBL instance is populated with these values and instead hardcode them to 0, the run completes fine. It seems like they're getting junk values from somewhere but anywhere I've had the values printed out they seem normal. I will look into this further.

Michael Levy (Jan 29 2024 at 22:59):

Dafydd Stephenson said:

Digging further again, I've removed the variables from the restart files (ROMS behaviour is then to initialise them to 0). This leads to an abort during the first call to interior_tendency_compute.

Interestingly, if I go to the point in the code where the MARBL instance is populated with these values and instead hardcode them to 0, the run completes fine. It seems like they're getting junk values from somewhere but anywhere I've had the values printed out they seem normal. I will look into this further.

So if you remove saved state from the restart file, you expect marbl_ss_3d to be 0 but setting

MARBL_instance%interior_tendency_saved_state%state(m)%field_3d(:,1)=marbl_ss_3d(i,j,nz:1:-1,m)

aborts while

MARBL_instance%interior_tendency_saved_state%state(m)%field_3d(:,1)=0

is okay? That definitely sounds like marbl_ss_3d is getting corrupted.

Dafydd Stephenson (Jan 29 2024 at 23:02):

Yes, that's exactly it. I'll go back to getting a full dump of the values because I can't think of anything else that tracks with this behaviour

Michael Levy (Jan 29 2024 at 23:03):

In step3d_t_ISO.F, you are passing marbl_saved_state_2d(istr:iend,jstr:jend,:) and marbl_saved_state_3d(istr:iend,jstr:jend,:,:) to marbldrv_column_physics, so your (i,j) dimensions will be remapped to (1:iend-istr+1, 1:jend-jstr+1). In marbldrv_column_physics you are looping from istr and jstr to iend and jend, so the last (istr-1) and (jstr-1) items are out of the bounds of the array. Try passing marbl_saved_state_2d(:,:,:) and marbl_saved_state_3d(:,:,:,:) instead.
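A minimal sketch of the index arithmetic behind this suggestion, written as a Python model rather than Fortran. It assumes the dummy argument inside marbldrv_column_physics starts at index 1 (the thread does not show its declaration), and the function name `out_of_bounds_indices` is hypothetical. It lists which loop indices from istr to iend would fall outside a dummy array spanning 1 to iend-istr+1.

```python
def out_of_bounds_indices(istr, iend):
    """Loop indices istr..iend that miss a dummy array spanning 1..(iend-istr+1).

    Assumption: the dummy argument's lower bound is 1, so the section
    istr:iend arrives as indices 1..extent.
    """
    extent = iend - istr + 1  # number of elements in the section istr:iend
    return [i for i in range(istr, iend + 1) if not 1 <= i <= extent]


# Tile whose interior starts at index 1: every loop index is valid.
print(out_of_bounds_indices(1, 4))

# Hypothetical tile starting at index 3: the last istr-1 = 2 indices overrun.
print(out_of_bounds_indices(3, 6))
```

Under these assumptions the overrun only appears when istr is greater than 1, so a configuration with istr=jstr=1 would not trigger it.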

Dafydd Stephenson (Jan 30 2024 at 23:56):

I now remember why the call is written like this: the arrays are defined with padding around the outside. In this example, indices -1 and 0 point to the first two rows and columns of the array, so passing the full array remaps these to rows/columns 1 and 2. Dumping the values shows that the ones in the restart file are correctly passed into MARBL in the right places. I don't think this exact format for the call to MARBL is ideal, but it is functional in the simple configuration where istr=jstr=1 and iend=jend=4, with each CPU working on a local ~4x4 subset of the global array. Sometimes there is additional padding (rows/columns <1 or >4) that we don't want to show to MARBL.

However, I've found a bug in the ROMS source code that initialises variables absent in the restart file to 0 _only in a single layer_, which explains why removing saved state from the restart file led to a rapid corruption with junk values in that particular case. I have now patched this and can confirm that whether or not the saved_state values are in the restart file, the model errors out at the same point in time.
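To make the padding argument concrete, here is a rough Python model of the index remapping (not the driver's Fortran). It assumes the padded array has lower bound -1, as in the post, and that the receiving routine indexes its argument from 1; `actual_index` is a hypothetical helper name.

```python
def actual_index(dummy_index, first_passed):
    """Index in the caller's array that the routine sees at `dummy_index`.

    Assumption: the routine indexes its argument from 1, so dummy index 1
    corresponds to the first element passed.
    """
    return first_passed + dummy_index - 1


lower_bound = -1    # padded array: indices -1 and 0 are padding
istr, iend = 1, 4   # interior range used by the loop in the routine

# Passing the full padded array: the loop 1..4 lands on -1, 0, 1, 2.
full = [actual_index(d, lower_bound) for d in range(istr, iend + 1)]

# Passing the section istr:iend: the loop 1..4 lands on 1, 2, 3, 4.
section = [actual_index(d, istr) for d in range(istr, iend + 1)]

print(full, section)
```

In this model, passing the whole array would hand padding cells to the routine at indices 1 and 2, whereas passing the istr:iend section keeps the loop on interior cells as long as istr is 1.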

Dafydd Stephenson (Feb 01 2024 at 21:45):

OK, this turned out to be triggered independently by the above-mentioned bug in ROMS (now patched) and a physics problem with the test configuration (now fixed). It now runs without the issue.

Michael Levy (Feb 01 2024 at 21:54):

awesome, glad you got it up and running! I'm a little surprised that passing the subarrays kept the subarray index, but then I had no idea what else to suggest :)

Dafydd Stephenson (Feb 01 2024 at 22:00):

It happens that the indices coincide with the size of the subarray, but this formulation is at best ugly and at worst asking for trouble... I think it'd be better to pass the subarray and then loop over its full shape (not an easy change to make currently, but definitely due at some point). Thanks for your help!
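The "pass the subarray, loop over its full shape" idea could look roughly like the following Python sketch. It is an illustration only: the names `make_padded` and `visit_columns` and the padding marker are hypothetical, Python lists are 0-based unlike the Fortran arrays in the driver, and the padding width of 2 is taken from the indices -1 and 0 mentioned earlier in the thread.

```python
PAD = -1.0  # marker for padding cells that MARBL should never see


def make_padded(n_interior, n_pad):
    """Square grid of interior cells (0.0) surrounded by n_pad padding cells."""
    size = n_interior + 2 * n_pad
    grid = [[PAD] * size for _ in range(size)]
    for i in range(n_pad, n_pad + n_interior):
        for j in range(n_pad, n_pad + n_interior):
            grid[i][j] = 0.0
    return grid


def visit_columns(tile):
    """Loop over the tile's own shape; no istr/iend is needed in here."""
    visited = []
    for i in range(len(tile)):
        for j in range(len(tile[i])):
            visited.append(tile[i][j])
    return visited


grid = make_padded(n_interior=4, n_pad=2)
interior = [row[2:6] for row in grid[2:6]]  # the caller strips the padding
values = visit_columns(interior)
print(len(values), PAD in values)
```

Because the routine derives its loop bounds from the argument it receives, the caller alone decides which cells are exposed, and the result no longer depends on the tile's starting index.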


Last updated: May 16 2025 at 17:14 UTC