I've run into my first set big issue with the high resolution run -- trying to use shr_stream
to read the tracer restoring fields blows memory. I talked with @Keith Lindsay about it on Friday, and the best we can figure is that the share code reads the stream file on master task and then distributes it. From cesm.log
in /glade/scratch/mlevy/SMS_Ld1.TL319_t13.G1850ECOIAF_JRA_HR.cheyenne_intel.pop-highres_JRA_cice.019/run
:
828:MCT::m_AttrVect::init_: allocate() error, stat =41 828:33C.MCT(MPEU)::die.: from MCT::m_AttrVect::init_() 828:MPT ERROR: Rank 828(g:828) is aborting with error code 2. ... 828:MPT: #6 0x00000000010e6a2f in m_dropdead_mp_die__ () 828:MPT: at /glade/work/mlevy/codes/CESM/cesm2_2_beta04+GECO_JRA_HR/cime/src/externals/mct/mpeu/m_dropdead.F90:87 828:MPT: #7 0x00000000010e5bff in m_die_mp_die2__ () 828:MPT: at /glade/work/mlevy/codes/CESM/cesm2_2_beta04+GECO_JRA_HR/cime/src/externals/mct/mpeu/m_die.F90:165 828:MPT: #8 0x000000000106d57d in m_attrvect_mp_init__ () 828:MPT: at /glade/work/mlevy/codes/CESM/cesm2_2_beta04+GECO_JRA_HR/cime/src/externals/mct/mct/m_AttrVect.F90:346 828:MPT: #9 0x000000000106d1c0 in m_attrvect_mp_initv__ () 828:MPT: at /glade/work/mlevy/codes/CESM/cesm2_2_beta04+GECO_JRA_HR/cime/src/externals/mct/mct/m_AttrVect.F90:434 828:MPT: #10 0x000000000109b5d8 in m_generalgrid_mp_initgg__ () 828:MPT: at /glade/work/mlevy/codes/CESM/cesm2_2_beta04+GECO_JRA_HR/cime/src/externals/mct/mct/m_GeneralGrid.F90:853 828:MPT: #11 0x0000000000dfba30 in shr_dmodel_mod_mp_shr_dmodel_readgrid_ () 828:MPT: at /glade/work/mlevy/codes/CESM/cesm2_2_beta04+GECO_JRA_HR/cime/src/share/streams/shr_dmodel_mod.F90:433 828:MPT: #12 0x0000000000eae6d3 in shr_strdata_mod_mp_shr_strdata_init_streams_ () 828:MPT: at /glade/work/mlevy/codes/CESM/cesm2_2_beta04+GECO_JRA_HR/cime/src/share/streams/shr_strdata_mod.F90:480 828:MPT: #13 0x0000000000eaa309 in shr_strdata_mod_mp_shr_strdata_create_oldway_ () 828:MPT: at /glade/work/mlevy/codes/CESM/cesm2_2_beta04+GECO_JRA_HR/cime/src/share/streams/shr_strdata_mod.F90:219 828:MPT: #14 0x0000000000ac1582 in strdata_interface_mod_mp_pop_strdata_create_ () 828:MPT: at /glade/scratch/mlevy/SMS_Ld1.TL319_t13.G1850ECOIAF_JRA_HR.cheyenne_intel.pop-highres_JRA_cice.019/bld/ocn/source/strdata_interface_mod.F90:184 828:MPT: #15 0x000000000095a1a4 in ecosys_forcing_mod_mp_ecosys_forcing_set_interior_time_varying_forcing_data_ () 828:MPT: at /glade/scratch/mlevy/SMS_Ld1.TL319_t13.G1850ECOIAF_JRA_HR.cheyenne_intel.pop-highres_JRA_cice.019/bld/ocn/source/ecosys_forcing_mod.F90:1576
There is an allocate()
call in m_AttrVect.F90
that is failing, and additional tests is showing that nRA = 7 n = 535680000
(so n = 3600*2400*62
). Working under the assumption that "just don't restore" is not a good plan, I think there are three paths forward:
shr_stream
ecosys_forcing_mod.F90
in POP to allow restoring to a constant value, and compute average value over the marginal seas in the file we are currently trying to restore toI meant to add that aV%rAttr
is ~28 GB in size, and allocate() error, stat =41
indicates "[i]nsufficient virtual memory"
Option 4. Don't apply the restoring. I think this is worth considering.
I asked about this on the CIME slack board, and Jim said "We are developing new esmf based data models with issues like this in mind. Are you willing to try some bleeding edge code in your experiment?" That won't be available in 2.2.0, so I think proceeding without restoring for now and then trying the new infrastructure seems like a good path forward.
Last updated: May 16 2025 at 17:14 UTC