Stream: ESDS

Topic: Diagnostics


view this post on Zulip Will Wieder (Apr 27 2021 at 11:50):

Not sure how or where to start this conversation, but following the ESDS discussion yesterday it sounded like we should continue the conversation either here or on the ESDS GitHub page https://github.com/NCAR/esds/ . I'd recommend making this move sooner rather than later, as I find GitHub issues and projects easier to organize and track than threads on this forum.

I've also looped in some users who may have more interest in this topic, but the list is not exhaustive. (I'm also assuming that others in Zulip can see this?)

I thought this would be under #ESDS , but was not successful... suggestions here are welcome

view this post on Zulip Will Wieder (Apr 27 2021 at 11:57):

I liked @Matt Long and @Max Grover's suggestion to build a generalized workflow for CESM model diagnostics packages. Broadly this should include:

There's a lot of complexity in these points, but maybe some of these can be accomplished in a focused sprint, especially to generate time series.

view this post on Zulip Matt Long (Apr 27 2021 at 12:34):

Thanks @Will Wieder.

I think we might consider putting a design document together. We could do this on a google doc or HackMD, for example. We will soon have a repo together for the timeseries generation, I hope. That will provide a venue enabling discussion of the specifics of that piece.

Regridding is something we've been working on a bit too. xESMF provides a partial solution, but other pieces include some curation of grid and mapping files.

view this post on Zulip Brian Medeiros (Apr 27 2021 at 15:12):

I'd like to get @Jesse Nusbaumer included here too. From what Max and Matt have described, it sounds like we are all doing pretty similar things for time series generation, but I agree we should get some kind of requirements document put together so we are all on the same page.

view this post on Zulip Max Grover (Apr 27 2021 at 15:20):

I went ahead and created a repository for the history --> timeseries generation. This is based on @Matt Long's scripts. We can move discussion on this "component" of the diagnostics workflow to this repository https://github.com/NCAR/cesm-hist2tseries. Feel free to add issues/comments on what we should name this, how we should separate the workflow, and improvements to the API/documentation. Looking forward to moving forward on this.
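For readers new to the history --> timeseries step, here is a minimal sketch of the reshaping idea only. It is not the cesm-hist2tseries API: the function name `history_to_timeseries`, the in-memory records standing in for files, and the sample values are all assumptions made for illustration. History output holds every variable at one time step; the timeseries layout holds one variable across all time steps.

```python
from collections import defaultdict


def history_to_timeseries(history_records):
    """Regroup per-time-step records into one series per variable.

    history_records: iterable of (time, {variable_name: value}) pairs,
    a hypothetical stand-in for history files that each hold every
    variable at a single time.
    Returns {variable_name: [(time, value), ...]} sorted by time.
    """
    series = defaultdict(list)
    for time, fields in history_records:
        for name, value in fields.items():
            series[name].append((time, value))
    # Sort each series so the result does not depend on input order.
    return {name: sorted(points) for name, points in series.items()}


# Made-up values, deliberately given out of time order.
history = [
    ("0001-02", {"TS": 288.4, "PRECT": 2.9}),
    ("0001-01", {"TS": 288.1, "PRECT": 3.1}),
]
timeseries = history_to_timeseries(history)
```

A real tool would read and write netCDF files rather than dictionaries, but the regrouping from "all variables, one time" to "one variable, all times" is the same.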

view this post on Zulip Max Grover (Apr 27 2021 at 15:28):

Again - this is a prototype for now and is not ready to be used yet. Collaboration on improving it will be helpful.

view this post on Zulip Allison Baker (Apr 27 2021 at 16:54):

We don't have anything for creating time series data from history data, but in our ldcpy package (https://github.com/NCAR/ldcpy), we have utilities for aggregating time series data over time and space (and both) and calculating various statistical quantities. (We have been using CAM-FV data and POP data from CESM-LENS.) There are a number of sample notebooks to see the capabilities, but we would love feedback and are happy to add more features that would be useful to you all. We don't yet support CAM-SE data but it's on the "to do" list.

view this post on Zulip Jesse Nusbaumer (Apr 27 2021 at 18:58):

I am happy with a sprint for time-series generation, but before we start should we make a concerted effort to list/organize all of our needs/wants/requirements for this new package? I can already think of a few aspects I would personally like to see (e.g. meta-data conserving, able to scale in parallel but still runnable in serial). My fear is that otherwise we'll get ~80% of the way done before realizing there is some major requirement we forgot about that won't be easily implementable in the routine as written. I can start an issue in https://github.com/NCAR/cesm-hist2tseries to get that conversation going if that is preferred.
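Two of the requirements named above (metadata conserving, and able to scale in parallel but still runnable in serial) can be sketched as follows. This is an illustration under stated assumptions, not a design for the package: the names `build_series` and `convert`, the dictionary layout standing in for history files, the attribute values, and the use of a thread pool are all hypothetical choices.

```python
from concurrent.futures import ThreadPoolExecutor


def build_series(name, history):
    """Collect one variable across all time steps, carrying its metadata along."""
    times = sorted(history["steps"])
    return {
        "name": name,
        "attrs": dict(history["attrs"][name]),          # per-variable metadata, copied unchanged
        "global_attrs": dict(history["global_attrs"]),  # file-level metadata, copied unchanged
        "time": times,
        "values": [history["steps"][t][name] for t in times],
    }


def convert(history, max_workers=None):
    """Run serially when max_workers is None, otherwise one task per variable."""
    names = sorted(history["attrs"])
    if max_workers is None:
        results = [build_series(n, history) for n in names]
    else:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(lambda n: build_series(n, history), names))
    return {r["name"]: r for r in results}


# Hypothetical in-memory stand-in for a set of history files (made-up values).
history = {
    "global_attrs": {"case": "example_case"},
    "attrs": {"TS": {"units": "K"}, "PRECT": {"units": "m/s"}},
    "steps": {
        "0001-01": {"TS": 288.1, "PRECT": 3.1e-8},
        "0001-02": {"TS": 288.4, "PRECT": 2.9e-8},
    },
}
serial = convert(history)
parallel = convert(history, max_workers=2)
```

Splitting the work by variable means each task produces its own output and the serial and parallel paths share the same per-variable function, so they can be checked against each other. A real implementation would likely swap the thread pool for processes, dask, or MPI.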

Also, although I am happy to contribute to any design discussion, I sadly will be MIA coding-wise until about mid-May, as all of the AMP engineers have been explicitly directed to work on implementing new infrastructure in CAM/SIMA. So if you want to exclude me from this particular sprint that is totally fine. However, once this SIMA project is done I should be able to actually contribute coding help to whatever we collectively want to focus on then. Thanks!

view this post on Zulip Matt Long (Apr 27 2021 at 19:34):

@Jesse Nusbaumer, I wholeheartedly agree that we should develop a design before jumping in!

view this post on Zulip Andrew Gettelman (Apr 27 2021 at 19:55):

I think the concept of moving forward with common pieces is a great idea. But I would like to see some overall coordination. Having volunteers from the different sections to coordinate would be good. Fine if we use a google doc or something else to put things down. It would be good (as I think @Jesse Nusbaumer suggested below) to make sure we have some requirements for any pieces before we do a sprint. Not that a sprint would be required to do everything, but at least we don't miss major points. The list of items: (Timeseries, Regridding, Visualization) are I think a great initial list, almost in priority order.

I'm happy to help. We have documentation for what we are aiming for in AMP, and that can feed into this discussion.

view this post on Zulip Matt Long (Apr 27 2021 at 19:56):

Thanks @Andrew Gettelman

view this post on Zulip Matt Long (Apr 27 2021 at 19:57):

We'll share a design doc for timeseries tool soon. I think it's really important that we think about assembling the workflow from modular components—these can serve needs as standalone entities in ad hoc workflows too.

view this post on Zulip Max Grover (May 04 2021 at 13:55):

@Andrew Gettelman and @Jesse Nusbaumer we collected some of our thoughts here with regard to the design/dependencies for the history to timeseries tool. Feel free to check out the documents linked in that issue; I think it will be good to continue this discussion there.

view this post on Zulip Sheri Mickelson (May 04 2021 at 14:52):

For CMIP6 we used this code https://github.com/ncar/pyreshaper to convert to timeseries, and it might be important to understand why there's a need to develop something new so that you don't run into the same issues. A couple of issues that I would like to point out: the code relies on mpi4py, which can be difficult to port. Another issue is that the code is slow converting high-frequency output, and you'll have to be careful about how you parallelize over the problem or a dask implementation could run into the same issue. The bottleneck is writing out the file in netCDF, and there's not a great option to do this in parallel unless you're writing out in Zarr. And my 2 cents ... if at all possible, this operation should really be done by PIO coming out of the model. Ignoring legacy runs or development runs that turn into production runs, this operation creates duplicate data when disk space is already at a premium, and it is hard to do without an email to CISL asking for more disk space. You might also want to talk to Gary Strand for his advice on how this should be developed, especially because this task usually falls onto his to-do list.

view this post on Zulip Max Grover (Jun 04 2021 at 20:59):

I just posted this week's blog post detailing how to create intake-esm catalogs from CESM history file output... I think this will be helpful in generalizing data ingestion throughout the different model diagnostic workflows. If you are interested, here is the post https://ncar.github.io/esds/posts/ecgtools-history-files-example/

view this post on Zulip Max Grover (Jun 07 2021 at 14:26):

There will be a group discussion focused on CESM diagnostics at the CESM workshop during the Software Engineering Working Group (SEWG) session. We put together an agenda for the session, with links to the presentations we will be using to spark discussion/gather information. Feel free to check it out, and leave feedback. https://docs.google.com/document/d/1wpqMhOEXnYM7WcXpKzQH7TU8B8LA7ZNsWR0D44ym8dk/edit?usp=sharing


Last updated: May 16 2025 at 17:14 UTC