Stream: ESDS

Topic: Blog Posts


view this post on Zulip Max Grover (Mar 19 2021 at 20:10):

@all I worked with @Anderson Banihirwe and @Michael Levy to put together an example of an interactive dashboard workflow as the first post. Looking forward to collaborating with more people throughout CGD/more generally NCAR to put relevant content together. https://ncar.github.io/esds/posts/Interactive_Dashboard/

view this post on Zulip Max Grover (Mar 26 2021 at 21:15):

@all last week's discussion at the town hall has now been summarized/posted on the ESDS blog, removing names and condensing into more defined topics https://ncar.github.io/esds/posts/esds-blog/. Be sure to take a look! Also, this is a reminder that we will hold a work-in-progress meeting Monday afternoon, where there will be more open discussion/small presentations.

view this post on Zulip Matt Long (Mar 30 2021 at 23:31):

Idea for a post based on a conversation with @Keith Lindsay:
What is the best approach for developing new functionality in a Python package (e.g., pop-tools) in the context of developing an application of that functionality?

view this post on Zulip Kevin Paul (Mar 31 2021 at 00:30):

What do you mean?

view this post on Zulip Max Grover (Mar 31 2021 at 15:17):

Could you expand on this?

view this post on Zulip Michael Levy (Mar 31 2021 at 15:28):

I wasn't part of the conversation yesterday, but I have had similar conversations... say I want to add a feature to pop-tools to use in the high-res analysis tools. There are several possible ways to do this, but here are just two examples:

  1. pip install -e $LOCALDIR/pop-tools, and try to develop directly in pop-tools. PRO: when my feature is ready, it's already in the right place to submit a PR to add it to pop-tools; CON: it might take weeks or months for the full process (develop feature, go through review process, have PR merged, get new release) during which time I'm stuck using an older version of pop-tools; also, it's a lot to ask of users who aren't familiar with the development process for the tool they want to improve
  2. In the high-res repo, create a pop_tools_future directory and develop directly there. Then, when the feature is complete, start the review / PR / release process in a separate sandbox. PRO: once feature is available in pop-tools, it's just a change from pop_tools_future to pop_tools in your call (and maybe commenting out the import pop_tools_future line). CON: it's pretty kludgy during the development stage; if the new feature depends on other (existing) pop-tools features it might not be as easy as copying directly into the pop-tools sandbox
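
A minimal sketch of the import swap described in option 2, assuming a hypothetical local module named pop_tools_future that sits next to the application code. The helper name import_first_available is made up for illustration and relies only on importlib from the standard library; it is one possible way to avoid editing import lines by hand, not something pop-tools provides.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    raise ImportError(f"none of {module_names} could be imported")


# Prefer the in-development copy (hypothetical name), fall back to the release.
# Left commented out so the sketch runs even where neither is installed:
# pt = import_first_available(["pop_tools_future", "pop_tools"])
```

Once the feature lands in a release, dropping pop_tools_future from the list (or deleting the directory) leaves the rest of the application unchanged.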

view this post on Zulip Michael Levy (Mar 31 2021 at 15:32):

But I think the main point is that python's modularity at the package level is a new paradigm for most of CGD. The idea that you could be developing an application that depends on a dozen independent (or intertwined) packages, and some of the features in your application are really improvements to those existing packages, is not something we've dealt with in the past and a blog post providing guidance would be very helpful.

view this post on Zulip Michael Levy (Mar 31 2021 at 15:33):

(If that's not what @Matt Long had in mind, then it's an idea for a different post :)

view this post on Zulip Matt Long (Mar 31 2021 at 15:35):

you got it

view this post on Zulip Kevin Paul (Mar 31 2021 at 15:52):

Ok. Thanks, @Michael Levy, for the specification. I was thinking in much more general terms, such as the actual creation of a new package for the purpose of performing a particular scientific analysis (i.e., for a paper). If that is in any way part of the original intent of @Matt Long's post, then there are more philosophical and practical considerations that go beyond just how to contribute to an existing package.

view this post on Zulip Kevin Paul (Mar 31 2021 at 15:54):

...But, @Michael Levy, the things you just wrote up would be great for a blog...but they could/should also be added to the CONTRIBUTING.md document of pop-tools, right?

view this post on Zulip Michael Levy (Mar 31 2021 at 15:57):

...But, Michael Levy, the things you just wrote up would be great for a blog...but they could/should also be added to the CONTRIBUTING.md document of pop-tools, right?

I'm not sure if it belongs there or not -- from the pop-tools viewpoint, the only way a feature is added to pop-tools is via the fork / branch / PR route... but it doesn't really matter if the origin of the feature was done in situ or if it was originally a function in a notebook that eventually got copied into a branch on a fork and submitted as a PR. On the other hand, it could be useful information so potential developers know where to begin the process

view this post on Zulip Kevin Paul (Mar 31 2021 at 16:00):

Yeah. That's a fair viewpoint. Perhaps you could just include a simple link to the external blog post in the CONTRIBUTING.md document. Anything that helps people get over the barrier to contributing is good.

view this post on Zulip Max Grover (Mar 31 2021 at 16:01):

Maybe something similar to this https://unidata.github.io/MetPy/latest/devel/CONTRIBUTING.html#setting-up-your-development-environment
it explicitly states that at the end of that setup, you can use the functionality

view this post on Zulip Deepak Cherian (Mar 31 2021 at 16:06):

I think (1) is the way to go.

takes weeks or months for the full process (develop feature, go through review process, have PR merged, get new release)

we should fix this.

view this post on Zulip Michael Levy (Mar 31 2021 at 16:21):

I think (1) is the way to go.

takes weeks or months for the full process (develop feature, go through review process, have PR merged, get new release)

we should fix this.

I wasn't very clear about it, but in my experience the majority of that time is spent in the "develop feature" step so I don't think there is much we can do to help. (I'm thinking specifically of https://github.com/marbl-ecosys/HiRes-CESM-analysis/tree/master/notebooks/utils; on my to-do list for this week is to make a concrete list of features in that directory that actually belong elsewhere, and then I can give a more concrete example)

view this post on Zulip Kevin Paul (Mar 31 2021 at 16:26):

Yeah. I think I agree with @Michael Levy on this. At first, I wholeheartedly agreed with you, @Deepak Cherian. This is anathema to agile developers and advocates of the continuous delivery (CD) paradigm.

However, after thinking about it a bit, I think there is a difference between the "scientist first, then developer" mode of work and the "developer first, then scientist" mode of work. With the former I think the priority is the scientific result, and with the latter the priority is the published software feature. Unfortunately, the two priorities do not always align perfectly. Prioritizing the scientific result can sometimes require pushing on many simultaneous features at once, leading to a new scientific result but not "ready for release" software features. This is what I think @Michael Levy was implying. On the other hand, prioritizing the development usually means pushing on individual features one at a time (in sequence, if working alone, or in parallel, if working as a team) and continually releasing along the way, but this can lead to significant delays in achieving the scientific result while you wait for all the required pieces to be put in place.

I'm not sure there is a "fix" to this. In the early days of discussion on ESDS (before ESDS was a "thing"), @Matt Long and I talked about "scientists first" folks pushing toward the scientific result while "developers first" folks simultaneously focused on cleaning up, finishing, and releasing the proto-features that the scientists developed. But I've come to see that this approach tends to be very inefficient and doesn't work well in practice.

view this post on Zulip Deepak Cherian (Mar 31 2021 at 16:42):

Thanks for clarifying @Michael Levy .

@Kevin Paul I mostly agree that there is no "fix". I think the real fix is to write modular code rather than one giant function that does everything. That leaves open the possibility of sending it upstream, as well as reusing the function in many projects. Even if you don't have time for a PR you can always put it in an issue for someone else to use / modify. See here for example haha: https://github.com/NCAR/mom6-tools/issues/25 . This function does exactly one thing and is quite useful :slight_smile:

view this post on Zulip Michael Levy (Mar 31 2021 at 16:45):

I think the real fix is to write modular code rather than one giant function that does everything. That leaves open the possibility of sending it upstream, as well as reusing the function in many projects.

I agree 100%. I think the blog post that Matt requested to start this conversation would basically be focused on (a) writing modular code, and (b) moving that modular function to the proper repository when the author is happy with the results

view this post on Zulip Max Grover (Apr 02 2021 at 22:13):

This week's ESDS post is focused around the topic of software citation! Be sure to check it out https://ncar.github.io/esds/posts/software-citation/

view this post on Zulip Max Grover (Apr 06 2021 at 23:24):

@all Here is a post detailing how to get started using Dask with PBSCluster on Casper through the new Jupyterhub which launches tomorrow https://ncar.github.io/esds/posts/casper_pbs_dask/

view this post on Zulip Matt Long (Apr 06 2021 at 23:25):

nice work @Max Grover!

view this post on Zulip Matt Long (Apr 06 2021 at 23:26):

see also https://github.com/NCAR/ncar-jobqueue/issues/40

view this post on Zulip Max Grover (Apr 09 2021 at 21:55):

Interested in learning how to use intake-esm with dask? Check out the newest ESDS blog post! https://ncar.github.io/esds/posts/intake_esm_dask/

view this post on Zulip Yassir Eddebbar (Apr 13 2021 at 23:17):

Does anyone recommend a blog or tutorial on how to create a reproducible workflow for a project (preferably for python-based workflows in the geosciences), e.g. what files/folders (at minimum) should be included in a github repo, how to create an environment.yml file, best practices, etc...

view this post on Zulip Deepak Cherian (Apr 13 2021 at 23:20):

Here's a list:

  1. https://github.com/jbusecke/cookiecutter-science-project has some docs
  2. There may be something useful here: https://the-turing-way.netlify.app/welcome.html

view this post on Zulip Yassir Eddebbar (Apr 13 2021 at 23:32):

Here's a list:

  1. https://github.com/jbusecke/cookiecutter-science-project has some docs
  2. There may be something useful here: https://the-turing-way.netlify.app/welcome.html

This is perfect, thanks @Deepak Cherian !

view this post on Zulip Max Grover (Apr 16 2021 at 14:58):

@_all Here is a post detailing how to get started using Dask with PBSCluster on Casper through the new Jupyterhub which launches tomorrow https://ncar.github.io/esds/posts/casper_pbs_dask/

Interested in how one can use ncar-jobqueue to make this process even easier? Check out this walk through of the difference between ncar-jobqueue and dask-jobqueue, and how to use this in your workflow! https://ncar.github.io/esds/posts/ncar-jobqueue-example/

Thanks @Anderson Banihirwe for the recent updates/all your work on ncar-jobqueue!

view this post on Zulip Max Grover (Apr 23 2021 at 20:57):

Thanks @Stephen Yeager and @Deepak Cherian for contributing to this blog post regarding indexing data on unstructured grids using Xoak https://ncar.github.io/esds/posts/multiple_index_xarray_xoak/

view this post on Zulip Max Grover (Apr 29 2021 at 15:21):

This post isn't from our blog, but it is still worth checking out! The team at Coiled put together a post detailing "Distributed Data Science and Oceanography with Dask" with a focus on observational data stored in the cloud! https://coiled.io/blog/distributed-data-science-and-oceanography-with-dask-2/

view this post on Zulip Max Grover (Apr 30 2021 at 19:52):

Interested in performing calculations in xarray, + dask, requiring operations within columns in a 3D field? Check out this post detailing how to use xarray.map_blocks https://ncar.github.io/esds/posts/map_blocks_example/

view this post on Zulip Yassir Eddebbar (May 04 2021 at 21:17):

@Max Grover Per a conversation this morning, @Matt Long suggested adding a case/example for interpolating POP2 variables onto density surfaces using xgcm's transform function, maybe as an ESDS blog or a case for pop-tools?
Here is a simple case plotting O2 on the 26.5 density surface from an ensemble member from the CESM LE archive: https://hub.gke2.mybinder.org/user/eddebbar-zulip_notebooks-etekfion/notebooks/O2_along_Isopycnals.ipynb

view this post on Zulip Max Grover (May 10 2021 at 13:27):

Interested in paired programming? Visual Studio Code has some helpful tools for collaborating and remotely accessing Casper/Cheyenne - here is a post detailing this process! https://ncar.github.io/esds/posts/paired_programming_vs/

view this post on Zulip Max Grover (May 14 2021 at 20:35):

Here is a detailed walkthrough of debugging and finding a workaround for the issue that @Stephen Yeager brought up earlier this week while working with intake-esm https://ncar.github.io/esds/posts/intake_cmip6_debug/

view this post on Zulip Max Grover (May 28 2021 at 21:23):

We put together some thoughts on discussions from the Dask Distributed Summit - here is a blog post detailing some of the key sessions, along with relevant links to the talks/slides! Also included is how we can tie this back to ESDS and NCAR as a whole.

view this post on Zulip Anderson Banihirwe (May 28 2021 at 21:26):

We put together some thoughts on discussions from the Dask Distributed Summit - here is a blog post detailing some of the key sessions, along with relevant links to the talks/slides! Also included is how we can tie this back to ESDS and NCAR as a whole.

Thank you for putting together such a great summary!

view this post on Zulip Max Grover (Jun 04 2021 at 20:53):

Interested in creating intake-esm catalogs from CESM history files? Check out this week's ESDS blog post! While the package has not been released yet, we hope this is helpful to those working with CESM data! https://ncar.github.io/esds/posts/ecgtools-history-files-example/

view this post on Zulip Max Grover (Jun 11 2021 at 21:50):

Ever wanted to visualize items in a data catalog? Check out this post detailing how to use the graphviz library and intake-esm to visualize assets within your catalog (using CESM-LE data as an example)! https://ncar.github.io/esds/posts/graphviz_example/

view this post on Zulip Max Grover (Jun 18 2021 at 14:12):

During the CESM workshop this week, at the Software Engineering Working Group meeting, we had a discussion about diagnostics related to CESM! I put together a blog post summarizing discussions, as well as a word cloud from all the notes from the small group discussions. Check it out! https://ncar.github.io/esds/posts/cesm-workshop-2021-diagnostics/

view this post on Zulip Daniel Marsh (Jun 21 2021 at 14:47):

@Max Grover Thanks for the blog on creating intake-esm catalogs from CESM history files. I tried it out on a development run. It worked, but I had to move the parsing_func argument to the b.build() call. Is that correct?

view this post on Zulip Max Grover (Jun 21 2021 at 14:51):

Yes! We recently updated the API, and included these changes within the ecgtools documentation and the ESDS ecgtools blog post. Thanks for pointing this out!

view this post on Zulip Daniel Marsh (Jun 21 2021 at 18:42):

@Max Grover , Looking at the documentation, I think you are using "history file" to mean what typically we would put into the h0 history files which hold a single monthly mean. However, history files are not limited to holding one time and most of the other history files (h1, h2, ...) contain multiple time entries.

view this post on Zulip Max Grover (Jun 21 2021 at 18:45):

So the definition of "history" in this case is a file with several variables at a single time slice, whereas timeseries files are a single variable with multiple time entries. Is there a case where history files (time slice) coming out of the model have multiple time steps?

view this post on Zulip Daniel Marsh (Jun 21 2021 at 18:51):

History files (at least for the atmosphere) typically contain both multiple variables and time entries. History output here is what is specified in the fincl1, fincl2, ... entries in the namelist. The h0s are the odd case in which it is chosen to have one time entry. The script that creates timeseries pulls apart by variable and concatenates the various history files. For example if the U, V, & T fields are put out on a daily basis they may be found in one of history file outputs (h1, h2, etc.) that might each have 30 time steps.

view this post on Zulip Sheri Mickelson (Jun 21 2021 at 19:00):

@Max Grover if you'd like to see an example of what @Daniel Marsh is referring to, you can look at this file
/glade/work/cmip6/cases/DECK/b.e21.BHIST.f09_g17.CMIP6-historical.011/user_nl_cam
At the top, nhtfrq indicates the time frequency of the writes for each stream and mfilt indicates how many slices should be in each file for each stream. All of it is configurable and can change based on what the user would like.
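
As a rough illustration of how the write frequency and the slices-per-file setting interact, here is a small sketch. It assumes the write interval is given as a positive number of hours (it does not model the sign conventions of the actual nhtfrq namelist entry), and the function name stream_file_count is hypothetical.

```python
import math


def stream_file_count(write_interval_hours, mfilt, run_days):
    """Return (time_samples, n_files) for one history stream.

    write_interval_hours: hours between writes (stands in for nhtfrq)
    mfilt: maximum number of time samples stored in each file
    run_days: length of the run in days
    """
    time_samples = (run_days * 24) // write_interval_hours
    n_files = math.ceil(time_samples / mfilt)
    return time_samples, n_files


# Daily output, 30 samples per file, one 365-day year
samples, files = stream_file_count(24, 30, 365)
print(samples, files)  # 365 samples spread over 13 files
```

With these made-up settings, a daily stream behaves like the h1/h2 case described above: each file holds up to 30 time entries rather than one.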

view this post on Zulip Max Grover (Jun 21 2021 at 19:01):

Ahh okay - I will take a look at this use case... what would be a better way of differentiating between timeseries and history files?

view this post on Zulip Daniel Marsh (Jun 21 2021 at 19:05):

Ideally the code that creates a catalog and intake-esm would not care if a file had more than one variable or more than one time step. Is that possible? To me, a timeseries file is just a history file that has just one field in it.

view this post on Zulip Anderson Banihirwe (Jun 21 2021 at 19:16):

Ideally the code that creates a catalog and intake-esm would not care if a file had more than one variable or more than one time step. Is that possible?

The short answer is yes :slight_smile:. As long as the catalog records the "variable" column appropriately, everything should work fine, i.e., for files with more than one variable, the "variable" column should contain rows with a list of variables. For "time-series" files, this column should contain rows with a single entry (string) referring to the variable in the "time-series" file.
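
A minimal sketch of that idea using plain Python dictionaries rather than intake-esm itself: the "variable" entry is a list for history-style files and a single string for time-series files, and a lookup treats both the same way. The file names and the helper rows_with_variable are made up for illustration.

```python
def rows_with_variable(catalog_rows, name):
    """Return catalog rows whose "variable" entry includes name."""
    matches = []
    for row in catalog_rows:
        variables = row["variable"]
        # Time-series rows hold a single string; normalize it to a list.
        if isinstance(variables, str):
            variables = [variables]
        if name in variables:
            matches.append(row)
    return matches


catalog_rows = [
    # History-style file: several variables in one file (made-up path)
    {"path": "case.cam.h1.0001-01-01.nc", "variable": ["U", "V", "T"]},
    # Time-series-style files: one variable each (made-up paths)
    {"path": "case.cam.h0.T.000101-001012.nc", "variable": "T"},
    {"path": "case.cam.h0.PS.000101-001012.nc", "variable": "PS"},
]

hits = rows_with_variable(catalog_rows, "T")
print([row["path"] for row in hits])
```

Searching for "T" returns both the multi-variable history file and the single-variable time-series file, which is the behavior described above.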

view this post on Zulip Max Grover (Jun 22 2021 at 14:05):

@Daniel Marsh coming back to this... would it be better to state that "timeseries" files have a single variable for a series of times, whereas history files contain multiple variables in a single file and could have either a single time step or multiple, depending on the stream?

view this post on Zulip Daniel Marsh (Jun 22 2021 at 17:07):

Perhaps someone who creates the timeseries files can comment, but it seems to me that timeseries files are a subset of history files. I could easily specify a history file with one variable and 100's of time entries using the CAM namelist - I think that would be the equivalent of a timeseries file.

view this post on Zulip Kevin Paul (Jun 22 2021 at 18:01):

@Daniel Marsh: Yes. That would be the equivalent of a timeseries. I didn't realize that the CAM namelist gave you that option. However, that would mean only outputting 1 variable, right? There is no option to generate multiple timeseries files directly from CAM, right?

view this post on Zulip Michael Levy (Jun 22 2021 at 18:06):

However, that would mean only outputting 1 variable, right? There is no option to generate multiple timeseries files directly from CAM, right?

@Kevin Paul I can't speak for CAM directly, but most of the CESM components have a hard limit on the number of streams a user can write. Raising the cap would require source mods (or possibly just updating a namelist variable in some cases). E.g. POP has a Fortran parameter named max_avail_tavg_streams that is 9 by default

view this post on Zulip Kevin Paul (Jun 22 2021 at 18:06):

@Michael Levy: I see. Thanks for the clarification.

view this post on Zulip Daniel Marsh (Jun 22 2021 at 18:07):

Unfortunately, the number of history files is limited (about a dozen, I think) so it's not practical to create all the timeseries this way. Mainly it was a terminology point - history vs timeseries (latter being a special case of the former).

view this post on Zulip Max Grover (Jun 25 2021 at 20:39):

In this week's ESDS blog post, we look into using a Jupyterbook for documenting CESM model output. This builds a bit off of the past few posts, and provides an example of where Jupyterbooks can be helpful for documentation and sharing your work with others!

Blog Post

view this post on Zulip Julia Kent (Jun 29 2021 at 21:15):

With the merge of the ESDS and Xdev blogs, all of the previous links to the Xdev blog are broken. Is there a way to make these addresses automatically forward to the new page location? I am getting lots of concerned messages from tutorial seminar series participants about it.

view this post on Zulip Max Grover (Jun 29 2021 at 21:26):

@Julia Kent where are the links located? Is it through email or github repos?

view this post on Zulip Max Grover (Jun 29 2021 at 21:28):

Could you send an email to everyone on the list stating that the previous tutorial content can be found here?
https://ncar.github.io/esds/blog/tag/python-tutorial-series/

view this post on Zulip Anderson Banihirwe (Jun 29 2021 at 21:38):

@Julia Kent,

all of the previous links to the Xdev blog are broken. Is there a way to make these addresses automatically forward to the new page location?

I set up the redirect for the top-level domain. I thought this was enough at the time. I am going to set up redirects for individual posts shortly

view this post on Zulip Anderson Banihirwe (Jun 29 2021 at 21:57):

@Julia Kent, could you try those (old) links that were broken when you get a moment? They should redirect you to the corresponding links on ESDS blog... Let me know if I missed any link

view this post on Zulip Julia Kent (Jun 30 2021 at 00:02):

The links are from past emails, so I can't edit them. Sending a group email is a good idea! Thanks @Max Grover

view this post on Zulip Julia Kent (Jun 30 2021 at 00:06):

@Anderson Banihirwe It works now! Thanks for doing that. I also sent out an email.

view this post on Zulip Kevin Paul (Jun 30 2021 at 00:21):

Thanks, @Anderson Banihirwe!

view this post on Zulip Max Grover (Jul 02 2021 at 21:34):

Hi All! I had an opportunity to attend a "Scaling Python with Dask" class this week and put together a blog post detailing some of the main takeaways + information helpful to the NCAR community! Feel free to check it out, and reach out if you have any questions

https://ncar.github.io/esds/posts/scaling-with-dask-class-takeaways/

view this post on Zulip Max Grover (Jul 30 2021 at 17:15):

Happy Friday All!

This week, we have two ESDS blog posts, including one detailing the main takeaways from the Scientific Python (SciPy) conference, and the other a detailed overview of the Project Pythia Portal, which is a fantastic resource for getting started with Python, especially within the geosciences.

Be sure to check out these posts!

Blog Post Links

SciPy Conference 2021 Takeaways

Project Pythia Portal Overview

Also, as a reminder, Xdev will be hosting virtual office hours on Monday from 3-5 PM - details can be found on the ESDS calendar page.

view this post on Zulip Max Grover (Aug 06 2021 at 22:05):

Interested in using the CESM2-Large Ensemble? Check out the latest ESDS blog post detailing how to access that dataset on GLADE using intake-esm

view this post on Zulip Max Grover (Aug 13 2021 at 21:52):

Hi All! Happy Friday! I put together a post using data provided by @Stephen Yeager , using interactive visualization libraries in Python to plot CESM data on an unstructured grid. Check it out! https://ncar.github.io/esds/posts/cesm-datashader/

view this post on Zulip Max Grover (Aug 20 2021 at 22:30):

This week's ESDS blog post details how to use various Python tools to plot interactive CESM diagnostic plots. We walk through everything from data access, setting up your plot, to putting together the final multi-panel visualizations.

https://ncar.github.io/esds/posts/2021/intake-esm-holoviews-diagnostics/

Have a great weekend!

view this post on Zulip Alice DuVivier (Aug 24 2021 at 15:06):

Max Grover said:

Interested in using the CESM2-Large Ensemble? Check out the latest ESDS blog post detailing how to access that dataset on GLADE using intake-esm

I have been following this example and am finding today that the notebook just hangs at the "import dask" and cluster setup step. Any suggestions here? I've been waiting about 15 minutes and tried restarting before that.

view this post on Zulip Michael Levy (Aug 24 2021 at 15:11):

@Alice DuVivier what machine are you on? I'm having trouble running qsub from the command line on casper so I wonder if there's a PBS issue? I'm going to submit a ticket to CISL.

Has anyone else been able to launch a job on casper this morning?

view this post on Zulip Max Grover (Aug 24 2021 at 15:11):

It looks like even accessing a compute node is slow this morning... looks like someone is running a large job using NCL which is using quite a few resources

view this post on Zulip Max Grover (Aug 24 2021 at 15:13):

The queue is pretty full.. there are quite a few more jobs lined up. Dask is interfacing with the compute nodes, so if there are not any resources available, it won't spin up... one option is to use the LocalCluster, but that might have limited resources depending on what you requested when accessing the JupyterHub or if you are on the login node

view this post on Zulip Kristen Krumhardt (Aug 24 2021 at 15:14):

I can't get a casper job started on jupyterhub this morning .. just keeps timing out

view this post on Zulip Michael Levy (Aug 24 2021 at 15:17):

I was finally able to get qsub to run, but it took a while. I think Max is right about resources on the log-in nodes being in short supply (and the cluster being in heavy use would also add to the wait times)... I don't think it makes sense to ask CISL-help about it, so I'm not going to email them just yet

view this post on Zulip Alice DuVivier (Aug 24 2021 at 15:21):

I am on casper also.

view this post on Zulip Max Grover (Aug 24 2021 at 15:35):

CISL is aware of the problem

view this post on Zulip Alice DuVivier (Aug 24 2021 at 15:46):

Thanks! So just hold on? My notebook is still just hanging. Should I stop it and restart my kernel or just restart the whole session?

view this post on Zulip Max Grover (Aug 24 2021 at 15:52):

If you restart the whole session you might not be able to get back in... I would wait... I'd imagine they will send out a notification once this issue is resolved.

view this post on Zulip Max Grover (Aug 24 2021 at 18:43):

Not sure if you all saw, but the system is back to normal

view this post on Zulip Max Grover (Aug 27 2021 at 22:19):

Interested in creating a data catalog from observational datasets already on GLADE? Check out this week's blog post walking through building a catalog from the AMWG diagnostic package observational datasets, comparing against the CESM2-Large Ensemble!

https://ncar.github.io/esds/posts/2021/intake-obs-cesm2le-comparison/

view this post on Zulip Max Grover (Sep 24 2021 at 20:15):

Happy Friday!

In this week's ESDS blog post, we detail how to build a model diagnostics package using the Jupyter ecosystem! This workflow produces an interactive webpage which can be shared with others, and it is parametrizable via a configuration file, which makes it flexible enough for a variety of use cases.

Check it out using this link!

view this post on Zulip Max Grover (Oct 01 2021 at 12:17):

Happy Friday!

Interested in what progress has been made related to ESDS over the past few months? We put together a blog post detailing some key highlights, including Work in Progress Talks, Xdev Office Hour statistics, and links to various Python tutorial sessions which have been taught so far!

This is definitely worth a read :grinning_face_with_smiling_eyes:

Link to the post

view this post on Zulip Matt Long (Oct 01 2021 at 12:21):

@Max Grover, thanks for putting this together, it's really great to see!

view this post on Zulip Max Grover (Oct 08 2021 at 22:27):

Happy Friday!

Interested in comparing the CESM2-Large Ensemble with observations? We put together a post reproducing a figure from the original LENS paper (Kay et al. 2015), using data from the CESM2-Large Ensemble! We use Intake-ESM, Dask, and hvPlot to put the final figure together!

Link to the post

Here is a gif of the interactive figure at the end!
kay_et_al_lens2.gif

view this post on Zulip Max Grover (Oct 15 2021 at 20:48):

Interested in using Xarray + Dask with WRF data? Check out this week's ESDS blog post where we cover how to use a new xarray backend, xwrf, to read in WRF data and create interactive plots of the data!

Link to the post

Here is a preview of what you can create using this example:
xarray_wrf_blog_post.gif

view this post on Zulip Max Grover (Oct 22 2021 at 19:06):

Interested in extending your data catalog? Check out this week's ESDS blog post detailing how to add derived variables to an Intake-ESM catalog! This enables users to add additional diagnostic variables that may not be in the original dataset.

Link to the blog post

view this post on Zulip Matt Long (Oct 22 2021 at 20:04):

This is great work by @Anderson Banihirwe and @Max Grover!

view this post on Zulip Deepak Cherian (Feb 24 2022 at 22:43):

New blog post from @Katie Dagon and me!

An experiment with representing sparse arrays (here vegetation types from CLM) with xarray + dask. This was prompted by a Zulip conversation from a few months ago.

view this post on Zulip Katie Dagon (Feb 24 2022 at 23:07):

Thanks @Deepak Cherian ! Python tools for remapping PFT-level output came up in our CLM meeting today, so this is excellent timing.
cc @Danica Lombardozzi @Will Wieder @Daniel Kennedy

view this post on Zulip Anderson Banihirwe (Feb 24 2022 at 23:08):

This was prompted by a Zulip conversation from a few months ago.

Xref the relevant Zulip topic Code examples for plotting CLM Plant Functional Types

view this post on Zulip Danica Lombardozzi (Feb 24 2022 at 23:25):

Thanks @Deepak Cherian and @Katie Dagon ! It's great to see this -- translating from vector to multi-dimensional grids has always been problematic and time consuming! LENS2 has numerous vector files, and I wonder if we can devise a way to handle multiple files at once to make it easier for people to work with those data

view this post on Zulip Deepak Cherian (Feb 25 2022 at 03:30):

Thanks Danica. The function works with xarray Datasets, so as long as you make one using xr.open_mfdataset it should work! (assuming it has the appropriate 1D PFT variables)

view this post on Zulip Deepak Cherian (Mar 31 2022 at 16:51):

Two new blogposts!

  1. Anderson & @Matt Long demonstrate "Batch Processing Jupyter Notebooks with Papermill"
  2. I wrote up a dask debugging adventure looking at detrending. It has simple tips on what to look for if your dask workflows aren't flowing.

view this post on Zulip Katie Dagon (May 12 2022 at 22:14):

@Julia Kent has a great new blog post on debugging, which I initially discovered via the Project Pythia twitter account :grinning:

view this post on Zulip Julia Kent (May 12 2022 at 22:19):

Thanks Katie!

view this post on Zulip Deepak Cherian (Jul 28 2022 at 21:21):

After attending today's great CuPy tutorial at the GPU workshop, I adapted some of that notebook to showcase integrating cupy arrays in xarray

view this post on Zulip Anderson Banihirwe (Aug 30 2022 at 22:00):

@geocat, you might find this interesting given your recent work exploring GeoCAT routines on GPUs: https://xarray.dev/blog/xarray-kvikio

view this post on Zulip Anderson Banihirwe (Aug 31 2022 at 23:58):

for folks interested in keeping track of physical units while using Xarray: https://xarray.dev/blog/introducing-pint-xarray

view this post on Zulip Katie Dagon (Dec 02 2022 at 21:35):

New blog post from @Heather Craker on seasonal averaging using @geocat's climatology_average function. This post came out of some discussions we had during the ESDS Event a couple weeks ago.
https://ncar.github.io/esds/posts/2022/xarray-groupby-vs-geocat-climatology/

view this post on Zulip Deepak Cherian (Dec 12 2022 at 17:15):

New blog post with @Christine Shields on regridding CAM-SE using xESMF + an existing ESMF weights file: https://ncar.github.io/esds/posts/2022/cam-se-regridding/

view this post on Zulip Katie Dagon (Dec 19 2022 at 18:36):

New blog post recapping the ESDS Event back in November. All materials from that event (recordings, slides, notes) are linked in the post:
https://ncar.github.io/esds/posts/2022/esds-event-recap/

view this post on Zulip Deepak Cherian (Mar 09 2023 at 22:52):

New blog post on generating virtual aggregate datasets for CESM MOM6 output with the kerchunk package:
https://ncar.github.io/esds/posts/2023/kerchunk-mom6/

we use kerchunk to generate a JSON file containing “references” to binary blocks stored elsewhere. The JSON file is structured to look like a Zarr dataset. Such a file can be interpreted as an aggregate Zarr dataset using fsspec and zarr.

Amongst other things, this approach allows you to open a dataset by only reading a single JSON file (and not touching 1000s of netCDF files, and not having to coerce Xarray into combining them sensibly)
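To make the "references" idea concrete, here is a toy sketch using only the Python standard library. It does not use kerchunk, fsspec, or zarr. It assumes a reference layout in which each Zarr-style key maps either to an inline string or to a `[path, offset, length]` triple. The file names, keys, and the `read_reference` helper are hypothetical.

```python
import json
import os
import tempfile


def read_reference(refs, key):
    """Return the bytes that a reference entry points to.

    An entry is assumed to be either an inline string or a
    [path, offset, length] triple describing a byte range in another file.
    """
    entry = refs[key]
    if isinstance(entry, str):
        return entry.encode()
    path, offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Hypothetical binary file standing in for one netCDF file on disk.
tmpdir = tempfile.mkdtemp()
data_path = os.path.join(tmpdir, "chunks.bin")
with open(data_path, "wb") as f:
    f.write(b"HEADERchunk-0chunk-1")

# Reference set: Zarr-style keys mapped to inline metadata or byte ranges.
reference = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/0": [data_path, 6, 7],
        "temp/1": [data_path, 13, 7],
    },
}

# Round-trip through a single JSON file: only this file is read up front.
json_path = os.path.join(tmpdir, "reference.json")
with open(json_path, "w") as f:
    json.dump(reference, f)
with open(json_path) as f:
    refs = json.load(f)["refs"]

first_chunk = read_reference(refs, "temp/0")
second_chunk = read_reference(refs, "temp/1")
```

The data file is only touched when a specific chunk is requested, which is the property that lets a large collection of files be opened by reading one JSON file.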

view this post on Zulip Julia Kent (May 11 2023 at 19:12):

New blog post for anyone hoping to help out with office hours!

https://ncar.github.io/esds/posts/2023/office-hours-help/

view this post on Zulip Deepak Cherian (May 15 2023 at 19:45):

Recap of the unstructured grid collab work time event: https://ncar.github.io/esds/posts/2023/unstructured-grid-collab-1/ . It was fun!

view this post on Zulip Deepak Cherian (Jun 27 2023 at 17:10):

Zarr developer blog post on recent talks about Zarr enhancements: https://zarr.dev/blog/zarr-talks/

view this post on Zulip Deepak Cherian (Jul 03 2023 at 15:28):

New Xarray blogpost on "cubed", an alternative to dask: https://xarray.dev/blog/cubed-xarray

view this post on Zulip Deepak Cherian (Jul 14 2023 at 04:41):

New blogpost on Pangeo-forge: https://medium.com/pangeo/pangeo-forge-is-all-in-on-apache-beam-d7370299405f

Pangeo Forge, a modular Python toolkit for reproducible and scalable production of analysis ready cloud optimized (ARCO) ocean, weather, and climate datasets, is now all in on Apache Beam. This blog recaps the background of Pangeo Forge and its prior architecture, explains the motivations and costs of this major shift in direction, and concludes with a discussion of the opportunities presented by this change.
...
We’re so excited to start putting the 0.10.0 release to use building ARCO data for the community, and in particular building ambitious datasets that previously felt unattainable due to their scale or technical complexity, but which are now within reach due to the scalability and features of Apache Beam.

view this post on Zulip Deepak Cherian (Jul 18 2023 at 19:28):

New blogpost on speeding up Xarray groupby with flox:

Significantly faster groupby calculations are now possible through a new-ish package in the Xarray/Dask/Pangeo ecosystem called flox.
Practically, this means faster climatologies, faster resampling, faster histogramming, and faster compositing of array datasets.
It also means that very very many discussions in the Pangeo community are now closed :tada: :scream: 🤯 🥳.

view this post on Zulip Katie Dagon (Aug 29 2023 at 17:43):

New blog post from @Elena Romashkova recapping ESDS at SciPy 2023!

view this post on Zulip Katie Dagon (Sep 12 2023 at 23:17):

New ESDS blog post on Analyzing and visualizing CAM-SE output in Python :tada:

This one was in the works for a while, inspired by the event ESDS hosted this past spring on working with unstructured grids.

BIG thanks to all who participated in those discussions, contributed code for CAM-SE analysis/viz, and reviewed the blog post. Let us know what you think / if you try out the code!

view this post on Zulip John Clyne (Sep 13 2023 at 19:12):

Katie Dagon said:

New ESDS blog post on Analyzing and visualizing CAM-SE output in Python :tada:

This one was in the works for a while, inspired by the event ESDS hosted this past spring on working with unstructured grids.

BIG thanks to all who participated in those discussions, contributed code for CAM-SE analysis/viz, and reviewed the blog post. Let us know what you think / if you try out the code!

Nice work! FYI GeoCAT's Philip Chmielowiec will be giving a talk at CISL's next WIP session on the status of visualizing unstructured grids (e.g. CAM-SE) without resampling using UXarray, on Tuesday, October 10, at 1pm in the Mesa Lab Main Seminar Room, or virtually via Zoom.

view this post on Zulip Daniel Adriaansen (Sep 13 2023 at 22:49):

Hi @John Clyne is there a calendar invite or formal event page for that 10 October event?

view this post on Zulip John Clyne (Sep 13 2023 at 22:58):

Yes there is. Here's the calendar invite

view this post on Zulip Daniel Adriaansen (Sep 13 2023 at 23:33):

Thanks John. That is the event on your calendar only, and it simply says "busy" with no other details. I tried checking the CISL calendar here: https://www2.cisl.ucar.edu/events/calendar, but I don't see anything for 10 Oct. Maybe I'll keep an eye on staff notes daily for a bit.

view this post on Zulip John Clyne (Sep 14 2023 at 02:06):

Hmm. I'm not sure how to re-share the calendar invite. I'll look into it. BTW, Katie pointed out that Philip will be giving a version of this talk at the Oct. 2 ESDS forum.

view this post on Zulip John Clyne (Sep 25 2023 at 22:06):

Here are the meeting coordinates for this:

CISL WIP Talks
Tuesday, October 10 · 1:00 – 2:00pm
Time zone: America/Denver
Google Meet joining info
Video call link: https://meet.google.com/xxr-vqnk-pua
Or dial: (US) +1 484-416-4699 PIN: 280 807 531#
More phone numbers: https://tel.meet/xxr-vqnk-pua?pin=4128857278087

view this post on Zulip Deepak Cherian (Jan 21 2024 at 04:28):

Cool post by @Negin Sobhani : https://xarray.dev/blog/cupy-tutorial

view this post on Zulip Katelyn FitzGerald (Mar 06 2024 at 19:49):

We have a new ESDS blog post with a brief recap of the 2024 Annual Event and links to slides, recordings (on the new ESDS YouTube channel :tada:), and other resources!


Last updated: May 16 2025 at 17:14 UTC