Finding appropriate resources · new members

Stream: new members

Topic: Finding appropriate resources

Benjamin Berkey (May 08 2023 at 21:32):

Hi All,
I am a new Zulip/ESDS user from HAO working with the Mauna Loa Solar Observatory, trying to get a handle on what resources are available and how to access them.

I want to use a particular problem as an example to motivate this resource search. The new MLSO instrument UCoMP has periodic camera glitches, where some fraction of the image shifts some number of pixels to the left, leaving an annoying signal. Based on initial assessments, roughly one in every 10,000-50,000 saved images is glitched, so outside of reports from our users, these can be tricky to find. I developed a method for correcting these glitches (once identified), but we need a robust way to find the glitched images. Using some naive metrics to look for elevated signal in regions of the image we expect to have low signal, we have identified about 70 of these glitched images. Still, the method finds 30 false positives for every actual glitched image.
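
For illustration only, a naive metric of the kind described above might look like the following sketch. It assumes each image is a 2D NumPy array and that a boolean mask marks the region expected to have low signal. The function names, the mask, the synthetic frames, and the threshold are all hypothetical and are not the actual UCoMP code or values.

```python
import numpy as np

def low_signal_score(image, mask):
    """Mean intensity inside the region expected to be dark (mask == True)."""
    return float(image[mask].mean())

def flag_candidates(images, mask, threshold):
    """Return indices of images whose dark-region mean exceeds the threshold."""
    return [i for i, img in enumerate(images)
            if low_signal_score(img, mask) > threshold]

# Synthetic demo (hypothetical data): 8x8 frames whose two leftmost
# columns are expected to be dark.
rng = np.random.default_rng(0)
mask = np.zeros((8, 8), dtype=bool)
mask[:, :2] = True

clean = rng.normal(100.0, 1.0, size=(8, 8))
clean[mask] = rng.normal(1.0, 0.1, size=int(mask.sum()))

# Simulate a glitch: the bottom half of the frame shifts 3 pixels left,
# so bright pixels land in the region that should be dark.
glitched = clean.copy()
glitched[4:] = np.roll(clean[4:], -3, axis=1)

candidates = flag_candidates([clean, glitched], mask, threshold=10.0)
```

A simple threshold like this is cheap to run over a whole archive, but it flags anything that brightens the dark region, which is one plausible source of the false positives mentioned above.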

I have never done machine learning, but given how much we see AI in the news, I decided to try using image classification to find the glitch images. As a first attempt, I followed the tutorial:
https://www.tensorflow.org/tutorials/images/classification

But with only 70 images in the glitch classification, I have been unable to get the model to do anything useful when trying to sort these from the false positives. When I use a few thousand false positives, the model happily tells me none of the images are glitches; all the images are "false positives." As I reduce the number of images in the "false positive" classification, I can teach the model to find some glitched images in the glitch classification. Still, it gets the image classification wrong about 50% of the time, so it isn't helpful.
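
One common way to approach the imbalance described above, instead of discarding false-positive images, is to weight the classes inversely to their frequency. The sketch below is an assumption-laden illustration: the count of 70 glitches comes from this post, while the 2100 false positives is an illustrative figure derived from the 30:1 ratio, and the helper name is hypothetical. Keras `Model.fit` accepts a `class_weight` dictionary of this shape, though whether it helps for this data set is untested.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count)
            for label, count in counts.items()}

# Label 0 = false positive, label 1 = glitch (illustrative counts).
counts = {0: 2100, 1: 70}
weights = balanced_class_weights(counts)

# With Keras, the dictionary could then be passed to training, e.g.:
#   model.fit(train_ds, validation_data=val_ds, epochs=10, class_weight=weights)
```

With these counts each glitch image contributes 30 times as much to the loss as each false positive, so predicting "false positive" for everything is no longer the cheapest answer for the model.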

So two questions. 1) Where are the best places to look across NCAR for resources/help with this kind of problem? Does someone do ML office hours or have writeups to complement what is available from the web? I live most of my work life in a team of 6 at HAO, so I don't have a good feel for what resources are out there if our team doesn't already use them. 2) Specifically, what should be my next steps if I am talking to experts in this kind of thing? I would be happy to share notebooks or images etc., but I didn't in this post to try to focus answers on the first question.

-Ben

Katie Dagon (May 08 2023 at 22:25):

Hi @Benjamin Berkey ! Thanks for posting. Here's a couple of things I can think of:

The NSF AI2ES institute has a nice page on AI/ML tutorials, including a couple of NCAR CISL machine learning tutorials.
I do wonder if you need more labels, i.e. true positives, to successfully train the ML model. We have this issue with climate applications too, not enough labeled data. I wonder if there is a way to identify more true positives and/or enhance the labeled data. You could look into data augmentation, for example.
Tagging a few ML folks here in case they have other ideas: @David John Gagne @Kirsten Mayer1 @William Chapman @John Schreck
FWIW, there is a #machine-learning Zulip stream. It's pretty quiet, but this could be a good topic for discussion there or for future ML discussions.
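
The data augmentation idea mentioned above can be sketched roughly as follows. This is a minimal NumPy illustration, not a recommendation for UCoMP specifically: the function name and noise level are hypothetical, and whether a vertical flip or added noise is physically sensible for these images is an assumption that would need checking. A vertical flip is used rather than a horizontal one because the glitch is a leftward shift, and a horizontal flip would reverse its direction.

```python
import numpy as np

def augment(image, rng, noise_sigma=0.5):
    """Return a randomly perturbed copy of a labeled image."""
    out = image.copy()
    # Flip top-to-bottom half of the time; this keeps a leftward shift leftward.
    if rng.random() < 0.5:
        out = np.flipud(out)
    # Add a small amount of Gaussian noise so copies are not identical.
    return out + rng.normal(0.0, noise_sigma, size=out.shape)

rng = np.random.default_rng(42)
glitch = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a real image
augmented = [augment(glitch, rng) for _ in range(5)]
```

Each labeled glitch image can yield several training samples this way, although augmented copies are not a substitute for finding genuinely new true positives.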

Thomas Martin (May 09 2023 at 18:47):

I would be happy to help too. I have office hours you can book here -> https://calendar.app.google/ZsM8dLHLa65eGAr39

Wondering if clustering would also work for this.

Benjamin Berkey (May 10 2023 at 23:24):

Hi Katie,

Thanks for the advice on this; it needed more true positives. After I found 20 more true positives, the model started giving the correct predictions on the test data. I am now running it on part of our unlabeled data set and finding even more true positives.

Thomas,
Thanks for the offer, I will try to push what I have a little further and then book some time the next time I get stuck.

-Ben

Last updated: May 16 2025 at 17:14 UTC