Hi All,
I am a new Zulip/ESDS user from HAO working with the Mauna Loa Solar Observatory, trying to get a handle on what resources are available and how to access them.
I want to use a particular problem as an example to motivate this resource search. The new MLSO instrument UCoMP has periodic camera glitches, where some fraction of the image shifts some number of pixels to the left, leaving an annoying signal. Based on initial assessments, there is roughly one glitched image for every 10,000-50,000 saved images, so outside of reports from our users, these can be tricky to find. I developed a method for correcting these glitches (once identified), but we need a robust way to find the glitched images. Using some naive metrics to look for elevated signal in regions of the image where we expect low signal, we have identified about 70 of these glitched images. Still, the method finds 30 false positives for every actual glitched image.
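For readers unfamiliar with this kind of check, here is a minimal sketch of a "naive metric" of the sort described above: take the mean signal in a region expected to be dark and flag images where it exceeds a threshold. This is not the actual UCoMP code. The function names, the choice of region, the threshold, and the synthetic images are all assumptions made for illustration.

```python
import numpy as np

def low_signal_score(image, region):
    """Mean signal inside a region that is expected to be dark."""
    rows, cols = region
    return float(np.mean(image[rows, cols]))

def flag_candidates(images, region, threshold):
    """Return indices of images whose dark-region signal exceeds the threshold."""
    return [i for i, img in enumerate(images)
            if low_signal_score(img, region) > threshold]

# Synthetic demonstration (hypothetical data, not real UCoMP frames).
rng = np.random.default_rng(0)
clean = rng.normal(loc=1.0, scale=0.1, size=(64, 64))
clean[:, 8:] += 100.0  # bright signal everywhere except the leftmost 8 columns

# Imitate a glitch: shift the lower half of the frame 4 pixels to the left.
# np.roll wraps pixels around the edge, which is only a stand-in for the
# real camera behavior.
glitched = clean.copy()
glitched[32:, :] = np.roll(glitched[32:, :], -4, axis=1)

dark_region = (slice(None), slice(0, 8))  # assumed low-signal region
candidates = flag_candidates([clean, glitched], dark_region, threshold=5.0)
print(candidates)
```

A single threshold like this is cheap to run over a large archive, but anything else that brightens the dark region also trips it, which is consistent with the high false-positive rate described above.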
I have never done machine learning, but given how much we see AI in the news, I decided to try using image classification to find the glitch images. As a first attempt, I followed the tutorial:
https://www.tensorflow.org/tutorials/images/classification
But with only 70 images in the glitch classification, I have been unable to get the model to do anything useful when trying to sort these from the false positives. When I use a few thousand false positives, the model happily tells me none of the images are glitches; all the images are "false positives." As I reduce the number of images in the "false positive" classification, I can teach the model to find some glitched images in the glitch classification. Still, it gets the image classification wrong about 50% of the time, so it isn't helpful.
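The behavior described above is typical of a strongly imbalanced training set. The sketch below, in plain Python, shows two things under assumed numbers (70 glitches and 2,100 false positives, i.e. the 30:1 ratio mentioned earlier): why a model that labels everything "false positive" still scores high accuracy, and how inverse-frequency class weights could be computed. The helper names are hypothetical. Keras `model.fit` does accept a `class_weight` dictionary keyed by integer class index, but whether weighting would help with this particular data set is untested here.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

def recall(true_labels, predicted_labels, positive):
    """Fraction of the truly positive examples that were predicted positive."""
    actual = sum(1 for t in true_labels if t == positive)
    hits = sum(1 for t, p in zip(true_labels, predicted_labels)
               if t == positive and p == positive)
    return hits / actual if actual else 0.0

# Assumed class sizes, for illustration only.
counts = {"glitch": 70, "false_positive": 2100}
weights = balanced_class_weights(counts)

# A degenerate "model" that always predicts the majority class.
true_labels = ["glitch"] * 70 + ["false_positive"] * 2100
predicted = ["false_positive"] * len(true_labels)

print(weights)
print(accuracy(true_labels, predicted))
print(recall(true_labels, predicted, positive="glitch"))
```

With these assumed counts, the always-"false positive" model is right on most images while finding no glitches at all, so recall on the glitch class is a more informative number to track than overall accuracy.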
So, two questions. 1) Where are the best places to look across NCAR for resources/help with this kind of problem? Does someone hold ML office hours or have writeups to complement what is available on the web? I spend most of my work life in a team of 6 at HAO, so I don't have a good feel for what resources are out there if our team doesn't already use them. 2) Specifically, what should my next steps be if I am talking to experts in this kind of thing? I would be happy to share notebooks, images, etc., but I left them out of this post to keep the answers focused on the first question.
-Ben
Hi @Benjamin Berkey ! Thanks for posting. Here's a couple of things I can think of:
I would be happy to help too. I have office hours you can book here -> https://calendar.app.google/ZsM8dLHLa65eGAr39
Wondering if clustering would also work for this.
Hi Kaite,
Thanks for the advice on this; the model needed more true positives. After I found 20 more true positives, the model started giving correct predictions on the test data. I am now running it on part of our unlabeled data set and finding even more true positives.
Thomas,
Thanks for the offer, I will try to push what I have a little further and then book some time the next time I get stuck.
-Ben
Last updated: May 16 2025 at 17:14 UTC