
Information, Entropy, SNR, and Dabbling in Astrometry

If you ask me how much I like the fields of statistics and metrology, I will tell you that I need to exit the conversation so I can go rearrange my sock drawer. Conversely, if you ask me how I can prove some scientific claim about image processing and feature/object detection, I will immediately start spouting information based on statistics and metrology. What a mess.

I have spent the last few years researching the foundations of detection theory. I wanted to understand how and why signals (generally, not just RF) are analyzed using statistical tools like the standard deviation. Along the way I found that I really liked the elegance of explanations by O’Haver, Kutz, and MacKay (see references page).
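As a toy version of that idea, the sketch below estimates the noise level of a synthetic trace from the standard deviation of a signal-free region, then forms a simple peak SNR. The signal shape, noise level, and background window are all made-up values for illustration, not anything from a real instrument.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic 1-D "signal": a Gaussian peak sitting on white noise.
x = np.arange(200)
peak = 50.0 * np.exp(-0.5 * ((x - 100) / 5.0) ** 2)
trace = peak + rng.normal(0.0, 2.0, size=x.size)

# Estimate the noise level from a signal-free region using the
# sample standard deviation, then form a crude peak SNR estimate.
sigma = np.std(trace[:50], ddof=1)   # background-only samples
snr = trace.max() / sigma

print(f"noise sigma ~ {sigma:.2f}, peak SNR ~ {snr:.1f}")
```

The key design choice is estimating sigma from samples known to contain no signal; including the peak in the estimate would inflate the noise figure and understate the SNR.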

Reading books like The Idea Factory by Gertner got me interested in the people who created Information Theory at Bell Labs. Lectures by people like MacKay helped me understand its concepts and applications, while also tying it back to engineering topics I was familiar with (entropy, SNR, etc.). As with any other field of theory, I quickly realized that reading and watching experts alone was not going to get me to truly appreciate the subject matter.

Having a background in optical systems, I knew that any form of imaging for the purpose of detection, identification (classification), and/or measurement is rich with “signals”. I also knew from personal experience that it is rife with NOISE. So much noise. Fun fact: if you relax your detection tolerances enough, your hastily-written OpenCV ArUco detection code will start tracking caution tape instead of the things you actually care about!
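That caution-tape failure mode is just the false-positive side of a threshold trade-off, and it shows up even in pure noise. The sketch below (a generic mean + k·sigma detection rule on a synthetic noise image, not the actual ArUco parameters) counts how many "detections" appear when the tolerance is relaxed.

```python
import numpy as np

rng = np.random.default_rng(0)

# A pure-noise image patch: anything "detected" here is a false positive.
noise = rng.normal(0.0, 1.0, size=(256, 256))

def count_detections(image, k):
    """Count pixels above mean + k*std -- a crude detection rule."""
    threshold = image.mean() + k * image.std()
    return int((image > threshold).sum())

strict = count_detections(noise, 5.0)   # tight tolerance
loose = count_detections(noise, 2.0)    # relaxed tolerance

print(strict, loose)   # relaxing k floods the detector with noise hits
```

At k = 5 the expected false-positive count over 65,536 pixels is well under one; at k = 2 roughly two percent of the pixels fire, which is the caution-tape regime.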

I zeroed in on the astronomy community. I’m not an astronautics or astronomy person by trade, hobby, accident, or divine intervention… but I know that if anyone is collecting data the right way, it would be the kids (grown-up kids) with big telescopes. After much dabbling with various APIs and sites, I fell in love with NOIRLab’s data archive. Below you can see the number of calibrated, short-exposure (under 10 seconds) images available at the Kitt Peak (KP) and Cerro Tololo (CT) observatories using the NEWFIRM infrared imaging instrument. I created these histograms to identify the largest dataset available if I narrowed my study down to a single exposure duration, with a single filter, at a single location.
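The "largest dataset" question the histograms answer is really just a group-and-count over archive metadata. Here is a minimal sketch of that step using hypothetical records; the (site, filter, exposure) tuples below are made up, not real NOIRLab query results.

```python
from collections import Counter

# Hypothetical archive records as (site, filter, exposure_seconds).
# Real metadata would come from the NOIRLab archive; these are stand-ins.
records = [
    ("KP", "J", 5.0), ("KP", "J", 5.0), ("KP", "H", 5.0),
    ("CT", "J", 2.0), ("CT", "J", 5.0), ("KP", "J", 2.0),
    ("KP", "J", 5.0), ("CT", "H", 5.0), ("KP", "J", 5.0),
]

# Bin by (site, filter, exposure) and pick the largest group --
# the same question the histograms answer visually.
counts = Counter(records)
best, n = counts.most_common(1)[0]
print(best, n)   # -> ('KP', 'J', 5.0) 4
```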

To get serious about my learning, I put my technical program manager muscles back into action. I knew I wanted to use astrometry (the measurement of the positions of stars and other celestial objects) to test my understanding of signal/image processing as well as information theory, but to do that I’d need to understand the pipelines used to process the imagery, their limitations, and the uncertainty of their results.

I titled the project “astroPipeline” because I’m horribly unoriginal when I code. The major chunks of work are organized into Epics with an attempt [just an attempt] to isolate each Epic’s work. One benefit of being a TPM, engineer, and code dev simultaneously is that building the Jira plan actually helps me think through how to structure the code. I was not lucky enough to get a California free-flowing code development gene.

Wrapping up this ramble, I am happy to say that I have already knocked out a fairly robust extract-transform-load (ETL) epic that allows me to generate my own normalized luminance astro images from raw data. I am now delving into the weird and wacky world of correcting image optical distortion using star catalogs. More to follow!
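For a sense of what a normalized-luminance step can look like, here is a generic calibration sketch: bias subtraction, flat-field division, then min-max scaling to [0, 1]. This is an assumption about typical ETL steps, not the actual astroPipeline code, and the frames below are synthetic.

```python
import numpy as np

def normalize_luminance(raw, bias, flat):
    """Calibrate a raw frame and scale it to [0, 1].

    Generic sketch: bias subtraction, flat-field division, min-max
    scaling. The real astroPipeline ETL steps may differ.
    """
    calibrated = (raw - bias) / np.clip(flat, 1e-6, None)
    lo, hi = calibrated.min(), calibrated.max()
    return (calibrated - lo) / (hi - lo)

# Synthetic stand-ins for a raw frame and its calibration frames.
rng = np.random.default_rng(1)
raw = rng.uniform(100, 4000, size=(64, 64))
bias = np.full((64, 64), 100.0)
flat = np.ones((64, 64))

img = normalize_luminance(raw, bias, flat)
print(img.min(), img.max())   # -> 0.0 1.0
```

Clipping the flat field away from zero before dividing avoids blowing up dead-pixel regions, a small guard that matters a lot on real detector data.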