Deep learning in Astronomy

Hello there!

If you are reading this post, I assume you found it while doing background reading for the Summer school.

If you did not find this for the summer school, no problem! Just go ahead and have a good read.

First off, the Summer school has been partitioned into 4 parts: (i) Introduction to General Relativity and relativistic effects in Astronomy (almost purely qualitative), (ii) Electromagnetism and electromagnetic radiation in Astronomy, (iii) Basics of Astronomy: optics, filters, and observations, and (iv) Machine learning and Deep learning in Astronomy.

Please note I will not put in a lot of math, but will instead refer you to the corresponding books for it. Keeping in mind that the session is for first-year undergrads, I am trying to do some Mathematica coding for the same (if I am not lazy, that is).

Finally, don’t hesitate to post any questions in the comments section!

Other lectures in this series:

Part I: Intro to Relativity

If anyone is interested in having a discussion on any of these topics, you could comment here, or just search for the title on Quora, where I have uploaded all of this material under Abstracted Abstract Science.

We have seen different kinds of radiation, their causes, and how they are measured. We now have the data. What do we do with it?

Of course, manually browse the data and infer some physics. Sounds easy, huh? Not quite.

Given the sheer magnitude of missions undertaken by NASA, ESA and ISRO[1][2][3], we will end up having Petabytes of data, and no manpower to go through all of them manually. In such a case, we will need intelligent systems which can devour large data like how the saiyans eat food 😛

The most obvious choice for this is Artificial Intelligence, or its subset of Machine learning and Deep learning. Deep learning is deep, so let’s dive into its applications in Astronomy!

This section is not organized like the previous lectures, but follows the format of the Summer school talk. I would strongly recommend going through the previous parts before proceeding.

Deep learning is, to put it bluntly, a set of mathematical models which try to fit a given dataset. Basically, one has some form of data (images, time series, etc.) and a black box into which the data is fed to perform whatever task needs to be done.

Let us see what kinds of data, and what kinds of models we have in this subject.

Datasets

There is just a LOT of data present to do astronomy, both near and far from our planet.

Close to us, we have solar and space weather data from a lot of instruments aboard the Solar Dynamics Observatory, WIND, ACE, etc. A typical dataset from SDO looks like:

Please note the raw data doesn’t have any of the colours shown. The raw data is just black-grey-white, and the colours are assigned keeping in mind the wavelengths represented, or just by some custom rule of thumb. Look at how certain wavelengths capture the solar corona, also known as the atmosphere of the Sun. Each wavelength corresponds to certain transition(s) of elements of interest. There is either optical data, or a Magnetogram, which basically measures some of the Fe lines split due to the Zeeman effect, giving us the magnetic field intensity plot[5]. Learning about the data is as important as building the model, since pre-processing of data should be consistent with the data acquisition method.

We also have Deep-space multi-wavelength data, taken from an assortment of different scopes like Chandra, Spitzer[now defunct], Hubble, etc. The data looks like:

Some deep learning systems require a lot of labelled data, which means there is some sort of ‘Input-Output’ dataset available. For example, there are citizen science projects like Galaxy Zoo, which basically have citizen scientists label the data and give the resulting huge dataset to researchers. The work of citizen scientists includes classifying galaxies by type from given old photographs. More info is given on their website. The classification workflow is shown below, though:

We also get line emissions [transitions of electrons and nuclei] and continuum emissions [like synchrotron, Bremsstrahlung, etc.] from observatories like Astrosat and the VLT, which look like this:

Then there are exoplanets: many have been discovered through light curves of stars from the Kepler mission. Basically, light curves are luminosity (or brightness) plots of stars taken over time, sometimes for years together. If the geometry is right and a planet eclipses the star with respect to us, we see a dip in the brightness. Such periodic dips let us infer planets around stars. This is how such data looks:
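To make the idea concrete, here is a toy sketch (every number is invented; this is not any real Kepler pipeline) of planting a periodic transit dip in a noisy light curve and recovering it with a simple threshold:

```python
import numpy as np

# Hypothetical light curve: constant flux with Gaussian noise, plus a
# periodic ~1% transit dip every 50 samples.
rng = np.random.default_rng(0)
n = 500
flux = 1.0 + 0.001 * rng.standard_normal(n)
period, width, depth = 50, 3, 0.01
for start in range(10, n, period):
    flux[start:start + width] -= depth   # planet blocks ~1% of the light

# Crude detection: flag samples well below the median flux...
threshold = np.median(flux) - 5 * 0.001  # 5 noise-sigmas below median
dips = np.where(flux < threshold)[0]
# ...and group consecutive flagged samples into transit events.
events = np.split(dips, np.where(np.diff(dips) > 1)[0] + 1)
print(len(events))  # number of detected transits
```

Real pipelines phase-fold the curve and fit transit models, but the core signal is exactly these periodic dips.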

So much for the data part! We can see the variety of data present in astronomy, and this is not even all of it! There are many more kinds of data, and we haven’t touched Cosmology yet. Also, this is not the raw data researchers generally use: there is a lot of processing in terms of increasing the SNR, or extracting features from this data.

That brings us to the most important thing in Deep learning: Feature extraction. There are certain systems which extract features, or important defining characteristics, from the data, but more often than not, astronomers/astrophysicists pick specific features of the data to be used for processing. For instance, in Solar physics, researchers might be more interested in the area of the bright spots, or take an average pixel value of each image. Basically, if you are to expect anything from the machine at all, you must provide it proper, well-defined inputs which are of interest to you, and which you know have some underlying science. For example, you cannot expect the system to learn anything if you wish to do something with the number of times Naruto says “Dattebayo” given he has just had some Ramen. Very difficult for the machine to learn.
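As a toy illustration of feature extraction (the image, threshold and feature names below are entirely made up), here is how one might reduce a solar-style image to the couple of summary numbers mentioned above:

```python
import numpy as np

# Hypothetical "solar image": a dark background with two bright spots.
img = np.zeros((64, 64))
img[10:14, 10:14] = 1.0   # bright spot 1 (4x4 = 16 pixels)
img[40:46, 40:46] = 0.8   # bright spot 2 (6x6 = 36 pixels)

# Feature extraction: collapse the 64x64 image into a few numbers an
# astronomer cares about, instead of feeding raw pixels to the model.
features = {
    "mean_pixel": img.mean(),               # average brightness
    "bright_area": int((img > 0.5).sum()),  # area of the bright spots
}
print(features)
```

These few numbers, not the 4096 raw pixels, would then be the inputs to a (shallow or deep) model.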

And this brings us to the different kinds of tasks predominantly done in Deep learning. I have attached a lot of resources which will take you into the details of these algorithms; however, in this lecture, I shall present algorithms only as black boxes with characteristics[11-16]. The software I am comfortable coding in is called Tensorflow, a Python library by Google. It is very Numpy-like and easy to use. It has some memory issues, but I feel it is very comfortable to get going with (it is also what I learnt on, though, so I may be biased).

Now, Machine learning itself has different kinds of problems, some of which are:

1. Classification: Problems where the goal of the system is to say which Class, or label a given input corresponds to. We can think of this like the Galaxy zoo project. Given the image of a galaxy, the machine will have to say what kind of galaxy it is.
2. Regression: What is the value of a (set of) parameter(s) Y, given a (set of) parameter(s) X? This is the question answered by the Regressing model. What is the effective temperature of the star, given certain properties like its Metallicity, Luminosity, etc?
3. Time series regression or classification: If the data is a set of vectors at different time stamps, it is known as time series data. Given N days of data, what is the value on day N+1?
4. Image reconstruction: I have some part of a given image destroyed, due to some reason. Can I have a machine which can best reconstruct the destroyed part of the image?
5. Generative modelling: Can I try, and generate Fake data, given some input? For example, can I have a system which generates fake Galaxy images?
6. Obtaining physics out of the data: By far the most difficult thing in learning. Can I infer some physics, in terms of equations or qualitatively, by teaching the data to a machine? This has been done using something called Restricted Boltzmann Machines[17] by condensed matter people, for finding lattice structures and the like. Astro people are not yet at the point of seeing if any physics can be extracted from these models.

The above applications have been discussed from a ‘Black-box’ perspective; we haven’t touched upon different ways of performing the same job. For example, there is no “single” model which does classification: there are different kinds of models, or in our case networks, which perform the job. Let’s look at the most interesting of these models:

Neural networks: Multi-layer perceptron

These are a set of ‘cells’ modelled after nerve cells, back when people were exploring ways of making artificial memory. Our brain is a complex web of nerve cells, with each cell passing a signal on to the next cell depending on its activation potential: only if the incoming signal lies in a certain range of strength is it transmitted to the next neuron; otherwise, the signal is simply not passed on. Layers of such neurons are constructed and made to map between a given input and output. In turn, the network learns the specifics of the mapping, and performs the same mapping on a new datapoint. The network may look like this:

The input layer is where the inputs are fed in, the output layer is where the outputs are obtained, and the hidden layer tries to learn the mapping between the two. Most people use more than one hidden layer, making the network Deep, hence the name Deep learning.

The propagation of input is basically a set of matrix multiplications, which go as:

Now, do you see the activation part? This is basically a non-linear function applied to the linear matrix transformation. The activation function can be a Sigmoid, Tanh or ReLU, which are by far the most famous activations.
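The forward propagation described above can be sketched in a few lines of Numpy (the weights here are random, purely to show the shapes and the role of the activations; nothing is being trained):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """One forward pass of a 1-hidden-layer MLP: a linear (matrix)
    transformation, a non-linear activation, then another of each."""
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer with ReLU activation
    y = sigmoid(W2 @ h + b2)           # output layer with sigmoid activation
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(4)                           # 4 input features
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)    # 8 hidden neurons
W2, b2 = rng.standard_normal((1, 8)), np.zeros(1)    # 1 output neuron
y = forward(x, W1, b1, W2, b2)
print(y)   # a single value between 0 and 1
```

Training then consists of tuning W1, b1, W2, b2 so that the output matches the labels; Tensorflow does the gradient bookkeeping for you.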

Convolutional Neural Networks

We can do better, though. We need not have such full connections between the neurons; we can instead perform a Convolution, obtaining Convolutional Neural Networks, the state-of-the-art models for image manipulation. Their architecture, or neuron arrangement, looks like:

You must have seen the ‘automatic’ animal recognition software- these basically use CNNs, or Deep CNNs to be specific. There are many famous networks like AlexNet, LeNet, etc designed for such classification tasks.
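The convolution operation at the heart of a CNN can be sketched directly. Below is a toy ‘valid’ convolution with a hand-made edge-detecting kernel (real networks learn their kernels instead of us choosing them):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image and
    take a weighted sum at each position (the core CNN operation)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to a toy image whose right half is bright.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 1.0]])   # responds where brightness jumps
response = conv2d(img, edge_kernel)
print(response)   # strong response only at the column where the edge sits
```

Stacks of such learned filters, with pooling and activations in between, are what AlexNet, LeNet and friends are built from.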

Long short term memory cells

What do we do if we have time series data? We could feed in sequential time-inputs and train on a set of time-segments together, using a CNN or MLP. But then the networks don’t inherently know there is a temporal structure in the dataset, and they are forced to learn it. Seeing this, people developed something called Recurrent Neural Networks, which have an explicit time relation: the output from the network is fed back into the network again, signifying the next time stamp, thus making the temporal structure explicit to the network.

Enough words, let’s look at the network:

See the first box, corresponding to $x_0$? There is a special variable called the State, which acts like the memory of the cell. The next input will modify the state a bit, and so on. The output from the cell at a given time depends on the state and the input.

Now, the description I gave was keeping in mind a more stable, and better version of the RNN, called the Long short term memory cell, or the LSTM. It looks like this:

That complicated box contains a hell of a lot of memory, probably even days’ worth of memory!
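For the curious, here is a minimal Numpy sketch of what one LSTM step does with its gates and cell state (random toy weights, purely illustrative; real implementations live in libraries like Tensorflow):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step: forget, input and output gates decide how much of
    the cell state c (the 'memory') to keep, update and expose."""
    z = W @ np.concatenate([x, h]) + b
    n = h.size
    f = sigmoid(z[0:n])           # forget gate
    i = sigmoid(z[n:2 * n])       # input gate
    o = sigmoid(z[2 * n:3 * n])   # output gate
    g = np.tanh(z[3 * n:4 * n])   # candidate update
    c = f * c + i * g             # new cell state
    h = o * np.tanh(c)            # new hidden state (the output)
    return h, c

rng = np.random.default_rng(2)
n_in, n_hid = 3, 5
W = 0.1 * rng.standard_normal((4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(10):                       # feed a short toy time series
    x = rng.standard_normal(n_in)
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

The forget gate multiplying the old state is what lets the cell carry information across many time steps without it blowing up or dying out, which plain RNNs struggle with.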

All our networks till now have a definite Input-Output relationship. But then there are networks, wherein such a relationship need not exist- the network might try to find out patterns existing in the given data.

Autoencoders

One such model is an Autoencoder. These are basically MLPs or CNNs in which the input image/data is compressed into a condensed representation, thereby reducing dimensionality and redundancy in the data. Their architecture looks like:

This encoding-decoding thing got pretty famous with AI and Computer vision people, and they developed such models using LSTMs too. Google’s language translation uses such a scheme- compress a given sentence in a language into a State, and decode this State into different languages. Pretty cool huh?
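Here is a toy linear autoencoder trained with plain gradient descent (the data and layer sizes are invented; the point is just the bottleneck and the shrinking reconstruction error):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 16))            # 200 toy 16-d data vectors
W_enc = 0.1 * rng.standard_normal((16, 4))    # encoder: 16 -> 4 bottleneck
W_dec = 0.1 * rng.standard_normal((4, 16))    # decoder: 4 -> 16

def recon_error(X, W_enc, W_dec):
    """Mean squared error between the input and its reconstruction."""
    return ((X @ W_enc @ W_dec - X) ** 2).mean()

mse_before = recon_error(X, W_enc, W_dec)
for step in range(1000):                      # plain gradient descent
    code = X @ W_enc                          # the condensed representation
    err = code @ W_dec - X                    # reconstruction error
    grad_dec = code.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= 0.05 * grad_dec
    W_enc -= 0.05 * grad_enc
mse_after = recon_error(X, W_enc, W_dec)
print(mse_before > mse_after)                 # the error shrinks as it learns
```

Real autoencoders use non-linear activations and deeper stacks, but the structure is the same: squeeze through a bottleneck, then reconstruct.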

Generative Adversarial Networks

GANs, or Generative Adversarial Networks, are networks which generate definite data from a given random input. Their working mechanism is beyond the scope of this short series, but I can tell you this: they can be used to generate fake frames of videos, or new, realistic images of bedrooms and the like. They could potentially be used in astronomy too.

Now, let’s look at the application of such techniques to astronomy!

This section is written in such a way that I just touch upon the DL algorithm used for each specific task, and its performance (where given). I have not written about the work itself, or the ‘science aspects’ of the work.

i) Rotation Invariant Convolutional Neural Network for Transient Detection

Guillermo Cabrera-Vives, Ignacio Reyes, Francisco Forster, Pablo A. Estevez and Juan-Carlos Maureira

This work uses Convolutional Neural Networks to classify true and false images of transients. The group uses the HiTS supernova transient survey as their dataset, with artificially produced noise added too. Basically, there are two kinds of data: positive samples, which are supernova transients, and negative samples, which are not supernova transients but rather some defect in the CCD, or some statistical noise. The network is made to detect these positive and negative samples, and classify them accordingly. The CNN used is made rotation invariant.

Generally, for better generalization, people tend to apply rotations and translations to images, to augment the data. Such transformations make the network generalize better, and make it invariant to these symmetries. Such a procedure was applied in this case too, and is a core part of this work.
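A minimal sketch of such rotation augmentation (a toy 4x4 ‘image’, assuming 90-degree rotations; the paper’s actual augmentation scheme may differ):

```python
import numpy as np

def augment_rotations(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees, so the
    classifier cannot key on a particular orientation."""
    return [np.rot90(image, k) for k in range(4)]

img = np.arange(16, dtype=float).reshape(4, 4)   # stand-in transient stamp
batch = augment_rotations(img)
print(len(batch))   # 4 training samples from 1 original image
```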

ii) Deep Neural Networks to Enable Real-time Multi-messenger Astrophysics

Daniel George and E. A. Huerta

These people again used a CNN, this time on time series data. You must realise that LSTMs and RNNs are among the hardest networks to train, and need some sort of great greedy guts to be trained. CNNs, however, are (apparently) much easier to train. Hence, these people used CNNs to (i) find if a given observation has a signal and (ii) find the black hole parameters from the signal. Oh wait, where did the black hole come from?

So, these people are from LIGO, and are trying to automate the process of finding the ‘chirp signal’ in a noisy signal. Currently, a lot of matched filtering is done by comparing the signal with numerical relativity simulations and obtaining the output. This process is tedious, and could be greatly sped up with a trained network. And if our model can be trained to predict the masses and angular momenta of the merging black holes, wouldn’t it greatly improve the efficiency of the observatory?

The network seems to be very good in performance, but I would love to see visualizations of the filters: which part of the network does the matched filtering of the signal, and so on.

Meanwhile, the work can be found here:  https://arxiv.org/pdf/1701.00008.pdf

iii) Star-galaxy Classification Using Deep Convolutional Neural Networks

Edward J. Kim and Robert J. Brunner

Remember our Galaxy Zoo project, and how it had a lot of labelled data specifying the type of galaxy or star? Well, this was the phenomenal paper on star and galaxy classification. The authors used CNNs (again! look at how popular CNNs are!) to do this classification, and achieved very good results!

This piece of work deserves to be read thoroughly, and it can be found on the arXiv server here: https://arxiv.org/pdf/1608.04369.pdf

iv) Deep Recurrent Neural Networks for Supernovae Classification

If one keeps following a star as it starts to go supernova, there is a change in the observed flux just before the event. Now, since we have models like RNNs which have temporal memory, people started looking into them for temporal prediction. Hence, the authors used RNNs wherein they fed in a given time sequence and checked if the network predicts the occurrence of a supernova at the next time step. Their model is not really a single LSTM, but a bi-directional model, in which a given time segment, let’s say from A to B, is fed into one LSTM as A…..B and a parallel LSTM as B…..A, and a mean pooling is done to find the classification probability.
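The bi-directional idea can be sketched with a plain RNN standing in for the two LSTMs (random toy weights; the paper’s actual architecture differs): read the segment forwards and backwards, then mean-pool the two final states.

```python
import numpy as np

def rnn_pass(xs, W, U, h0):
    """Run a plain RNN over a sequence and return the final hidden state."""
    h = h0
    for x in xs:
        h = np.tanh(W @ x + U @ h)
    return h

rng = np.random.default_rng(4)
xs = list(rng.standard_normal((6, 3)))   # a toy time segment A.....B
W = 0.1 * rng.standard_normal((5, 3))
U = 0.1 * rng.standard_normal((5, 5))
h0 = np.zeros(5)

h_fwd = rnn_pass(xs, W, U, h0)           # read A.....B
h_bwd = rnn_pass(xs[::-1], W, U, h0)     # read B.....A
pooled = (h_fwd + h_bwd) / 2             # mean pooling of the two passes
prob = 1 / (1 + np.exp(-pooled.sum()))   # toy classification probability
print(0 < prob < 1)
```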

I personally felt this was a very neat implementation of something I had been thinking about since I first heard about LSTMs. Tricky to train, but done properly, they can give pretty good results!

The publication can be found here: https://arxiv.org/pdf/1606.07442.pdf

v) Enabling Dark Energy Science with Deep Generative Models of Galaxy Images

Siamak Ravanbakhsh , Francois Lanusse , Rachel Mandelbaum , Jeff Schneider , and Barnabas Poczos

This is a nice piece of work, wherein the authors explore the application of GANs and Variational Autoencoders (think of them like GANs) to generate fake galaxy data. The authors wanted such fake data to build validation datasets for next-gen sky surveys.

The math in this paper is simply horrible; at least, I didn’t like it. It is a good amount of hard work from the authors, but half of them are computer scientists, so I am not surprised. Nevertheless, the results are here:

If you have finished reading all of the other references, then go ahead and try understanding the paper here: https://arxiv.org/pdf/1609.05796.pdf

vi) Solar Flare Prediction Using SDO/HMI Vector Magnetic Field Data with a Machine-Learning Algorithm

M. G. Bobra and S. Couvidat

A phenomenal work in Solar physics was done by M. G. Bobra using Machine learning (not deep learning), performing solar flare prediction with Support Vector Machines. The idea was simple: if one considers the GOES (a NASA satellite) X-ray fluxes of active regions on the Sun, these fluxes are seen to reach a maximum just before flaring. Hence, prediction was done using SVMs, a shallow model which performs a non-linear separation of the parameter space. The authors used the maximum flux values [actually some 25 parameters] to see whether a registered active region will flare or not. This is one piece of work which focuses more on the input features than on the specific model used. The focus was also on different kinds of accuracy measures, since the authors faced immense imbalance in the dataset.

Dataset imbalance: Generally, in classification problems, for training and testing, one has $N_i$ samples for each class. Now suppose one of the classes constitutes 80% of the dataset, while the other classes make up the remaining part. If a system is trained on such data, this is the thought process of the model:

Well dude, since you are giving me most examples from the same class, and the inputs are not really different by a large margin, I assume most of the data taken is from that class only. Hence, if I just predict the output as that class, unless there is some really large change in the input, I should be correct 80% of the time!

Believe me, that’s what happens, and I can say so for sure, having worked on the same dataset.
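A tiny illustration of the imbalance trap (invented numbers): a ‘classifier’ that always predicts the majority class looks accurate while having learnt nothing at all.

```python
# Toy dataset: class 1 makes up 80% of the samples, class 0 the rest.
labels = [1] * 80 + [0] * 20
predictions = [1] * len(labels)   # lazy model: always say "class 1"

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
minority_found = sum(1 for p, y in zip(predictions, labels)
                     if y == 0 and p == 0)
print(accuracy, minority_found)   # high accuracy, zero minority-class hits
```

This is why imbalanced problems are scored with measures like recall or skill scores rather than plain accuracy.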

I don’t have any image(s) that illustrate the model, but you will find the original work very interesting: https://arxiv.org/pdf/1411.1405.pdf

vii) Searching for exoplanets in the Kepler public data

Xiaofan Jin, and David Glass

This work was done by some students as part of Stanford’s CS 229 course. The students tried some simple ML algorithms to classify stars which have exoplanets. They surveyed a couple of algorithms, but our interest lies in one called Support Vector Machines. This is the same algorithm used by the Solar people above in this list.

Without going into any detail, SVMs are shallow models which try to find a boundary separating classes of data, by transforming the parameter space so that non-linear boundaries between the classes become linear. The action looks like this:
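A toy sketch of that idea (hand-picked 1-D points, no actual SVM training): mapping x to (x, x²) turns the non-linear boundary |x| = 1 into a linear rule in the new space.

```python
import numpy as np

# Points inside |x| < 1 are class 0, outside are class 1. No single
# straight line separates them on the 1-D axis...
x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = (np.abs(x) > 1).astype(int)

# ...but after the feature map x -> (x, x^2), the boundary x^2 = 1 is a
# horizontal line in the new 2-D space (this is the kernel-trick idea).
phi = np.stack([x, x ** 2], axis=1)
pred = (phi[:, 1] > 1.0).astype(int)    # a linear rule in mapped space
print((pred == y).all())
```

A real SVM learns where to place that linear boundary (with maximum margin) instead of us writing it down by hand.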

The original work by the students can be found here.

viii) LensFlow: A Convolutional Neural Network in search of strong Gravitational lenses.

Milad Pourrahmani, Hooshang Nayyeri, and Asantha Cooray

Honestly, I cannot believe this work exists. Just during the summer school, I was talking about using CNNs or something for gravitational lensing, and just 4 days later, this work was published!

This work also used CNNs, and performed transformations to augment the data. The network was then used for classifying lensed and unlensed (lol) objects. Their architecture and filters look like this:

This is a very good piece of work with some good results, and I have yet to scrutinize it thoroughly before commenting properly on it. However, it made for a good overview read for now!

The work can be found at the arxiv server: https://arxiv.org/pdf/1705.05857.pdf

I suppose I have given a lot of examples of how Deep learning is used in astronomy, and how it is a rapidly emerging field. I hope students who go through this will be interested in learning DL and applying it to astronomy, for there are a lot of challenges in this application!

This lecture sort of concludes the 4-day summer school. Sorry we couldn’t have an observation session due to bad weather, but hey, we had a lot of fun with the theory itself. These reads, along with some good courses on Edx and Coursera, will provide a productive way to use this summer for astronomy, if you are not going for an internship anywhere.

Finally, I shall as usual attach some references at the end for further reading. Hope you all will have a great time going through them!

References:

Some other references one might find interesting:

1. L. Walkowicz, et al., Mining the Kepler Data using Machine Learning, American Astronomical Society, AAS Meeting 223, 2014
2. http://kiss.caltech.edu/new_website/workshops/imaging/presentations/fergus.pdf
3. http://www.cosmostat.org/meetings/learning-in-astrophysics/
4. Unsupervised feature-learning for galaxy SEDs with denoising autoencoders, Frontera-Pons et al., https://arxiv.org/pdf/1705.05620.pdf
5. Edx: Astrophysics X series, can be finished in the summer.
6. https://medium.com/@teamrework/discovering-extrasolar-planets-with-deep-learning-59a50a87e1ac
7. https://blogs.nvidia.com/blog/2016/08/01/deep-learning-will-speed-search-for-extraterrestrial-life/
8. Deep Learning Classification in Asteroseismology: https://arxiv.org/pdf/1705.06405.pdf