DL MIT Lecture Notes
* Barack Obama: Intro to Deep Learning | MIT 6.S191
https://www.youtube.com/watch?v=l82PxsKHxYc&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=41&pp=iAQB
* MIT 6.S191 (2019): Introduction to Deep Learning
https://www.youtube.com/watch?v=5v1JnYv_yWs&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=42&pp=iAQB
* MIT 6.S191 (2019): Recurrent Neural Networks
https://www.youtube.com/watch?v=_h66BW-xNgk&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=43&pp=iAQB
* MIT 6.S191 (2019): Convolutional Neural Networks
https://www.youtube.com/watch?v=H-HVZJ7kGI0&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=44&pp=iAQB
* MIT 6.S191 (2019): Deep Generative Modeling
https://www.youtube.com/watch?v=yFBFl1cLYx8&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=45&pp=iAQB
* MIT 6.S191 (2019): Deep Reinforcement Learning
https://www.youtube.com/watch?v=i6Mi2_QM3rA&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=46&pp=iAQB
* MIT 6.S191 (2019): Deep Learning Limitations and New Frontiers
https://www.youtube.com/watch?v=INja7C5_vqk&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=47&pp=iAQB
* MIT 6.S191 (2019): Visualization for Machine Learning (Google Brain)
https://www.youtube.com/watch?v=ulLx2iPTIcs&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=48&pp=iAQB
* MIT 6.S191 (2019): Biologically Inspired Neural Networks (IBM)
https://www.youtube.com/watch?v=4lY-oAY0aQU&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=49&pp=iAQB
* MIT 6.S191 (2019): Image Domain Transfer (NVIDIA)
https://www.youtube.com/watch?v=_MzaThb_jno&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=50&pp=iAQB
* MIT 6.S191 (2018): Introduction to Deep Learning
https://www.youtube.com/watch?v=JN6H4rQvwgY&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=51&pp=iAQB
* MIT 6.S191 (2018): Sequence Modeling with Neural Networks
https://www.youtube.com/watch?v=CznICCPa63Q&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=52&pp=iAQB
* MIT 6.S191 (2018): Convolutional Neural Networks
https://www.youtube.com/watch?v=NVH8EYPHi30&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=53&pp=iAQB
* MIT 6.S191 (2018): Deep Generative Modeling
https://www.youtube.com/watch?v=JVb54xhEw6Y&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=54&pp=iAQB
* MIT 6.S191 (2018): Deep Reinforcement Learning
https://www.youtube.com/watch?v=s5qqjyGiBdc&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=55&pp=iAQB
* MIT 6.S191 (2018): Deep Learning Limitations and New Frontiers
https://www.youtube.com/watch?v=l_yWLAQg7LU&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=56&pp=iAQB
* MIT 6.S191 (2018): Issues in Image Classification
https://www.youtube.com/watch?v=QYwESy6isuc&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=57&pp=iAQB
* MIT 6.S191 (2018): Faster ML Development with TensorFlow
https://www.youtube.com/watch?v=FkHWKq86tSw&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=58&pp=iAQB
* MIT 6.S191 (2018): Deep Learning - A Personal Perspective
https://www.youtube.com/watch?v=Z7YMDwzUTds&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=59&pp=iAQB
* MIT 6.S191 (2018): Beyond Deep Learning: Learning+Reasoning
https://www.youtube.com/watch?v=mNqVGB2HkXg&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=60&pp=iAQB
* MIT 6.S191 (2018): Computer Vision Meets Social Networks
https://www.youtube.com/watch?v=aFEnWHxUd7s&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=61&pp=iAQB
* MIT 6.S191 Deep Learning New frontiers
https://www.youtube.com/watch?v=FHeCmnNe0P8
Talks happen every Friday at 10 AM PT between Fri Mar 10 and Fri May 12, 2023.
y_t = f(x_t, h_(t-1)) : the output depends on the current input and the past hidden state.
Each input step updates the hidden state using weight matrices.
The hidden state is an array of numbers; the size of that array is the number of RNN units.
Three different weight matrices and one hidden state are involved: W_xh (input to hidden), W_hh (hidden to hidden), and W_hy (hidden to output).
The higher the rnn_units, the higher the capacity of the RNN, so it can recognize more features.
y_t = W_hy * h_t
The hidden state is updated with a tanh activation: multiply the previous hidden state and the current input by their weight matrices, add the results, then squash the sum through tanh:
h_t = tanh( W_hh * h_(t-1) + W_xh * x_t )
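A minimal NumPy sketch of one forward step of this update (the toy dimensions and variable names are illustrative assumptions, not from the lecture):
import numpy as np

rnn_units, n_features, n_outputs = 4, 3, 2   # toy sizes

# The three weight matrices: input->hidden, hidden->hidden, hidden->output
W_xh = 0.1 * np.random.randn(rnn_units, n_features)
W_hh = 0.1 * np.random.randn(rnn_units, rnn_units)
W_hy = 0.1 * np.random.randn(n_outputs, rnn_units)

h = np.zeros(rnn_units)                       # hidden state, size = rnn_units

def rnn_step(x_t, h_prev):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)   # h_t = tanh(W_hh * h_(t-1) + W_xh * x_t)
    y_t = W_hy @ h_t                            # y_t = W_hy * h_t
    return y_t, h_t

for x_t in np.random.randn(5, n_features):    # a sequence of 5 input vectors
    y_t, h = rnn_step(x_t, h)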
Can be used for 1-to-1, 1-to-N (image captioning, text generation from an image), N-to-1 (sentiment classification), N-to-N (language translation), etc.
A sequence model should be able to: handle variable-length sequences, track long-term dependencies, maintain information about order, and share parameters across the sequence.
Next-word prediction is an example of a sequence modeling application of type N-to-1.
It is not immediately clear how to do mini-batch training when the input sequences have different lengths.
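One common approach (a sketch under my own assumptions, not from the lecture; the toy token ids and layer sizes are made up): pad the variable-length sequences to a common length and mask the padding, so rectangular mini-batches can be formed.
import tensorflow as tf

# Toy variable-length sequences of token ids (0 is reserved for padding)
seqs = [[3, 7, 2], [5, 1], [9, 4, 6, 8]]

# Pad to the longest sequence so a rectangular mini-batch can be built
padded = tf.keras.preprocessing.sequence.pad_sequences(seqs, padding='post')   # shape (3, 4)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10, output_dim=8, mask_zero=True),   # mask the padded zeros
    tf.keras.layers.SimpleRNN(16),
    tf.keras.layers.Dense(10, activation='softmax')   # e.g. predict the next token id
])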
* 2003: NNLM — Neural Network Language Model, aka Bengio's neural language model (the recurrent variant, RNNLM, came later):
* 2013: Word2Vec - Word to Vector - released by Google:
* 2014: GloVe - released by Stanford:
* 2016: fastText - released by Facebook:
Extends Word2Vec.
Each word is treated as a bag of character n-grams!
Can generate embeddings for out-of-vocabulary words (see the sketch after this list).
Context information is still not properly accounted for, and it is not obvious why it is considered superior to GloVe.
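A small gensim sketch of fastText's out-of-vocabulary behavior (assuming gensim 4.x; the toy corpus is made up): the word "learnings" never appears in the corpus, but its character n-grams overlap with "learning", so fastText can still produce an embedding for it.
from gensim.models import FastText

sentences = [["deep", "learning", "models"], ["learning", "word", "vectors"]]
model = FastText(sentences, vector_size=32, window=3, min_count=1, epochs=20)

vec = model.wv["learnings"]   # out-of-vocabulary word, embedded via its character n-grams
print(vec.shape)              # (32,)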
https://medium.com/mlearning-ai/recurrent-neural-networks-learn-without-forgetting-598ba9daafe3
RNNs are neural networks with memory, which lets them capture sequential structure in the data that a plain fully connected network would miss.
The RNN memory cell could be as simple as the past N inputs, or a more complex combination of short-term and long-term memory cells.
LSTM and GRU are improved RNN cells that account for long-term memory and are more resistant to vanishing gradients.
Simple RNN:
import tensorflow as tf
rnn = tf.keras.Sequential([
    # input_shape=[None, 1]: any number of timesteps, 1 feature per timestep
    tf.keras.layers.SimpleRNN(32, input_shape=[None, 1]),   # 32 RNN units
    tf.keras.layers.Dense(1)
])
Deep RNN:
deep_rnn = tf.keras.Sequential([
    # Stacked recurrent layers: all but the last must return the full sequence
    tf.keras.layers.SimpleRNN(32, return_sequences=True, input_shape=[None, 1]),
    tf.keras.layers.SimpleRNN(32, return_sequences=True),
    tf.keras.layers.SimpleRNN(32),   # last recurrent layer returns only the final output
    tf.keras.layers.Dense(1)
])
LSTM:
# RNN with LSTM cell
lstm_model = tf.keras.models.Sequential([
    # 5 features per timestep; return_sequences=True gives an output at every timestep
    tf.keras.layers.LSTM(32, return_sequences=True, input_shape=[None, 5]),
    tf.keras.layers.Dense(14)   # 14 output values per timestep
])
# RNN with GRU cell
gru_model = tf.keras.Sequential([
    tf.keras.layers.GRU(32, return_sequences=True, input_shape=[None, 5]),
    tf.keras.layers.Dense(14)
])
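A quick check of what these shapes mean when fitting (random data, purely illustrative, using the lstm_model defined above): with input_shape=[None, 5] each sample is (timesteps, 5), and with return_sequences=True plus Dense(14) the model emits 14 values per timestep.
import numpy as np

X = np.random.randn(32, 20, 5).astype("float32")    # 32 samples, 20 timesteps, 5 features
y = np.random.randn(32, 20, 14).astype("float32")   # one 14-dim target per timestep

lstm_model.compile(loss="mse", optimizer="adam")
lstm_model.fit(X, y, epochs=2, verbose=0)
print(lstm_model.predict(X).shape)                  # (32, 20, 14)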
LSTM models can be stateless or stateful: if you don't need to remember anything older than the current input sequence (you are only interested in predicting within that sequence), choose stateless; a stateful LSTM carries its hidden state across consecutive batches.
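A hedged sketch of the stateful variant (the batch size, timesteps, and feature count are assumptions): a stateful RNN needs a fixed batch size, and its hidden state is carried from one batch to the next until you reset it.
import tensorflow as tf

stateful_lstm = tf.keras.Sequential([
    # Stateful RNNs require a fixed batch size, given via batch_input_shape
    tf.keras.layers.LSTM(32, stateful=True, return_sequences=True,
                         batch_input_shape=[16, 20, 5]),
    tf.keras.layers.Dense(14)
])

# After feeding all consecutive chunks of a long sequence, clear the carried state
stateful_lstm.reset_states()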
LSTM example. See https://stackoverflow.com/questions/54009661/what-is-the-timestep-in-keras-lstm?rq=1:
My training set is structured as follow:
- number of sequences: 5358
- the length of each sequence is 300
- each element of the sequence is a vector of 54 features
Should we use a sliding window over each sequence?
Should we feed one entire sequence to the network so that it can also learn the dependencies within that sequence?
Is the input per sample 54 features, or a 300 x 54 array (timesteps x features)?
Use a simple stateless LSTM if all sequences are independent of each other.
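One plausible reading of that setup (an assumption based on standard Keras conventions, with dummy stand-in data): each sample is a (300, 54) array of timesteps x features, so X_train has shape (5358, 300, 54), and no sliding window is needed if each sequence is treated independently.
import numpy as np
import tensorflow as tf

X_train = np.random.randn(5358, 300, 54).astype("float32")   # dummy stand-in for the real data
y_train = np.random.randint(0, 2, size=(5358,))              # e.g. one label per sequence

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=[300, 54]),   # 300 timesteps x 54 features per sample
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(loss="binary_crossentropy", optimizer="adam")
# model.fit(X_train, y_train, batch_size=64, epochs=5)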
See Dhaval's (codebasics) tutorial on LSTM: https://www.youtube.com/watch?v=LfnrRPFhkuY
codebasics github link: https://github.com/codebasics/deep-learning-keras-tf-tutorial/blob/master/1_digits_recognition/digits_recognition_neural_network.ipynb
Polynomial features are features created by raising existing features to an exponent. See https://machinelearningmastery.com/polynomial-features-transforms-for-machine-learning/
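A small scikit-learn illustration (assuming sklearn is available; the numbers are made up): the degree-2 polynomial features of [a, b] are [1, a, b, a^2, a*b, b^2].
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2, 3]])
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))   # [[1. 2. 3. 4. 6. 9.]]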
If y_train has 10 possible classes, you can convert each label to a one-hot vector of size 10, e.g. [0, 0, 1, 0, 0, ...] where the 3rd entry is 1 to represent the 3rd class; in that case use loss = 'categorical_crossentropy'. If you keep the integer class labels as they are (Keras expects them 0-based, i.e. 0 to 9), use loss = 'sparse_categorical_crossentropy'.
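A minimal sketch of the two label formats (the class count, input size, and model are illustrative assumptions):
import numpy as np
import tensorflow as tf

y_int = np.array([2, 0, 9, 3])                        # integer class labels 0..9
y_onehot = tf.keras.utils.to_categorical(y_int, 10)   # shape (4, 10), one 1 per row

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="softmax", input_shape=[8])
])

# Integer labels -> sparse_categorical_crossentropy
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
# One-hot labels -> categorical_crossentropy
# model.compile(loss="categorical_crossentropy", optimizer="adam")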
Notes From Francois Chollet's Deep Learning Book
=======================================
* Prior to neural networks, probabilistic modeling was one of the first approaches; the Naive Bayes algorithm was used for classification. The "Naive" assumption is that the features are all independent of each other.
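A tiny scikit-learn illustration of that independence assumption (assuming sklearn; the data points are made up): a Gaussian Naive Bayes classifier fits one distribution per feature per class and simply multiplies the per-feature likelihoods.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 5.0], [1.2, 4.8], [3.0, 1.0], [3.1, 0.9]])
y = np.array([0, 0, 1, 1])

clf = GaussianNB().fit(X, y)       # one Gaussian per feature, per class
print(clf.predict([[1.1, 5.1]]))   # -> [0]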