- https://github.com/alirezadir/Production-Level-Deep-Learning: A guideline for building practical production-level deep learning systems to be deployed in real world applications.
Gym: simulation environment for general reinforcement learning.
Our aim will be to train a policy that tries to maximize the discounted, cumulative reward.
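The discounted cumulative reward (the return) is G = r_0 + γ·r_1 + γ²·r_2 + …; a minimal sketch of computing it for one episode, with the function name and example values my own:

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G = sum_k gamma^k * r_k for one episode, iterating backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three steps of reward 1 with gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # -> 1.75
```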
Bellman Equation: https://en.wikipedia.org/wiki/Bellman_equation
Well, then what is its relationship with MDP (Markov Decision Process)?
(Roughly: an MDP defines states, actions, rewards, and transitions; the Bellman equation expresses the MDP's value function recursively — the value of a state equals the immediate reward plus the discounted value of the successor state.)
The Huber loss acts like the mean squared error when the error is small, but like the mean absolute error when the error is large - this makes it more robust to outliers when the estimates of Q are very noisy.
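That piecewise behavior can be written out directly; a sketch (NumPy, with the function name my own):

```python
import numpy as np

def huber(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond -- robust to outliers."""
    abs_err = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.where(abs_err <= delta, quadratic, linear)

# A small error is penalized like MSE/2; a large one grows only linearly.
print(huber(np.array([0.5, 10.0])))  # -> [0.125, 9.5]
```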
Replay memory: random sampling
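A minimal replay memory sketch in the spirit of the PyTorch DQN tutorial (class and field names assumed, not from the source):

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ("state", "action", "reward", "next_state"))

class ReplayMemory:
    """Fixed-size buffer; uniform random sampling breaks temporal correlations."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=1000)
for t in range(10):
    memory.push(t, 0, 1.0, t + 1)
batch = memory.sample(4)  # 4 transitions drawn uniformly at random
print(len(batch))
```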
PIL image: http://effbot.org/imagingbook/image.htm
policy_net and target_net are both instances of the same DQN network.
Only policy_net's parameters are optimized directly; target_net's are copied over periodically.
We also use a target network to compute V(st+1) for added stability.
I don’t understand this stability thing. Also, why is replay memory important?
Stability: the target network's weights are held fixed (updated only periodically from the policy network), so the Q-value targets don't shift with every optimization step — otherwise we would be chasing a moving target. Replay memory is the source of training data for the network; sampling from it uniformly at random also removes correlations between consecutive transitions.
Target network => target Q-values => loss
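That chain (target network → target Q-values → loss) can be sketched with toy networks; the linear layers, dimensions, and batch below are placeholders for a real DQN:

```python
import torch
import torch.nn as nn

# Toy stand-ins for policy_net / target_net (assumed 4-dim states, 2 actions).
policy_net = nn.Linear(4, 2)
target_net = nn.Linear(4, 2)
target_net.load_state_dict(policy_net.state_dict())  # start identical
target_net.eval()

gamma = 0.99
states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8, 1))
rewards = torch.randn(8)
next_states = torch.randn(8, 4)

# Q(s, a) from the policy net for the actions actually taken.
q_values = policy_net(states).gather(1, actions).squeeze(1)

# V(s') = max_a' Q_target(s', a'); no gradient flows into the target net.
with torch.no_grad():
    next_values = target_net(next_states).max(1).values

targets = rewards + gamma * next_values
loss = nn.SmoothL1Loss()(q_values, targets)  # Huber loss
loss.backward()  # gradients land only in policy_net
```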
https://danieltakeshi.github.io/2016/12/01/going-deeper-into-reinforcement-learning-understanding-dqn/: How loss works
Transfer learning in PyTorch
- Scheduling the learning rate
- Saving the best model
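Both points can be sketched together; the model, epoch count, and fake validation accuracy below are illustrative, not from the tutorial:

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Decay the learning rate by 10x every 7 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

best_acc = 0.0
best_state = copy.deepcopy(model.state_dict())

for epoch in range(10):
    # ... train and evaluate here; fake a validation accuracy for the sketch ...
    val_acc = 0.5 + 0.01 * epoch
    if val_acc > best_acc:
        best_acc = val_acc
        best_state = copy.deepcopy(model.state_dict())  # snapshot the best weights
    scheduler.step()

model.load_state_dict(best_state)  # restore the best model at the end
print(optimizer.param_groups[0]["lr"])  # decayed from 0.1 to 0.01 after epoch 7
```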
Generally, you want a Variable (tensor) to require gradients if it contains learnable parameters.
Question: how to understand requires_grad?
The weights etc. are stored in tensors, and the tensors are connected in a graph of Functions.
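A minimal sketch of how `requires_grad` controls which tensors receive gradients (toy values of my own):

```python
import torch

# Leaf tensor with requires_grad=True: autograd records every op applied to it.
w = torch.tensor([2.0], requires_grad=True)   # learnable parameter
x = torch.tensor([3.0])                       # plain data, no gradient needed

y = w * x + 1.0   # y is connected to w through the graph of Functions
y.backward()      # populates w.grad with dy/dw = x

print(w.grad)  # tensor([3.])
print(x.grad)  # None -- x does not require grad
```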
Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers
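This is why transfer learning often freezes the early (generic) layers and fine-tunes only the later ones; a sketch with a toy two-conv model standing in for a pretrained backbone:

```python
import torch.nn as nn

# Toy ConvNet standing in for a pretrained backbone (architecture is illustrative).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),   # early layer: generic edge/texture features
    nn.ReLU(),
    nn.Conv2d(8, 16, 3),  # later layer: more dataset-specific features
)

# Freeze the early conv layer; only the later layer will be fine-tuned.
for p in model[0].parameters():
    p.requires_grad = False

frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
print(frozen)  # ['0.weight', '0.bias']
```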
Boosting: ensemble meta-algorithm
https://en.wikipedia.org/wiki/Ensemble_learning: combine multiple hypotheses to form a (hopefully) better hypothesis
Can a set of weak learners create a single strong learner?
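The voting intuition behind that question can be demonstrated with simulated weak learners whose errors are independent (real boosting such as AdaBoost goes further by reweighting examples sequentially — this sketch only shows why combining helps; all names and numbers are my own):

```python
import random

random.seed(0)

def weak_learner(x, flip_prob=0.4):
    """A 'weak' classifier: returns the true label, but is wrong 40% of the time."""
    true_label = x % 2
    return true_label if random.random() > flip_prob else 1 - true_label

def ensemble(x, n_learners=51):
    """Majority vote over many independent weak learners."""
    votes = [weak_learner(x) for _ in range(n_learners)]
    return 1 if sum(votes) > n_learners / 2 else 0

data = list(range(200))
weak_acc = sum(weak_learner(x) == x % 2 for x in data) / len(data)
strong_acc = sum(ensemble(x) == x % 2 for x in data) / len(data)
print(weak_acc, strong_acc)  # the majority vote is far more accurate than one learner
```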
Upsampling is the process of inserting zero-valued samples between original samples to increase the sampling rate.
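A one-liner sketch of that zero-insertion step (NumPy, function name my own); in practice it is followed by a low-pass/interpolation filter:

```python
import numpy as np

def upsample_zero_insert(x, factor):
    """Insert factor-1 zeros between samples, raising the sampling rate by `factor`."""
    y = np.zeros(len(x) * factor, dtype=x.dtype)
    y[::factor] = x  # original samples land at every `factor`-th position
    return y

print(upsample_zero_insert(np.array([1, 2, 3]), 2))  # -> [1 0 2 0 3 0]
```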
FCN: Fully convolutional network
Typically, the workflow of a collaborative filtering system is:
1. A user expresses his or her preferences by rating items (e.g. books, movies, or CDs) in the system. These ratings can be viewed as an approximate representation of the user's interest in the corresponding domain.
2. The system matches this user's ratings against other users' and finds the people with the most "similar" tastes.
3. The system recommends items that the similar users have rated highly but that this user has not yet rated (the absence of a rating is often taken to mean unfamiliarity with the item).
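The steps above can be sketched as user-based CF with cosine similarity; the tiny rating matrix and all names are made up for illustration:

```python
import numpy as np

# Toy user-item rating matrix (rows: users, cols: items; 0 = unrated). Step 1.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 5, 1],   # user 1: tastes similar to user 0
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = 0
# Step 2: find the other user with the most similar ratings.
others = [u for u in range(len(R)) if u != target]
sims = [cosine_sim(R[target], R[u]) for u in others]
neighbor = others[int(np.argmax(sims))]

# Step 3: recommend items the neighbor rated highly that the target has not rated.
recs = [i for i in range(R.shape[1]) if R[target, i] == 0 and R[neighbor, i] >= 4]
print(neighbor, recs)  # -> 1 [2]
```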