
Machine learning

Production

  • https://github.com/alirezadir/Production-Level-Deep-Learning: A guideline for building practical production-level deep learning systems to be deployed in real-world applications.

DQN

Gym: OpenAI's toolkit of simulation environments for reinforcement learning.

Our aim will be to train a policy that tries to maximize the discounted, cumulative reward.
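The discounted, cumulative reward can be computed with a short backward pass over an episode's rewards (a minimal sketch; the function name and the gamma value are my own):

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G = r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):  # fold from the last reward backwards
        g = r + gamma * g
    return g

# With gamma=0.5, rewards [1, 1, 1] give 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1, 1, 1], gamma=0.5))
```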

Bellman Equation: https://en.wikipedia.org/wiki/Bellman_equation

Well, then what is its relationship with MDP (Markov Decision Process)?

The Huber loss acts like the mean squared error when the error is small, but like the mean absolute error when the error is large - this makes it more robust to outliers when the estimates of Q are very noisy.
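That piecewise behavior is easy to see in a direct implementation (a sketch; PyTorch's built-in version is `SmoothL1Loss`):

```python
def huber_loss(error, delta=1.0):
    """Quadratic for |error| <= delta (like MSE), linear beyond (like MAE).
    The linear tail keeps large, noisy Q errors from dominating the loss."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

print(huber_loss(0.5))   # small error: 0.5 * 0.25 = 0.125 (quadratic region)
print(huber_loss(10.0))  # large error: 1.0 * 9.5 = 9.5 (linear region)
```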

Replay memory: random sampling
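A replay memory is essentially a bounded buffer with uniform random sampling (a sketch; the class and method names here are my own, though they mirror the PyTorch DQN tutorial's structure):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of transitions. Sampling uniformly at random
    breaks the temporal correlation between consecutive transitions."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old entries drop off the left

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=100)
for t in range(10):
    memory.push((t, "state", "action", "reward"))
batch = memory.sample(4)
print(len(batch))  # 4 transitions drawn uniformly at random
```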

PIL image: http://effbot.org/imagingbook/image.htm

policy_net and target_net are both instances of the same DQN architecture.

Only policy_net’s parameters are being optimized; target_net is periodically updated by copying policy_net’s weights.

We also use a target network to compute V(s_{t+1}) for added stability.

I don’t understand this stability thing. Also, why is replay memory important?

Replay memory is the source of data for training the network; sampling from it at random also removes the correlations between consecutive transitions.

Target network => target Q-values => loss
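That chain can be made concrete with toy Q-values (a sketch using plain dicts in place of real networks; the states, actions, and numbers are made up for illustration):

```python
# Toy Q-values: Q[state][action]. policy_q plays the role of policy_net
# (being trained); target_q is a frozen, periodically-copied snapshot.
policy_q = {"s0": {"left": 0.2, "right": 0.5},
            "s1": {"left": 1.0, "right": 0.3}}
target_q = {"s0": {"left": 0.1, "right": 0.4},
            "s1": {"left": 0.9, "right": 0.2}}

gamma = 0.9
# One sampled transition (s, a, r, s'):
s, a, r, s_next = "s0", "right", 1.0, "s1"

# Bellman target uses the *frozen* network: y = r + gamma * max_a' Q_target(s', a')
y = r + gamma * max(target_q[s_next].values())
# The loss compares the policy network's estimate against that fixed target
td_error = y - policy_q[s][a]
print(round(y, 3), round(td_error, 3))  # 1.81 1.31
```

Because the target comes from a frozen copy, it does not shift every time policy_q is updated, which is the "stability" point above.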

https://ai.stackexchange.com/questions/4740/i-dont-understand-the-target-network-in-deep-q-learning

https://danieltakeshi.github.io/2016/12/01/going-deeper-into-reinforcement-learning-understanding-dqn/: How loss works

Transfer learning in PyTorch

https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

  • Scheduling the learning rate
  • Saving the best model
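The tutorial schedules the learning rate with `StepLR(step_size=7, gamma=0.1)`; its decay rule reduces to a one-liner (a sketch in plain Python; the function name `step_lr` is my own):

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Decay the learning rate by `gamma` every `step_size` epochs,
    mimicking torch.optim.lr_scheduler.StepLR's schedule."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in [0, 6, 7, 14]:
    print(epoch, step_lr(0.001, epoch))  # lr drops 10x at epochs 7 and 14
```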

Generally you want a Variable to have a gradient if it contains some learnable parameters.

Question: how to understand requires_grad?

https://pytorch.org/docs/master/notes/autograd.html

The weights etc. are stored in tensors, and they are connected in a graph of Functions.

http://cs231n.github.io/transfer-learning/

Keep in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers.

Gradient Boosting

https://en.wikipedia.org/wiki/Gradient_boosting

Regression/classification

https://en.wikipedia.org/wiki/Boosting_(machine_learning)

Boosting: ensemble meta-algorithm

https://en.wikipedia.org/wiki/Ensemble_learning: combine multiple hypotheses to form a (hopefully) better hypothesis

Can a set of weak learners create a single strong learner?
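Boosting answers this question in the affirmative: each round fits a weak learner to the current residuals, and the sum becomes a strong learner. A minimal 1-D gradient-boosting sketch with regression stumps (all function names and data here are my own, purely for illustration):

```python
def fit_stump(x, residuals):
    """Weak learner: best single-split regression stump (squared loss)."""
    best = None
    for threshold in x:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_rounds=20, lr=0.5):
    """Each round fits a stump to the residuals of the current ensemble."""
    base = sum(y) / len(y)          # start from the mean prediction
    pred = [base] * len(x)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 5]              # a step function
model = gradient_boost(x, y)
print(round(model(2), 2), round(model(5), 2))  # 1.0 5.0
```

No single stump fits the step function well, but the ensemble of twenty does — the weak-learners-to-strong-learner idea in miniature.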

Upsampling

Upsampling is the process of inserting zero-valued samples between original samples to increase the sampling rate.
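The zero-insertion step is straightforward (a sketch; in practice a low-pass interpolation filter follows to remove the spectral images):

```python
def upsample_zero_stuff(samples, factor):
    """Insert factor-1 zeros after each sample to raise the sampling rate."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (factor - 1))
    return out

print(upsample_zero_stuff([1, 2, 3], 2))  # [1, 0, 2, 0, 3, 0]
```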

https://www.quora.com/What-is-the-difference-between-Deconvolution-Upsampling-Unpooling-and-Convolutional-Sparse-Coding

FCN: Fully convolutional network

Collaborative filtering

https://en.wikipedia.org/wiki/Collaborative_filtering

Typically, the workflow of a collaborative filtering system is:

 1. A user expresses his or her preferences by rating items (e.g. books, movies, or CDs) of the system. These ratings can be viewed as an approximate representation of the user's interest in the corresponding domain.
 2. The system matches this user's ratings against other users' and finds the people with the most "similar" tastes.
 3. From those similar users, the system recommends items that they have rated highly but that this user has not yet rated (the absence of a rating is often taken to mean the user is unfamiliar with the item).
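The three steps above can be sketched as a tiny user-based recommender (all names, ratings, and the cosine-similarity choice here are my own illustrative assumptions):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity over the items both users have rated (step 2)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = sqrt(sum(u[i] ** 2 for i in common))
    nv = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def recommend(ratings, user):
    """Find the most similar user, then suggest items that user rated
    highly but `user` has not rated (step 3)."""
    others = [(cosine(ratings[user], ratings[o]), o)
              for o in ratings if o != user]
    _, nearest = max(others)
    unseen = set(ratings[nearest]) - set(ratings[user])
    return sorted(unseen, key=lambda i: ratings[nearest][i], reverse=True)

# Step 1: users express preferences as item ratings
ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob":   {"book_a": 5, "book_b": 4, "book_c": 5},
    "carol": {"book_a": 1, "book_b": 5, "book_c": 2},
}
print(recommend(ratings, "alice"))  # ['book_c'] via Bob, the closest match
```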