# Machine learning

## Fundamentals

1. Square loss: L(yi, f(xi)) = (yi - f(xi))^2, commonly used in regression (e.g. least squares); the weights can be initialized directly and then updated iteratively via gradient descent.
2. Hinge loss: L(mi) = max(0, 1 - mi(w)), commonly used in SVMs, where the loss is written as L(y(i), x(i)) = max(0, 1 - y(i)f(x(i))).
3. Log loss: L(yi, f(xi)) = -log P(yi|xi), commonly used in logistic regression.
4. Exponential loss: L(yi, f(xi)) = exp(-yi f(xi)), mainly used in boosting algorithms.
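A minimal NumPy sketch of the four losses above (function names are mine; `y` is the true label, `f` the model's score, and the hinge/exponential losses assume labels in {-1, +1}):

```python
import numpy as np

def square_loss(y, f):   # regression / least squares
    return (y - f) ** 2

def hinge_loss(y, f):    # SVM; y in {-1, +1}
    return np.maximum(0.0, 1.0 - y * f)

def log_loss(p_true):    # logistic regression; p_true = P(y|x)
    return -np.log(p_true)

def exp_loss(y, f):      # boosting (e.g. AdaBoost); y in {-1, +1}
    return np.exp(-y * f)
```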

## DQN

Gym: simulation environment for general reinforcement learning.

Our aim will be to train a policy that tries to maximize the discounted, cumulative reward.

Bellman Equation: https://en.wikipedia.org/wiki/Bellman_equation

Well, then what is its relationship with MDP (Markov Decision Process)?

The Huber loss acts like the mean squared error when the error is small, but like the mean absolute error when the error is large - this makes it more robust to outliers when the estimates of Q are very noisy.
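The piecewise behavior can be sketched like this (a standalone NumPy version, with the threshold `delta` set to the common default of 1):

```python
import numpy as np

def huber(error, delta=1.0):
    """Quadratic (like MSE) for |error| <= delta, linear (like MAE) beyond —
    so large, possibly-outlier errors contribute only linearly."""
    abs_e = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_e - 0.5 * delta)
    return np.where(abs_e <= delta, quadratic, linear)
```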

Replay memory: random sampling

PIL image: http://effbot.org/imagingbook/image.htm

policy_net and target_net are both instances of the same DQN architecture.

Only policy_net's parameters are optimized; target_net is periodically updated by copying policy_net's weights.

We also use a target network to compute V(st+1) for added stability.

I don't fully understand this stability argument yet. Also, why is replay memory important?

Replay memory is the source of training data for the network, and sampling from it uniformly at random removes the correlations between consecutive transitions.
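A minimal replay buffer sketch (the `Transition` fields follow the PyTorch DQN tutorial's convention; the rest is plain stdlib):

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))

class ReplayMemory:
    """Fixed-size buffer; uniform random sampling breaks the temporal
    correlation between consecutive transitions."""
    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # old transitions fall off the left

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```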

Target network => target Q-values => loss
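The chain above can be sketched as follows — the frozen target network supplies the next-state Q-values for the Bellman target, which the policy network's predictions are then compared against in the loss (NumPy stand-in; names and the `dones` mask are my own):

```python
import numpy as np

def td_targets(rewards, next_q_target, dones, gamma=0.99):
    """Bellman targets: r + gamma * max_a' Q_target(s', a').
    next_q_target comes from the target network, not the policy net,
    and the target is just r at terminal states (dones == 1)."""
    max_next = next_q_target.max(axis=1)
    return rewards + gamma * max_next * (1.0 - dones)
```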

https://ai.stackexchange.com/questions/4740/i-dont-understand-the-target-network-in-deep-q-learning

## Transfer learning in PyTorch

https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

• Scheduling the learning rate
• Saving the best model
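Both bullets can be sketched without any framework code. `step_lr` mirrors the StepLR-style decay used in the tutorial (decay by `gamma` every `step_size` epochs — the defaults here are assumptions), and the loop keeps a deep copy of the best-performing weights, as the tutorial does with `model.state_dict()`:

```python
import copy

def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """StepLR-style schedule: lr = base_lr * gamma ** (epoch // step_size)."""
    return base_lr * gamma ** (epoch // step_size)

def train(num_epochs, evaluate):
    """`evaluate(epoch, lr)` stands in for one epoch of training + validation
    and returns (val_accuracy, weights). Keep a copy of the best weights."""
    best_acc, best_weights = 0.0, None
    for epoch in range(num_epochs):
        lr = step_lr(0.001, epoch)  # feed this LR to the optimizer each epoch
        acc, weights = evaluate(epoch, lr)
        if acc > best_acc:          # save the best model seen so far
            best_acc, best_weights = acc, copy.deepcopy(weights)
    return best_acc, best_weights
```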

Generally you want a Variable to require gradients if it contains learnable parameters.

The weights etc. are stored in tensors, and they are connected in a graph of Functions.

http://cs231n.github.io/transfer-learning/

Keep in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers.
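This is why the common feature-extraction recipe freezes the generic early layers and trains only the final, task-specific head. A sketch of the per-parameter decision (in PyTorch you would set `param.requires_grad` accordingly; `head_prefix="fc"` assumes a ResNet-style backbone whose final layer is named `fc`):

```python
def requires_grad_flags(param_names, head_prefix="fc"):
    """Return a requires_grad flag per parameter name: False for the frozen
    backbone, True for the trainable head."""
    return {name: name.startswith(head_prefix) for name in param_names}
```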

Regression/classification

Boosting: ensemble meta-algorithm

https://en.wikipedia.org/wiki/Ensemble_learning: combine multiple hypotheses to form a (hopefully) better hypothesis

Can a set of weak learners create a single strong learner?
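Boosting's answer is yes: combine the weak learners' predictions by a weighted vote, as in AdaBoost's H(x) = sign(Σₜ αₜ hₜ(x)). A minimal sketch of just the combination step (names are mine; each weak learner predicts ±1):

```python
import numpy as np

def weighted_majority(weak_preds, alphas):
    """weak_preds: (T, n) array of +/-1 predictions from T weak learners.
    alphas: (T,) learner weights. Returns the sign of the weighted vote."""
    scores = np.dot(alphas, weak_preds)
    return np.sign(scores)
```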

## Upsampling

Upsampling is the process of inserting zero-valued samples between original samples to increase the sampling rate.
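The zero-insertion step in one dimension (a filter would normally follow to interpolate the zeros):

```python
import numpy as np

def upsample_zeros(x, factor):
    """Insert factor-1 zeros between samples, raising the rate by `factor`."""
    out = np.zeros(len(x) * factor, dtype=float)
    out[::factor] = x  # original samples land every `factor` positions
    return out
```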

https://www.quora.com/What-is-the-difference-between-Deconvolution-Upsampling-Unpooling-and-Convolutional-Sparse-Coding

FCN: Fully convolutional network

## Collaborative filtering

https://en.wikipedia.org/wiki/Collaborative_filtering

Typically, the workflow of a collaborative filtering system is:

1. A user expresses his or her preferences by rating items (e.g. books, movies or CDs) of the system. These ratings can be viewed as an approximate representation of the user's interest in the corresponding domain.
2. The system matches this user's ratings against other users' and finds the people with most "similar" tastes.
3. Using these similar users, the system recommends items that they have rated highly but that this user has not yet rated (the absence of a rating is often taken to mean the user is unfamiliar with the item).
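The three steps above can be sketched as a tiny user-based CF routine over a ratings matrix where 0 means "unrated" (cosine similarity is one common choice for step 2; all names here are my own):

```python
import numpy as np

def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def recommend(ratings, user, k=1):
    """Find the most similar other user by cosine similarity over ratings,
    then suggest up to k items that user rated highly but `user` has not."""
    sims = [(cosine_sim(ratings[user], ratings[v]), v)
            for v in range(len(ratings)) if v != user]
    _, best = max(sims)                       # step 2: most similar user
    unrated = ratings[user] == 0              # step 3: this user's gaps
    candidates = np.where(unrated & (ratings[best] > 0))[0]
    return sorted(candidates, key=lambda i: -ratings[best][i])[:k]
```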