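Confidence is computed as: (number of transactions containing both item A and item B) / (number of transactions containing item A)
Support is computed as: (number of transactions containing both item A and item B) / (total number of transactions)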
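A minimal sketch of the two ratios for the rule A => B, assuming each transaction is represented as a set of item names. The function name `support_and_confidence` and the sample basket data are made up for illustration.

```python
def support_and_confidence(transactions, a, b):
    """Return (support, confidence) for the rule a => b."""
    total = len(transactions)
    count_a = sum(1 for t in transactions if a in t)
    count_ab = sum(1 for t in transactions if a in t and b in t)
    # Guard against empty inputs so the ratios are always defined.
    support = count_ab / total if total else 0.0
    confidence = count_ab / count_a if count_a else 0.0
    return support, confidence


# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence = support_and_confidence(transactions, "bread", "milk")
print(support, confidence)
```

Here 2 of the 4 transactions contain both items, and 3 contain bread, so support is 2/4 and confidence is 2/3.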
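- Square loss: $L(y_i, f(x_i)) = (y_i - f(x_i))^2$, commonly used in regression such as least squares; the weights can be initialized directly and then updated repeatedly by gradient descent.
- Hinge loss: $L(m_i) = \max(0, 1 - m_i(w))$, commonly used in SVMs, where the loss is written as $L(y^{(i)}, x^{(i)}) = \max(0, 1 - y^{(i)} f(x^{(i)}))$
- Log loss: $L(y_i, f(x_i)) = -\log P(y_i \mid x_i)$, commonly used in logistic regression
- Exponential loss: $L(y_i, f(x_i)) = \exp(-y_i f(x_i))$, mainly used in Boosting algorithms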
- https://github.com/alirezadir/Production-Level-Deep-Learning: A guideline for building practical production-level deep learning systems to be deployed in real world applications.
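The four loss functions listed above (square, hinge, log, exponential) can be sketched for a single sample as follows. This assumes labels y in {-1, +1} for the classification losses, and for the log loss it assumes the logistic model P(y|x) = 1 / (1 + exp(-y f(x))), which the notes do not state. All function names are illustrative.

```python
import math


def square_loss(y, f):
    # (y - f(x))^2, for regression targets.
    return (y - f) ** 2


def hinge_loss(y, f):
    # max(0, 1 - y * f(x)), with y in {-1, +1}.
    return max(0.0, 1.0 - y * f)


def log_loss(y, f):
    # -log P(y|x), assuming P(y|x) = 1 / (1 + exp(-y * f(x))).
    return math.log(1.0 + math.exp(-y * f))


def exp_loss(y, f):
    # exp(-y * f(x)), the loss minimized by AdaBoost-style boosting.
    return math.exp(-y * f)


for name, loss in [("hinge", hinge_loss), ("log", log_loss), ("exp", exp_loss)]:
    # Loss for a positive sample at a few scores f(x).
    print(name, [round(loss(1, f), 3) for f in (-1.0, 0.0, 2.0)])
```

All three classification losses shrink as the margin y f(x) grows; only the hinge loss reaches exactly zero once the margin is at least 1.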
Gym: simulation environment for general reinforcement learning.
Our aim will be to train a policy that tries to maximize the discounted, cumulative reward.
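As a small illustration of the discounted, cumulative reward, the sketch below sums rewards weighted by increasing powers of a discount factor. The name `discounted_return` and the default `gamma` are assumptions for the example.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over a finite list of rewards."""
    total = 0.0
    # Iterate backwards so each step is one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25
```

With gamma below 1, rewards far in the future count less than immediate ones, which keeps the sum bounded for long episodes.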
Bellman Equation: https://en.wikipedia.org/wiki/Bellman_equation
Well, then what is its relationship with MDP (Markov Decision Process)?
The Huber loss acts like the mean squared error when the error is small, but like the mean absolute error when the error is large - this makes it more robust to outliers when the estimates of Q are very noisy.
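The behaviour described above can be sketched in plain Python for a single error value. The threshold `delta` (set to 1.0 here) is an assumed parameter marking where the loss switches from quadratic to linear.

```python
def huber_loss(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    abs_error = abs(error)
    if abs_error <= delta:
        # Same shape as half the squared error for small errors.
        return 0.5 * error ** 2
    # Grows like the absolute error for large errors.
    return delta * (abs_error - 0.5 * delta)


for e in (0.5, 3.0, -3.0):
    print(e, huber_loss(e))
```

An error of 3.0 costs 2.5 here, compared with 4.5 under half the squared error, which is why a single outlier pulls the fit less.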
Replay memory: random sampling
PIL image: http://effbot.org/imagingbook/image.htm
`policy_net` and `target_net` are both DQN instances.
Policy_net’s parameters are being optimized.
We also use a target network to compute $V(s_{t+1})$ for added stability.
I don’t understand this stability thing. Also, why is replay memory important?
Replay memory is the source of data for training the network, and sampling from it at random removes the correlations between consecutive transitions.
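A minimal sketch of such a replay memory, assuming a fixed-capacity buffer that drops the oldest transitions first. The `Transition` fields and the class name `ReplayMemory` are chosen for illustration.

```python
import random
from collections import deque, namedtuple

# One step of experience; the field names are assumptions for this sketch.
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    def __init__(self, capacity):
        # A bounded deque discards the oldest entry once it is full.
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the temporal
        # ordering of consecutive transitions.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(step, 0, step + 1, 1.0)
batch = memory.sample(2)
print(len(memory), batch)
```

Because a batch mixes transitions from different points in time, the gradient step is not dominated by one highly correlated stretch of an episode.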
Target network => target Q-values => loss
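The chain above can be sketched without any deep learning library. This assumes the common one-step target r + gamma * max_a Q_target(s', a), with the bootstrap term dropped at terminal states; `td_target`, `td_error`, and the sample numbers are hypothetical, and the Q-values are passed in as plain lists rather than computed by real networks.

```python
def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Target Q-value built from the target network's outputs for s_{t+1}."""
    if done:
        # No future reward after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def td_error(policy_q_value, target):
    # The loss (e.g. Huber) is applied to this difference.
    return policy_q_value - target


# Hypothetical numbers: Q_policy(s, a) = 2.0, Q_target(s', .) = [0.5, 2.0].
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9)
print(target, td_error(2.0, target))
```

Since the target is computed from a separate, less frequently updated copy of the network, it does not shift with every gradient step taken on the policy network.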
Transfer learning in PyTorch
- Scheduling the learning rate
- Saving the best model
Generally you want a Variable to have a gradient if it contains some learnable parameters.
Question: how to understand requires_grad?
The weights etc. are stored in tensors, and they are connected in a graph of Functions.
Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers
Boosting: ensemble meta-algorithm
https://en.wikipedia.org/wiki/Ensemble_learning: combine multiple hypotheses to form a (hopefully) better hypothesis
Can a set of weak learners create a single strong learner?
Upsampling is the process of inserting zero-valued samples between original samples to increase the sampling rate.
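A minimal sketch of the zero-insertion step on a plain Python list, with `upsample_zero_insert` as an illustrative name. In practice this step is usually followed by a low-pass (interpolation) filter, which is not shown here.

```python
def upsample_zero_insert(samples, factor):
    """Insert factor - 1 zeros after every original sample."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (factor - 1))
    return out


print(upsample_zero_insert([1, 2, 3], 2))  # [1, 0, 2, 0, 3, 0]
```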
FCN: Fully convolutional network
Typically, the workflow of a collaborative filtering system is:
- A user expresses his or her preferences by rating items (e.g. books, movies or CDs) of the system. These ratings can be viewed as an approximate representation of the user's interest in the corresponding domain.
- The system matches this user's ratings against other users' and finds the people with most "similar" tastes.
- Using these similar users, the system recommends items that they have rated highly but that this user has not yet rated (the absence of a rating is usually taken to mean the user is unfamiliar with the item).
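The workflow above can be sketched as a tiny user-based recommender. Cosine similarity over ratings, the neighbour count `k`, and the `min_rating` threshold are assumptions chosen for this example (the notes do not name a similarity measure), and the ratings data is invented.

```python
import math


def cosine_similarity(r1, r2):
    """Cosine similarity between two {item: rating} dicts."""
    common = set(r1) & set(r2)
    norm = math.sqrt(sum(v * v for v in r1.values())) * math.sqrt(
        sum(v * v for v in r2.values())
    )
    if not common or norm == 0:
        return 0.0
    return sum(r1[i] * r2[i] for i in common) / norm


def recommend(ratings, user, k=1, min_rating=4):
    # Step 2: rank the other users by similarity to this user.
    others = [
        (cosine_similarity(ratings[user], r), name)
        for name, r in ratings.items()
        if name != user
    ]
    neighbours = sorted(others, reverse=True)[:k]
    # Step 3: collect highly rated items this user has not rated yet.
    seen = set(ratings[user])
    recs = set()
    for _, name in neighbours:
        for item, rating in ratings[name].items():
            if item not in seen and rating >= min_rating:
                recs.add(item)
    return sorted(recs)


# Step 1: hypothetical ratings expressed by each user.
ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 4, "book_c": 5},
    "carol": {"book_a": 1, "book_d": 5},
}
print(recommend(ratings, "alice"))  # ['book_c']
```

Bob's ratings match Alice's most closely, so his highly rated but unseen `book_c` is recommended, while Carol's `book_d` is not.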