DRN: A deep reinforcement learning framework for news recommendation
Problems with existing methods: they fail to capture the dynamic nature of news recommendation.
First, they only try to model the current (immediate) reward, e.g., click-through rate.
Second, very few studies consider using user feedback other than click/no-click labels (e.g., how frequently a user returns) to help improve recommendations.
Third, these methods tend to keep recommending similar news to users, which may cause users to become bored.

Problem with ε-greedy: it may recommend items completely unrelated to the user's interests.

Problem with UCB-style exploration: an item must be tried many times before its value estimate becomes accurate.
Contributions:
1. A deep reinforcement learning (DQN-based) framework for news recommendation.
2. User activeness as an additional feedback signal, which works much better than click/no-click labels alone.
3. Dueling Bandit Gradient Descent (DBGD) for more effective exploration.
4. Experiments show that the approach indeed performs well.

method:
We use a continuous state feature representation of users and continuous action feature representation of items as inputs to DQN.
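As a concrete reading of that sentence, here is a minimal PyTorch sketch of such a Q-network with the paper's dueling-style split: a value part V(s) that sees only the user (state) features, and an advantage part A(s, a) that sees both the user and the news-item (action) features. The class name, layer sizes, and hidden dimension are my own assumptions, not the paper's architecture details.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Q(s, a) = V(s) + A(s, a): value from user-side (state) features,
    advantage from user + news-item (action) features.
    Dimensions and layer sizes are illustrative assumptions."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        # V(s): depends only on the continuous user (state) features.
        self.value = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # A(s, a): depends on both user and news-item (action) features.
        self.advantage = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.value(state) + self.advantage(torch.cat([state, action], dim=-1))
```

To build a top-k list, one would broadcast the user's state vector over all candidate news vectors and rank the candidates by the resulting Q-values.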
model framework:

  1. Push: when a user sends a news request to the system, the recommendation agent G takes the feature representations of the current user and of the candidate news as input, and generates a top-k list L of news to recommend. L is generated by combining the exploitation of the current model and the exploration of novel items.
  2. Feedback: the user u who received the recommended list L gives feedback B through his clicks on this set of news.
  3. Minor update: after each timestamp, using the feature representations of the previous user u and news list L together with the feedback B, G compares the two DQNs, the exploitation Q-network and the exploration Q-network; if the latter gives the better recommendation results, the current model is updated a small step toward the exploration network (see the DBGD sketch after this list).
  4. Major update: after a certain period, G performs experience replay; the agent keeps a memory of recent historical clicks and user activeness records, and samples from it to retrain the Q-network.
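A minimal sketch of the minor-update step, under my reading of Dueling Bandit Gradient Descent in the paper: the exploration network is a copy of the current network with every weight perturbed by a random factor in [-α, α], and the current weights move a small step η toward that perturbation only when the exploration list wins the interleaved comparison. The function names and the values of `alpha` and `eta` are placeholders.

```python
import copy
import torch

def dbgd_minor_update(q_net, alpha: float = 0.1, eta: float = 0.05):
    """One exploit-vs-explore probe. Returns (explore_net, apply_fn)."""
    # Exploration network: perturb each weight W by
    # delta = alpha * rand(-1, 1) * W.
    explore_net = copy.deepcopy(q_net)
    deltas = []
    with torch.no_grad():
        for p, p_exp in zip(q_net.parameters(), explore_net.parameters()):
            delta = alpha * (2 * torch.rand_like(p) - 1) * p
            p_exp.add_(delta)
            deltas.append(delta)

    def apply_if_explore_wins(explore_won: bool):
        # Move the current network a small step toward the perturbation
        # only when the explore list collected the better feedback.
        if explore_won:
            with torch.no_grad():
                for p, delta in zip(q_net.parameters(), deltas):
                    p.add_(eta * delta)

    return explore_net, apply_if_explore_wins
```

One would rank the candidates with both `q_net` and `explore_net`, probabilistically interleave the two top-k lists into L, and after observing the clicks B call `apply_if_explore_wins(True)` if the exploration items collected the better feedback.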

User Activeness

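My notes stop here; as a reminder of the idea, the paper estimates user activeness with a survival-style model: activeness decays over time, jumps by a fixed amount each time the user returns, is truncated to stay in [0, 1], and this signal is folded into the reward alongside the click label. A minimal sketch (the constants `s0`, `s_a`, and `lam` are illustrative placeholders, not the paper's fitted values):

```python
import math

class UserActiveness:
    """Survival-style user activeness: decays between requests,
    jumps on each return, truncated to [0, 1]."""

    def __init__(self, s0: float = 0.5, s_a: float = 0.32, lam: float = 1e-3):
        self.score = s0      # initial activeness (placeholder value)
        self.s_a = s_a       # boost per user return (placeholder value)
        self.lam = lam       # decay rate per unit time (placeholder value)

    def on_return(self, elapsed: float) -> float:
        # Exponential decay since the last request, then a fixed boost,
        # truncated so activeness never exceeds 1.
        self.score = min(1.0, self.score * math.exp(-self.lam * elapsed) + self.s_a)
        return self.score

    def peek(self, elapsed: float) -> float:
        # Current activeness without a return event (decay only).
        return self.score * math.exp(-self.lam * elapsed)
```

The activeness reading at each interaction is then combined with the click label to form the total reward that the DQN is trained on.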