Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation

Published: 20 Jun 2023, Last Modified: 03 Jul 2023, ILHF Workshop ICML 2023
Keywords: online learning, mechanism design, incentive-aware learning, multi-armed bandits, incentive design
TL;DR: We introduce the strategic click-bandit problem and design an incentive-aware learning algorithm that induces desirable arm strategies while simultaneously minimizing regret.
Abstract: We study a strategic variant of the multi-armed bandit problem, which we coin the \emph{strategic click-bandit}. This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards. As in classical bandits, rewards follow a fixed unknown distribution. However, we assume that the click-through rate of each arm is chosen strategically by the arm (e.g., a host on Airbnb) in order to maximize the number of times it gets clicked. The algorithm designer knows neither the post-click rewards nor the arms' actions (i.e., the strategically chosen click-through rates) in advance, and must learn both over time. To solve this problem, we design an incentive-aware learning algorithm, UCB-S, which achieves two goals simultaneously: (a) aligning incentives by incentivizing desirable arm actions under uncertainty; (b) learning the unknown parameters. We approximately characterize all Nash equilibria among arms under UCB-S and show a $\tilde O(\sqrt{KT})$ regret bound in every equilibrium. We also show that incentive-unaware algorithms generally fail to achieve low regret in the strategic click-bandit setup.
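A minimal simulation sketch of the strategic click-bandit interaction described in the abstract, under several assumptions not stated in the paper: each arm's click-through rate is a fixed, hypothetical strategy, post-click rewards are Bernoulli, and the learner runs a plain UCB rule on the realized utility (click times post-click reward). This is not the paper's UCB-S algorithm; it only illustrates the environment and the regret notion.

```python
# Illustrative sketch of the strategic click-bandit environment (assumptions:
# fixed arm strategies s_i, Bernoulli post-click rewards r_i, plain UCB learner).
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 10_000
r = rng.uniform(0.2, 0.9, size=K)   # unknown post-click reward means
s = rng.uniform(0.5, 1.0, size=K)   # click-through rates chosen by the arms (hypothetical strategies)

counts = np.zeros(K)                # number of times each arm was recommended
means = np.zeros(K)                 # empirical mean of observed utility per arm

for t in range(1, T + 1):
    # optimistic index: empirical mean + exploration bonus (unpulled arms first)
    ucb = np.where(counts > 0,
                   means + np.sqrt(2 * np.log(t) / np.maximum(counts, 1)),
                   np.inf)
    i = int(np.argmax(ucb))

    clicked = rng.random() < s[i]                      # arm i's click-rate governs the click
    reward = float(clicked) * (rng.random() < r[i])    # post-click reward observed only if clicked

    counts[i] += 1
    means[i] += (reward - means[i]) / counts[i]

best = np.max(s * r)                # best expected utility under these fixed strategies
print("pseudo-regret:", T * best - np.sum(counts * s * r))
```

In the strategic setting the arms would choose the click-rates `s` in response to the algorithm rather than having them fixed; the paper's UCB-S is designed so that the resulting equilibrium strategies remain desirable while the learner still attains $\tilde O(\sqrt{KT})$ regret.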
Submission Number: 43