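"Doing Tech at Taotian" - Setting Out Anew: The Evolution of Alimama's Intelligent Advertising Decision Technology
Alimei's Introduction
1. Introduction
2. Auto-Bidding Decision Technology: Continuous Breakthroughs
2.1 Main Line: From Following to Leading, Toward Stronger Sequential Decision-Making
Generation 1: Classical Control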
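This generation indirectly converts the performance-maximization problem into a budget-spend control problem. A target spend curve is computed from business data, and the controller steers the budget so that it is consumed along that curve as closely as possible. PID [1] and its refinements [2][10] are the control algorithms commonly used at this stage. When the value distribution of the auctioned traffic is stable, these algorithms are generally sufficient for performance optimization when a business first launches.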
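To make the control view concrete, here is a minimal sketch of a PID-style pacer. It is not the controller from [1], [2], or [10]; the class name `PIDPacer`, the gains, the budget-normalized error, and the clamping range are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PIDPacer:
    """Hypothetical PID pacer: adjusts a bid multiplier so spend tracks a target curve."""
    kp: float = 0.5   # proportional gain (assumed value)
    ki: float = 0.1   # integral gain (assumed value)
    kd: float = 0.0   # derivative gain (assumed value)
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, target_spend: float, actual_spend: float, budget: float) -> float:
        # Normalize the error by the budget so the gains do not depend on currency scale.
        error = (target_spend - actual_spend) / budget
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        signal = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Under-spending gives a positive signal and raises the multiplier;
        # the clamp to [0.1, 2.0] is an assumed safety range.
        return min(2.0, max(0.1, 1.0 + signal))


pacer = PIDPacer()
multiplier = pacer.update(target_spend=50.0, actual_spend=30.0, budget=100.0)
```

In this sketch the multiplier would scale the advertiser's base bid for the next control interval, so falling behind the curve makes bids more aggressive and running ahead makes them more conservative.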
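Generation 2: Planning and Solving (LP)
Compared with the first generation, planning-based (LP) algorithms solve the objective-maximization problem directly. The previous day's auction traffic can be used to predict the set of traffic that will arrive in the rest of the current period, and the bidding parameters are solved from that prediction. As delivery proceeds, the auto-bidding problem becomes a new subproblem conditioned on the data delivered so far, so the method can be applied repeatedly; this is Online LP [3][4]. These methods depend on accurate forecasts of future auction traffic, so deploying them in practice requires considerable work on predicting both the quality and the quantity of future traffic.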
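The idea can be sketched for the simplest case, a single budget constraint. The example below is an assumption-laden illustration rather than the method of [3] or [4]: it treats forecast traffic as a list of (value, cost) pairs with positive entries, solves the LP relaxation greedily by value-to-cost ratio, ignores the fractional last item, and returns a single bidding parameter `alpha` so that bidding `alpha * value` wins the selected impressions. The function name `solve_bid_parameter` is hypothetical.

```python
from typing import List, Tuple


def solve_bid_parameter(values: List[float], costs: List[float],
                        budget: float) -> Tuple[float, float]:
    """Greedy solution of: max sum(x*v) s.t. sum(x*c) <= budget, 0 <= x <= 1.

    Assumes every value and cost is positive. Returns (alpha, planned_spend).
    """
    # Buy impressions in decreasing order of value per unit cost.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / costs[i], reverse=True)
    spent = 0.0
    alpha = 0.0  # bid nothing if not even the best impression is affordable
    for i in order:
        if spent + costs[i] > budget:
            break  # simplification: the fractional item of the LP is dropped
        spent += costs[i]
        # The marginal impression bought so far fixes the threshold:
        # alpha * value >= cost holds exactly for ratios at or above it.
        alpha = costs[i] / values[i]
    return alpha, spent


# Forecast built from yesterday's traffic (made-up numbers).
alpha, planned_spend = solve_bid_parameter(
    values=[10.0, 6.0, 4.0, 1.0], costs=[2.0, 2.0, 2.0, 2.0], budget=5.0)
```

An Online LP style loop would call the same function again during the day with the remaining budget and a forecast of the remaining traffic, which is why forecast quality matters so much for this generation.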
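Generation 3: Reinforcement Learning
In reality the online auction environment is highly complex and changes dynamically, the future traffic set is hard to predict accurately, and performance can only be maximized by planning delivery across the entire budget period. Because auto-bidding is a typical sequential decision problem, the third stage optimizes bidding policies with reinforcement learning. The iteration went from early deployments of classical reinforcement learning methods [5][6][8][9], to Offline RL methods that approximate "the data distribution of the real online environment" [9], and finally to Online RL methods that get closer to the essence of the problem by learning through interaction with the real auction environment [13].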
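Generation 4: Generative Models
Generative large models, with ChatGPT as the representative, have arrived with overwhelming momentum and shown striking results in many fields. New technical ideas and paradigms may bring a revolutionary upgrade to auto-bidding algorithms. The Alimama technical team planned ahead and rebuilt its intelligent advertising marketing technology stack around AIGA (AI Generated Action), a large model for intelligent marketing decisions, from which auto-bidding strategies represented by AIGB (AI Generated Bidding) [14] were derived.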
2.1.1 Large-Scale Application of Reinforcement Learning in Auto-Bidding
Following: Continuous Learning and Winding Exploration
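"What is the MDP?" Because users arrive randomly, there is no obvious Markov transition between the impressions that enter the auction, so what is the state transition? Let us look at the bid formula again. It has two parts: the traffic value and the bidding parameter. The traffic value is computed at request granularity, the bidding parameter expresses how aggressively to bid on the current traffic, and that aggressiveness is decided from the advertiser's current delivery state. One workable design aggregates the ad's delivery information by time period to form the state: the bidding policy at the previous step affects the advertiser's delivery outcome and forms the state at the next step, so the advertiser's delivery information aggregated by time period does have the Markov transition property. This design also turns the problem into fixed-step decisions on the bidding parameter, which leaves time for the log collection, reward collection, and state computation that a production system needs. Representative works [5][6][7][8][9][12] essentially all adopt this design.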
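The time-aggregated design described above can be sketched as a small transition function. This is only an illustration under assumptions: the state fields, the bid formula `bid = alpha * value`, and the second-price payment rule are simplifications chosen for the example, and `State` and `step` are hypothetical names rather than anything from [5]-[12].

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class State:
    step: int            # index of the current time period
    budget_left: float   # remaining budget
    value_won: float     # cumulative value won so far


def step(state: State, alpha: float,
         impressions: List[Tuple[float, float]]) -> Tuple[State, float]:
    """Apply one bidding parameter to every impression in a time period.

    Each impression is (value, market_price). The bid is alpha * value, and a
    won impression is charged the market price (assumed second-price rule).
    """
    budget, gained = state.budget_left, 0.0
    for value, price in impressions:
        bid = alpha * value
        if bid >= price and price <= budget:
            budget -= price
            gained += value
    # The next state depends only on the current state and the action,
    # which is the Markov property recovered by time aggregation.
    next_state = State(state.step + 1, budget, state.value_won + gained)
    return next_state, gained


s0 = State(step=0, budget_left=10.0, value_won=0.0)
s1, period_value = step(s0, alpha=0.5,
                        impressions=[(4.0, 1.0), (2.0, 3.0), (10.0, 4.0)])
```

Because the action is chosen once per period rather than once per request, the system has the whole period to collect logs and compute the next state before the next decision is due.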
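"How should the reward be designed?" Reward design is the soul of RL. The reward for a bidding policy has to teach the policy how to bid on hundreds of millions of impressions so as to maximize the total value of the impressions it wins. If the reward were only the total value, the policy would tend to chase good traffic blindly and either exhaust the budget early or exceed the cost constraint, so the reward also has to steer the policy toward more cost-effective traffic under the constraints. In addition, auto-bidding has terminal feedback: the complete delivery outcome can only be computed once the delivery period ends, and signals such as conversions are not only sparse but also arrive with a long delay. The reward therefore has to be designed carefully so that it can guide every single decision. In practice, establishing the link between a decision and the final outcome is crucial. For example, [9] holds the current optimal parameter in a simulated environment all the way to the end of the period, obtains the final outcome, and uses it to assign a fairly accurate reward to the decision. In real business settings, domain experience is also often built into the reward to help the model converge.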
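The "hold the parameter until the end" idea can be illustrated with a toy simulator. This is a sketch under assumptions and not the reward used in [9]: the function name `hold_to_end_reward`, the (value, market_price) impression format, and the multiplicative penalty for exceeding a cost-per-value cap are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Impression = Tuple[float, float]  # (value, market_price)


def hold_to_end_reward(alpha: float, budget_left: float,
                       remaining: List[List[Impression]],
                       max_cost_per_value: float) -> float:
    """Reward for choosing alpha now: simulate holding it for all remaining periods."""
    spent, value = 0.0, 0.0
    for period in remaining:
        for v, price in period:
            # Win when the bid clears the market price and budget allows it.
            if alpha * v >= price and spent + price <= budget_left:
                spent += price
                value += v
    if value == 0.0 or spent == 0.0:
        return value
    cost_per_value = spent / value
    # Assumed penalty: scale the terminal value down in proportion to how far
    # the realized cost per unit value exceeds the advertiser's cap.
    penalty = min(1.0, max_cost_per_value / cost_per_value)
    return value * penalty


future = [[(4.0, 1.0), (2.0, 3.0)], [(10.0, 4.0)]]
reward_ok = hold_to_end_reward(0.5, 10.0, future, max_cost_per_value=0.5)
reward_over = hold_to_end_reward(2.0, 10.0, future, max_cost_per_value=0.25)
```

The more aggressive parameter wins more value here but is rewarded less once the cost cap is violated, which is the kind of signal that keeps a policy from simply chasing traffic.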
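Innovation: Grounded in the Business, Bringing Forth the New
Breakthrough: Cracking Hard Problems with Unconventional Approaches
2.1.2 Leading the New Era of Generative Bidding (AIGB)
2.2 Side Line: A Hundred Flowers Blooming, Toward More Comprehensive Bidding Decision Technology
Optimal Bidding Strategies in Complex Auction Environments
Multi-Agent Joint Bidding
Fairness
Multi-Stage Cooperative Bidding
3. Auction Mechanism Design Is Also a Decision Problem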
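How to satisfy the mechanism's properties: a concise mathematical form is needed to express the game-theoretic properties the mechanism must satisfy, and it has to be incorporated into the mechanism's optimization process.
How to optimize for actual posterior outcomes: many industrial optimization targets (for example GMV, or the number of item favorites and add-to-cart actions) are hard to express in an exact analytical form, so how to optimize the mechanism from real feedback also has to be considered.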
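3.1 Main Line: Down to Earth, From Admiring Auction Mechanisms at a Distance to Optimizing Them in Depth
Generation 1: Classical Auction Mechanisms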
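After the classical GSP [23] and VCG [24] mechanisms were deployed at scale in internet scenarios, scenario-specific optimization concentrated on two aspects: 1) raising platform revenue, most typically through Squashing [25] and reserve prices; 2) multi-objective optimization, achieved by adding more terms to the ranking formula, most typically uGSP. The allocation and payment rules of these mechanisms are relatively clear, so their incentive properties have also been studied extensively.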
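As a rough illustration of how squashing and a reserve price modify GSP, the sketch below ranks ads by `bid * ctr ** squash` and charges each winner the smallest per-click bid that keeps its position, floored at the reserve. It follows the common textbook formulation and is not the production mechanism or the exact formulation of [25]; the function name and parameters are assumptions.

```python
from typing import List, Tuple


def gsp_squashed(bids: List[float], ctrs: List[float], slots: int,
                 squash: float = 1.0, reserve: float = 0.0) -> List[Tuple[int, float]]:
    """Return [(advertiser_index, price_per_click)] for each filled slot.

    squash=1.0 is ordinary quality-weighted GSP; squash=0.0 ranks by bid only.
    """
    def score(i: int) -> float:
        return bids[i] * ctrs[i] ** squash

    # Advertisers bidding below the reserve price are excluded.
    eligible = [i for i, b in enumerate(bids) if b >= reserve]
    ranked = sorted(eligible, key=score, reverse=True)
    results = []
    for pos, i in enumerate(ranked[:slots]):
        if pos + 1 < len(ranked):
            # Smallest bid that still outranks the next advertiser.
            price = score(ranked[pos + 1]) / ctrs[i] ** squash
        else:
            price = 0.0
        results.append((i, max(reserve, price)))
    return results


outcome = gsp_squashed(bids=[4.0, 3.0, 1.0], ctrs=[0.1, 0.2, 0.1],
                       slots=2, squash=1.0, reserve=0.5)
```

Lowering `squash` toward 0 reduces the influence of predicted click-through rate on ranking, which is the lever squashing uses to trade efficiency for revenue, while the reserve acts as a floor on every price.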
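Generation 2: Learning-based Auction Mechanisms
With the rapid development of deep learning and reinforcement learning, researchers began to explore bringing both into auction mechanism design. Representative academic work includes RegretNet [26] and RDM [41]. Taking the characteristics of industrial scenarios into account, Alimama successively designed mechanisms such as Deep GSP [31], Neural Auction [32], and Two-Stage Auction [33], all of which use the strong learning capacity of deep networks to improve how well the auction mechanism can be optimized.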
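Generation 3: Joint Design of Auction Mechanisms and Auto-Bidding
With the wide adoption of auto-bidding, the way advertisers bid has changed substantially: advertisers submit high-level optimization objectives and constraints to the platform, and a bidding agent then makes the detailed bidding decision in each ad auction on the advertiser's behalf. From the advertiser's point of view, the platform needs to treat bidding and the auction mechanism as a whole and design them jointly; representative work includes [36].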
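3.1.1 One Encounter Surpasses Countless Others: When Auction Mechanisms Meet Intelligence
A Stunning Debut: Learnable Auction Mechanisms
Sustained Effort: Whole-Page Auction Mechanisms (Accounting for Externalities)
A Blue Ocean: Fusion Mechanism Design
3.1.2 Seamlessly Unified: Joint Design of Auto-Bidding and Auction Mechanisms
3.2 Side Line: Diverse Advertiser Behavior Modeling
4. Conclusion
About Us & Join Us
References: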
[1]. Chen Y, Berkhin P, Anderson B, et al. Real-time bidding algorithms for performance-based display ad allocation[C]//Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 2011: 1307-1315.
[2]. Zhang W, Rong Y, Wang J, et al. Feedback control of real-time display advertising[C]//Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 2016: 407-416.
[3]. Yu H, Neely M J. A Low Complexity Algorithm with $ O (\sqrt {T}) $ Regret and $ O (1) $ Constraint Violations for Online Convex Optimization with Long Term Constraints[J]. arXiv preprint arXiv:1604.02218, 2016.
[4]. Yu H, Neely M, Wei X. Online convex optimization with stochastic constraints[J]. Advances in Neural Information Processing Systems, 2017, 30.
[5]. Zhao J, Qiu G, Guan Z, et al. Deep reinforcement learning for sponsored search real-time bidding[C]//Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018: 1021-1030.
[6]. Cai H, Ren K, Zhang W, et al. Real-time bidding by reinforcement learning in display advertising[C]//Proceedings of the tenth ACM international conference on web search and data mining. 2017: 661-670.
[7]. Jin J, Song C, Li H, et al. Real-time bidding with multi-agent reinforcement learning in display advertising[C]//Proceedings of the 27th ACM international conference on information and knowledge management. 2018: 2193-2201.
[8]. Wu D, Chen X, Yang X, et al. Budget constrained bidding by model-free reinforcement learning in display advertising[C]//Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018: 1443-1451.
[9]. He Y, Chen X, Wu D, et al. A unified solution to constrained bidding in online display advertising[C]//Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021: 2993-3001.
[10]. Yang X, Li Y, Wang H, et al. Bid optimization by multivariable control in display advertising[C]//Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2019: 1966-1974.
[11]. Guan Z, Wu H, Cao Q, et al. Multi-agent cooperative bidding games for multi-objective optimization in e-commercial sponsored search[C]//Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021: 2899-2909.
[12]. Wen C, Xu M, Zhang Z, et al. A cooperative-competitive multi-agent framework for auto-bidding in online advertising[C]//Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 2022: 1129-1139.
[13]. Mou Z, Huo Y, Bai R, et al. Sustainable Online Reinforcement Learning for Auto-bidding[J]. Advances in Neural Information Processing Systems, 2022, 35: 2651-2663.
[14]. Alimama Generative Bidding Model (AIGB) Explained. https://zhuanlan.zhihu.com/p/619301816, 2023.
[15]. Lin Q, Tang B, Wu Z, et al. Safe Offline Reinforcement Learning with Real-Time Budget Constraints[J]. arXiv preprint arXiv:2306.00603, 2023.
[16]. Zhang H, Niu L, Zheng Z, et al. A Personalized Automated Bidding Framework for Fairness-aware Online Advertising[C]//Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023: 5544-5553.
[17]. Gong Z, Niu L, Zhao Y, et al. MEBS: Multi-task End-to-end Bid Shading for Multi-slot Display Advertising[C]//Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023: 4588-4594.
[18]. Ou, W., Chen, B., Liu, W., Dai, X., Zhang, W., Xia, W., Li, X., Tang, R., & Yu, Y. (2023). Optimal Real-Time Bidding Strategy for Position Auctions in Online Advertising. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management.
[19]. Gligorijevic, D., Zhou, T., Shetty, B., Kitts, B., Pan, S., Pan, J., & Flores, A. (2020). Bid Shading in The Brave New World of First-Price Auctions. Proceedings of the 29th ACM International Conference on Information & Knowledge Management.
[20]. Zhang, W., Kitts, B., Han, Y., Zhou, Z., Mao, T., He, H., Pan, S., Flores, A., Gultekin, S., & Weissman, T. (2021). MEOW: A Space-Efficient Nonparametric Bid Shading Algorithm. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining.
[21]. Kumar, A., Zhou, A., Tucker, G., & Levine, S. (2020). Conservative Q-Learning for Offline Reinforcement Learning. ArXiv, abs/2006.04779.
[22]. Kostrikov, I., Nair, A., & Levine, S. (2021). Offline Reinforcement Learning with Implicit Q-Learning. ArXiv, abs/2110.06169.
[23]. Aggarwal, G., Muthukrishnan, S., Pál, D., & Pál, M. (2008). General auction mechanism for search advertising. ArXiv, abs/0807.1297.
[24]. Varian, H.R., & Harris, C. (2014). The VCG Auction in Theory and Practice. The American Economic Review, 104, 442-445.
[25]. Bachrach, Y., Ceppi, S., Kash, I.A., Key, P.B., & Kurokawa, D. (2014). Optimising trade-offs among stakeholders in ad auctions. Proceedings of the fifteenth ACM conference on Economics and computation.
[26]. Dütting, P., Feng, Z., Narasimhan, H., & Parkes, D.C. (2017). Optimal auctions through deep learning. Communications of the ACM, 64, 109 - 116.
[27]. Deng, Y., Mao, J., Mirrokni, V.S., & Zuo, S. (2021). Towards Efficient Auctions in an Auto-bidding World. Proceedings of the Web Conference 2021.
[28]. Li, N., Ma, Y., Zhao, Y., Duan, Z., Chen, Y., Zhang, Z., Xu, J., Zheng, B., & Deng, X. (2023). Learning-Based Ad Auction Design with Externalities: The Framework and A Matching-Based Approach. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
[29]. Xing, Y.Y., Zhang, Z., Zheng, Z., Yu, C., Xu, J., Wu, F., & Chen, G. (2023). Truthful Auctions for Automated Bidding in Online Advertising. International Joint Conference on Artificial Intelligence.
[30]. Wilkens, C.A., Cavallo, R., & Niazadeh, R. (2017). GSP: The Cinderella of Mechanism Design. Proceedings of the 26th International Conference on World Wide Web.
[31]. Zhang, Z., Liu, X., Zheng, Z., Zhang, C., Xu, M., Pan, J., Yu, C., Wu, F., Xu, J., & Gai, K. (2020). Optimizing Multiple Performance Metrics with Deep GSP Auctions for E-commerce Advertising. Proceedings of the 14th ACM International Conference on Web Search and Data Mining.
[32]. Liu, X., Yu, C., Zhang, Z., Zheng, Z., Rong, Y., Lv, H., Huo, D., Wang, Y., Chen, D., Xu, J., Wu, F., Chen, G., & Zhu, X. (2021). Neural Auction: End-to-End Learning of Auction Mechanisms for E-Commerce Advertising. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining.
[33]. Wang, Y., Liu, X., Zheng, Z., Zhang, Z., Xu, M., Yu, C., & Wu, F. (2021). On Designing a Two-stage Auction for Online Advertising. Proceedings of the ACM Web Conference 2022.
[34]. Liu, Y., Chen, D., Zheng, Z., Zhang, Z., Yu, C., Wu, F., & Chen, G. (2023). Boosting Advertising Space: Designing Ad Auctions for Augment Advertising. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining.
[35]. Lv, H., Zhang, Z., Zheng, Z., Liu, J., Yu, C., Liu, L., Cui, L., & Wu, F. (2022). Utility Maximizer or Value Maximizer: Mechanism Design for Mixed Bidders in Online Advertising. AAAI Conference on Artificial Intelligence.
[36]. Xing, Y., Zhang, Z., Zheng, Z., Yu, C., Xu, J., Wu, F., & Chen, G. (2023). Designing Ad Auctions with Private Constraints for Automated Bidding. ArXiv, abs/2301.13020.
[37]. Varian H R. Position auctions[J]. International Journal of Industrial Organization, 2007, 25(6): 1163-1178.
[38]. Zhao, X., Gu, C., Zhang, H., Yang, X., Liu, X., Tang, J., & Liu, H. (2019). DEAR: Deep Reinforcement Learning for Online Advertising Impression in Recommender Systems. AAAI Conference on Artificial Intelligence.
[39]. Chen, D., Yan, Q., Chen, C., Zheng, Z., Liu, Y., Ma, Z., Yu, C., Xu, J., & Zheng, B. (2022). Hierarchically Constrained Adaptive Ad Exposure in Feeds. Proceedings of the 31st ACM International Conference on Information & Knowledge Management.
[40]. Liao, G.R., Wang, Z., Wu, X., Shi, X., Zhang, C., Wang, Y., Wang, X., & Wang, D. (2021). Cross DQN: Cross Deep Q Network for Ads Allocation in Feed. Proceedings of the ACM Web Conference 2022.