
1. An Introduction to Use Cases of Predictive Modeling at LinkedIn Lei Zhang LinkedIn Engineering Manager
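3. Lei Zhang Engineering Manager at LinkedIn • Ph.D. in Computer Science from the University of Illinois at Chicago. His main research areas are natural language processing, sentiment analysis, data mining, and machine learning. He has published more than 20 papers in academic journals and conferences in China and internationally, holds multiple U.S. patents, and has co-authored four books on text data mining and big data computing, including Mining Text Data. He has long been invited to serve as a reviewer for international journals and as a program committee member for international conferences. • He currently works on the Machine Learning Foundation Team at LinkedIn, developing machine learning algorithms, features, and platforms.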
4. • LinkedIn Products Driven by Predictive Modeling • Predictive Modeling System: Good Practices and Pitfalls • Photon-ML with Case Studies • Photon Gateway with Case Study • Q&A
5. Predictive Modeling for LinkedIn Products “Artificial Intelligence (AI) is like oxygen at LinkedIn”: it powers everything that we do. Driven by AI, predictive modeling has a huge impact on many products and on the member experience at LinkedIn.
6. LinkedIn Feed
7. Jobs You May Be Interested In (JYMBII)
8. Email Marketing Campaign
9. More Products • People You May Know (PYMK) • Search Ads
10. End-to-End Predictive Modeling System Courtesy of BADM team at LinkedIn
11. Predictive Modeling: Objective The objective of predictive modeling may change over time and may differ depending on perspective. The first objective of predictive modeling should be easy to measure, and its metric should be directly attributable to the model. Sometimes a “product rule layer” is needed on top of the predictive model to add further logic to the final result.
12. Predictive Modeling: Label Data Quality Avoid the following pitfalls: • Information leakage - prevent mixing training and test data, or future and past data • Training imbalance - down-sampling, up-sampling, SMOTE (synthetic minority oversampling technique) • Missing data - imputation (e.g., most frequent value, median, zero, etc.) • Outliers - transformation (e.g., log, binning)
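The first three pitfalls can be guarded against mechanically. Below is a minimal plain-Python sketch that assumes a hypothetical record layout of (timestamp, raw feature value or None, 0/1 label). It splits by time so that all test data is later than all training data. It balances classes by random down-sampling; SMOTE would instead synthesize new minority examples. It then imputes missing values with the training-set median and applies a log transform to compress outliers.

```python
import math
import random
from statistics import median
from typing import List, Optional, Tuple

# Hypothetical record layout: (timestamp, raw feature value or None, label in {0, 1}).
Record = Tuple[int, Optional[float], int]


def temporal_split(records: List[Record], cutoff: int) -> Tuple[List[Record], List[Record]]:
    """Train on the past and test on the future, so no future data leaks into training."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def downsample_majority(records: List[Record], seed: int = 0) -> List[Record]:
    """Randomly drop majority-class rows until both classes have the same count."""
    pos = [r for r in records if r[2] == 1]
    neg = [r for r in records if r[2] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)
    return minority + rng.sample(majority, len(minority))


def impute_and_log(train: List[Record], other: List[Record]) -> Tuple[List[float], List[float]]:
    """Fill missing values with the TRAINING median (never statistics from test data),
    then apply log1p to compress outliers. Assumes non-negative raw values."""
    fill = median(v for _, v, _ in train if v is not None)

    def transform(rows: List[Record]) -> List[float]:
        return [math.log1p(v if v is not None else fill) for _, v, _ in rows]

    return transform(train), transform(other)


records = [(1, 3.0, 0), (2, None, 0), (3, 100.0, 1), (4, 5.0, 0), (5, 7.0, 0), (6, None, 1)]
train, test = temporal_split(records, cutoff=5)
balanced = downsample_majority(train)
x_train, x_test = impute_and_log(train, test)
```

The imputation statistic comes from the training split only. Computing it over all the data would be a subtle form of the information leak described above.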
13. Predictive Modeling: Feature Engineering Good practices: • Try and iterate • Use very specific features if you have a large amount of training data • Create new features based on error patterns • Modify existing features for better prediction - numeric features: binarization, binning, log/Box-Cox transform - categorical features: one-hot encoding; grouping or target encoding for high cardinality - feature crosses: combine two or more features, e.g., {skills_member} × {skills_job}
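The transformations listed above are small enough to show directly. The following is a hedged plain-Python illustration rather than a LinkedIn implementation. The smoothing toward the global mean in `target_encode` and the `"member_skill|job_skill"` naming of crossed features are assumptions of this sketch.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def binarize(x: float, threshold: float) -> int:
    """Numeric feature -> 1 if above the threshold, else 0."""
    return int(x > threshold)


def bin_index(x: float, edges: Sequence[float]) -> int:
    """Numeric feature -> index of its bucket; `edges` must be sorted ascending."""
    return bisect.bisect_right(edges, x)


def one_hot(value: str, vocabulary: List[str]) -> List[int]:
    """Low-cardinality categorical feature -> one indicator per known value."""
    return [int(value == v) for v in vocabulary]


def target_encode(categories: List[str], labels: List[int], prior_weight: float = 1.0) -> Dict[str, float]:
    """High-cardinality categorical feature -> mean label per category, smoothed toward
    the global mean. Fit on training data only, or the label leaks into the feature."""
    global_mean = sum(labels) / len(labels)
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for c, y in zip(categories, labels):
        sums[c] += y
        counts[c] += 1
    return {c: (sums[c] + prior_weight * global_mean) / (counts[c] + prior_weight) for c in counts}


def cross(member_skills: Set[str], job_skills: Set[str]) -> Set[str]:
    """Feature cross {skills_member} x {skills_job}: one sparse feature per skill pair."""
    return {f"{m}|{j}" for m in member_skills for j in job_skills}
```

For example, `cross({"java"}, {"java", "sql"})` yields the two sparse features `"java|java"` and `"java|sql"`. A linear model can then learn a separate weight for each pair.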
14. Predictive Modeling: Offline-Online Discrepancy Sometimes there is a discrepancy between performance during offline training and performance during online deployment. It can be caused by: • A discrepancy between data processing in the training and deployment pipelines - reuse code between offline training and online deployment • A discrepancy between feature data in the training and deployment stages - offline-online monitoring of feature data is important • A feedback loop between model training and deployment
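The first two remedies above, code reuse and feature monitoring, can be sketched together. The slide does not name a monitoring statistic. This sketch uses the population stability index (PSI), a common drift measure chosen here as an assumption, to compare the offline and online distributions of one feature. The feature names in `preprocess` are hypothetical.

```python
import bisect
import math
from typing import Dict, List, Sequence


def preprocess(raw: Dict[str, float]) -> Dict[str, float]:
    """One transform shared by the offline training job and the online scorer,
    so the two pipelines cannot silently diverge (feature names are hypothetical)."""
    return {
        "log_connections": math.log1p(raw.get("num_connections", 0.0)),
        "is_premium": 1.0 if raw.get("is_premium", 0.0) > 0 else 0.0,
    }


def psi(expected: Sequence[float], actual: Sequence[float], edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index of one feature: offline values (`expected`) versus
    online values (`actual`), bucketed by sorted `edges`. 0 means identical bucket shares."""

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # eps keeps log() finite when a bucket is empty
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In practice, an alert would fire when the PSI of a feature exceeds a threshold chosen by the team. Any specific threshold is a convention, not something the slide prescribes.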
15. Predictive Modeling: Online A/B Test Good practices: • Do not roll everything into a single test • Track both long-term and short-term metrics; for example, a bad search experience may increase short-term activity but harm retention in the long run • Power the experiment: the sample size needs to be carefully determined to ensure sufficient statistical power to measure a practically meaningful effect
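Powering the experiment can be made concrete with the standard normal-approximation sample-size formula for comparing two proportions. The following is a minimal sketch using only the Python standard library. The 10% → 11% click-through rates are made-up inputs, and α = 0.05 and power = 0.8 are conventional defaults, not values from the talk.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_control: float, p_treatment: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-arm sample size for a two-sided test comparing two conversion rates
    (normal approximation for a difference of proportions)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2
    return math.ceil(n)


# Smallest effect that matters in practice (hypothetical): a 10% -> 11% click-through rate.
n = samples_per_arm(0.10, 0.11)
```

Because the effect size enters the formula squared in the denominator, halving the minimum detectable effect roughly quadruples the required sample size per arm.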
16. Photon-ML Photon-ML is a large-scale machine learning library based on Apache Spark. It can process massive datasets and provides powerful model training and diagnostic utilities. GitHub: Features: • Large-scale regression: linear, logistic, and Poisson regression with L1, L2, and elastic-net regularization • Support for offsets, weights, and bounds on coefficients • Support for generalized linear mixed effect (GLMix) models, an implementation of the generalized additive mixed effect (GAME) model
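To show what offsets, weights, bounds, and elastic-net regularization mean for a single logistic model, here is a hedged plain-Python sketch of the objective. It is not Photon-ML's API, and the mixing parametrization (`reg_weight`, `alpha`) is an assumption that may differ from Photon-ML's own configuration.

```python
import math
from typing import List, Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def objective(
    beta: Sequence[float],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    weights: Sequence[float],
    offsets: Sequence[float],
    reg_weight: float,
    alpha: float,
) -> float:
    """Weighted logistic loss with per-example offsets plus an elastic-net penalty.
    Labels y are in {-1, +1}; each example's margin is offset + beta . x.
    alpha = 1 gives pure L1 (lasso), alpha = 0 pure L2 (ridge)."""
    loss = 0.0
    for x, yi, w, o in zip(X, y, weights, offsets):
        margin = o + sum(b * xj for b, xj in zip(beta, x))
        loss += w * softplus(-yi * margin)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return loss + reg_weight * (alpha * l1 + 0.5 * (1 - alpha) * l2)


def project_to_bounds(beta: Sequence[float], bounds: Sequence[Tuple[float, float]]) -> List[float]:
    """Clip each coefficient into its [lower, upper] box constraint."""
    return [min(max(b, lo), hi) for b, (lo, hi) in zip(beta, bounds)]
```

An offset is a fixed, per-example contribution to the margin that the model does not learn, such as the score of a previously trained model. This is also how the fixed-effect score can be passed to a random-effect model in a mixed model.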
17. GLMix Model A GLMix model consists of a fixed effect component and multiple random effects. A fixed effect model corresponds to a generalized linear model and assumes each observation is independent. Random effects capture additional heterogeneity in residuals from fixed effects by attaching parameters at multiple granularities (users, items, segments). Shrinkage/regularization is often used to avoid overfitting.
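The role of shrinkage is easiest to see in the simplest possible random effect: a per-entity intercept fitted to fixed-effect residuals under squared loss with an L2 penalty. This is a deliberately simplified, hypothetical case. A real GLMix random effect is a full GLM per entity with its own features. Even so, the closed form shows how entities with little data are pulled toward the population-level (fixed-effect) prediction.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fit_random_intercepts(rows: List[Tuple[str, float]], fixed_pred: List[float], lam: float) -> Dict[str, float]:
    """Per-entity random intercepts fitted on fixed-effect residuals.

    For entity u with residuals r_k = y_k - fixed_pred_k, minimizing
    sum_k (r_k - a_u)^2 + lam * a_u^2 gives a_u = sum_k r_k / (n_u + lam):
    the ridge penalty shrinks entities with few observations toward 0,
    i.e., toward the fixed-effect (population) prediction."""
    residual_sum: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for (entity, y), f in zip(rows, fixed_pred):
        residual_sum[entity] += y - f
        count[entity] += 1
    return {e: residual_sum[e] / (count[e] + lam) for e in residual_sum}


# Both users have the same mean residual (2.0), but u2 has less data and is shrunk more.
effects = fit_random_intercepts([("u1", 3.0), ("u1", 3.0), ("u2", 3.0)], [1.0, 1.0, 1.0], lam=1.0)
```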
18. Model Formulation Probabilistic formulation Optimization problem
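The equations on this slide did not survive extraction. As a hedged reconstruction in our own notation, not the slide's, a GLMix model with a per-member and a per-job random effect can be written as follows. Here $x_n$, $s_n$, and $q_n$ are the feature vectors used by the global, per-member, and per-job models. Gaussian priors on the coefficients become L2 penalties in the optimization problem, which is the shrinkage mentioned on the previous slide.

```latex
% Probabilistic formulation: response y_n of member m(n) to job j(n), link function g
g\big(\mathbb{E}[y_n]\big) = x_n^\top b + s_n^\top \alpha_{m(n)} + q_n^\top \gamma_{j(n)},
\qquad b \sim \mathcal{N}(0, \sigma_b^2 I),\;
\alpha_m \sim \mathcal{N}(0, \sigma_\alpha^2 I),\;
\gamma_j \sim \mathcal{N}(0, \sigma_\gamma^2 I)

% Optimization problem: MAP estimate = negative log-likelihood + L2 penalties from the priors
\min_{b,\{\alpha_m\},\{\gamma_j\}} \;
-\sum_n \log p\big(y_n \mid b, \alpha_{m(n)}, \gamma_{j(n)}\big)
+ \frac{\lVert b \rVert_2^2}{2\sigma_b^2}
+ \sum_m \frac{\lVert \alpha_m \rVert_2^2}{2\sigma_\alpha^2}
+ \sum_j \frac{\lVert \gamma_j \rVert_2^2}{2\sigma_\gamma^2}
```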
19. Case Study - Feed Ranking Feed Events - Sponsored Ads - Updates from connections - Updates from followed companies - Updates from joined groups - Articles shared / commented on / liked by connections - Articles that mention connections - Articles posted by influencers or connections - News from followed channels - Job recommendations - People You May Know ...
20. Actor-Verb-Object Formalism Actor Verb Object
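The actor-verb-object formalism maps naturally onto a small record type. In the following hedged sketch, the ID strings and the helper `interaction_keys` are hypothetical. The helper shows how one activity yields the keys under which the viewer-level feature categories on the Features slide (viewer-actor, viewer-activity type, and so on) could be aggregated.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FeedActivity:
    """One feed update in actor-verb-object form, e.g. 'member:42 shared article:7'."""
    actor: str
    verb: str
    obj: str  # "object" would shadow the built-in name


def interaction_keys(viewer: str, activity: FeedActivity) -> Dict[str, Tuple[str, ...]]:
    """Keys under which per-viewer statistics (e.g., CTR) could be aggregated,
    one per viewer-level feature category."""
    return {
        "viewer-actor": (viewer, activity.actor),
        "viewer-activity_type": (viewer, activity.verb),
        "viewer-actor-activity_type": (viewer, activity.actor, activity.verb),
        "viewer-object": (viewer, activity.obj),
    }
```

Making the record frozen makes it hashable, so activities can serve directly as dictionary keys or set members during aggregation.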
21. Member Activity Types
22. Features Feature categories and example features: • Viewer features: member title, skill, industry, etc. • Activity features: activity time, activity type (e.g., like, comment, share, etc.) • Object features: object position • Viewer-actor features: number of messages sent to the actor by the viewer, number of shared connections • Viewer-activity type features: viewer CTR for the activity type • Viewer-actor-activity type features: viewer CTR for the actor, per combination of activity types • Viewer-object features: whether the object is in the same language as the viewer • Feature engineering can be applied: log transform, binarization, binning
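A viewer-activity type CTR feature can be computed from an impression log. This minimal sketch assumes a hypothetical log of (viewer, activity_type, clicked) tuples. The additive smoothing prior is an assumption added so that rarely seen pairs do not get extreme CTRs, and `log_count` shows the log transform applied to a heavy-tailed count such as messages sent.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def viewer_activity_ctr(
    log: Iterable[Tuple[str, str, bool]],
    prior_clicks: float = 1.0,
    prior_impressions: float = 20.0,
) -> Dict[Tuple[str, str], float]:
    """Smoothed viewer CTR per activity type from (viewer, activity_type, clicked) impressions.
    The additive prior keeps rarely seen pairs from getting extreme CTRs of 0 or 1."""
    impressions: Counter = Counter()
    clicks: Counter = Counter()
    for viewer, activity_type, clicked in log:
        impressions[(viewer, activity_type)] += 1
        clicks[(viewer, activity_type)] += int(clicked)
    return {k: (clicks[k] + prior_clicks) / (impressions[k] + prior_impressions) for k in impressions}


def log_count(n: int) -> float:
    """Log transform for heavy-tailed counts such as messages sent to the actor."""
    return math.log1p(n)
```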
23. Model Formulation Formalize feed ranking as a binary classification problem: let $y_{it}$ represent the interaction between viewer $i$ and update $t$:
$$y_{it} = \begin{cases} 1, & \text{if viewer } i \text{ interacts with feed update } t \\ -1, & \text{otherwise} \end{cases}$$
Assuming a logistic regression model, let $X_{it}$ be a vector of features characterizing the viewer, the feed update, and the context, and let $\beta$ be a vector of parameters:
$$P(y_{it} = 1 \mid \text{viewer}, \text{update}) = \frac{1}{1 + \exp(-\beta^\top X_{it})}$$
24. Model Training The parameter vector $\beta$ can be estimated by maximizing the likelihood of the training data as a function of $\beta$:
$$L(\beta) = \prod_{i,t} \frac{1}{1 + \exp(-y_{it}\,\beta^\top X_{it})}$$
We can add a regularization term to the log-likelihood function to mitigate overfitting:
$$\ell(\beta) = -\sum_{i,t} \log\left(1 + \exp(-y_{it}\,\beta^\top X_{it})\right) - \lambda \lVert \beta \rVert_2^2$$
Its gradient is $\nabla \ell(\beta) = \sum_{i,t} \frac{y_{it} X_{it}}{1 + \exp(y_{it}\,\beta^\top X_{it})} - 2\lambda\beta$, which stochastic gradient descent in Photon-ML uses to optimize the function above.
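The training objective above can be checked on toy data with a few lines of plain Python. The following is a hedged sketch, not Photon-ML code. It performs stochastic gradient ascent on $\ell(\beta)$, which is equivalent to descent on $-\ell(\beta)$, using the per-example gradient and spreading the L2 term evenly across the $n$ examples. The learning rate, epoch count, and toy data are arbitrary choices.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def reg_log_likelihood(beta: Sequence[float], X: List[List[float]], y: List[int], lam: float) -> float:
    """l(beta) = -sum log(1 + exp(-y * beta.x)) - lam * ||beta||^2, labels in {-1, +1}."""
    return -sum(softplus(-yi * dot(beta, x)) for x, yi in zip(X, y)) - lam * dot(beta, beta)


def sgd_logistic(X: List[List[float]], y: List[int], lam: float = 0.01, lr: float = 0.1,
                 epochs: int = 50, seed: int = 0) -> List[float]:
    """Stochastic gradient ascent on l(beta). Per example, the gradient is
    y * x * sigmoid(-y * beta.x) - (2 * lam / n) * beta."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            g = y[i] * sigmoid(-y[i] * dot(beta, X[i]))  # scalar part of the data gradient
            beta = [b + lr * (g * xj - 2.0 * lam / n * b) for b, xj in zip(beta, X[i])]
    return beta


X = [[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]  # first column is a bias term
y = [1, 1, -1, -1]
beta = sgd_logistic(X, y)
```

After training, `sigmoid(dot(beta, x))` is the predicted probability $P(y_{it} = 1 \mid \text{viewer}, \text{update})$ for a feature vector `x`.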
25. Case Study: JYMBII A GLMix model is applied, consisting of three parts: a fixed effect model (global population-average model, content-based), plus a per-member model (personalized for member behavior), plus a per-job model (collaborative effect). Courtesy of JYMBII team at LinkedIn
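At scoring time, the three parts combine additively on the logit scale. The following is a minimal sketch assuming a logistic link and plain lists of coefficients. The argument names are hypothetical and do not reflect Photon-ML's model format. A member or job with no trained random-effect model simply contributes zero, so the score falls back to the content-based global model.

```python
import math
from typing import Dict, List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def glmix_score(
    x_global: Sequence[float],
    x_member: Sequence[float],
    x_job: Sequence[float],
    fixed: Sequence[float],
    per_member: Dict[str, List[float]],
    per_job: Dict[str, List[float]],
    member_id: str,
    job_id: str,
) -> float:
    """P(positive response) = sigmoid(fixed effect + member random effect + job random effect).
    A missing random-effect model is an empty coefficient list, whose dot product is 0."""
    logit = dot(fixed, x_global)
    logit += dot(per_member.get(member_id, []), x_member)
    logit += dot(per_job.get(job_id, []), x_job)
    return 1.0 / (1.0 + math.exp(-logit))
```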
26. Features in GLMix • Dense vector bag-of-words similarity features in the global model for generalization (e.g., similarity in title text) • Sparse cross features in the global, user, and job models for memorization (e.g., memorizing that computer science students transition to entry-level engineering roles) Courtesy of JYMBII team at LinkedIn
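A bag-of-words similarity feature, such as title similarity, is typically a cosine similarity between term-count vectors. This is a minimal sketch assuming simple lower-cased whitespace tokenization. Production systems may use TF-IDF weighting or other tokenizers, and the titles below are made up.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens -> term counts (no stemming or stop-word removal)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


member_title = bag_of_words("Senior Software Engineer")
job_title = bag_of_words("Software Engineer")
title_similarity = cosine_similarity(member_title, job_title)  # one dense feature for the global model
```

Because this score is a single number shared across all members and jobs, it generalizes to unseen pairs. Sparse crosses, by contrast, memorize specific pairs seen in training.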
27. Deep + Wide Model Generate embeddings to capture generalization through semantic similarity
28. Photon Gateway Photon Gateway serves as an end-to-end offline propensity model and feature experimentation platform at LinkedIn, enabling data scientists to easily train effective propensity models based on Photon-ML. • Objectives: - fast prototyping - fast iteration - fast deployment • Features: - integration with feature stores and automatic feature engineering - automatic parameter tuning - model and feature analysis reports - model deployment
29. Architecture
30. Main Components • Model generation • Model offline analysis • Model deployment • Feature quality monitoring
31. Case Study: Email Marketing Campaign Propensity model: predict a member's behavior for certain events, e.g., premium account subscription. Why Photon Gateway: • Easy-to-use UI • Feature store and automatic feature engineering (e.g., aggregating member activity features within a time window) • Easy model management • Easy model deployment • Feature data monitoring
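Time-window aggregation of member activity, mentioned above as an example of automatic feature engineering, can be sketched as follows. The event tuple layout, activity names, window lengths, and feature naming scheme are all hypothetical. Only events strictly before the reference date are counted, which keeps the features free of future information relative to the label.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Sequence, Tuple


def window_counts(
    events: Iterable[Tuple[str, str, datetime]],
    as_of: datetime,
    windows_days: Sequence[int] = (7, 30),
) -> Dict[Tuple[str, str], int]:
    """Count each member's activities in trailing windows ending at `as_of`.
    Events at or after `as_of` are ignored so the features never see the future
    relative to the label date."""
    counts: Dict[Tuple[str, str], int] = {}
    for member, activity, ts in events:
        if ts >= as_of:
            continue
        age = as_of - ts
        for days in windows_days:
            if age <= timedelta(days=days):
                key = (member, f"{activity}_count_{days}d")
                counts[key] = counts.get(key, 0) + 1
    return counts


as_of = datetime(2024, 1, 31)
events = [
    ("m1", "profile_view", datetime(2024, 1, 30)),
    ("m1", "profile_view", datetime(2024, 1, 10)),
    ("m1", "job_search", datetime(2024, 2, 1)),  # after as_of: excluded
]
features = window_counts(events, as_of)
```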