Xin Hu (胡新), Dec 4, ArchSummit 2018: Platformize LinkedIn Feed To Improve Product Iteration Velocity

1. Platformize LinkedIn Feed To Improve Product Iteration Velocity Xin Hu (LinkedIn Tech Lead & Senior Staff Engineer) December 2018
3. • LinkedIn Feed Serving Stack • LinkedIn Feed Ecosystem • Why Is Onboarding To Feed So Costly? • Feed as a platform - project High Five • Next Steps • Feed Platformization Impact & Takeaway
4. LinkedIn Flagship Feed
5. LinkedIn Feed Serving Stack
6. Feed Mixer Feed-mixer is a multi-tenant service and a second pass ranking framework that provides federation and blending/ranking functionality for various LinkedIn applications • First Pass Ranker (FPR) - A service that recommends entities in a particular domain type • Second Pass Ranking (SPR) - A ranking step within feed-mixer that federates and ranks entities from different FPRs • Federated Feed - A list of URNs served to apps by feed-mixer. It is the result of second pass ranking
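A minimal sketch of the federation step described above, under stated assumptions. Each FPR is modeled as a function that returns scored URNs, and the second pass multiplies each first-pass score by a per-source weight before sorting. The names `Candidate`, `FirstPassRanker` and `FeedMixerSketch`, the weighting rule and the URN values are all hypothetical; the slides do not show feed-mixer's actual API or blending logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** A recommended entity: its URN, the FPR that produced it, and the first-pass score. */
record Candidate(String urn, String source, double score) {}

/** Hypothetical stand-in for a First Pass Ranker: recommends entities for one domain. */
interface FirstPassRanker {
    List<Candidate> recommend(String viewerUrn, int limit);
}

final class FeedMixerSketch {
    /**
     * Second pass ranking: federate candidates from all FPRs, rescore them with a
     * per-source weight (an illustrative blending rule), and return the top URNs.
     */
    static List<String> federate(String viewerUrn, List<FirstPassRanker> rankers,
                                 Map<String, Double> sourceWeights, int pageSize) {
        List<Candidate> pool = new ArrayList<>();
        for (FirstPassRanker fpr : rankers) {
            pool.addAll(fpr.recommend(viewerUrn, pageSize));
        }
        Comparator<Candidate> byBlendedScore = Comparator.comparingDouble(
                (Candidate c) -> c.score() * sourceWeights.getOrDefault(c.source(), 1.0));
        return pool.stream()
                .sorted(byBlendedScore.reversed())
                .limit(pageSize)
                .map(Candidate::urn)
                .toList();
    }

    public static void main(String[] args) {
        FirstPassRanker articles = (viewer, limit) -> List.of(
                new Candidate("urn:li:article:1", "articles", 0.9),
                new Candidate("urn:li:article:2", "articles", 0.4));
        FirstPassRanker jobs = (viewer, limit) -> List.of(
                new Candidate("urn:li:job:7", "jobs", 0.8));
        // The federated feed is a plain list of URNs, as in the definition above.
        System.out.println(federate("urn:li:member:42", List.of(articles, jobs),
                Map.of("articles", 1.0, "jobs", 0.75), 2));
    }
}
```

The point of the sketch is the shape of the contract: FPRs stay domain-specific, while the second pass sees only generic scored candidates and returns URNs.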
7. Terminology Organic vs Non-organic update An organic update is an in-network distribution of a user action, whereas a non-organic update is an out-of-network recommendation of a user action or content. "User" here can be any existing entity in the LinkedIn social graph (member, company, etc.). Organic updates make up ~75% of the flagship feed. Some examples: • Organic update: the viewer sees an article share in their feed because they are a connection of the actor. Other examples include an influencer publishing a post or a member liking a feed update • Non-organic update: trending articles, courses you may be interested in, people you may know, jobs you may be interested in
8. Definition - Onboarding Period Day 0: the use-case scope definition (organic vs recommended), product spec, tracking spec and success metrics are ready. No code has been written. Onboarding period • During this period: data models are defined, and end-to-end integration with the whole feed ecosystem (e.g. Feed Backend, Feed UI, Social Action, Viral Engagement - see below for an overview of the feed ecosystem) and tracking validation take place. This period also includes team and company ramps to sanity-check the new update in various distribution channels • By the end of this period: the new update is ramped to the public (non-LinkedIn employees) for experimentation (~5% of overall feed sessions)
9. Definition - Evaluation Period Evaluation period • During this period: we receive reports of the new update's metrics, and model coefficients are updated using members' engagement data on the new update type. Potential bugs can be fixed. If the new update is well received by our members, additional infrastructure provisioning may be required to serve the increased traffic • By the end of this period: product and engineering leads can make an informed decision on whether the new update should be ramped up to the majority of members or ramped down. If the decision is to ramp down, specific data-driven recommendations should be provided. If the decision is to ramp up, the whole feed ecosystem should be ready to support ramping the new update to 100% in production
10. Feed Ecosystem & Organic Update Lifecycle
12. LinkedIn Feed Platformization - Project High5 Today our feed product is severely constrained on iteration velocity: adding a new feed update across the feed ecosystem requires significant effort and substantial calendar time. High5 (Feed Iteration Velocity): enable faster iteration on feed by reducing the onboarding and distribution cost for new update types across the feed ecosystem
13. Why is feed onboarding so costly? Onboarding to feed is more than onboarding to Voyager (Neptune). It means integrating a new feed update across the feed ecosystem, which touches 7* domains, 10* teams and ~30 products • Source of truth store (e.g. UGC) • Feed front-end: the Voyager stack • Feed back-end: USCP (activity store), FollowFeed and seas-activity (activity indexer) • FeedMixer • Feed relevance • Social action • Viral engagement • Experimentation, tracking and analytics
14. Why is feed onboarding so costly?
Lack of platform boundary ● Core platform logic and use-case-specific customization are mixed ● Complex interdependence exists across domains
Product Evolution ● The feed product has evolved and continues to evolve significantly (new use cases, new features, new member experiences), e.g. hashtags, video, threaded comments, reactions
Transformation ● Each domain serves a specific goal and requires its own data extraction and transformation ● Complex changes are required to adopt new use-cases
Manual Operations ● Many critical tasks can only be done manually, e.g. tracking data validation, ramping new types in FPR
15. High5 Overview Strategy ● Platformize Feed by shielding use-cases from onboarding complexities ○ Local + Global optimization ○ Maximize task parallelization ○ Prioritize high-frequency use-cases ○ Handle organic and non-organic use-cases differently Onboarding Cost Improvement ● Baseline: 1-2 quarters dev time ● End of Phase I: 3 weeks dev time (~4 if counting deployment) ● End of Phase II: 1 week dev time (~2 if counting deployment) Future ● Long term, onboarding a new use-case or update type to the Feed should be self-service
16. High5 Patterns Establishes the right patterns to “platformize” across the feed ecosystem ● Abstract away boilerplate code; create clear separation between platform and use-case logic ● Maximize leverage and reuse; standardize product behavior ● Build automated tools and solutions; provide a fast path for default behavior ● Apply generic processing and templatized flows ● Ensure platform stability when new use-cases and access patterns are introduced
17. High5 Patterns: Patterns Established and Notable Projects
● Abstract away boilerplate logic; create clear separation between platform and use-case logic - Notable projects: Chipotle, Skeleton FPR, Content Type Interface (CTI)
● Maximize leverage and reuse; standardize product behavior by adopting a common vocabulary - Notable projects: Render Models, Viral Engagement Improvement
● Automated tools and solutions that provide a fast onboarding path to enable default behavior - Notable projects: Explore-Exploit-FPR, Cross-Dataset Validation
● Introduce generic processing and templatized flows to hide complexities from use-cases - Notable projects: Chipotle, Content Type Interface (CTI), Viral Engagement Improvement
● Added checks and balances to ensure platform stability when new use-cases and access patterns are introduced - Notable projects: Iron Man (fault attribution and isolation)
18. High5 Patterns Maximize leverage and reuse; standardize product behavior by adopting a common vocabulary Project Spotlight Render Model
19. Project Spotlight - Render Model • Allows a new update to be launched from the server without any client work • Standard Design Language to describe Feed updates • Composable basic components for faster iteration on new updates • Improves site speed by reducing entity-to-view mapping logic on clients Reduces code complexity, dramatically improves reusability and reduces the number of models of Feed updates throughout the app
20. Project Spotlight - Render Model What has been improved? ● Reusability ○ Focus is on building new components for new content ● UX Consistency ○ Client-side behavior is more consistent (since logic is shared on the server, e.g. image cropping) ○ Components look exactly the same on all updates Limitations and tradeoffs we made: ● Clients are update-type agnostic, so update-type-specific UI changes are more difficult ○ Makes design more consistent, but not as flexible ● Fine balance between view models and data models ● Server does not send styling information (e.g. text font or size)
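To make the render model idea concrete, here is a small sketch under stated assumptions. The server describes an update purely as a list of basic components, and the client renders by component type without knowing the update type. The component names (`ActorComponent`, `TextComponent`, `ImageComponent`), `RenderedUpdate` and the poll example are hypothetical; the slides do not show the actual render model schema.

```java
import java.util.List;
import java.util.stream.Collectors;

/** A basic, reusable building block of a feed update (names are illustrative). */
interface Component {}
record ActorComponent(String name, String headline) implements Component {}
record TextComponent(String text) implements Component {}
record ImageComponent(String url) implements Component {}

/** What the server sends: an update described only as a list of components. */
record RenderedUpdate(String urn, List<Component> components) {}

final class RenderModelSketch {
    /** Server side: a new update type is just a new composition of existing components. */
    static RenderedUpdate composePollUpdate(String urn, String author, String question) {
        return new RenderedUpdate(urn, List.of(
                new ActorComponent(author, "asked a question"),
                new TextComponent(question)));
    }

    /** Client side: renders by component type and never inspects the update type. */
    static String render(RenderedUpdate update) {
        return update.components().stream()
                .map(RenderModelSketch::renderComponent)
                .collect(Collectors.joining("\n"));
    }

    private static String renderComponent(Component c) {
        if (c instanceof ActorComponent a) return "[actor] " + a.name() + " " + a.headline();
        if (c instanceof TextComponent t) return "[text] " + t.text();
        if (c instanceof ImageComponent i) return "[image] " + i.url();
        return ""; // unknown components are skipped rather than crashing the client
    }
}
```

Because `render` only switches on component types, launching the poll update above needs no client change. That is also the tradeoff listed on the slide: an update-type-specific UI tweak has no natural place to live on the client.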
21. High5 Patterns • Create clear separation between platform and use-case logic • Introduce generic processing and templatized flows to hide complexities from usecases Chipotle Project
22. Project Spotlight - Chipotle • Provided a plugin architecture to integrate sharing flows with new content types • Principle: integrate partner specific UI through common interface, while delegating the UI implementation to partner teams • Any partner-specific data is captured in what we call the "detour screen" which is entirely owned by the partner • When that screen is rendered we pass it a callback and the partner team is responsible for returning the expected metadata from the callback
23. Project Spotlight - Chipotle Chipotle Callback: the mechanism for passing data between the sharing framework and the use-cases. It defines an interface that captures the data needed for the sharing preview and for creating a UGC post:
interface ContentTypeDetour {
  RenderableComponent getContentCreationScreen(Function<SharingData, void> callback)
}
interface SharingData {
  FeedComponent getPreview(),
  Task<ShareMedia[]> getShareMedia(),
  AnnotatedText getShareText()
}
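The callback contract above can be sketched in compilable Java, with several assumptions labeled. `Function<SharingData, void>` is expressed as `Consumer<SharingData>`, `Task` is replaced by `CompletableFuture`, `AnnotatedText` is simplified to `String`, and `FeedComponent`, `ShareMedia` and `RenderableComponent` are placeholder records. `PollDetour` and `SharingFrameworkSketch` are hypothetical and only show which side owns what.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Placeholder types; the real definitions are not shown in the slides. */
record FeedComponent(String description) {}
record ShareMedia(String mediaUrn) {}
record RenderableComponent(String screenName) {}

/** Data the sharing framework needs for the preview and for creating the UGC post. */
interface SharingData {
    FeedComponent getPreview();
    CompletableFuture<List<ShareMedia>> getShareMedia();
    String getShareText(); // AnnotatedText in the slide; simplified to String here
}

/** Implemented by each partner team; the detour screen itself is owned by the partner. */
interface ContentTypeDetour {
    RenderableComponent getContentCreationScreen(Consumer<SharingData> callback);
}

/** Hypothetical partner implementation for a "poll" content type. */
final class PollDetour implements ContentTypeDetour {
    @Override
    public RenderableComponent getContentCreationScreen(Consumer<SharingData> callback) {
        // A real screen would invoke the callback once the member finishes editing;
        // this sketch invokes it immediately with fixed data.
        callback.accept(new SharingData() {
            public FeedComponent getPreview() { return new FeedComponent("Poll preview"); }
            public CompletableFuture<List<ShareMedia>> getShareMedia() {
                return CompletableFuture.completedFuture(List.of(new ShareMedia("urn:li:poll:1")));
            }
            public String getShareText() { return "Vote in my poll"; }
        });
        return new RenderableComponent("poll-detour");
    }
}

final class SharingFrameworkSketch {
    /** The framework only talks to the interface and never sees partner-specific UI code. */
    static String startShare(ContentTypeDetour detour) {
        StringBuilder post = new StringBuilder();
        detour.getContentCreationScreen(data -> post.append(data.getShareText())
                .append(" | ").append(data.getPreview().description())
                .append(" | ").append(data.getShareMedia().join().size()).append(" media"));
        return post.toString();
    }
}
```

Adding another content type in this sketch means adding another `ContentTypeDetour` implementation; `SharingFrameworkSketch` does not change.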
24. High5 Patterns • Maximize leverage and reuse; standardize product behavior • Introduce generic processing and templatized flows to hide complexities from usecases Project Feed Viral Engagement Improvement
25. Feed Viral Engagement Pipeline What is Feed Viral Engagement? ● On-site notification/push/email triggered by a member’s viral action taken on feed updates ○ e.g. “A reacted to your post”, “B commented on your post”, “C mentioned you in a comment/post” High5 Improvement ● Consolidated 16 legacy types into 10 new types and rebuilt them on the project 1EP stack ● Migrated the producer flow to Concourse to apply a consistent fanout strategy ● Simplified ATC processing logic (config-driven onboarding)
26. Project Spotlight - Feed Viral Engagement Problem statement: Enabling viral notifications for new content types (e.g. polls, events) onboarded to feed has a very high cost Solution Strategy: ● Share notification UI formatting logic across channels (notif/push, email) ● Leverage a generic representation of feed content types (MiniUpdates) ● Unify producer flows (Concourse) Results: Reduced onboarding cost for new content by streamlining workflows and maximizing leverage
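A rough sketch of two of the ideas above: one formatter shared by every channel, and onboarding a content type through configuration instead of new code. Everything here is an assumption for illustration. `MiniUpdate` is a simplified stand-in for the generic representation mentioned on the slide, and the config map, class and method names are hypothetical.

```java
import java.util.Map;

/** Simplified stand-in for the generic representation of a feed update. */
record MiniUpdate(String urn, String contentType) {}

final class ViralNotificationSketch {
    /** Onboarding config: content type -> noun used in notification text (illustrative). */
    private final Map<String, String> nounByContentType;

    ViralNotificationSketch(Map<String, String> nounByContentType) {
        this.nounByContentType = nounByContentType;
    }

    /** One formatter shared by every channel, so wording stays consistent. */
    String headline(String actorName, String action, MiniUpdate update) {
        String noun = nounByContentType.getOrDefault(update.contentType(), "post");
        return actorName + " " + action + " your " + noun;
    }

    /** Push / on-site notification text. */
    String push(String actorName, String action, MiniUpdate update) {
        return headline(actorName, action, update);
    }

    /** Email subject reuses the same headline with a channel-specific prefix. */
    String emailSubject(String actorName, String action, MiniUpdate update) {
        return "LinkedIn: " + headline(actorName, action, update);
    }
}
```

In this sketch, enabling notifications for polls is a one-line config entry (`"poll" -> "poll"`); content types without an entry fall back to the default noun.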
27. High5 Patterns Automated tools and solutions that provide a fast onboarding path to enable default behavior eeFPR Project
28. Project Spotlight - eeFPR Streamlined onboarding of content by automatically learning members’ personalized preferences for new content types Goal • Fast onboarding of any new content type with little effort from Feed AI • Ramping orthogonally to relevance modeling • Make ramp decisions based on the SQR process with no extra waiting High-level Idea • Exploration with suboptimal rankings and contained impact • Exploitation with optimal relevance models that are auto-trained bi-weekly • Non-blocking ramp through orthogonal LiX
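The explore/exploit idea can be illustrated with an epsilon-greedy style sketch. This is an assumption: the slides do not say which exploration strategy eeFPR uses. Here a small, fixed fraction of sessions places the new content type at a chosen slot without a trained model (exploration with contained impact), and all other sessions use the model ranking unchanged (exploitation). `ExploreExploitSketch` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

final class ExploreExploitSketch {
    private final double explorationRate; // fraction of sessions allowed to explore
    private final DoubleSupplier random;  // injected so the behavior is testable

    ExploreExploitSketch(double explorationRate, DoubleSupplier random) {
        this.explorationRate = explorationRate;
        this.random = random;
    }

    /**
     * Returns the ranked feed for one session. In an exploration session the new item
     * is inserted at a fixed slot even though no model has scored it; otherwise the
     * model ranking is returned as is.
     */
    List<String> rank(List<String> modelRanked, String newTypeItem, int exploreSlot) {
        if (random.getAsDouble() >= explorationRate) {
            return modelRanked; // exploit: trust the trained relevance model
        }
        List<String> explored = new ArrayList<>(modelRanked);
        explored.add(Math.min(exploreSlot, explored.size()), newTypeItem);
        return explored;
    }
}
```

In production the supplier would be something like `new java.util.Random()::nextDouble`. Engagement collected in exploration sessions is what a periodic retraining job would consume, after which the new type can be ranked by the model like any other.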
29. High5 Patterns Automated tools and solutions that provide a fast onboarding path to enable default behavior Project Cross Dataset Validation
30. Project Spotlight - Cross Dataset Validation The very first LinkedIn cross dataset validation solution that verifies feed tracking data quality across the serving stack in an automatic fashion
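A minimal sketch of what an automated cross-dataset check could look like, under stated assumptions. Two layers of the serving stack each record a tracking ID and an update type, and every client-side impression should join to a served record with the same type. The dataset shapes, the join key and the class name are hypothetical; the slides do not describe the actual validation rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CrossDatasetValidationSketch {
    /**
     * Each map is trackingId -> updateType as recorded by one layer of the stack.
     * Returns human-readable violations; an empty list means the datasets agree.
     */
    static List<String> validate(Map<String, String> servedByBackend,
                                 Map<String, String> impressionsFromClient) {
        List<String> violations = new ArrayList<>();
        // TreeMap gives a deterministic report order.
        for (Map.Entry<String, String> e : new TreeMap<>(impressionsFromClient).entrySet()) {
            String servedType = servedByBackend.get(e.getKey());
            if (servedType == null) {
                violations.add("orphan impression: " + e.getKey());
            } else if (!servedType.equals(e.getValue())) {
                violations.add("type mismatch for " + e.getKey() + ": "
                        + servedType + " vs " + e.getValue());
            }
        }
        return violations;
    }
}
```

Served records without an impression are not flagged in this sketch, since a served update may simply never scroll into view.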
31. High5 Patterns Abstract away boilerplate logic; create clear separation between platform and use-case logic Project Skeleton FPR
32. Project Spotlight - Skeleton FPR Goal: speed up the FPR wire-in process by providing a "template" for fast creation of simple FPRs ● Reusable components for bootstrapping an FPR ○ Pre-approved generic Avro schemas for the Venice store ○ Pre-approved Pegasus schemas for the FPR endpoint to return the recommendations ● Streamlined FPR wire-in to FeedMixer ● Basic features such as score and FPR name are extracted from the FPR response and made available for blending
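The "template" idea can be sketched with the template method pattern. This is an illustration only: `SkeletonFpr`, `Recommendation` and `scoreCandidates` are hypothetical names, and the generic `Recommendation` record merely stands in for the pre-approved endpoint schema, whose real shape is not shown in the slides.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Generic response entry carrying the basic features used for blending. */
record Recommendation(String urn, double score, String fprName) {}

/** Template: the platform owns the response shape; a use-case only supplies scored URNs. */
abstract class SkeletonFpr {
    private final String fprName;

    protected SkeletonFpr(String fprName) {
        this.fprName = fprName;
    }

    /** The only method a new FPR has to implement in this sketch. */
    protected abstract Map<String, Double> scoreCandidates(String viewerUrn);

    /** Boilerplate shared by all simple FPRs: sort, truncate, attach basic features. */
    final List<Recommendation> recommend(String viewerUrn, int limit) {
        return scoreCandidates(viewerUrn).entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(limit)
                .map(e -> new Recommendation(e.getKey(), e.getValue(), fprName))
                .collect(Collectors.toList());
    }
}
```

A new FPR then reduces to a subclass that overrides `scoreCandidates`; the score and FPR name arrive in a uniform shape regardless of which team wrote the ranker.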
33. High5 Patterns • Create clear separation between platform and use-case logic • Introduce generic processing and templatized flows to hide complexities from usecases Project CTI (Content Type Interface)
34. Project Spotlight - CTI (Content Type Interface) • Uses interfaces to decouple use-case logic from core platform logic • Single onboarding touch point for multiple systems • A step towards separating use-case code from platform code • Provides well-defined interfaces for various feed backend subsystems
35. Next Step - Feed SDK for self-service onboarding
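A small sketch of the decoupling described above, with all names hypothetical. A use-case implements one `ContentType` contract whose hooks serve different backend subsystems, and registers it once; platform code looks content types up by name and never contains use-case branches. The two hooks shown are invented for illustration and are not the actual CTI methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical contract a use-case implements once; each hook serves one subsystem. */
interface ContentType {
    String name();
    String toActivity(String sourceOfTruthRecord);      // e.g. for an activity store
    String toIndexDocument(String sourceOfTruthRecord); // e.g. for an activity indexer
}

/** Platform side: one registry is the single onboarding touch point. */
final class ContentTypeRegistry {
    private final Map<String, ContentType> types = new HashMap<>();

    void register(ContentType type) {
        if (types.putIfAbsent(type.name(), type) != null) {
            throw new IllegalStateException("already registered: " + type.name());
        }
    }

    Optional<ContentType> lookup(String name) {
        return Optional.ofNullable(types.get(name));
    }
}

/** Use-case side: everything specific to polls lives in this one class. */
final class PollContentType implements ContentType {
    public String name() { return "poll"; }
    public String toActivity(String record) { return "activity(" + record + ")"; }
    public String toIndexDocument(String record) { return "doc(" + record + ")"; }
}
```

Rejecting duplicate registrations is one simple way such a touch point can protect the platform when new use-cases arrive.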
36. Next Step - Content Platform Componentization Problem: Onboarding teams must invest in substantial and overlapping prerequisite work to build their own sources of truth Solution: Extract content-related functionality from UGC, SAP, and UFO and turn it into stand-alone components. Investigate in conjunction with the Data Realm team Benefit: Standardized solutions that onboarding teams can leverage and reuse directly from their SOT stores or services Components under consideration: • Text content: hashtag, mention, shortlink • Media: metadata management (e.g. video), photo tagging, lifecycle management, cascading delete, ownership validation • Sponsorship: compliance management, targeting, tracking • Edges: generic storage layer with transactional guarantees and bi-directional lookup
37. Feed Platformization Impact Onboarding Cost Reduction: • Before: 1 ~ 2 quarters development cost • After: 1 ~ 2 weeks development cost Engineering Culture Improvement: • Elevated developer happiness • Established best practices and patterns for platformizing middleware @ LinkedIn Business Impact: • Accelerated product iteration & innovation speed • Improved business agility and revenue*
38. Feed Platformization Key Takeaway Design patterns for “platformizing” middleware: • Abstract away boilerplate code; create clear separation between platform and use-case logic • Maximize leverage and reuse; standardize product behavior • Build automated tools and solutions; provide a fast path for default behavior • Apply generic processing and templatized flows • Ensure platform stability when new use-cases and access patterns are introduced