Xiaohui Xie (谢晓辉): Applications and Practice of Video Content Understanding at Hulu

1. Applications and best practice: video understanding at Hulu. Xiaohui Xie, Principal Research Lead, Hulu
3. Xiaohui Xie, Principal Research Lead • Xiaohui joined Hulu in 2016 and now leads the AI research team at Hulu, focusing on video content understanding and innovation in interactive video streaming. Before that, he was a senior algorithm lead at the Lenovo core technology lab from 2012 to 2016, a research staff member at Nokia Research Center from 2007 to 2011, and a researcher at the Panasonic R&D center from 2005 to 2007. He received his Ph.D. in information and communication systems from Beijing University of Posts and Telecommunications in 2005. He enrolled in the education reform class of Xi'an Jiaotong University (XJTU) in 1995 and received a B.S. in information science and technology in 1999. He holds 30+ US/EU/JP patents and has 70 more patents pending. Speaker, ArchSummit 2018 Beijing
4. • About Hulu and its content • Why video understanding at Hulu, and its challenges • Hulu AI platform and AI automation pipeline • Applications of video understanding at Hulu • Best practice and applications in Hulu business and production
5. Hulu, the Premier Digital Video Company • Best Quality Content • Video-On-Demand and Live Broadcasting
7. 540 CONTENT PARTNERS
8. 1000+ ADVERTISING PARTNERS
9. As a content platform: ● Live channels: 1000+ ● Movies: 230K ● TV series: 550K; episodes: 4.3M ● Ad videos: 1.85M ● Short-form videos: 2M
10. Why video understanding ● Data grows ● Tech breakthrough ● Industry trending ● Business requirement [Charts: Cisco, global consumer internet traffic, 2016–2021 (exabytes per month; 24% CAGR; 16-20); Fei-Fei Li, ImageNet classification errors, CVPR 17; "2016.10 released"]
11. Technical challenges in video understanding ● Data: abundance vs. scarcity ○ 4.5M videos / 22M+ users ○ Rich, diverse scenarios but small labelled datasets ● Data: real vs. fictional ○ Sports, news, … ○ Sci-Fi, cartoon, cosmetic, … ● Technology: hard vs. easy ○ Temporal sequential frame analysis ○ Semantic gap ● Enterprise: buy vs. build in-house ○ AI platform and upgrade of tech stacks
12. Hulu AI platform • Shared models / APIs • Shared data • Shared infrastructure
13. Hulu AI automation pipeline
14. Content understanding application categories ● Video-derived metadata: 1. Fine-grained video segmentation 2. Video-derived tags ● 3. Content generation & highlights ● 4. Practice & applications in production
15. 1. Fine-grained video segmentation: branding, antecedents, rating, opening, shot boundary, thumbnail, ads break, end-credits marker, end-credits info, burnt-in channel logo, burnt-in caption, soundtrack (music, sound, ...) [Legend: green = well resolved; dark = not started]
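Shot boundary detection is listed as one of the segmentation targets. The slides do not say how Hulu does it, so the block below is only a sketch of one classic baseline (not the author's method): compare grayscale histograms of consecutive frames and declare a cut when their L1 distance exceeds a threshold. Frames are assumed to be flat lists of 8-bit grayscale pixels; the threshold and bin count are illustrative.

```python
from typing import List, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit grayscale pixel values (0..255)."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_shot_boundaries(frames: Sequence[Sequence[int]],
                           threshold: float = 0.5, bins: int = 16) -> List[int]:
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    boundaries: List[int] = []
    prev = None
    for i, frame in enumerate(frames):
        hist = gray_histogram(frame, bins)
        if prev is not None:
            # L1 distance between two normalized histograms lies in [0, 2].
            diff = sum(abs(a - b) for a, b in zip(hist, prev))
            if diff > threshold:
                boundaries.append(i)
        prev = hist
    return boundaries


dark = [10] * 100
bright = [240] * 100
print(detect_shot_boundaries([dark, dark, bright, bright, dark]))  # [2, 4]
```

Histogram differences ignore small motion within a shot, which is why they are a common first pass; gradual transitions (fades, dissolves) need more than a single-frame threshold.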
16. 1.1 End credits detection
Model | Error < 5 s (%) | Error < 10 s (%)
Existing markers in the database | 62.64 | 84.31
Our hybrid deep CNN model | 86.86 | 92.53
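The table reports "Error < 5 s (%)" and "Error < 10 s (%)". A plausible reading, assumed here rather than stated on the slide, is the percentage of videos whose predicted end-credits start time falls within that tolerance of the reference marker. The sketch computes that metric; `pct_within` and the sample timestamps are hypothetical.

```python
from typing import Sequence


def pct_within(predicted: Sequence[float], truth: Sequence[float], tol_s: float) -> float:
    """Percentage of videos whose predicted end-credits start (seconds) is
    strictly within tol_s seconds of the labelled start."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(1 for p, t in zip(predicted, truth) if abs(p - t) < tol_s)
    return 100.0 * hits / len(truth)


pred = [1200.0, 1310.5, 2590.0, 1745.0]   # hypothetical model outputs
gold = [1203.0, 1302.0, 2591.0, 1760.0]   # hypothetical reference markers
print(pct_within(pred, gold, 5.0), pct_within(pred, gold, 10.0))  # 50.0 75.0
```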
17. 1.2 Burnt-in video logo detection: 300+ logos, 97.7% accuracy and 98.0% recall, at 8 s per video
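The slide gives results but not the method. One well-known cue for burnt-in channel logos is that their pixels stay nearly constant while the programme content around them changes. The sketch below, a heuristic candidate-region finder and not Hulu's detector, measures per-pixel temporal variance over sampled frames and reports the share of static pixels in each corner. Frame layout (rows of 8-bit grayscale values), `max_var`, and the corner size are assumptions; a real system would pass flagged regions to a logo classifier.

```python
from statistics import pvariance
from typing import Dict, List, Sequence

Frame = Sequence[Sequence[int]]  # rows of 8-bit grayscale pixels


def static_pixel_mask(frames: Sequence[Frame], max_var: float = 4.0) -> List[List[bool]]:
    """True where a pixel's value barely changes across the sampled frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[pvariance([f[y][x] for f in frames]) <= max_var for x in range(w)]
            for y in range(h)]


def corner_static_ratio(frames: Sequence[Frame], frac: float = 0.25,
                        max_var: float = 4.0) -> Dict[str, float]:
    """Share of static pixels in each corner region; a high ratio flags a
    candidate burnt-in logo location."""
    mask = static_pixel_mask(frames, max_var)
    h, w = len(mask), len(mask[0])
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    corners = {
        "top_left": (0, 0), "top_right": (0, w - cw),
        "bottom_left": (h - ch, 0), "bottom_right": (h - ch, w - cw),
    }
    out: Dict[str, float] = {}
    for name, (y0, x0) in corners.items():
        cells = [mask[y][x] for y in range(y0, y0 + ch) for x in range(x0, x0 + cw)]
        out[name] = sum(cells) / len(cells)
    return out
```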
18. 1.3 Music detection and classification
19. 2. Video-derived tags: a general fusion model for a merged taxonomy (Hulu merged taxonomy)
Labelling source | Classes | Model
Google OpenImage (9M) | 6012 classes | Enhanced InceptionV3
Google Sports1M (1M) | 487 classes | 3DCNN
YFCC100M (0.8M) | 3000+ tags | 3DCNN / LSTM
... | ... | ...
MIT Places (8M) | 365 scenes | Inception / VGG
Google AudioSet | 527 classes | VGGish
Plus human-enriched labels. Multiple sources are leveraged for concept mining (weakly-supervised / active learning).
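The slide mentions active learning for concept mining without giving details. A common active-learning strategy is uncertainty sampling: send human labellers the clips on which the current tagger is least sure. The sketch ranks clips by the binary entropy of a single tag's predicted probability; clip names, scores, and the labelling budget are hypothetical, and this is one standard technique, not necessarily the one used at Hulu.

```python
import math
from typing import Dict, List


def binary_entropy(p: float) -> float:
    """Entropy (bits) of a Bernoulli prediction; maximal at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_for_labelling(scores: Dict[str, float], budget: int) -> List[str]:
    """Pick the `budget` clips whose tag probability is most uncertain."""
    ranked = sorted(scores, key=lambda clip: binary_entropy(scores[clip]), reverse=True)
    return ranked[:budget]


scores = {"clip_a": 0.97, "clip_b": 0.52, "clip_c": 0.08, "clip_d": 0.35}
print(select_for_labelling(scores, 2))  # ['clip_b', 'clip_d']
```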
20. Video-derived tags, shot level: multi-source, multi-modal tag fusion into the Hulu unified taxonomy. Video-derived metadata algorithm flow: each algorithm (Alg 1: OpenImage; Alg 2: Places365; …; Alg k; …; Alg N: AudioSet) runs its own taxonomy mapping → tag generation → tag postprocessing → module evaluation, and the per-algorithm results are fused into the unified taxonomy.
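A minimal sketch of the flow above, assuming each algorithm emits (label, score) pairs in its own taxonomy. The mapping tables, unified tag names, per-source weights, and the choice of max-fusion are all hypothetical; the slide only establishes that outputs are mapped into one taxonomy and fused.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping tables from each source taxonomy to a unified taxonomy.
TAXONOMY_MAP: Dict[str, Dict[str, str]] = {
    "openimage": {"Football": "american_football", "Stadium": "stadium"},
    "places365": {"stadium/football": "stadium", "kitchen": "kitchen"},
    "audioset": {"Cheering": "crowd_cheering"},
}


def fuse_shot_tags(outputs: Dict[str, List[Tuple[str, float]]],
                   weights: Dict[str, float], threshold: float = 0.5) -> Dict[str, float]:
    """Map each algorithm's (label, score) pairs into the unified taxonomy,
    keep the best weighted score per unified tag, then drop tags below
    `threshold` (a simple stand-in for tag postprocessing)."""
    fused: Dict[str, float] = defaultdict(float)
    for alg, tags in outputs.items():
        mapping = TAXONOMY_MAP.get(alg, {})
        for label, score in tags:
            unified = mapping.get(label)
            if unified is None:  # no counterpart in the unified taxonomy
                continue
            fused[unified] = max(fused[unified], weights.get(alg, 1.0) * score)
    return {tag: s for tag, s in fused.items() if s >= threshold}


outputs = {
    "openimage": [("Football", 0.9), ("Stadium", 0.4)],
    "places365": [("stadium/football", 0.8)],
    "audioset": [("Cheering", 0.6), ("Speech", 0.9)],
}
weights = {"openimage": 1.0, "places365": 0.9, "audioset": 0.8}
print(fuse_shot_tags(outputs, weights))
```

Max-fusion lets one confident modality establish a tag on its own; a weighted sum would instead reward agreement between modalities. Per-module evaluation is what would justify the weights.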
21. Video-derived tags: scenes and objects (examples: air force, cuisine, American football)
22. Video derived tags: Events and actions Kiss Parkour Giving or receiving awards Tasting beer Carrying baby Hugging
23. Video-derived tags: celebrity … Advanced name search demo:
24. Video-derived tags: video-level tags (four example titles)
Example 1 ● Gracenote tags ○ Genre & setting: reality, cooking, kitchen, restaurant ○ Character & subject: master chef, host, competitor; cooking, training, competition ● Content-derived concepts ○ From visual signal: food, meal, dish, metropolis ○ From textual signal: chef, host, judge, competition
Example 2 ● Gracenote tags ○ Genre & setting: football, football field, Texas, school, house ○ Character & subject: football coach, football player, student; high school, football, family, friendship ● Content-derived concepts ○ From visual signal: player, football, team sport, tackle, stadium ○ From textual signal: special, football, entertainment
Example 3 ● Gracenote tags ○ Genre & setting: sitcom, animated, household, Springfield ○ Character & subject: Homer Simpson, Lisa Simpson, …; sibling, bookselling, culture clash ● Content-derived concepts ○ From visual signal: cartoon, play, toy, comics, sketch ○ From textual signal: family, household, child, amusing
Example 4 ● Gracenote tags ○ Genre & setting: drama, military, Iraq, Army outpost ○ Character & subject: soldier, sergeant, Arab-American; Iraq war, U.S. Army, friendship, military life ● Content-derived concepts ○ From visual signal: military, army, soldier, troop, military officer ○ From textual signal: military, drama, 2000s
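The slide shows video-level concepts derived from visual and textual signals but not how shot-level tags become video-level ones. One simple assumption, sketched below, is to promote a tag when it appears in at least a given share of shots and keep the most frequent few. `min_share`, `top_k`, and the example tags are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def video_level_tags(shot_tags: Iterable[Set[str]], min_share: float = 0.3,
                     top_k: int = 5) -> List[str]:
    """Promote shot-level tags to video level when they occur in at least
    `min_share` of shots; return at most `top_k`, most frequent first
    (ties broken alphabetically)."""
    shots = list(shot_tags)
    if not shots:
        return []
    counts = Counter(tag for tags in shots for tag in tags)
    kept = [(tag, n) for tag, n in counts.items() if n / len(shots) >= min_share]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in kept[:top_k]]


shots = [{"food", "kitchen"}, {"food", "chef"}, {"food", "kitchen"}, {"street"}]
print(video_level_tags(shots, min_share=0.5))  # ['food', 'kitchen']
```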
25. 3. Content generation & highlights ● Thumbnails ○ Video thumbnails, cover images, etc. ● Highlights and moments ○ Sports ● Video summary ○ Exhibition: use visual features to find the shots that best represent the storyline ○ Tough: use audio features to find intense action shots ○ Indicative: use captions to find the most informative shots ○ Leading role: use face detection and recognition to find shots dominated by the leading roles ● AI-generated art ○ Music, avatars, …
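The four summary styles differ mainly in which signal ranks the shots. Assuming each shot already carries per-signal scores in [0, 1] (hypothetical keys such as "visual", "audio", "caption", "face"), a summary can be sketched as a greedy pick of the highest-scoring shots under a duration budget, returned in timeline order. This is an illustrative sketch, not the production selector; greedy selection is not optimal for the knapsack-style budget but is simple and predictable.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Shot:
    start: float               # seconds from the start of the video
    duration: float            # seconds
    signals: Dict[str, float]  # hypothetical per-signal scores in [0, 1]


def summarize(shots: List[Shot], weights: Dict[str, float], budget_s: float) -> List[Shot]:
    """Greedily keep the highest-scoring shots that fit within `budget_s`
    seconds, then return them in timeline order."""
    def score(shot: Shot) -> float:
        return sum(w * shot.signals.get(name, 0.0) for name, w in weights.items())

    chosen: List[Shot] = []
    used = 0.0
    for shot in sorted(shots, key=score, reverse=True):
        if used + shot.duration <= budget_s:
            chosen.append(shot)
            used += shot.duration
    return sorted(chosen, key=lambda s: s.start)


shots = [Shot(0, 10, {"audio": 0.2}), Shot(10, 8, {"audio": 0.9}),
         Shot(18, 12, {"audio": 0.7}), Shot(30, 6, {"audio": 0.8})]
# A "Tough"-style summary ranks shots by audio intensity only.
print([s.start for s in summarize(shots, {"audio": 1.0}, budget_s=20)])  # [10, 30]
```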
26. 3.1. Hulu cover image generation tool. Aspect ratios and devices: • 16:9 living room • 16:6 website • 2:3 iPhone • 16:9 iPad • …
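Serving one source frame to several devices comes down to cropping it to each target aspect ratio. The sketch computes the largest crop of a given ratio, centred on a focus point (for example, a detected face) and clamped to the image borders. The focus point and the crop-around-focus policy are assumptions for illustration; the slide does not describe the tool's internals.

```python
import math
from fractions import Fraction
from typing import Tuple


def crop_box(width: int, height: int, ratio_w: int, ratio_h: int,
             focus_x: float, focus_y: float) -> Tuple[int, int, int, int]:
    """Largest ratio_w:ratio_h crop of a width x height image, centred on the
    focus point as far as the borders allow. Returns (left, top, right, bottom)."""
    target = Fraction(ratio_w, ratio_h)
    if Fraction(width, height) > target:  # source is wider: keep full height
        crop_h = height
        crop_w = math.floor(height * target)
    else:                                 # source is taller or equal: keep full width
        crop_w = width
        crop_h = math.floor(width / target)
    left = min(max(int(focus_x - crop_w / 2), 0), width - crop_w)
    top = min(max(int(focus_y - crop_h / 2), 0), height - crop_h)
    return left, top, left + crop_w, top + crop_h


print(crop_box(1920, 1080, 2, 3, 1400, 540))   # (1040, 0, 1760, 1080)  2:3 iPhone
print(crop_box(1920, 1080, 16, 6, 960, 300))   # (0, 0, 1920, 720)      16:6 website
```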
27. Best practice and applications at Hulu: contextual ads ● Selective targeting [Diagram: video tags before the ad break, the ad, video tags after; one ✓ case and one ✕ case] ● Ads targeting: 1. Target ads to preferred scenes 2. Avoid placing ads in specific scenes ● Ads reverse targeting: 1. Prevent ads from being targeted to specific content or channels ● Example: 23 s content (wildlife) + 20 s ad (Range Rover) + 13 s content
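The targeting rules above can be expressed as a check over the video tags on either side of an ad break. The sketch is a hypothetical rule function, not Hulu's ad server logic: an ad is rejected if the surrounding scenes carry any tag on its avoid-list, and, under selective targeting, it can additionally be required to sit next to at least one preferred tag. Tag names are illustrative.

```python
from typing import Set


def ad_allowed(tags_before: Set[str], tags_after: Set[str],
               preferred: Set[str], blocked: Set[str],
               require_preferred: bool = False) -> bool:
    """Contextual check for a single ad break.

    - Any blocked tag in the surrounding scenes rejects the ad.
    - With require_preferred=True, at least one preferred tag must be nearby.
    """
    context = tags_before | tags_after
    if context & blocked:
        return False
    if require_preferred and not (context & preferred):
        return False
    return True


# Hypothetical tags for a wildlife segment around a car ad.
print(ad_allowed({"wildlife", "forest"}, {"wildlife"},
                 preferred={"forest", "road"}, blocked={"car_crash"},
                 require_preferred=True))  # True
```

Channel-level reverse targeting could be handled the same way by treating the channel as one more tag in `context`.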
28. Best practice and applications at Hulu: content embedding fusion func
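The slide names content embedding fusion but shows no formula. One simple, common fusion, assumed here, is to L2-normalise each modality's embedding, scale it by a per-modality weight, and concatenate. Modality names and weights are hypothetical; learned fusion layers are an alternative the slide does not rule out.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_embeddings(modalities: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Weighted concatenation of per-modality embeddings after L2 normalisation,
    so no modality dominates merely because of its scale. Modalities are
    concatenated in sorted-name order to keep the layout stable."""
    fused: List[float] = []
    for name in sorted(modalities):
        w = weights.get(name, 1.0)
        fused.extend(w * x for x in l2_normalize(modalities[name]))
    return fused


print(fuse_embeddings({"visual": [3.0, 4.0], "audio": [0.0, 2.0]},
                      {"visual": 1.0, "audio": 0.5}))  # [0.0, 0.5, 0.6, 0.8]
```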
29. Content-based video relevance prediction (CBVRP): cold start in recommendation systems • 2017 IEEE ICIP (International Conference on Image Processing) Grand Challenge • 2018 ACM MM (ACM Multimedia), 22–26 October 2018, Seoul, Korea
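In a cold-start setting a new title has no viewing history, so relevance has to come from content alone. A minimal sketch, assuming every title already has a content embedding: rank catalogue titles by cosine similarity to the new title's embedding and return the top k. Titles and vectors are hypothetical, and this is a generic baseline rather than any challenge entry.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def relevant_titles(new_item: Sequence[float], catalog: Dict[str, Sequence[float]],
                    k: int = 3) -> List[Tuple[str, float]]:
    """Top-k catalogue titles by content-embedding similarity to a new title."""
    scored = [(title, cosine(new_item, emb)) for title, emb in catalog.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]


catalog = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print([t for t, _ in relevant_titles([1.0, 0.1], catalog, k=2)])  # ['a', 'b']
```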