JD.com, 李维 (Wei Li): Automatic Deep Syntactic Parsing Is the Nuclear Weapon of Natural Language Applications

敖清逸

Published 2017/12/18 in the Technology category

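ArchSummit, the Global Architect Summit, is a technical conference launched by the InfoQ China team for senior technical managers and architects; more than 50% of attendees have over eight years of work experience. Following the principle of "practice first, case studies foremost," ArchSummit presents the latest practice of new technologies in industry applications and the accelerating role of technology in enterprise transformation. It helps enterprise technical managers, CTOs, and architects with technology selection and with building and managing technical teams, and it establishes the key role of technology in products and business.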

Text content
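1. Deep-Parsing Natural Languages: Deep Syntactic Parsing Is the Nuclear Weapon of Natural Language Applications
ArchSummit Global Architect Summit 2017 (Beijing)
李维 (Wei Li), JD.com Silicon Valley Research Institute
12/08/2017 (liweinlp.com)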
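3. Outline
•  A brief history and the current state of artificial intelligence: from perception to cognition
   The pendulum between empiricism and rationalism, one rising as the other falls
•  What is deep parsing?
•  Overview of NLP architectures
•  Examples of "nuclear weapon" applications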
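4. Outline: AI History
•  A brief history and the current state of artificial intelligence: from perception to cognition
   The pendulum between empiricism and rationalism, one rising as the other falls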
5. NLP Mainstream Since 1990s
Courtesy of Prof. Church: “A Pendulum Swung Too Far”
http://blog.sciencenet.cn/blog-362400-988692.html
NLP频道 (NLP Channel): liweinlp.com
6. Two Basic Approaches to NLP
Statistical Learning (based on keywords)
  Pros:
  •  Good for document-level
  •  High recall
  •  Robust
  •  Easy to scale
  •  Fast development (if data available)
  Cons:
  •  Requires large annotation
  •  Coarse-grained
  •  Difficult to debug
  •  Fails on short messages
  •  Only shallow NLP
  •  No understanding
Grammar Engineering (based on sentence structure)
  Pros:
  •  Good for sentence level
  •  Handles short messages well
  •  High precision
  •  Fine-grained insights
  •  Easy to debug
  •  Parsing and understanding
  Cons:
  •  Requires deep skills
  •  Requires scale-up skills
  •  Requires robustness skills
  •  Moderate recall (coverage)
  •  Parser development slow
Both approaches:
•  Complementary rather than competing
•  Hybrid: best of both worlds
•  Balance and configurability between precision and recall
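7. Outline: What Is Deep Parsing
•  A brief history and the current state of artificial intelligence: from perception to cognition
   The pendulum between empiricism and rationalism, one rising as the other falls
•  What is deep parsing?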
8. Deep Parsing: Unstructured to Structures
9. Why parsing? Limited Patterns
10. Subtree Pattern: Data to Intelligence
SVO Pattern: Barack Obama (S) Endorse (V) Hillary Clinton (O)
Knowledge Graph
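The SVO subtree pattern on this slide can be sketched as a lookup over dependency arcs. This is a minimal illustration, not the talk's parser: the `Arc` type, the relation labels `"S"` and `"O"`, and the hand-written parse are assumptions made for the example.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """One dependency arc (hypothetical representation of parser output)."""
    head: str       # governing word, here the verb
    relation: str   # assumed labels: "S" for subject, "O" for object
    dependent: str  # the word or phrase attached to the head


def extract_svo(arcs):
    """Collect (subject, verb, object) triples from a list of dependency arcs."""
    subjects, objects = {}, {}
    for arc in arcs:
        if arc.relation == "S":
            subjects.setdefault(arc.head, []).append(arc.dependent)
        elif arc.relation == "O":
            objects.setdefault(arc.head, []).append(arc.dependent)
    # A triple needs a verb that has both a subject and an object.
    return [
        (subj, verb, obj)
        for verb in subjects
        for subj in subjects[verb]
        for obj in objects.get(verb, [])
    ]


# Hand-written parse of the slide's example; a real system would get it from the parser.
parse = [
    Arc("endorse", "S", "Barack Obama"),
    Arc("endorse", "O", "Hillary Clinton"),
]
triples = extract_svo(parse)
print(triples)  # [('Barack Obama', 'endorse', 'Hillary Clinton')]
```

Because the rule matches structure rather than word order, many different surface sentences that parse to the same arcs would yield the same triple, which is what makes the output usable as knowledge graph edges.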
11. Deep Parsing: Unstructured to Structures
12. Subtree Pattern: Data to Intelligence
Inter-Clause Pattern: 虽然 (although) … 遗憾 (regret) … 无所谓 (does not matter) … → mild sentiment
Linear: Infinite number of sentences
Structure: Limited patterns
Data → Intelligence
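13. Outline: NLP Architectures
•  A brief history and the current state of artificial intelligence: from perception to cognition
   The pendulum between empiricism and rationalism, one rising as the other falls
•  What is deep parsing?
•  Overview of NLP architectures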
14. NLP Architecture 1: Deep Parser as Core
Cascaded FSAs break through Chomsky’s hierarchy walls
Robust, linear, F-measure: scale up to big data
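The idea of cascading finite-state levels can be sketched as follows. Each level is one regular pattern over the labels produced by the level below, so stacking levels builds nested structure even though no single level is more than finite-state. The tag set, the three rules, and the sample sentence are illustrative assumptions, not the grammar used in the talk, and this toy version makes no claim to the linear running time the slide mentions.

```python
import re

# One finite-state pattern per level, written over space-terminated labels.
# Tags and rules are assumptions made for this sketch.
CASCADE = [
    ("NP", r"(?:DT )?(?:JJ )*(?:NN )+"),  # level 1: noun chunks
    ("VP", r"VB NP "),                    # level 2: verb plus object
    ("S", r"NP VP "),                     # level 3: clause
]


def apply_level(nodes, label, pattern):
    """Group every non-overlapping match of `pattern` into a new node."""
    labels = "".join(node[0] + " " for node in nodes)
    # The lookbehind forces matches to start at a label boundary.
    regex = re.compile(r"(?<!\S)(?:" + pattern + ")")
    out, pos = [], 0
    for m in regex.finditer(labels):
        # Each label ends with one space, so counting spaces gives node indices.
        start = labels.count(" ", 0, m.start())
        end = labels.count(" ", 0, m.end())
        out.extend(nodes[pos:start])
        out.append((label, nodes[start:end]))
        pos = end
    out.extend(nodes[pos:])
    return out


def parse(tagged):
    """Run the cascade bottom-up over (word, tag) pairs."""
    nodes = [(tag, word) for word, tag in tagged]
    for label, pattern in CASCADE:
        nodes = apply_level(nodes, label, pattern)
    return nodes


sentence = [("the", "DT"), ("new", "JJ"), ("phone", "NN"), ("has", "VB"),
            ("a", "DT"), ("great", "JJ"), ("screen", "NN")]
tree = parse(sentence)
print(tree[0][0])  # S
```

Input that matches no rule at some level simply passes through unchanged, which is one way such a cascade stays robust on messy text.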
15. Sample Deep Parse Tree (dependency)
16. Sample Deep Parse Tree (PS flavor)
17. NLP Architecture 2: Information Extraction
Including sentiment analysis (on subjective language)
18. NLP Architecture 3: Text Mining
19. NLP Architecture 4: Landing on Applications
20. Sample Deep Parse Tree
21. Sample Deep Parse Tree
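22. Outline: NLP Applications
•  A brief history and the current state of artificial intelligence: from perception to cognition
   The pendulum between empiricism and rationalism, one rising as the other falls
•  What is deep parsing?
•  Overview of NLP architectures
•  Examples of "nuclear weapon" applications
   Social media sentiment analysis, big data mining, intelligent search, dialogue systems, …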
23. Sentiment Analysis
Why deep parsing, not deep learning?
Learning without parsing does not work for social media sentiment analysis
•  Social media is dominated by short messages
•  Statistical learning breaks down on short messages: too few data points
•  Deep parsing enables linguistic analysis for best precision
•  Deep parsing makes insight mining two orders of magnitude more efficient
   •  A parsing-supported rule has the power of about 100 n-gram rules
•  Deep learning is a great algorithm but is still delinked from parsing
•  Parsers trained by deep learning are all research systems
   •  Difficult to adapt to real-life social media text (or other genres)
   •  Knowledge bottleneck: domains where labeled data are insufficient
•  Real-life deep learning systems are mostly end-to-end, still with no structures
24. Sentiment Analysis: Bag of Words vs. Parsing
KEYWORD CHALLENGE
•  The iPhone has never been good.
•  The iPhone has never been this good.
ASSOCIATION CHALLENGE
•  Another reason to switch from Visa to MasterCard
•  I prefer MasterCard over Visa.
•  MasterCard is way better than Visa.
CLASSIFICATION CHALLENGE
I had a wonderful day today. Even my instant coffee tastes great. However my Dell laptop doesn't boot again. Maybe I should check out the MacBook. It [MacBook] seems so easy to use.
Coarse-grained classification, thumbs-up and down: overall tone positive (3 vs 1)
Fine-grained analysis uncovers the “whys” behind sentiments:
(1)  Instant coffee / tastes great
(2)  Dell laptop / does not boot
(3)  MacBook / easy to use
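The keyword challenge can be reproduced in a few lines. The sketch below contrasts an order-blind keyword scorer with a toy rule that looks at where the negator and the word "this" sit relative to the sentiment word. The word lists and both functions are assumptions for illustration; the second function is only a positional stand-in for what a real parser would decide from the tree.

```python
# Tiny hypothetical lexicons, just large enough for the slide's examples.
POSITIVE = {"good", "great", "wonderful", "easy"}
NEGATORS = {"never", "not"}


def tokens(sentence):
    """Lowercase, drop the final period, split on whitespace."""
    return sentence.lower().rstrip(".").split()


def bag_of_words_polarity(sentence):
    """Keyword baseline: count positive words, flip the sign if any negator occurs."""
    words = tokens(sentence)
    score = sum(w in POSITIVE for w in words)
    if any(w in NEGATORS for w in words):
        score = -score
    return score


def structure_aware_polarity(sentence):
    """Toy stand-in for a parse rule: 'never ... this <positive>' is emphatic praise."""
    words = tokens(sentence)
    score = 0
    for i, word in enumerate(words):
        if word in POSITIVE:
            negated = any(prev in NEGATORS for prev in words[:i])
            emphatic = i > 0 and words[i - 1] == "this"
            score += 1 if (emphatic or not negated) else -1
    return score


a = "The iPhone has never been good."
b = "The iPhone has never been this good."
print(bag_of_words_polarity(a), bag_of_words_polarity(b))        # -1 -1
print(structure_aware_polarity(a), structure_aware_polarity(b))  # -1 1
```

The two sentences contain the same sentiment keywords, so the keyword scorer cannot separate them; only a rule that sees the relation between "never", "this", and "good" gets the second sentence right.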
25. Deep Parsing Supports Deep Sentiments
Sentiment analysis has different layers:
  1. Sentiment classification: thumbs-up and down (or neutral)
  2. Sentiment association: associating a sentiment with a topic or brand as its object
  3. Deep sentiment insights: (i) who has the sentiment? (ii) how intense? (iii) why? (iv) evaluations, comparisons and contrasts; (v) needs and wish-list; (vi) positive/negative actions (e.g. adopt / abandon); (vii) purchase intent; (viii) pros and cons
Most learning systems stop at 1 and sometimes at 2. All 3 can be done via deep parsing.
26. Illustration: Real-time Polls
Challenges observed: economy topic at 6:55pm; China topic at 7:30pm
27. Illustration: Stock Market Trends
Topic: HTC
Data 1: Stock Market Performance
Data 2: Chinese social media (Weibo, Tianya, Facebook, Twitter…)
Time range: 2013/08 – 2014/08
Strong correlation observed
28. Big Data Mining: Who benefits?
For businesses: social listening
•  Consumer insights: sentiments and why
•  Brand image: trends
•  Competitive research: where do we stand
For consumers
•  Purchase decision
•  Personalized service
For government
•  Election campaign
•  Public opinions on policies and social topics
Others?
•  Hot topics or anywhere public opinions are involved
•  Stock market trends correlation
29. For consumers: Purchase Decision
30. For consumers: Purchase Decision
31. For consumers: Purchase Decision
32. Intelligent Search and Chatbots
Three types of chatbots:
  1. Domain knowledge QA: e.g. customer service
  2. Open-domain knowledge QA: e.g. Who won Nobel Prize in 2015?
  3. Interactive chatting: e.g. just for fun (killing time); in time, for comfort (senior people); for mental health counselling
Q: questions are a subset of language, tractable for decoding intent, asking point, and hidden slots
A: 1 and 2 can be based on a Knowledge Graph enabled by a deep parser; 3 can be enabled by learning from human chats plus parsing
A mixture/convergence of the 3 is possible
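The step from a decoded question to a knowledge graph answer can be sketched as below. The single fact comes from the SVO example on slide 10. Everything else is hypothetical: the two regular expressions stand in for what a deep parser would produce, and the names `LEMMAS`, `decode`, and `answer` are invented for this illustration.

```python
import re

# Knowledge graph as (subject, relation, object) triples; the fact is slide 10's example.
KG = [("Barack Obama", "endorse", "Hillary Clinton")]

# Minimal lemma table so that "endorsed" matches the stored relation.
LEMMAS = {"endorsed": "endorse"}


def decode(question):
    """Return the asking point and the filled slots, or None if nothing matches."""
    # "Who did X <verb>?" asks for the object; checked first because the
    # subject pattern below would otherwise read "did" as the verb.
    m = re.match(r"^Who(?:m)? did (.+) (\w+)\?$", question)
    if m:
        verb = LEMMAS.get(m.group(2), m.group(2))
        return {"asking_point": "object", "relation": verb, "subject": m.group(1)}
    # "Who <verb> Y?" asks for the subject.
    m = re.match(r"^Who (\w+) (.+)\?$", question)
    if m:
        verb = LEMMAS.get(m.group(1), m.group(1))
        return {"asking_point": "subject", "relation": verb, "object": m.group(2)}
    return None


def answer(question):
    """Look up the asking point in the knowledge graph."""
    frame = decode(question)
    if frame is None:
        return []
    results = []
    for subj, rel, obj in KG:
        if rel != frame["relation"]:
            continue
        if frame["asking_point"] == "subject" and obj == frame["object"]:
            results.append(subj)
        elif frame["asking_point"] == "object" and subj == frame["subject"]:
            results.append(obj)
    return results


print(answer("Who endorsed Hillary Clinton?"))  # ['Barack Obama']
print(answer("Who did Barack Obama endorse?"))  # ['Hillary Clinton']
```

The asking point decides which slot of the triple is returned, so one stored fact answers both question forms.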
33. Apply NLP to Verticals: Medicine Domain
Some Big Data Verticals:
  1. News
  2. Social Media
  3. Medicine
  4. Legal
  5. Education
  6. Financing
  7. Multilingual
34. And we are hiring! In Beijing & Silicon Valley
NLP频道 (NLP Channel): liweinlp.com
www.linkedin.com/in/liwei4nlp