2. The Era of AI and the Fourth Industrial Revolution
3. The Innovations Driven by AI
• Task-specific Learning; Representation Learning
• Digital Representation of the World; Semantic Representation of the World
• Big Data; Big Compute; Big/Deep Model
• Recognition, Detection, Classification, Inferencing, Reasoning, Decision Making, Risk Analysis, Prediction, Automation, Science & Discovery, Digital Assistant, Digital Work and Life, New Form of HCI
4. Exciting Progress in AI in Recent Years
• Convolutional neural networks (CNNs); recurrent neural networks (RNNs)
• Dual learning: improve dual tasks using unlabeled data via a dual-learning game
• Long short-term memory (LSTM): useful for sequence-to-sequence (seq2seq) tasks, e.g. machine translation
• Binarized neural networks (BNNs): neural networks with binary weights and activations at run-time; good for running on small devices
• Network morphism: morph a well-trained neural network into a new one so that its network function is completely preserved and can be further improved
• Meta-learning: cast the design of an optimization algorithm as a learning problem and use learning algorithms to exploit the structure automatically
• One-shot (few-shot) learning: learn new object categories from few examples by synthesizing and learning from existing information about different, previously learned classes
• Unsupervised learning & weakly-supervised learning
• Systems for Machine Learning (ML) & ML for Systems
• Residual networks (ResNet): allow both shallow & deep networks to co-exist; local errors corrected by shallow networks, global errors corrected by deep networks (see the sketch after this slide)
• Deep reinforcement learning: learning by interacting with the environment to maximize cumulative rewards
• Multi-task learning: learning multiple tasks that share representations (or part of the low-level representations)
• Transfer learning: use auxiliary tasks to boost the learning of a target task
• Generative adversarial networks (GANs): two neural networks (one generative, the other discriminative) compete against each other in a zero-sum game framework
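The residual-network idea above is easy to make concrete. Below is a minimal sketch of a single residual block; PyTorch, the channel count, and the layer configuration are my own illustrative assumptions, not taken from the slides.

```python
# A minimal sketch of a residual block (illustrative only; the framework and
# layer sizes are assumptions, not taken from the slides).
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """y = F(x) + x: the skip connection lets identity information bypass
    the convolutional path, so very deep stacks behave no worse than
    shallower ones."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x                       # identity (shallow) path
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))    # learned (deep) path
        return self.relu(out + residual)   # combine: F(x) + x


# Quick shape check on random data.
block = ResidualBlock(channels=16)
x = torch.randn(1, 16, 32, 32)
print(block(x).shape)  # torch.Size([1, 16, 32, 32])
```

Because the block only has to learn a correction F(x) on top of the identity, gradients flow through the skip path even when the learned path is poorly conditioned.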
5. Progress on Word & Sentence Modeling
• Contextualized embedding (borrowed from the NMT encoder): ELMo [Peters et al., NAACL 2018], GPT [Radford et al., 2018], BERT [Devlin et al., 2018] (see the sketch after this slide)
• Tree-based sentence embedding: Tree LSTM over a syntax tree; trees learned by RL or Gumbel softmax [Choi et al., 2018; Yogatama et al., 2017]; balanced binary tree [Shi et al., EMNLP 2018]
• From shallow representations (pretrained word vectors such as Word2Vec, e.g. King/Queen/Man/Woman analogies) to deep, multi-layer representations (pretrained networks such as BERT)
• Pre-training for NLU: pretrained language models achieve state-of-the-art results on a wide range of NLP tasks, much as pretrained ImageNet models do in computer vision
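To make the shallow-vs-deep contrast concrete, here is a small sketch that extracts contextualized vectors from a pretrained BERT model. The Hugging Face transformers library, the bert-base-uncased checkpoint, and the example sentences are assumptions for illustration, not part of the slides.

```python
# Illustrative sketch of deep contextualized embeddings; the `transformers`
# library and model name are assumptions, not from the slides.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["He sat by the river bank.", "She deposited cash at the bank."]
for text in sentences:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    # Unlike a static Word2Vec lookup, the vector for "bank" differs between
    # the two sentences because it is conditioned on the full context.
    bank_index = inputs.input_ids[0].tolist().index(
        tokenizer.convert_tokens_to_ids("bank"))
    print(text, hidden[0, bank_index, :3])
```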
6. Progress on Word & Sentence Modeling
ELMo [Peters et al., NAACL 2018], GPT [Radford et al., 2018], and BERT [Devlin et al., 2018] compared:

Model | Dataset                                | Architecture        | Parameters | Train time      | SQuAD
ELMo  | 1B Word Benchmark (1B tokens)          | Bi-LSTM             | 90M        | -               | 87.4
GPT   | BookCorpus (800M tokens)               | Transformer decoder | 110M       | 1 month, 8 GPUs | -
BERT  | BookCorpus + Wikipedia (3,300M tokens) | Transformer encoder | 330M       | 4 days, 64 TPUs | 93.1

• Get deep contextualized word & sentence representations
• More (unlabeled) data, bigger models, stronger computing power, better representations
7. TVM: Learning-based Learning System (by Tianqi Chen)
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning (by Tianqi Chen et al. from UW); a small usage sketch follows below
Trends:
1. Systems for Machine Learning
2. Machine Learning for Systems
3. Software-defined AI Chip
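The slide only names TVM; the sketch below compiles a tiny vector-add operator with TVM's tensor-expression API, following TVM's introductory tutorial. Exact API calls can differ between TVM releases, so treat this as an assumption-laden illustration rather than the system described in the talk.

```python
# A minimal sketch of compiling a small operator with TVM's tensor-expression
# API (vector add). API details may vary across TVM versions.
import numpy as np
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")  # declare the computation

s = te.create_schedule(C.op)                            # default schedule
fadd = tvm.build(s, [A, B, C], target="llvm", name="vector_add")

dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
c = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
fadd(a, b, c)
np.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy(), rtol=1e-5)
```

TVM's "learning-based" aspect is that schedules like the one above can be searched and tuned automatically rather than hand-written.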
9.
• Collaborative Filtering (see the sketch after this slide)
• Logistic Regression
• Deep Neural Networks
• Generative Adversarial Networks
• Factorization Machine
• Gradient Boosting Decision Tree
• Deep Reinforcement Learning
• Multi-task Multimodal Models
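Collaborative filtering is the most classical item in this list; a toy matrix-factorization version is sketched below. The rating matrix, the latent dimension, and the hyper-parameters are invented for illustration and are not from the slides.

```python
# A toy collaborative-filtering sketch: matrix factorization trained with SGD
# on observed ratings only. All data and hyper-parameters are made up.
import numpy as np

R = np.array([[5, 3, 0, 1],      # 0 = unobserved rating
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
num_users, num_items, k = R.shape[0], R.shape[1], 2

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(num_users, k))   # user latent factors
V = rng.normal(scale=0.1, size=(num_items, k))   # item latent factors

lr, reg = 0.01, 0.02
for _ in range(2000):
    for u, i in zip(*R.nonzero()):               # iterate over observed entries
        err = R[u, i] - U[u] @ V[i]
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * U[u] - reg * V[i])

print(np.round(U @ V.T, 1))  # predicted ratings, including the unobserved cells
```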
10. [Diagram] Users connected to Information/Content through Recommendation, Personalized Search, Personal Assistant, and Social Networks
11.
• Content moderation
• Automatic classification
• Copyright infringement detection
• Deduplication
• Recommendation
• Similarity-based search (see the sketch after this list)
• Object detection & tracking
• Segmentation
• Video to text (description)
• Emotion analysis: is it repulsive?
• Popularity: will it go viral?
• Content creation
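Similarity-based search and deduplication from the list above boil down to nearest-neighbour lookups over content embeddings; a toy sketch is below. The random embeddings and the 0.95 threshold are illustrative assumptions, not values from the slides.

```python
# A toy sketch of similarity-based search / near-duplicate detection over
# embedding vectors; data and threshold are invented for illustration.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
index = rng.normal(size=(1000, 128))                      # catalogue embeddings
query = index[42:43] + 0.01 * rng.normal(size=(1, 128))   # near-duplicate query

scores = cosine_sim(query, index)[0]
top5 = np.argsort(-scores)[:5]
print(top5, scores[top5])                     # item 42 should rank first

duplicates = np.where(scores > 0.95)[0]       # threshold-based deduplication
print("near-duplicates:", duplicates)
```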
12. • Landmark • Beauty • Action • Gender • Age • Emotion