MIT CSAIL Tomaso Poggio: The Science and Engineering of Intelligence
1. The Science and the Engineering of Intelligence Tomaso Poggio Center for Biological and Computational Learning McGovern Institute for Brain Research at MIT Department of Brain & Cognitive Sciences CSAIL Massachusetts Institute of Technology Cambridge, MA 02139 USA
2. Engineering of Intelligence: recent successes
3. Intelligence: engineering
5. Recent progress in AI
9. Mobileye
11. CBMM: motivations Key recent advances in the engineering of intelligence have their roots in basic science of the brain STC Annual Meeting, 2016
12. The same hierarchical architectures in the cortex, in models of vision and in Deep Learning networks (Desimone & Ungerleider 1989; van Essen & Movshon)
13. The race for Intelligence • The science of intelligence was at the roots of today's engineering success • …we need to make another basic effort on it, for the sake of basic science and for the engineering of tomorrow
14. Science + Engineering of Intelligence. Mission: We aim to make progress in understanding intelligence — that is, understanding how the brain makes the mind, how the brain works, and how to build intelligent machines. CBMM's main goal is to make progress in the science of intelligence, which enables better engineering of intelligence. Third Annual NSF Site Visit, June 8 – 9, 2016
15. Interdisciplinary: Cognitive Science, Machine Learning, Computer Science, Neuroscience, Computational Neuroscience. Science + Technology of Intelligence
16. Centerness: collaborations across different disciplines and labs. MIT: Boyden, Desimone, Kaelbling, Kanwisher, Katz, Poggio, Sassanfar, Saxe, Schulz, Tenenbaum, Ullman, Wilson, Rosasco, Winston. Harvard: Blum, Kreiman, Mahadevan, Nakayama, Sompolinsky, Spelke, Valiant. Rockefeller: Freiwald. Allen Institute: Koch. UCLA: Yuille. Stanford: Goodman. Hunter: Epstein, Sakas, Chodorow. Wellesley: Hildreth, Conway, Wiest. Puerto Rico: Bykhovaskaia, Ordonez, Arce Nazario. Cornell: Hirsh. Howard: Manaye, Chouikha, Rwebargira
17. Recent stats and activities. IIT: Metta. A*star: Tan. Hebrew U.: Shashua. MPI: Buelthoff. Genoa U.: Verri. Weizmann: Ullman. MEXT, Japan. City U. HK: Smale. Industry: Google, IBM, DeepMind, Honda, Microsoft, Siemens, Schlumberger, GE, Boston Dynamics, Orcam, Nvidia, Rethink Robotics, MobilEye. Third CBMM Summer School, 2016
18. EAC members: Pietro Perona, Caltech; Charles Isbell, Jr., Georgia Tech; Joel Oppenheim, NYU; Lore McGovern, MIBR, MIT; David Siegel, Two Sigma; Demis Hassabis*, DeepMind; Marc Raibert, Boston Dynamics; Kobi Richter, Medinol; Judith Richter, Medinol; Dan Rockmore, Dartmouth; Susan Whitehead, MIT Corporation; Fei-Fei Li, Stanford
19. CBMM Brains, Minds and Machines Summer School at Woods Hole: our flagship initiative
20. Brains, Minds and Machines Summer School. In 2016: 302 applications for 35 slots
21. Brains, Minds and Machines Summer School: broad introduction to research on human and machine intelligence • computation, neuroscience, cognition • research methods and current results • lecture videos on the CBMM website • summer 2015 course materials to be published on MIT OpenCourseWare. List of speakers*: Tomaso Poggio, Winrich Freiwald, Elizabeth Spelke, Ken Nakayama, Amnon Shashua, Dorin Comaniciu, Demis Hassabis, Gabriel Kreiman, Matthew Wilson, Rebecca Saxe, Patrick Winston, James DiCarlo, Tom Mitchell, Josh McDermott, Nancy Kanwisher, Boris Katz, Josh Tenenbaum, L. Mahadevan, Shimon Ullman, Laura Schulz, Lorenzo Rosasco, Ethan Meyers, Larry Abbott, Aude Oliva, Eero Simoncelli, Eddy Chang. (* CBMM faculty, industrial partners)
23. An example project across thrusts: face recognition. Nancy Kanwisher
24. A project across thrusts: face recognition. Winrich Freiwald and Doris Tsao
25. A project across thrusts: face recognition (Model, ML, AL, AM)
27. Another project: When and why are deep networks better than shallow networks? Work with Hrushikesh Mhaskar; initial parts with L. Rosasco and F. Anselmi
34. Hierarchical feedforward models of the ventral stream do “work”
35. Convolutional networks ("Hubel-Wiesel" models) include: Hubel & Wiesel, 1959; Fukushima, 1980; Wallis & Rolls, 1997; Mel, 1997; LeCun et al., 1998; Riesenhuber & Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Wersing and Koerner, 2003; Serre et al., 2007; Freeman and Simoncelli, 2011… Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
36. Hierarchical feedforward models of the ventral stream do “work”
37. The same hierarchical architectures in the cortex, in the models of vision and in Deep Learning networks
41. DLNNs: two main scientific questions • When and why are deep networks better than shallow networks? • Why does SGD work so well for deep networks? Could unsupervised learning work as well? Work with Hrushikesh Mhaskar; initial parts with L. Rosasco and F. Anselmi
42. Classical learning algorithms: "high" sample complexity and shallow architectures. How do the learning machines described by classical learning theory, such as kernel machines, compare with brains? ❑ One of the most obvious differences is the ability of people and animals to learn from very few examples (the "poverty of the stimulus" problem). ❑ A comparison with real brains offers another, related, challenge to learning theory. Classical "learning algorithms" correspond to one-layer architectures, whereas the cortex suggests a hierarchical architecture. Thus: are hierarchical architectures with more layers the answer to the sample complexity issue? Poggio and Smale, The Mathematics of Learning: Dealing with Data, Notices of the American Mathematical Society (AMS), Vol. 50, No. 5, 537-544, 2003
44. Classical learning theory and Kernel Machines
(Regularization in RKHS)

min_{f ∈ H} (1/ℓ) Σ_{i=1}^{ℓ} V(f(x_i) − y_i) + λ ‖f‖²_K

implies

f(x) = Σ_{i=1}^{ℓ} α_i K(x, x_i)

The equation includes splines, Radial Basis Functions and Support Vector Machines (depending on the choice of V). RKHS were explicitly introduced in learning theory by Girosi (1997) and Vapnik (1998). Moody and Darken (1989) and Broomhead and Lowe (1988) introduced RBF networks to learning theory. Poggio and Girosi (1989) introduced Tikhonov regularization in learning theory and worked (implicitly) with RKHS. RKHS were used earlier in approximation theory (e.g. Parzen, 1952-1970; Wahba, 1990). For a review, see Poggio and Smale, The Mathematics of Learning, Notices of the AMS, 2003
45. Classical kernel machines are equivalent to shallow networks. Kernel machines, f(x) = Σ_{i=1}^{ℓ} c_i K(x, x_i) + b, can be "written" as shallow networks: the value of K(x, x_i) corresponds to the "activity" of the i-th "unit" for the input x, and the coefficients c_i correspond to the "weights"
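As an illustrative sketch of this equivalence (assumptions not fixed by the slide: a Gaussian kernel, the square loss, and made-up data and regularization values), the representer-theorem solution f(x) = Σ_i α_i K(x, x_i) can be computed in closed form and evaluated exactly like a one-hidden-layer network:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def fit_kernel_ridge(X, y, lam=1e-3):
    """Square-loss Tikhonov regularization in the RKHS: the minimizer is
    f(x) = sum_i alpha_i K(x, x_i) with alpha = (K + lam*l*I)^(-1) y."""
    l = len(X)
    K = gaussian_kernel(X, X)
    return np.linalg.solve(K + lam * l * np.eye(l), y)

def predict(X_train, alpha, X_new):
    """The 'shallow network' view: hidden unit i outputs K(x, x_i),
    and the coefficients alpha are the output weights."""
    return gaussian_kernel(X_new, X_train) @ alpha

# toy 1-D regression problem (illustrative data only)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * X[:, 0])
alpha = fit_kernel_ridge(X, y)
y_hat = predict(X, alpha, X)
print(np.abs(y_hat - y).max())  # small training residual
```

The network has one "unit" per training point, which is exactly the sense in which a kernel machine is a shallow architecture.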
47. Deep and shallow networks • Thus depth is not needed for approximation: g(x) = Σ_{i=1}^{r} c_i (⟨w_i, x⟩ + b_i)_+
48. Deep and shallow networks • Thus depth is not needed for approximation • Conjecture: depth may be more effective for certain classes of functions. g(x) = Σ_{i=1}^{r} c_i (⟨w_i, x⟩ + b_i)_+
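A minimal numeric sketch of this shallow form, with r = 3 ramp (ReLU-style) units; the weights below are made up purely for illustration:

```python
import numpy as np

def shallow_net(x, W, b, c):
    """g(x) = sum_{i=1}^r c_i * (<w_i, x> + b_i)_+ :
    one hidden layer of r ramp units, linear output weights c."""
    return c @ np.maximum(W @ x + b, 0.0)

# illustrative weights: r = 3 units, input dimension d = 2
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, 0.0, -1.0])
c = np.array([1.0, 1.0, -2.0])
x = np.array([0.5, 0.5])
print(shallow_net(x, W, b, c))  # 1.0*0.5 + 1.0*0.5 - 2.0*0.0 = 1.0
```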
49. When is deep better than shallow. Generic functions: f(x1, x2, …, x8). Compositional functions: f(x1, x2, …, x8) = g3(g21(g11(x1, x2), g12(x3, x4)), g22(g11(x5, x6), g12(x7, x8))). Mhaskar, Poggio, Liao, 2016
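The binary-tree structure of this compositional example can be made concrete. The sketch below evaluates f(x1, …, x8) for made-up constituent functions (each node depends on only two variables; the particular choices of g are arbitrary):

```python
def compose8(x, g11, g12, g21, g22, g3):
    """f(x1..x8) = g3(g21(g11(x1,x2), g12(x3,x4)),
                      g22(g11(x5,x6), g12(x7,x8)))
    Every constituent function takes only 2 arguments, so the
    effective dimensionality at each node stays low."""
    a = g11(x[0], x[1])
    b = g12(x[2], x[3])
    c = g11(x[4], x[5])
    d = g12(x[6], x[7])
    return g3(g21(a, b), g22(c, d))

# arbitrary 2-variable constituents, for illustration only
add = lambda u, v: u + v
mul = lambda u, v: u * v
f = compose8([1, 2, 3, 4, 5, 6, 7, 8], add, add, mul, mul, add)
print(f)  # (1+2)*(3+4) + (5+6)*(7+8) = 21 + 165 = 186
```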
50. When is deep better than shallow. Theorem: why and when are deep networks better than shallow networks? f(x1, x2, …, x8) = g3(g21(g11(x1, x2), g12(x3, x4)), g22(g11(x5, x6), g12(x7, x8))). Mhaskar, Poggio, Liao, 2016
51. When is deep better than shallow. Theorem: why and when are deep networks better than shallow networks? f(x1, x2, …, x8) = g3(g21(g11(x1, x2), g12(x3, x4)), g22(g11(x5, x6), g12(x7, x8))); g(x) = Σ_{i=1}^{r} c_i (⟨w_i, x⟩ + b_i)_+. Mhaskar, Poggio, Liao, 2016
52. When is deep better than shallow. Theorem (informal statement): suppose that a function of d variables is compositional, e.g. f(x1, x2, …, x8) = g3(g21(g11(x1, x2), g12(x3, x4)), g22(g11(x5, x6), g12(x7, x8))). Both shallow and deep networks can approximate f equally well, but the number of parameters of the shallow network depends exponentially on the dimension d, as O(ε^−d), whereas for the deep network it depends linearly on d, that is, O(d ε^−2). Mhaskar, Poggio, Liao, 2016
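The two complexity estimates in the theorem can be compared numerically. This is only an order-of-magnitude illustration of the stated rates O(ε^−d) and O(d ε^−2), ignoring the constants hidden in the O(·):

```python
def shallow_units(d, eps):
    """O(eps^-d): the generic shallow estimate grows exponentially in d."""
    return eps ** (-d)

def deep_units(d, eps):
    """O(d * eps^-2): the deep (compositional) estimate grows linearly in d."""
    return d * eps ** (-2)

eps = 0.1
for d in (2, 8, 32):
    print(d, shallow_units(d, eps), deep_units(d, eps))
# already at d = 8 the shallow count is ~1e8 vs 800 for the deep network
```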
53. Shallow vs deep networks: this is the best possible estimate (an n-width result). Mhaskar, Poggio, Liao, 2016
54. Similar results for VC dimension of shallow vs deep networks Poggio, Anselmi, Rosasco, 2015
55. When is deep better than shallow. Theorem: suppose that a function of d variables is compositional. Both shallow and deep networks can approximate f equally well, but the number of parameters of the shallow network depends exponentially on the dimension d, as O(ε^−d), whereas for the deep network it depends linearly on d, that is, O(d ε^−2). New proof: linear combinations of 6 units provide an indicator function; k partitions for each coordinate require 6kn units in one layer. The next layer computes the entries in the 2D table corresponding to g(x1, x2); they also correspond to tensor products. Two layers with 6kn + (6kn)^2 units represent one of the g functions. For convolutional nets the total number of units is l(6kn + (6kn)^2). Mhaskar, Poggio, Liao, 2016
56. Our theorem directly implies other known results • A classical theorem [Hastad, 1987] shows that deep circuits are more efficient in representing certain Boolean functions than shallow circuits. Hastad proved that highly variable functions (in the sense of having high frequencies in their Fourier spectrum), in particular the parity function, cannot even be decently approximated by small constant-depth circuits • The main result of [Telgarsky, 2016, COLT] says that there are functions with many oscillations that cannot be represented by shallow networks with linear complexity but can be represented with low complexity by deep networks
57. When is deep better than shallow. Corollary: our main theorem implies the Hastad and Telgarsky theorems. Use our theorem with Boolean variables and consider the parity function x1 x2 … xd, which is compositional. Q.E.D. For the second part, consider for instance the real-valued polynomial x1 x2 … xd defined on the cube (−1, 1)^d: this is a compositional function that changes sign a lot. Q.E.D. Mhaskar, Poggio, Liao, 2016
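The parity example can be sketched directly: XOR is a two-variable function, so d-bit parity is a depth-log2(d) binary tree of XOR nodes. Illustrative code only; it assumes d is a power of two:

```python
def parity_tree(bits):
    """Compute the parity of d bits (d a power of two) by repeatedly
    XOR-ing adjacent pairs: a binary tree of 2-input nodes, so parity
    is compositional in exactly the sense of the theorem."""
    while len(bits) > 1:
        bits = [a ^ b for a, b in zip(bits[0::2], bits[1::2])]
    return bits[0]

print(parity_tree([1, 0, 1, 1]))              # 1 (three ones: odd)
print(parity_tree([1, 0, 1, 0, 1, 0, 1, 0]))  # 0 (four ones: even)
```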
58. The curse of dimensionality, the blessing of compositionality
59. The curse of dimensionality, the blessing of compositionality. For compositional functions, deep networks (but not shallow ones) can avoid the curse of dimensionality, that is, the exponential dependence of the network complexity and of its sample complexity on the dimension
60. Why are compositional functions important? They seem to occur in computations on text, speech, images… why? Conjecture (with Max Tegmark): the Hamiltonians of physics induce compositionality in natural signals such as images
64. Remarks: 1. A binary tree net is a good proxy for ResNets. 2. Scalable algorithms and compositional functions. 3. Invariance and pooling. 4. Sparse functions and Boolean functions
65. Convolutional Deep Networks (no pooling, as in ResNets). Similar theorems apply to the network on the left and the network on the right in terms of the number of parameters
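A rough parameter-count sketch of the contrast between the two networks (hypothetical helper names; assumes n inputs, a full binary tree of two-input nodes, and an arbitrary per-node parameter budget):

```python
import math

def tree_params(n_inputs, params_per_node, shared=False):
    """A full binary tree over n inputs has n - 1 two-input nodes.
    Without sharing, every node has its own parameters (linear in n);
    with convolutional weight sharing, all nodes in a level reuse one
    set, leaving only log2(n) distinct parameter sets."""
    if shared:
        return int(math.log2(n_inputs)) * params_per_node
    return (n_inputs - 1) * params_per_node

print(tree_params(8, 10))        # 7 nodes * 10 = 70
print(tree_params(8, 10, True))  # 3 levels * 10 = 30
```

Either way the count stays polynomial in n, which is the sense in which the theorem treats the two variants alike.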
66. Hyper-deep residual networks: a binary tree net is a good mathematical proxy
67. Remarks: 1. A binary tree net is a good proxy for ResNets. 2. Scalable algorithms and compositional functions. 3. Invariance and pooling. 4. Sparse functions and Boolean functions
68. Shift-invariant, scalable algorithms. Mhaskar, Poggio, Liao, 2016
69. Qualitative arguments for compositional functions in vision • Images require algorithms of the compositional-function type • Recognition in clutter requires computations with compositional functions
70. Remarks: 1. A binary tree net is a good proxy for ResNets. 2. Scalable algorithms and compositional functions. 3. Invariance and pooling: interpretation of nodes in the binary tree. 4. Sparse functions and Boolean functions
71. Comment on i-theory • i-theory is not essential for today's theorem; it represents a further analysis of convolutional networks and extensions of them • i-theory characterizes how convolution and pooling in multilayer networks reduce sample complexity (→ Lorenzo) • Theorems about extending invariance beyond position invariance and how to learn it from the environment (→ Lorenzo). Anselmi and Poggio, 2016, MIT Press
72. Remarks: 1. A binary tree net is a good proxy for ResNets. 2. Scalable algorithms and compositional functions. 3. Invariance and pooling. 4. Sparse functions and Boolean functions
73. Sparse functions Mhaskar, Poggio, Liao, 2016
74. More remarks • Functions that are not compositional/sparse may not be learnable by deep networks • Deep, non-convolutional, densely connected networks are not better than shallow networks; DCLNs can be much better (for compositional functions) but not for all functions/computations • Binarization leads to considering sparse Boolean functions
75. DLNNs: two main scientific questions • When and why are deep networks better than shallow networks? • Why does SGD work so well for deep networks?
76. Parenthetical comment on i-theory • Convolution and pooling in multilayer networks reduce sample complexity • Theorems about extending invariance beyond position invariance and how to learn it from the environment. Anselmi and Poggio, 2016, MIT Press
