Modern Stochastics: Theory and Applications


A quantitative functional central limit theorem for shallow neural networks
Volume 11, Issue 1 (2024), pp. 85–108
Valentina Cammarota, Domenico Marinucci, Michele Salvi, Stefano Vigogna

https://doi.org/10.15559/23-VMSTA238
Pub. online: 28 November 2023      Type: Research Article      Open Access

Received: 5 July 2023
Revised: 23 October 2023
Accepted: 15 November 2023
Published: 28 November 2023

Abstract

We prove a quantitative functional central limit theorem for one-hidden-layer neural networks with a generic activation function. Our rates of convergence depend heavily on the smoothness of the activation function, and they range from logarithmic for nondifferentiable nonlinearities such as the ReLU to $\sqrt{n}$ for highly regular activations. Our main tools are based on functional versions of the Stein–Malliavin method; in particular, we rely on a quantitative functional central limit theorem recently established by Bourguin and Campese [Electron. J. Probab. 25 (2020), 150].
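The Gaussian behaviour described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' construction: it assumes a network of the form $F_n(x) = n^{-1/2}\sum_{j=1}^{n} V_j\,\sigma(W_j \cdot x)$ with Gaussian hidden weights $W_j$, Rademacher outer weights $V_j$, a ReLU activation, and a unit-norm input $x$. All of these choices, and the helper names `sample_network_outputs` and `excess_kurtosis`, are assumptions made for illustration. It only examines the one-dimensional marginal at a single point, which is much weaker than the functional statement proved in the paper.

```python
import numpy as np

def relu(t):
    # ReLU activation, a nondifferentiable nonlinearity.
    return np.maximum(t, 0.0)

def sample_network_outputs(x, width, n_networks, rng, activation=relu):
    """Evaluate n_networks independent random shallow networks at the point x.

    Assumed model (for illustration only):
        F_n(x) = width**-0.5 * sum_j V_j * activation(W_j . x)
    with W_j standard Gaussian vectors and V_j independent random signs.
    """
    d = x.shape[0]
    W = rng.standard_normal((n_networks, width, d))        # hidden weights
    V = rng.choice([-1.0, 1.0], size=(n_networks, width))  # outer weights
    hidden = activation(W @ x)                             # shape (n_networks, width)
    return (V * hidden).sum(axis=1) / np.sqrt(width)

def excess_kurtosis(samples):
    # Fourth standardized moment minus 3; zero for a Gaussian distribution.
    centered = samples - samples.mean()
    return (centered**4).mean() / (centered**2).mean() ** 2 - 3.0

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])  # a point on the unit sphere in R^3

results = {}
for width in (1, 10, 100):
    out = sample_network_outputs(x, width, 20000, rng)
    results[width] = (out.var(), excess_kurtosis(out))
    print(f"width={width:4d}  variance={results[width][0]:.3f}  "
          f"excess kurtosis={results[width][1]:.3f}")
```

Under these assumptions the variance stays close to $E[\sigma(Z)^2] = 1/2$ for every width, while the excess kurtosis shrinks as the width grows, which is consistent with the marginal distribution approaching a Gaussian.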

References

[1] Azmoodeh, E., Peccati, G., Yang, X.: Malliavin-Stein method: a survey of some recent developments. Mod. Stoch. Theory Appl. 8(2), 141–177 (2021). MR4279874. https://doi.org/10.15559/21-vmsta184
[2] Bach, F.: Breaking the curse of dimensionality with convex neural networks. J. Mach. Learn. Res. 18(19), 1–53 (2017). MR3634886
[3] Basteri, A., Trevisan, D.: Quantitative Gaussian approximation of randomly initialized deep neural networks (2022). arXiv:2203.07379
[4] Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. USA 116(32), 15849–15854 (2019). MR3997901. https://doi.org/10.1073/pnas.1903070116
[5] Bietti, A., Bach, F.: Deep equals shallow for ReLU networks in kernel regimes. In: International Conference on Learning Representations (ICLR), 9 (2021)
[6] Bordino, A., Favaro, F., Fortini, S.: Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities (2023). arXiv:2304.04010
[7] Bourguin, S., Campese, S., Leonenko, N., Taqqu, M.S.: Four moments theorems on Markov chaos. Ann. Probab. 47(3), 1417–1446 (2019). MR3945750. https://doi.org/10.1214/18-AOP1287
[8] Bourguin, S., Campese, S.: Approximation of Hilbert-valued Gaussians on Dirichlet structures. Electron. J. Probab. 25, 150 (2020), 30 pp. MR4193891. https://doi.org/10.1214/20-ejp551
[9] Chizat, L., Oyallon, E., Bach, F.: On lazy training in differentiable programming. In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (2019)
[10] Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989). MR1015670. https://doi.org/10.1007/BF02551274
[11] Daniely, A., Frostig, R., Singer, Y.: Toward deeper understanding of neural networks: the power of initialization and a dual view on expressivity. In: NeurIPS 2016, Volume 29, 2253–2261 (2016)
[12] Döbler, C., Kasprzak, M., Peccati, G.: The multivariate functional de Jong CLT. Probab. Theory Relat. Fields 184(1–2), 367–399 (2022). MR4498513. https://doi.org/10.1007/s00440-022-01114-3
[13] Eldan, R., Mikulincer, D., Schramm, T.: Non-asymptotic approximations of neural networks by Gaussian processes (2021). arXiv:2102.08668
[14] Goel, S., Karmalkar, S., Klivans, S.A.: Time/accuracy tradeoffs for learning a ReLU with respect to Gaussian marginals. In: NeurIPS 2019, pp. 8582–8591 (2019)
[15] Hanin, B.: Random neural networks in the infinite width limit as Gaussian processes (2021). arXiv:2107.01562
[16] Hornik, K.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8
[17] Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991). https://doi.org/10.1016/0893-6080(91)90009-T
[18] Jacot, A., Gabriel, F., Hongler, C.: Neural tangent kernel: convergence and generalization in neural networks. In: Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (2018)
[19] Klukowski, A.: Rate of convergence of polynomial networks to Gaussian processes (2021). arXiv:2111.03175
[20] Ledoux, M., Nourdin, I., Peccati, G.: Stein’s method, logarithmic Sobolev and transport inequalities. Geom. Funct. Anal. 25(1), 256–306 (2015). MR3320893. https://doi.org/10.1007/s00039-015-0312-0
[21] Leshno, M., Lin, V.Ya., Pinkus, A., Schocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 6(6), 861–867 (1993). https://doi.org/10.1016/S0893-6080(05)80131-5
[22] Marinucci, D., Peccati, G.: Random Fields on the Sphere. Cambridge University Press (2011). MR2840154. https://doi.org/10.1017/CBO9780511751677
[23] Neal, R.M.: Priors for infinite networks. In: Bayesian Learning for Neural Networks, pp. 29–53. Springer, New York, NY (1996). https://doi.org/10.1007/978-1-4612-0745-0_2
[24] Nourdin, I., Peccati, G.: Stein’s method on Wiener chaos. Probab. Theory Relat. Fields 145(1–2), 75–118 (2009). MR2520122. https://doi.org/10.1007/s00440-008-0162-x
[25] Nourdin, I., Peccati, G.: Normal Approximations with Malliavin Calculus. From Stein’s Method to Universality. Cambridge Tracts in Math., vol. 192. Cambridge University Press, Cambridge (2012). MR2962301. https://doi.org/10.1017/CBO9781139084659
[26] Pinkus, A.: Approximation theory of the MLP model in neural networks. Acta Numerica (1999). MR1819645. https://doi.org/10.1017/S0962492900002919
[27] Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Advances in Neural Information Processing Systems 20 (NeurIPS 2007) (2007)
[28] Roberts, D.A., Yaida, S., Hanin, B.: The principles of deep learning theory (2021). arXiv:2106.10165
[29] Yaida, S.: Non-Gaussian processes and neural networks at finite widths (2019). arXiv:1910.00019. MR4198759. https://doi.org/10.1007/s40687-020-00233-4
[30] Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. In: 5th International Conference on Learning Representations (ICLR) (2017)


Copyright
© 2024 The Author(s). Published by VTeX
Open access article under the CC BY license.

Keywords
Quantitative functional central limit theorem; Wiener-chaos expansions; neural networks; Gaussian processes

MSC2010
60F17, 68T07, 60G60

Funding
The work was partially supported by the MUR Excellence Department Project MatMod@TOV awarded to the Department of Mathematics, University of Rome Tor Vergata, CUP E83C18000100006. We also acknowledge financial support from the MUR 2022 PRIN project GRAFIA, project code 202284Z9E4, the INdAM group GNAMPA and the PNRR CN1 High Performance Computing, Spoke 3.

Journal

  • Online ISSN: 2351-6054
  • Print ISSN: 2351-6046