A quantitative functional central limit theorem for shallow neural networks
Volume 11, Issue 1 (2024), pp. 85–108
Pub. online: 28 November 2023
Type: Research Article
Open Access
Received
5 July 2023
Revised
23 October 2023
Accepted
15 November 2023
Published
28 November 2023
Abstract
We prove a quantitative functional central limit theorem for one-hidden-layer neural networks with generic activation function. Our rates of convergence depend heavily on the smoothness of the activation function, ranging from logarithmic for nondifferentiable nonlinearities such as the ReLU to $\sqrt{n}$ for highly regular activations. Our main tools are functional versions of the Stein–Malliavin method; in particular, we rely on a quantitative functional central limit theorem recently established by Bourguin and Campese [Electron. J. Probab. 25 (2020), 150].
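For orientation, here is a minimal sketch of the kind of network the theorem concerns, under the standard random-initialization setup going back to Neal (1996); the notation and normalization below are illustrative assumptions, not quoted from the paper:
\[
f_n(x) \;=\; \frac{1}{\sqrt{n}} \sum_{j=1}^{n} v_j\, \sigma\big(\langle w_j, x\rangle\big),
\qquad v_j \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1), \quad w_j \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_d),
\]
where $\sigma$ is the activation function and $n$ is the width of the hidden layer. As $n \to \infty$, $f_n$ converges to a Gaussian process; a quantitative functional CLT of the type proved here bounds the speed of this convergence in a functional (Hilbert-space-valued) sense, with the rate governed by the smoothness of $\sigma$.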
References
Azmoodeh, E., Peccati, G., Yang, X.: Malliavin-Stein method: a survey of some recent developments. Mod. Stoch. Theory Appl. 8(2), 141–177 (2021). MR4279874. https://doi.org/10.15559/21-vmsta184
Bach, F.: Breaking the curse of dimensionality with convex neural networks. J. Mach. Learn. Res. 18(19), 1–53 (2017). MR3634886
Basteri, A., Trevisan, D.: Quantitative Gaussian approximation of randomly initialized deep neural networks (2022). arXiv:2203.07379
Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. USA 116(32), 15849–15854 (2019). MR3997901. https://doi.org/10.1073/pnas.1903070116
Bordino, A., Favaro, S., Fortini, S.: Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities (2023). arXiv:2304.04010
Bourguin, S., Campese, S., Leonenko, N., Taqqu, M.S.: Four moments theorems on Markov chaos. Ann. Probab. 47(3), 1417–1446 (2019). MR3945750. https://doi.org/10.1214/18-AOP1287
Bourguin, S., Campese, S.: Approximation of Hilbert-valued Gaussians on Dirichlet structures. Electron. J. Probab. 25, 150 (2020), 30 pp. MR4193891. https://doi.org/10.1214/20-ejp551
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989). MR1015670. https://doi.org/10.1007/BF02551274
Döbler, C., Kasprzak, M., Peccati, G.: The multivariate functional de Jong CLT. Probab. Theory Relat. Fields 184(1–2), 367–399 (2022). MR4498513. https://doi.org/10.1007/s00440-022-01114-3
Eldan, R., Mikulincer, D., Schramm, T.: Non-asymptotic approximations of neural networks by Gaussian processes (2021). arXiv:2102.08668
Hanin, B.: Random neural networks in the infinite width limit as Gaussian processes (2021). arXiv:2107.01562
Hornik, K.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8
Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991). https://doi.org/10.1016/0893-6080(91)90009-T
Klukowski, A.: Rate of convergence of polynomial networks to Gaussian processes (2021). arXiv:2111.03175
Ledoux, M., Nourdin, I., Peccati, G.: Stein’s method, logarithmic Sobolev and transport inequalities. Geom. Funct. Anal. 25(1), 256–306 (2015). MR3320893. https://doi.org/10.1007/s00039-015-0312-0
Leshno, M., Lin, V.Ya., Pinkus, A., Schocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 6(6), 861–867 (1993). https://doi.org/10.1016/S0893-6080(05)80131-5
Marinucci, D., Peccati, G.: Random Fields on the Sphere: Representation, Limit Theorems and Cosmological Applications. Cambridge University Press, Cambridge (2011). MR2840154. https://doi.org/10.1017/CBO9780511751677
Neal, R.M.: Priors for infinite networks. In: Bayesian Learning for Neural Networks, pp. 29–53. Springer, New York, NY (1996). https://doi.org/10.1007/978-1-4612-0745-0_2
Nourdin, I., Peccati, G.: Stein’s method on Wiener chaos. Probab. Theory Relat. Fields 145(1–2), 75–118 (2009). MR2520122. https://doi.org/10.1007/s00440-008-0162-x
Nourdin, I., Peccati, G.: Normal Approximations with Malliavin Calculus. From Stein’s Method to Universality. Cambridge Tracts in Math., vol. 192. Cambridge University Press, Cambridge (2012). MR2962301. https://doi.org/10.1017/CBO9781139084659
Pinkus, A.: Approximation theory of the MLP model in neural networks. Acta Numer. 8, 143–195 (1999). MR1819645. https://doi.org/10.1017/S0962492900002919
Roberts, D.A., Yaida, S., Hanin, B.: The principles of deep learning theory (2021). arXiv:2106.10165
Yaida, S.: Non-Gaussian processes and neural networks at finite widths (2019). arXiv:1910.00019. MR4198759. https://doi.org/10.1007/s40687-020-00233-4