More formatting, and stuck on the Taylor expansion
This commit is contained in:
parent
b3992f3fbc
commit
b40d030887
|
@ -92,7 +92,7 @@ We will present here some standard invariants of Brownian motions. The proofs ar
|
|||
Let $p \in [2,\infty)$. We denote by $\mathfrak{k}_p \in \R$ the real number given by $\mathfrak{k}_p:=\inf \{ c\in \R : \text{ for every probability space } (\Omega, \mathcal{F}, \mathbb{P}) \text{ and every random variable } \mathcal{X}: \Omega \rightarrow \R \text{ with } \E[|\mathcal{X}|] < \infty \text{ it holds that } \lp \E \lb \lv \mathcal{X} - \E \lb \mathcal{X} \rb \rv^p \rb \rp ^{\frac{1}{p}} \leqslant c \lp \E \lb \lv \mathcal{X} \rv^p \rb \rp ^{\frac{1}{p}} \}.$
|
||||
\end{definition}
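As a quick sanity check of this constant (an observation added here for orientation; it is not needed in the sequel), in the case $p=2$ the variance identity gives, for every random variable $\mathcal{X}$ with $\E \lb \mathcal{X}^2 \rb < \infty$, that:
\begin{align}
\E \lb \lv \mathcal{X} - \E\lb \mathcal{X}\rb \rv^2 \rb = \E\lb \lv \mathcal{X}\rv^2\rb - \lp \E\lb \mathcal{X}\rb\rp^2 \leqslant \E\lb \lv \mathcal{X}\rv^2\rb
\end{align}
whence $\mathfrak{k}_2 \leqslant 1$, and centered random variables show that this bound is attained.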
|
||||
|
||||
\begin{definition}[Primary Setting]\label{primarysetting} Let $d,m \in \mathbb{N}$, $T, \mathfrak{L},p \in [0,\infty)$, $\mathfrak{p} \in [2,\infty)$ $\mathfrak{m} = \mathfrak{k}_{\mathfrak{p}}\sqrt{\mathfrak{p}-1}$, $\Theta = \mathbb{Z}$, $g \in C(\mathbb{R}^d,\mathbb{R})$, assume for all $t \in [0,T],x\in \mathbb{R}^d$ that:
|
||||
\begin{definition}[Primary Setting For This Chapter]\label{primarysetting} Let $d,m \in \mathbb{N}$, $T, \mathfrak{L},p \in [0,\infty)$, $\mathfrak{p} \in [2,\infty)$, $\mathfrak{m} = \mathfrak{k}_{\mathfrak{p}}\sqrt{\mathfrak{p}-1}$, $\Theta = \mathbb{Z}$, $g \in C(\mathbb{R}^d,\mathbb{R})$, and assume for all $t \in [0,T],x\in \mathbb{R}^d$ that:
|
||||
\begin{align}\label{(2.1.2)}
|
||||
|g(x)| \leqslant \mathfrak{L} \lp 1+\|x\|_E^p \rp
|
||||
\end{align}
|
||||
|
@ -118,17 +118,17 @@ Assume Setting \ref{primarysetting} then:
|
|||
\end{enumerate}
|
||||
\end{lemma}
|
||||
|
||||
\begin{proof} For (i) Consider that $\mathcal{W}^{(\theta,0,-k)}_{T-t}$ are continuous random fields and that $g\in C(\mathbb{R}^d,\mathbb{R})$, we have that $U^\theta(t,x)$ is the composition of continuous functions with $m > 0$ by hypothesis, ensuring no singularities. Thus $U^\theta: [0,T] \times \mathbb{R}^d\times \Omega \rightarrow \mathbb{R}$.
|
||||
\begin{proof} For (i), consider that $\mathcal{W}^{(\theta,0,-k)}_{T-t}$ are continuous random fields and that $g\in C(\mathbb{R}^d,\mathbb{R})$. Since $m > 0$ by hypothesis, there are no singularities, and $U^\theta(t,x)$ is a composition of continuous functions. Thus $U^\theta: [0,T] \times \mathbb{R}^d\times \Omega \rightarrow \mathbb{R}$ is a continuous random field.
|
||||
|
||||
\medskip
|
||||
|
||||
For (ii), observe that for all $\theta \in \Theta$ it holds that $\mathcal{W}^\theta$ is $\mathcal{B} \lp \lb 0, T \rb \otimes \sigma \lp W^\theta \rp \rp /\mathcal{B}\lp \mathbb{R}^d \rp$-measurable; this, together with induction, proves Item (ii).
|
||||
|
||||
\medskip
|
||||
Moreover observe that item (ii) and the fact that for all $\theta \in \Theta$ it holds that $\lp\mathcal{W}^{\lp \theta, \vartheta\rp}_{\vartheta \in \Theta}\rp$, $\mathcal{W}^\theta$ are independently establish item (iii).
|
||||
Moreover, observe that Item (ii) and the fact that for all $\theta \in \Theta$ it holds that $\lp\mathcal{W}^{\lp \theta, \vartheta\rp}\rp_{\vartheta \in \Theta}$ and $\mathcal{W}^\theta$ are independent establish Item (iii).
|
||||
|
||||
\medskip
|
||||
Furthermore, note that (ii) and the fact that for all $i,k,\mathfrak{i},\mathfrak{k} \in \mathbb{Z}$, $\theta \in \Theta$, with $(i,k) \neq (\mathfrak{i},\mathfrak{k})$ it holds that $\lp\mathcal{W}^{\lp\theta, i,k,\vartheta\rp}\rp_{\vartheta \in \Theta}$ and $\lp\mathcal{W}^{\lp\theta,\mathfrak{i},\mathfrak{k},\vartheta\rp}\rp_{\vartheta \in \Theta}$ are independent establish item (iv).
|
||||
Furthermore, note that (ii) and the fact that for all $i,k,\mathfrak{i},\mathfrak{k} \in \mathbb{Z}$, $\theta \in \Theta$, with $(i,k) \neq (\mathfrak{i},\mathfrak{k})$ it holds that $\lp\mathcal{W}^{\lp\theta, i,k,\vartheta\rp}\rp_{\vartheta \in \Theta}$ and $\lp\mathcal{W}^{\lp\theta,\mathfrak{i},\mathfrak{k},\vartheta\rp}\rp_{\vartheta \in \Theta}$ are independent, establish item (iv).
|
||||
|
||||
\medskip
|
||||
Finally, Hutzenthaler et al. \cite[Corollary~2.5]{hutzenthaler_overcoming_2020} establish Item (v). This completes the proof of the lemma.
|
||||
|
|
|
@ -9,14 +9,14 @@ Our goal in this dissertation is threefold:
|
|||
\begin{enumerate}[label = (\roman*)]
|
||||
\item Firstly, we will take the Multi-Level Picard approximation technique, first introduced in \cite{e_multilevel_2019} and \cite{e_multilevel_2021}, and in particular the version of Multi-Level Picard that appears in \cite{hutzenthaler_strong_2021}. We show that dropping the drift term and substantially simplifying the process still results in convergence of the method, polynomial bounds for the number of computations required, and rather nice properties for the approximations, such as integrability and measurability.
|
||||
\item We will then go on to show that a modified version of the heat equation has a solution that can be represented via a stochastic differential equation by Feynman-Kac, and further that a version of this can be realized by the modified Multi-Level Picard technique mentioned in Item (i), with certain simplifying assumptions since we dropped the drift term. A substantial amount of this is inspired by \cite{bhj20} and much earlier work in \cite{karatzas1991brownian} and \cite{da_prato_zabczyk_2002}.
|
||||
\item By far, the most significant part of this dissertation is dedicated to expanding and building upon a framework of neural networks as appears in \cite{grohs2019spacetime}. We modify this definition highly and introduce several new neural network architectures to this framework ($\tay, \pwr, \trp, \tun,\etr$, among others) and show, for all these neural networks, that the parameter count grows only polynomially as the accuracy of our model increases, thus beating the curse of dimensionality. This finally paves the way for giving neural network approximations to the techniques realized in Item (ii). We show that it is not too wasteful (defined on the polynomiality of parameter counts) to use neural networks to approximate MLP to approximate a stochastic differential equation equivalent to certain parabolic PDEs as Feynman-Kac necessitates.
|
||||
\item By far, the most significant part of this dissertation is dedicated to expanding and building upon a framework of neural networks as appears in \cite{grohs2019spacetime}. We modify this definition substantially and introduce several new neural network architectures to this framework ($\pwr, \pnm, \tun,\etr, \xpn, \csn, \sne, \mathsf{E},\mathsf{UE}, \mathsf{UEX}$, and $\mathsf{UES}$, among others) and show, for all these neural networks, that the parameter count grows only polynomially as the accuracy of our model increases, thus beating the curse of dimensionality. This finally paves the way for giving neural network approximations to the techniques realized in Item (ii). We show that it is not too wasteful (measured by the polynomiality of parameter counts) to use neural networks to approximate MLP approximations of the stochastic differential equations equivalent to certain parabolic PDEs, as Feynman-Kac necessitates.
|
||||
\\~\\
|
||||
We end this dissertation by proposing two avenues of further research: analytical and algebraic. This framework of understanding neural networks as ordered tuples of ordered pairs may be extended to give neural network approximations of classical PDE approximation techniques such as Runge-Kutta, Adams-Moulton, and Adams-Bashforth. We also propose three conjectures about neural networks, as defined in \cite{grohs2019spacetime}, among them that they form a bimodule and that realization is a functor.
|
||||
\end{enumerate}
|
||||
This dissertation is broken down into three parts. At the end of each part, we will encounter tent-pole theorems, which will eventually lead to the final neural network approximation outcome. These tentpole theorems are Theorem \ref{tentpole_1}, Theorem \ref{thm:3.21}, and Theorem. Finally, the culmination of these three theorems is Theorem, the end product of the dissertation.
|
||||
This dissertation is broken down into three parts. At the end of each part, we will encounter tent-pole theorems, which will eventually lead to the final neural network approximation outcome. These tent-pole theorems are Theorem \ref{tentpole_1}, Theorem \ref{thm:3.21}, and Theorem \ref{ues}. Finally, the culmination of these three theorems is Corollary \ref{cor_ues}, the end product of the dissertation. We hope you, the reader, will enjoy it.
|
||||
|
||||
\section{Notation, Definitions \& Basic notions.}
|
||||
We introduce here basic notations that we will be using throughout this dissertation. Large parts are taken from standard literature inspired by \textit{Matrix Computations} by \cite{golub2013matrix}, and \textit{Probability: Theory \& Examples} by Rick \cite{durrett2019probability}.
|
||||
We introduce here basic notations that we will be using throughout this dissertation. Large parts are taken from standard literature, notably \textit{Matrix Computations} by Golub \& Van Loan \cite{golub2013matrix}, \textit{Probability: Theory \& Examples} by Rick Durrett \cite{durrett2019probability}, and \textit{Concrete Mathematics} by Graham, Knuth \& Patashnik \cite{graham_concrete_1994}.
|
||||
\subsection{Norms and Inner Products}
|
||||
\begin{definition}[Euclidean Norm]
|
||||
Let $\left\|\cdot\right\|_E: \R^d \rightarrow [0,\infty)$ denote the Euclidean norm defined for every $d \in \N$ and for all $x= \{x_1,x_2,\cdots, x_d\}\in \R^d$ as:
|
||||
|
@ -26,7 +26,7 @@ We introduce here basic notations that we will be using throughout this disserta
|
|||
For the particular case that $d=1$ and where it is clear from context, we will denote $\| \cdot \|_E$ as $|\cdot |$.
|
||||
\end{definition}
|
||||
\begin{definition}[Max Norm]
|
||||
Let $\left\| \cdot \right\|_{\infty}: \R^d \rightarrow [0,\infty )$ denote the max norm defined for every $d \in \N_0$ and for all $x = \left\{ x_1,x_2,\cdots,x_d \right\} \in \R^d$ as:
|
||||
Let $\left\| \cdot \right\|_{\infty}: \R^d \rightarrow [0,\infty )$ denote the max norm defined for every $d \in \N$ and for all $x = \left\{ x_1,x_2,\cdots,x_d \right\} \in \R^d$ as:
|
||||
\begin{align}
|
||||
\left\| x \right\|_{\infty} = \max_{i \in \{1,2,\cdots,d\}} \left\{\left| x_i \right| \right\}
|
||||
\end{align}
|
||||
|
@ -46,7 +46,7 @@ Let $\|\cdot \|_F: \R^{m\times n} \rightarrow [0,\infty)$ denote the Frobenius n
|
|||
\begin{definition}[Euclidean Inner Product]
|
||||
Let $\la \cdot, \cdot \ra: \R^d \times \R^d \rightarrow \R$ denote the Euclidean inner product defined for every $d \in \N$, for all $\R^d \ni x = \{x_1,x_2,...,x_d\}$, and for all $\R^d \ni y = \{y_1,y_2,..., y_d\}$ as:
|
||||
\begin{align}
|
||||
\la x, y \ra = \sum^d_{i=1} \lp x_i y_i \rp
|
||||
\la x, y \ra = \sum^d_{i=1} x_i y_i
|
||||
\end{align}
|
||||
\end{definition}
|
||||
|
||||
|
@ -56,7 +56,7 @@ Let $\|\cdot \|_F: \R^{m\times n} \rightarrow [0,\infty)$ denote the Frobenius n
|
|||
\begin{enumerate}[label = (\roman*)]
|
||||
\item $\Omega$ is a set of outcomes called the \textbf{sample space}.
|
||||
\item $\mathcal{F}$ is a set of events called the \textbf{event space}, where each event is a set of outcomes from the sample space. More specifically, it is a $\sigma$-algebra on the set $\Omega$.
|
||||
\item A measurable function $\mathbb{P}: \mathcal{F} \rightarrow [0,1]$ assigning each event in the \textbf{event space} a probability between $0$ and $1$. More specifically, $\mathbb{P}$ is a measure on $\Omega$ with the caveat that the measure of the entire space is $1$, i.e., $\mathbb{P}(\Omega) = 1$.
|
||||
\item A function $\mathbb{P}: \mathcal{F} \rightarrow [0,1]$ assigning each event in the \textbf{event space} a probability. More specifically, $\mathbb{P}$ is a measure on the measurable space $\lp \Omega, \mathcal{F} \rp$ with the caveat that the measure of the entire space is $1$, i.e., $\mathbb{P}(\Omega) = 1$.
|
||||
\end{enumerate}
|
||||
\end{definition}
|
||||
|
||||
|
@ -70,6 +70,12 @@ Let $\|\cdot \|_F: \R^{m\times n} \rightarrow [0,\infty)$ denote the Frobenius n
|
|||
\E\lb X \rb=\int_\Omega X d\mathbb{P}
|
||||
\end{align}
|
||||
\end{definition}
|
||||
\begin{definition}[Variance]
|
||||
Given a probability space $\lp \Omega, \cF, \bbP \rp$ and a random variable $X$ with $\E \lb X^2\rb < \infty$, the variance of $X$, denoted $\var\lb X\rb$, is the quantity given by:
|
||||
\begin{align}
|
||||
\var\lb X \rb = \E\lb X^2\rb - \lp \E\lb X\rb\rp^2
|
||||
\end{align}
|
||||
\end{definition}
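For example (a standard computation included purely as a sanity check), if $X \sim \bern(p)$ for some $p \in \lp 0,1\rp$, then $\E\lb X^2\rb = \E\lb X\rb = p$, and hence:
\begin{align}
\var\lb X \rb = \E\lb X^2\rb - \lp \E\lb X\rb\rp^2 = p - p^2 = p\lp 1-p\rp
\end{align}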
|
||||
|
||||
\begin{definition}[Stochastic Process]
|
||||
A stochastic process is a family of random variables over a fixed probability space $(\Omega, \mathcal{F}, \mathbb{P})$, indexed over a set, usually $\lb 0, T\rb$ for $T\in \lp 0,\infty\rp$.
|
||||
|
|
|
@ -58,7 +58,7 @@ Note here the difference between Definition \ref{actnn} and Definition \ref{7.2.
|
|||
Let $d \in \N$. It is then the case that $\param\lp \id_d\rp = 4d^2+3d$.
|
||||
\end{lemma}
|
||||
\begin{proof}
|
||||
By observation we have that $\param \lp \id_1\rp = 4(1)^2+3(1) = 7$. By induction, suppose that this holds for all natural numbers up to and including $n$, i.e., for all naturals up to and including $n$; it is the case that $\param \lp id_n\rp = 4n^2+3n$. Note then that $\id_{n+1} = \id_n \boxminus \id_1$. For $W_1$ and $W_2$ of this new network, this adds a combined extra $8n+4$ parameters. For $b_1$ and $b_2$ of this new network, this adds a combined extra $3$ parameters. Thus, we have the following:
|
||||
By observation we have that $\param \lp \id_1\rp = 4(1)^2+3(1) = 7$. By induction, suppose that for all natural numbers up to and including $n$ it is the case that $\param \lp \id_n\rp = 4n^2+3n$. Note then that $\id_{n+1} = \id_n \boxminus \id_1$. For $W_1$ and $W_2$ of this new network, this adds a combined extra $8n+4$ parameters. For $b_1$ and $b_2$ of this new network, this adds a combined extra $3$ parameters. Thus, we have the following:
|
||||
\begin{align}
|
||||
4n^2+3n + 8n+4 + 3 &= 4(n+1)^2+3(n+1)
|
||||
\end{align}
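As an aside, the closed form may be cross-checked numerically. The following throwaway \texttt{R} sketch assumes only that $\id_d$ has layer architecture $\lp d, 2d, d\rp$ (it is independent of the \texttt{nnR} package and of the induction above):
\begin{lstlisting}[language = R, style = rstyle]
# Parameter count of a fully connected network from its layer widths:
# layer k contributes a widths[k+1] x widths[k] weight matrix plus a
# bias vector of length widths[k+1].
param_count <- function(widths) {
  sum(sapply(seq_len(length(widths) - 1),
             function(k) widths[k + 1] * widths[k] + widths[k + 1]))
}

# Check param(id_d) = 4d^2 + 3d for the assumed architecture (d, 2d, d).
for (d in 1:10) {
  stopifnot(param_count(c(d, 2 * d, d)) == 4 * d^2 + 3 * d)
}
\end{lstlisting}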
|
||||
|
@ -119,7 +119,7 @@ Let $x \in \R$. Upon instantiation with $\rect$ and $d=1$ we have:
|
|||
\end{align}
|
||||
\textit{Case 1.ii:} Let $\nu = \lp \lp W_1,b_1 \rp, \lp W_2,b_2 \rp, ..., \lp W_L, b_L \rp \rp $. Deriving from Definition \ref{7.2.1} and \ref{5.2.1} we have that:
|
||||
\begin{align}
|
||||
\id_1\bullet \nu &= \lp \lp W_1,b_1\rp,\lp W_2,b_2 \rp,...,\lp W_{L-1},b_{L-1} \rp, \lp \begin{bmatrix}
|
||||
&\id_1\bullet \nu \nonumber\\ &= \lp \lp W_1,b_1\rp,\lp W_2,b_2 \rp,...,\lp W_{L-1},b_{L-1} \rp, \lp \begin{bmatrix}
|
||||
1 \\-1
|
||||
\end{bmatrix} W_L, \begin{bmatrix}
|
||||
1 \\ -1
|
||||
|
@ -204,7 +204,7 @@ This, along with Case 1. iii, implies that the uninstantiated first layer is equ
|
|||
Observe that Definitions \ref{5.2.5} and \ref{7.2.1} tell us that:
|
||||
|
||||
\begin{align}
|
||||
\boxminus^d_{i=1} \id_i = \lp \lp \overbrace{\begin{bmatrix}
|
||||
&\boxminus^d_{i=1} \id_i\\ &= \lp \lp \overbrace{\begin{bmatrix}
|
||||
\we_{\id_1,1} \\
|
||||
&&\ddots \\
|
||||
&&& \we_{\id_1,1}
|
||||
|
@ -710,6 +710,7 @@ This completes the proof.
|
|||
% This completes the proof of the lemma.
|
||||
%\end{proof}
|
||||
\section{Maximum Convolution Approximations for Multi-Dimensional Functions}
|
||||
We will present here an approximation scheme for continuous functions called maximum convolution approximation. This derives mainly from Chapter 4 of \cite{bigbook}; our contribution is to show parameter bounds, and convergence in the case of $1$-D approximation.
|
||||
\subsection{The $\nrm^d_1$ Networks}
|
||||
\begin{definition}[The $\nrm_1^d$ neural network]
|
||||
We denote by $\lp \nrm_1^d \rp _{d\in \N} \subseteq \neu$ the family of neural networks that satisfy:
|
||||
|
@ -997,7 +998,7 @@ Given $x\in \R$, it is straightforward to find the maximum; $ x$ is the maximum.
|
|||
\lp\real_{\rect} \lp \mxm^2 \rp \rp \lp x \rp &= \max \{x_1-x_2,0\} + \max\{x_2,0 \} - \max\{ -x_2,0\} \nonumber \\
|
||||
&= \max \{x_1-x_2,0\} + x_2 = \max\{x_1,x_2\}
|
||||
\end{align}
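As an aside, the scalar identity instantiated above may be spot-checked numerically; the following throwaway \texttt{R} sketch (illustrative only, and independent of any network code) verifies it on random inputs:
\begin{lstlisting}[language = R, style = rstyle]
# Verify max(x1, x2) = relu(x1 - x2) + relu(x2) - relu(-x2),
# which uses the fact that relu(x2) - relu(-x2) = x2.
relu <- function(x) pmax(x, 0)
max_via_relu <- function(x1, x2) relu(x1 - x2) + relu(x2) - relu(-x2)

set.seed(1)
x1 <- rnorm(1000); x2 <- rnorm(1000)
stopifnot(isTRUE(all.equal(max_via_relu(x1, x2), pmax(x1, x2))))
\end{lstlisting}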
|
||||
Note next that Lemma \ref{idprop}, Lemma \ref{comp_prop}, and \cite[Proposition~2.19]{grohs2019spacetime} then imply for all $d \in \{2,3,4,...\}$, $x = \{x_1,x_2,...,x_d\} \in \R^d$ it holds that $\lp \real_{\rect} \lp \mxm^d \rp \rp \lp x \rp \in C \lp \R^d,\R \rp$. and $\lp \real_{\rect} \lp \mxm^d \rp \rp \lp x \rp = \max\{ x_1,x_2,...,x_d \}$. This establishes Items (iii)-(iv).
|
||||
Note next that Lemma \ref{idprop}, Lemma \ref{comp_prop}, and \cite[Proposition~2.19]{grohs2019spacetime} then imply for all $d \in \{2,3,4,...\}$, $x = \{x_1,x_2,...,x_d\} \in \R^d$ it holds that $\real_{\rect} \lp \mxm^d \rp \in C \lp \R^d,\R \rp$ and $\lp \real_{\rect} \lp \mxm^d \rp \rp \lp x \rp = \max\{ x_1,x_2,...,x_d \}$. This establishes Items (iii)\textemdash(iv).
|
||||
|
||||
Consider now the fact that Item (ii) implies that the layer widths form a geometric series, whence the number of bias parameters is bounded by:
|
||||
\begin{align}
|
||||
|
@ -1134,7 +1135,7 @@ We will call the approximant $\max_{i \in \{0,1,\hdots, N\}}\{ f_i\}$, the \text
|
|||
|
||||
\tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt
|
||||
|
||||
\begin{tikzpicture}[x=0.75pt,y=0.75pt,yscale=-1,xscale=1]
|
||||
\begin{tikzpicture}[x=0.75pt,y=0.75pt,yscale=-0.9,xscale=0.9]
|
||||
%uncomment if require: \path (0,560); %set diagram left start at 0, and has height of 560
|
||||
|
||||
%Shape: Rectangle [id:dp1438938274656144]
|
||||
|
|
|
@ -386,7 +386,7 @@ This, and the fact that $\delta = 2^{\frac{-2}{q-2}}\ve ^{\frac{q}{q-2}}$ render
|
|||
\begin{tabular}{@{}l|llllll@{}}
|
||||
\toprule
|
||||
& Min. & 1\textsuperscript{st} Qu. & Median & Mean & 3\textsuperscript{rd} Qu. & Max. \\ \midrule
|
||||
Experimental $|x^2 - \real_{\rect}(\mathsf{Sqr}^{q,\ve})(x)$ & 0.000003 & 0.089438 & 0.337870 & 3.148933 & 4.674652 & 20.00 \\ \midrule
|
||||
Experimental $|x^2 - \real_{\rect}(\mathsf{Sqr}^{q,\ve})(x)|$ & 0.00000 & 0.08943 & 0.33787 & 3.14893 & 4.67465 & 20.00 \\ \midrule
|
||||
Theoretical $|x^2 - \real_{\rect}(\mathsf{Sqr}^{q,\ve})(x)|$ & 0.010 & 1.715 & 10.402 & 48.063 & 45.538 & 1250.00 \\ \midrule
|
||||
Difference & 0.001 & 1.6012 & 9.8655 & 44.9141 & 40.7102 & 1230
|
||||
\end{tabular}
|
||||
|
@ -476,7 +476,7 @@ We are finally ready to give neural network representations of arbitrary product
|
|||
\end{align}
|
||||
This proves Item (iv).
|
||||
|
||||
By symmetry it holds that $\param \lp \frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_1,0} \rp \rp = \param \lp -\frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_2,0} \rp \rp = \param \lp -\frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_3,0} \rp \rp$ and further that $\lay \lp \frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_1,0} \rp \rp = \lay \lp -\frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_2,0} \rp \rp = \lay \lp -\frac{1}{2}\triangleright\lp \Psi \bullet \aff_{A_3,0} \rp \rp$.
|
||||
By symmetry it holds that $\param \lp \frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_1,0} \rp \rp \\ = \param \lp -\frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_2,0} \rp \rp = \param \lp -\frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_3,0} \rp \rp$ and further that $\lay \lp \frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_1,0} \rp \rp = \lay \lp -\frac{1}{2}\triangleright \lp \Psi \bullet \aff_{A_2,0} \rp \rp = \lay \lp -\frac{1}{2}\triangleright\lp \Psi \bullet \aff_{A_3,0} \rp \rp$.
|
||||
Note also that Corollary \ref{affcor} tells us that for all $i \in \{1,2,3\}$ and $a \in \{ \frac{1}{2},-\frac{1}{2}\}$ it is the case that:
|
||||
\begin{align}
|
||||
\param \lp a \triangleright \lp \Psi \bullet \aff_{A_i,0}\rp \rp = \param \lp \Psi \rp
|
||||
|
@ -688,7 +688,7 @@ We take inspiration from the $\sm$ neural network to create the $\prd$ neural ne
|
|||
\end{align}
|
||||
And that by definition of composition:
|
||||
\begin{align}
|
||||
\param \lp \tun_3 \rp &= \param \lb \lp \lp \begin{bmatrix}
|
||||
&\param \lp \tun_3 \rp \\ &= \param \lb \lp \lp \begin{bmatrix}
|
||||
1 \\ -1
|
||||
\end{bmatrix}, \begin{bmatrix}
|
||||
0 \\ 0
|
||||
|
@ -1172,7 +1172,7 @@ Let $\mathfrak{p}_i$ for $i \in \{1,2,...\}$ be the set of functions defined for
|
|||
This completes the proof of the lemma.
|
||||
\end{proof}
|
||||
\begin{remark}\label{rem:pwr_gets_deeper}
|
||||
Note each power network $\pwr_n^{q,\ve}$ is at least as big as the previous power network $\pwr_{n-1}^{q,\ve}$, one differs from the other by one $\prd^{q, ve}$ network.
|
||||
Note that each power network $\pwr_n^{q,\ve}$ is at least as deep and parameter-rich as the previous power network $\pwr_{n-1}^{q,\ve}$; each differs from the next by one $\prd^{q,\ve}$ network.
|
||||
\end{remark}
|
||||
\subsection{$\pnm_{n,C}^{q,\ve}$ and Neural Network Polynomials.}
|
||||
|
||||
|
@ -1282,7 +1282,7 @@ Let $\mathfrak{p}_i$ for $i \in \{1,2,...\}$ be the set of functions defined for
|
|||
|
||||
\end{tikzpicture}
|
||||
\end{center}
|
||||
\caption{Neural network diagram for an elementary neural network polynomial.}
|
||||
\caption{Neural network diagram for an elementary neural network polynomial, with all coefficients being uniformly $1$.}
|
||||
\end{figure}
|
||||
|
||||
\begin{lemma}[R\textemdash,2023]\label{6.2.9}\label{nn_poly}\label{mnm_prop}
|
||||
|
@ -1314,7 +1314,7 @@ Let $\mathfrak{p}_i$ for $i \in \{1,2,...\}$ be the set of functions defined for
|
|||
\end{enumerate}
|
||||
\end{lemma}
|
||||
\begin{proof}
|
||||
Note that by Lemma \ref{5.6.3}, Lemma \ref{power_prop}, and Lemma \ref{comp_prop} for all $n\in \N_0$ it is the case that:
|
||||
Note that Lemma \ref{5.6.3}, Lemma \ref{power_prop}, and Lemma \ref{comp_prop} indicate that for all $n\in \N_0$ it is the case that:
|
||||
\begin{align}
|
||||
\real_{\rect}\lp \pnm_{n,C}^{q,\ve} \rp &= \real_{\rect} \lp \bigoplus^n_{i=0} \lb c_i \triangleright\lb \tun_{\max_i \left\{\dep \lp \pwr_i^{q,\ve} \rp\right\} +1 - \dep \lp \pwr^{q,\ve}_i\rp} \bullet \pwr_i^{q,\ve}\rb \rb \rp \nonumber\\
|
||||
&= \sum^n_{i=0}c_i \real_{\rect}\lp \tun_{\max_i \left\{\dep \lp \pwr_i^{q,\ve} \rp\right\} +1 - \dep \lp \pwr^{q,\ve}_i\rp} \bullet \pwr_i^{q,\ve} \rp \nonumber\\
|
||||
|
@ -1338,7 +1338,7 @@ Let $\mathfrak{p}_i$ for $i \in \{1,2,...\}$ be the set of functions defined for
|
|||
\end{align}
|
||||
This then yields us $2$ parameters.
|
||||
|
||||
Note that each neural network summand in $\pnm_n^{q,\ve}$ consists of a combination of $\tun_k$ and $\pwr_k$ for some $k\in \N$. Each $\pwr_k$ has at least as many parameters as a tunneling neural network of that depth, as Lemma \ref{param_pwr_geq_param_tun} tells us. This, finally, with Lemma \ref{aff_effect_on_layer_architecture}, Corollary \ref{affcor}, and Lemma \ref{power_prop} then implies that:
|
||||
Note that each neural network summand in $\pnm_n^{q,\ve}$ consists of a combination of $\tun_k$ and $\pwr_k$ for some $k\in \N$. Each $\pwr_k$ has at least as many parameters as a tunneling neural network of that depth, as Lemma \ref{param_pwr_geq_param_tun} tells us. This, finally, with Lemma \ref{aff_effect_on_layer_architecture}, Corollary \ref{affcor}, and Lemma \ref{power_prop} then implies that: \\
|
||||
\begin{align}
|
||||
\param\lp \pnm^{q,\ve}_{n,C} \rp &= \param \lp \bigoplus^n_{i=0} \lb c_i \triangleright\lb \tun_{\max_i \left\{\dep \lp \pwr_i^{q,\ve} \rp\right\} +1 - \dep \lp \pwr^{q,\ve}_i\rp} \bullet \pwr_i^{q,\ve}\rb \rb \rp\nonumber \\
|
||||
&\les \lp n+1 \rp \cdot \param \lp c_i \triangleright \lb \tun_1 \bullet \pwr_n^{q,\ve} \rb\rp \nonumber\\
|
||||
|
@ -1385,7 +1385,7 @@ Let $\mathfrak{p}_i$ for $i \in \{1,2,...\}$ be the set of functions defined for
|
|||
\end{align}
|
||||
This completes the proof of the Lemma.
|
||||
\end{proof}
|
||||
\subsection{$\xpn_n^{q,\ve}$, $\csn_n^{q,\ve}$, $\sne_n^{q,\ve}$, and Neural Network Approximations of $e^x$, $\cos(x)$, and $\sin(x)$.}
|
||||
\subsection{$\xpn_n^{q,\ve}$, $\csn_n^{q,\ve}$, $\sne_n^{q,\ve}$, and Artificial Neural Network Approximations of $e^x$, $\cos(x)$, and $\sin(x)$.}
|
||||
Once we have neural network polynomials, we may take the next leap to transcendental functions. To approximate them we will use Taylor expansions, which swiftly give us approximations of our desired functions. Here, we will explore neural network approximations for three common transcendental functions: $e^x$, $\cos(x)$, and $\sin(x)$.
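To fix ideas, the truncated Taylor series that the networks below are designed to emulate can be evaluated directly; the short \texttt{R} sketch below (plain arithmetic only, not a neural network computation) illustrates how quickly the truncation error shrinks near $0$:
\begin{lstlisting}[language = R, style = rstyle]
# Truncated Taylor polynomials of exp and cos around 0.
taylor_exp <- function(x, n) sum(x^(0:n) / factorial(0:n))
taylor_cos <- function(x, n) sum((-1)^(0:n) * x^(2 * (0:n)) / factorial(2 * (0:n)))

x <- 0.75
c(exp_error = abs(exp(x) - taylor_exp(x, 5)),
  cos_error = abs(cos(x) - taylor_cos(x, 5)))
\end{lstlisting}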
|
||||
|
||||
\begin{lemma}
|
||||
|
@ -1412,7 +1412,7 @@ Once we have neural network polynomials, we may take the next leap to transcende
|
|||
This is a consequence of a finite number of applications of (\ref{6.2.14}).
|
||||
\end{proof}
|
||||
\begin{definition}[R\textemdash 2023, $\xpn_n^{q,\ve}$ and the Neural Network Taylor Approximations for $e^x$ around $x=0$]
|
||||
Let $\delta,\ve \in \lp 0,\infty \rp $, $q\in \lp 2,\infty \rp$ and $\delta = \ve \lp 2^{q-1} +1\rp^{-1}$, and let $\pwr_n^{q,\ve}$ be as in Lemma \ref{power_prop}. We define, for all $n\in \N_0$, the family of neural networks $\xpn_n^{q,\ve} as$:
|
||||
Let $\delta,\ve \in \lp 0,\infty \rp $, $q\in \lp 2,\infty \rp$ and $\delta = \ve \lp 2^{q-1} +1\rp^{-1}$, and let $\pwr_n^{q,\ve} \subsetneq \neu$ be as in Lemma \ref{power_prop}. We define, for all $n\in \N_0$, the family of neural networks $\xpn_n^{q,\ve}$ as:
|
||||
\begin{align}
|
||||
\xpn_n^{q,\ve}\coloneqq \bigoplus^n_{i=0} \lb \frac{1}{i!} \triangleright\lb \tun_{\max_i \left\{\dep \lp \pwr_i^{q,\ve} \rp\right\} +1 - \dep \lp \pwr^{q,\ve}_i\rp} \bullet \pwr_i^{q,\ve}\rb \rb
|
||||
\end{align}
|
||||
|
@ -1495,7 +1495,7 @@ Once we have neural network polynomials, we may take the next leap to transcende
|
|||
2 & :n =0 \\
|
||||
\lp 2n+1\rp\lb 4^{2n+\frac{3}{2}} + \lp \frac{4^{2n+1}-1}{3}\rp \lp \frac{360q}{q-2} \lb \log_2 \lp \ve^{-1} \rp +q+1 \rb +372\rp\rb &:n\in \N
|
||||
\end{cases}$ \\~\\
|
||||
\item $\left|\sum^n_{i=0} \frac{(-1)^i}{2i!}x^{2i} - \real_{\rect} \lp \csn_n^{q,\ve} \rp \lp x \rp \right| \les \sum^n_{i=1} \left| \frac{\lp -1\rp^i}{2i!}\right|\lp \left| x \lp x^{2i-1} - \real_{\rect}\lp \pwr^{q,\ve}_{2i-1}\rp\lp x\rp\rp\right| + \ve + |x|^q + \mathfrak{p}_{2i-1}^q \rp $\\~\\
|
||||
\item $\left|\sum^n_{i=0} \frac{(-1)^i}{(2i)!}x^{2i} - \real_{\rect} \lp \csn_n^{q,\ve} \rp \lp x \rp \right|$ \\ $\les \sum^n_{i=1} \left| \frac{\lp -1\rp^i}{\lp 2i\rp !}\right|\lp \left| x \lp x^{2i-1} - \real_{\rect}\lp \pwr^{q,\ve}_{2i-1}\rp\lp x\rp\rp\right| + \ve + |x|^q + \mathfrak{p}_{2i-1}^q \rp $\\~\\
|
||||
Where $\mathfrak{p}_i$ are the set of functions defined for $i \in \N$ as such:
|
||||
\begin{align}
|
||||
\mathfrak{p}_1 &= \ve+1+|x|^2 \nonumber\\
|
||||
|
@ -1552,7 +1552,7 @@ Once we have neural network polynomials, we may take the next leap to transcende
|
|||
\begin{lemma}[R\textemdash, 2023]
|
||||
Let $\delta,\ve \in \lp 0,\infty \rp $, $q\in \lp 2,\infty \rp$ and $\delta = \ve \lp 2^{q-1} +1\rp^{-1}.$ It is then the case for all $n\in\N_0$ and $x\in [a,b]\subseteq \lb 0,\infty \rp$ that:
|
||||
\begin{align}
|
||||
\left| \cos\lp x\rp - \real_{\rect} \lp \csn_n^{q,\ve} \rp \lp x \rp \right| \les \sum^n_{i=0} \frac{\lp -1\rp^i}{2i!}\lp \left| x \lp x^{n-1} - \real_{\rect}\lp \pwr^{q,\ve}_{n-1}\rp\lp x\rp\rp\right| + \ve + |x|^q + \mathfrak{p}_{n-1}^q \rp + + \frac{|x|^{n+1}}{(n+1)!}\nonumber
|
||||
&\left| \cos\lp x\rp - \real_{\rect} \lp \csn_n^{q,\ve} \rp \lp x \rp \right|\\ &\les \sum^n_{i=1} \left|\frac{\lp -1\rp^i}{\lp 2i\rp !}\right|\lp \left| x \lp x^{2i-1} - \real_{\rect}\lp \pwr^{q,\ve}_{2i-1}\rp\lp x\rp\rp\right| + \ve + |x|^q + \mathfrak{p}_{2i-1}^q \rp + \frac{|x|^{n+1}}{(n+1)!}\nonumber
|
||||
\end{align}
|
||||
\end{lemma}
|
||||
|
||||
|
@ -1629,7 +1629,7 @@ Once we have neural network polynomials, we may take the next leap to transcende
|
|||
\end{align}
|
||||
\end{proof}
|
||||
|
||||
\begin{remark}
|
||||
\begin{remark}\label{rem:pyth_idt}
|
||||
Note that under these neural network architectures the famous Pythagorean identity $\sin^2\lp x\rp + \cos^2 \lp x\rp = 1$ may be rendered approximately, for fixed $n,q,\ve$, as $\lb \sqr^{q,\ve}\bullet \csn^{q,\ve}_n \rb \oplus\lb \sqr^{q,\ve}\bullet \sne^{q,\ve}_n\rb$. A full discussion of the associated parameter, depth, and accuracy bounds is beyond the scope of this dissertation, and may be appropriate for future work.
|
||||
\end{remark}
|
||||
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
\chapter{ANN representations of Brownian Motion Monte Carlo}
|
||||
\textbf{This is tentative without any reference to $f$.}
|
||||
|
||||
\begin{lemma}[R--,2023]
|
||||
\begin{lemma}[R\textemdash,2023]
|
||||
Let $d,M \in \N$, $T \in (0,\infty)$, $\act \in C(\R,\R)$, and $\Gamma \in \neu$ satisfy that $\real_{\act} \lp \Gamma \rp \in C \lp \R^d, \R \rp$, for every $\theta \in \Theta$ let $\mathcal{U}^\theta: [0,T] \rightarrow [0,T]$ and $\mathcal{W}^\theta:[0,T] \rightarrow \R^d$ be functions, and for every $\theta \in \Theta$ let $U^\theta: [0,T] \times \R^d \rightarrow \R$ satisfy for all $t \in [0,T]$, $x \in \R^d$ that:
|
||||
\begin{align}
|
||||
U^\theta(t,x) = \frac{1}{M} \lb \sum^M_{k=1} \lp \real_{\act} \lp \Gamma \rp \rp \lp x+ \mathcal{W}^{\lp \theta,0,-k \rp } \rp \rb
|
||||
|
@ -223,7 +223,7 @@ This proves Item (v) and hence the whole lemma.
|
|||
|
||||
\tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt
|
||||
|
||||
\begin{tikzpicture}[x=0.75pt,y=0.75pt,yscale=-1,xscale=1]
|
||||
\begin{tikzpicture}[x=0.75pt,y=0.75pt,yscale=-1,xscale=0.9]
|
||||
%uncomment if require: \path (0,477); %set diagram left start at 0, and has a height of 477
|
||||
|
||||
%Shape: Rectangle [id:dp8133807694586985]
|
||||
|
@ -439,7 +439,7 @@ Let $n, N,h\in \N$. Let $\delta,\ve \in \lp 0,\infty \rp $, $q\in \lp 2,\infty \
|
|||
\end{center}
|
||||
|
||||
\begin{proof}
|
||||
Note that from Lemma \ref{comp_prop}, and Lemma \ref{inst_of_stk}, we have that for $\fx \in \R^{N+1}$, and $x \in \R^d$ it is the case that $\real_{\rect} \lp \prd^{q,\ve} \bullet \lb \mathsf{E}^{N,h,q,\ve}_{n} \DDiamond \mathsf{G}_d \rb \rp\lp f\lp \lb \fx\rb_* \frown x\rp\rp = \real_{\rect} \lp \prd^{q,\ve}\rp \circ \real_{\rect}\lp \lb \mathsf{E}^{N,h,q,\ve}_{n} \DDiamond \mathsf{G}_d \rb \rp \lp f\lp \lb \fx\rb_*\rp \frown x\rp $. Then Lemma \ref{prd_network} tells us that $\real_{\rect} \lp \prd^{q,\ve}\rp \in C \lp \R^2,\R\rp$. Lemma \ref{mathsfE} tells us that $\real_{\rect }\lp \mathsf{E}^{N,h,q,\ve}_{n} \rp \in C \lp \R^{N+1},\R\rp$ and by hypothesis it is the case that $\real_{\rect} \lp \mathsf{G}_d\rp \in C \lp \R^d,\R\rp $. Thus, by the stacking properties of continuous instantiated networks and the fact that the composition of continuous functions is continuous, we have that $\real_{\rect} \lp \mathsf{UE}^{N, h,q,\ve}_{n,\mathsf{G}_d}\rp \in C \lp \R^{N+1} \times \R^d,\R \rp$.
|
||||
Note that from Lemma \ref{comp_prop}, and Lemma \ref{inst_of_stk}, we have that for $\fx \in \R^{N+1}$, and $x \in \R^d$ it is the case that $\real_{\rect} \lp \prd^{q,\ve} \bullet \lb \mathsf{E}^{N,h,q,\ve}_{n} \DDiamond \mathsf{G}_d \rb \rp\lp f\lp \lb \fx\rb_* \frown x\rp\rp = \real_{\rect} \lp \prd^{q,\ve}\rp \circ \real_{\rect}\lp \lb \mathsf{E}^{N,h,q,\ve}_{n} \DDiamond \mathsf{G}_d \rb \rp \\ \lp f\lp \lb \fx\rb_*\rp \frown x\rp $. Then Lemma \ref{prd_network} tells us that $\real_{\rect} \lp \prd^{q,\ve}\rp \in C \lp \R^2,\R\rp$. Lemma \ref{mathsfE} tells us that $\real_{\rect }\lp \mathsf{E}^{N,h,q,\ve}_{n} \rp \in C \lp \R^{N+1},\R\rp$ and by hypothesis it is the case that $\real_{\rect} \lp \mathsf{G}_d\rp \in C \lp \R^d,\R\rp $. Thus, by the stacking properties of continuous instantiated networks and the fact that the composition of continuous functions is continuous, we have that $\real_{\rect} \lp \mathsf{UE}^{N, h,q,\ve}_{n,\mathsf{G}_d}\rp \in C \lp \R^{N+1} \times \R^d,\R \rp$.
|
||||
|
||||
Note that by Lemma \ref{comp_prop} it is the case that:
|
||||
\begin{align}
|
||||
|
@ -754,7 +754,7 @@ Note that for a fixed $T \in \lp 0,\infty \rp$ it is the case that $u_d\lp t,x \
|
|||
\end{align}
|
||||
This proves the Lemma.
|
||||
\end{proof}
|
||||
\begin{lemma}[R\textemdash, 2024, Approximants for Brownian Motion]
|
||||
\begin{lemma}[R\textemdash, 2024, Approximants for Brownian Motion]\label{ues}
|
||||
|
||||
Let $t \in \lp 0,\infty\rp$ and $T \in \lp t,\infty\rp$. Let $\lp \Omega, \mathcal{F}, \mathbb{P}\rp$ be a probability space. Let $n,N\in \N$, and $h \in \lp 0, \infty \rp$. Let $\delta,\ve \in \lp 0,\infty \rp $, $q\in \lp 2,\infty \rp$, satisfy that $\delta = \ve \lp 2^{q-1} +1\rp^{-1}$. Let $f:[t, T] \rightarrow \R$ be continuous almost everywhere in $\lb t, T \rb$. Let it also be the case that $f = g \circ \fh$, where $\fh: \lb t,T\rb \rightarrow \R^d$, and $g: \R^d \rightarrow \R$. Let $t=t_0 \les t_1\les \cdots \les t_{N-1} \les t_N=T$ such that for all $i \in \{0,1,...,N\}$ it is the case that $h = \frac{T-t}{N}$, and $t_i = t_0+i\cdot h$ . Let $\mathbf{t} = \lb t_0 \: t_1\: \cdots t_N \rb$ and as such let $f\lp\lb \mathbf{t} \rb_{*,*} \rp = \lb f(t_0) \: f(t_1)\: \cdots \: f(t_N) \rb$. Let $u_d \in C \lp \R^d,\R\rp$ satisfy for all $d \in \N$, $t \in \lb 0,T\rb$, $x \in \R^d$ that:
|
||||
\begin{align}
|
||||
|
@ -812,7 +812,7 @@ Let $t \in \lp 0,\infty\rp$ and $T \in \lp t,\infty\rp$. Let $\lp \Omega, \mathc
|
|||
\end{enumerate}
|
||||
\end{lemma}
|
||||
\begin{proof}
|
||||
Note that for all $i \in \{ 1,2,\hdots, \mathfrak{n}\}$, Lemma \ref{UEX} tells us that $\real_{\rect}\lp \mathsf{UEX}^{N,h,q,\ve}_{n,\mathsf{G}_d,\omega_i}\rp \in C\lp \R^{N+1} \times \R^d, \R\rp$. Lemma \ref{nn_sum_cont} and Lemma \ref{nn_sum_is_sum_nn}, thus tells us that $\real_{\rect}\lp \lp \bigoplus_{i=1}^{\mathfrak{n}}\lb \mathsf{UEX}^{N,h,q,\ve}_{n,\mathsf{G}_d,\omega_i}\rb\rp\rp = \sum_{i=1}^\mathfrak{n}\lb \real_{\rect}\lp \mathsf{UEX}^{N,h,q,\ve}_{n,\mathsf{G}_d,\omega_i}\rp\rb $. The sum of continuous functions is continuous. Note next that $\frac{1}{\mathfrak{n}}\triangleright$ is an affine neural network, and hence, by Lemma \ref{aff_prop}, must be continuous.
|
||||
Note that for all $i \in \{ 1,2,\hdots, \mathfrak{n}\}$, Lemma \ref{UEX} tells us that $\real_{\rect}\lp \mathsf{UEX}^{N,h,q,\ve}_{n,\mathsf{G}_d,\omega_i}\rp \\ \in C\lp \R^{N+1} \times \R^d, \R\rp$. Lemma \ref{nn_sum_cont} and Lemma \ref{nn_sum_is_sum_nn} thus tell us that \\ $\real_{\rect}\lp \lp \bigoplus_{i=1}^{\mathfrak{n}}\lb \mathsf{UEX}^{N,h,q,\ve}_{n,\mathsf{G}_d,\omega_i}\rb\rp\rp = \sum_{i=1}^\mathfrak{n}\lb \real_{\rect}\lp \mathsf{UEX}^{N,h,q,\ve}_{n,\mathsf{G}_d,\omega_i}\rp\rb $. The sum of continuous functions is continuous. Note next that $\frac{1}{\mathfrak{n}}\triangleright$ is an affine neural network, and hence, by Lemma \ref{aff_prop}, must be continuous.
|
||||
|
||||
Then Lemmas \ref{comp_prop} and \ref{5.3.4}, together with the fact that by Lemma \ref{UEX} each of the individual stacked $\mathsf{UEX}^{N,h,q,\ve}_{n,\mathsf{G}_d,\omega_i}$ neural networks is continuous, ensure that $\real_{\rect} \lp \mathsf{UES}^{N,h,q,\ve}_{n,\mathsf{G}_d, \Omega,\fn}\rp \in C \lp \R^{\mathfrak{n}\lp N+1 \rp}\times \R^{\mathfrak{n} d}, \R \rp$. This proves Item (i).
|
||||
|
||||
|
@ -863,71 +863,101 @@ Let $t \in \lp 0,\infty\rp$ and $T \in \lp t,\infty\rp$. Let $\lp \Omega, \mathc
|
|||
% &\les \cancel{\frac{1}{\mathfrak{n}} \sum^{\mathfrak{n}}_{i=1}}\left| \exp \lp \int^T_tf\lp \mathcal{X}^{d,t,x}_{r,\omega_i}\rp ds \cdot u^T_d\lp \mathcal{X}^{d,t,x}_{r,\omega_i}\rp\rp - \real_{\rect}\lp \mathsf{UEX}^{N,h,q,\ve}_{n,\mathsf{G}_d,\omega_i} \rp \right| \nonumber\\
|
||||
% &\les \left| \exp \lp \int^T_tf\lp \mathcal{X}^{d,t,x}_{r,\omega_i}\rp ds \cdot u^T_d\lp \mathcal{X}^{d,t,x}_{r,\omega_i}\rp\rp - \real_{\rect}\lp \mathsf{UEX}^{N,h,q,\ve}_{n,\mathsf{G}_d,\omega_i}\rp \right| \nonumber
|
||||
% \end{align}
|
||||
\begin{corollary}
|
||||
\begin{corollary}\label{cor_ues}
|
||||
Let $N,n,\fn \in \N$, $h,\ve \in \lp 0,\infty\rp$, $q\in\lp 2,\infty\rp$, and let $\mathsf{UES}^{N,h,q,\ve}_{n,\mathsf{G}_d, \Omega, \fn} \subsetneq \neu $ be given. It is then the case that:
|
||||
\begin{align}
|
||||
\E\left| \E \lb \exp\lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega } ds\rp \cdot u\lp T,\cX^{d,t,x}_{r,\Omega}\rp\rb -\frac{1}{\mathfrak{n}}\lb \sum^{\mathfrak{n}}_{i=1}\lb \exp \lp \int_t^T \alpha_d \circ \mathcal{X}^{d,t,x}_{r,\omega_i}\rp ds \cdot u_d^T\lp \mathcal{X}^{d,t,x}_{r,\omega_i}\rp\rb \rb \right|\nonumber
|
||||
\E\left| \E \lb \exp\lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega } ds\rp \cdot \fu_d^T\lp \cX^{d,t,x}_{r,\Omega}\rp\rb -\frac{1}{\mathfrak{n}}\lb \sum^{\mathfrak{n}}_{i=1}\lb \exp \lp \int_t^T \alpha_d \circ \mathcal{X}^{d,t,x}_{r,\omega_i}\rp ds \cdot \fu_d^T\lp \mathcal{X}^{d,t,x}_{r,\omega_i}\rp\rb \rb \right|\nonumber
|
||||
\end{align}
|
||||
\end{corollary}
|
||||
\begin{proof}
|
||||
Note that $\fu^T$ is deterministic, and $\cX^{d,t,x}_{r,\Omega}$ is a $d$-vector of random variables with mean $\mu = \mymathbb{0}_d$ and covariance $\Sigma = \mathbb{I}_d$. Whence we have that:
|
||||
\begin{align}
|
||||
\var \lb \fu^T\lp x\rp\rb &= \lb \nabla \fu^T \lp x\rp\rb^\intercal \cdot \mathbb{I}_d \cdot \nabla \fu^T\lp x\rp + \frac{1}{2}\cdot \Trace\lp \Hess_x^2 \lp f\rp\lp x\rp\rp \nonumber \\
|
||||
&= \lb \nabla \fu^T\lp x\rp \rb_*^2 + \frac{1}{2}\cdot \Trace\lp \Hess_x^2\lp f\rp\lp x\rp\rp
|
||||
\end{align}
|
||||
We will call the right-hand side of the equation above $\fU$.
|
||||
|
||||
For the second factor in our product consider the following:
|
||||
\begin{align}
|
||||
\cY^{d,t}_{x,s} = \int_t^T\alpha_d \circ \cX^{d,t,x}_{r,\Omega}ds
|
||||
\end{align}
|
||||
Its Riemann sum, with $\Delta t = \frac{T-t}{n}$ and $t_k = t+k\Delta t$, together with Lemma \ref{var_of_rand}, is thus rendered as:
|
||||
\begin{align}
|
||||
\cY_n &= \Delta t \lb \sum^{n-1}_{k=0} \alpha \circ \cX^{d,t,x}_{r,\Omega}\lp t_k\rp\rb \nonumber\\
|
||||
\var\lb \cY_n \rb &= \var \lb \Delta_t\sum^{n-1}_{k=0}\alpha \circ \cX^{d,t,x}_{r,\Omega}\lp t_k\rp\rb \nonumber\\
|
||||
&= \lp\Delta t\rp^2 \sum^{n-1}_{k=0}\lb \var \lb \alpha \circ \cX^{d,t,x}_{r,\Omega}\lp t_k\rp \rb\rb \nonumber\\
|
||||
&\les \lp \Delta t\rp^2 \sum^{n-1}_{k=0}\lb \fL^2\cdot \var\lp \cX^{d,t,x}_{r,\Omega}\lp t_k\rp\rp\rb \nonumber\\
|
||||
&=\lp \fL\Delta t\rp^2 \sum^{n-1}_{k=0}\lb \var \lp \cX^{d,t,x}_{r,\Omega}\lp t_k\rp\rp \rb
|
||||
\end{align}
|
||||
\textbf{Alternatively}:
|
||||
\begin{align}
|
||||
&\var \lb \int_t^T\alpha \circ \cX\rb \\
|
||||
&=\E \lb \lp \int^T_t \alpha \circ \cX \rp^2\rb - \lp \E \lb \int^T_t \alpha \circ \cX \rb\rp^2 \\
|
||||
&=\E \lb \int^T_t\lp \alpha \circ \cX \rp^2\rb - \lp \int_t^T \E \lb \alpha \circ \cX \rb\rp^2 \\
|
||||
&=
|
||||
\end{align}
|
||||
|
||||
Note that since $\alpha_d$ is Lipschitz with constant $\fL$, we may say for $\fX^x_t = \cX_t -x$ that:
|
||||
\begin{align}
|
||||
\left| \alpha_d\circ \fX^x_t -\alpha_d \circ \fX^x_0 \right| &\les \fL \cdot\left|\fX^x_t - \fX^x_0\right| \nonumber\\
|
||||
\implies \left| \alpha_d \circ \fX^x_t - \alpha_d\lp 0\rp\right| &\les \fL \left| \fX^x_t-0\right| \nonumber \\
|
||||
\implies \alpha_d \circ \fX^x_t &\les \alpha_d\lp 0\rp + \fL t
|
||||
\end{align}
|
||||
Thus it is the case that:
|
||||
\begin{align}
|
||||
\left| \E \lb \int^T_t \alpha_d \circ \fX_s^t ds \rb\right| &\les \left| \E \lb \int^T_t \alpha_d \lp 0\rp + \fL s ds\rb\right| \nonumber\\
|
||||
&\les \left| \E \lb \int^T_t\alpha_d\lp 0\rp ds +\int^T_t \fL s ds\rb\right| \nonumber\\
|
||||
&\les |\alpha_d\lp 0\rp |\lp T-t\rp + \fL \lp \frac{T^2-t^2}{2} \rp
|
||||
\end{align}
|
||||
We will call the right-hand side $\mathfrak{E}$.
|
||||
|
||||
And it is also the case that:
|
||||
\begin{align}
|
||||
\left| \E \lb \lp \int^T_t \alpha_d \circ \fX^x_t \rp^2\rb\right| &\les \left| \E \lb \iint_{s,\fs=t}^T \lp \alpha_d \circ \fX^x_s\rp\lp \alpha_d \circ \fX^x_\fs\rp\rb dsd\fs\right| \nonumber\\
|
||||
&\les |\alpha_d\lp 0\rp|^2\lp T-t\rp^2 +2\fL |\alpha_d\lp 0\rp |\lp T-t\rp\lp \frac{T^2-t^2}{2}\rp + \fL^2\lp \frac{T^2-t^2}{2}\rp \nonumber
|
||||
\end{align}
|
||||
Thus it is the case that:
|
||||
\begin{align}
|
||||
\var\lp \int_t^T\alpha_d \circ \fX^x_t\rp &\les |\alpha_d\lp 0\rp|^2\lp T-t\rp^2 +2\fL |\alpha_d\lp 0\rp |\lp T-t\rp\lp \frac{T^2-t^2}{2}\rp + \fL^2\lp \frac{T^2-t^2}{2}\rp \nonumber\\
|
||||
&+ |\alpha_d\lp 0\rp |\lp T-t\rp + \fL \lp \frac{T^2-t^2}{2} \rp \nonumber
|
||||
\end{align}
|
||||
Denote the right-hand side of the equation above by $\fV$. The variance then becomes:
|
||||
|
||||
Note that \cite[Corollary~3.8]{hutzenthaler_strong_2021} tells us that:
|
||||
\begin{align}
|
||||
&\E\left| \E \lb \exp\lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega } ds\rp \cdot \fu_d^T\lp \cX^{d,t,x}_{r,\Omega}\rp\rb -\frac{1}{\mathfrak{n}}\lb \sum^{\mathfrak{n}}_{i=1}\lb \exp \lp \int_t^T \alpha_d \circ \mathcal{X}^{d,t,x}_{r,\omega_i}\rp ds \cdot \fu_d^T\lp \mathcal{X}^{d,t,x}_{r,\omega_i}\rp\rb \rb \right|\nonumber \\
|
||||
&\les \frac{\fK_p \sqrt{p-1}}{\fn^{\frac{1}{2}}} \lp \E \lb \left| \exp\lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega } ds\rp \cdot \fu_d^T\lp\cX^{d,t,x}_{r,\Omega}\rp \right|\rb \rp
|
||||
\end{align}
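As an aside, the square-root Monte Carlo rate invoked here can be illustrated numerically. The following throwaway \texttt{R} sketch is a generic Monte Carlo example with a known mean (it does not simulate $\cX^{d,t,x}_{r,\Omega}$ itself):
\begin{lstlisting}[language = R, style = rstyle]
# Error of a Monte Carlo average of exp(Z), Z ~ N(0,1), whose true
# mean is exp(1/2); the error decays roughly like n^(-1/2).
set.seed(42)
mc_error <- function(n) abs(mean(exp(rnorm(n))) - exp(1 / 2))
sapply(c(1e2, 1e4, 1e6), mc_error)
\end{lstlisting}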
|
||||
|
||||
Note that Taylor's theorem states that:
|
||||
\begin{align}
|
||||
\exp\lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega}ds\rp = 1 + \int^T_t \alpha_d \circ \cX ^{d,t,x}_{r,\Omega}ds + \frac{1}{2}\lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega }ds\rp^2 + \fR_3
|
||||
\end{align}
|
||||
Where $\fR_3$ is the Lagrange form of the remainder, i.e., $\fR_3 = \frac{e^{\xi}}{3!}\lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega}ds\rp^3$ for some $\xi$ between $0$ and $\int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega}ds$. Thus $\exp\lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega } ds\rp \cdot \fu_d^T\lp \cX^{d,t,x}_{r,\Omega}\rp$ is rendered as:
|
||||
\begin{align}
|
||||
&\exp\lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega } ds\rp \cdot \fu_d^T\lp\cX^{d,t,x}_{r,\Omega}\rp \\
|
||||
&= \fu_d^T\lp\cX^{d,t,x}_{r,\Omega }\rp + \fu_d^T\lp \cX^{d,t,x}_{r,\Omega}\rp \cdot \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega } ds + \frac{1}{2} \fu_d^T \lp \cX^{d,t,x}_{r,\Omega}\rp \cdot \lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega} ds\rp^2 \\
|
||||
&+\fR_3 \cdot \fu_d^T\lp \cX^{d,t,x}_{r,\Omega}\rp
|
||||
\end{align}
|
||||
\end{proof}
|
||||
\begin{corollary}
|
||||
We may see that
|
||||
\end{corollary}
|
||||
Jensen's inequality (in the form $\lp \frac{1}{T-t}\int^T_t g\lp s\rp ds\rp^2 \les \frac{1}{T-t}\int^T_t \lp g\lp s\rp\rp^2 ds$), the fact that $\fu^T$ does not depend on time, and the linearity of integrals give us:
|
||||
\begin{align}
|
||||
&= \fu^T\lp\cX^{d,t,s}_{r,\Omega }\rp + \fu^T\lp \cX^{d,t,s}_{r,\Omega}\rp \cdot \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\omega } ds + \frac{1}{2} \fu^T \lp \cX^{d,t,x}_{r,\Omega}\rp \cdot \lp \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega}ds\rp^2 \nonumber\\
|
||||
&+\fR_3 \cdot \fu^T\lp \cX^{d,t,s}_{r,\Omega}\rp \nonumber\\
|
||||
&\les \fu^T\lp \cX^{d,t,x}_{r,\Omega}\rp + \fu^T\lp \cX^{d,t,s}_{r,\Omega}\rp \cdot \int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega} ds + \frac{1}{2}\fu^T\lp \cX^{d,t,x}_{r,\Omega}\rp\cdot \lp \frac{1}{T-t}\int^T_t \alpha_d \circ \cX^{d,t,x}_{r,\Omega} ds\rp^2 \\ &+ \fR_3\nonumber\\
|
||||
&\les \fu^T\lp \cX^{d,t,x}_{r,\Omega}\rp + \int_t^T\fu^T\lp \cX^{d,t,x}_{r,\Omega} \rp \cdot \alpha_d \circ \cX^{d,t,x}_{r,\Omega} ds + \int^T_t \frac{1}{2\lp T-t\rp}\fu^T\lp \cX^{d,t,x}_{r,\Omega }\rp \cdot \lp \alpha_d \circ \cX^{d,t,x}_{r,\Omega }\rp^2 ds\\ &+ \fR_3\nonumber \\
|
||||
&= \fu^T\lp \cX^{d,t,x}_{r,\Omega} \rp + \int^T_t \fu^T \lp \cX^{d,t,x}_{r,\Omega}\rp \cdot \alpha_d \circ \cX^{d,t,x}_{r,\Omega} + \frac{1}{2\lp T-t\rp}\fu^T\lp \cX^{d,t,x}_{r,\Omega}\rp \cdot \lp\alpha_d \circ \cX^{d,t,x}_{r,\Omega}\rp^2 ds + \fR_3\nonumber
|
||||
\end{align}
|
||||
Thus \cite[Lemma~2.3]{hutzenthaler_strong_2021} with $f \curvearrowleft \fu^T$ tells us that:
|
||||
\begin{align}
|
||||
\E
|
||||
\end{align}
|
||||
|
||||
|
||||
% This renders (\ref{big_eqn_lhs}) as:
|
||||
% \begin{align}
|
||||
|
@ -1020,7 +1050,7 @@ Let $t \in \lp 0,\infty\rp$ and $T \in \lp t,\infty\rp$. Let $\lp \Omega, \mathc
|
|||
|
||||
\tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt
|
||||
|
||||
\begin{tikzpicture}[x=0.75pt,y=0.75pt,yscale=-1,xscale=1]
|
||||
\begin{tikzpicture}[x=0.75pt,y=0.75pt,yscale=-0.9,xscale=0.9]
|
||||
%uncomment if require: \path (0,475); %set diagram left start at 0, and has a height of 475
|
||||
|
||||
%Shape: Rectangle [id:dp5014556157804896]
|
||||
|
|
|
@ -9,12 +9,12 @@ Parts of this code have been released on \texttt{CRAN} under the package name \t
|
|||
|
||||
\lstinputlisting[language = R, style = rstyle, label = activations, caption = {R code for activation functions ReLU and Sigmoid}]{"/Users/shakilrafi/R-simulations/activations.R"}
|
||||
|
||||
\lstinputlisting[language = R, style = rstyle, label = instantiation, caption = {R code for realizations}]{"/Users/shakilrafi/R-simulations/instantiation.R"}
|
||||
\lstinputlisting[language = R, style = rstyle, label = instantiation, caption = {R code for instantiation}]{"/Users/shakilrafi/R-simulations/instantiation.R"}
|
||||
|
||||
\lstinputlisting[language = R, style = rstyle, label = stk, caption = {R code for parallelizing two neural networks}]{"/Users/shakilrafi/R-simulations/stacking.R"}
|
||||
|
||||
|
||||
\lstinputlisting[language = R, style = rstyle, label = Aff, caption = {R code for affine neural networks}]{"/Users/shakilrafi/R-simulations/Aff.R"}
|
||||
\lstinputlisting[language = R, style = rstyle, label = affn, caption = {R code for affine neural networks}]{"/Users/shakilrafi/R-simulations/Aff.R"}
|
||||
|
||||
|
||||
\lstinputlisting[language = R, style = rstyle, label = comp, caption = {R code for composition of two neural networks}]{"/Users/shakilrafi/R-simulations/comp.R"}
|
||||
|
|
|
@ -26,7 +26,7 @@ Substituting (\ref{4.0.1}) and (\ref{4.0.2}) into (\ref{3.3.20}) renders (\ref{
|
|||
v(t,x) &= \E \lb v\lp T, \mathcal{X}_T^{t,x} \rp \rb + \int ^T_t \E \lb f \lp s, \mathcal{X}^{t,x}_s, v \lp s, \mathcal{X}^{t,x}_s \rp \rp ds\rb \nonumber\\
|
||||
v\lp t,x \rp &= \E \lb g\lp \mathcal{X}^{t,x}_T \rp \rb+ \int^T_t \E \lb \lp F \lp v \rp \rp \lp s,\mathcal{X}^{t,x}_s\rp \rb ds\nonumber
|
||||
\end{align}
|
||||
\label{def:1.18}\label{Setting 1.1} Let $d,m \in \mathbb{N}$, $T, \mathfrak{L},p \in [0,\infty)$, $\mathfrak{p} \in [2,\infty)$ $\mathfrak{m} = \mathfrak{k}_{\mathfrak{p}}\sqrt{\mathfrak{p}-1}$, $\Theta = \bigcup_{n\in \mathbb{N}}\mathbb{Z}^n$, $f \in C\lp \lb 0,T \rb \times \R^d \times \R \rp $, $g \in C(\mathbb{R}^d,\mathbb{R})$, let $F: C \lp \lb 0,T \rb \times \R^d, \R \rp \rightarrow C \lp \lb 0,T \rb \times \R^d, \R \rp$ assume for all $t \in [0,T],x\in \mathbb{R}^d$ that:
|
||||
\label{def:1.18}\label{Setting 1.1} Let $d,m \in \mathbb{N}$, $T, \mathfrak{L},p \in [0,\infty)$, $\mathfrak{p} \in [2,\infty)$, $\mathfrak{m} = \mathfrak{k}_{\mathfrak{p}}\sqrt{\mathfrak{p}-1}$, $\Theta = \bigcup_{n\in \mathbb{N}}\mathbb{Z}^n$, \\ $f \in C\lp \lb 0,T \rb \times \R^d \times \R \rp $, $g \in C(\mathbb{R}^d,\mathbb{R})$, let $F: C \lp \lb 0,T \rb \times \R^d, \R \rp \rightarrow C \lp \lb 0,T \rb \times \R^d, \R \rp$, and assume for all $t \in [0,T],x\in \mathbb{R}^d$ that:
|
||||
\begin{align}\label{(1.12)}
|
||||
\lv f\lp t,x,w \rp - f\lp t,x,\mathfrak{w} \rp \rv \leqslant \mathfrak{L} \lv w - \mathfrak{w} \rv &&\max\left\{\lv f \lp t,x,0 \rp \rv, \lv g(x) \rv \right\} \leqslant \mathfrak{L} \lp 1+\|x\|_E^p \rp
|
||||
\end{align}
|
||||
|
|
|
@ -4,9 +4,9 @@ We will present three avenues of further research and related work on parameter
|
|||
|
||||
\section{Further operations and further kinds of neural networks}
|
||||
|
||||
Note, for instance, that several classical operations are done on neural networks that have yet to be accounted for in this framework and talked about in the literature. We will discuss two of them \textit{dropout} and \textit{dilation} and provide lemmas that may be useful to future research.
|
||||
Note, for instance, that several classical operations are done on neural networks that have yet to be accounted for in this framework and talked about in the literature. We will discuss one of them, \textit{dropout}, and provide lemmas that may be useful to future research.
|
||||
|
||||
\subsection{Mergers and Dropout}
|
||||
\subsection{Dropout}
|
||||
|
||||
\begin{definition}[Hadamard Product]
|
||||
Let $m,n \in \N$. Let $A,B \in \R^{m \times n}$. For all $i \in \{ 1,2,\hdots,m\}$ and $j \in \{ 1,2,\hdots,n\}$ define the Hadamard product $\odot: \R^{m\times n} \times \R^{m \times n} \rightarrow \R^{m \times n}$ as:
|
||||
|
@ -33,14 +33,26 @@ We will also define the dropout operator introduced in \cite{srivastava_dropout_
|
|||
|
||||
|
||||
\begin{definition}[Realization with dropout]
|
||||
Let $\nu \in \neu$, $L,n \in \N$, $p \in \lp 0,1\rp$, $\lay \lp \nu\rp = \lp l_0,l_1,\hdots, \l_L\rp$, and that $\neu = \lp \lp W_1,b_1\rp, \lp W_2,b_2\rp, \hdots , \lp W_L,b_L\rp \rp$. Let it be the case that for each $n\in \N$, $\rho_n = \{ x_1,x_2,\hdots,x_n\} \in \R^n$ where for each $i \in \{1,2,\hdots,n\}$ it is the case that $x_i \sim \bern(p)$. We will then denote $\real_{\rect}^{D} \lp \nu \rp \in C\lp \R^{\inn\lp \nu\rp},\R^{\out\lp \nu \rp}\rp$, the continuous function given by:
|
||||
Let $\nu \in \neu$, $L,n \in \N$, $p \in \lp 0,1\rp$, $\lay \lp \nu\rp = \lp l_0,l_1,\hdots, l_L\rp$, and $\nu = \lp \lp W_1,b_1\rp, \lp W_2,b_2\rp, \hdots , \lp W_L,b_L\rp \rp$. Let it be the case that for each $n\in \N$, $\rho_n = \{ x_1,x_2,\hdots,x_n\} \in \R^n$ where for each $i \in \{1,2,\hdots,n\}$ it is the case that $x_i \sim \bern(p)$. We will then denote by $\real_{\rect}^{D,p} \lp \nu \rp \in C\lp \R^{\inn\lp \nu\rp},\R^{\out\lp \nu \rp}\rp$ the continuous function given by:
|
||||
\begin{align}
|
||||
\real_{\rect}^D\lp \nu \rp = \rho_{l_L}\odot \rect \lp W_l\lp \rho_{l_{L-1}} \odot \rect \lp W_{L-1}\lp \hdots\rp + b_{L-1}\rp\rp + b_L\rp
|
||||
\real_{\rect}^{D,p}\lp \nu \rp = \rho_{l_L}\odot \rect \lp W_L\lp \rho_{l_{L-1}} \odot \rect \lp W_{L-1}\lp \hdots\rp + b_{L-1}\rp\rp + b_L\rp
|
||||
\end{align}
|
||||
\end{definition}
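To make the definition concrete, the following small \texttt{R} sketch implements a forward pass that reads the formula above literally (Bernoulli masks applied after every $\rect$ activation, including the output layer); it is illustrative only and is not part of the \texttt{nnR} package:
\begin{lstlisting}[language = R, style = rstyle]
# Forward pass with dropout: weights is a list of matrices, biases a
# list of vectors, x the input, p the Bernoulli parameter of the masks.
relu <- function(x) pmax(x, 0)
dropout_realization <- function(weights, biases, x, p) {
  h <- x
  for (k in seq_along(weights)) {
    mask <- rbinom(length(biases[[k]]), size = 1, prob = p)
    h <- as.vector(mask * relu(weights[[k]] %*% h + biases[[k]]))
  }
  h
}

# Example: a network with layer architecture (2, 3, 1).
set.seed(7)
W <- list(matrix(rnorm(6), 3, 2), matrix(rnorm(3), 1, 3))
b <- list(rnorm(3), rnorm(1))
dropout_realization(W, b, c(1, -1), p = 0.8)
\end{lstlisting}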
|
||||
Dropout is an example of \textit{ensemble learning}, a form of learning where many versions of our model (e.g., random forests or neural networks) are generated (e.g., by dropout for neural networks, or by enforcing a maximum depth for the trees in our forest), and a weighted average of the predictions of these different models is taken to be the predictive model. That such a model can work, and indeed work well, is the subject of \cite{schapire_strength_1990}.
|
||||
|
||||
\subsection{Further Approximants}
|
||||
|
||||
In theory the approximation schemes given in the case of $\xpn_n^{q,\ve}, \csn_n^{q,\ve}$, and $\sne_n^{q,\ve}$ in the previous sections could be used to approximate further transcendental functions, and identities such as the one alluded to in Remark \ref{rem:pyth_idt}. Indeed, recent attempts have been made to approximate backwards and forward Euler methods, as in \cite{grohs2019spacetime}. In fact, this architecture was originally envisioned to approximate Multi-Level Picard iterations, as seen in \cite{ackermann2023deep}. These neural network methods have been proven to beat the curse of dimensionality in the sense that the size of these networks (parameter and depth counts) grows only polynomially with respect to the desired accuracy. In practice, it remains to be seen whether, for larger dimensions, the increased number of operations and architectures to contend with outweighs the benefit of this polynomial growth in parameter counts and depths, especially when it comes to computation time.
|
||||
|
||||
On a similar note, these architectures have so far lacked a consistent implementation in a widely available programming language. Part of the dissertation work has been focused on implementing these architectures as an \texttt{R} package, available on \texttt{CRAN}.
|
||||
|
||||
\subsection{Algebraic Properties of this Framework}
|
||||
|
||||
It is quite straightforward to see that the instantiation operation has suitably functorial properties, at the very least when instantiating with the identity function. More specifically, consider the category \texttt{Mat} whose objects are natural numbers $m,n$, and whose arrows $m \xleftarrow{A} n$ are matrices $A \in \R^{m\times n}$, i.e. linear (and hence continuous) maps between the vector spaces $\R^n$ and $\R^m$ respectively. Consider as well the set of neural networks $\nu \in \neu$ with $\inn\lp \nu \rp = n$ and $\out\lp \nu \rp = m$.
|
||||
\\
|
||||
In such a case, note that the instantiation operation preserves the axiom of functoriality, namely that composition is respected under instantiation. Note also that we have alluded to the fact that, under neural network composition with $\id$ (the appropriate one for our dimension) as the neutral element, this set behaves like a monoid under instantiation.
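In symbols, and as a sketch of the requirement only, the assignment $\nu \mapsto \real_{\mathrm{id}_\R}\lp \nu \rp$ is a functor precisely when instantiation respects composition and sends the chosen identity networks to identity maps, i.e., when for composable $\nu_1,\nu_2 \in \neu$ and every $n \in \N$:
\begin{align}
\real_{\mathrm{id}_\R}\lp \nu_2 \bullet \nu_1\rp = \real_{\mathrm{id}_\R}\lp \nu_2\rp \circ \real_{\mathrm{id}_\R}\lp \nu_1 \rp && \real_{\mathrm{id}_\R}\lp \id_n \rp = \mathrm{id}_{\R^n}
\end{align}
The first of these is essentially the content of Lemma \ref{comp_prop}; identifying the correct choice of identity morphisms is part of the conjecture.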
|
||||
|
||||
A further exploration of the algebraic properties of this framework could present a fruitful avenue of future study.
|
||||
|
||||
This completes the dissertation.
|
||||
|
||||
|
|
|
@ -614,7 +614,33 @@ archivePrefix = {arXiv},
|
|||
|
||||
@Manual{nnR-package,
title = {nnR: Neural Networks Made Algebraic},
author = {Shakil Rafi and Joshua Lee Padgett},
year = {2024},
note = {R package version 0.1.0},
url = {https://github.com/2shakilrafi/nnR/},
}
|
||||
|
||||
|
||||
@misc{ackermann2023deep,
|
||||
title={Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the $L^p$-sense},
|
||||
author={Julia Ackermann and Arnulf Jentzen and Thomas Kruse and Benno Kuckuck and Joshua Lee Padgett},
|
||||
year={2023},
|
||||
eprint={2309.13722},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={math.NA}
}
|
||||
|
||||
@book{graham_concrete_1994,
|
||||
address = {Upper Saddle River, NJ},
|
||||
edition = {2nd edition},
|
||||
title = {Concrete {Mathematics}: {A} {Foundation} for {Computer} {Science}},
|
||||
isbn = {978-0-201-55802-9},
|
||||
shorttitle = {Concrete {Mathematics}},
|
||||
abstract = {This book introduces the mathematics that supports advanced computer programming and the analysis of algorithms. The primary aim of its well-known authors is to provide a solid and relevant base of mathematical skills - the skills needed to solve complex problems, to evaluate horrendous sums, and to discover subtle patterns in data. It is an indispensable text and reference not only for computer scientists - the authors themselves rely heavily on it! - but for serious users of mathematics in virtually every discipline.Concrete Mathematics is a blending of CONtinuous and disCRETE mathematics. "More concretely," the authors explain, "it is the controlled manipulation of mathematical formulas, using a collection of techniques for solving problems." The subject matter is primarily an expansion of the Mathematical Preliminaries section in Knuth's classic Art of Computer Programming, but the style of presentation is more leisurely, and individual topics are covered more deeply. Several new topics have been added, and the most significant ideas have been traced to their historical roots. The book includes more than 500 exercises, divided into six categories. Complete answers are provided for all exercises, except research problems, making the book particularly valuable for self-study.Major topics include:SumsRecurrencesInteger functionsElementary number theoryBinomial coefficientsGenerating functionsDiscrete probabilityAsymptotic methodsThis second edition includes important new material about mechanical summation. In response to the widespread use of the first edition as a reference book, the bibliography and index have also been expanded, and additional nontrivial improvements can be found on almost every page. Readers will appreciate the informal style of Concrete Mathematics. Particularly enjoyable are the marginal graffiti contributed by students who have taken courses based on this material. The authors want to convey not only the importance of the techniques presented, but some of the fun in learning and using them.},
|
||||
language = {English},
|
||||
publisher = {Addison-Wesley Professional},
|
||||
author = {Graham, Ronald and Knuth, Donald and Patashnik, Oren},
|
||||
month = feb,
|
||||
year = {1994},
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@software{Rafi_nnR_2024,
|
||||
author = {Rafi, Shakil},
|
||||
license = {GPL-3.0},
|
||||
month = feb,
|
||||
|
@ -639,6 +665,26 @@ abstract = {Abstract Quantitative estimation of local mechanical properties rema
|
|||
year = {2021}
|
||||
}
|
||||
|
||||
@article{schapire_strength_1990,
|
||||
title = {The strength of weak learnability},
|
||||
volume = {5},
|
||||
issn = {1573-0565},
|
||||
url = {https://doi.org/10.1007/BF00116037},
|
||||
doi = {10.1007/BF00116037},
|
||||
abstract = {This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class islearnable (orstrongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class isweakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent.},
|
||||
language = {en},
|
||||
number = {2},
|
||||
urldate = {2024-03-06},
|
||||
journal = {Mach Learn},
|
||||
author = {Schapire, Robert E.},
|
||||
month = jun,
|
||||
year = {1990},
|
||||
keywords = {learnability theory, learning from examples, Machine learning, PAC learning, polynomial-time identification},
|
||||
pages = {197--227},
|
||||
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
Binary file not shown.
|
@ -14,7 +14,7 @@ We seek here to introduce a unified framework for artificial neural networks. Th
|
|||
\begin{align}
|
||||
\neu = \bigcup_{L\in \N} \bigcup_{l_0,l_1,...,l_L \in \N} \lp \bigtimes^L_{k=1} \lb \R^{l_k \times l_{k-1}} \times \R^{l_k}\rb \rp
|
||||
\end{align}
|
||||
An artificial neural network is a tuple $\lp \nu, \param, \dep, \inn, \out, \hid, \lay, \wid \rp $ where $\nu \in \neu$ and is equipped with the following functions (referred to as auxiliary functions) satisfying for all $\nu \in \lp \bigtimes^L_{k=1} \lb \R^{l_k \times l_{k-1}} \times \R^{l_k}\rb \rp$:
|
||||
An artificial neural network is a tuple $\lp \nu, \param, \dep, \inn, \out, \hid, \lay, \wid \rp $ where $\nu \in \neu$ and is equipped with the following functions (referred to as auxiliary functions) satisfying for all \\$\nu \in \lp \bigtimes^L_{k=1} \lb \R^{l_k \times l_{k-1}} \times \R^{l_k}\rb \rp$:
|
||||
\begin{enumerate}[label = (\roman*)]
|
||||
\item $\param: \neu \rightarrow \N$ denoting the number of parameters of $\nu$, given by:
|
||||
\begin{align}\label{paramdef}
|
||||
|
@ -50,7 +50,7 @@ We seek here to introduce a unified framework for artificial neural networks. Th
|
|||
\end{align}
|
||||
\end{enumerate}
|
||||
\end{definition}
|
||||
Note that this implies that that $\nu = ((W_1,b_1),(W_2,b_2),...(W_L,b_L)) \in \lp \bigtimes^L_{k=1} \lb \R^{l_k \times l_{k-1}} \times \R^{l_k}\rb \rp$. Note that we also denote by $\we_{(\cdot ), \nu}: (\we_{n,\nu})_{n\in \{1,2,...,L\}}: \{1,2,...,L\} \rightarrow \lp \bigcup_{m,k \in \N}\R^{m \times k} \rp $ and also $\bi_{(\cdot),\nu}: \lp \bi_{n,\nu} \rp_{\{1,2,...,L\}}: \{1,2,...,L\} \rightarrow \lp \bigcup_{m \in \N}\R^m \rp$ the functions that satisfy for all $n \in \{1,2,...,L\}$ that $\we_{i,\nu} = W_i$ i.e. the weights matrix for neural network $\nu$ at layer $i$ and $\bi_{i,\nu} = b_i$, i.e. the bias vector for neural network $\nu$ at layer $i$.
|
||||
Note that this implies that $\nu = ((W_1,b_1),(W_2,b_2),...(W_L,b_L)) \in \lp \bigtimes^L_{k=1} \lb \R^{l_k \times l_{k-1}} \times \R^{l_k}\rb \rp$. Note that we denote by $\we_{(\cdot ), \nu}: (\we_{n,\nu})_{n\in \{1,2,...,L\}}: \{1,2,...,L\} \rightarrow \lp \bigcup_{m,k \in \N}\R^{m \times k} \rp $ and also $\bi_{(\cdot),\nu}: \lp \bi_{n,\nu} \rp_{\{1,2,...,L\}}: \{1,2,...,L\} \rightarrow \lp \bigcup_{m \in \N}\R^m \rp$ the functions that satisfy for all $n \in \{1,2,...,L\}$ that $\we_{i,\nu} = W_i$ i.e. the weights matrix for neural network $\nu$ at layer $i$ and $\bi_{i,\nu} = b_i$, i.e. the bias vector for neural network $\nu$ at layer $i$.
|
||||
|
||||
We will call $l_0$ the \textit{starting width} and $l_L$ the \textit{finishing width}. Together, they will be referred to as \textit{end-widths}.
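For concreteness, the auxiliary functions above admit a very short base-\texttt{R} sketch for a network stored as a plain list of weight--bias pairs. The helper names below are illustrative only, mirroring $\lay$, $\dep$, $\inn$, $\out$, $\hid$, and $\param$; they are not an existing package API.
\begin{lstlisting}[language=R]
# nu: a list of layers, each a list(W = <matrix>, b = <vector>), as in the definition above.
lay   <- function(nu) c(ncol(nu[[1]]$W), sapply(nu, function(l) nrow(l$W))) # (l_0, l_1, ..., l_L)
dep   <- function(nu) length(nu)                                            # L
inn   <- function(nu) ncol(nu[[1]]$W)                                       # l_0
out   <- function(nu) nrow(nu[[dep(nu)]]$W)                                 # l_L
hid   <- function(nu) dep(nu) - 1                                           # L - 1
param <- function(nu) sum(sapply(nu, function(l) length(l$W) + length(l$b)))

nu <- list(list(W = matrix(0, 4, 4), b = numeric(4)),
           list(W = matrix(0, 3, 4), b = numeric(3)),
           list(W = matrix(0, 2, 3), b = numeric(2)))
lay(nu)    # 4 4 3 2
param(nu)  # 4*(4+1) + 3*(4+1) + 2*(3+1) = 43
\end{lstlisting}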
|
||||
\begin{remark}
|
||||
|
@ -62,7 +62,7 @@ We seek here to introduce a unified framework for artificial neural networks. Th
|
|||
\end{remark}
|
||||
|
||||
|
||||
\begin{definition}[Instantiations of Artificial Neural Networks with Activation Functions]\label{def:rlz}
|
||||
\begin{definition}[Instantiations of Artificial Neural Networks with Activation Functions]\label{def:rlz}\label{def:inst}
|
||||
Let $\act \in C \lp \R, \R \rp$. We denote by $\real_{\act}: \neu \rightarrow \lp \bigcup_{k,l \in \N} C \lp \R^k, \R^l \rp \rp$ the function satisfying for all $L \in \N$, $l_0,l_1,...,l_L \in \N$, $\nu = \lp \lp W_1, b_1 \rp , \lp W_2, b_2\rp ,...,\lp W_L, b_L \rp \rp \in \lp \bigtimes^L_{k=1} \lb \R^{l_k \times l_{k-1}} \times \R^{l_k}\rb \rp$, $x_0 \in \R^{l_0}, x_1 \in \R^{l_1},...,x_{L-1} \in \R^{l_{L-1}}$, and with $\forall k \in \N \cap (0,L):x_k = \act \lp \lb W_kx_{k-1}+b_k \rb_{*,*} \rp$ such that:
|
||||
\begin{align}\label{5.1.11}
|
||||
\real_{\act}\lp \nu \rp \in C \lp \R^{l_0}, \R^{l_L} \rp & \text{ and } & \lp \real_{\act}\lp \nu\rp \rp \lp x_0 \rp = W_Lx_{L-1}+b_L
|
||||
|
@ -84,7 +84,7 @@ We seek here to introduce a unified framework for artificial neural networks. Th
|
|||
\caption{A neural network $\nu$ with $\lay \lp \nu \rp = \lp 4,4,3,2\rp$}
|
||||
\end{figure}
|
||||
\begin{remark}
|
||||
For an R implementation see Listings \ref{nn_creator}, \ref{aux_fun}, \ref{activations}, and \ref{instantiation}
|
||||
For an R implementation see Listings \ref{nn_creator}, \ref{aux_fun}, \ref{activations}, and \ref{instantiation}.
|
||||
\end{remark}
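Independently of the package listings referenced in the remark above, the instantiation operation itself admits a very short base-\texttt{R} sketch (illustrative only): the activation is applied componentwise on every layer except the last.
\begin{lstlisting}[language=R]
rect <- function(x) pmax(x, 0)           # the ReLU activation

inst <- function(nu, act) {              # returns the function realized by nu under act
  function(x) {
    L <- length(nu)
    for (k in seq_len(L - 1)) x <- act(nu[[k]]$W %*% x + nu[[k]]$b)
    as.vector(nu[[L]]$W %*% x + nu[[L]]$b)
  }
}

nu <- list(list(W = matrix(rnorm(12), 3, 4), b = rnorm(3)),
           list(W = matrix(rnorm(6), 2, 3), b = rnorm(2)))
f <- inst(nu, rect)                      # an element of C(R^4, R^2)
f(c(1, 0, -1, 2))
\end{lstlisting}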
|
||||
|
||||
\begin{lemma}\label{5.1.8}
|
||||
|
@ -109,9 +109,9 @@ We seek here to introduce a unified framework for artificial neural networks. Th
|
|||
The first operation we want to be able to do is to compose neural networks. Note that the composition is not done in an obvious way; for instance, the first layer of the first component of the composition is superimposed onto the last layer of the second component of the composition.
|
||||
\subsection{Composition}
|
||||
\begin{definition}[Compositions of ANNs]\label{5.2.1}\label{def:comp}
|
||||
We denote by $\lp \cdot \rp \bullet \lp \cdot \rp: \{ \lp \nu_1,\nu_2 \rp \in \neu \times \neu: \inn(\nu_1) = \out (\nu_1) \} \rightarrow \neu$ the function satisfying for all $L,M \in \N, l_0,l_1,...,l_L, m_0, m_1,...,m_M \in \N$, $\nu_1 = \lp \lp W_1, b_1 \rp, \lp W_2, b_2 \rp,...,\lp W_L,b_L \rp \rp \in \lp \bigtimes^L_{k=1} \lb \R^{l_k \times l_{k-1}} \times \R^{l_k}\rb \rp$, and $\nu_2 = \lp \lp W'_1, b'_1 \rp, \lp W'_2, b'_2 \rp,... \lp W'_M, b'_M \rp \rp \in \lp \bigtimes^M_{k=1} \lb \R^{m_k \times m_{k-1}} \times \R^{m_k}\rb \rp$ with $l_0 = \inn(\nu_1)= \out(\nu_2) = m_M$ and :
|
||||
We denote by $\lp \cdot \rp \bullet \lp \cdot \rp: \{ \lp \nu_1,\nu_2 \rp \in \neu \times \neu: \inn(\nu_1) = \out (\nu_2) \} \rightarrow \neu$ the function satisfying for all $L,M \in \N, l_0,l_1,...,l_L, m_0, m_1,...,m_M \in \N$, $\nu_1 = \lp \lp W_1, b_1 \rp, \lp W_2, b_2 \rp,...,\lp W_L,b_L \rp \rp \in \lp \bigtimes^L_{k=1} \lb \R^{l_k \times l_{k-1}} \times \R^{l_k}\rb \rp$, and $\nu_2 = \\ \lp \lp W'_1, b'_1 \rp, \lp W'_2, b'_2 \rp,...,\lp W'_M, b'_M \rp \rp \in \lp \bigtimes^M_{k=1} \lb \R^{m_k \times m_{k-1}} \times \R^{m_k}\rb \rp$ with $l_0 = \inn(\nu_1)= \out(\nu_2) = m_M$ and:
|
||||
\begin{align}\label{5.2.1}
|
||||
\nu_1 \bullet \nu_2 = \begin{cases}
|
||||
&\nu_1 \bullet \nu_2 =\\ &\begin{cases}
|
||||
(( W'_1,b'_1 ), ( W'_2,b'_2 ), ...( W'_{M-1}, b'_{M-1}), ( W_1W'_M, W_1b'_{M} + b_1), (W_2, b_2 ),\\..., ( W_L,b_L )) & :( L> 1 ) \land ( M > 1 ) \\
|
||||
((W_1W'_1,W_1b'_1+b_1),(W_2,b_2), (W_3,b_3),...,(W_L,b_L)) & :(L>1) \land (M=1) \\
|
||||
((W'_1, b'_1),(W'_2,b'_2), ..., (W'_{M-1}, b'_{M-1}),(W_1W'_M, W_1b'_M + b_1)) &:(L=1) \land (M>1) \\
|
||||
|
@ -253,7 +253,9 @@ The following Lemma will be important later on, referenced numerous times, and f
|
|||
\end{align}
|
||||
This and (\ref{comp_cont}) then prove Item (v), hence proving the lemma.
|
||||
\end{proof}
|
||||
\section{Stacking of ANNs of Equal Depth}
|
||||
\section{Stacking of ANNs}
|
||||
We will introduce here the important concept of stacking of ANNs. Given an input vector $x\in \R^d$, it is sometimes very helpful to imagine two neural networks acting on it simultaneously; this is what stacking formalizes. Because vectors are ordered tuples, stacking $\nu_1$ and $\nu_2$ is not necessarily the same as stacking $\nu_2$ and $\nu_1$.
|
||||
\subsection{Stacking of ANNs of Equal Depth}
|
||||
\begin{definition}[Stacking of ANNs of same depth]\label{5.2.5}\label{def:stacking}
|
||||
Let $L,n\in \N$, and let $\nu_1,\nu_2,\hdots, \nu_n \in \neu$, such that $\dep\lp \nu_1\rp= \dep \lp \nu_2\rp= \cdots = \dep\lp \nu_n\rp = L$. As such, for all $i \in \{1,\hdots,n\}$, let it also be the case that $\nu_i = \lp \lp W_1^i,b^i_1\rp, \lp W^i_2,b^i_2\rp,\hdots, \lp W_L^i,b_L^i\rp \rp$. We then denote by $\boxminus^n_{i=1}\nu_i$ the neural network whose layer architecture is given by:
|
||||
\begin{align*}
|
||||
|
@ -269,7 +271,7 @@ The following Lemma will be important later on, referenced numerous times, and f
|
|||
Let $\nu_1,\nu_2\in \neu$, with $\dep\lp \nu_1\rp = \dep\lp \nu_2\rp$, $x_1 \in \R^{m_1}$, $x_2 \in \R^{m_2}$, and $\mathfrak{x} = x_1 \frown x_2 \in \R^{m_1+m_2}$. Let $\inst_{\rect}\lp \nu_1\rp: \R^{m_1} \rightarrow \R^{n_1}$ and $\inst_{\rect}\lp \nu_2\rp:\R^{m_2} \rightarrow \R^{n_2}$. It is then the case that $\real_{\rect}\lp \nu_1\boxminus\nu_2\rp\lp \mathfrak{x}\rp = \inst_{\rect}\lp \nu_1\rp\lp x_1\rp \frown \inst_{\rect}\lp \nu_2\rp\lp x_2\rp$.
|
||||
\end{lemma}
|
||||
\begin{proof}
|
||||
Let $\lay\lp \nu_1\rp = \lp \lp W_1,b_1 \rp,\lp W_2,b_2\rp,\hdots, \lp W_L,b_L\rp\rp$ and $\lay \lp \nu_2\rp = \lp \lp \fW_1, \fb_1\rp, \lp \fW_2,\fb_2\rp,\hdots, \lp \fW_L,\fb_L\rp\rp$, and as such it is the case according to Definition \ref{def:stacking} that:
|
||||
Let $\nu_1 = \lp \lp W_1,b_1 \rp,\lp W_2,b_2\rp,\hdots, \lp W_L,b_L\rp\rp$ and \\ $\nu_2 = \lp \lp \fW_1, \fb_1\rp, \lp \fW_2,\fb_2\rp,\hdots,\lp \fW_L,\fb_L\rp\rp$, and as such it is the case according to Definition \ref{def:stacking} that:
|
||||
\begin{align*}
|
||||
\nu_1 \boxminus\nu_2 = \lp \lp \diag\lp W_1,\fW_1\rp , b_1 \frown \fb_1\rp,\right.\\ \left.\lp \diag\lp W_2,\fW_2\rp , b_2 \frown \fb_2\rp, \right.\\ \left. \vdots \hspace{2.5cm}\right.\\ \left. \lp \diag\lp W_L,\fW_L\rp , b_L \frown \fb_L\rp\rp
|
||||
\end{align*}
|
||||
|
@ -369,11 +371,11 @@ The following Lemma will be important later on, referenced numerous times, and f
|
|||
Let $\act \in C \lp \R, \R \rp$, $n \in \N$, and $\nu = \boxminus_{i=1}^n \nu_i$ satisfy the condition that $\dep(\nu_1) = \dep(\nu_2) =...=\dep(\nu_n)$. It is then the case that $\real_{\act} \lp \nu \rp \in C \lp \R^{\sum_{i=1}^n \inn(\nu_i)}, \R^{\sum^n_{i=1}\out(\nu_i)} \rp $
|
||||
\end{lemma}
|
||||
\begin{proof}
|
||||
Let $L = \dep(\nu_1)$, and let $l_{i,0},l_{i,1}...l_{i,L} \in \N$ satisfy for all $i \in \{ 1,2,...,n\}$ that $\lay(\nu_i) = \lp l_{i,0}, l_{i,1},...,l_{i,L} \rp $. Furthermore let $\lp \lp W_{i,1},b_{i,1}\rp, \lp W_{i,2},b_{i,2} \rp , ..., \lp W_{i,L},b_{i,L} \rp \rp \in \lp \bigtimes^L_{j=1} \lb \R^{l_{i,j} \times l_{i,j-1}} \times \R^{l_{i,j}} \rb \rp $ satisfy for all $i \in \{ 1,2,...,n\}$ that:
|
||||
Let $L = \dep(\nu_1)$, and let $l_{i,0},l_{i,1}...l_{i,L} \in \N$ satisfy for all $i \in \{ 1,2,...,n\}$ that $\lay(\nu_i) = \lp l_{i,0}, l_{i,1},...,l_{i,L} \rp $. Furthermore let $\lp \lp W_{i,1},b_{i,1}\rp, \lp W_{i,2},b_{i,2} \rp , ..., \lp W_{i,L},b_{i,L} \rp \rp \in \\ \lp \bigtimes^L_{j=1} \lb \R^{l_{i,j} \times l_{i,j-1}} \times \R^{l_{i,j}} \rb \rp $ satisfy for all $i \in \{ 1,2,...,n\}$ that:
|
||||
\begin{align}
|
||||
\nu_i = \lp \lp W_{i,1},b_{i,1} \rp , \lp W_{i,2}, b_{i,2}\rp ,...,\lp W_{i,L},b_{i,L} \rp \rp
|
||||
\end{align}
|
||||
Let $\alpha_j \in \N$ with $j \in \{0,1,...,L\}$ satisfy that $\alpha_j = \sum^n_{i=1} l_{i,j}$ and let $\lp \lp A_1,b_1 \rp, \lp A_2,b_2 \rp,...,\lp A_L,b_L \rp \rp \in \lp \bigtimes^L_{j=1} \lb \R^{\alpha_{j} \times \alpha_{j-1}} \times \R^{\alpha_{j}} \rb \rp $ satisfy that:
|
||||
Let $\alpha_j \in \N$ with $j \in \{0,1,...,L\}$ satisfy that $\alpha_j = \sum^n_{i=1} l_{i,j}$ and let \\ $\lp \lp A_1,b_1 \rp, \lp A_2,b_2 \rp,...,\lp A_L,b_L \rp \rp \in \lp \bigtimes^L_{j=1} \lb \R^{\alpha_{j} \times \alpha_{j-1}} \times \R^{\alpha_{j}} \rb \rp $ satisfy that:
|
||||
\begin{align}\label{5.3.5}
|
||||
\boxminus_{i=1}^n \nu_i = \lp \lp A_1,b_1 \rp, \lp A_2,b_2 \rp,...,\lp A_L,b_L \rp \rp
|
||||
\end{align}
|
||||
|
@ -389,10 +391,10 @@ The following Lemma will be important later on, referenced numerous times, and f
|
|||
This proves the lemma.
|
||||
\end{proof}
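A base-\texttt{R} sketch of stacking for two networks of equal depth may help fix ideas (illustrative helper names only): the weight matrices are placed in block-diagonal form, the biases are concatenated, and the ReLU realization of the stack is the concatenation of the individual realizations, as in the lemma above.
\begin{lstlisting}[language=R]
rect <- function(x) pmax(x, 0)
inst <- function(nu, act) function(x) {
  L <- length(nu)
  for (k in seq_len(L - 1)) x <- act(nu[[k]]$W %*% x + nu[[k]]$b)
  as.vector(nu[[L]]$W %*% x + nu[[L]]$b)
}

blockdiag <- function(A, B) {            # diag(A, B) as used in the stacking definition
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}
stack2 <- function(nu1, nu2) {           # nu1 boxminus nu2 for dep(nu1) = dep(nu2)
  mapply(function(l1, l2) list(W = blockdiag(l1$W, l2$W), b = c(l1$b, l2$b)),
         nu1, nu2, SIMPLIFY = FALSE)
}

nu1 <- list(list(W = matrix(rnorm(6), 3, 2), b = rnorm(3)),
            list(W = matrix(rnorm(3), 1, 3), b = rnorm(1)))
nu2 <- list(list(W = matrix(rnorm(4), 2, 2), b = rnorm(2)),
            list(W = matrix(rnorm(4), 2, 2), b = rnorm(2)))
x1 <- c(1, -1); x2 <- c(0.5, 2)
all.equal(inst(stack2(nu1, nu2), rect)(c(x1, x2)),
          c(inst(nu1, rect)(x1), inst(nu2, rect)(x2)))        # TRUE
\end{lstlisting}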
|
||||
|
||||
\section{Stacking of ANNs of Unequal Depth}
|
||||
We will often encounter neural networks that we want to stack but have unequal depth. Definition \ref{5.2.5} only deals with neural networks of the same depth. We will facilitate this situation by introducing a form of ``padding" for our neural network. Hence, they come out to the same length before stacking them. This padding will be via the "tunneling" neural network, as shown below.
|
||||
\subsection{Stacking of ANNs of Unequal Depth}
|
||||
We will often encounter neural networks that we want to stack but have unequal depth. Definition \ref{5.2.5} only deals with neural networks of the same depth. We will facilitate this situation by introducing a form of padding for our shorter neural network. Hence, they come out to the same length before stacking them. This padding will be via the tunneling neural network, as shown below.
|
||||
\begin{definition}[Identity Neural Network]\label{7.2.1}
|
||||
We will denote by $\id_d \in \neu$ the neural network satisfying for all $d \in \N$ that:
|
||||
Let $d\in \N$. We will denote by $\id_d \in \neu$ the neural network satisfying:
|
||||
\begin{enumerate}[label = (\roman*)]
|
||||
\item \begin{align}
|
||||
\id_1 = \lp \lp \begin{bmatrix}
|
||||
|
@ -411,7 +413,7 @@ We will often encounter neural networks that we want to stack but have unequal d
|
|||
For $d>1$.
|
||||
\end{enumerate}
|
||||
\begin{remark}
|
||||
We will discuss some properties of $\id$ in Section \ref{sec_tun}.
|
||||
We will discuss some properties of $\id_d$ in Section \ref{sec_tun}.
|
||||
\end{remark}
|
||||
\end{definition}
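As a quick base-\texttt{R} check (illustrative, and assuming the usual choice of $\id_1$ with first layer $\lp \lb 1 \; -1 \rb^T, \lb 0 \; 0\rb^T\rp$ and second layer $\lp \lb 1 \; -1\rb, 0\rp$), the ReLU instantiation of $\id_1$ is indeed the identity on $\R$, since $\rect(x)-\rect(-x)=x$.
\begin{lstlisting}[language=R]
rect <- function(x) pmax(x, 0)
id_1 <- list(list(W = matrix(c(1, -1), 2, 1), b = c(0, 0)),
             list(W = matrix(c(1, -1), 1, 2), b = 0))
inst_id1 <- function(x) {
  as.vector(id_1[[2]]$W %*% rect(id_1[[1]]$W %*% x + id_1[[1]]$b) + id_1[[2]]$b)
}
sapply(c(-2, 0, 3.5), inst_id1)   # -2.0 0.0 3.5
\end{lstlisting}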
|
||||
\begin{definition}[The Tunneling Neural Network]
|
||||
|
@ -490,8 +492,9 @@ Diagrammatically, this can be thought of as:
|
|||
\end{proof}
|
||||
|
||||
\section{Affine Linear Transformations as ANNs and Their Properties.}
|
||||
Affine neural networks present an important class of neural networks. By virtue of being only one layer deep, they may be instantiated with any activation function whatsoever and still retain their affine transformative properties, see Definition \ref{def:inst}. In addition, when composing, they are subsumed into the network they are composed with, i.e. they do not change the depth of a neural network once composed into it, see Lemma \ref{comp_prop}.
|
||||
\begin{definition}\label{5.3.1}\label{def:aff}
|
||||
Let $m,n \in \N$, $W \in \R^{m \times n}$, $b \in \R^m$.We denote by $\aff_{W,b} \in \lp \R^{m\times n} \times \R^m \rp \subseteq \neu$ the neural network given by $\aff_{W,b} = ((W,b))$.
|
||||
Let $m,n \in \N$, $W \in \R^{m \times n}$, $b \in \R^m$. We denote by $\aff_{W,b} \in \lp \R^{m\times n} \times \R^m \rp \subsetneq \neu$ the neural network given by $\aff_{W,b} = ((W,b))$.
|
||||
\end{definition}
|
||||
\begin{lemma}\label{5.3.2}\label{aff_prop}
|
||||
Let $m,n \in \N$, $W \in \R^{m\times n}$, $b \in \R^m$. It is then the case that:
|
||||
|
@ -502,7 +505,7 @@ Diagrammatically, this can be thought of as:
|
|||
\end{enumerate}
|
||||
\end{lemma}
|
||||
\begin{proof}
|
||||
Note that $(i)$ is a consequence of Definition \ref{5.1.2} and \ref{5.3.1}. Note next that $\aff_{W,b} = (W,b) \in (\R^{m\times n} \times \R^m) \subseteq \neu$. Note that ($\ref{5.1.11}$) then tells us that $\real_{\act} (\aff_{W,b}) = Wx+b$ which in turn proves $(ii)$ and $(iii)$
|
||||
Note that $(i)$ is a consequence of Definitions \ref{5.1.2} and \ref{5.3.1}. Note next that $\aff_{W,b} = ((W,b)) \in (\R^{m\times n} \times \R^m) \subsetneq \neu$. Note that ($\ref{5.1.11}$) then tells us that for all $x \in \R^n$ it holds that $\lp \real_{\act} (\aff_{W,b}) \rp \lp x \rp = Wx+b$, which in turn proves $(ii)$ and $(iii)$.
|
||||
\end{proof}
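As a base-\texttt{R} illustration (helper names ad hoc) of why affine networks are activation-agnostic: a depth-one network never has the activation applied, so every instantiation of $\aff_{W,b}$ is the map $x \mapsto Wx+b$, and its parameter count is the number of entries of $W$ plus the number of entries of $b$.
\begin{lstlisting}[language=R]
aff <- function(W, b) list(list(W = W, b = b))           # Aff_{W,b}: a depth-one network

W <- matrix(rnorm(6), 2, 3); b <- rnorm(2)
nu_aff <- aff(W, b)

# Depth one: no activation is ever applied, so the realization is x -> Wx + b.
inst_aff <- function(nu, x) as.vector(nu[[1]]$W %*% x + nu[[1]]$b)
x <- c(1, -1, 0.5)
all.equal(inst_aff(nu_aff, x), as.vector(W %*% x + b))   # TRUE
length(W) + length(b)                                    # param(Aff_{W,b}) = 2*3 + 2 = 8
\end{lstlisting}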
|
||||
\begin{remark}\label{remark:5.4.3}\label{param_of_aff}
|
||||
Given $W\in \R^{m\times n}$ and $b \in \R^{m}$, it is the case that, according to Definition (\ref{paramdef}), we have $\param(\aff_{W,b})= mn + m$
|
||||
|
@ -552,14 +555,14 @@ Diagrammatically, this can be thought of as:
|
|||
\begin{proof}
|
||||
Let it be the case that $\lay \lp \nu\rp = \lp l_0,l_1,...,l_L\rp$ for $l_0,l_1,...,l_L,L \in \N$. Lemma \ref{5.3.3}, Item (i), and Lemma \ref{comp_prop} then tells us that:
|
||||
\begin{align}
|
||||
\param \lp \aff_{W,b} \bullet \nu \rp &= \lb \sum^{L-1}_{m=1} l_m \lp l_{m-1}+1\rp\rb + \out \lp \aff_{W,b}\rp \lp l_{L-1}+1\rp \nonumber \\
|
||||
&\param \lp \aff_{W,b} \bullet \nu \rp\\ &= \lb \sum^{L-1}_{m=1} l_m \lp l_{m-1}+1\rp\rb + \out \lp \aff_{W,b}\rp \lp l_{L-1}+1\rp \nonumber \\
|
||||
&= \lb \sum^{L-1}_{m=1} l_m \lp l_{m-1}+1 \rp\rb+ \lb \frac{\out\lp \aff_{W,b}\rp}{l_L}\rb l_L\lp l_{L-1}+1 \rp \nonumber \\
|
||||
&\les \lb \max \left\{ 1, \frac{\out(\aff_{W,b})}{l_L}\right\}\rb \lb \sum^{L-1}_{m=1} l_m \lp l_{m-1}+1\rp\rb + \lb \max\left\{ 1,\frac{\out\lp \aff_{W,b}\rp}{l_L}\right\}\rb l_L \lp l_{L-1}+1\rp \nonumber\\
|
||||
&= \lb \max\left\{ 1, \frac{\out \lp \aff_{W,b}\rp}{l_L}\right\}\rb \lb \sum^L_{m=1}l_m \lp l_{m-1} +1\rp\rb = \lb \max\left\{ 1, \frac{\out \lp \aff_{W,b}\rp}{l_L}\right\}\rb \param \lp \nu\rp \nonumber
|
||||
\end{align}
|
||||
and further that:
|
||||
\begin{align}
|
||||
\param \lp \nu \bullet\aff_{W,b} \rp &= \lb \sum^{L}_{m=2} l_m \lp l_{m-1}+1\rp\rb + l_{1}\lp \inn \lp \aff_{W,b}\rp+1\rp \nonumber \\
|
||||
&\param \lp \nu \bullet\aff_{W,b} \rp \\ &= \lb \sum^{L}_{m=2} l_m \lp l_{m-1}+1\rp\rb + l_{1}\lp \inn \lp \aff_{W,b}\rp+1\rp \nonumber \\
|
||||
&= \lb \sum^{L}_{m=2} l_m \lp l_{m-1}+1 \rp\rb+ \lb \frac{\inn \lp \aff_{W,b}\rp+1}{l_0+1}\rb l_1\lp l_{0}+1 \rp \nonumber \\
|
||||
&\les \lb \max \left\{ 1, \frac{\inn(\aff_{W,b})+1}{l_0+1}\right\}\rb \lb \sum^{L}_{m=2} l_m \lp l_{m-1}+1\rp\rb + \lb \max\left\{ 1,\frac{\inn\lp \aff_{W,b}\rp+1}{l_0+1}\right\}\rb l_1 \lp l_{0}+1\rp \nonumber\\
|
||||
&= \lb \max\left\{ 1, \frac{\inn \lp \aff_{W,b}\rp+1}{l_0+1}\right\}\rb \lb \sum^L_{m=1}l_m \lp l_{m-1} +1\rp\rb = \lb \max\left\{ 1, \frac{\inn \lp \aff_{W,b}\rp+1}{\inn\lp \nu\rp+1}\right\}\rb \param \lp \nu\rp \nonumber
|
||||
|
@ -575,7 +578,7 @@ Diagrammatically, this can be thought of as:
|
|||
|
||||
\section{Sums of ANNs of Same End-widths}
|
||||
|
||||
\begin{definition}[The $\cpy$ Network]\label{def:cpy}
|
||||
\begin{definition}[The $\cpy_{n,k}$ Network]\label{def:cpy}
|
||||
We define the neural network, $\cpy_{n,k} \in \neu$ for $n,k\in \N$ as the neural network given by:
|
||||
\begin{align}
|
||||
\cpy_{n,k} = \aff_{\underbrace{\lb \mathbb{I}_{k} \: \mathbb{I}_k \: \cdots \: \mathbb{I}_k \rb^T}_{n-\text{many}},\mymathbb{0}_{nk}}
|
||||
|
@ -584,7 +587,7 @@ Diagrammatically, this can be thought of as:
|
|||
\end{definition}
|
||||
|
||||
\begin{remark}
|
||||
See Listing \ref{affn}
|
||||
See Listing \ref{affn}.
|
||||
\end{remark}
|
||||
\begin{lemma}\label{dep_cpy}\label{lem:param_cpy}
|
||||
Let $n,k \in \N$ and let $\cpy_{n,k} \in \neu$. It is then the case that:
|
||||
|
@ -593,11 +596,11 @@ Diagrammatically, this can be thought of as:
|
|||
\item $\param\lp \cpy_{n,k} \rp = nk^2+nk$
|
||||
\end{enumerate}
|
||||
\begin{proof}
|
||||
Note that $(i)$ is a consequence of Definition \ref{5.3.1} and (ii) follows from the structure of $\cpy_{n,k}$.
|
||||
Note that $(i)$ is a consequence of Definition \ref{5.3.1}, and (ii) follows from the structure of $\cpy_{n,k}$.
|
||||
\end{proof}
|
||||
|
||||
\end{lemma}
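A base-\texttt{R} sketch of the copying network (illustrative only): the weight matrix of $\cpy_{n,k}$ stacks $n$ copies of $\mathbb{I}_k$, the bias is the zero vector in $\R^{nk}$, and the parameter count is $nk^2+nk$ as in the lemma above.
\begin{lstlisting}[language=R]
cpy <- function(n, k) {
  W <- do.call(rbind, replicate(n, diag(k), simplify = FALSE))  # n stacked copies of I_k
  list(list(W = W, b = numeric(n * k)))                         # an affine network with zero bias
}

nu_cpy <- cpy(3, 2)
as.vector(nu_cpy[[1]]$W %*% c(1, -1))              # 1 -1 1 -1 1 -1: three copies of the input
length(nu_cpy[[1]]$W) + length(nu_cpy[[1]]$b)      # param(cpy_{3,2}) = 3*2^2 + 3*2 = 18
\end{lstlisting}
The summation network $\sm_{n,k}$ defined next is the analogous construction with the $n$ copies of $\mathbb{I}_k$ placed side by side.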
|
||||
\begin{definition}[The $\sm$ Network]\label{def:sm}
|
||||
\begin{definition}[The $\sm_{n,k}$ Network]\label{def:sm}
|
||||
We define the neural network $\sm_{n,k}$ for $n,k \in \N$ as the neural network given by:
|
||||
\begin{align}
|
||||
\sm_{n,k} = \aff_{\underbrace{\lb \mathbb{I}_k \: \mathbb{I}_k \: \cdots \: \mathbb{I}_k\rb}_{n-\text{many}}, \mymathbb{0}_{k}}
|
||||
|
@ -784,8 +787,8 @@ Diagrammatically, this can be thought of as:
|
|||
\end{align}
|
||||
Applying Claim \ref{5.4.5} and especially the third case of Definition \ref{5.2.1} to the above then gives us:
|
||||
\begin{align}
|
||||
&\aff_{\lb \mathbb{I}_{\out(\nu_1)} \: \mathbb{I}_{\out(\nu_1)} \rb,0}\bullet \lb \nu_1 \boxminus \nu_2 \rb \bullet \aff_{\lb \mathbb{I}_{\inn \lp \nu_2 \rp } \: \mathbb{I}_{\inn \lp \nu_2 \rp} \rb^T,0} \nonumber\\
|
||||
&= \lp \lp \begin{bmatrix}
|
||||
&\aff_{\lb \mathbb{I}_{\out(\nu_1)} \: \mathbb{I}_{\out(\nu_1)} \rb,0}\bullet \lb \nu_1 \boxminus \nu_2 \rb \bullet \aff_{\lb \mathbb{I}_{\inn \lp \nu_2 \rp } \: \mathbb{I}_{\inn \lp \nu_2 \rp} \rb^T,0} = \nonumber\\
|
||||
&\lp \lp \begin{bmatrix}
|
||||
W_1 \\
|
||||
W'_1
|
||||
\end{bmatrix} ,\begin{bmatrix}
|
||||
|
@ -893,27 +896,29 @@ Diagrammatically, this can be thought of as:
|
|||
W'_1x+b_1'
|
||||
\end{bmatrix} \nonumber
|
||||
\end{align}
|
||||
The full instantiation of (\ref{5.4.10}) is then given by:
|
||||
The full instantiation of (\ref{5.4.10}) with activation function $\act \in C \lp \R, \R\rp$ is then given by:
|
||||
\begin{align}
|
||||
\begin{bmatrix}
W_L \quad W'_L
\end{bmatrix}\begin{bmatrix}
\act\lp W_{L-1}(...\act(W_2\lp \act\lp W_1x+b_1 \rp\rp + b_2) + ... )+ b_{L-1} \rp\\
\act\lp W'_{L-1}(...\act(W'_2\lp\act\lp W'_1x + b'_1 \rp \rp + b'_2)+...)+b'_{L-1}\rp
\end{bmatrix} + b_L+b'_L \label{5.4.12}
|
||||
\end{align}
|
||||
The full instantiation of (\ref{5.4.11}) is then given by:
|
||||
\begin{align}
|
||||
\begin{bmatrix}
W_L' \quad W_L
\end{bmatrix}\begin{bmatrix}
\act\lp W'_{L-1}(...\act(W'_2\lp \act \lp W'_1x+b'_1 \rp\rp + b'_2) + ... )+ b'_{L-1}\rp \\
\act\lp W_{L-1}(...\act(W_2 \lp\act\lp W_1x + b_1 \rp\rp + b_2)+...)+b_{L-1} \rp
\end{bmatrix} + b_L+b'_L \label{5.4.13}
|
||||
\end{align}
|
||||
Since (\ref{5.4.12}) and (\ref{5.4.13}) are the same this proves that $\nu_1 \oplus \nu_2 = \nu_2 \oplus \nu_1$.
|
||||
\end{proof}
|
||||
\begin{remark}
|
||||
This is a special case of \cite[Lemma~3.28]{Grohs_2022}.
|
||||
\end{remark}
|
||||
\begin{lemma}\label{5.4.7}
|
||||
Let $ l_0,l_1,...,l_L \in \N$. Let $\nu \in \neu$ with $\lay(\nu) = \lp l_0,l_1,...,l_L \rp$. There then exists a neural network $\zero_{l_0,l_1,...,l_L} \in \neu$ such that $\real(\nu \oplus \zero_{l_0,l_1,...,l_L}) = \real(\zero_{l_0,l_1,...,l_L} \oplus \nu) = \real(\nu) $.
|
||||
\end{lemma}
|
||||
|
@ -1040,7 +1045,7 @@ Diagrammatically, this can be thought of as:
|
|||
\end{lemma}
|
||||
|
||||
\begin{proof}
|
||||
This is the consequence of a finite number of applications of Lemma \ref{5.5.11}.
|
||||
This is the consequence of a finite number of applications of Lemma \ref{5.5.11}. This proves the Lemma.
|
||||
\end{proof}
|
||||
|
||||
|
||||
|
@ -1076,7 +1081,7 @@ Diagrammatically, this can be thought of as:
|
|||
\end{align}
|
||||
\end{lemma}
|
||||
\begin{proof}
|
||||
This is a consequence of a finite number of applications of Lemma \ref{lem:diamondplus}.
|
||||
This is a consequence of a finite number of applications of Lemma \ref{lem:diamondplus}. This proves the Lemma.
|
||||
\end{proof}
|
||||
|
||||
\section{Linear Combinations of ANNs and Their Properties}
|
||||
|
@ -1112,7 +1117,7 @@ Diagrammatically, this can be thought of as:
|
|||
\begin{align}
|
||||
\lay \lp \lambda \triangleright \nu \rp = \lay \lp \aff_{\lambda \mathbb{I}_{\out(\nu)},0} \bullet \nu \rp = \lp l_0, l_1,...,l_{L-1}, \out(\nu) \rp = \lay(\nu)
|
||||
\end{align}
|
||||
Which proves $(i)$. Item $(ii)-(iii)$ of Lemma $\ref{5.3.2}$ then prove that for all $\act \in C(\R,\R)$, $x \in \R^{\inn(\nu)}$, that $\real_{\act} \lp \lambda \triangleright \nu \rp \in C \lp \R^{\inn(\nu),\out(\nu)} \rp$ given by:
|
||||
Which proves (i). Items (ii)\textemdash(iii) of Lemma $\ref{5.3.2}$ then prove that for all $\act \in C(\R,\R)$ and $x \in \R^{\inn(\nu)}$ it holds that $\real_{\act} \lp \lambda \triangleright \nu \rp \in C \lp \R^{\inn(\nu)},\R^{\out(\nu)} \rp$ and is given by:
|
||||
\begin{align}
|
||||
\lp \real_{\act} \lp \lambda \triangleright \nu \rp \rp \lp x \rp &= \lp \real_{\act} \lp \aff_{\lambda \mathbb{I}_{\out(\nu),0}} \bullet \nu \rp \rp \lp x \rp \nonumber\\
|
||||
&= \lambda \mathbb{I}_{\out(\nu)} \lp \lp \real_{\act} \lp \nu \rp \rp \lp x \rp \rp = \lambda \lp \lp \real_{\act} \lp \nu \rp \rp \lp x \rp \rp
|
||||
|
@ -1140,7 +1145,7 @@ Diagrammatically, this can be thought of as:
|
|||
\begin{align}
|
||||
\lay(\nu \triangleleft\lambda) = \lay \lp \nu \bullet \aff_{\lambda \mathbb{I}_{\inn(\nu)}}\rp = \lp \inn(\nu), l_1,l_2,...,l_L \rp = \lay(\nu)
|
||||
\end{align}
|
||||
Which proves $(i)$. Item (v)--(vi) of Lemma \ref{5.3.3} then prove that for all $\act \in C(\R,\R)$, $x \in \R^{\inn(\nu)}$ that $\real_{\act} \lp \nu \triangleleft \lambda \rp \in C\lp \R^{\inn(\nu),\out(\nu)} \rp$ given by:
|
||||
Which proves $(i)$. Items (v)\textemdash(vi) of Lemma \ref{5.3.3} then prove that for all $\act \in C(\R,\R)$ and $x \in \R^{\inn(\nu)}$ it holds that $\real_{\act} \lp \nu \triangleleft \lambda \rp \in C\lp \R^{\inn(\nu)},\R^{\out(\nu)} \rp$ and is given by:
|
||||
\begin{align}
|
||||
\lp \real_{\act} \lp \nu \triangleleft \lambda \rp \rp \lp x \rp &= \lp \real_{\act} \lp \nu \bullet \aff_{\lambda \mathbb{I}_{\inn(\nu),0}} \rp \rp \lp x \rp \nonumber\\
|
||||
&= \lp \real_{\act} \lp \nu \rp \rp \lp \aff_{\lambda \mathbb{I}_{\inn(\nu)}} \rp \lp x \rp \nonumber\\
|
||||
|
@ -1163,17 +1168,17 @@ Diagrammatically, this can be thought of as:
|
|||
&= \begin{bmatrix}
|
||||
W_L \quad W'_L
|
||||
\end{bmatrix}\begin{bmatrix}
|
||||
\inst_{\rect} \lp W_{L-1}(...(\inst_{\rect} \lp W_2\lp \inst_{\rect} \lp W_1\lambda x+b_1 \rp \rp + b_2)\rp + ... )+ b_{L-1}\rp \\
|
||||
\inst_{\rect} \lp W'_{L-1}(...(\inst_{\rect} \lp W'_2\lp \inst_{\rect} \lp W'_1\lambda x+b'_1 \rp \rp + b'_2)\rp + ... )+ b'_{L-1}\rp \\
|
||||
\act \lp W_{L-1}(...(\act \lp W_2\lp \act \lp W_1\lambda x+b_1 \rp \rp + b_2)\rp + ... )+ b_{L-1}\rp \\
|
||||
\act \lp W'_{L-1}(...(\act \lp W'_2\lp \act \lp W'_1\lambda x+b'_1 \rp \rp + b'_2)\rp + ... )+ b'_{L-1}\rp \\
|
||||
\end{bmatrix} + b_L+b'_L \nonumber
|
||||
\end{align}
|
||||
Note that:
|
||||
\begin{align}
|
||||
\lp \real_{\act} \lp \nu \rp \rp \lp \lambda x \rp = W_L \cdot \inst_{\rect} \lp W_{L-1}(...(\inst_{\rect} \lp W_2\lp \inst_{\rect} \lp W_1\lambda x+b_1 \rp \rp + b_2)\rp + ... )+ b_{L-1}\rp + b_L
|
||||
\lp \real_{\act} \lp \nu \rp \rp \lp \lambda x \rp = W_L \cdot \act \lp W_{L-1}(...(\act \lp W_2\lp \act \lp W_1\lambda x+b_1 \rp \rp + b_2)\rp + ... )+ b_{L-1}\rp + b_L
|
||||
\end{align}
|
||||
and that:
|
||||
\begin{align}
|
||||
\lp \real_{\act} \lp \mu \rp \rp \lp \lambda x \rp = W'_L\cdot\inst_{\rect} \lp W'_{L-1}(...(\inst_{\rect} \lp W'_2\lp \inst_{\rect} \lp W'_1\lambda x+b'_1 \rp \rp + b'_2)\rp + ... )+ b'_{L-1}\rp + b'_L
|
||||
\lp \real_{\act} \lp \mu \rp \rp \lp \lambda x \rp = W'_L\cdot\act \lp W'_{L-1}(...(\act \lp W'_2\lp \act \lp W'_1\lambda x+b'_1 \rp \rp + b'_2)\rp + ... )+ b'_{L-1}\rp + b'_L
|
||||
\end{align}
|
||||
This, together with Lemma \ref{5.5.11}, completes the proof.
|
||||
\end{proof}
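A quick base-\texttt{R} check of the two scalar multiplications (illustrative helpers; under the composition rules above, $\lambda \triangleright \nu$ scales the last weight--bias pair, while $\nu \triangleleft \lambda$ scales the first weight matrix): the realization of $\lambda \triangleright \nu$ is $\lambda$ times the realization of $\nu$, and the realization of $\nu \triangleleft \lambda$ is the realization of $\nu$ precomposed with $x \mapsto \lambda x$.
\begin{lstlisting}[language=R]
rect <- function(x) pmax(x, 0)
inst <- function(nu, act) function(x) {
  L <- length(nu)
  for (k in seq_len(L - 1)) x <- act(nu[[k]]$W %*% x + nu[[k]]$b)
  as.vector(nu[[L]]$W %*% x + nu[[L]]$b)
}

scale_left <- function(lambda, nu) {     # lambda |> nu  =  Aff_{lambda I, 0} bullet nu
  L <- length(nu)
  nu[[L]]$W <- lambda * nu[[L]]$W
  nu[[L]]$b <- lambda * nu[[L]]$b
  nu
}
scale_right <- function(nu, lambda) {    # nu <| lambda  =  nu bullet Aff_{lambda I, 0}
  nu[[1]]$W <- lambda * nu[[1]]$W
  nu
}

nu <- list(list(W = matrix(rnorm(6), 3, 2), b = rnorm(3)),
           list(W = matrix(rnorm(3), 1, 3), b = rnorm(1)))
x <- c(0.7, -0.2); lambda <- 2.5
all.equal(inst(scale_left(lambda, nu), rect)(x), lambda * inst(nu, rect)(x))   # TRUE
all.equal(inst(scale_right(nu, lambda), rect)(x), inst(nu, rect)(lambda * x))  # TRUE
\end{lstlisting}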
|
||||
|
@ -1320,19 +1325,16 @@ Diagrammatically, this can be thought of as:
|
|||
&= \sum^v_{i=u} \lp \real_{\act} \lp \lp \aff_{\mathbb{I}_{\inn(\nu_i)},b_i} \bullet \nu_i\rp \triangleleft c_i \rp \rp \lp x \rp\\
|
||||
&=\sum^v_{i=u} \lp \real_{\act} \lp \nu_i \rp \rp \lp c_i x+b_i \rp \nonumber
|
||||
\end{align}
|
||||
This establishes items (ii)--(iii); thus, the proof is complete.
|
||||
This establishes items (ii)\textemdash(iii); thus, the proof is complete.
|
||||
\end{proof}
|
||||
|
||||
\begin{lemma}\label{5.6.9}
|
||||
Let $L \in \N$, $u,v \in \Z$ with $u\leqslant v$. Let $c_u, c_{u+1},...,c_v \in \R$. $\nu_u, \nu_{u+1},...,\nu_v, \mu, \mathfrak{I} \in \neu$, $B_u, B_{u+1},...,B_v \in \R^{\inn(\nu_u)}$, $\act \in C\lp \R, \R \rp$, satisfy for all $j \in \N \cap [u,v]$ that $L = \max_{i\in \N \cap \lb u,v \rb} \dep(\nu_i)$, $\inn(\nu_j) = \inn(\nu_u)$, $\out(\nu_j) = \inn(\mathfrak{I})= \out(\mathfrak{I})$, $\hid(\mathfrak{I}) = 1$, $\real_{\act} (\mathfrak{I}) = \mathbb{I}_\R$, and that:
|
||||
Let $L \in \N$, $u,v \in \Z$ with $u\leqslant v$. Let $c_u, c_{u+1},...,c_v \in \R$, $\nu_u, \nu_{u+1},...,\nu_v, \mu, \mathfrak{I} \in \neu$, $B_u, B_{u+1},...,B_v \in \R^{\inn(\nu_u)}$, $\act \in C\lp \R, \R \rp$, satisfy for all $j \in \N \cap [u,v]$ that $L = \max_{i\in \N \cap \lb u,v \rb} \dep(\nu_i)$, $\inn(\nu_j) = \inn(\nu_u)$, $\out(\nu_j) = \inn(\mathfrak{I})= \out(\mathfrak{I})$, $\hid(\mathfrak{I}) = 1$, $\real_{\act} (\mathfrak{I}) = \mathbb{I}_\R$, and that:
|
||||
\begin{align}
|
||||
\mu = \boxplus^v_{i = u, \mathfrak{I}} \lp c_i \triangleright \lp \nu_i \bullet \aff_{\mathbb{I}_{\inn(\nu_i), },b_i} \rp \rp
|
||||
\mu = \dplus^v_{i = u, \mathfrak{I}} \lp c_i \triangleright \lp \nu_i \bullet \aff_{\mathbb{I}_{\inn(\nu_i)},B_i} \rp \rp
|
||||
\end{align}
|
||||
We then have:
|
||||
We then have that:
|
||||
\begin{enumerate}[label = (\roman*)]
|
||||
\item it holds that:
|
||||
\begin{align}
|
||||
\lay(\mu) = \lp \inn(\nu_u ), \sum^v_{i=u}\wid_1 \lp \ex_{L,\mathfrak{I}} \lp \nu_i \rp \rp ,\sum^v_{i=u}\wid_2 \lp \ex_{L,\mathfrak{I}} \lp \nu_i\rp\rp,...,\sum^v_{i=u} \wid_{L-1} \lp \ex_{L,\mathfrak{I}} \lp \nu_i \rp \rp, \out \lp \nu_u \rp \rp
|
||||
\end{align}
|
||||
\item it holds that $\real_{\act}(\mu) \in C \lp \R^{\inn(\nu_u)}, \R^{\out(\nu_u)} \rp $, and that,
|
||||
\item it holds for all $ x \in \R^{\inn(\nu_u)}$ that:
|
||||
\begin{align}
|
||||
|
@ -1341,7 +1343,8 @@ Diagrammatically, this can be thought of as:
|
|||
\end{enumerate}
|
||||
\end{lemma}
|
||||
\begin{proof}
|
||||
Note that Item(i) from Lemma \ref{5.6.5} establish Item(i) and (\ref{5.5.20}); in addition, items (v) and (vi) from Lemma \ref{5.3.3} tell us that for all $i \in \N \cap [u,v]$, $x \in \R^{\inn(\nu_u}$, it holds that $\real_{\act} \lp \nu_i \bullet \aff_{\mathbb{I}_{\inn(\nu_i)}, B_i} \in C \lp \R^{\inn(\nu_u)}, \R^{\out(\nu_u)}\rp \rp $ and further that:
|
||||
Note that Item (i) from Lemma \ref{5.6.5} establishes Item (i) and (\ref{5.5.20}); in addition, items (v)\textemdash(vi) from Lemma \ref{5.3.3} tell us that for all $i \in \N \cap [u,v]$, $x \in \R^{\inn(\nu_u)}$, it holds that \\
|
||||
$\real_{\act} \lp \nu_i \bullet \aff_{\mathbb{I}_{\inn(\nu_i)}, B_i} \rp \in C \lp \R^{\inn(\nu_u)}, \R^{\out(\nu_u)}\rp $ and further that:
|
||||
\begin{align}
|
||||
\lp \real_{\act} \lp \nu_i\bullet \aff_{\mathbb{I}_{\inn(\nu_i)},B_i} \rp \rp \lp x \rp = \lp \real_{\act} \lp \nu_i \rp \rp \lp x + B_i \rp
|
||||
\end{align}
|
||||
|
@ -1363,7 +1366,7 @@ Diagrammatically, this can be thought of as:
|
|||
This establishes Items(ii)--(iii), thus proving the lemma.
|
||||
\end{proof}
|
||||
\begin{lemma}
|
||||
Let $L \in \N$, $u,v \in \Z$ with $u\leqslant v$. Let $c_u, c_{u+1},...,c_v \in \R$. $\nu_u, \nu_{u+1},...,\nu_v, \mu, \mathfrak{I} \in \neu$, $B_u, B_{u+1},...,B_v \in \R^{\inn(\nu_u)}$, $\act \in C\lp \R, \R \rp$, satisfy for all $j \in \N \cap [u,v]$ that $L = \max_{i\in \N \cap \lb u,v \rb} \dep(\nu_i)$, $\inn(\nu_j) = \inn(\nu_u)$, $\out(\nu_j) = \inn(\mathfrak{I})= \out(\mathfrak{I})$, $\hid(\mathfrak{I}) = 1$, $\real_{\act} (\mathfrak{I}) = \mathbb{I}_\R$, and that:
|
||||
Let $L \in \N$, $u,v \in \Z$ with $u\leqslant v$. Let $c_u, c_{u+1},...,c_v \in \R$. $\nu_u, \nu_{u+1},...,\nu_v, \mu, \mathfrak{I} \in \neu$, $B_u, B_{u+1},...,B_v \in \R^{\inn(\nu_u)}$, $\act \in C\lp \R, \R \rp$, satisfy for all $j \in \N \cap [u,v]$ that $L =\\ \max_{i\in \N \cap \lb u,v \rb} \dep(\nu_i)$, $\inn(\nu_j) = \inn(\nu_u)$, $\out(\nu_j) = \inn(\mathfrak{I})= \out(\mathfrak{I})$, $\hid(\mathfrak{I}) = 1$, $\real_{\act} (\mathfrak{I}) = \mathbb{I}_\R$, and that:
|
||||
\begin{align}
|
||||
\mu = \boxplus^v_{i = u, \mathfrak{I}} \lp \lp \aff_{\mathbb{I} _{\inn(\nu_i)},b_i} \bullet \nu_i\rp \triangleleft c_i \rp
|
||||
\end{align}
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
\documentclass[11pt]{report}
|
||||
\documentclass[12pt]{report}
|
||||
\usepackage{setspace}
|
||||
\doublespacing
|
||||
\usepackage[toc,page]{appendix}
|
||||
|
@ -9,7 +9,14 @@
|
|||
\usepackage{mathtools}
|
||||
\numberwithin{equation}{section}
|
||||
\usepackage[]{amssymb}
|
||||
\usepackage[margin=1in]{geometry}
|
||||
\usepackage{geometry}
|
||||
\geometry{
|
||||
left=1in,
|
||||
right=1in,
|
||||
top=1in,
|
||||
bottom=1in
|
||||
}
|
||||
|
||||
\usepackage[]{soul}
|
||||
\usepackage[]{bbm}
|
||||
\usepackage[]{cancel}
|
||||
|
@ -27,7 +34,7 @@
|
|||
pdfauthor={Shakil Rafi},
|
||||
pdftitle={Dissertation},
|
||||
pdfkeywords={neural-networks, stochastic-processes},
|
||||
colorlinks = true,
|
||||
colorlinks = false,
|
||||
filecolor = magenta,
|
||||
urlcolor = cyan
|
||||
}
|
||||
|
|
|
@ -394,7 +394,8 @@ Note that (\ref{2.13}) and (\ref{2.14}) together prove that $u(T,x) = g(x)$. Thi
|
|||
%For each row $j$ we therefore have $x_k + \sqrt{2} \mathfrak{W}^{i,j,d}_{t-s}$
|
||||
%
|
||||
\begin{lemma} \label{maxviscosity}
|
||||
Let $d\in \N$, $T \in \lp 0,\infty \rp$, $\mathfrak{t} \in \lp 0,T \rp$, let $\mathcal{O} \subseteq \R^d$ be an open set, let $\mathfrak{r} \in \mathcal{O}$, $\phi \in C^{1,2}\lp \lp 0,T \rp \times \mathcal{O},\R \rp$, let $G: \lp 0,T \rp \times \mathcal{O} \times \R \times \R^d \times \mathbb{S}_d \rightarrow \R$ be degenerate elliptic and let $u_d (0,T) \times \mathcal{O} \rightarrow \R$ be a viscosity solution of $\lp \frac{\partial}{\partial t} u_d \rp \lp t,x \rp + G \lp t,x,u(t,x), \lp \nabla_x u_D \rp \lp t,x \rp, \lp \Hess_x u_d \rp \lp t,x \rp \rp \geqslant 0$ for $(t,x) \in (0,T) \times \mathcal{O}$, and assume that $u-\phi$ has a local maximum at $(\mathfrak{t}, \mathfrak{r}) \in (0,T) \times \mathcal{O}$, then:
|
||||
Let $d\in \N$, $T \in \lp 0,\infty \rp$, $\mathfrak{t} \in \lp 0,T \rp$, let $\mathcal{O} \subseteq \R^d$ be an open set, let $\mathfrak{r} \in \mathcal{O}$, $\phi \in C^{1,2}\lp \lp 0,T \rp \times \mathcal{O},\R \rp$, let $G: \lp 0,T \rp \times \mathcal{O} \times \R \times \R^d \times \mathbb{S}_d \rightarrow \R$ be degenerate elliptic and let $u_d: (0,T) \times \mathcal{O} \rightarrow \R$ be a viscosity solution of \\
|
||||
$\lp \frac{\partial}{\partial t} u_d \rp \lp t,x \rp + G \lp t,x,u_d(t,x), \lp \nabla_x u_d \rp \lp t,x \rp, \lp \Hess_x u_d \rp \lp t,x \rp \rp \geqslant 0$ for $(t,x) \in (0,T) \times \mathcal{O}$, and assume that $u_d-\phi$ has a local maximum at $(\mathfrak{t}, \mathfrak{r}) \in (0,T) \times \mathcal{O}$, then:
|
||||
\begin{align}
|
||||
\lp \frac{\partial}{\partial t} \phi \rp \lp \mathfrak{t},\mathfrak{r}\rp + G \lp \mathfrak{t}, \mathfrak{r}, u(\mathfrak{t}, \mathfrak{r}), \lp \nabla _x \phi \rp \lp \mathfrak{t}, \mathfrak{r} \rp, \lp \Hess_x \phi\rp\lp \mathfrak{t}, \mathfrak{r} \rp \rp \geqslant 0
|
||||
\end{align}
|
||||
|
@ -426,7 +427,7 @@ Note that (\ref{2.13}) and (\ref{2.14}) together prove that $u(T,x) = g(x)$. Thi
|
|||
\end{align}
|
||||
\end{lemma}
|
||||
\begin{proof}
|
||||
Let $(t_o, x_o) \in (0,T) \times \mathcal{O}$. Let $\phi_\epsilon \in C^{1,2}((0,T) \times \mathcal{O}, \R)$ satisfy for all $\epsilon \in (0, \infty)$, $s \in (0,T)$, $y \in \mathcal{O}$ that $\phi_0(t_0,x_0) = u_0(t_0,x_0)$, $\phi_0(t_0,x_0) \geqslant u_0(t_0,x_0)$, and:
|
||||
Let $(t_0, x_0) \in (0,T) \times \mathcal{O}$. Let $\phi_\epsilon \in C^{1,2}((0,T) \times \mathcal{O}, \R)$ satisfy for all $\epsilon \in (0, \infty)$, $s \in (0,T)$, $y \in \mathcal{O}$ that $\phi_0(t_0,x_0) = u_0(t_0,x_0)$, $\phi_0(s,y) \geqslant u_0(s,y)$, and:
|
||||
\begin{align}\label{phieps}
|
||||
\phi_\varepsilon(s,y) = \phi_0(s,y) + \varepsilon\lp \lv s - t_0 \rv + \| y - x_0 \|_E \rp
|
||||
\end{align}
|
||||
|
@ -635,12 +636,12 @@ Taken together these prove the corollary.
|
|||
&\leqslant 4T(T+1) \lb \sup_{r\in [0,T]}\sup_{y\in \R^d} \lp \| \mu_n(r,y)-\mu_0(r,y) \|_E^2 + \|\sigma_n(r,y) - \sigma_0(r,y) \|_F^2 \rp \rb e^{4L^2T(T+1)}
|
||||
\end{align}
|
||||
Applying $\limsup_{n\rightarrow \infty}$ to both sides and applying (\ref{limsupis0}) gives us for all $t \in [0,T]$, $s\in [t,T]$, $x \in \mathcal{O}$ that:
|
||||
\begin{align}
|
||||
&\limsup_{n\rightarrow \infty} \E \lb \left\| \mathcal{X}^{n,t,x}_s - \mathcal{X}^{0,t,x}_s \right\|_E^2 \rb \nonumber\\
|
||||
&\leqslant \limsup_{n\rightarrow \infty} \lb 4T(T+1) \lb \sup_{r\in [0,T]}\sup_{y\in \R^d} \lp \left\| \mu_n(r,y)-\mu_0(r,y) \right\|_E^2 + \left\|\sigma_n(r,y) - \sigma_0(r,y) \right\|_F^2 \rp \rb e^{4L^2T(T+1)} \rb \nonumber \\
|
||||
&\leqslant 4T(T+1) \lb \limsup_{n\rightarrow \infty} \lb\sup_{r\in [0,T]}\sup_{y\in \R^d} \lp \left\| \mu_n(r,y)-\mu_0(r,y) \right\|_E^2 + \left\|\sigma_n(r,y) - \sigma_0(r,y) \right\|_F^2 \rp \rb\rb e^{4L^2T(T+1)} \nonumber \\
|
||||
\begin{align*}
&\limsup_{n\rightarrow \infty} \E \lb \left\| \mathcal{X}^{n,t,x}_s - \mathcal{X}^{0,t,x}_s \right\|_E^2 \rb \\
&\leqslant \limsup_{n\rightarrow \infty} \lb 4T(T+1) \lb \sup_{r\in [0,T]}\sup_{y\in \R^d} \lp \left\| \mu_n(r,y)-\mu_0(r,y) \right\|_E^2 + \left\|\sigma_n(r,y) - \sigma_0(r,y) \right\|_F^2 \rp \rb e^{4L^2T(T+1)} \rb \\
&\leqslant 4T(T+1) \lb \limsup_{n\rightarrow \infty} \lb\sup_{r\in [0,T]}\sup_{y\in \R^d} \lp \left\| \mu_n(r,y)-\mu_0(r,y) \right\|_E^2 + \left\|\sigma_n(r,y) - \sigma_0(r,y) \right\|_F^2 \rp \rb\rb e^{4L^2T(T+1)} \\
|
||||
&\leqslant 0 \nonumber
|
||||
\end{align}
|
||||
\end{align*}
|
||||
This completes the proof.
|
||||
\end{proof}
|
||||
|
||||
|
@ -797,7 +798,7 @@ Since for all $n\in \N$, it is the case that $\mathcal{S} = \lp \supp(\mathfrak{
|
|||
\end{align}
|
||||
and finally let, for every $n\in \N$, $t \in [0,T]$, $x \in \mathcal{O}$, there be $\mathfrak{t}^{t,x}_n: \Omega \rightarrow [t,T]$ which satisfy $\mathfrak{t}^{t,x}_n = \inf \lp \{ s \in [t,T], \max \{V(s,\mathfrak{X}^{t,x}_s),V(s,\mathcal{X}^{t,x}_s)\} \geqslant n \} \cup \{T\} \rp$. We may apply Lemma \ref{2.19} with $\mu \curvearrowleft \mathfrak{m}_n$, $\sigma \curvearrowleft \mathfrak{s}_n$, $g \curvearrowleft \mathfrak{g}_k$ to show that for all $n,k \in \N$ we have that $\mathfrak{u}^{n,k}$ is a viscosity solution to:
|
||||
\begin{align}\label{2.89}
|
||||
\lp \frac{\partial}{\partial t} \mathfrak{u}^{n,k} \rp (t,x) + \frac{1}{2} \Trace \lp \mathfrak{s}_n(t,x) \lb \mathfrak{s}_n(t,x) \rb^* \lp \Hess_x \mathfrak{u}^{n,k} \rp (t,x) \rp + \la \mathfrak{m}_n(t,x), \lp \nabla_x(\mathfrak{u}^{n,k} \rp(t,x) \ra = 0
|
||||
&\lp \frac{\partial}{\partial t} \mathfrak{u}^{n,k} \rp (t,x) + \frac{1}{2} \Trace \lp \mathfrak{s}_n(t,x) \lb \mathfrak{s}_n(t,x) \rb^* \lp \Hess_x \mathfrak{u}^{n,k} \rp (t,x) \rp + \la \mathfrak{m}_n(t,x), \lp \nabla_x\mathfrak{u}^{n,k} \rp(t,x) \ra \nonumber\\& = 0
|
||||
\end{align}
|
||||
for $(t,x) \in (0,T) \times \R^d$. But note that items (i)-(iii) and \ref{2.86} give us that, in line with \cite[Lemma~3.5]{Beck_2021}:
|
||||
\begin{align}
|
||||
|
@ -841,7 +842,7 @@ Since for all $n\in \N$, it is the case that $\mathcal{S} = \lp \supp(\mathfrak{
|
|||
\end{proof}
|
||||
|
||||
|
||||
\section{Solutions, Characterization, and Computational Bounds to the Kolmogorov Backward Equations}
|
||||
\section{Solutions, Characterization, and Computational\\ Bounds to the Kolmogorov Backward Equations}
|
||||
% \begin{proof}
|
||||
% From Feynman-Kac, especially from \cite[(1.5)]{hutzenthaler_strong_2021} and setting $f=0$ in the notation of \cite[(1.5)]{hutzenthaler_strong_2021} we have that:
|
||||
% \begin{align}
|
||||
|
@ -891,7 +892,7 @@ Let $T \in (0,\infty)$. Let $\lp \Omega, \mathcal{F}, \mathbb{P} \rp$ be a proba
|
|||
This is a consequence of Lemma \ref{lem:3.4} and \ref{2.19}.
|
||||
\end{proof}
|
||||
\newpage
|
||||
\begin{corollary}\label{lem:3.19} Let $T \in (0,\infty)$, let $\left( \Omega, \mathcal{F}, \mathbb{P} \right)$ be a probability space, let $u_d \in C^{1,2} \left( \left[ 0,T \right] \times \R^d, \R \right)$, $d \in \N$ satisfy for all $d \in \N$, $t \in [0,T]$, $x \in \R^d$ that:
|
||||
\begin{corollary}\label{lem:3.19} Let $T \in (0,\infty)$,\\ let $\left( \Omega, \mathcal{F}, \mathbb{P} \right)$ be a probability space, let $u_d \in C^{1,2} \left( \left[ 0,T \right] \times \R^d, \R \right)$, $d \in \N$ satisfy for all $d \in \N$, $t \in [0,T]$, $x \in \R^d$ that:
|
||||
\begin{align}
|
||||
\left( \frac{\partial}{\partial t} u_d \right) \left(t,x\right) + \frac{1}{2}\left(\nabla^2_x u_d\right) \left(t,x\right) = 0
|
||||
\end{align}
|
||||
|
@ -905,7 +906,7 @@ Then for all $d\in \N$, $t \in [0,T]$, $x \in \R^d$ it holds that:
|
|||
\end{align}
|
||||
\end{corollary}
|
||||
\begin{proof}
|
||||
This is a special case of Theorem \ref{thm:3.21}. It is the case where $\sigma_d(x) = \mathbb{I}_d$, the uniform identity function where $\mathbb{I}_d$ is the identity matrix in dimension $d$ for $d \in \N$, and $\mu_d(x) = \mymathbb{0}_{d,1}$ where $\mymathbb{0}_d$ is the zero vector in dimension $d$ for $d \in \N$.
|
||||
This is a special case of Theorem \ref{thm:3.21}. It is the case where, for every $d \in \N$ and $x \in \R^d$, $\sigma_d(x) = \mathbb{I}_d$, where $\mathbb{I}_d$ is the identity matrix in dimension $d$, and $\mu_d(x) = \mymathbb{0}_{d}$, where $\mymathbb{0}_d$ is the zero vector in dimension $d$.
|
||||
\end{proof}
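The stochastic representation suggested by the corollary above, namely $u_d(t,x)=\E \lb u_d \lp T, x + \mathcal{W}_{T-t}\rp \rb$, is directly amenable to Monte Carlo approximation. The following base-\texttt{R} sketch is illustrative only: the terminal condition \texttt{g} and all parameter values are assumptions made for the example.
\begin{lstlisting}[language=R]
set.seed(1)
d  <- 10                                 # spatial dimension
T0 <- 1; t0 <- 0.25                      # terminal time and evaluation time
x  <- rep(0.5, d)                        # evaluation point
g  <- function(y) exp(-sum(y^2) / (2 * (d + 1)))   # an assumed terminal condition u_d(T, .)

M <- 1e5                                 # number of Monte Carlo samples
W <- matrix(rnorm(M * d, sd = sqrt(T0 - t0)), nrow = M)  # rows are samples of W_{T - t}
u_hat <- mean(apply(W, 1, function(w) g(x + w)))
u_hat                                    # Monte Carlo estimate of u_d(t, x)
\end{lstlisting}
The Monte Carlo error decays like $M^{-1/2}$ regardless of $d$, which is the feature such stochastic representations are typically exploited for.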
|
||||
|
||||
|
||||
|
|
Binary file not shown.