\chapter{Conclusions and Further Research}
We present here three avenues of further research, together with related work on parameter estimates.
\section{Further operations and further kinds of neural networks}
Several operations that are classically performed on neural networks, and discussed in the literature, have yet to be accounted for in this framework. We will discuss two of them, \textit{dropout} and \textit{merger}, and indicate how they may be brought into this framework.
\subsection{Dropout}
Overfitting presents an important challenge for all machine learning models, including deep learning models. There exist a number of techniques for mitigating it; among the most widely used for neural networks is \textit{dropout}, which we now formalize, beginning with the Hadamard product.
\begin{definition}[Hadamard Product]
Let $m,n \in \N$ and let $A,B \in \R^{m \times n}$. We define the Hadamard product $\odot: \R^{m\times n} \times \R^{m \times n} \rightarrow \R^{m \times n}$ entrywise, for all $i \in \{ 1,2,\hdots,m\}$ and $j \in \{ 1,2,\hdots,n\}$, by:
\begin{align}
\lb A \odot B \rb _{i,j} \coloneqq \lb A \rb_{i,j} \times \lb B \rb_{i,j}
\end{align}
\end{definition}
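For concreteness, a small worked example with arbitrarily chosen entries:
\begin{align}
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \odot \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1 \times 5 & 2 \times 6 \\ 3 \times 7 & 4 \times 8 \end{pmatrix} = \begin{pmatrix} 5 & 12 \\ 21 & 32 \end{pmatrix}
\end{align}
\par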
We will also define the dropout operator, introduced in \cite{srivastava_dropout_2014} and explained further in \cite{Goodfellow-et-al-2016}.
\begin{definition}[Realization with dropout]
Let $\nu \in \neu$, $L \in \N$, and $p \in \lp 0,1\rp$ be such that $\lay \lp \nu\rp = \lp l_0,l_1,\hdots, l_L\rp$ and $\nu = \lp \lp W_1,b_1\rp, \lp W_2,b_2\rp, \hdots , \lp W_L,b_L\rp \rp$. For each $n\in \N$ let $\rho_n = \lp x_1,x_2,\hdots,x_n\rp \in \R^n$ be such that for each $i \in \{1,2,\hdots,n\}$ it is the case that $x_i \sim \bern(p)$. We then denote by $\real_{\act}^{D,p} \lp \nu \rp \in C\lp \R^{\inn\lp \nu\rp},\R^{\out\lp \nu \rp}\rp$ the continuous function given by:
\begin{align}
\real_{\act}^{D,p}\lp \nu \rp = \rho_{l_L}\odot \act \lp W_L\lp \rho_{l_{L-1}} \odot \act \lp W_{L-1}\lp \hdots\rp + b_{L-1}\rp\rp + b_L\rp
\end{align}
\end{definition}
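As an illustration only, and not code drawn from the \texttt{R} package discussed later in this chapter, a realization with dropout in the spirit of the definition above may be sketched as follows; the activation (here ReLU), the parameter lists, and all names are placeholder assumptions.

\begin{verbatim}
# Illustrative sketch only; names are placeholders, not the dissertation's
# R package. `weights` and `biases` play the role of (W_1,b_1),...,(W_L,b_L),
# `p` is the Bernoulli parameter, and `act` a placeholder activation (ReLU).
realize_dropout <- function(x, weights, biases, p,
                            act = function(z) pmax(z, 0)) {
  L <- length(weights)
  for (l in seq_len(L)) {
    x <- act(weights[[l]] %*% x + biases[[l]])     # affine map, then activation
    mask <- rbinom(length(x), size = 1, prob = p)  # one Bernoulli(p) draw per node
    x <- mask * x                                  # Hadamard product with the mask
  }
  x
}
\end{verbatim}

Note that, following the definition as stated, the mask is applied after the activation at every layer and no rescaling by $1/p$ is performed; practical implementations of so-called inverted dropout typically include such a rescaling at training time.
\par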
Dropout is an example of \textit{ensemble learning}, a form of learning in which several versions of a model (e.g. random forests or neural networks) are generated (e.g. via dropout in the case of neural networks, or by enforcing a maximum depth for the trees of a random forest), and a weighted average of the predictions of the different models is taken as the predictive model. That such a model can work, and indeed work well, is the subject of \cite{schapire_strength_1990}.
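As a toy illustration of the weighted-averaging step only (the models and weights below are arbitrary placeholders, not a claim about how dropout ensembles are combined in practice):

\begin{verbatim}
# Toy illustration: an ensemble prediction as a weighted average of the
# predictions of individual (placeholder) models.
models <- list(function(x) x^2, function(x) x^2 + 0.1, function(x) 0.9 * x^2)
w      <- c(0.5, 0.25, 0.25)                   # weights summing to one
ensemble_predict <- function(x) {
  preds <- vapply(models, function(f) f(x), numeric(1))
  sum(w * preds)                               # weighted average of predictions
}
ensemble_predict(2)
\end{verbatim}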
\section{Further Approximants}
In principle, the approximation schemes for $\xpn_n^{q,\ve}$, $\csn_n^{q,\ve}$, and $\sne_n^{q,\ve}$ given in the previous sections could be used to approximate further transcendental functions, and to exploit identities such as the one alluded to in Remark \ref{rem:pyth_idt}. Indeed, recent attempts have been made to approximate backward and forward Euler methods, as in \cite{grohs2019spacetime}, and this architecture was originally envisioned for approximating Multi-Level Picard iterations, as seen in \cite{ackermann2023deep}. These neural network methods have been proven to overcome the curse of dimensionality in the sense that the size of the networks (their parameter and depth counts) grows only polynomially with respect to the desired accuracy. In practice, it remains to be seen whether, for larger dimensions, the increased number of operations and architectures to contend with outweighs the benefit of this polynomial growth in parameters and depth, especially with respect to computation time.
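Schematically, and without tying the statement to the precise constants of the earlier chapters, such results assert bounds of the form
\begin{align}
\text{(number of parameters)} \leq c \, d^{c} \, \ve^{-c},
\end{align}
with an analogous bound on the depth, for some constant $c>0$ that does not depend on the input dimension $d$ or on the prescribed accuracy $\ve \in \lp 0,1 \rp$; this stands in contrast to the exponential dependence on $d$ exhibited by classical grid-based schemes.
\par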
In a similar vein, these architectures have so far lacked a consistent implementation in a widely available programming language. Part of the work of this dissertation has therefore been devoted to implementing these architectures as an \texttt{R} package, available on \texttt{CRAN}.
\section{Algebraic Properties of this Framework}
It is quite straightforward to see that the instantiation operation has functorial properties, at the very least when instantiating with the identity function. More specifically, consider the category \texttt{Mat} whose objects are the natural numbers and whose arrows $m \xleftarrow{A} n$ are the matrices $A \in \R^{m\times n}$, i.e. continuous linear maps from $\R^n$ to $\R^m$. Consider as well, for fixed $m$ and $n$, the set of neural networks $\nu \in \neu$ with $\inn\lp \nu \rp = n$ and $\out\lp \nu \rp = m$.

In such a case, note that the instantiation operation preserves the axiom of functoriality, namely that composition of neural networks is respected under instantiation. Note also that, as we have alluded to, neural networks equipped with neural network composition, and with $\id$ (the one appropriate to our dimension) as the neutral element, behave like a monoid under instantiation.
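Concretely, writing, just for this remark, $\bullet$ for the composition of neural networks and $\real_{\id}$ for instantiation with the identity activation, the functoriality alluded to above amounts to the statement that for composable $\nu_1, \nu_2 \in \neu$ it holds that:
\begin{align}
\real_{\id} \lp \nu_2 \bullet \nu_1 \rp = \real_{\id} \lp \nu_2 \rp \circ \real_{\id} \lp \nu_1 \rp
\end{align}
\par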
Note, for example, that a neural network analogue of the derivative, one that respects the chain rule under instantiation, already exists in the literature, e.g. \cite{nn_diff}. Thus there is a rich and growing set of algebraic operations that have been proposed for neural networks.

A further exploration of the algebraic properties of this artificial neural network framework could present a fruitful avenue of future study.

This concludes the dissertation.