Causal fusion refers to the indirect prediction of a target variable from a selection of explanatory variables. A number of causal fusion methods were developed for use within the SANY project. In most cases, historical target and explanatory variables, along with real-time explanatory variables from in-situ sensors held in OGC-compliant SOSs or produced by spatial fusion processes, are accessed via an OGC-compliant WPS. The resulting predictions are supplied to an OGC-compliant SOS and/or viewed through a web interface. Two types of causal fusion algorithm were used in SANY: multiple linear regression and neural networks.
Linear regression is used to construct a prediction formula for the target variable, given values of the explanatory variables, by minimizing the sum of squared errors of the linear fit. Before the regression formula is constructed, each explanatory variable is tested to determine whether it has a linear relationship to the target variable. The target variable is then predicted as a linear combination of the explanatory variables.
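The fitting procedure described above can be sketched in a few lines. This is a minimal illustration rather than the SANY implementation: it assumes NumPy, uses the Pearson correlation coefficient as a stand-in for the linearity test (the test actually used is not specified here), and the function names, the 0.3 threshold and the synthetic data are all hypothetical.

```python
import numpy as np


def screen_linear(X, y, threshold=0.3):
    """Flag explanatory variables with |Pearson r| >= threshold.

    Assumption: correlation serves here as a simple linearity check;
    the threshold value is illustrative only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.abs(r) >= threshold, r


def fit_linear_regression(X, y):
    """Least-squares fit of y = b0 + X @ b; returns [b0, b1, ..., bk]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes ||A @ coef - y||^2
    return coef


def predict(coef, X):
    """Predict the target as a linear combination of the explanatory variables."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# Synthetic demonstration data (not SANY data): the third column is unrelated to y.
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(200, 3))
y_hist = 2.0 + 1.5 * X_hist[:, 0] - 1.2 * X_hist[:, 1] + rng.normal(scale=0.1, size=200)

keep, r = screen_linear(X_hist, y_hist)
coef = fit_linear_regression(X_hist[:, keep], y_hist)
print("kept columns:", np.flatnonzero(keep))
print("coefficients (intercept first):", np.round(coef, 3))
```

In an operational setting the historical variables would play the role of `X_hist` and `y_hist`, and `predict` would be applied to the real-time explanatory variables.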
Linear regression is one of the most widely used modelling methods because of its effectiveness and completeness. Although most processes are nonlinear in nature, many of them are well approximated by linear models.
Linear regression estimates the unknown model parameters and assesses whether each of them is statistically significant, a result that often has a clear interpretation for the scientific question at hand. It also assesses whether the model as a whole is statistically significant. The resulting model can be used to predict the target variable and to attach confidence intervals to the predictions.
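The quantities mentioned above (parameter significance, model significance and confidence intervals) follow from standard ordinary least squares formulas. The sketch below assumes NumPy only; the function names are hypothetical, and the default critical value of 1.96 is a large-sample normal approximation standing in for the exact Student t quantile, which is an assumption made to keep the example dependency-free.

```python
import numpy as np


def ols_inference(X, y):
    """Fit y = b0 + X @ b and return standard OLS diagnostics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    n, p = A.shape
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - p                                # residual degrees of freedom
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    sigma2 = ss_res / dof                      # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)      # covariance of the coefficients
    se = np.sqrt(np.diag(cov))
    return {
        "coef": coef,
        "cov": cov,
        "std_err": se,
        "t_stats": coef / se,                  # per-parameter significance
        "r2": 1.0 - ss_res / ss_tot,
        "f_stat": ((ss_tot - ss_res) / (p - 1)) / sigma2,  # whole-model significance
        "dof": dof,
    }


def mean_confidence_interval(fit, x_new, crit=1.96):
    """Confidence interval for the mean response at x_new.

    Assumption: crit=1.96 approximates a 95% interval for large samples;
    for small samples use the Student t quantile with fit["dof"] instead.
    """
    a = np.concatenate([[1.0], np.asarray(x_new, dtype=float)])
    centre = a @ fit["coef"]
    half = crit * np.sqrt(a @ fit["cov"] @ a)
    return centre - half, centre + half


# Synthetic demonstration data (not SANY data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 2))
y_demo = 0.5 + 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=100)

fit = ols_inference(X_demo, y_demo)
print("t statistics:", np.round(fit["t_stats"], 2))
print("F statistic:", round(float(fit["f_stat"]), 2))
print("interval at x = (1, 0):", mean_confidence_interval(fit, [1.0, 0.0]))
```

A parameter whose t statistic is small in absolute value (roughly below 2 for large samples) is not clearly distinguishable from zero; in this synthetic example that is expected for the second explanatory variable, which does not enter the data-generating formula.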
Neural networks are mathematical structures analogous to biological neural networks. The artificial neurons are arranged in layers and interconnected with each other. Such networks are capable of processing nonlinear statistical data and of modelling complex relationships between inputs and outputs.
The most basic radial basis network consists of three layers. The input layer holds the explanatory variables. The second layer is a hidden layer of high dimension. The output layer gives the response of the network. The network topology is determined by the number of hidden units. This application involves a single response.
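The three-layer structure described above can be illustrated with a small radial basis function network. This is a sketch under stated assumptions, not the network used in SANY: it assumes NumPy, Gaussian basis functions with a single shared width, centres drawn at random from the training inputs, and output weights obtained by least squares. The class name and all parameter values are hypothetical.

```python
import numpy as np


class RBFNetwork:
    """Input layer -> Gaussian hidden layer -> single linear output."""

    def __init__(self, n_hidden=10, width=1.0, seed=0):
        self.n_hidden = n_hidden  # number of hidden units (sets the topology)
        self.width = width        # shared Gaussian width (assumed, not tuned)
        self.seed = seed

    def _hidden(self, X):
        # Squared distance from every input row to every centre.
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def _design(self, X):
        # Hidden-layer activations plus a constant bias column.
        return np.column_stack([np.ones(len(X)), self._hidden(X)])

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        rng = np.random.default_rng(self.seed)
        k = min(self.n_hidden, len(X))
        # Assumption: centres are a random subset of the training inputs.
        self.centres = X[rng.choice(len(X), size=k, replace=False)]
        # Output weights solve a linear least-squares problem.
        self.weights, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return self._design(X) @ self.weights  # one response per input row


# Synthetic demonstration: approximate a nonlinear function of one variable.
X_demo = np.linspace(0.0, 2.0 * np.pi, 50).reshape(-1, 1)
y_demo = np.sin(X_demo[:, 0])

net = RBFNetwork(n_hidden=8, width=1.0).fit(X_demo, y_demo)
rmse = np.sqrt(np.mean((net.predict(X_demo) - y_demo) ** 2))
print("training RMSE with 8 hidden units:", rmse)
```

Because only the output weights are fitted, training reduces to the same least-squares problem as linear regression, applied to the hidden-layer activations instead of the raw explanatory variables.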
Neural networks are generally considered a ‘black box’ approach, since the model parameters are hard to interpret in physical terms.