Online Covariance

Given the following set of two-dimensional inputs:

$\left\{\left({x}_{1},{y}_{1}\right),\left({x}_{2},{y}_{2}\right),\dots ,\left({x}_{n-1},{y}_{n-1}\right),\left({x}_{n},{y}_{n}\right)\right\}$

Let $n$ be the number of two-dimensional inputs, $X$ represent the $x$ dimension, $Y$ represent the $y$ dimension, $Co{v}_{n}\left(X,Y\right)$ be the biased sample covariance of the $x$ and $y$ dimensions for the first $n$ two-dimensional inputs, $Co{v}_{n-1}\left(X,Y\right)$ be the biased sample covariance of the $x$ and $y$ dimensions for the first $n-1$ two-dimensional inputs, ${x}_{n}$ be the $x$ value of the $n$-th two-dimensional input, ${\overline{x}}_{n}$ be the sample mean of the $x$ values for the first $n$ two-dimensional inputs, ${y}_{n}$ be the $y$ value of the $n$-th two-dimensional input, and ${\overline{y}}_{n-1}$ be the sample mean of the $y$ values for the first $n-1$ two-dimensional inputs. Then, the recurrence equation for the biased sample covariance (a.k.a. online covariance) is:

$Co{v}_{n}\left(X,Y\right)=Co{v}_{n-1}\left(X,Y\right)-\frac{Co{v}_{n-1}\left(X,Y\right)-\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)}{n}$
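
As a quick sanity check, consider two illustrative inputs chosen here for easy arithmetic (they are not from the original post): $\left(1,2\right)$ and $\left(3,6\right)$. After the first input, ${\overline{x}}_{1}=1$, ${\overline{y}}_{1}=2$, and $Co{v}_{1}\left(X,Y\right)=0$. For $n=2$, ${\overline{x}}_{2}=2$, so the recurrence gives:

$Co{v}_{2}\left(X,Y\right)=0-\frac{0-\left(3-2\right)\left(6-2\right)}{2}=2$

This agrees with the definition of the biased sample covariance: with ${\overline{y}}_{2}=4$, we get $\frac{\left(1-2\right)\left(2-4\right)+\left(3-2\right)\left(6-4\right)}{2}=\frac{2+2}{2}=2$.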

Note: The recurrence equation above also applies when computing the online covariance matrix:

${\Sigma }_{n}\left[j,k\right]={\Sigma }_{n-1}\left[j,k\right]-\frac{{\Sigma }_{n-1}\left[j,k\right]-\left({x}_{n}\left[j\right]-{\overline{x\left[j\right]}}_{n}\right)\left({x}_{n}\left[k\right]-{\overline{x\left[k\right]}}_{n-1}\right)}{n}$.

However, we will restrict ourselves to the online covariance computation of two-dimensional input in this post and explore the online covariance matrix computation of $m$-dimensional input in a later post.

Proof:

The biased sample covariance of the $x$ and $y$ dimensions for the first $n$ two-dimensional inputs is defined as:

$Co{v}_{n}\left(X,Y\right)=\frac{\sum _{i=1}^{n}\left({x}_{i}-{\overline{x}}_{n}\right)\left({y}_{i}-{\overline{y}}_{n}\right)}{n}$.

If we expand this definition, we have:

$Co{v}_{n}\left(X,Y\right)=\frac{\sum _{i=1}^{n-1}\left({x}_{i}-{\overline{x}}_{n}\right)\left({y}_{i}-{\overline{y}}_{n}\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)}{n}$.

Since the recurrence equations for the sample mean of the $x$ and $y$ values are:

${\overline{x}}_{n}={\overline{x}}_{n-1}-\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}$ and ${\overline{y}}_{n}={\overline{y}}_{n-1}-\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}$,
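
The mean recurrence used here can be sketched in C++ as follows. This is only a minimal illustration: `update_mean` and `running_mean` are hypothetical helper names introduced for this sketch and do not appear in the program later in the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one step of the mean recurrence,
// mean_n = mean_{n-1} - (mean_{n-1} - x_n) / n, for n >= 1.
double update_mean(double prev_mean, double x, std::size_t n) {
    assert(n >= 1);
    return prev_mean - (prev_mean - x) / static_cast<double>(n);
}

// Applies the recurrence over a whole sequence.
// For n = 1 the step returns x_1 regardless of the starting value,
// so the initial 0.0 only matters for an empty sequence.
double running_mean(const std::vector<double>& values) {
    double mean = 0.0;
    std::size_t n = 0;
    for (double v : values) {
        ++n;
        mean = update_mean(mean, v, n);
    }
    return mean;
}
```

For instance, calling `running_mean({1.0, 2.0, 6.0})` steps through the means 1, 1.5, and 3, the last of which is the ordinary average of the three values.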

then we have:

$\begin{array}{l}Co{v}_{n}\left(X,Y\right)=\frac{\sum _{i=1}^{n-1}\left({x}_{i}-\left({\overline{x}}_{n-1}-\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\right)\left({y}_{i}-\left({\overline{y}}_{n-1}-\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\sum _{i=1}^{n-1}\left({x}_{i}-{\overline{x}}_{n-1}+\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left({y}_{i}-{\overline{y}}_{n-1}+\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\sum _{i=1}^{n-1}\left(\begin{array}{l}{x}_{i}{y}_{i}-{x}_{i}{\overline{y}}_{n-1}+{x}_{i}\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)-{\overline{x}}_{n-1}{y}_{i}+{\overline{x}}_{n-1}{\overline{y}}_{n-1}-{\overline{x}}_{n-1}\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right){y}_{i}-\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right){\overline{y}}_{n-1}+\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\end{array}\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\sum _{i=1}^{n-1}\left({x}_{i}{y}_{i}-{x}_{i}{\overline{y}}_{n-1}-{\overline{x}}_{n-1}{y}_{i}+{\overline{x}}_{n-1}{\overline{y}}_{n-1}\right)+\sum _{i=1}^{n-1}\left({x}_{i}\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)-{\overline{x}}_{n-1}\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\right)\\ +\sum 
_{i=1}^{n-1}\left(\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right){y}_{i}-\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right){\overline{y}}_{n-1}+\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\sum _{i=1}^{n-1}\left({x}_{i}-{\overline{x}}_{n-1}\right)\left({y}_{i}-{\overline{y}}_{n-1}\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\sum _{i=1}^{n-1}\left({x}_{i}-{\overline{x}}_{n-1}\right)\\ +\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\sum _{i=1}^{n-1}\left({y}_{i}-{\overline{y}}_{n-1}+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)\end{array}\right)}{n}\end{array}$

Since the biased sample covariance of the $x$ and $y$ dimensions for the first $n-1$ two-dimensional inputs is defined as:

$Co{v}_{n-1}\left(X,Y\right)=\frac{\sum _{i=1}^{n-1}\left({x}_{i}-{\overline{x}}_{n-1}\right)\left({y}_{i}-{\overline{y}}_{n-1}\right)}{n-1}$,

then we also have:

$\sum _{i=1}^{n-1}\left({x}_{i}-{\overline{x}}_{n-1}\right)\left({y}_{i}-{\overline{y}}_{n-1}\right)=\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)$.

With this, we have:

$\begin{array}{c}Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\sum _{i=1}^{n-1}\left({x}_{i}-{\overline{x}}_{n-1}\right)\\ +\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\sum _{i=1}^{n-1}\left({y}_{i}-{\overline{y}}_{n-1}+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\left(\sum _{i=1}^{n-1}\left({x}_{i}\right)+\sum _{i=1}^{n-1}\left(-{\overline{x}}_{n-1}\right)\right)\\ +\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\sum _{i=1}^{n-1}\left({y}_{i}\right)+\sum _{i=1}^{n-1}\left(-{\overline{y}}_{n-1}\right)+\sum _{i=1}^{n-1}\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\left(\sum _{i=1}^{n-1}\left({x}_{i}\right)-{\overline{x}}_{n-1}\sum _{i=1}^{n-1}\left(1\right)\right)\\ +\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\sum _{i=1}^{n-1}\left({y}_{i}\right)-{\overline{y}}_{n-1}\sum _{i=1}^{n-1}\left(1\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\sum _{i=1}^{n-1}\left(1\right)\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\left(\sum _{i=1}^{n-1}\left({x}_{i}\right)-{\overline{x}}_{n-1}\left(n-1\right)\right)\\ +\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\sum 
_{i=1}^{n-1}\left({y}_{i}\right)-{\overline{y}}_{n-1}\left(n-1\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\left(n-1\right)\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)\end{array}\right)}{n}\end{array}$

Since the sample means of the first $n-1$ $x$ and $y$ values are defined as:

${\overline{x}}_{n-1}=\frac{\sum _{i=1}^{n-1}{x}_{i}}{n-1}$ and ${\overline{y}}_{n-1}=\frac{\sum _{i=1}^{n-1}{y}_{i}}{n-1}$,

then we also have:

$\sum _{i=1}^{n-1}{x}_{i}={\overline{x}}_{n-1}\left(n-1\right)$ and $\sum _{i=1}^{n-1}{y}_{i}={\overline{y}}_{n-1}\left(n-1\right)$.

With that, we have:

$\begin{array}{l}Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\left({\overline{x}}_{n-1}\left(n-1\right)-{\overline{x}}_{n-1}\left(n-1\right)\right)\\ +\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left({\overline{y}}_{n-1}\left(n-1\right)-{\overline{y}}_{n-1}\left(n-1\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\left(n-1\right)\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\left(\cancel{{\overline{x}}_{n-1}\left(n-1\right)}\cancel{-{\overline{x}}_{n-1}\left(n-1\right)}\right)\\ +\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\cancel{{\overline{y}}_{n-1}\left(n-1\right)}\cancel{-{\overline{y}}_{n-1}\left(n-1\right)}+\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\left(n-1\right)\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\left(n-1\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n}\right)\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +{x}_{n}{y}_{n}-{x}_{n}{\overline{y}}_{n}-{\overline{x}}_{n}{y}_{n}+{\overline{x}}_{n}{\overline{y}}_{n}\end{array}\right)}{n}\end{array}$

Since the recurrence equation for the sample mean of the $y$ values is:

${\overline{y}}_{n}={\overline{y}}_{n-1}-\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}$,

then we have:

$\begin{array}{c}Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +{x}_{n}{y}_{n}-{x}_{n}\left({\overline{y}}_{n-1}-\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)-{\overline{x}}_{n}{y}_{n}+{\overline{x}}_{n}\left({\overline{y}}_{n-1}-\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +{x}_{n}{y}_{n}-{x}_{n}{\overline{y}}_{n-1}+{x}_{n}\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)-{\overline{x}}_{n}{y}_{n}+{\overline{x}}_{n}{\overline{y}}_{n-1}-{\overline{x}}_{n}\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +{x}_{n}{y}_{n}-{x}_{n}{\overline{y}}_{n-1}-{\overline{x}}_{n}{y}_{n}+{\overline{x}}_{n}{\overline{y}}_{n-1}+\left({x}_{n}-{\overline{x}}_{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\end{array}\right)}{n}\end{array}$

Since the recurrence equation for the sample mean of the $x$ values is:

$\begin{array}{l}{\overline{x}}_{n}={\overline{x}}_{n-1}-\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\\ {\overline{x}}_{n}=\frac{n{\overline{x}}_{n-1}}{n}+\frac{-{\overline{x}}_{n-1}+{x}_{n}}{n}\\ {\overline{x}}_{n}=\frac{\left(n-1\right){\overline{x}}_{n-1}+{x}_{n}}{n},\end{array}$

then we have:

$\begin{array}{l}Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)+\left({x}_{n}-\left(\frac{\left(n-1\right){\overline{x}}_{n-1}+{x}_{n}}{n}\right)\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)+\left(\frac{n{x}_{n}}{n}+\frac{-\left(n-1\right){\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)+\left(\frac{-\left(n-1\right){\overline{x}}_{n-1}+\left(n-1\right){x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\\ +\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)-\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)\end{array}\right)}{n}\\ 
Co{v}_{n}\left(X,Y\right)=\frac{\left(\begin{array}{l}\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)\cancel{+\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)}\\ +\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)\cancel{-\left(n-1\right)\left(\frac{{\overline{x}}_{n-1}-{x}_{n}}{n}\right)\left(\frac{{\overline{y}}_{n-1}-{y}_{n}}{n}\right)}\end{array}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{\left(n-1\right)Co{v}_{n-1}\left(X,Y\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{nCo{v}_{n-1}\left(X,Y\right)-Co{v}_{n-1}\left(X,Y\right)+\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=\frac{nCo{v}_{n-1}\left(X,Y\right)}{n}+\frac{-\left(Co{v}_{n-1}\left(X,Y\right)-\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)\right)}{n}\\ Co{v}_{n}\left(X,Y\right)=Co{v}_{n-1}\left(X,Y\right)-\frac{Co{v}_{n-1}\left(X,Y\right)-\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)}{n}\end{array}$

Therefore, the recurrence equation for the biased sample covariance (a.k.a. online covariance) is:

$Co{v}_{n}\left(X,Y\right)=Co{v}_{n-1}\left(X,Y\right)-\frac{Co{v}_{n-1}\left(X,Y\right)-\left({x}_{n}-{\overline{x}}_{n}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)}{n}$

Note: We can manipulate this recurrence equation to obtain the following equivalent forms:

$Co{v}_{n}\left(X,Y\right)=Co{v}_{n-1}\left(X,Y\right)-\frac{Co{v}_{n-1}\left(X,Y\right)-\left({x}_{n}-{\overline{x}}_{n-1}\right)\left({y}_{n}-{\overline{y}}_{n}\right)}{n}$,

$Co{v}_{n}\left(X,Y\right)=Co{v}_{n-1}\left(X,Y\right)-\frac{Co{v}_{n-1}\left(X,Y\right)-\left(\frac{n-1}{n}\right)\left({x}_{n}-{\overline{x}}_{n-1}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)}{n}$,

and

$Co{v}_{n}\left(X,Y\right)=\frac{\left(n-1\right)\left(Co{v}_{n-1}\left(X,Y\right)+\frac{\left({x}_{n}-{\overline{x}}_{n-1}\right)\left({y}_{n}-{\overline{y}}_{n-1}\right)}{n}\right)}{n}$.
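
One way to convince yourself that these forms agree is to evaluate all of them on the same running state and compare the results. The C++ sketch below does that under a few assumptions: `State`, `next_cov`, `advance`, and `max_gap` are hypothetical names introduced only for this illustration, and form numbers 0 through 3 follow the order in which the equations appear in this post (0 being the main recurrence).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Running state after n inputs (hypothetical struct, for illustration).
struct State {
    std::size_t n = 0;
    double mean_x = 0.0;
    double mean_y = 0.0;
    double cov = 0.0;  // biased sample covariance
};

// Returns the next covariance using one of the four equivalent forms.
double next_cov(const State& s, double x, double y, int form) {
    assert(form >= 0 && form <= 3);
    const double n = static_cast<double>(s.n + 1);
    const double mean_x_n = s.mean_x - (s.mean_x - x) / n;
    const double mean_y_n = s.mean_y - (s.mean_y - y) / n;
    const double dx_prev = x - s.mean_x;  // x_n minus the previous mean of x
    const double dy_prev = y - s.mean_y;  // y_n minus the previous mean of y
    switch (form) {
        case 0: return s.cov - (s.cov - (x - mean_x_n) * dy_prev) / n;
        case 1: return s.cov - (s.cov - dx_prev * (y - mean_y_n)) / n;
        case 2: return s.cov - (s.cov - ((n - 1.0) / n) * dx_prev * dy_prev) / n;
        default: return (n - 1.0) * (s.cov + dx_prev * dy_prev / n) / n;
    }
}

// Advances the state by one input, using form 0 for the covariance.
State advance(const State& s, double x, double y) {
    State t;
    t.n = s.n + 1;
    const double n = static_cast<double>(t.n);
    t.mean_x = s.mean_x - (s.mean_x - x) / n;
    t.mean_y = s.mean_y - (s.mean_y - y) / n;
    t.cov = next_cov(s, x, y, 0);
    return t;
}

// Largest absolute gap between form 0 and the other three forms for one step.
double max_gap(const State& s, double x, double y) {
    const double ref = next_cov(s, x, y, 0);
    double gap = 0.0;
    for (int form = 1; form <= 3; ++form) {
        gap = std::fmax(gap, std::fabs(next_cov(s, x, y, form) - ref));
    }
    return gap;
}
```

In exact arithmetic the gap is zero. In floating point the forms can differ in the last few bits, so a comparison against a small tolerance is more appropriate than an equality test.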

Example of C++ code that computes the online covariance:

    // Filename: main.cpp
    #include <iostream>
    #include <iomanip>

    int main() {
        double x;
        double y;
        double n = 0;
        double mean_x = 0;  // mean of the x values
        double mean_y = 0;  // mean of the y values
        double cov = 0;     // covariance of the x and y values
        double prev_mean_x; // previous mean of the x values
        double prev_mean_y; // previous mean of the y values
        double prev_cov;    // previous covariance of the x and y values
        if ( std::cin >> x && std::cin >> y ) {
            ++n;
            mean_x = x;
            mean_y = y;
            cov = 0;
            while ( std::cin >> x && std::cin >> y ) {
                prev_mean_x = mean_x;
                prev_mean_y = mean_y;
                prev_cov = cov;
                ++n;
                mean_x = prev_mean_x - ( prev_mean_x - x ) / n;
                mean_y = prev_mean_y - ( prev_mean_y - y ) / n;
                cov = prev_cov - ( prev_cov - ( x - mean_x ) * ( y - prev_mean_y ) ) / n;
            }
        }
        std::cout << "n:      " << n << '\n';
        std::cout << "mean_x: " << std::setprecision( 17 ) << mean_x << '\n';
        std::cout << "mean_y: " << std::setprecision( 17 ) << mean_y << '\n';
        std::cout << "cov:    " << std::setprecision( 17 ) << cov << '\n';
    }

Example of data.txt:

    -281.189       612.083
    974.663        -24.0965
    25.8526        401.539
    .              .
    .              .
    .              .

Command Line:

    g++ -o main.exe main.cpp -std=c++11 -march=native -O3 -Wall -Wextra -Werror -static
    ./main.exe < data.txt

Note: Mathematica’s Covariance[] function computes the unbiased sample covariance matrix, not the biased sample covariance matrix; therefore, the biased sample covariance matrix is computed in Mathematica as:

 ( ( Length[ list ] - 1 ) / Length[ list ] ) * Covariance[ list ] 
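
For readers without Mathematica, a similar cross-check can be sketched in C++ by computing the biased sample covariance in two passes, directly from its definition, and comparing it with the online result. The function names `biased_cov_two_pass` and `unbiased_from_biased` are hypothetical and introduced only for this sketch. The second function undoes the $\left(n-1\right)/n$ scaling shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-pass biased sample covariance, straight from the definition:
// the sum of (x_i - mean_x) * (y_i - mean_y), divided by n.
double biased_cov_two_pass(const std::vector<double>& xs,
                           const std::vector<double>& ys) {
    assert(xs.size() == ys.size() && !xs.empty());
    const double n = static_cast<double>(xs.size());
    double mean_x = 0.0;
    double mean_y = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        mean_x += xs[i];
        mean_y += ys[i];
    }
    mean_x /= n;
    mean_y /= n;
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum += (xs[i] - mean_x) * (ys[i] - mean_y);
    }
    return sum / n;
}

// Converts a biased covariance over n >= 2 inputs to the unbiased one
// by multiplying by n / (n - 1).
double unbiased_from_biased(double biased, std::size_t n) {
    assert(n >= 2);
    return biased * static_cast<double>(n) / static_cast<double>(n - 1);
}
```

With the illustrative inputs $\left(1,2\right)$ and $\left(3,6\right)$, the two-pass function returns a biased covariance of 2, and the conversion gives an unbiased covariance of 4. Unlike the online version, this approach needs all inputs in memory, so it is a verification aid, not a replacement.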

3 thoughts on “Online Covariance”

1. Joshua Burkholder Post author

This is also derived from the following:

2. Lavinius

Hello,

I just tried your online covariance formula and it is very precise. Thank you for it, and for the mathematical demonstration. Do you have anything published in which you included it (just the formula), or should I just reference the website?
