Online Covariance

Given the following set of two-dimensional inputs:

{(x1,y1),(x2,y2),,(xn1,yn1),(xn,yn)}

Let n be the number of two-dimensional inputs, X represent the x dimension, Y represent the y dimension, Covn(X,Y) be the biased sample covariance of the x and y dimensions for the first n two-dimensional inputs, Covn1(X,Y) be the biased sample covariance of the x and y dimensions for the first n1 two-dimensional inputs, xn be the x value of the n-th two-dimensional input, x¯n be the sample mean of the x values for the first n two-dimensional inputs, yn be the y value of the n-th two-dimensional input, and y¯n1 be the sample mean of the y values for the first n1 two-dimensional inputs. Then, the recurrence equation for the biased sample covariance (a.k.a. online covariance) is:

Covn(X,Y)=Covn1(X,Y)Covn1(X,Y)(xnx¯n)(yny¯n1)n

Note: The recurrence equation above also applies when computing the online covariance matrix:

Σn[j,k]=Σn1[j,k]Σn1[j,k](xn[j]x[j]¯n)(xn[k]x[k]¯n1)n.

However, we will restrict ourselves to the online covariance computation of two-dimensional input in this post and explore the online covariance matrix computation of m-dimensional input in a later post.

 

Proof:

The definition of the biased sample covariance of the x and y dimensions for the first n two-dimensional inputs is defined as:

Covn(X,Y)=i=1n(xix¯n)(yiy¯n)n.

If we expand this definition, we have:

Covn(X,Y)=i=1n1(xix¯n)(yiy¯n)+(xnx¯n)(yny¯n)n.

Since the recurrence equations for the sample mean of the x and y values are:

x¯n=x¯n1x¯n1xnn and y¯n=y¯n1y¯n1ynn,

then we have:

Covn(X,Y)=i=1n1(xi(x¯n1x¯n1xnn))(yi(y¯n1y¯n1ynn))+(xnx¯n)(yny¯n)nCovn(X,Y)=i=1n1(xix¯n1+x¯n1xnn)(yiy¯n1+y¯n1ynn)+(xnx¯n)(yny¯n)nCovn(X,Y)=i=1n1(xiyixiy¯n1+xi(y¯n1ynn)x¯n1yi+x¯n1y¯n1x¯n1(y¯n1ynn)+(x¯n1xnn)yi(x¯n1xnn)y¯n1+(x¯n1xnn)(y¯n1ynn))+(xnx¯n)(yny¯n)nCovn(X,Y)=(i=1n1(xiyixiy¯n1x¯n1yi+x¯n1y¯n1)+i=1n1(xi(y¯n1ynn)x¯n1(y¯n1ynn))+i=1n1((x¯n1xnn)yi(x¯n1xnn)y¯n1+(x¯n1xnn)(y¯n1ynn))+(xnx¯n)(yny¯n))nCovn(X,Y)=(i=1n1(xix¯n1)(yiy¯n1)+(y¯n1ynn)i=1n1(xix¯n1)+(x¯n1xnn)i=1n1(yiy¯n1+(y¯n1ynn))+(xnx¯n)(yny¯n))n

Since the biased sample covariance of the x and y dimensions for the first n1 two-dimensional inputs is defined as:

Covn1(X,Y)=i=1n1(xix¯n1)(yiy¯n1)n1,

then we also have:

i=1n1(xix¯n1)(yiy¯n1)=(n1)Covn1(X,Y).

With this, we have:

Covn(X,Y)=((n1)Covn1(X,Y)+(y¯n1ynn)i=1n1(xix¯n1)+(x¯n1xnn)i=1n1(yiy¯n1+(y¯n1ynn))+(xnx¯n)(yny¯n))nCovn(X,Y)=((n1)Covn1(X,Y)+(y¯n1ynn)(i=1n1(xi)+i=1n1(x¯n1))+(x¯n1xnn)(i=1n1(yi)+i=1n1(y¯n1)+i=1n1(y¯n1ynn))+(xnx¯n)(yny¯n))nCovn(X,Y)=((n1)Covn1(X,Y)+(y¯n1ynn)(i=1n1(xi)x¯n1i=1n1(1))+(x¯n1xnn)(i=1n1(yi)y¯n1i=1n1(1)+(y¯n1ynn)i=1n1(1))+(xnx¯n)(yny¯n))nCovn(X,Y)=((n1)Covn1(X,Y)+(y¯n1ynn)(i=1n1(xi)x¯n1(n1))+(x¯n1xnn)(i=1n1(yi)y¯n1(n1)+(y¯n1ynn)(n1))+(xnx¯n)(yny¯n))n

Since the sample mean for the first n1 x and y values are defined as:

x¯n1=i=1n1xin1 and y¯n1=i=1n1yin1,

then we also have:

i=1n1xi=x¯n1(n1) and i=1n1yi=y¯n1(n1).

With that, we have:

Covn(X,Y)=((n1)Covn1(X,Y)+(y¯n1ynn)(x¯n1(n1)x¯n1(n1))+(x¯n1xnn)(y¯n1(n1)y¯n1(n1)+(y¯n1ynn)(n1))+(xnx¯n)(yny¯n))nCovn(X,Y)=((n1)Covn1(X,Y)+(y¯n1ynn)(x¯n1(n1)x¯n1(n1))+(x¯n1xnn)(y¯n1(n1)y¯n1(n1)+(y¯n1ynn)(n1))+(xnx¯n)(yny¯n))nCovn(X,Y)=((n1)Covn1(X,Y)+(x¯n1xnn)(y¯n1ynn)(n1)+(xnx¯n)(yny¯n))nCovn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+xnynxny¯nx¯nyn+x¯ny¯n)n

Since the recurrence equation for the sample mean of the y values is:

y¯n=y¯n1y¯n1ynn,

then we have:

Covn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+xnynxn(y¯n1y¯n1ynn)x¯nyn+x¯n(y¯n1y¯n1ynn))nCovn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+xnynxny¯n1+xn(y¯n1ynn)x¯nyn+x¯ny¯n1x¯n(y¯n1ynn))nCovn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+xnynxny¯n1x¯nyn+x¯ny¯n1+(xnx¯n)(y¯n1ynn))nCovn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+(xnx¯n)(yny¯n1)+(xnx¯n)(y¯n1ynn))n

Since the recurrence equation for the sample mean of the x values is:

x¯n=x¯n1x¯n1xnnx¯n=nx¯n1n+x¯n1+xnnx¯n=(n1)x¯n1+xnn,

then we have:

Covn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+(xnx¯n)(yny¯n1)+(xn((n1)x¯n1+xnn))(y¯n1ynn))nCovn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+(xnx¯n)(yny¯n1)+(nxnn+(n1)x¯n1xnn)(y¯n1ynn))nCovn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+(xnx¯n)(yny¯n1)+((n1)x¯n1+(n1)xnn)(y¯n1ynn))nCovn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+(xnx¯n)(yny¯n1)(n1)(x¯n1xnn)(y¯n1ynn))nCovn(X,Y)=((n1)Covn1(X,Y)+(n1)(x¯n1xnn)(y¯n1ynn)+(xnx¯n)(yny¯n1)(n1)(x¯n1xnn)(y¯n1ynn))nCovn(X,Y)=(n1)Covn1(X,Y)+(xnx¯n)(yny¯n1)nCovn(X,Y)=nCovn1(X,Y)Covn1(X,Y)+(xnx¯n)(yny¯n1)nCovn(X,Y)=nCovn1(X,Y)n+(Covn1(X,Y)(xnx¯n)(yny¯n1))nCovn(X,Y)=Covn1(X,Y)Covn1(X,Y)(xnx¯n)(yny¯n1)n

Therefore, the recurrence equation for the biased sample covariance (a.k.a. online covariance) is:

Covn(X,Y)=Covn1(X,Y)Covn1(X,Y)(xnx¯n)(yny¯n1)n

 

Note: We can manipulate this recurrence equation such as that we also have:

Covn(X,Y)=Covn1(X,Y)Covn1(X,Y)(xnx¯n1)(yny¯n)n,

Covn(X,Y)=Covn1(X,Y)Covn1(X,Y)(n1n)(xnx¯n1)(yny¯n1)n,

and

Covn(X,Y)=(n1)(Covn1(X,Y)+(xnx¯n1)(yny¯n1)n)n.

Reference:
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance


Example of C++ code that computes the online covariance:

// Filename: main.cpp
#include <iostream>
#include <iomanip>
 
int main() {
 
    double x;
    double y;
    double n = 0;
    double mean_x = 0;  // mean of the x values
    double mean_y = 0;  // mean of the y values
    double cov = 0;     // covariance of the x and y values
    double prev_mean_x; // previous mean of the x values
    double prev_mean_y; // previous mean of the y values
    double prev_cov;    // previous covariance of the x and y values
 
    if ( std::cin >> x && std::cin >> y ) {
        ++n;
        mean_x = x;
        mean_y = y;
        cov = 0;
        while ( std::cin >> x && std::cin >> y ) {
            prev_mean_x = mean_x;
            prev_mean_y = mean_y;
            prev_cov = cov;
            ++n;
            mean_x = prev_mean_x - ( prev_mean_x - x ) / n;
            mean_y = prev_mean_y - ( prev_mean_y - y ) / n;
            cov = prev_cov - ( prev_cov - ( x - mean_x ) * ( y - prev_mean_y ) ) / n;
        }
    }
 
    std::cout << "n:      " << n << '\n';
    std::cout << "mean_x: " << std::setprecision( 17 ) << mean_x << '\n';
    std::cout << "mean_y: " << std::setprecision( 17 ) << mean_y << '\n';
    std::cout << "cov:    " << std::setprecision( 17 ) << cov << '\n';
 
}

Example of data.txt:

-281.189       612.083
974.663        -24.0965
25.8526        401.539
.              .
.              .
.              .

Command Line:

g++ -o main.exe main.cpp -std=c++11 -march=native -O3 -Wall -Wextra -Werror -static
./main.exe < data.txt

 

Note: Mathematica’s Covariance[] function computes the unbiased sample covariance matrix, not the biased sample covariance matrix; therefore, the biased sample covariance matrix is computed in Mathematica as:

( ( Length[ list ] - 1 ) / Length[ list ] ) * Covariance[ list ]

 

Online Covariance
online_covariance.pdf
online_covariance.docx

Tagged on:

One thought on “Online Covariance

  1. Joshua Burkholder Post author

    This is also derived from the following:

Leave a Reply

Your email address will not be published. Required fields are marked *