Online Variance

Let n be the number of values, vn be the biased sample variance of the first n values, vn1 be the biased sample variance for the first n1 values, xn be the n -th value, x¯n be the sample mean of the first n values, and x¯n1 be the sample mean of the first n1 values. Then, the recurrence equation for the biased sample variance (a.k.a. online variance) is:

vn=vn1vn1(xnx¯n)(xnx¯n1)n

 

Proof:

The definition of the biased sample variance of the first n values is defined as:

vn=i=1n(xix¯n)2n

If we expand this definition, we have:

vn=i=1n(xi22xix¯n+x¯n2)nvn=i=1n1(xi22xix¯n+x¯n2)+xn22xnx¯n+x¯n2n

Since the recurrence equation for the sample mean is:

x¯n=x¯n1x¯n1xnn ,

then we also have:

x¯n2=(x¯n1x¯n1xnn)2x¯n2=x¯n122x¯n1(x¯n1xnn)+(x¯n1xnn)2

With these, we have:

vn=i=1n1(xi22xi(x¯n1x¯n1xnn)+(x¯n122x¯n1(x¯n1xnn)+(x¯n1xnn)2))+xn22xnx¯n+x¯n2nvn=i=1n1(xi22xix¯n1+2xi(x¯n1xnn)+x¯n122x¯n1(x¯n1xnn)+(x¯n1xnn)2)+xn22xnx¯n+x¯n2nvn=i=1n1(xi22xix¯n1+x¯n12+2xi(x¯n1xnn)2x¯n1(x¯n1xnn)+(x¯n1xnn)2)+xn22xnx¯n+x¯n2nvn=i=1n1((xix¯n1)2+2xi(x¯n1xnn)2x¯n1(x¯n1xnn)+(x¯n1xnn)2)+xn22xnx¯n+x¯n2nvn=i=1n1(xix¯n1)2+i=1n1(2xi(x¯n1xnn)2x¯n1(x¯n1xnn)+(x¯n1xnn)2)+xn22xnx¯n+x¯n2n

Since the biased sample variance for the first n1 values is:

vn1=i=1n1(xix¯n1)2n1 ,

then we also have:

i=1n1(xix¯n1)2=(n1)vn1 .

With this, we have:

vn=(n1)vn1+i=1n1(2xi(x¯n1xnn)2x¯n1(x¯n1xnn)+(x¯n1xnn)2)+xn22xnx¯n+x¯n2nvn=(n1)vn1+(x¯n1xnn)(i=1n1(2xi2x¯n1+(x¯n1xnn)))+xn22xnx¯n+x¯n2nvn=(n1)vn1+(x¯n1xnn)(i=1n1(2xi)+i=1n1(2x¯n1)+i=1n1(x¯n1xnn))+xn22xnx¯n+x¯n2nvn=(n1)vn1+(x¯n1xnn)(2i=1n1(xi)2x¯n1i=1n1(1)+(x¯n1xnn)i=1n1(1))+xn22xnx¯n+x¯n2nvn=(n1)vn1+(x¯n1xnn)(2i=1n1(xi)2x¯n1(n1)+(x¯n1xnn)(n1))+xn22xnx¯n+x¯n2n

Since the definition of the sample mean for the first n1 values is:

x¯n1=i=1n1xin1 ,

then we also have:

i=1n1xi=(n1)x¯n1.

With this, we have:

vn=(n1)vn1+(x¯n1xnn)(2(n1)x¯n12x¯n1(n1)+(x¯n1xnn)(n1))+xn22xnx¯n+x¯n2nvn=(n1)vn1+(x¯n1xnn)(2(n1)x¯n12(n1)x¯n1+(x¯n1xnn)(n1))+xn22xnx¯n+x¯n2nvn=(n1)vn1+(x¯n1xnn)(x¯n1xnn)(n1)+xn22xnx¯n+x¯n2nvn=(n1)vn1+xn2+(n1)(x¯n122xnx¯n1+xn2n2)2xnx¯n+x¯n2nvn=(n1)vn1+xn2+(n1)x¯n12n22(n1)xnx¯n1n2+(n1)xn2n22xnx¯n+x¯n2n

Since the recurrence equation for the sample mean is:

x¯n=x¯n1x¯n1xnn,

then we also have:

x¯n=nx¯n1x¯n1+xnnx¯n=(n1)x¯n1+xnn

Moreover, we have:

x¯n2=((n1)x¯n1+xnn)2x¯n2=((n1)x¯n1+xn)2n2x¯n2=(n1)2x¯n12+2(n1)xnx¯n1+xn2n2x¯n2=(n1)2x¯n12n2+2(n1)xnx¯n1n2+xn2n2

With this, we have:

vn=(n1)vn1+xn2+(n1)x¯n12n22(n1)xnx¯n1n2+(n1)xn2n22xnx¯n+((n1)2x¯n12n2+2(n1)xnx¯n1n2+xn2n2)nvn=(n1)vn1+xn2+(n1)x¯n12n22(n1)xnx¯n1n2+(n1)xn2n22xnx¯n+(n1)2x¯n12n2+2(n1)xnx¯n1n2+xn2n2nvn=(n1)vn1+xn2+(n1)2x¯n12n2+(n1)x¯n12n2+(n1)xn2n2+xn2n22xnx¯nnvn=(n1)vn1+xn2+((n1)2+(n1))(x¯n12n2)+((n1)+1)(xn2n2)2xnx¯nnvn=(n1)vn1+xn2+((n1)((n1)+1))(x¯n12n2)+(n)(xn2n2)2xnx¯nnvn=(n1)vn1+xn2+((n1)(n))(x¯n12n2)+(n)(xn2n2)2xnx¯nnvn=(n1)vn1+xn2+(n1)(x¯n12n)+xn2n2xnx¯nnvn=(n1)vn1+xn2xnx¯n+(n1)x¯n12n+xn2nxnx¯nnvn=(n1)vn1+xn2xnx¯n+(n1)x¯n12nnxnx¯nn+xn2nnvn=(n1)vn1+xn2xnx¯n+(n1)x¯n12nnxnx¯nxn2nnvn=(n1)vn1+xn2xnx¯n+(n1)x¯n12n(nx¯nxn)xnnnvn=(n1)vn1+xn2xnx¯n+(n1)x¯n12n(n1n1)((nx¯nxn)xnn)nvn=(n1)vn1+xn2xnx¯n+(n1)x¯n12n((n1)xnn)(nx¯nxnn1)n

As previously noted, the recurrence equation for the sample mean can be rewritten as:

x¯n=(n1)x¯n1+xnn,

then we have:

(n1)x¯n1+xnn=x¯n(n1)x¯n1+xn=nx¯n(n1)x¯n1=nx¯nxnx¯n1=nx¯nxnn1

With this, we have:

vn=(n1)vn1+xn2xnx¯n+(n1)x¯n12n((n1)xnn)(x¯n1)nvn=(n1)vn1+xn2xnx¯n+((n1)x¯n1n(nxnxnn))(x¯n1)nvn=(n1)vn1+xn2xnx¯n+((n1)x¯n1nnxnn+xnn)(x¯n1)nvn=(n1)vn1+xn2xnx¯n+(((n1)x¯n1+xnn)xn)(x¯n1)n

Since the recurrence equation of the sample mean can be rewritten as:

x¯n=(n1)x¯n1+xnn,

then we have:

vn=(n1)vn1+xn2xnx¯n+(x¯nxn)(x¯n1)nvn=(n1)vn1+xn2xnx¯n+x¯nx¯n1xnx¯n1nvn=(n1)vn1+xn2xnx¯nxnx¯n1+x¯nx¯n1nvn=(n1)vn1+(xnx¯n)(xnx¯n1)nvn=nvn1vn1+(xnx¯n)(xnx¯n1)nvn=nvn1n+vn1+(xnx¯n)(xnx¯n1)nvn=vn1vn1(xnx¯n)(xnx¯n1)n

Therefore, the recurrence equation for the biased sample variance (a.k.a. online variance) is:

vn=vn1vn1(xnx¯n)(xnx¯n1)n


Example C++ code that computes the online variance:

// Filename: main.cpp
#include <iostream>
#include <iomanip>
 
int main() {
 
    double x;
    double n = 0;
    double mean = 0;
    double variance = 0;
    double prev_mean; // previous mean
    double prev_variance; // previous variance
 
    if ( std::cin >> x ) {
        ++n;
        mean = x;
        variance = 0;
        while ( std::cin >> x ) {
            prev_mean = mean;
            prev_variance = variance;
            ++n;
            mean = prev_mean - ( prev_mean - x ) / n;
            variance = prev_variance - ( prev_variance - ( x - mean ) * ( x - prev_mean ) ) / n;
        }
    }
 
    std::cout << "n:        " << n << '\n';
    std::cout << "mean:     " << std::setprecision( 17 ) << mean << '\n';
    std::cout << "variance: " << std::setprecision( 17 ) << variance << '\n';
 
}

Example of data.txt:

-281.189
974.663
25.8526
.
.
.

Command Line:

g++ -o main.exe main.cpp -std=c++11 -march=native -O3 -Wall -Wextra -Werror -static
./main.exe < data.txt

Note: Mathematica’s Variance[] function computes the unbiased sample variance, not the biased sample variance; therefore, the biased sample variance is computed in Mathematica as:

( ( Length[ list ] – 1 ) / Length[ list ] ) * Variance[ list ]

 

Online Variance
online_variance.pdf
online_variance.docx

Tagged on:

One thought on “Online Variance

  1. Joshua Burkholder Post author

    This is also derived from the following:

Leave a Reply

Your email address will not be published. Required fields are marked *