Online Covariance
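Given the following set of two-dimensional inputs:
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$
Let $n$ be the number of two-dimensional inputs, $x$ and $y$ denote the two dimensions, $\operatorname{cov}_n(x, y)$ be the biased sample covariance of the $x$ and $y$ dimensions for the first $n$ two-dimensional inputs, $\operatorname{cov}_{n-1}(x, y)$ be the biased sample covariance of the $x$ and $y$ dimensions for the first $n-1$ two-dimensional inputs, $x_n$ be the $x$ value of the $n$-th two-dimensional input, $\bar{x}_n$ be the sample mean of the $x$ values for the first $n$ two-dimensional inputs, $y_n$ be the $y$ value of the $n$-th two-dimensional input, and $\bar{y}_{n-1}$ be the sample mean of the $y$ values for the first $n-1$ two-dimensional inputs. Then, the recurrence equation for the biased sample covariance (a.k.a. online covariance) is:
$$\operatorname{cov}_n(x, y) = \operatorname{cov}_{n-1}(x, y) + \frac{(x_n - \bar{x}_n)(y_n - \bar{y}_{n-1}) - \operatorname{cov}_{n-1}(x, y)}{n}$$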
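Note: The recurrence equation above also applies, entry by entry, when computing the online covariance matrix:
$$\Sigma_n = \begin{bmatrix} \operatorname{cov}_n(x, x) & \operatorname{cov}_n(x, y) \\ \operatorname{cov}_n(y, x) & \operatorname{cov}_n(y, y) \end{bmatrix}.$$
However, we will restrict ourselves to the online covariance computation of two-dimensional input in this post and explore the online covariance matrix computation of $d$-dimensional input in a later post.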
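Proof:
The biased sample covariance of the $x$ and $y$ dimensions for the first $n$ two-dimensional inputs is defined as:
$$\operatorname{cov}_n(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n).$$
If we expand this definition by separating the $n$-th term, we have:
$$\operatorname{cov}_n(x, y) = \frac{1}{n} \left[ \sum_{i=1}^{n-1} (x_i - \bar{x}_n)(y_i - \bar{y}_n) + (x_n - \bar{x}_n)(y_n - \bar{y}_n) \right].$$
Since the recurrence equations for the sample mean of the $x$ and $y$ values are:
$$\bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n} \quad \text{and} \quad \bar{y}_n = \bar{y}_{n-1} + \frac{y_n - \bar{y}_{n-1}}{n},$$
then we have:
$$\operatorname{cov}_n(x, y) = \frac{1}{n} \left[ \sum_{i=1}^{n-1} (x_i - \bar{x}_{n-1})(y_i - \bar{y}_{n-1}) - \frac{y_n - \bar{y}_{n-1}}{n} \sum_{i=1}^{n-1} (x_i - \bar{x}_{n-1}) - \frac{x_n - \bar{x}_{n-1}}{n} \sum_{i=1}^{n-1} (y_i - \bar{y}_{n-1}) + \frac{(n-1)(x_n - \bar{x}_{n-1})(y_n - \bar{y}_{n-1})}{n^2} + (x_n - \bar{x}_n)(y_n - \bar{y}_n) \right].$$
Since the biased sample covariance of the $x$ and $y$ dimensions for the first $n-1$ two-dimensional inputs (with $n \ge 2$) is defined as:
$$\operatorname{cov}_{n-1}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n-1} (x_i - \bar{x}_{n-1})(y_i - \bar{y}_{n-1}),$$
then we also have:
$$\sum_{i=1}^{n-1} (x_i - \bar{x}_{n-1})(y_i - \bar{y}_{n-1}) = (n-1) \operatorname{cov}_{n-1}(x, y).$$
With this, we have:
$$\operatorname{cov}_n(x, y) = \frac{1}{n} \left[ (n-1) \operatorname{cov}_{n-1}(x, y) - \frac{y_n - \bar{y}_{n-1}}{n} \sum_{i=1}^{n-1} (x_i - \bar{x}_{n-1}) - \frac{x_n - \bar{x}_{n-1}}{n} \sum_{i=1}^{n-1} (y_i - \bar{y}_{n-1}) + \frac{(n-1)(x_n - \bar{x}_{n-1})(y_n - \bar{y}_{n-1})}{n^2} + (x_n - \bar{x}_n)(y_n - \bar{y}_n) \right].$$
Since the sample means of the first $n-1$ $x$ and $y$ values are defined as:
$$\bar{x}_{n-1} = \frac{1}{n-1} \sum_{i=1}^{n-1} x_i \quad \text{and} \quad \bar{y}_{n-1} = \frac{1}{n-1} \sum_{i=1}^{n-1} y_i,$$
then we also have:
$$\sum_{i=1}^{n-1} (x_i - \bar{x}_{n-1}) = 0 \quad \text{and} \quad \sum_{i=1}^{n-1} (y_i - \bar{y}_{n-1}) = 0.$$
With that, we have:
$$\operatorname{cov}_n(x, y) = \frac{1}{n} \left[ (n-1) \operatorname{cov}_{n-1}(x, y) + \frac{(n-1)(x_n - \bar{x}_{n-1})(y_n - \bar{y}_{n-1})}{n^2} + (x_n - \bar{x}_n)(y_n - \bar{y}_n) \right].$$
Since the recurrence equation for the sample mean of the $x$ values is:
$$\bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n},$$
which rearranges to $x_n - \bar{x}_{n-1} = \frac{n}{n-1} (x_n - \bar{x}_n)$, then we have:
$$\operatorname{cov}_n(x, y) = \frac{1}{n} \left[ (n-1) \operatorname{cov}_{n-1}(x, y) + \frac{(x_n - \bar{x}_n)(y_n - \bar{y}_{n-1})}{n} + (x_n - \bar{x}_n)(y_n - \bar{y}_n) \right].$$
Since the recurrence equation for the sample mean of the $y$ values is:
$$\bar{y}_n = \bar{y}_{n-1} + \frac{y_n - \bar{y}_{n-1}}{n},$$
which rearranges to $y_n - \bar{y}_n = \frac{n-1}{n} (y_n - \bar{y}_{n-1})$, then we have:
$$\operatorname{cov}_n(x, y) = \frac{1}{n} \left[ (n-1) \operatorname{cov}_{n-1}(x, y) + \frac{(x_n - \bar{x}_n)(y_n - \bar{y}_{n-1})}{n} + \frac{(n-1)(x_n - \bar{x}_n)(y_n - \bar{y}_{n-1})}{n} \right] = \frac{(n-1) \operatorname{cov}_{n-1}(x, y) + (x_n - \bar{x}_n)(y_n - \bar{y}_{n-1})}{n}.$$
Therefore, the recurrence equation for the biased sample covariance (a.k.a. online covariance) is:
$$\operatorname{cov}_n(x, y) = \operatorname{cov}_{n-1}(x, y) + \frac{(x_n - \bar{x}_n)(y_n - \bar{y}_{n-1}) - \operatorname{cov}_{n-1}(x, y)}{n}$$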
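Note: We can manipulate this recurrence equation so that we also have:
$$\operatorname{cov}_n(x, y) = \operatorname{cov}_{n-1}(x, y) + \frac{(x_n - \bar{x}_{n-1})(y_n - \bar{y}_n) - \operatorname{cov}_{n-1}(x, y)}{n},$$
$$\operatorname{cov}_n(x, y) = \frac{n-1}{n} \operatorname{cov}_{n-1}(x, y) + \frac{n-1}{n^2} (x_n - \bar{x}_{n-1})(y_n - \bar{y}_{n-1}),$$
and
$$\operatorname{cov}_n(x, y) = \frac{n-1}{n} \operatorname{cov}_{n-1}(x, y) + \frac{(x_n - \bar{x}_n)(y_n - \bar{y}_n)}{n-1}.$$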
Reference:
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
Example of C++ code that computes the online covariance:
    // Filename: main.cpp
    #include <iostream>
    #include <iomanip>

    int main()
    {
        double x;
        double y;
        double n = 0;
        double mean_x = 0;   // mean of the x values
        double mean_y = 0;   // mean of the y values
        double cov = 0;      // covariance of the x and y values
        double prev_mean_x;  // previous mean of the x values
        double prev_mean_y;  // previous mean of the y values
        double prev_cov;     // previous covariance of the x and y values

        if ( std::cin >> x && std::cin >> y )
        {
            ++n;
            mean_x = x;
            mean_y = y;
            cov = 0;

            while ( std::cin >> x && std::cin >> y )
            {
                prev_mean_x = mean_x;
                prev_mean_y = mean_y;
                prev_cov = cov;
                ++n;
                mean_x = prev_mean_x - ( prev_mean_x - x ) / n;
                mean_y = prev_mean_y - ( prev_mean_y - y ) / n;
                cov = prev_cov - ( prev_cov - ( x - mean_x ) * ( y - prev_mean_y ) ) / n;
            }
        }

        std::cout << "n: " << n << '\n';
        std::cout << "mean_x: " << std::setprecision( 17 ) << mean_x << '\n';
        std::cout << "mean_y: " << std::setprecision( 17 ) << mean_y << '\n';
        std::cout << "cov: " << std::setprecision( 17 ) << cov << '\n';
    }
Example of data.txt:
    -281.189 612.083
    974.663 -24.0965
    25.8526 401.539
    . .
    . .
    . .
Command Line:
    g++ -o main.exe main.cpp -std=c++11 -march=native -O3 -Wall -Wextra -Werror -static
    ./main.exe < data.txt
Note: Mathematica’s Covariance[] function computes the unbiased sample covariance matrix, not the biased sample covariance matrix; therefore, the biased sample covariance matrix is computed in Mathematica as:
( ( Length[ list ] - 1 ) / Length[ list ] ) * Covariance[ list ]
This is also derived from the following:
Link: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.302.7503
Link: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.302.7503&rep=rep1&type=pdf
Related Link: http://webmail.cs.yale.edu/publications/techreports/tr222.pdf
Hello,
I just tried your online covariance formula and it is very precise. Thank you for it, and for the mathematical demonstration. Do you have anything published in which you included it (just the formula), or should I just reference the website?
Please reference the website. Thanks for your interest.