Online Weighted Mean

Given the following set of inputs and their associated weights:

{(x1,w1),(x2,w2),,(xn1,wn1),(xn,wn)}

Let n be the number of inputs and their associated weights, x¯nweighted is the weighted sample mean for the first n inputs and their associated weights, x¯n1weighted be the weighted sample mean of the first n1 inputs and their associated weights, wn be the n -th weight associated with input xn, and xn be the n -th input associated with wn. Then, the recurrence equation for the weighted sample mean (a.k.a. online weighted mean) is:

x¯nweighted=x¯n1weightedwn(x¯n1weightedxn)i=1nwi

where i=1nwi0. Preferably, all the weights are positive such that i=1nwi>0.

 

Proof:

The definition of the sample mean is:

x¯n=i=1nxin

The definition of the weighted sample mean is:

x¯nweighted=i=1nwixii=1nwi

If we expand this definition, we have:

x¯nweighted=i=1n1wixi+wnxni=1n1wi+wn

From algebra, we know that for arbitrary a, b, c, and d:

a+bc+d=a+bc+d+acac=ac+a+bc+dac=ac+(a+bc+d)(cc)(ac)(c+dc+d)=ac+ac+bcacadc(c+d)=ac+ac+bcacadc(c+d)=ac+bcadc(c+d)=ac+bcc(c+d)+adc(c+d)=ac+bcc(c+d)+adc(c+d)=ac+adc(c+d)+b(c+d)

Hence, we have:

x¯nweighted=i=1n1wixia+wnxnbi=1n1wic+wnd=(i=1n1wixi)(i=1n1wi)+(i=1n1wixi)(wn)(i=1n1wi)((i=1n1wi)+(wn))+(wnxn)((i=1n1wi)+(wn))=(i=1n1wixii=1n1wi)(i=1n1wixii=1n1wi)(wni=1nwi)+(wnxn)(i=1nwi)

Since the weighted sample mean for the first n1 inputs and their associated weights is defined as x¯n1weighted=i=1n1wixii=1n1wi , we have:

x¯nweighted=(x¯n1weighted)(x¯n1weighted)(wni=1nwi)+(wnxn)(i=1nwi)

Factoring out the 1, we have:

x¯nweighted=(x¯n1weighted)((x¯n1weighted)(wni=1nwi)(wnxn)(i=1nwi))

Combining the fractions and factoring out the wn, we have:

x¯nweighted=x¯n1weighted(x¯n1weightedwnwnxni=1nwi)=x¯n1weightedwn(x¯n1weightedxn)i=1nwi

Therefore, the recurrence equation for the weighted sample mean (a.k.a. online weighted mean) is:

x¯nweighted=x¯n1weightedwn(x¯n1weightedxn)i=1nwi

where i=1nwi0.

Note: If all the weights are the same constant value c (i.e. wi=c for i=1,,n ), the weighted sample mean would be:

x¯weighted=i=1nwixii=1nwi=i=1ncxii=1nc=c(i=1nxi)c(i=1n1)=c(i=1nxi)c(n)=i=1nxin=x¯

For instance, if all the weights are 1, then the weighted sample mean is the sample mean:

x¯weighted=i=1nwixii=1nwi=i=1n(1)xii=1n(1)=i=1nxin=x¯

Similarly, the online weighted mean with weights of the same constant value c would be:

x¯nweighted=x¯n1weightedwn(x¯n1weightedxn)i=1nwi=x¯n1weightedc(x¯n1weightedxn)i=1nc=x¯n1weightedc(x¯n1weightedxn)c(i=1n1)=x¯n1weightedc(x¯n1weightedxn)c(n)=x¯n1weighted(x¯n1weightedxn)n=x¯n1(x¯n1xn)n=x¯n

Therefore, if all the weights are the same constant value c, the online weighted mean is the same as the online mean.


Example of C++ code that computes the online weighted mean:

#include <iostream>
#include <iomanip>
 
int main() {
 
    double x;
    double weight;
    double sum_of_weights = 0;
    double weighted_mean = 0;
    double prev_weighted_mean;
 
    if ( std::cin >> x && std::cin >> weight ) {
        sum_of_weights += weight;
        weighted_mean = x;
        while ( std::cin >> x && std::cin >> weight ) {
            prev_weighted_mean = weighted_mean;
            sum_of_weights += weight;
            weighted_mean = (
                prev_weighted_mean - weight * ( prev_weighted_mean - x ) / sum_of_weights
            );
        }
    }
 
    std::cout << "sum_of_weights: " << std::setprecision( 17 ) << sum_of_weights << '\n';
    std::cout << "weighted_mean:  " << std::setprecision( 17 ) << weighted_mean << '\n';
}

Example of data.txt:

-19.313117172629575    2.718281828459045
-34.14656787734913     7.38905609893065
-14.117521595690334    20.085536923187668
.                      .
.                      .
.                      .

Command line:

g++ -o main.exe main.cpp -std=c++11 -march=native -O3 -Wall -Wextra -Werror -static
./main.exe < data.txt

Sample Output:

sum_of_weights: 34843.773845331321
weighted_mean:  -28.368899576339764

 

Online Weighted Mean
online_weighted_mean.pdf
online_weighted_mean.docx

Tagged on:

Leave a Reply

Your email address will not be published. Required fields are marked *