Online Weighted Mean

by Joshua Burkholder


Given the following set of inputs and their associated weights:

$\left\{\left({x}_{1},{w}_{1}\right),\left({x}_{2},{w}_{2}\right),\dots ,\left({x}_{n-1},{w}_{n-1}\right),\left({x}_{n},{w}_{n}\right)\right\}$

Let $n$ be the number of inputs and their associated weights, ${\overline{x}}_{n}^{\text{weighted}}$ be the weighted sample mean of the first $n$ inputs and their associated weights, ${\overline{x}}_{n-1}^{\text{weighted}}$ be the weighted sample mean of the first $n-1$ inputs and their associated weights, ${w}_{n}$ be the $n$-th weight associated with input ${x}_{n}$, and ${x}_{n}$ be the $n$-th input associated with weight ${w}_{n}$.  Then the recurrence equation for the weighted sample mean (a.k.a. the online weighted mean) is:

${\overline{x}}_{n}^{\text{weighted}}={\overline{x}}_{n-1}^{\text{weighted}}-\frac{{w}_{n}\left({\overline{x}}_{n-1}^{\text{weighted}}-{x}_{n}\right)}{\sum _{i=1}^{n}{w}_{i}}$

where $\sum _{i=1}^{n}{w}_{i}\ne 0$.  Preferably, all the weights are positive such that $\sum _{i=1}^{n}{w}_{i}>0$.

Proof:

The definition of the sample mean is:

${\overline{x}}_{n}=\frac{\sum _{i=1}^{n}{x}_{i}}{n}$

The definition of the weighted sample mean is:

${\overline{x}}_{n}^{\text{weighted}}=\frac{\sum _{i=1}^{n}{w}_{i}{x}_{i}}{\sum _{i=1}^{n}{w}_{i}}$

If we expand this definition, we have:

${\overline{x}}_{n}^{\text{weighted}}=\frac{\sum _{i=1}^{n-1}{w}_{i}{x}_{i}+{w}_{n}{x}_{n}}{\sum _{i=1}^{n-1}{w}_{i}+{w}_{n}}$

From algebra, we know that for arbitrary $a$, $b$, $c$, and $d$ with $c\ne 0$ and $c+d\ne 0$:

$\begin{array}{c}\frac{a+b}{c+d}=\frac{a+b}{c+d}+\frac{a}{c}-\frac{a}{c}\\ =\frac{a}{c}+\frac{a+b}{c+d}-\frac{a}{c}\\ =\frac{a}{c}+\left(\frac{a+b}{c+d}\right)\left(\frac{c}{c}\right)-\left(\frac{a}{c}\right)\left(\frac{c+d}{c+d}\right)\\ =\frac{a}{c}+\frac{ac+bc-ac-ad}{c\left(c+d\right)}\\ =\frac{a}{c}+\frac{\cancel{ac}+bc-\cancel{ac}-ad}{c\left(c+d\right)}\\ =\frac{a}{c}+\frac{bc-ad}{c\left(c+d\right)}\\ =\frac{a}{c}+\frac{bc}{c\left(c+d\right)}-\frac{ad}{c\left(c+d\right)}\\ =\frac{a}{c}+\frac{b\cancel{c}}{\cancel{c}\left(c+d\right)}-\frac{ad}{c\left(c+d\right)}\\ =\frac{a}{c}-\frac{ad}{c\left(c+d\right)}+\frac{b}{c+d}\end{array}$
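As a quick numeric check of this identity, take the arbitrarily chosen values $a=1$, $b=2$, $c=3$, and $d=4$; both sides agree:

$\frac{a+b}{c+d}=\frac{1+2}{3+4}=\frac{3}{7}$

$\frac{a}{c}-\frac{ad}{c\left(c+d\right)}+\frac{b}{c+d}=\frac{1}{3}-\frac{4}{21}+\frac{2}{7}=\frac{7-4+6}{21}=\frac{3}{7}$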

Hence, we have:

$\begin{array}{c}{\overline{x}}_{n}^{\text{weighted}}=\frac{\stackrel{a}{\overbrace{\sum _{i=1}^{n-1}{w}_{i}{x}_{i}}}+\stackrel{b}{\overbrace{{w}_{n}{x}_{n}}}}{\underset{c}{\underbrace{\sum _{i=1}^{n-1}{w}_{i}}}+\underset{d}{\underbrace{{w}_{n}}}}\\ =\frac{\left(\sum _{i=1}^{n-1}{w}_{i}{x}_{i}\right)}{\left(\sum _{i=1}^{n-1}{w}_{i}\right)}+\frac{-\left(\sum _{i=1}^{n-1}{w}_{i}{x}_{i}\right)\left({w}_{n}\right)}{\left(\sum _{i=1}^{n-1}{w}_{i}\right)\left(\left(\sum _{i=1}^{n-1}{w}_{i}\right)+\left({w}_{n}\right)\right)}+\frac{\left({w}_{n}{x}_{n}\right)}{\left(\left(\sum _{i=1}^{n-1}{w}_{i}\right)+\left({w}_{n}\right)\right)}\\ =\left(\frac{\sum _{i=1}^{n-1}{w}_{i}{x}_{i}}{\sum _{i=1}^{n-1}{w}_{i}}\right)-\left(\frac{\sum _{i=1}^{n-1}{w}_{i}{x}_{i}}{\sum _{i=1}^{n-1}{w}_{i}}\right)\left(\frac{{w}_{n}}{\sum _{i=1}^{n}{w}_{i}}\right)+\frac{\left({w}_{n}{x}_{n}\right)}{\left(\sum _{i=1}^{n}{w}_{i}\right)}\end{array}$

Since the weighted sample mean for the first $n-1$ inputs and their associated weights is defined as ${\overline{x}}_{n-1}^{\text{weighted}}=\frac{\sum _{i=1}^{n-1}{w}_{i}{x}_{i}}{\sum _{i=1}^{n-1}{w}_{i}}$ , we have:

${\overline{x}}_{n}^{\text{weighted}}=\left({\overline{x}}_{n-1}^{\text{weighted}}\right)-\left({\overline{x}}_{n-1}^{\text{weighted}}\right)\left(\frac{{w}_{n}}{\sum _{i=1}^{n}{w}_{i}}\right)+\frac{\left({w}_{n}{x}_{n}\right)}{\left(\sum _{i=1}^{n}{w}_{i}\right)}$

Factoring out the $-1$, we have:

${\overline{x}}_{n}^{\text{weighted}}=\left({\overline{x}}_{n-1}^{\text{weighted}}\right)-\left(\left({\overline{x}}_{n-1}^{\text{weighted}}\right)\left(\frac{{w}_{n}}{\sum _{i=1}^{n}{w}_{i}}\right)-\frac{\left({w}_{n}{x}_{n}\right)}{\left(\sum _{i=1}^{n}{w}_{i}\right)}\right)$

Combining the fractions and factoring out the ${w}_{n}$, we have:

$\begin{array}{c}{\overline{x}}_{n}^{\text{weighted}}={\overline{x}}_{n-1}^{\text{weighted}}-\left(\frac{{\overline{x}}_{n-1}^{\text{weighted}}{w}_{n}-{w}_{n}{x}_{n}}{\sum _{i=1}^{n}{w}_{i}}\right)\\ ={\overline{x}}_{n-1}^{\text{weighted}}-\frac{{w}_{n}\left({\overline{x}}_{n-1}^{\text{weighted}}-{x}_{n}\right)}{\sum _{i=1}^{n}{w}_{i}}\end{array}$

Therefore, the recurrence equation for the weighted sample mean (a.k.a. online weighted mean) is:

${\overline{x}}_{n}^{\text{weighted}}={\overline{x}}_{n-1}^{\text{weighted}}-\frac{{w}_{n}\left({\overline{x}}_{n-1}^{\text{weighted}}-{x}_{n}\right)}{\sum _{i=1}^{n}{w}_{i}}$

where $\sum _{i=1}^{n}{w}_{i}\ne 0$.

Note: If all the weights are the same nonzero constant value $c$ (i.e. ${w}_{i}=c$ for $i=1,\dots ,n$ ), the weighted sample mean would be:

$\begin{array}{c}{\overline{x}}^{\text{weighted}}=\frac{\sum _{i=1}^{n}{w}_{i}{x}_{i}}{\sum _{i=1}^{n}{w}_{i}}\\ =\frac{\sum _{i=1}^{n}c{x}_{i}}{\sum _{i=1}^{n}c}\\ =\frac{c\left(\sum _{i=1}^{n}{x}_{i}\right)}{c\left(\sum _{i=1}^{n}1\right)}\\ =\frac{\cancel{c}\left(\sum _{i=1}^{n}{x}_{i}\right)}{\cancel{c}\left(n\right)}\\ =\frac{\sum _{i=1}^{n}{x}_{i}}{n}\\ =\overline{x}\end{array}$

For instance, if all the weights are $1$, then the weighted sample mean is the sample mean:

$\begin{array}{c}{\overline{x}}^{\text{weighted}}=\frac{\sum _{i=1}^{n}{w}_{i}{x}_{i}}{\sum _{i=1}^{n}{w}_{i}}\\ =\frac{\sum _{i=1}^{n}\left(1\right){x}_{i}}{\sum _{i=1}^{n}\left(1\right)}\\ =\frac{\sum _{i=1}^{n}{x}_{i}}{n}\\ =\overline{x}\end{array}$

Similarly, the online weighted mean with weights of the same constant value $c$ would be:

$\begin{array}{c}{\overline{x}}_{n}^{\text{weighted}}={\overline{x}}_{n-1}^{\text{weighted}}-\frac{{w}_{n}\left({\overline{x}}_{n-1}^{\text{weighted}}-{x}_{n}\right)}{\sum _{i=1}^{n}{w}_{i}}\\ ={\overline{x}}_{n-1}^{\text{weighted}}-\frac{c\left({\overline{x}}_{n-1}^{\text{weighted}}-{x}_{n}\right)}{\sum _{i=1}^{n}c}\\ ={\overline{x}}_{n-1}^{\text{weighted}}-\frac{c\left({\overline{x}}_{n-1}^{\text{weighted}}-{x}_{n}\right)}{c\left(\sum _{i=1}^{n}1\right)}\\ ={\overline{x}}_{n-1}^{\text{weighted}}-\frac{\cancel{c}\left({\overline{x}}_{n-1}^{\text{weighted}}-{x}_{n}\right)}{\cancel{c}\left(n\right)}\\ ={\overline{x}}_{n-1}^{\text{weighted}}-\frac{\left({\overline{x}}_{n-1}^{\text{weighted}}-{x}_{n}\right)}{n}\\ ={\overline{x}}_{n-1}-\frac{\left({\overline{x}}_{n-1}-{x}_{n}\right)}{n}\\ ={\overline{x}}_{n}\end{array}$

Therefore, if all the weights are the same nonzero constant value $c$, the online weighted mean is the same as the online mean.

Example of C++ code that computes the online weighted mean:

#include <iostream>
#include <iomanip>

int main () {
    double x;
    double weight;
    double sum_of_weights = 0;
    double weighted_mean = 0;
    double prev_weighted_mean;
    // Read the first ( input, weight ) pair; the weighted mean of a
    // single input is just that input.
    if ( std::cin >> x && std::cin >> weight ) {
        sum_of_weights += weight;
        weighted_mean = x;
        // Apply the recurrence for each subsequent ( input, weight ) pair.
        while ( std::cin >> x && std::cin >> weight ) {
            prev_weighted_mean = weighted_mean;
            sum_of_weights += weight;
            weighted_mean = (
                prev_weighted_mean - weight * ( prev_weighted_mean - x ) / sum_of_weights
            );
        }
    }
    std::cout << "sum_of_weights: " << std::setprecision( 17 ) << sum_of_weights << '\n';
    std::cout << "weighted_mean:  " << std::setprecision( 17 ) << weighted_mean  << '\n';
}

Example of data.txt:

-19.313117172629575    2.718281828459045

-34.14656787734913     7.38905609893065

-14.117521595690334    20.085536923187668

.                      .

.                      .

.                      .

Command line:

g++ -o main.exe main.cpp -std=c++11 -march=native -O3 -Wall -Wextra -Werror -static

./main.exe < data.txt

Sample Output:

sum_of_weights: 34843.773845331321

weighted_mean:  -28.368899576339764