Archibald Haddock (123456789)
COMP 202-A, Section 0 (Fall 2007)
Instructor: Cuthbert Calculus
Assignment 3, README

The formula for standard deviation is the following:

    sd = sqrt(s/n)

where sqrt(x) represents the square root of x. In the above, s is
defined as:

    s = sum((xi - xbar)^2)

where sum(x) represents the sum of all x terms for i from 1 to n, and
xbar represents the average of all n xi values, that is,

    xbar = sum(xi) / n

Expanding the term (xi - xbar)^2, we get:

    (xi - xbar)^2 = (xi - xbar)(xi - xbar)
                  = xi*xi - xi*xbar - xi*xbar + xbar*xbar
                  = xi^2 - 2*xi*xbar + xbar^2

Therefore,

    s = sum(xi^2 - 2*xi*xbar + xbar^2)

For any x and y, sum(x + y) = sum(x) + sum(y). Therefore,

    s = sum(xi^2) - sum(2*xi*xbar) + sum(xbar^2)

For any constant c, sum(c * x) = c * sum(x). Therefore,

    s = sum(xi^2) - 2*xbar*sum(xi) + (xbar^2)*sum(1)

sum(1) = n. Therefore,

    s = sum(xi^2) - 2*xbar*sum(xi) + (xbar^2)*n

But xbar = sum(xi) / n. Thus, sum(xi) = xbar * n. Therefore,

    s = sum(xi^2) - 2*xbar*xbar*n + (xbar^2)*n
      = sum(xi^2) - 2*(xbar^2)*n + (xbar^2)*n
      = sum(xi^2) - (xbar^2)*n

Thus, at every iteration, the only value that absolutely needs to be
updated is the sum of the squares of the word lengths. The square of
the average word length times the number of words can be subtracted
from this sum of squares after all the words have been isolated, and is
thus not needed before the average has been computed, avoiding the need
for two passes through the String.
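
The derivation above can be illustrated with a minimal Java sketch. It is
not the submitted assignment code: the class name WordLengthStats, the
method name standardDeviation, the rule that words are separated by
whitespace, and the choice to return 0.0 when there are no words are all
assumptions made for this example. The loop keeps only n, sum(xi) and
sum(xi^2), then applies s = sum(xi^2) - (xbar^2)*n once the scan is done.

```java
public class WordLengthStats {

    // Hypothetical helper: standard deviation of word lengths, computed
    // in a single pass over the text using s = sum(xi^2) - (xbar^2)*n.
    static double standardDeviation(String text) {
        long n = 0;          // number of words isolated so far
        long sum = 0;        // sum(xi): running sum of word lengths
        long sumSquares = 0; // sum(xi^2): running sum of squared lengths
        int current = 0;     // length of the word currently being scanned

        // Go one position past the end so the last word is also counted.
        for (int i = 0; i <= text.length(); i++) {
            boolean inWord = i < text.length()
                    && !Character.isWhitespace(text.charAt(i));
            if (inWord) {
                current++;
            } else if (current > 0) {
                // A word just ended: update the three running totals.
                n++;
                sum += current;
                sumSquares += (long) current * current;
                current = 0;
            }
        }

        if (n == 0) {
            return 0.0; // assumption: no words means no spread
        }

        double xbar = (double) sum / n;
        // The subtraction can round to a tiny negative value, so clamp it.
        double s = Math.max(0.0, sumSquares - xbar * xbar * n);
        return Math.sqrt(s / n);
    }

    public static void main(String[] args) {
        // Word lengths 1, 2, 3: xbar = 2, s = 14 - 4*3 = 2, sd = sqrt(2/3).
        System.out.println(standardDeviation("a bb ccc"));
    }
}
```

For the input "a bb ccc" the word lengths are 1, 2 and 3, so sum(xi^2) = 14,
xbar = 2 and s = 14 - 4*3 = 2, giving sd = sqrt(2/3), or about 0.8165. A
two-pass version would give the same value but would have to store or
rescan the words after computing xbar.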