# Formula for cumulative frequency distribution

From Wikipedia

Cumulative distribution function

In probability theory and statistics, the cumulative distribution function (CDF), or just distribution function, describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x. Intuitively, it is the "area so far" function of the probability distribution. Cumulative distribution functions are also used to specify the distribution of multivariate random variables.

## Definition

For every real number x, the CDF of a real-valued random variable X is given by

$$x \mapsto F_X(x) = \operatorname{P}(X \leq x),$$

where the right-hand side represents the probability that the random variable X takes on a value less than or equal to x. The probability that X lies in the half-closed interval $(a, b]$, where $a < b$, is therefore $F_X(b) - F_X(a)$.

When several random variables X, Y, etc. are treated, the corresponding letters are used as subscripts; when only one is treated, the subscript is omitted. It is conventional to use a capital F for a cumulative distribution function, in contrast to the lower-case f used for probability density functions and probability mass functions. This applies when discussing general distributions: some specific distributions have their own conventional notation; for example, the CDF of the standard normal distribution is usually written Φ.

If X has a probability density function f, its CDF can be written in terms of f as follows:

$$F(x) = \int_{-\infty}^x f(t)\,dt.$$
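As a rough numerical check of this integral, the sketch below integrates the standard normal density with the trapezoidal rule and compares the result with the closed form built from `math.erf`. The choice of distribution, the truncation of the infinite lower limit at −10, and the step count are illustrative assumptions, not part of the definition. The last line also evaluates the interval probability $F(b) - F(a)$.

```python
import math

def normal_pdf(t: float) -> float:
    """Standard normal density f(t)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def cdf_by_integration(x: float, lower: float = -10.0, steps: int = 100_000) -> float:
    """Approximate F(x) = integral of f(t) dt from -infinity to x (trapezoidal rule).

    The infinite lower limit is truncated at `lower`; for the standard
    normal the probability mass below -10 is below 1e-23.
    """
    h = (x - lower) / steps
    total = 0.5 * (normal_pdf(lower) + normal_pdf(x))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h)
    return total * h

def normal_cdf(x: float) -> float:
    """Closed form of the standard normal CDF: (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for x in (-1.0, 0.0, 1.96):
    print(x, cdf_by_integration(x), normal_cdf(x))

# Interval probability P(a < X <= b) = F(b) - F(a), here with a = -1, b = 1
print(normal_cdf(1.0) - normal_cdf(-1.0))  # about 0.6827
```

The numerical and closed-form columns should agree to many decimal places; the closed form is what one would use in practice.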

Note that in the definition above, the "less than or equal to" sign, "≤", is a convention, though not a universal one (e.g. Hungarian literature uses "<"), and the distinction matters for discrete distributions. The proper use of tables of the binomial and Poisson distributions depends on this convention. Moreover, important formulas like Lévy's inversion formula for the characteristic function also rely on the "less than or equal" formulation.

## Properties

Every cumulative distribution function F is monotone non-decreasing (not necessarily strictly) and right-continuous. Furthermore,

$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) = 1.$$

Every function with these four properties is a CDF. The properties imply that all CDFs are càdlàg functions (right-continuous with left limits).

If X is a discrete random variable that attains values $x_1, x_2, \ldots$ with probabilities $p_i = \operatorname{P}(X = x_i)$, then the CDF of X is discontinuous at the points $x_i$ and constant in between:

$$F(x) = \operatorname{P}(X \leq x) = \sum_{x_i \leq x} \operatorname{P}(X = x_i) = \sum_{x_i \leq x} p(x_i).$$
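The sum over $x_i \leq x$ amounts to a running total of the probability mass function. A minimal sketch, assuming a fair six-sided die as a hypothetical example, stores the running sums once and looks up how many support points are $\leq x$; exact fractions keep the step values readable.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# Hypothetical example: a fair die, P(X = x_i) = 1/6 for x_i = 1, ..., 6.
support = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# cum[k] = p(x_1) + ... + p(x_{k+1}): F evaluated at each support point.
cum = list(accumulate(pmf))

def discrete_cdf(x: float) -> Fraction:
    """F(x) = sum of p(x_i) over x_i <= x; a step function, constant between the x_i."""
    k = bisect_right(support, x)  # number of support points <= x
    return cum[k - 1] if k > 0 else Fraction(0)

print(discrete_cdf(0.5))  # 0
print(discrete_cdf(3))    # 1/2
print(discrete_cdf(3.7))  # 1/2 (constant between support points)
print(discrete_cdf(6))    # 1
```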

If the CDF F of X is continuous, then X is a continuous random variable; if furthermore F is absolutely continuous, then there exists a Lebesgue-integrable function f(x) such that

$$F(b) - F(a) = \operatorname{P}(a \leq X \leq b) = \int_a^b f(x)\,dx$$

for all real numbers a and b. (The first of the two equalities displayed above would not be correct in general if we had not said that the distribution is continuous. Continuity of the distribution implies that P(X = a) = P(X = b) = 0, so the difference between "<" and "≤" ceases to be important in this context.) The function f is equal to the derivative of F almost everywhere, and it is called the probability density function of the distribution of X.

### Point probability

The "point probability" that X is exactly b can be found as

$$\operatorname{P}(X=b) = F(b) - \lim_{x \to b^{-}} F(x).$$
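For an integer-valued variable the left-hand limit is simply $F(b-1)$, because F is constant on $[b-1, b)$. The sketch below uses a binomial distribution, chosen only as an illustration, and recovers $\operatorname{P}(X = b)$ from CDF evaluations alone; the helper names are hypothetical.

```python
import math

def binom_cdf(x: float, n: int, p: float) -> float:
    """F(x) = P(X <= x) for X ~ Binomial(n, p): sum of C(n,k) p^k (1-p)^(n-k) over k <= x."""
    if x < 0:
        return 0.0
    top = min(math.floor(x), n)  # largest attainable k with k <= x
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(top + 1))

def point_probability(b: int, n: int, p: float) -> float:
    """P(X = b) = F(b) - lim_{x -> b^-} F(x).

    X takes only integer values, so F is constant on [b - 1, b) and the
    left-hand limit equals F(b - 1).
    """
    return binom_cdf(b, n, p) - binom_cdf(b - 1, n, p)

print(point_probability(2, 4, 0.5))  # 0.375 = C(4, 2) / 2**4
print(binom_cdf(2.5, 4, 0.5))        # 0.6875, same as F(2)
```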

## Kolmogorov–Smirnov and Kuiper's tests

The Kolmogorov–Smirnov test is based on cumulative distribution functions and can be used to test whether two empirical distributions are different, or whether an empirical distribution is different from an ideal distribution. The closely related Kuiper's test (/ˈkaɪpərz/) is useful if the domain of the distribution is cyclic, as with the day of the week. For instance, we might use Kuiper's test to see whether the number of tornadoes varies during the year, or whether sales of a product vary by day of the week or day of the month.
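The two-sample Kolmogorov–Smirnov statistic is the largest vertical gap between two empirical CDFs, $D = \sup_x |F_a(x) - F_b(x)|$. A minimal sketch of the statistic only (no p-value), with hypothetical function names:

```python
from bisect import bisect_right

def ecdf(sorted_sample: list[float], x: float) -> float:
    """Empirical CDF: fraction of observations less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|.

    Both empirical CDFs are right-continuous step functions that jump only at
    observed values, so the supremum is reached at one of the pooled observations.
    """
    sa, sb = sorted(a), sorted(b)
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

For a significance level, SciPy's `scipy.stats.ks_2samp` computes this statistic together with a p-value. Kuiper's statistic instead adds the largest positive and largest negative differences, $\max(F_a - F_b) + \max(F_b - F_a)$, which is what makes it insensitive to where a cyclic domain is "cut".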

## Complementary cumulative distribution function

Sometimes, it is useful to study the opposite question and ask how often the random variable is above a particular level. This is called the complementary cumulative distribution function (ccdf) or exceedance, and is defined as

$$F_c(x) = \operatorname{P}(X > x) = 1 - F(x).$$

This has applications in statistical hypothesis testing, for example, because the one-sided P-value is the probability of observing a test statistic at least as extreme as the one observed. Hence, for a continuous test statistic, the one-sided P-value is simply the ccdf evaluated at the observed value.
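As an illustration, assuming a test statistic that is standard normal under the null hypothesis (a z-test), the one-sided P-value is the normal ccdf at the observed value. The sketch computes it with `math.erfc`, which avoids the loss of precision in $1 - F(x)$ when $F(x)$ is close to 1; the observed value is hypothetical.

```python
import math

def normal_ccdf(x: float) -> float:
    """Complementary CDF of the standard normal: P(Z > x) = 1 - F(x).

    Written as erfc(x / sqrt(2)) / 2 to avoid cancellation in 1 - F(x).
    """
    return 0.5 * math.erfc(x / math.sqrt(2))

z_observed = 1.645  # hypothetical observed test statistic
print(normal_ccdf(z_observed))  # about 0.05
```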

Question: The following figures give the numbers of children born to 50 women in a certain locality up to the age of 40 years: 1, 5, 1, 1, 2, 5, 9, 2, 6, 3, 5, 7, 8, 4, 6, 8, 9, 10, 9, 3, 5, 7, 9, 9, 4, 5, 4, 5, 5, 7, 3, 4, 2, 3, 4, 6, 3, 4, 2, 5, 6, 4, 0, 5, 6, 8, 5, 4, 7, 6. Find the cumulative frequency distribution.

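Answers: The power of Excel:

| Value | Count | Cumulative frequency |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 3 | 4 |
| 2 | 4 | 8 |
| 3 | 5 | 13 |
| 4 | 8 | 21 |
| 5 | 10 | 31 |
| 6 | 6 | 37 |
| 7 | 4 | 41 |
| 8 | 3 | 44 |
| 9 | 5 | 49 |
| 10 | 1 | 50 |

The last cumulative frequency equals the total number of women, 50.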
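The same table can be produced with a count per value followed by a running sum. A minimal sketch: `Counter` does the counting (returning 0 for values that never occur) and `itertools.accumulate` does the running total, so every value from the minimum to the maximum gets a row, including the single 0.

```python
from collections import Counter
from itertools import accumulate

children = [1, 5, 1, 1, 2, 5, 9, 2, 6, 3, 5, 7, 8, 4, 6, 8, 9, 10, 9, 3,
            5, 7, 9, 9, 4, 5, 4, 5, 5, 7, 3, 4, 2, 3, 4, 6, 3, 4, 2, 5,
            6, 4, 0, 5, 6, 8, 5, 4, 7, 6]

counts = Counter(children)
values = list(range(min(children), max(children) + 1))
freqs = [counts[v] for v in values]   # Counter returns 0 for absent values
cum_freqs = list(accumulate(freqs))   # running total of the frequencies

for v, f, c in zip(values, freqs, cum_freqs):
    print(v, f, c)
```

In a spreadsheet the equivalent is a `COUNTIF` column for the counts and a running-sum column next to it.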

Question: How do I make a cumulative relative frequency distribution in Excel showing "greater than or within" relative frequencies?

Answers: I will give you an example using the prime numbers up to 50: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47. We begin by dividing the range into appropriate intervals, usually of the same length. We will use five equal intervals; the choice of how many intervals to use is purely stylistic. We make a rectangular chart. Down the left side, we list our intervals. Across the top, we list headings for columns that are, in order, frequency (how many values fall in the interval), relative frequency, and cumulative frequency:

| Interval | Freq. | Rel. Freq. | Cumul. Freq. |
|---|---|---|---|
| 1-10 | 4 | 4/15 = 26.7% | 4 |
| 11-20 | 4 | 4/15 = 26.7% | 8 |
| 21-30 | 2 | 2/15 = 13.3% | 10 |
| 31-40 | 2 | 2/15 = 13.3% | 12 |
| 41-50 | 3 | 3/15 = 20.0% | 15 |
| Total | 15 | 1 = 100.0% | 15 |

To explain, we begin with frequency. There are 4 primes between 11 and 20, for example, so the frequency is 4. There are 15 primes in our whole sample, so the relative frequency for the interval 11-20 is 4/15, the frequency divided by the total. The cumulative frequency is the SUM of the frequencies up to and including the current interval, here 4 + 4 = 8 (the first 4 is for the interval 1-10, and the second 4 is for the interval 11-20). Cumulative frequency, thus, is like keeping a running total. As a check, the cumulative frequency of the last interval will always equal the total number of data points in your sample.

There are two ways to do this in Excel. One is to compute the table by hand as shown above, enter the results in a spreadsheet, and then use the usual commands for making pie charts and histograms. The second is to enter the frequencies, label your intervals, and then insert the next two columns with the Insert Column command. Call a(n) the nth entry in the frequency column; in this example, set b(n) = a(n)/15 for all n. For instance, if the frequencies are in cells B2:B6, enter =B2/15 in C2 and fill it down.

For the third column, put in the first entry, c(1) = a(1), and then let c(n+1) = c(n) + a(n+1) for the rest of your table; for instance, =B2 in D2 and =D2+B3 in D3, filled down. That will give the cumulative frequency. The details depend slightly on your system and on how you lay out the sheet, but this is the key idea.
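The recurrence c(n+1) = c(n) + a(n+1) from this answer is just a running sum, so the grouped table can be sketched in a few lines. The interval bounds below are treated as inclusive on both ends, matching the labels 1-10, 11-20, and so on.

```python
from itertools import accumulate

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
intervals = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]  # inclusive bounds

freqs = [sum(lo <= p <= hi for p in primes) for lo, hi in intervals]  # a(n)
total = sum(freqs)
rel_freqs = [f / total for f in freqs]   # b(n) = a(n) / total
cum_freqs = list(accumulate(freqs))      # c(1) = a(1), c(n+1) = c(n) + a(n+1)

for (lo, hi), f, r, c in zip(intervals, freqs, rel_freqs, cum_freqs):
    print(f"{lo}-{hi}\t{f}\t{r:.1%}\t{c}")
```

The relative frequencies sum to 1 and the last cumulative frequency equals the sample size, which are the two checks worth doing on any such table.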

Question: For example:

| x | 20 | 30 | 40 | 50 | 65 | 100 |
|---|---|---|---|---|---|---|
| f | 660 | 2140 | 1810 | 2150 | 2490 | 2770 |

The answer is 30 + (145/570) × 10 = 32.54. How do you get to it?

Answers: Your question appears to contain errors. First, f cannot be a cumulative frequency distribution, since it decreases for some values of x (from 2140 at x = 30 to 1810 at x = 40). A cumulative distribution is always monotonically non-decreasing. Therefore, I believe what you have is a frequency distribution. Second, you need to identify how the x bins are set up. Given boundaries x1, x2, x3 and frequencies f1, f2, f3, are f1, f2, and f3 the counts less than x1, between x1 and x2, and between x2 and x3, respectively? Or are they the counts between x1 and x2, between x2 and x3, and beyond x3, respectively? I'm going to assume the former.

Let N be the total number of observations, and let F(x) be the *cumulative* frequency, the number of observations less than x. Then:

| x | 20 | 30 | 40 | 50 | 65 | 100 |
|---|---|---|---|---|---|---|
| F | 660 | 2800 | 4610 | 6760 | 9250 | 12020 |

So N = 12020. The median is the x value for which F(x) = (N+1)/2, the rank of the middle observation; here (N+1)/2 = 6010.5. From the values of F, the median must lie between 40 and 50. Typically, one finds it by linear interpolation between the bracketing F values:

$$\frac{N+1}{2} = F(x_1) + \frac{F(x_2) - F(x_1)}{x_2 - x_1}(x - x_1).$$

Solving for x, with x1 = 40 and x2 = 50:

$$x = x_1 + \left(\frac{N+1}{2} - F(x_1)\right)\frac{x_2 - x_1}{F(x_2) - F(x_1)} = 40 + (6010.5 - 4610)\cdot\frac{10}{6760 - 4610} = 40 + \frac{1400.5}{2150}\cdot 10 \approx 46.51395$$

Obviously this is not the answer you are quoting, so perhaps I have misunderstood your setup. The above should be correct under the assumptions I stated, and it should give you enough to work out your problem if I have in fact misunderstood your question. Good luck.

Question: Hi, I am calculating statistics, i.e., the median and the upper and lower quartiles, from histogram data, and I have been given class intervals and frequencies. I know how to calculate the cumulative frequency, but the written working for the upper and lower quartiles asks for "required cumulative frequencies". The equation is:

$$\text{lower end of interval} + \text{interval width} \times \frac{\text{required cumulative frequency} - \text{cumulative frequency at start of interval}}{\text{frequency in interval}}$$

Sorry if this is confusing; if I just knew what the required cumulative frequency was, I'd know how to do it. Any help much appreciated.

Answers: At the median, the cumulative frequency should be 50% of the total frequency; at the lower quartile it should be 25%, and at the upper quartile 75%. I think this is what the formula means by 'Required Cumulative Frequency'. If you choose the interval whose cumulative frequencies bracket that value (say, for the lower quartile, one whose cumulative frequency runs from 23% to 31% of the total), the formula gives an estimate of the value at that cumulative frequency.
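Putting the question's formula and this answer together, the required cumulative frequency is p × N with p = 0.25, 0.5, 0.75. A minimal sketch, assuming equal-width classes and using made-up class data purely for illustration (the function name is hypothetical):

```python
def grouped_quantile(lower_bounds: list[float], width: float,
                     freqs: list[int], p: float) -> float:
    """Estimate the p-quantile of grouped data with equal-width classes.

    value = L + w * (required CF - CF at start of interval) / f,
    where the required cumulative frequency is p * N.
    """
    required = p * sum(freqs)
    cum_start = 0  # cumulative frequency at the start of the current interval
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cum_start + f >= required:
            return lower + width * (required - cum_start) / f
        cum_start += f
    raise ValueError("p must lie between 0 and 1")

# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with made-up frequencies (N = 50).
lows, freqs = [0, 10, 20, 30], [5, 15, 20, 10]
for name, p in [("lower quartile", 0.25), ("median", 0.5), ("upper quartile", 0.75)]:
    print(name, grouped_quantile(lows, 10, freqs, p))
# lower quartile 15.0
# median 22.5
# upper quartile 28.75
```

For the lower quartile, for example, the required cumulative frequency is 12.5; the first class reaches only 5, so the quartile lies in the 10-20 class and is estimated as 10 + 10 × (12.5 − 5)/15 = 15.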