Frequency distribution calculator


From Wikipedia

Frequency distribution

In statistics, a frequency distribution is a tabulation of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way the table summarizes the distribution of values in the sample.

Univariate frequency tables

Univariate frequency distributions are often presented as lists ordered by quantity showing the number of times each value appears. For example, if 100 people rate a five-point Likert scale assessing their agreement with a statement on a scale on which 1 denotes strong agreement and 5 strong disagreement, the frequency distribution of their responses might look like:

A different tabulation scheme aggregates values into bins such that each bin encompasses a range of values. For example, the heights of the students in a class could be organized into the following frequency table.
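A minimal Python sketch of this kind of binned tabulation follows. It assumes equal-width, half-open bins. The function name `bin_counts` and the heights are hypothetical and are not taken from the table this section originally showed.

```python
from collections import Counter

def bin_counts(values, start, width, n_bins):
    """Tabulate values into n_bins half-open bins [start + k*width, start + (k+1)*width)."""
    counts = Counter()
    for v in values:
        k = int((v - start) // width)  # index of the bin containing v
        if 0 <= k < n_bins:  # values outside the covered range are skipped
            counts[k] += 1
    # One (lower edge, upper edge, count) row per bin, including empty bins
    return [(start + k * width, start + (k + 1) * width, counts[k]) for k in range(n_bins)]

# Hypothetical student heights in cm
heights = [152, 158, 161, 163, 165, 167, 170, 171, 174, 178, 181, 185]
for lo, hi, count in bin_counts(heights, start=150, width=10, n_bins=4):
    print(f"[{lo}, {hi}) cm: {count}")
```

The half-open convention makes the bins mutually exclusive, so a height of exactly 170 cm is counted only once.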

A frequency distribution shows a summarized grouping of data divided into mutually exclusive classes, together with the number of occurrences in each class. It is a way of organizing otherwise unorganized data, e.g. to show the results of an election, the incomes of people in a certain region, the sales of a product within a certain period, the student loan amounts of graduates, etc. Graphs that can be used with frequency distributions include histograms, line graphs, bar charts and pie charts. Frequency distributions are used for both qualitative and quantitative data.

Joint frequency distributions

Bivariate joint frequency distributions are often presented as (two-way) contingency tables:

The total row and total column report the marginal frequencies or marginal distribution, while the body of the table reports the joint frequencies.
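The following Python sketch shows how the body and margins of such a table relate. It builds joint counts from paired observations and then derives the row and column totals. The function name and the handedness/sex observations are hypothetical.

```python
from collections import Counter

def contingency_table(pairs):
    """Joint frequencies plus row and column marginal totals from (row, column) pairs."""
    joint = Counter(pairs)  # body of the table: count of each (row, column) combination
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    row_totals = {r: sum(joint[(r, c)] for c in cols) for r in rows}
    col_totals = {c: sum(joint[(r, c)] for r in rows) for c in cols}
    return joint, row_totals, col_totals

# Hypothetical observations: (handedness, sex)
obs = [("right", "M"), ("right", "F"), ("left", "M"), ("right", "F"),
       ("ambidextrous", "F"), ("right", "M"), ("left", "F"), ("right", "M")]
joint, row_totals, col_totals = contingency_table(obs)

cols = list(col_totals)
print("", *cols, "Total", sep="\t")
for r, total in row_totals.items():
    print(r, *(joint[(r, c)] for c in cols), total, sep="\t")
print("Total", *col_totals.values(), sum(col_totals.values()), sep="\t")
```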


Managing and operating on frequency-tabulated data is much simpler than operating on raw data. There are simple algorithms to calculate the median, mean, standard deviation, etc. from these tables.
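For example, the mean and standard deviation can be computed directly from a {value: frequency} table without expanding it back into raw data. This sketch computes the population standard deviation, dividing by N; a sample standard deviation would divide by N − 1 instead. `table_mean_std` and the Likert counts are hypothetical.

```python
import math

def table_mean_std(freq):
    """Mean and population standard deviation from a {value: frequency} table."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    # Each squared deviation is weighted by how often its value occurs
    variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n
    return mean, math.sqrt(variance)

# Hypothetical Likert responses from 100 people: rating -> number of people
likert = {1: 20, 2: 30, 3: 25, 4: 15, 5: 10}
mean, sd = table_mean_std(likert)
print(f"mean = {mean:.2f}, sd = {sd:.3f}")
```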

Statistical hypothesis testing is founded on the assessment of differences and similarities between frequency distributions. This assessment involves measures of central tendency or averages, such as the mean and median, and measures of variability or statistical dispersion, such as the standard deviation or variance.

A frequency distribution is said to be skewed when its mean and median are different. The kurtosis of a frequency distribution is the concentration of scores at the mean, or how peaked the distribution appears if depicted graphically—for example, in a histogram. If the distribution is more peaked than the normal distribution it is said to be leptokurtic; if less peaked it is said to be platykurtic.
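Here is a sketch of moment-based shape measures computed from a {value: frequency} table. It uses the moment coefficient of skewness and excess kurtosis (kurtosis minus 3, so a normal distribution scores 0). This is one common convention among several, and it differs from the simple mean-versus-median comparison above. The function names and example tables are hypothetical.

```python
def table_shape(freq):
    """Moment skewness and excess kurtosis from a {value: frequency} table.

    Assumes the values are not all identical (nonzero variance).
    """
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n

    def central_moment(k):
        # Frequency-weighted k-th central moment
        return sum(f * (x - mean) ** k for x, f in freq.items()) / n

    m2 = central_moment(2)
    skewness = central_moment(3) / m2 ** 1.5
    excess_kurtosis = central_moment(4) / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis

def kurtosis_label(excess):
    """Classify peakedness relative to the normal distribution."""
    if excess > 0:
        return "leptokurtic"
    if excess < 0:
        return "platykurtic"
    return "mesokurtic"

# Hypothetical tables: a flat-ish one and one sharply peaked around 2
for table in ({1: 1, 2: 2, 3: 1}, {1: 1, 2: 8, 3: 1}):
    skew, kurt = table_shape(table)
    print(table, round(skew, 3), round(kurt, 3), kurtosis_label(kurt))
```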

Letter frequency distributions are also used in frequency analysis to crack codes and refer to the relative frequency of letters in different languages.

Letter frequency

The frequency of letters in text has often been studied for use in cryptography, and frequency analysis in particular. No exact letter frequency distribution underlies a given language, since all writers write slightly differently. Linotype machines sorted the letters' frequencies as etaoin shrdlu cmfwyp vbgkqj xz based on the experience and custom of manual compositors. Likewise, modern International Morse code encodes the most frequent letters with the shortest symbols. Arranging the Morse alphabet into groups of letters that require equal amounts of time to transmit, and then sorting these groups in increasing order of transmission time, yields e it san hurdm wgvlfbk opjxcz yq. Similar ideas are used in modern data-compression techniques such as Huffman coding.
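The shared idea can be illustrated with a compact Huffman-coding sketch built on Python's standard `heapq` module. The algorithm repeatedly merges the two least frequent subtrees, so frequent symbols end up with short codes. The letter weights are hypothetical illustrative values, not measured frequencies, and the function expects at least one symbol.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Prefix-free binary code in which more frequent symbols get codes no longer than rarer ones."""
    tie = count()  # unique tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a 1-bit code
        return {sym: "0" for sym in freq}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent subtree
        f2, _, right = heapq.heappop(heap)  # next least frequent subtree
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

# Hypothetical weights: common letters heavy, rare letters light
freqs = {"e": 12, "t": 9, "a": 8, "q": 1, "z": 1}
codes = huffman_code(freqs)
for sym in sorted(codes, key=lambda s: -freqs[s]):
    print(sym, codes[sym])
```

Here `e` receives a 1-bit code while `q` and `z` receive 4-bit codes, which mirrors how Morse code gives `e` its shortest symbol.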

More recent analyses show that letter frequencies, like word frequencies, tend to vary, both by writer and by subject. One cannot write an essay about x-rays without using frequent Xs, and the essay will have an especially strange letter frequency if the essay is about the frequent use of x-rays to treat zebras in Qatar. Different authors have habits which can be reflected in their use of letters. Hemingway's writing style, for example, is visibly different from Faulkner's. Letter, bigram, trigram, word frequencies, word length, and sentence length can be calculated for specific authors, and used to prove or disprove authorship of texts, even for authors whose styles aren't so divergent.

Accurate average letter frequencies can only be gleaned by analyzing a large amount of representative text. With the availability of modern computing and collections of large text corpora, such calculations are easily made. A Deafandblind web page details examples from a variety of sources (press reporting, religious text, scientific text and general fiction). These sources differ, especially general fiction, in the positions of 'h' and 'i': instead of the Linotype 'etaoin shrdlu', general fiction comes out as 'etaoHn Isrdlu' (the capitals mark 'h' and 'i'). It has been claimed, though not proven, that conversation is similar in letter frequency to general fiction.
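A minimal sketch of such a calculation counts only the ASCII letters a to z. This is an assumption that ignores accented letters and other scripts. The sketch converts counts to relative frequencies and lists letters in descending order of frequency, in the style of 'etaoin shrdlu'. The function names are hypothetical, and a single sentence is far too short to give representative results.

```python
from collections import Counter

def letter_frequencies(text):
    """Relative frequency of each letter a-z, ignoring case, spaces, digits and punctuation."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]  # ASCII letters only
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}

def frequency_order(text, top=12):
    """Most frequent letters first, like 'etaoin shrdlu'; ties are broken alphabetically."""
    freqs = letter_frequencies(text)
    return "".join(sorted(freqs, key=lambda ch: (-freqs[ch], ch))[:top])

sample = "the quick brown fox jumps over the lazy dog"
print(frequency_order(sample, top=6))  # oehrtu
```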

Herbert S. Zim, in his classic introductory cryptography text "Codes and Secret Writing", gives the English letter frequency sequence as "ETAON RISHD LFCMU GYPWB VKXJQ Z", the most common letter pairs as "TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO", and the most common doubled letters as "LL EE SS OO TT FF RR NN PP CC".
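Pair and doubled-letter counts like Zim's can be gathered in a similar way. This sketch counts adjacent letter pairs inside words only. That is an assumption, since some analyses also count pairs across word boundaries. The names and the toy sentence are hypothetical.

```python
from collections import Counter

def bigram_counts(text):
    """Count adjacent letter pairs inside words; pairs never span spaces or punctuation."""
    cleaned = "".join(ch if "a" <= ch <= "z" else " " for ch in text.lower())
    pairs = Counter()
    for word in cleaned.split():
        pairs.update(word[i:i + 2] for i in range(len(word) - 1))
    return pairs

def doubled_letters(text):
    """Only the pairs whose two letters are the same, such as 'll' or 'ee'."""
    return Counter({bg: n for bg, n in bigram_counts(text).items() if bg[0] == bg[1]})

# Toy sentence; real rankings need a large corpus
sample = "The three sheep fell off the hill and ran to the street"
print(bigram_counts(sample).most_common(5))
print(doubled_letters(sample).most_common())
```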

The 'top twelve' letters comprise about 80% of the total usage. The 'top eight' letters comprise about 65% of the total usage. A spy using the VIC cipher or some other cipher based on a straddling checkerboard typically uses a mnemonic such as "a sin to err" (dropping the second "r") to remember the top 8 characters.
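The 'top k' share is a cumulative relative frequency over letters sorted by frequency. The hypothetical helper `top_k_share` below computes it for any text. The 80% and 65% figures above only emerge from large representative corpora, not from short samples like the one used here. The last lines confirm that the mnemonic spells eight distinct letters.

```python
from collections import Counter

def top_k_share(text, k):
    """Fraction of all letters in text accounted for by its k most frequent letters."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total

# A pangram is not representative; on a large English corpus the source's figure
# suggests top_k_share(corpus, 12) should come out near 0.8.
print(round(top_k_share("the quick brown fox jumps over the lazy dog", 6), 3))

# "a sin to err" with the repeated "r" dropped leaves eight distinct letters
mnemonic = "".join(dict.fromkeys("a sin to err".replace(" ", "")))
print(mnemonic, len(mnemonic))
```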

The use of letter frequencies and frequency analysis plays a fundamental role in several games, including hangman, Scrabble, Wheel of Fortune, Definition, Bananagrams, and cryptograms.

Letter frequencies had a strong effect on the design of some keyboard layouts. The most-frequent letters are on the bottom row of the Blickensderfer typewriter. The most-frequent letters are on the home row of the Dvorak Simplified Keyboard.

Relative frequencies of letters in the English language

The letter frequencies for English are listed below. However, this table differs slightly from others, such as the one produced by Cornell University Math Explorer's Project after measuring over 40,000 words.

In English, the space is slightly more frequent than the top letter (7% more frequent than, or 107% as frequent as, e), and the non-alphabetic characters (digits, punctuation, etc.) occupy the fourth position, between t and a.

Relative frequencies of the first letters of a word in the English language

Frequencies of the first letter of a word:

Relative frequencies of letters in other languages

*See Turkish dotted and dotless I

The figure below illustrates the frequency distributions of the 26 most common Latin letters across some languages.

Based on these tables, the 'etaoin shrdlu'-equivalent results for each language are as follows:

  • French: 'esait nrulo' (Indo-European: Romance; traditionally, 'esartinulop' is used, in part for its ease of pronunciation)
  • Spanish: 'eaosr nidlc' (Indo-European: Romance)
  • Portuguese: 'aeosr indmt' (Indo-European: Romance)
  • Italian: 'eaion lrtsc' (Indo-European: Romance)
  • Esperanto: 'aieon lsrtk' (artificial language, influenced by Indo-European languages, mostly Romance and Germanic)
  • German: 'enisr atdhu' (Indo-European: Germanic)
  • Swedish: 'eantr slido' (Indo-European: Germanic)
  • Turkish: 'aeinr ldkmu' (Turkic: a non-Indo-European language)
  • Dutch: 'enati rodsl' (Indo-European: Germanic)
  • Polish: 'aoiez nscwr' (Indo-European: Slavic)

All these languages use a basically similar 25+ character alphabet.

From Yahoo Answers

Question: Also, is the cumulative frequency just all the frequencies added up? Or is there some equation to it?

Answers: Frequency is the number of observations of a given type. Relative frequency is the number of observations of a given type divided by the total number of observations. For example, suppose you record the color of the cars passing through an intersection during some time interval and observe {green, blue, red, red, white, black, green, red, black, black, black, red, blue}. There are 13 observations:

green: frequency 2, relative frequency 2/13
blue: frequency 2, relative frequency 2/13
red: frequency 4, relative frequency 4/13
white: frequency 1, relative frequency 1/13
black: frequency 4, relative frequency 4/13
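On the cumulative-frequency part of the question, a cumulative frequency is the running total of the frequencies taken in a fixed order (for numeric data, increasing order of value). The last running total therefore equals the number of observations. The sketch below reproduces the car example with Python's `Counter`, `Fraction` and `itertools.accumulate`. The listing order of the colors is an arbitrary assumption, since colors have no natural order.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

observations = ["green", "blue", "red", "red", "white", "black", "green",
                "red", "black", "black", "black", "red", "blue"]

freq = Counter(observations)
n = len(observations)
categories = ["green", "blue", "red", "white", "black"]  # order used for the running total
relative = {c: Fraction(freq[c], n) for c in categories}
# Cumulative frequency: running total of the frequencies in the listed order
cumulative = dict(zip(categories, accumulate(freq[c] for c in categories)))

for c in categories:
    print(c, freq[c], relative[c], cumulative[c])
```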

Question: I don't get to use a graphing calculator, so with a regular calculator, how would I find the median in a frequency distribution chart? I know how to find it when there is no frequency distribution chart, but with one I don't know how to find the median.

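Answers: It is the value in the center of the data on the frequency distribution chart, found in much the same way as when you have the raw data listed in order. Add up the frequencies, starting from the smallest value, until the running total reaches half of the total count. The value where that happens is the median. (If the total count is even and the two middle observations fall on different values, average those two values.)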
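For ungrouped data stored as a {value: frequency} table, that procedure can be sketched as follows. `table_median` is a hypothetical name, and the tables are made-up examples.

```python
def table_median(freq):
    """Median of ungrouped data stored as {value: frequency}, using cumulative frequencies."""
    items = sorted(freq.items())
    n = sum(f for _, f in items)

    def value_at(position):
        # Walk up the running total until it reaches the requested 1-based position
        running = 0
        for value, f in items:
            running += f
            if running >= position:
                return value

    # The two middle positions coincide when n is odd
    return (value_at((n + 1) // 2) + value_at(n // 2 + 1)) / 2

print(table_median({10: 3, 20: 4, 30: 2}))                # 20.0
print(table_median({1: 20, 2: 30, 3: 25, 4: 15, 5: 10}))  # 2.5
```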

Question: For example, given

x:  20   30    40    50    65    100
f:  660  2140  1810  2150  2490  2770

the answer is 30 + (145/570) * 10 = 32.54. How do you get to it?

Answers: Your question appears to contain errors. First, f cannot be a cumulative frequency distribution, since it decreases for some values of x; a cumulative distribution is always non-decreasing. I believe what you have is a plain frequency distribution. Second, you need to identify how the x bins are set up. Given breakpoints x1, x2, x3 with frequencies f1, f2, f3, are f1, f2 and f3 the counts below x1, between x1 and x2, and between x2 and x3, respectively? Or are they the counts between x1 and x2, between x2 and x3, and beyond x3? I'm going to assume the former.

Let N be the total number of observations, and let F(x) be the *cumulative* frequency, the number of observations less than x. Then:

x:  20   30    40    50    65    100
F:  660  2800  4610  6760  9250  12020

The median is the x value for which F(x) = (N+1)/2. Since N = 12020 is even, the +1 places the median halfway between the two middle observations. From the values of F, and since (N+1)/2 = 6010.5, the median must lie between 40 and 50. Typically, one linearly interpolates between the bracketing F values, here with x1 = 40 and x2 = 50:

(N+1)/2 = F(x1) + (F(x2) - F(x1))/(x2 - x1) * (x - x1)

Solving for x:

x = x1 + ((N+1)/2 - F(x1)) * (x2 - x1)/(F(x2) - F(x1))
  = 40 + (6010.5 - 4610) * 10/(6760 - 4610)
  = 40 + (1400.5/2150) * 10
  = 46.51395

This is not the answer you quoted, so perhaps I have misunderstood your setup. The method above is correct under the assumptions I described, and it should give you enough to work out your problem if I have misread your question. Good luck.
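The following Python sketch implements the interpolation in this answer. It uses the same assumption the answer makes: f1 counts observations below x1, f2 those between x1 and x2, and so on, so F is the running total of f. `interpolated_median` is a hypothetical helper. It reproduces 46.51395, not the 32.54 quoted in the question.

```python
from bisect import bisect_left
from itertools import accumulate

def interpolated_median(xs, cum):
    """Linearly interpolate the x at which the cumulative frequency F(x) reaches (N + 1) / 2.

    xs: increasing breakpoints; cum: F(x) at each breakpoint, non-decreasing, with cum[-1] = N.
    """
    target = (cum[-1] + 1) / 2
    i = bisect_left(cum, target)  # first breakpoint where F(x) >= target
    if i == 0:
        raise ValueError("median lies below the first breakpoint; cannot interpolate")
    x1, x2 = xs[i - 1], xs[i]
    f1, f2 = cum[i - 1], cum[i]
    return x1 + (target - f1) * (x2 - x1) / (f2 - f1)

xs = [20, 30, 40, 50, 65, 100]
f = [660, 2140, 1810, 2150, 2490, 2770]
cum = list(accumulate(f))  # F(x): observations below each x, under the answer's assumption
print(cum)
print(round(interpolated_median(xs, cum), 5))  # 46.51395
```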

Question: Steve is a film buff and likes movies of all kinds; he watches movies on a regular basis. Here is a record of the number of films he watched per week over the last year:

Number of films   Frequency (weeks)
[0,3[             21
[3,6[             17
[6,9[             7
[9,12[            3
[12,15[           2
[15,18[           2
Total             52

Find the mean, median and mode of this distribution.
a) mean = 4.84, median = 3.88, mode = 3
b) mean = 4.85, median = 3.97, mode = 1.5
c) mean = 252, median = 6, mode = 21
d) mean = 4.85, median = 3.88, mode = 21
e) mean = 252, median = 3.97, mode = 1.5

I know how to calculate the mean of this distribution: I take the midpoint of each class, multiply it by its frequency, and divide the sum of the products by the number of weeks, which gives 4.846. But when I try to calculate the median, cumulative frequencies are involved and I don't know how to apply them to this distribution, so I'm stuck there. I think the mode is the highest frequency, which is 21 in this distribution; if I'm wrong about the mode, please let me know. Thanks a lot for the help.

Answers: So you have the mean, 4.846. For the median, arrange the observations in ascending (or descending) order and find the middle one. There are 52 weeks in all, and 52/2 = 26, so the middle is the 26th week. In ascending order:

[0,3[ covers the 1st to 21st weeks
[3,6[ covers the 22nd to 38th (21 + 17) weeks

So the 26th week falls in the class from 3 to 6. That class has a width of 3 and contains 17 weeks, so each week occupies a width of 3/17:

22nd week: 3 to 3 + 3/17
23rd week: 3 + 3/17 to 3 + 6/17
24th week: 3 + 6/17 to 3 + 9/17
25th week: 3 + 9/17 to 3 + 12/17
26th week: 3 + 12/17 to 3 + 15/17

The midpoint of 3 + 12/17 and 3 + 15/17 is 3 + 13.5/17 ≈ 3.794. Interpolating to position (52 + 1)/2 = 26.5 instead gives 3 + 5.5 × 3/17 ≈ 3.97, which is the median listed in b).

The mode is the midpoint of the class with the largest frequency, [0,3[ with 21 weeks, so the mode is (0 + 3)/2 = 1.5; 21 is that class's frequency, not the mode. So the correct answer would be b).
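The same steps can be sketched in Python for Steve's table. The sketch assumes a class width of 3 and the usual grouped-data formulas: class midpoints for the mean and mode, and linear interpolation within the median class. The answer's week-by-week midpoint gives a slightly different figure (about 3.79). `grouped_stats` is a hypothetical helper. Its `median_pos` parameter switches between the N/2 and (N + 1)/2 conventions, which give roughly 3.88 and 3.97, the two median values that appear among the options.

```python
from itertools import accumulate

def grouped_stats(lowers, freqs, width, median_pos=None):
    """Mean, median and mode of grouped data with equal-width classes [lower, lower + width)."""
    n = sum(freqs)
    mids = [lo + width / 2 for lo in lowers]
    mean = sum(m * f for m, f in zip(mids, freqs)) / n
    pos = n / 2 if median_pos is None else median_pos
    cum = list(accumulate(freqs))
    k = next(i for i, c in enumerate(cum) if c >= pos)  # index of the median class
    below = cum[k - 1] if k > 0 else 0  # observations before the median class
    median = lowers[k] + (pos - below) / freqs[k] * width
    mode = mids[freqs.index(max(freqs))]  # midpoint of the modal class
    return mean, median, mode

# Steve's table: lower class bounds and number of weeks in each class
lowers = [0, 3, 6, 9, 12, 15]
freqs = [21, 17, 7, 3, 2, 2]
print(grouped_stats(lowers, freqs, 3))                   # median at position n/2 = 26
print(grouped_stats(lowers, freqs, 3, median_pos=26.5))  # median at position (n + 1)/2
```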

From Youtube

Mean from a data frequency distribution: Illustration of the calculation of a frequency distribution mean

Frequency Distributions: Excerpted from a lecture by Professor Lisa Dierker, QAC 201: Applied Data Analysis, Wesleyan University, October 5, 2009