advantages descriptive statistics
Descriptive statistics describe the main features of a collection of data quantitatively. Descriptive statistics are distinguished from inferential statistics (or inductive statistics) in that descriptive statistics aim to summarize a data set quantitatively without employing a probabilistic formulation, rather than use the data to make inferences about the population that the data are thought to represent. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, in a paper reporting on a study involving human subjects, there typically appears a table giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, and the proportion of subjects with related comorbidities.
Inferential statistics tries to make inferences about a population from the sample data. We also use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one, or that it might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.
Use in statistical analyses
Descriptive statistics provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of quantitative analysis of data.
Descriptive statistics summarize data. For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. This number is the number of shots made divided by the number of shots taken. A player who shoots 33% is making approximately one shot in every three. One making 25% is hitting once in four. The percentage summarizes or describes multiple discrete events. Or, consider the scourge of many students, the grade point average. This single number describes the general performance of a student across the range of their course experiences.
Describing a large set of observations with a single indicator risks distorting the original data or losing important detail. For example, the shooting percentage doesn't tell you whether the shots are three-pointers or lay-ups, and GPA doesn't tell you whether the student was in difficult or easy courses. Despite these limitations, descriptive statistics provide a powerful summary that may enable comparisons across people or other units.
Univariate analysis involves the examination across cases of a single variable, focusing on three characteristics: the distribution; the central tendency; and the dispersion. It is common to compute all three for each study variable.
The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of cases that had that value. For instance, computing the distribution of gender in the study population means computing the percentages that are male and female. The gender variable has only two values, making it possible and meaningful to list each one. However, this does not work for a variable such as income that has many possible values. Typically, specific values are not particularly meaningful (an income of 50,000 is typically not meaningfully different from 51,000). Grouping the raw scores using ranges of values reduces the number of categories to something more meaningful. For instance, we might group incomes into ranges of 0-10,000, 10,001-30,000, etc.
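As a minimal sketch of the grouping just described, the following counts how many cases fall into each income range. The income values are invented for illustration, and only the first two ranges come from the text; the third range and the helper name bin_label are assumptions.

```python
from collections import Counter

# Hypothetical incomes, invented purely for illustration.
incomes = [4_000, 9_500, 12_000, 28_000, 30_000, 45_000, 51_000, 50_000]

# The first two ranges follow the text; the third is an assumed continuation.
BINS = [(0, 10_000), (10_001, 30_000), (30_001, 60_000)]


def bin_label(value):
    """Return the label of the range that contains value, or 'other'."""
    for low, high in BINS:
        if low <= value <= high:
            return f"{low}-{high}"
    return "other"


# Frequency distribution: number of cases per range.
frequency = Counter(bin_label(x) for x in incomes)

for low, high in BINS:
    label = f"{low}-{high}"
    count = frequency[label]
    print(f"{label:>12}: {count} cases ({count / len(incomes):.0%})")
```

Here 50,000 and 51,000 land in the same range, which is the point of grouping: the distinction between them is dropped because it is not meaningful.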
Frequency distributions are depicted as a table or as a graph. Table 1 shows an age frequency distribution with five categories of age ranges defined. The same frequency distribution can be depicted in a graph as shown in Figure 2. This type of graph is often referred to as a histogram or bar chart.
The mean is the most commonly used measure of central tendency. To compute the mean, take the sum of the values and divide by the count. For example, the mean quiz score is determined by summing all the scores and dividing by the number of students taking the quiz. Consider the test score values:
15, 20, 21, 36, 15, 25, 15
The sum of these 7 values is 147, so the mean is 147/7 = 21.
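The same arithmetic can be checked in a few lines. This is only a sketch; the variable names are chosen for illustration, and statistics.mean from the Python standard library is used as a cross-check.

```python
from statistics import mean

# The seven test scores from the example.
scores = [15, 20, 21, 36, 15, 25, 15]

total = sum(scores)          # sum of the values
count = len(scores)          # number of values
manual_mean = total / count  # mean = sum / count

print(total, count, manual_mean)
print(mean(scores))  # standard-library cross-check
```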
The median is the score found at the middle of the set of values, i.e., the value that has as many cases with a larger value as with a smaller value. One way to compute the median is to sort the values in numerical order, and then locate the value in the middle of the list. For example, if there are 500 values, the median is the average of the two values in the 250th and 251st positions. If there are 501 values, the value in the 251st position is the median. Sorting the 7 scores above produces:
15, 15, 15, 20, 21, 25, 36
There are 7 scores and score #4 represents the halfway point. The median is 20. If there are an even number of observations, then the median is the mean of the two middle scores. In the example, if there were an 8th observation, with a value of 25, the median becomes the average of the 4th and 5th scores, in this case 20.5.
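The sort-and-pick-the-middle procedure can be sketched as follows. The function name middle_value is illustrative; the standard library's statistics.median does the same job and is used only as a cross-check.

```python
from statistics import median


def middle_value(values):
    """Median by sorting: the middle item, or the mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle position exists.
        return ordered[mid]
    # Even count: average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [15, 20, 21, 36, 15, 25, 15]
with_eighth = scores + [25]  # the 8th observation from the example

print(middle_value(scores))       # 20
print(middle_value(with_eighth))  # 20.5
print(median(scores), median(with_eighth))
```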
The mode is the most frequently occurring value in the set. To determine the mode, compute the distribution as above. The mode is the value with the greatest frequency. In the example, the modal value, 15, occurs three times. In some distributions there is a "tie" for the highest frequency, i.e., there are multiple modal values. These are called multi-modal distributions.
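A short sketch of finding the mode from the frequency counts, including the tie case. The tied data set is invented for illustration; statistics.multimode (available from Python 3.8) is shown as a cross-check.

```python
from collections import Counter
from statistics import multimode


def modal_values(values):
    """Return every value that shares the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


scores = [15, 20, 21, 36, 15, 25, 15]
tied = [1, 1, 2, 2, 3]  # hypothetical data with two modal values

print(modal_values(scores))  # [15]
print(modal_values(tied))    # [1, 2]
print(multimode(scores), multimode(tied))
```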
Notice that the three measures typically produce different results. The term "average" obscures the difference between them and is better avoided. The three values are equal if the distribution is perfectly "normal" (i.e., bell-shaped).
Dispersion is the spread of values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 − 15 = 21.
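The text names the standard deviation without giving a formula, so the sketch below assumes the usual definitions: the square root of the mean squared deviation from the mean (population form), with the sample form dividing by n − 1 instead of n. The range is computed exactly as described.

```python
from math import sqrt
from statistics import pstdev, stdev

scores = [15, 20, 21, 36, 15, 25, 15]

# Range: highest value minus lowest value.
value_range = max(scores) - min(scores)

# Standard deviation by hand (assumed standard definition).
n = len(scores)
center = sum(scores) / n
squared_deviations = sum((x - center) ** 2 for x in scores)
population_sd = sqrt(squared_deviations / n)    # divide by n
sample_sd = sqrt(squared_deviations / (n - 1))  # divide by n - 1

print(value_range)
print(round(population_sd, 3), round(sample_sd, 3))
print(round(pstdev(scores), 3), round(stdev(scores), 3))  # library cross-check
```

Which divisor is appropriate depends on whether the seven scores are treated as the whole population or as a sample from a larger one.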
In statistical hypothesis testing, a hypothesis test is typically specified in terms of a test statistic, which is a function of the sample; it is considered as a numerical summary of a set of data that reduces the data to one or a small number of values that can be used to perform a hypothesis test. Given a null hypothesis and a test statistic T, we can specify a "null value" T0 such that values of T close to T0 present the strongest evidence in favor of the null hypothesis, whereas values of T far from T0 present the strongest evidence against the null hypothesis. An important property of a test statistic is that we must be able to determine its sampling distribution under the null hypothesis, which allows us to calculate p-values.
For example, suppose we wish to test whether a coin is fair (i.e. has equal probabilities of producing a head or a tail). If we flip the coin 100 times and record the results, the raw data can be represented as a sequence of 100 Heads and Tails. If our interest is in the marginal probability of obtaining a head, we only need to record the number T out of the 100 flips that produced a head, and use T0 = 50 as our null value. In this case, the exact sampling distribution of T is the binomial distribution, but for larger sample sizes the normal approximation can be used. Using one of these sampling distributions, it is possible to compute either a one-tailed or two-tailed p-value for the null hypothesis that the coin is fair. Note that the test statistic in this case reduces a set of 100 numbers to a single numerical summary that can be used for testing.
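As a rough sketch of the exact binomial calculation for the coin example, the code below sums the probabilities of all head counts at least as far from T0 = 50 as the observed count. The observed value of 60 heads is hypothetical, the function names are illustrative, and the "at least as far from the null value" rule is one common two-tailed convention that is natural here because the null distribution is symmetric.

```python
from math import comb

N_FLIPS = 100
NULL_VALUE = 50  # T0: expected number of heads for a fair coin


def fair_coin_pmf(k, n=N_FLIPS):
    """P(T = k) under the null hypothesis: T ~ Binomial(n, 0.5)."""
    return comb(n, k) * 0.5 ** n


def one_tailed_p_value(heads, n=N_FLIPS):
    """P(T >= heads) under the null hypothesis."""
    return sum(fair_coin_pmf(k, n) for k in range(heads, n + 1))


def two_tailed_p_value(heads, n=N_FLIPS):
    """P(|T - T0| >= |heads - T0|) under the null hypothesis."""
    distance = abs(heads - NULL_VALUE)
    return sum(
        fair_coin_pmf(k, n)
        for k in range(n + 1)
        if abs(k - NULL_VALUE) >= distance
    )


observed_heads = 60  # hypothetical outcome, not from the text
print(round(one_tailed_p_value(observed_heads), 4))
print(round(two_tailed_p_value(observed_heads), 4))
```

The 100 individual flips never enter the calculation; only the single summary T does, which is the reduction the paragraph describes.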
A test statistic shares some of the same qualities as a descriptive statistic, and many statistics can be used as both test statistics and descriptive statistics. However, a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some informative descriptive statistics, such as the sample range, do not make good test statistics since it is difficult to determine their sampling distribution.
In descriptive statistics, the range is the length of the smallest interval which contains all the data. It is calculated by subtracting the smallest observation (sample minimum) from the greatest (sample maximum) and provides an indication of statistical dispersion.
It is measured in the same units as the data. Since it only depends on two of the observations, it is a poor and weak measure of dispersion except when the sample size is large. For example, for three observations a ≤ b ≤ c, the range is c − a.
The range, in the sense of the difference between the highest and lowest scores, is also called the crude range. When a new scale for measurement is developed, a potential maximum or minimum will emanate from this scale. This is called the potential (crude) range. Of course this range should not be chosen too small, in order to avoid a ceiling effect. When the measurement is obtained, the resulting smallest and greatest observations will provide the observed (crude) range.
Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis. The term "mathematical statistics" is closely related to the term "statistical theory" but also embraces modelling for actuarial science and non-statistical probability theory, particularly in Scandinavia.
Statistical science is concerned with the planning of studies, especially with the design of randomized experiments and with the planning of surveys using random sampling. The initial analysis of the data from properly randomized studies often follows the study protocol.
Of course, the data from a randomized study can be analyzed to consider secondary hypotheses or to suggest new ideas. A secondary analysis of the data from a planned study uses tools from data analysis.
Data analysis is divided into:
- descriptive statistics - the part of statistics that describes data, i.e. summarises the data and their typical properties.
- inferential statistics - the part of statistics that draws conclusions from data (using some model for the data): For example, inferential statistics involves selecting a model for the data, checking whether the data fulfill the conditions of a particular model, and quantifying the involved uncertainty (e.g. using confidence intervals).
While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, for example from natural experiments and observational studies, in which case the inference is dependent on the model chosen by the statistician, and so subjective.
Mathematical statistics has been inspired by and has extended many procedures in applied statistics.
Statistics, mathematics, and mathematical statistics
Mathematical statistics has substantial overlap with the discipline of statistics. Statistical theorists study and improve statistical procedures with mathematics, and statistical research often raises mathematical questions. Statistical theory relies on probability and decision theory. Mathematicians and statisticians like Gauss, Laplace, and C. S. Peirce used decision theory with probability distributions and loss functions (or utility functions). The decision-theoretic approach to statistical inference was reinvigorated by Abraham Wald and his successors, and makes extensive use of scientific computing, analysis, and optimization; for the design of experiments, statisticians use algebra and combinatorics.
From Yahoo Answers
Answers:Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the natural and social sciences to the humanities, and to government and business. Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called inferential statistics. Both descriptive and inferential statistics comprise applied statistics. There is also a discipline called mathematical statistics, which is concerned with the theoretical basis of the subject. The word statistics is also the plural of statistic (singular), which refers to the result of applying a statistical algorithm to a set of data, as in economic statistics, crime statistics, etc.
Answers:One advantage is that it is a quick and easy way to get a number that is representative of a set of numbers. A disadvantage can exist if the set of numbers is not evenly distributed (e.g., for 1, 2, 3, 4, 5, 9999 the mean would be close to 1700, but would that be representative?). For this reason, salaries are often summarized with the median.
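A quick way to see the effect described in this answer is to compute both summaries on its six numbers. This is just a sketch using Python's standard statistics module; the variable names are illustrative.

```python
from statistics import mean, median

# The data set from the answer: five small values and one extreme value.
data = [1, 2, 3, 4, 5, 9999]

data_mean = mean(data)      # pulled far upward by the single 9999
data_median = median(data)  # average of the two middle values, 3 and 4

print(data_mean)    # 1669
print(data_median)  # 3.5

# How many of the values lie below the mean?
below_mean = sum(1 for x in data if x < data_mean)
print(below_mean, "of", len(data), "values are below the mean")
```

Five of the six values sit far below the mean, while the median stays in the middle of the bulk of the data, which is why it tends to be preferred for skewed quantities such as salaries.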
Answers:"The advantage of a histogram is that it shows the shape of the distribution for a large set of data; however the original data cannot be retrieved from a histogram." That was a direct quote from the following web document:
Answers:To elaborate further, your mode does tell you what the most common value in your data is, but, like the median, it's a somewhat arbitrary value. For example, take the data set 1,2,2,3,7. Both the mode and the median are 2, and the mean is 3. Now take the data set 1,2,2,25,50. Again, the mode and the median are both 2, but the mean is now 16. Now, say that second data set represents the number of shark attacks that have happened in the past 5 years off the coast of a beachfront property you're thinking about buying. I'd much rather know the mean number of shark attacks, not the mode or the median, wouldn't you?