
Explain inferential statistics

From Wikipedia

Descriptive statistics

Descriptive statistics describe the main features of a collection of data quantitatively. They are distinguished from inferential statistics (or inductive statistics) in that they aim to summarize a data set quantitatively without employing a probabilistic formulation, rather than use the data to make inferences about the population the data are thought to represent. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, in a paper reporting on a study involving human subjects, there typically appears a table giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, and the proportion of subjects with related comorbidities.

Inferential statistics

Inferential statistics uses sample data to make inferences about a population. We also use inferential statistics to judge the probability that an observed difference between groups is a dependable one, rather than one that might have happened by chance in this study. Thus, we use inferential statistics to generalize from our data to broader conditions; we use descriptive statistics simply to describe what is going on in our data.

Use in statistical analyses

Descriptive statistics provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of quantitative analysis of data.

Descriptive statistics summarize data. For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. This number is the number of shots made divided by the number of shots taken. A player who shoots 33% is making approximately one shot in every three. One making 25% is hitting once in four. The percentage summarizes or describes multiple discrete events. Or, consider the scourge of many students, the grade point average. This single number describes the general performance of a student across the range of their course experiences.
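Both summaries are simple ratios; a minimal sketch, with invented shot counts and course grades:

```python
# Shooting percentage: shots made divided by shots taken.
shots_taken = 12
shots_made = 4
shooting_pct = shots_made / shots_taken  # roughly one shot in three

# Grade point average: the mean of the grade points across courses.
grade_points = [4.0, 3.0, 3.7, 2.3]  # hypothetical course grades
gpa = sum(grade_points) / len(grade_points)

print(f"shooting percentage: {shooting_pct:.0%}")  # 33%
print(f"GPA: {gpa:.2f}")  # 3.25
```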

Describing a large set of observations with a single indicator risks distorting the original data or losing important detail. For example, the shooting percentage doesn't tell you whether the shots are three-pointers or lay-ups, and GPA doesn't tell you whether the student was in difficult or easy courses. Despite these limitations, descriptive statistics provide a powerful summary that may enable comparisons across people or other units.

Univariate analysis

Univariate analysis involves the examination across cases of a single variable, focusing on three characteristics: the distribution; the central tendency; and the dispersion. It is common to compute all three for each study variable.

Distribution

The distribution is a summary of the frequency of individual values, or ranges of values, for a variable. The simplest distribution would list every value of a variable and the number of cases that had that value. For instance, computing the distribution of gender in the study population means computing the percentages that are male and female. The gender variable has only two values, making it possible and meaningful to list each one. However, this does not work for a variable such as income that has many possible values, and specific values are typically not particularly meaningful (an income of 50,000 is typically not meaningfully different from 51,000). Grouping the raw scores using ranges of values reduces the number of categories to something more meaningful. For instance, we might group incomes into ranges of 0-10,000, 10,001-30,000, etc.

Frequency distributions are depicted as a table or as a graph. Table 1 shows an age frequency distribution with five categories of age ranges defined. The same frequency distribution can be depicted in a graph as shown in Figure 2. This type of graph is often referred to as a histogram or bar chart.
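A grouped frequency distribution of this kind can be tabulated and drawn as a text histogram; a minimal sketch, using invented ages and five age ranges:

```python
from collections import Counter

ages = [23, 27, 31, 34, 35, 42, 44, 45, 51, 53, 58, 61, 64, 67]  # invented ages

# Five age ranges, as in the age frequency distribution described above.
bins = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

# Count how many cases fall into each range.
counts = Counter()
for age in ages:
    for low, high in bins:
        if low <= age <= high:
            counts[(low, high)] += 1
            break

# Print the table with a simple bar for each range.
for low, high in bins:
    n = counts[(low, high)]
    print(f"{low}-{high}: {'#' * n} ({n})")
```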

Central tendency

The central tendency of a distribution locates the "center" of its values. The three major types of estimates of central tendency are the mean, the median, and the mode.

The mean is the most commonly used method of describing central tendency. To compute the mean, take the sum of the values and divide by the count. For example, the mean quiz score is determined by summing all the scores and dividing by the number of students taking the exam. Consider, for example, the test score values:

15, 20, 21, 36, 15, 25, 15

The sum of these 7 values is 147, so the mean is 147/7 = 21.
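The same arithmetic in code:

```python
scores = [15, 20, 21, 36, 15, 25, 15]
mean = sum(scores) / len(scores)  # 147 / 7
print(mean)  # 21.0
```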

The median is the score found at the middle of the set of values, i.e., the value with as many cases above it as below it. One way to compute the median is to sort the values in numerical order and then locate the value in the middle of the list. For example, if there are 500 values, the median is the average of the values in the 250th and 251st positions. If there are 501 values, the value in the 251st position is the median. Sorting the 7 scores above produces:

15, 15, 15, 20, 21, 25, 36

There are 7 scores, so score #4 marks the halfway point, and the median is 20. If there is an even number of observations, the median is the mean of the two middle scores. In the example, if there were an 8th observation with a value of 25, the median would become the average of the 4th and 5th scores (20 and 21), in this case 20.5.
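The sort-and-take-the-middle procedure in code, covering both the odd and even cases from the text:

```python
scores = [15, 20, 21, 36, 15, 25, 15]

def median(values):
    # Sort, then take the middle value; for an even count,
    # average the two middle values.
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median(scores))         # 20 (7 values, 4th after sorting)
print(median(scores + [25]))  # 20.5 (average of 20 and 21)
```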

The mode is the most frequently occurring value in the set. To determine the mode, compute the distribution as above; the mode is the value with the greatest frequency. In the example, the modal value, 15, occurs three times. In some distributions there is a "tie" for the highest frequency, i.e., there are multiple modal values; these are called multi-modal distributions.
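The mode falls out of a frequency count directly; collecting every value tied for the top frequency also handles the multi-modal case:

```python
from collections import Counter

scores = [15, 20, 21, 36, 15, 25, 15]
freq = Counter(scores)
top = max(freq.values())
# All values tied for the highest frequency (handles multi-modal sets).
modes = [value for value, count in freq.items() if count == top]
print(modes)  # [15]; 15 occurs three times
```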

Notice that the three measures typically produce different results. The term "average" obscures the difference between them and is better avoided. The three values are equal if the distribution is perfectly "normal" (i.e., bell-shaped).

Dispersion

Dispersion is the spread of values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 − 15 = 21.

The standard deviation is a more detailed measure of dispersion: it is the square root of the mean of the squared deviations of the values from the mean of the distribution.
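Both dispersion measures can be computed from the same scores; a minimal sketch, using the population form of the standard deviation (dividing by n; the sample form divides by n - 1):

```python
import math

scores = [15, 20, 21, 36, 15, 25, 15]

# Range: highest value minus lowest value.
value_range = max(scores) - min(scores)  # 36 - 15 = 21

# Standard deviation: square root of the mean squared deviation from the mean.
mean = sum(scores) / len(scores)  # 21.0
squared_devs = [(x - mean) ** 2 for x in scores]
std_dev = math.sqrt(sum(squared_devs) / len(scores))

print(value_range)           # 21
print(round(std_dev, 2))     # 7.07
```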

Mathematical statistics

Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis. The term "mathematical statistics" is closely related to the term "statistical theory" but also embraces modelling for actuarial science and non-statistical probability theory, particularly in Scandinavia.

Statistics deals with gaining information from data. In practice, data often contain some randomness or uncertainty. Statistics handles such data using methods of probability theory.

Introduction

Statistical science is concerned with the planning of studies, especially with the design of randomized experiments and with the planning of surveys using random sampling. The initial analysis of the data from properly randomized studies often follows the study protocol.

Of course, the data from a randomized study can be analyzed to consider secondary hypotheses or to suggest new ideas. A secondary analysis of the data from a planned study uses tools from data analysis.

Data analysis is divided into:

  • descriptive statistics - the part of statistics that describes data, i.e. summarises the data and their typical properties.
  • inferential statistics - the part of statistics that draws conclusions from data (using some model for the data). For example, inferential statistics involves selecting a model for the data, checking whether the data fulfill the conditions of that model, and quantifying the involved uncertainty (e.g. using confidence intervals).
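As a sketch of the confidence-interval step mentioned above, the fragment below computes a 95% interval for a mean under a normal approximation (the sample values are invented, and the critical value 1.96 assumes a large-sample normal model; a t-based interval would be more appropriate for a sample this small):

```python
import math

sample = [12, 15, 9, 14, 11, 13, 10, 12]  # hypothetical measurements
n = len(sample)
mean = sum(sample) / n
# Sample standard deviation (dividing by n - 1).
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
# 95% margin of error under a normal approximation.
margin = 1.96 * s / math.sqrt(n)

print(f"mean = {mean}, 95% CI = ({mean - margin:.1f}, {mean + margin:.1f})")
# mean = 12.0, 95% CI = (10.6, 13.4)
```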

While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, in which case the inference depends on the model chosen by the statistician and is therefore subjective.

Mathematical statistics has been inspired by and has extended many procedures in applied statistics.

Statistics, mathematics, and mathematical statistics

Mathematical statistics has substantial overlap with the discipline of statistics. Statistical theorists study and improve statistical procedures with mathematics, and statistical research often raises mathematical questions. Statistical theory relies on probability and decision theory. Mathematicians and statisticians like Gauss, Laplace, and C. S. Peirce used decision theory with probability distributions and loss functions (or utility functions). The decision-theoretic approach to statistical inference was reinvigorated by Abraham Wald and his successors. Statistical theory makes extensive use of scientific computing, analysis, and optimization; for the design of experiments, statisticians use algebra and combinatorics.


Advanced Placement Statistics

Advanced Placement Statistics (AP Statistics, AP Stats or AP Stat) is a college-level high school statistics course offered in the United States through the College Board's Advanced Placement program. The course is equivalent to a one-semester, non-calculus-based introductory college statistics course and is normally offered to juniors and seniors in high school.

One of the College Board's more recent additions, the AP Statistics exam was first administered in May 1997 to supplement the AP program's math offerings, which had previously consisted of only AP Calculus AB and BC. In the United States, enrollment in AP Statistics classes has increased at a higher rate than in any other AP class.

Students may receive college credit or upper-level college course placement upon the successful completion of a three-hour exam ordinarily administered in May. The exam consists of a multiple choice section and a free response section that are both 90 minutes long. Each section is weighted equally in determining the students' composite scores.

History

The Advanced Placement program has offered students the opportunity to pursue college-level courses while in high school. Along with the Educational Testing Service, the College Board administered the first AP Statistics exam in May 1997. The course was first taught to students in the 1996-1997 academic year. Prior to that, the only mathematics courses offered in the AP program were AP Calculus AB and BC. Students who didn't have a strong background in college-level math, however, found the AP Calculus program inaccessible and sometimes declined to take a math course in their senior year. Since the number of students required to take statistics in college is almost as large as the number required to take calculus, the College Board decided to add an introductory statistics course to the AP program. Since the prerequisites for such a course don't require mathematical concepts beyond those typically taught in a second-year algebra course, the addition made the AP program's math offerings accessible to a much wider audience of high school students. The AP Statistics program addressed a practical need as well, since the number of students enrolling in majors that use statistics has grown. A total of 7,667 students took the exam during its first administration, the highest number of students to take an AP exam in its first year. Since then, the number of students taking the exam has grown rapidly, reaching 98,033 in 2007 and making it one of the 10 largest AP exams.

Course

If the course is provided by their school, students normally take AP Statistics in their junior or senior year and may decide to take it concurrently with a pre-calculus course. The course is intended to mirror a one-semester, non-calculus-based college statistics course, but high schools can decide to offer it over one semester, two trimesters, or a full academic year.

The six-member AP Statistics Test Development Committee is responsible for developing the curriculum. Appointed by the College Board, the committee consists of three college statistics teachers and three high school statistics teachers who are typically asked to serve for terms of three years.

Curriculum

Emphasis is placed not on actual arithmetic computation, but rather on conceptual understanding and interpretation. The course curriculum is organized around four basic themes; the first involves exploring data and covers 20–30% of the exam. Students are expected to use graphical and numerical techniques to analyze distributions of data, including univariate, bivariate, and categorical data. The second theme involves planning and conducting a study and covers 10–15% of the exam. Students must be aware of the various methods of data collection through sampling or experimentation and the sorts of conclusions that can be drawn from the results. The third theme involves probability and its role in anticipating patterns in distributions of data. This theme covers 20–30% of the exam. The fourth theme, which covers 30–40% of the exam, involves statistical inference using point estimation, confidence intervals, and significance tests.

Exam

The exam, like the course curriculum, is developed by the AP Statistics Test Development Committee. With the help of other college professors, the committee creates a large pool of possible questions that is pre-tested with college students taking statistics courses. The test is then refined to an appropriate level of difficulty and clarity. Afterwards, the Educational Testing Service is responsible for printing and administering the exam.

Structure

The exam is offered every year in May. Students are not expected to memorize any formulas; a list of common statistical formulas related to descriptive statistics, probability, and inferential statistics is provided, and tables for the normal, Student's t, and chi-square distributions are given as well. Students are also expected to use graphing calculators with statistical capabilities. The exam is three hours long, with 90 minutes allotted to each of its two sections.

Non-parametric statistics

In statistics, the term non-parametric statistics has at least two different meanings:

  1. The first meaning of non-parametric covers techniques that do not rely on data belonging to any particular distribution. These include, among others:
    * distribution-free methods, which do not rely on assumptions that the data are drawn from a given probability distribution. As such, this is the opposite of parametric statistics, and it includes non-parametric statistical models, inference, and statistical tests.
    * non-parametric statistics (in the sense of a statistic over data, defined as a function on a sample with no dependency on a parameter), whose interpretation does not depend on the population fitting any parametrized distribution. Statistics based on the ranks of observations are one example, and such statistics play a central role in many non-parametric approaches.
  2. The second meaning of non-parametric covers techniques that do not assume that the structure of a model is fixed. Typically, the model grows in size to accommodate the complexity of the data. In these techniques, individual variables are typically assumed to belong to parametric distributions, and assumptions about the types of connections among variables are also made. These techniques include, among others:
    * non-parametric regression, which refers to modeling where the structure of the relationship between variables is treated non-parametrically, but where there may nevertheless be parametric assumptions about the distribution of model residuals.
    * non-parametric hierarchical Bayesian models, such as models based on the Dirichlet process, which allow the number of latent variables to grow as necessary to fit the data, but where individual variables still follow parametric distributions and even the process controlling the rate of growth of latent variables follows a parametric distribution.
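The rank-based statistics mentioned under the first meaning can be sketched concretely; the fragment below ranks two invented samples and computes Spearman's rank correlation from the rank differences (the classic formula used here assumes no ties):

```python
def ranks(values):
    # Rank of each value within its sample (1 = smallest); assumes no ties.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman's rho from the classic rank-difference formula.
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical star ratings of four movies from two reviewers.
reviewer_a = [1, 2, 3, 4]
reviewer_b = [2, 1, 4, 3]
print(spearman(reviewer_a, reviewer_b))  # 0.6
```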

Applications and purpose

Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences; in terms of levels of measurement, for data on an ordinal scale.

As non-parametric methods make fewer assumptions, their applicability is much wider than that of the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, because they rely on fewer assumptions, non-parametric methods are more robust.

Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.

The wider applicability and increased robustness of non-parametric tests comes at a cost: in cases where a parametric test would be appropriate, non-parametric tests have less power. In other words, a larger sample size can be required to draw conclusions with the same degree of confidence.

Non-parametric models

Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term non-parametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.

Methods

Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed. The most frequently used tests include the sign test, the Wilcoxon signed-rank test, the Mann-Whitney U test, the Kruskal-Wallis test, and Spearman's rank correlation.
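As one illustration of such a procedure, a sign test reduces paired observations to the signs of their differences and uses an exact binomial computation; a minimal sketch (the paired values are invented for illustration):

```python
from math import comb

def sign_test_p_value(before, after):
    # Count positive and negative differences, ignoring ties.
    diffs = [post - pre for pre, post in zip(before, after)]
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    n = pos + neg
    k = min(pos, neg)
    # Two-sided exact p-value under a fair-coin null (each sign has p = 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

before = [12, 15, 9, 14, 11, 13, 10, 12]  # hypothetical paired measurements
after = [14, 17, 10, 15, 10, 16, 12, 14]
print(sign_test_p_value(before, after))  # 0.0703125
```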