explain inferential statistics
Descriptive statistics describe the main features of a collection of data quantitatively. Descriptive statistics are distinguished from inferential statistics (or inductive statistics), in that descriptive statistics aim to summarize a data set quantitatively without employing a probabilistic formulation, rather than use the data to make inferences about the population that the data are thought to represent. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example in a paper reporting on a study involving human subjects, there typically appears a table giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, and the proportion of subjects with related comorbidities.
Inferential statistics tries to make inferences about a population from the sample data. We also use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one, or that it might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.
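The idea of judging whether an observed difference between groups "might have happened by chance" can be sketched with an exact permutation test. This is only one of several possible approaches, and the two groups of scores below are hypothetical values invented for illustration. The function name `permutation_p_value` is likewise an assumption, not something from the text.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means.

    Every way of splitting the pooled values into groups of the original
    sizes is enumerated; the p-value is the share of splits whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits


# Hypothetical scores for two groups (not taken from any real study).
treatment = [12, 15, 14, 16]
control = [10, 11, 9, 13]

p = permutation_p_value(treatment, control)
print(f"two-sided permutation p-value: {p:.4f}")
```

A small p-value suggests the observed difference would be unusual if group labels were assigned purely by chance. Exhaustive enumeration is only practical for small samples like these.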
Use in statistical analyses
Descriptive statistics provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of quantitative analysis of data.
Descriptive statistics summarize data. For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. This number is the number of shots made divided by the number of shots taken. A player who shoots 33% is making approximately one shot in every three. One making 25% is hitting once in four. The percentage summarizes or describes multiple discrete events. Or, consider the scourge of many students, the grade point average. This single number describes the general performance of a student across the range of their course experiences.
Describing a large set of observations with a single indicator risks distorting the original data or losing important detail. For example, the shooting percentage doesn't tell you whether the shots are three-pointers or lay-ups, and GPA doesn't tell you whether the student was in difficult or easy courses. Despite these limitations, descriptive statistics provide a powerful summary that may enable comparisons across people or other units.
Univariate analysis involves the examination across cases of a single variable, focusing on three characteristics: the distribution; the central tendency; and the dispersion. It is common to compute all three for each study variable.
The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution lists every value of a variable and the number of cases that had that value. For instance, computing the distribution of gender in the study population means computing the percentages that are male and female. The gender variable has only two values, making it possible and meaningful to list each one. However, this does not work for a variable such as income that has many possible values. Typically, specific values are not particularly meaningful (an income of 50,000 is typically not meaningfully different from 51,000). Grouping the raw scores into ranges of values reduces the number of categories to something more meaningful. For instance, we might group incomes into ranges of 0-10,000, 10,001-30,000, etc.
Frequency distributions are depicted as a table or as a graph. Table 1 shows an age frequency distribution with five categories of age ranges defined. The same frequency distribution can be depicted in a graph as shown in Figure 2. This type of graph is often referred to as a histogram or bar chart.
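A grouped frequency distribution of this kind can be sketched in a few lines. The ages and the five age ranges below are hypothetical, chosen only to illustrate the grouping step; they are not the values behind the table and figure referred to above. The helper name `frequency_table` is also an assumption.

```python
from collections import Counter

# Hypothetical ages, invented for illustration only.
ages = [19, 23, 31, 35, 38, 42, 47, 51, 58, 64, 67, 72]

# Five hypothetical age ranges, given as inclusive (low, high) bounds.
age_ranges = [(18, 29), (30, 39), (40, 49), (50, 59), (60, 120)]


def frequency_table(values, ranges):
    """Count how many values fall into each inclusive (low, high) range."""
    counts = Counter()
    for value in values:
        for low, high in ranges:
            if low <= value <= high:
                counts[(low, high)] += 1
                break
    return [(r, counts[r]) for r in ranges]


table = frequency_table(ages, age_ranges)
for (low, high), count in table:
    percent = 100 * count / len(ages)
    # The row of '#' characters is a crude text histogram.
    print(f"{low:>3}-{high:<3} {count:>2} {percent:5.1f}%  {'#' * count}")
```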
The mean is the most commonly used method of describing central tendency. To compute the mean, take the sum of the values and divide by the count. For example, the mean quiz score is determined by summing all the scores and dividing by the number of students taking the exam. Consider the test score values:
15, 20, 21, 36, 15, 25, 15
The sum of these 7 values is 147, so the mean is 147/7 = 21.
The median is the score found at the middle of the set of values, i.e., the value that has as many cases with a larger value as with a smaller value. One way to compute the median is to sort the values in numerical order, and then locate the value in the middle of the list. For example, if there are 500 values, the median is the average of the two values in the 250th and 251st positions. If there are 501 values, the value in the 251st position is the median. Sorting the 7 scores above produces:
15, 15, 15, 20, 21, 25, 36
There are 7 scores and score #4 represents the halfway point. The median is 20. If there are an even number of observations, then the median is the mean of the two middle scores. In the example, if there were an 8th observation, with a value of 25, the median becomes the average of the 4th and 5th scores, in this case 20.5.
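The mean and median calculations above can be checked with a short sketch using Python's standard `statistics` module. The helper `median_by_sorting` is a hypothetical name, written out only to mirror the sort-and-pick-the-middle procedure described in the text.

```python
from statistics import mean, median

scores = [15, 20, 21, 36, 15, 25, 15]


def median_by_sorting(values):
    """Sort the values, then take the middle one (or average the middle two)."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print("mean:", mean(scores))      # sum of the scores divided by their count
print("median:", median(scores))  # middle value of the sorted scores

# Adding an 8th observation of 25 gives an even count, so the median
# becomes the average of the 4th and 5th sorted scores.
scores_with_eighth = scores + [25]
print("median with 8 scores:", median_by_sorting(scores_with_eighth))
```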
The mode is the most frequently occurring value in the set. To determine the mode, compute the distribution as above. The mode is the value with the greatest frequency. In the example, the modal value, 15, occurs three times. In some distributions there is a "tie" for the highest frequency, i.e., there are multiple modal values. These are called multi-modal distributions.
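As a minimal sketch, the mode can be found by counting how often each value occurs. `statistics.multimode` (available from Python 3.8) returns every value tied for the highest frequency, which covers the multi-modal case. The list `tied` is a hypothetical example, not data from the text.

```python
from collections import Counter
from statistics import multimode

scores = [15, 20, 21, 36, 15, 25, 15]

# Frequency of each distinct score, most common first.
frequencies = Counter(scores).most_common()
print(frequencies)

# All values sharing the highest frequency (a single value here).
modes = multimode(scores)
print("mode(s):", modes)

# Hypothetical data with a tie for the highest frequency.
tied = [1, 1, 2, 2, 3]
tied_modes = multimode(tied)
print("tied mode(s):", tied_modes)
```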
Notice that the three measures typically produce different results. The term "average" obscures the difference between them and is better avoided. The three values are equal if the distribution is perfectly "normal" (i.e., bell-shaped).
Dispersion is the spread of values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 − 15 = 21.
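Both measures of dispersion can be sketched for the same seven scores. The text does not say whether the population or the sample form of the standard deviation is meant, so the example computes both; which one applies is left as an open assumption.

```python
from statistics import pstdev, stdev

scores = [15, 20, 21, 36, 15, 25, 15]

# Range: highest value minus lowest value.
value_range = max(scores) - min(scores)
print("range:", value_range)

# Population standard deviation: divides the squared deviations by n.
population_sd = pstdev(scores)
# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = stdev(scores)

print(f"population standard deviation: {population_sd:.3f}")
print(f"sample standard deviation:     {sample_sd:.3f}")
```

The sample form is the usual choice when the scores are treated as a sample from a larger population, which is the setting inferential statistics is concerned with.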
Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis. The term "mathematical statistics" is closely related to the term "statistical theory" but also embraces modelling for actuarial science and non-statistical probability theory, particularly in Scandinavia.
Statistical science is concerned with the planning of studies, especially with the design of randomized experiments and with the planning of surveys using random sampling. The initial analysis of the data from properly randomized studies often follows the study protocol.
Of course, the data from a randomized study can be analyzed to consider secondary hypotheses or to suggest new ideas. A secondary analysis of the data from a planned study uses tools from data analysis.
Data analysis is divided into:
- descriptive statistics - the part of statistics that describes data, i.e. summarises the data and their typical properties.
- inferential statistics - the part of statistics that draws conclusions from data (using some model for the data): For example, inferential statistics involves selecting a model for the data, checking whether the data fulfill the conditions of a particular model, and with quantifying the involved uncertainty (e.g. using confidence intervals).
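As an illustration of quantifying uncertainty with a confidence interval, the sketch below computes a 95% interval for a mean, reusing the seven test scores listed earlier on this page. Treating those scores as a random sample from an approximately normal population is an assumption made purely for illustration. The critical value 2.447 is the two-sided 95% value for 6 degrees of freedom from a standard Student's t table, hard-coded because the standard library does not provide t quantiles.

```python
from math import sqrt
from statistics import mean, stdev

scores = [15, 20, 21, 36, 15, 25, 15]

# Two-sided 95% critical value of Student's t with n - 1 = 6 degrees of
# freedom, taken from a standard t table (assumption: normal model).
T_CRIT_95_DF6 = 2.447

n = len(scores)
center = mean(scores)
standard_error = stdev(scores) / sqrt(n)  # sample SD divided by sqrt(n)
margin = T_CRIT_95_DF6 * standard_error

ci = (center - margin, center + margin)
print(f"mean = {center}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval is wide because the sample is small; with more observations the standard error, and hence the margin, would shrink.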
While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data --- for example, from natural experiments and observational studies, in which case the inference is dependent on the model chosen by the statistician, and so subjective.
Mathematical statistics has been inspired by and has extended many procedures in applied statistics.
Statistics, mathematics, and mathematical statistics
Mathematical statistics has substantial overlap with the discipline of statistics. Statistical theorists study and improve statistical procedures with mathematics, and statistical research often raises mathematical questions. Statistical theory relies on probability and decision theory. Mathematicians and statisticians like Gauss, Laplace, and C. S. Peirce used decision theory with probability distributions and loss functions (or utility functions). The decision-theoretic approach to statistical inference was reinvigorated by Abraham Wald and his successors, and makes extensive use of scientific computing, analysis, and optimization; for the design of experiments, statisticians use algebra and combinatorics.
Advanced Placement Statistics (AP Statistics, AP Stats or AP Stat) is a college-level high school statistics course offered in the United States through the College Board's Advanced Placement program. This course is equivalent to a one-semester, non-calculus-based introductory college statistics course and is normally offered to juniors and seniors in high school.
One of the College Board's more recent additions, the AP Statistics exam was first administered in May 1997 to supplement the AP program's math offerings, which had previously consisted of only AP Calculus AB and BC. In the United States, enrollment in AP Statistics classes has increased at a higher rate than in any other AP class.
Students may receive college credit or upper-level college course placement upon the successful completion of a three-hour exam ordinarily administered in May. The exam consists of a multiple choice section and a free response section that are both 90 minutes long. Each section is weighted equally in determining the students' composite scores.
The Advanced Placement program has offered students the opportunity to pursue college-level courses while in high school. Along with the Educational Testing Service, the College Board administered the first AP Statistics exam in May 1997. The course was first taught to students in the 1996-1997 academic year. Prior to that, the only mathematics courses offered in the AP program were AP Calculus AB and BC. Students who didn't have a strong background in college-level math, however, found the AP Calculus program inaccessible and sometimes declined to take a math course in their senior year. Since the number of students required to take statistics in college is almost as large as the number of students required to take calculus, the College Board decided to add an introductory statistics course to the AP program. Since the prerequisites for such a program do not require mathematical concepts beyond those typically taught in a second-year algebra course, the AP program's math offerings became accessible to a much wider audience of high school students. The AP Statistics program addressed a practical need as well, since the number of students enrolling in majors that use statistics has grown. A total of 7,667 students took the exam during the first administration, which is the highest number of students to take an AP exam in its first year. Since then, the number of students taking the exam grew rapidly to 98,033 in 2007, making it one of the 10 largest AP exams.
If the course is provided by their school, students normally take AP Statistics in their junior or senior year and may decide to take it concurrently with a pre-calculus course. This offering is intended to imitate a one-semester, non-calculus based college statistics course, but high schools can decide to offer the course over one semester, two trimesters, or a full academic year.
The six-member AP Statistics Test Development Committee is responsible for developing the curriculum. Appointed by the College Board, the committee consists of three college statistics teachers and three high school statistics teachers who are typically asked to serve for terms of three years.
Emphasis is placed not on actual arithmetic computation, but rather on conceptual understanding and interpretation. The course curriculum is organized around four basic themes; the first involves exploring data and covers 20–30% of the exam. Students are expected to use graphical and numerical techniques to analyze distributions of data, including univariate, bivariate, and categorical data. The second theme involves planning and conducting a study and covers 10–15% of the exam. Students must be aware of the various methods of data collection through sampling or experimentation and the sorts of conclusions that can be drawn from the results. The third theme involves probability and its role in anticipating patterns in distributions of data. This theme covers 20–30% of the exam. The fourth theme, which covers 30–40% of the exam, involves statistical inference using point estimation, confidence intervals, and significance tests.
Along with the course curriculum, the exam is developed by the AP Statistics Test Development Committee as well. With the help of other college professors, the committee creates a large pool of possible questions that is pre-tested with college students taking statistics courses. The test is then refined to an appropriate level of difficulty and clarity. Afterwards, the Educational Testing Service is responsible for printing and administering the exam.
The exam is offered every year in May. Students are not expected to memorize any formulas. Therefore, a list of common statistical formulas related to descriptive statistics, probability, and inferential statistics is provided. Moreover, tables for the normal, Student's t and chi-square distributions are given as well. Students are also expected to use graphing calculators with statistical capabilities. The exam is three hours long w
In statistics, the term non-parametric statistics has at least two different meanings:
- The first meaning of non-parametric covers techniques that do not rely on data belonging to any particular distribution. These include, among others:
  - distribution-free methods, which do not rely on assumptions that the data are drawn from a given probability distribution. As such, they are the opposite of parametric statistics. They include non-parametric statistical models, inference and statistical tests.
  - non-parametric statistics (in the sense of a statistic over data, which is defined to be a function on a sample that has no dependency on a parameter), whose interpretation does not depend on the population fitting any parametrized distributions. Statistics based on the ranks of observations are one example of such statistics and these play a central role in many non-parametric approaches.
- The second meaning of non-parametric covers techniques that do not assume that the structure of a model is fixed. Typically, the model grows in size to accommodate the complexity of the data. In these techniques, individual variables are typically assumed to belong to parametric distributions, and assumptions about the types of connections among variables are also made. These techniques include, among others:
  - non-parametric regression, which refers to modeling where the structure of the relationship between variables is treated non-parametrically, but where nevertheless there may be parametric assumptions about the distribution of model residuals.
  - non-parametric hierarchical Bayesian models, such as models based on the Dirichlet process, which allow the number of latent variables to grow as necessary to fit the data, but where individual variables still follow parametric distributions and even the process controlling the rate of growth of latent variables follows a parametric distribution.
Applications and purpose
Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences; in terms of levels of measurement, for data on an ordinal scale.
As non-parametric methods make fewer assumptions, their applicability is much wider than the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, due to the reliance on fewer assumptions, non-parametric methods are more robust.
Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.
The wider applicability and increased robustness of non-parametric tests come at a cost: in cases where a parametric test would be appropriate, non-parametric tests have less power. In other words, a larger sample size can be required to draw conclusions with the same degree of confidence.
Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term non-parametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.
- A histogram is a simple nonparametric estimate of a probability distribution
- Kernel density estimation provides better estimates of the density than histograms.
- Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
- Data Envelopment Analysis provides efficiency coefficients similar to those obtained by Multivariate Analysis without any distributional assumption.
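To make the kernel density estimation item in the list above concrete, here is a minimal sketch of a Gaussian kernel density estimator written from scratch. The sample values and the bandwidth of 0.5 are hypothetical choices made for illustration; in practice the bandwidth is usually chosen by a data-driven rule. The function name `gaussian_kde` is an assumption and does not refer to any particular library.

```python
from math import exp, pi, sqrt


def gaussian_kde(sample, bandwidth):
    """Return a density function: an average of Gaussian bumps,
    one centred on each observation, each with the given bandwidth."""
    n = len(sample)
    scale = 1.0 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        return scale * sum(
            exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# Hypothetical observations forming two loose clusters.
sample = [2.1, 2.4, 2.5, 3.0, 3.2, 4.8, 5.1, 5.3]
density = gaussian_kde(sample, bandwidth=0.5)

# Sanity check: a density should integrate to about 1 (Riemann sum).
step = 0.01
grid = [i * step for i in range(-500, 1300)]  # covers -5.0 to 13.0
area = sum(density(x) for x in grid) * step

print(f"density at 2.5: {density(2.5):.3f}")
print(f"density at 4.0: {density(4.0):.3f}")
print(f"approximate area under the curve: {area:.4f}")
```

Unlike a histogram, the estimate is a smooth curve and does not depend on where bin edges happen to fall, though it does depend on the bandwidth.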
Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed. The most frequently used tests include
- Anderson–Darling test
- Statistical Bootstrap Methods
- Cochran's Q
- Cohen's kappa
- Friedman two-way analysis of variance by ranks
- Kendall's tau
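As an example of a rank-based statistic from the list above, Kendall's tau can be sketched by counting concordant and discordant pairs. This is the simple "tau-a" form, which makes no correction for ties. The two judges' rankings are hypothetical, and the function name `kendall_tau_a` is an assumption.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    A pair of observations is concordant when both variables order the
    two observations the same way, and discordant when they disagree.
    Tied pairs count as neither (no tie correction is applied).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1

    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical rankings of five items by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 3, 5, 4]

tau = kendall_tau_a(judge_1, judge_2)
print(f"Kendall's tau-a: {tau:.2f}")
```

Values near 1 indicate that the two rankings largely agree, values near -1 that they are reversed, and values near 0 that there is little association.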
From Yahoo Answers
Question: In the manufacturing industry there is some terminology that goes like this: at 3 sigma you have 66,800 defects per million opportunities; at 4 sigma you have 6,210 defects per million opportunities; at 5 sigma you have 233 defects per million; at 6 sigma you only have 3.4 defects per million. If you have already guessed it, this is the definition for the business model of quality management known as 6-Sigma. I ALREADY TOOK THE ONLINE TRAINING OF SIX SIGMA, but it never discussed the TYPE OF distribution that those sigma levels represent (chi-square, bell curve, etc.). I can tell by the sigma and the frequency that the data is skewed to the right, but other than that, I need someone to explain a little bit more. Also, one CEO stated, "....reduce laboratory errors to attain 6-Sigma...99.9997% accuracy or 3.4 ppm errors." (Dave Dexter, CEO of Sonora Quest Laboratories) So, is the data from above a percentile or a curve?
Answers: The distribution that gives the numbers you quoted above is the normal (Gaussian) distribution. In standard 6-sigma methodology there is also an assumption that, over time, the process mean will drift by about 1.5 times the standard deviation. Therefore, you will get 3.4 defects per million opportunities when the upper and lower specification limits are +/- 6-sigma from the target, and the process mean has drifted 1.5-sigma to either side of the target. If the process mean did not drift by the 1.5 sigma, then a 6-sigma process would yield only 2 defects per BILLION opportunities.
Question: What is a regular introductory Statistics class like in college?
Answers: Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the natural and social sciences to the humanities, and to government and business. Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called inferential statistics. Descriptive and inferential statistics together comprise applied statistics. There is also a discipline called mathematical statistics, which is concerned with the theoretical basis of the subject. The word statistics is also the plural of statistic (singular), which refers to the result of applying a statistical algorithm to a set of data, as in economic statistics, crime statistics, etc.
Question: I have a Stats project due tomorrow. My hypothesis is: at SJRCC, how many professors graduated with their BA from an in-state school? I asked 10 professors, and 6 out of the 10 graduated from an in-state school with their BA. My collected data is: n = 10, x = 6, null: p = 0.5, p > 0.05. I need the following: Data - explain your variable(s). Statistical Analysis - explain what stats method was used (descriptive, inferential). Six-Step Hypothesis test. I have StatDisk to calculate my data, but I don't know how to input it into the program or where to even start! Is this a Z-distribution or a T-distribution? What is my significance level? Thanks
Answers: Null hypothesis: p = 0.5, so q = 1 − 0.5 = 0.5, with n = 10. Claim: p > 0.5. (You show 0.05? The value must match the null hypothesis value!) This is a z-test rather than a t-test: a test about a proportion uses the normal approximation to the binomial regardless of sample size, provided np and nq are both at least 5 (here both equal 5, so the approximation is borderline). It is a right-tailed test because the claim is p > 0.5. What is your level of significance, alpha? Let's use alpha = 0.05. From the z table, the cutoff between the reject and do-not-reject regions is z = 1.645: any z < 1.645 means do not reject the null; any z > 1.645 means reject the null. Compute the test value: phat = 6/10 = 0.6, so z = (phat − p)/sqrt(pq/n) = (0.6 − 0.5)/sqrt(0.5 × 0.5/10) = 0.1/sqrt(0.025) = 0.1/0.15811 = 0.632. Decision: the test value is to the left of the cutoff 1.645, so it is in the do-not-reject region. Summary: there is not enough evidence to support the claim that p > 0.5.
Question: I am in my Statistics class and I need help understanding this variable. I know how to make a tree with it, but I can't do a tree every time during a quiz.
Answers: I do not see where your question is.
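For the hypothesis-test question above (6 of 10 professors, null value p = 0.5, right-tailed claim p > 0.5), the arithmetic can be sketched as follows. The sketch uses the one-proportion z statistic, which has the same formula as the test value computed in the answer, and adds an exact binomial tail probability as a cross-check because n = 10 is small. The significance level alpha = 0.05 is the value assumed in the answer, not one given by the asker.

```python
from math import comb, sqrt
from statistics import NormalDist

n, x = 10, 6   # professors asked, and how many had an in-state BA
p0 = 0.5       # proportion under the null hypothesis
alpha = 0.05   # assumed significance level

# One-proportion z statistic: (phat - p0) / sqrt(p0 * q0 / n).
p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Right-tail critical value and p-value from the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value_normal = 1 - NormalDist().cdf(z)

# Exact right-tail probability P(X >= x) under Binomial(n, p0).
p_value_exact = sum(
    comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1)
)

reject = z > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}")
print(f"normal-approximation p-value = {p_value_normal:.3f}")
print(f"exact binomial p-value       = {p_value_exact:.3f}")
print("reject null" if reject else "do not reject null")
```

Both p-values are far above 0.05, so either way the data do not support the claim that more than half of the professors earned an in-state BA.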
From Youtube
The Basics: Descriptive and Inferential Statistics: statisticslectures.com - where you can find free lectures, videos, and exercises, as well as get your questions answered on our forums!
Excel Statistics 11: Descriptive & Inferential Statistics: See Excel Charts for Cross-Sectional Data and Time Series data. Learn about the three types of Descriptive Statistics: Numerical, Tabular and Graphical. See the AVERAGE function, a percentage formula and a finished Histogram. Learn about Populations and Samples in regards to Inferential Statistics. This is a beginning to end video series for the Business & Economics Statistics/Excel class, Busn 210 at Highline Community College taught by Michael Gel ExcelIsFun Girvin