formula for sample size determination

From Wikipedia

Sample size

The sample size of a statistical sample is the number of observations that constitute it. It is typically denoted n, a positive integer (natural number).

Typically, all else being equal, a larger sample size leads to increased precision in estimates of various properties of the population, although a larger sample does not remove systematic error (bias) in the experiment. The gain in precision can be seen in such statistical rules as the law of large numbers and the central limit theorem. Repeated measurements and replication of independent samples are often required in measurement and experiments to reach a desired precision.

A typical example would be when a statistician wishes to estimate the arithmetic mean of a quantitative random variable (for example, the height of a person). Assuming that they have a random sample with independent observations, and also that the variability of the population (as measured by the standard deviation σ) is known, then the standard error of the sample mean is given by the formula:

\sigma/\sqrt{n}

It is easy to show that as n becomes very large, this variability becomes small. This leads to more sensitive hypothesis tests with greater statistical power and smaller confidence intervals.
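
As a rough illustration of how the standard error of the sample mean, \sigma/\sqrt{n}, shrinks with n, here is a minimal Python sketch. The function name and the value of sigma are assumptions chosen for the example, not part of the original text.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean when the population SD sigma is known."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation (e.g. height in cm), for illustration only.
sigma = 7.0
for n in (10, 100, 1000):
    print(n, round(standard_error(sigma, n), 3))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.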

Implications of sample size

Central limit theorem

The central limit theorem states that, provided the data come from a distribution with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the number of independent observations approaches infinity.
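
The theorem can be checked informally by simulation. The sketch below draws repeated samples from a Uniform(0, 1) distribution (an arbitrary choice for illustration) and compares the spread of the sample means with \sigma/\sqrt{n}. The function name, seed, and sample counts are assumptions.

```python
import math
import random
import statistics

def simulate_sample_means(n: int, trials: int, seed: int = 0) -> list:
    """Return `trials` sample means, each computed from n Uniform(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

n = 30
means = simulate_sample_means(n=n, trials=2000)
sigma = math.sqrt(1 / 12)  # standard deviation of Uniform(0, 1)

print("observed SD of sample means:", round(statistics.stdev(means), 4))
print("sigma / sqrt(n):            ", round(sigma / math.sqrt(n), 4))
```

The two printed values should be close, and a histogram of `means` would look roughly bell-shaped even though the underlying distribution is flat.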

Estimating proportions

A typical statistical aim is to demonstrate with 95% certainty that the true value of a parameter is within a distance B of the estimate: B is an error range that decreases with increasing sample size (n). Typically B is generated in such a way that the range of values that are within a distance B of the estimated parameter value will be a 95% confidence interval, at least in an approximate sense.

For example, a simple situation is estimating a proportion in a population. To do so, a statistician will estimate the bounds of a 95% confidence interval for an unknown proportion.

The rule of thumb for (a maximum or 'conservative') B for a proportion derives from the fact that the estimator of a proportion, \hat p = X/n (where X is the number of 'positive' observations), has a (scaled) binomial distribution and is also a form of sample mean (from a Bernoulli distribution on {0, 1}, which has a maximum variance of 0.25 for parameter p = 0.5). So, the sample mean X/n has maximum variance 0.25/n. For sufficiently large n (usually this means that we need to have observed at least 10 positive and 10 negative responses), this distribution will be closely approximated by a normal distribution with the same mean and variance.

Using this approximation, it can be shown that the confidence interval (+/- the margin of error) is:

At 99% confidence: (\hat p - 1.29/\sqrt{n} ,~~ \hat p + 1.29/\sqrt{n})

At 95% confidence: (\hat p - 0.98/\sqrt{n} ,~~ \hat p + 0.98/\sqrt{n})

At 90% confidence: (\hat p - 0.82/\sqrt{n} ,~~ \hat p + 0.82/\sqrt{n})

One sees these numbers quoted often in news reports of opinion polls and other sample surveys.
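
The constants 1.29, 0.98 and 0.82 are the normal critical value multiplied by the maximum standard deviation 0.5. A minimal Python sketch, assuming the normal approximation above and using a hypothetical function name:

```python
import math
from statistics import NormalDist

def conservative_margin(n: int, confidence: float = 0.95) -> float:
    """Conservative (p = 0.5) margin of error for a proportion from n observations."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z * 0.5 / math.sqrt(n)

# With n = 1 the margin equals the numerator quoted in the intervals above.
for level in (0.99, 0.95, 0.90):
    print(level, round(conservative_margin(1, level), 2))

# A poll of 1000 respondents (hypothetical size), at the default 95% level.
print(round(conservative_margin(1000), 3))
```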

Extension to other cases

In general, if a population mean is estimated using the sample mean from n observations from a distribution with variance σ², then if n is large enough (typically >30) the central limit theorem can be applied to obtain an approximate 95% confidence interval of the form

(\bar x - B,\bar x + B), B=2\sigma/\sqrt{n}

If the sampling error ε is required to be no larger than bound B, as above, then

4\sigma^2/\varepsilon^2 \approx 4\sigma^2/B^2=n
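
As a sketch of how this relation might be used in practice, the following Python function returns the smallest integer n satisfying n >= 4\sigma^2/B^2. The function name and the example values are assumptions for illustration.

```python
import math

def sample_size_for_mean(sigma: float, bound: float) -> int:
    """Smallest n for which 2*sigma/sqrt(n) <= bound (approximate 95% interval)."""
    if sigma <= 0 or bound <= 0:
        raise ValueError("sigma and bound must be positive")
    return math.ceil(4 * sigma**2 / bound**2)

# Hypothetical inputs: population SD of 15 units, desired bound of 2 units.
print(sample_size_for_mean(sigma=15.0, bound=2.0))
```

Rounding up rather than to the nearest integer keeps the resulting bound at or below the requested value.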

Note, if the

Acceptance sampling

Acceptance sampling uses statistical sampling to determine whether to accept or reject a production lot of material. It has been a common quality control technique used in industry and particularly the military for contracts and procurement.

A wide variety of acceptance sampling plans are available.


Acceptance sampling procedures became common during WWII. Sampling plans, such as MIL-STD-105, were developed by Harold F. Dodge and others and became frequently used as standards.

More recently, quality assurance broadened the scope beyond final inspection to include all aspects of manufacturing. Broader quality management systems include methodologies such as statistical process control, HACCP, six sigma, and ISO 9000. Some use of acceptance sampling still remains.


Sampling provides one rational means of verification that a production lot conforms with the requirements of technical specifications. 100% inspection does not guarantee 100% compliance and is too time consuming and costly. Rather than evaluating all items, a specified sample is taken, inspected or tested, and a decision is made about accepting or rejecting the entire production lot.

Plans have known risks: an acceptable quality limit (AQL) and a rejectable quality level (LTPD, lot tolerance percent defective) are part of the operating characteristic curve of the sampling plan. These are primarily statistical risks and do not necessarily imply that defective product is intentionally being made or accepted. Plans can have a known average outgoing quality limit (AOQL).
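
To make the operating characteristic curve concrete, the sketch below computes the probability of accepting a lot under a single-sampling attribute plan: inspect n items and accept if at most c are defective. It assumes a binomial model (lot much larger than the sample). The plan n = 50, c = 1 is hypothetical and is not taken from any standard's tables.

```python
from math import comb

def acceptance_probability(n: int, c: int, p: float) -> float:
    """P(accept lot) = P(at most c defectives among n sampled items), binomial model."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))

# Hypothetical plan: sample 50 items, accept the lot if 0 or 1 are defective.
n, c = 50, 1
for p in (0.01, 0.02, 0.05, 0.10):
    print(f"defect rate {p:.2f}: P(accept) = {acceptance_probability(n, c, p):.3f}")
```

Plotting P(accept) against the defect rate p gives the operating characteristic curve. The producer's and consumer's risks are read off it at the AQL and at the rejectable quality level respectively.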

Attribute plans

MIL-STD-105 was a United States defense standard that provided procedures and tables for sampling by attributes (pass or fail characteristic). MIL-STD-105E was cancelled in 1995 but is available in related documents such as ANSI/ASQ Z1.4, "Sampling Procedures and Tables for Inspection by Attributes". Several levels of inspection are provided and can be indexed to several AQLs. The sample size is specified and the basis for acceptance or rejection (number of defects) is provided.

Variables plans

When a measured characteristic produces a number, other sampling plans such as those based on MIL-STD-414 are often used. Compared with attribute sampling plans, these often use a smaller sample size for the same indexed AQL.

From Yahoo Answers

Question:Iron is obtained from the oxide in iron ore by heating with carbon in the form of coke. If a sample of the oxide, 10.78 g, is decomposed in this way, 8.378 g of iron is obtained. Determine the empirical formula for the iron oxide in this sample of the ore.(Hint: first determine the number of moles in iron and oxygen in this sample.)

Answers:According to the information given, 10.78 g of iron oxide contains 8.378 g of iron and therefore (10.78 - 8.378) g = 2.402 g of oxygen. Moles of iron = 8.378/55.85 (or 56, if rounded further) = 0.15. Moles of oxygen = 2.402/16 = 0.150125. They are both about the same, so the ratio of oxygen to iron is 1:1 and the empirical formula is FeO.
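
The same arithmetic can be checked with a short Python sketch. The atomic masses are approximate standard values, and the function name is an assumption for illustration.

```python
# Approximate standard atomic masses in g/mol.
ATOMIC_MASS_FE = 55.845
ATOMIC_MASS_O = 15.999

def oxygen_to_iron_mole_ratio(mass_oxide: float, mass_iron: float) -> float:
    """Mole ratio O:Fe in an iron oxide sample, given the masses in grams."""
    mass_oxygen = mass_oxide - mass_iron  # oxygen is whatever is not iron
    moles_iron = mass_iron / ATOMIC_MASS_FE
    moles_oxygen = mass_oxygen / ATOMIC_MASS_O
    return moles_oxygen / moles_iron

ratio = oxygen_to_iron_mole_ratio(10.78, 8.378)
print(round(ratio, 2))  # a value near 1 corresponds to FeO
```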

Question:A central university has a student population of 60,000. The university is interested in determining what proportion of them is in favour of a new grading system. Determine a sample size with confidence level of 95% that will show the true proportion of population in favour of the new system within plus and minus 0.02.

Answers:Margin of error at 95% confidence = 0.98/\sqrt{n}. Setting 0.98/\sqrt{n} = 0.02 gives 0.98 = 0.02 \sqrt{n}, so \sqrt{n} = 49 and n = 2401.
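
A minimal Python sketch of this calculation follows. The optional finite population correction, n0 / (1 + (n0 - 1)/N), is an addition that the answer above does not use. It is included only because the question states a population of 60,000. The function and parameter names are assumptions.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Sample size so that the margin of error for a proportion is at most `margin`.

    p = 0.5 gives the conservative (largest) size. If `population` is given,
    a finite population correction is applied (an assumption beyond the answer above).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z**2 * p * (1 - p) / margin**2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(sample_size_for_proportion(0.02))                     # large-population case
print(sample_size_for_proportion(0.02, population=60_000))  # with the correction
```

With 60,000 students the correction reduces the required sample only slightly, because the sample is a small fraction of the population.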

Question:My company would like to use a third party measurement device in our product. The device measures distance between 1 to 10cm. What would be a good sample size to validate the manufacturer's spec on the device and to determine a good standard deviation? Thanks.

Answers:see the following website

Question:The scores of high school seniors on the ACT exam in 2003 had mean=20.8 and standard deviation=4.8. The distribution of scores is only roughly Normal. a.) What is the approximate probability that a single student randomly chosen from all those taking the test scores 23 or higher? How can I find this answer without knowing the number of students who took the test?

Answers:Let X be the score of a randomly chosen student. X is approximately normally distributed with μ = 20.8 and σ = 4.8. Event A is "the single student randomly chosen from all those taking the test scores 23 or higher". For a normal distribution, P{X > a} = 1/2 - Φ((a - μ)/σ), where Φ is the Laplace function. Applying this to event A: P(A) = P{X > 23} = 1/2 - Φ((23 - 20.8)/4.8) = 1/2 - Φ(0.4583). From a table of Laplace function values, Φ(0.4583) is about 0.1766, so P(A) ≈ 0.5 - 0.1766 = 0.3234. The number of students who took the test is not needed, because the probability depends only on the distribution of scores.
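
For the threshold of 23 stated in the question, the probability can also be computed directly with Python's standard library instead of a table. This is a sketch that treats the scores as exactly normal, which the question says is only roughly true. The helper name is an assumption.

```python
from statistics import NormalDist

# ACT scores modelled as Normal(mean 20.8, SD 4.8), as given in the question.
scores = NormalDist(mu=20.8, sigma=4.8)

def prob_at_least(threshold: float) -> float:
    """P(X >= threshold) under the normal model."""
    return 1 - scores.cdf(threshold)

print(round(prob_at_least(23), 4))
```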

From Youtube

5.5 How sample size is determined: The sample size for any study depends on the required precision and on the size and nature of the population under study. Procedural aspects such as time, budget and resources available will dictate the size, as will publishing aspects, in terms of importance placed on the results by the audience. The main ways of deciding on sample size are: by calculation; by using accepted industry standards; by budget (time or money available); by building analysis cells. The calculation method takes account of the population size and the expected accuracy of results. In theory, this is the best way to arrive at a sample size; in practice, other methods are used. Many sample sizes for research studies are decided on the basis of what is feasible within the time or money available. The sample size is often built up from the minimum numbers expected in each analysis cell. www.oxfordtextbooks.co.uk