From Wikipedia

Cluster sampling

Cluster sampling is a sampling technique used when "natural" groupings are evident in a statistical population. It is often used in marketing research. In this technique, the total population is divided into these groups (or clusters) and a sample of the groups is selected. Then the required information is collected from the elements within each selected group. This may be done for every element in these groups or a subsample of elements may be selected within each of these groups. A common motivation for cluster sampling is to reduce the average cost per interview. Given a fixed budget, this can allow an increased sample size. Assuming a fixed sample size, the technique gives more accurate results when most of the variation in the population is within the groups, not between them.

Cluster elements

Elements within a cluster should ideally be as heterogeneous as possible, but there should be homogeneity between cluster means. Each cluster should be a small scale representation of the total population. The clusters should be mutually exclusive and collectively exhaustive. A random sampling technique is then used on any relevant clusters to choose which clusters to include in the study. In single-stage cluster sampling, all the elements from each of the selected clusters are used. In two-stage cluster sampling, a random sampling technique is applied to the elements from each of the selected clusters.
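
The difference between single-stage and two-stage cluster sampling can be illustrated with a minimal Python sketch. The population, the village and household names, and the function names are all hypothetical, and simple random sampling without replacement is assumed at both stages.

```python
import random


def single_stage_cluster_sample(clusters, n_clusters, rng):
    """Select n_clusters at random and keep every element in each of them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Select n_clusters at random, then subsample elements within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical population: 5 villages (clusters) of 20 households each.
population = {
    f"village_{i}": [f"v{i}_h{j}" for j in range(20)] for i in range(5)
}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
one_stage = single_stage_cluster_sample(population, 2, rng)
two_stage = two_stage_cluster_sample(population, 2, 5, rng)
```

In the single-stage result every household of the two chosen villages is surveyed; in the two-stage result only five households per chosen village are.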

The main difference between cluster sampling and stratified sampling is that in cluster sampling the cluster is treated as the sampling unit so analysis is done on a population of clusters (at least in the first stage). In stratified sampling, the analysis is done on elements within strata. In stratified sampling, a random sample is drawn from each of the strata, whereas in cluster sampling only the selected clusters are studied. The main objective of cluster sampling is to reduce costs by increasing sampling efficiency. This contrasts with stratified sampling where the main objective is to increase precision.

There is also multistage sampling, in which more than two stages of selection are used, with clusters drawn from within previously selected clusters.

Aspects of cluster sampling

One version of cluster sampling is area sampling or geographical cluster sampling. Clusters consist of geographical areas. Because a geographically dispersed population can be expensive to survey, greater economy than simple random sampling can be achieved by treating several respondents within a local area as a cluster. It is usually necessary to increase the total sample size to achieve equivalent precision in the estimators, but cost savings may make that feasible.

In some situations, cluster sampling is only appropriate when the clusters are approximately the same size. This can be achieved by combining clusters. If this is not possible, probability proportionate to size sampling is used. In this method, the probability of selecting any cluster varies with the size of the cluster, giving larger clusters a greater probability of selection and smaller clusters a lower probability. However, if clusters are selected with probability proportionate to size, the same number of interviews should be carried out in each sampled cluster so that each unit sampled has the same probability of selection.
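
The claim that an equal number of interviews per cluster equalises selection probabilities can be checked with a short sketch. The cluster names and sizes are hypothetical, and the function assumes clusters are drawn with probability proportionate to size at the first stage and that a simple random sample of fixed size is taken within each selected cluster.

```python
from fractions import Fraction


def element_inclusion_probabilities(cluster_sizes, n_clusters, interviews_per_cluster):
    """Overall selection probability of one element in each cluster.

    First stage: cluster i is selected with probability n_clusters * size_i / total.
    Second stage: interviews_per_cluster elements are drawn from the size_i
    elements of a selected cluster.
    """
    total = sum(cluster_sizes.values())
    probabilities = {}
    for name, size in cluster_sizes.items():
        if n_clusters * size > total or interviews_per_cluster > size:
            raise ValueError(f"cluster {name!r} would need a probability above 1")
        p_cluster = Fraction(n_clusters * size, total)
        p_within = Fraction(interviews_per_cluster, size)
        probabilities[name] = p_cluster * p_within
    return probabilities


# Hypothetical clusters of very different sizes.
sizes = {"A": 100, "B": 300, "C": 600}
probs = element_inclusion_probabilities(sizes, n_clusters=1, interviews_per_cluster=10)
# Every value is Fraction(1, 100): the cluster size cancels out of the product.
```

The size of the cluster appears in the numerator of the first-stage probability and in the denominator of the second-stage probability, so it cancels and every element has the same overall chance of selection.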

Cluster sampling is used to estimate high mortalities in cases such as wars, famines and natural disasters.


Advantages

  • Can be cheaper than other methods, e.g. lower travel expenses and administration costs

Disadvantages

  • Higher sampling error, which can be expressed in the so-called "design effect": the ratio between the number of subjects in the cluster study and the number of subjects in an equally reliable, randomly sampled unclustered study.
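
The design effect described above can be sketched as simple arithmetic. The function names and figures are hypothetical. The last function uses a common approximation that is not given in the text: for equal-sized clusters of size m and intraclass correlation rho, the design effect is roughly 1 + (m - 1) * rho.

```python
def design_effect(n_clustered, n_unclustered_equivalent):
    """Ratio of the clustered sample size to the size of an equally
    reliable simple random sample."""
    return n_clustered / n_unclustered_equivalent


def effective_sample_size(n_clustered, deff):
    """Size of the simple random sample that a clustered sample is worth."""
    return n_clustered / deff


def approximate_design_effect(cluster_size, icc):
    """Assumed approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical study: 600 clustered interviews are as reliable as 400 unclustered ones.
deff = design_effect(600, 400)
n_eff = effective_sample_size(600, deff)
```

A design effect above 1 means that the clustered study needs more subjects than an unclustered one to reach the same precision.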

Library classification

A library classification is a system of coding and organizing library materials (books, serials, audiovisual materials, computer files, maps, manuscripts, realia) according to their subject and allocating a call number to that information resource. Similar to classification systems used in biology, bibliographic classification systems group entities together that are similar, typically arranged in a hierarchical tree structure. A different kind of classification system, called a faceted classification system, is also widely used which allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways.


Library classification forms part of the field of library and information science. It is a form of bibliographic classification (library classifications are used in library catalogs, while "bibliographic classification" also covers classification used in other kinds of bibliographic databases). It goes hand in hand with library (descriptive) cataloging under the rubric of cataloging and classification, sometimes grouped together as technical services. The library professional who engages in the process of cataloging and classifying library materials is called a cataloguer or catalog librarian. Library classification systems are one of the two tools used to facilitate subject access. The other consists of alphabetical indexing languages such as thesauri and subject heading systems.

Library classification of a piece of work consists of two steps. First, the "aboutness" of the material is ascertained. Next, a call number (essentially a book's address), based on the classification system in use at the particular library, is assigned to the work using the notation of the system.

Unlike subject headings or thesauri, where multiple terms can be assigned to the same work, library classification systems place each work in only one class. This is for shelving purposes: a book can have only one physical place. However, in classified catalogs one may have main entries as well as added entries. Most classification systems, such as the Dewey Decimal Classification (DDC) and Library of Congress Classification, also add a Cutter number to each work, which adds a code for the author of the work.

Classification systems in libraries generally play two roles. Firstly they facilitate subject access by allowing the user to find out what works or documents the library has on a certain subject. Secondly, they provide a known location for the information source to be located (e.g. where it is shelved).

Until the 19th century, most libraries had closed stacks, so the library classification only served to organize the subject catalog. In the 20th century, libraries opened their stacks to the public and started to shelve the library material itself according to some library classification to simplify subject browsing.

Some classification systems are more suitable for aiding subject access than for shelf location. For example, UDC, which uses a complicated notation including plus signs and colons, is more difficult to use for shelf arrangement but is more expressive than DDC in showing relationships between subjects. Similarly, faceted classification schemes are more difficult to use for shelf arrangement unless the user knows the citation order.

Depending on the size of the library collection, some libraries might use classification systems solely for one purpose or the other. In extreme cases a public library with a small collection might just use a classification system for location of resources but might not use a complicated subject classification system. Instead all resources might just be put into a couple of wide classes (Travel, Crime, Magazines etc.). This is known as a "mark and park" classification method, more formally called reader interest classification.


There are many standard systems of library classification in use, and many more have been proposed over the years. In general, however, classification systems can be divided into three types depending on how they are used.

In terms of functionality, classification systems are often described as

  • enumerative: produces an alphabetical list of subject headings and assigns numbers to each heading in alphabetical order
  • hierarchical: divides subjects hierarchically, from most general to most specific
  • faceted or analytico-synthetic: divides subjects into mutually exclusive orthogonal facets
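
The three approaches listed above can be contrasted in a small sketch. It is a simplification rather than a model of any real scheme: the headings, documents and facet names are hypothetical, and the three-digit notation only imitates the DDC convention that each further significant digit narrows the subject.

```python
def enumerate_headings(headings):
    """Enumerative: number the subject headings in alphabetical order."""
    return {heading: number for number, heading in enumerate(sorted(headings), start=1)}


# Hierarchical: a tiny excerpt in DDC-style notation, general to specific.
HIERARCHY = {
    "500": "Natural sciences and mathematics",
    "510": "Mathematics",
    "516": "Geometry",
}


def broader_classes(notation):
    """Return the chain of classes from the most general down to `notation`."""
    chain = []
    for digits in range(1, 4):
        code = notation[:digits].ljust(3, "0")
        if code not in chain:
            chain.append(code)
    return chain


# Faceted: each document carries one value per independent facet.
documents = [
    {"title": "doc A", "facets": {"topic": "agriculture", "place": "India", "period": "1900s"}},
    {"title": "doc B", "facets": {"topic": "agriculture", "place": "Brazil", "period": "1800s"}},
    {"title": "doc C", "facets": {"topic": "mining", "place": "Brazil", "period": "1900s"}},
]


def arrange(docs, citation_order):
    """Order documents by their facet values, taken in the given citation order."""
    return sorted(docs, key=lambda d: tuple(d["facets"][f] for f in citation_order))


numbered = enumerate_headings(["Travel", "Crime", "Magazines"])
chain = broader_classes("516")
by_place = [d["title"] for d in arrange(documents, ("place", "topic"))]
by_topic = [d["title"] for d in arrange(documents, ("topic", "period"))]
```

Changing the citation order changes the arrangement of the same faceted records, whereas a hierarchical notation fixes a single path from general to specific.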

There are few completely enumerative or faceted systems; most are a blend that favours one type or the other. The most common classification systems, LCC and DDC, are essentially enumerative, though with som

