Lesson 3: Sampling Techniques

Introduction to Sampling Techniques

A statistical analysis involves several steps, including:

Asking a research question
Designing a study to answer the research question
Defining the population of the study and drawing a sample from the population
Collecting data from the sample
Analyzing the data to answer the research question
Presenting the results of the study
Drawing conclusions and making recommendations

These steps show that a statistical analysis works with data collected from a sample of individuals or objects. There are various sampling techniques for drawing a sample from a population. Understanding them matters for two reasons: the sampling strategy determines the extent to which the results of a statistical analysis can be generalized, and some sampling techniques are more appropriate for certain studies than others.

Sample

A sample consists of the individuals, objects, or observational units selected from a population of interest for analysis. The total number of observational units in a sample is called the sample size. Each sample can be described using sample characteristics, or statistics, such as the mean, proportion, standard deviation, and variance. A statistical analysis is usually conducted on a sample, and the results are then generalized to the population, provided the sample is representative of the population.

Population

A population consists of all the individuals or objects of interest. A population is described using population characteristics called population parameters (or simply, parameters). Like sample statistics, population parameters include the mean, proportion, variance, and standard deviation. Notation: Lowercase Latin letters are mostly used to denote sample statistics, while Greek letters are mostly used to denote population parameters. For example, the sample mean x̄ estimates the population mean μ, the sample standard deviation s estimates the population standard deviation σ, and the sample variance s² estimates the population variance σ².

Sampling

Sampling is the selection of observational units from a population of interest. It is usually preferable to study and analyze a sample instead of the entire population, because doing so costs less, takes less time, and requires fewer resources. However, when the population of interest is small, the entire population could be studied. A sampling frame is a list of all individuals in the population from which the sample is drawn.

Statistical Inference

When a sample is studied, data on the variables of interest are collected for each observational unit. A sample statistic, such as the sample mean or proportion, is computed from the sample data. Sample statistics are used to estimate population parameters. Sample data are therefore used to make statistical inferences, that is, to draw conclusions, about the population.

To use sample data to make statistical inferences about the population, the sample needs to be representative of the population. The extent to which a sample is representative determines the strength of the inference: the more representative the sample, the more valid the inference. The sampling method used to draw the sample largely determines whether the resulting sample will represent the population.
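As a minimal sketch of the relationship between sample statistics and population parameters, the Python snippet below builds a made-up population of ages, draws a sample, and computes the same quantities for both. The population, the seed, the sample size of 200, and the cutoff of 65 are all assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 individuals (made-up data).
population = [rng.randint(18, 90) for _ in range(10_000)]

# Draw a sample of 200 observational units without replacement.
sample = rng.sample(population, 200)

# Sample statistics, computed from the sample only.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
sample_prop = sum(age >= 65 for age in sample) / len(sample)

# Population parameters, normally unknown; available here only because
# the population is simulated.
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)  # population standard deviation
pop_prop = sum(age >= 65 for age in population) / len(population)

print(f"mean: sample {sample_mean:.2f} vs population {pop_mean:.2f}")
print(f"sd:   sample {sample_sd:.2f} vs population {pop_sd:.2f}")
print(f"prop: sample {sample_prop:.3f} vs population {pop_prop:.3f}")
```

In a real study only the sample values would be available, and they would serve as estimates of the population values.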

Sampling Methods

Sampling methods can be classified as random or non-random. Random sampling methods tend to produce more representative samples, while non-random sampling methods are more likely to produce non-representative samples.

Random Sampling Methods

Random sampling methods are sampling techniques that use a probabilistic approach to draw a sample from the population. That means observational units are selected by chance. There are different types of random sampling methods, including simple random sampling, systematic sampling, stratified sampling, and cluster sampling.

Simple Random Sampling

Simple random sampling is a random selection of observational units from the population such that each observational unit has the same chance or probability of being selected. Let’s assume we had “N” individuals in the population and needed to randomly select a sample of “n” individuals using a simple random sampling technique. We could achieve this by writing the name of each person in the population on a piece of paper and putting all the pieces of paper in a basket. Then, we can thoroughly mix the pieces of paper and select one paper from the basket. The name on the selected paper represents a randomly drawn observational unit.

This process can be repeated “n” times to draw a sample of “n” observational units. For a large population, writing the names of all the individuals on paper would be tedious, so it is better to use a computer program to select individuals at random. For example, the sample() function in R can be used to select n items from a vector of N elements, with or without replacement. With replacement, each selected unit is returned to the population before the next draw, so every unit has the same probability, 1/N, of being selected on every draw, and a unit may be selected more than once. Without replacement, a selected unit is not returned, so the probability of selecting any particular remaining unit increases slightly from one draw to the next: 1/N on the first draw, 1/(N-1) on the second, and so on.
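The text mentions R's sample() function. As a rough equivalent, here is a minimal Python sketch using the standard library's random module. The population of unit labels 1 to 1000, the sample size of 10, and the seed are assumptions made for illustration.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

N = 1000                            # population size (assumed)
n = 10                              # desired sample size (assumed)
population = list(range(1, N + 1))  # hypothetical unit labels 1..N

# Without replacement: a unit that has been drawn cannot be drawn again.
srs_without = rng.sample(population, n)

# With replacement: every draw is made from the full population,
# so the same unit may appear more than once.
srs_with = rng.choices(population, k=n)

# Without replacement, the chance that a given remaining unit is picked
# on draw i (counting from 0) is 1 / (N - i), which grows slightly each draw.
draw_probs = [1 / (N - i) for i in range(n)]

print("without replacement:", srs_without)
print("with replacement:   ", srs_with)
print("per-draw probabilities:", [round(p, 6) for p in draw_probs])
```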

Systematic Sampling

Systematic sampling is a random sampling technique where the first observational unit is randomly selected from an ordered sampling frame, followed by every kth unit. The sampling interval k is calculated as the population size (N) divided by the desired sample size (n). The first unit is randomly drawn from the first k units. Then k is repeatedly added to the index number of the most recently selected observational unit to select the next unit. For example, suppose we have 1000 units in the population and would like to draw 100 units using systematic random sampling; then k = 1000/100 = 10. If we randomly selected the number “3” from the first 10 units, the subsequent units would be drawn by repeatedly adding 10 to the latest number selected. Since the sampling frame is ordered, the units selected by systematic sampling will be units 3, 13, 23, 33, 43, 53, …, 983, and 993.
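The procedure can be sketched in Python as follows. The function name systematic_sample and its parameters are hypothetical, and the sketch assumes that the sample size divides the population size evenly, as in the example of 1000 units and a sample of 100.

```python
import random

def systematic_sample(frame, n, start=None, rng=random):
    """Select every k-th unit from an ordered sampling frame.

    frame: ordered list of units; n: desired sample size.
    start: 1-based position of the first unit; drawn at random
           from the first k positions when not given.
    Assumes len(frame) is a multiple of n.
    """
    N = len(frame)
    k = N // n  # sampling interval
    if start is None:
        start = rng.randint(1, k)  # random start among the first k units
    # Step through the frame in jumps of k, beginning at the start position.
    return [frame[i] for i in range(start - 1, N, k)][:n]

frame = list(range(1, 1001))  # units numbered 1..1000

# Reproduce the worked example: the random start happened to be 3.
example = systematic_sample(frame, 100, start=3)
print(example[:6], "...", example[-2:])

# In practice the start is left to chance.
random_start_sample = systematic_sample(frame, 100, rng=random.Random(0))
```

Passing start explicitly is only a convenience for reproducing the worked example; leaving it out lets the random generator choose the first unit.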

Stratified Sampling

Stratified sampling is a random sampling method that involves splitting the population into groups called strata. The splitting is based on a categorical variable such that the groups are the possible values or categories of the categorical variable. For example, a population may be split by gender into male and female groups or strata. A simple random sampling is then conducted within each stratum, and the units selected from each stratum are combined to form a stratified sample.
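A minimal Python sketch of this idea is shown below. The function name stratified_sample, the made-up population of 600 female and 400 male units, and the choice of 60 and 40 units per stratum are assumptions for illustration; the text does not prescribe how many units to draw from each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample within each stratum and combine them.

    units: list of observational units.
    stratum_of: function mapping a unit to its stratum label.
    n_per_stratum: dict mapping each stratum label to a sample size.
    """
    # Split the population into strata by the categorical variable.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Simple random sampling (without replacement) inside each stratum.
    combined = []
    for label, members in strata.items():
        combined.extend(rng.sample(members, n_per_stratum[label]))
    return combined

# Hypothetical population: (id, gender) pairs.
population = [(i, "female") for i in range(600)] + \
             [(i, "male") for i in range(600, 1000)]

rng = random.Random(11)  # fixed seed so the sketch is repeatable
strat_sample = stratified_sample(
    population,
    stratum_of=lambda unit: unit[1],
    n_per_stratum={"female": 60, "male": 40},  # assumed allocation
    rng=rng,
)

counts = {g: sum(1 for _, gender in strat_sample if gender == g)
          for g in ("female", "male")}
print(counts)
```

Here the per-stratum sizes mirror each group's share of the population, which is one common choice; other allocations can be passed through n_per_stratum.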

Cluster Sampling

Cluster sampling is a random sampling method in which the population is divided into naturally occurring groups called clusters, often geographic regions, and clusters rather than individual units are randomly selected. Ideally, each cluster is representative of the population. Geographic regions that can serve as clusters include zip codes, school districts, cities, and states. In one-stage cluster sampling, clusters are randomly selected from the population, and all observational units in the selected clusters form the sample. In two-stage cluster sampling, clusters are randomly selected from the population; then, observational units are randomly selected from each selected cluster and combined to form a sample. For example, 5 cities could be randomly selected from Colorado, followed by a simple random selection of 50 individuals from each city. The individuals selected from each city are combined to obtain the final sample of 250 observational units.
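The two-stage example (5 cities, then 50 individuals per city) can be sketched in Python as follows. The function name two_stage_cluster_sample, the 20 made-up cities, and the 500 residents per city are assumptions used only to make the sketch runnable.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Two-stage cluster sampling.

    clusters: dict mapping a cluster name to the list of its units.
    Stage 1 picks clusters at random; stage 2 picks units at random
    within each chosen cluster.
    """
    # Stage 1: randomly select clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)

    # Stage 2: simple random sample of units within each chosen cluster.
    units = []
    for name in chosen:
        units.extend(rng.sample(clusters[name], n_per_cluster))
    return chosen, units

# Hypothetical population: 20 cities with 500 residents each.
cities = {
    f"city_{c:02d}": [f"city_{c:02d}_person_{i}" for i in range(500)]
    for c in range(20)
}

rng = random.Random(7)  # fixed seed so the sketch is repeatable
chosen_cities, cluster_sample = two_stage_cluster_sample(cities, 5, 50, rng)

print(chosen_cities)
print(len(cluster_sample))  # 5 cities x 50 individuals
```

For one-stage cluster sampling, the second stage would be skipped and every unit in the chosen clusters would be kept.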

Non-Random Sampling Methods

Non-random sampling methods are sampling techniques that use a non-probabilistic approach to select observational units. That means observational units are not selected by chance. There are different non-random sampling methods, including purposive sampling, convenience sampling, and snowball sampling.

Purposive Sampling

Purposive or judgment sampling is a non-random sampling strategy where a researcher uses expert knowledge or certain criteria to recruit individuals or select observational units to produce a study sample. The needs of the study determine the selection criteria used. For example, a researcher may decide to recruit only students with a GPA above 3.5, maybe because the researcher is interested in studying students with high academic performance. It is possible to obtain a representative sample if the selection criteria match the population’s characteristics.

Convenience Sampling

Convenience sampling is a non-random sampling technique that involves selecting a sample that is available or easy to reach. For example, if you give copies of your survey to your classmates to complete because you can easily contact them, you are using convenience sampling. Convenience sampling rarely produces a representative sample. Statistical inference assumes that the sample used for analysis is randomly selected, so samples obtained through convenience sampling can jeopardize statistical inference.

Snowball Sampling

Snowball sampling is a non-random sampling method where existing participants in a study recruit other participants through referral. This sampling technique is beneficial when the researcher does not know or cannot easily reach the individuals in the population to be studied. For example, if a study is designed to collect data from smokers, it may be difficult for the researcher to identify smokers or distinguish smokers from non-smokers. If the researcher finds a few participants who smoke, those participants likely know other smokers and can easily refer them to the study. The sample then grows through these referrals, much like a rolling snowball.

Sampling Error

Suppose several samples of equal size are drawn from a population, with each sample returned to the population before the next is drawn. A sample statistic, such as the sample mean, will vary from one sample to the next. The statistic from each sample will also likely differ from the population parameter being estimated. Sampling error is the difference between a sample statistic used to estimate a population parameter and the actual population parameter.
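This variation can be illustrated with a small simulation. The Python sketch below assumes a made-up population of 5,000 values, draws five samples of 100 units each, and reports the sampling error of each sample mean. The population, the seed, and the sample sizes are all assumptions for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 5,000 measurements (made-up data).
population = [rng.gauss(50, 10) for _ in range(5000)]
mu = statistics.mean(population)  # population mean (the parameter)

sample_means = []
sampling_errors = []
for _ in range(5):
    # Each sample is drawn from the full population, so earlier
    # samples do not affect later ones.
    sample = rng.sample(population, 100)
    x_bar = statistics.mean(sample)      # sample mean (the statistic)
    sample_means.append(x_bar)
    sampling_errors.append(x_bar - mu)   # sampling error for this sample

for x_bar, err in zip(sample_means, sampling_errors):
    print(f"sample mean = {x_bar:.2f}, sampling error = {err:+.2f}")
```

Each run of the loop gives a different sample mean, and none is expected to equal the population mean exactly.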