Lesson 1: Introduction to Statistics

What is Statistics?

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data to uncover patterns, draw meaningful conclusions, and make informed decisions.

The root of the word “statistics” may be found in the Italian word “stato,” which means “State.” In early times, statistics involved collecting and using data to manage the affairs of the State. Tracing the origin of statistics is like tracing the source of a river.

Statistics has come a long way and can be traced back to the days of Aristotle. In the early days, statistics was used to describe and compare States with respect to “public administration, justice, science, art, religious life, manners, customs, etc.” (Harald, 1932, p. 4). Today, statistics is not limited to describing the affairs of states; its applications span almost all fields of life and all industries. The concepts and applications of statistics have evolved with time.

What are the Goals of Statistics?

Statistics allows us to:

  • extract meaningful information from data for decision making,
  • gain a better understanding of a situation using data,
  • describe a situation using data,
  • understand relationships between variables,
  • make generalizations or draw conclusions about a population using sample data, and
  • make causal inferences using sample data.

How is Statistics Useful?

Decision Making

We make decisions daily, and sometimes we use data for decision-making without realizing it. For example, when people want to buy an item on Amazon.com, they will probably check the customer ratings to make sure the item is highly rated. This is the use of ratings data to support a purchase decision. Similarly, organizations use data to support decision-making. Sales data can show which items are in high demand and so inform production and inventory decisions.

The success of many businesses today is based on data-driven decisions. Gone are the days when decisions had to be based only on gut feelings. That does not mean intuition, expert judgment, and experiential knowledge are unimportant in decision-making. However, expert knowledge alone is not sufficient for making optimal decisions. In addition to qualitative information, meaningful information can be extracted from data using appropriate statistical techniques to support decision-making.

Increasing Efficiency

Businesses and other organizations use statistical analysis results to increase efficiency and ultimately gain a competitive advantage. Statistics allows businesses to identify trends in customer behavior and changes in market demand, and to use such information for pricing optimization and promotions. Huge amounts of data and the knowledge of statistics, combined with machine learning, are used to build AI applications. AI is revolutionizing how people accomplish their day-to-day tasks. A problem that could take hours or even days to solve can now be solved in seconds using ChatGPT.

Career Opportunities

The amount of data generated or collected by organizations has increased tremendously. Given that organizations can leverage meaningful information extracted from data to increase profitability, drive revenue, reduce cost, and improve efficiency, the demand for those with statistical and data analysis skills has skyrocketed. Almost all organizations need someone with data analysis skills to crunch and extract meaning from their data.

John Tukey said that the best thing about being a statistician is that you play in everyone’s backyard. People who understand how to find solutions using statistical methods are in demand in all fields of life. You don’t need to be a statistician to be interested or skilled in statistics. Statistics is open to everyone!

Making the World a Better Place

When a statistical model built with data is used to predict whether someone has a high risk for diabetes or not, the predictions could be used to recommend appropriate interventions to those who are at a high risk for diabetes. Insight from data is used in several ways to improve human life. The study of statistics provides us with the tools to systematically learn from data through scientific investigations, to make the world a better place.

Statistics helps us to understand relationships among variables in our world, which is the basis for making informed decisions. Suppose the results of a statistical investigation indicated that smoking causes cancer. Knowing this relationship between smoking and cancer could help individuals make informed decisions about smoking.

Professional Communication

The study of statistics provides us with the professional language and vocabulary needed to understand and communicate statistical analysis results. The same word could mean different things in different fields. Without a good understanding of statistical vocabulary, it is easy to misunderstand statistical information. For example, what comes to mind when you hear “experiment”? Do you think of a person in a white lab coat dissecting frogs in the lab? The word “experiment” in the context of statistics and probability differs from an experiment in a biology lab.

Technical Skill

The study of statistics is usually associated with the use of software tools for data analysis. Learning statistics could be a good opportunity to improve your technical skills by learning how to use statistical software for data analysis. Acquiring the technical skills required to run statistical analysis could open doors to new career opportunities. Learning to use tools such as Python for data analysis could be a game changer!

Understanding Statistical Information

Understanding statistics helps us to be intelligent and critical consumers of information. Sometimes, the information we consume from the media can be misleading. For example, suppose you were interested in working for a particular company and you wanted to know how the employer compensates its employees. While browsing through the company’s website, you read that the average salary of employees at that company is $84,000.

You might be tempted to think that the compensation for most of the employees is around $84,000. Wait a minute! This assumption would be true only if certain conditions are met. We need to ask whether the salary data used to compute this average is skewed. We should also be interested in understanding how much individual salaries vary, on average, from the mean salary of $84,000.

There are countless scenarios that can produce an average salary of $84,000. For example, if all the employees were paid $84,000, that would give us an average salary of $84,000. If employee salaries were $20,000, $30,000, $30,000, $40,000, and $300,000, that would also give us an average salary of $84,000, since the five salaries sum to $420,000. In the second situation, there is an outlier, or extreme salary, that makes the distribution of the salary data skewed. In this situation, the salaries of most employees are far below the average salary of $84,000. So, knowing the value of the average salary alone does not give us a good understanding of the distribution of salaries.

Though we don’t need the detailed salary data of all employees to be displayed on the company’s website, more information or other statistical measures are needed to get a better picture of the distribution of the salary data for this company. The median salary might be a better statistical measure to help us understand the center of the salary distribution. Information about the salary range will also be useful. Understanding statistics helps us to think critically, ask good questions, and draw better conclusions for ourselves when we consume statistical information.
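The two salary scenarios above can be checked with a short Python sketch. It uses only the standard-library statistics module. The salary figures come from the example; the helper name summarize is introduced here purely for illustration.

```python
import statistics

# Scenario 1: every employee earns the same salary.
equal_salaries = [84_000] * 5

# Scenario 2: the five salaries from the example, including one extreme value.
skewed_salaries = [20_000, 30_000, 30_000, 40_000, 300_000]


def summarize(salaries):
    """Return the mean, median, range, and sample standard deviation."""
    return {
        "mean": statistics.mean(salaries),
        "median": statistics.median(salaries),
        "range": max(salaries) - min(salaries),
        "stdev": statistics.stdev(salaries),
    }


equal_summary = summarize(equal_salaries)
skewed_summary = summarize(skewed_salaries)

for label, summary in [("equal", equal_summary), ("skewed", skewed_summary)]:
    print(label, summary)
```

Both scenarios report a mean of 84,000, but the median in the second scenario is 30,000 and the range is 280,000, which shows how far a single extreme salary can pull the mean away from what most employees earn.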

What are the Two Main Types of Statistics?

Statistics can be broadly divided into descriptive statistics and inferential statistics.

Descriptive statistics involves summarizing or describing data using measures such as the mean, median, mode, variance, standard deviation, range, and skewness. It also includes using visualizations and graphs, such as box plots, histograms, bar charts, and pie charts, to represent data.

Inferential statistics is the use of sample data to:

  • estimate population parameters such as means and proportions,
  • test hypotheses about the population,
  • generalize statistical results to the population, and
  • make causal inferences.
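As a rough illustration of the first point, the sketch below estimates a population mean from a sample. Everything in it is an assumption made for demonstration: the population is simulated, the sample size of 100 is arbitrary, and the interval uses the common normal approximation with the critical value 1.96 rather than any method prescribed by this lesson.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated values (an assumption for illustration).
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)  # usually unknown in practice

# Draw a simple random sample of 100 units without replacement.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Approximate 95% confidence interval using the normal critical value 1.96.
standard_error = sample_sd / math.sqrt(len(sample))
margin = 1.96 * standard_error
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"sample mean: {sample_mean:.2f}")
print(f"approx. 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"population mean: {population_mean:.2f}")
```

In a real study the population mean would not be available for comparison; it is printed here only to show that an estimate from 100 observations can land close to the value for all 10,000.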

What are Some Applications of Statistics?

A few applications of statistics are as follows:

  • Manufacturers use statistics for quality control.
  • In marketing, statistics is used to understand which advertising media, such as television, radio, and newspapers, have the greatest impact on profit.
  • In the medical field, statistics is used to understand the effect of a new medication on health outcomes.
  • In education, statistics is used to identify the teaching methods that significantly improve academic outcomes.
  • Auditors in large firms may even use samples of invoices to estimate the proportion of incorrectly paid invoices.
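The auditing application can be sketched in a few lines of Python. The numbers are made up for illustration (12 incorrectly paid invoices in a sample of 200), and the interval again relies on the normal approximation, which is only one of several ways to build an interval for a proportion.

```python
import math

# Hypothetical audit sample: 1 = incorrectly paid invoice, 0 = correctly paid.
# 12 incorrect out of 200 sampled invoices (made-up numbers).
sample = [1] * 12 + [0] * 188

n = len(sample)
p_hat = sum(sample) / n  # sample proportion of incorrectly paid invoices

# Approximate 95% confidence interval for the population proportion.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = 1.96 * standard_error
ci_low = max(0.0, p_hat - margin)
ci_high = min(1.0, p_hat + margin)

print(f"estimated proportion: {p_hat:.3f}")
print(f"approx. 95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

The auditor would report the sample proportion together with the interval, since the interval conveys how much uncertainty comes from checking only a sample of the invoices.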

References

Harald, W. (1932). Contributions to the History of Statistics. Retrieved from https://archive.org/details/in.ernet.dli.2015.233427/page/n13/mode/2up?q=stato