Types of Statistics

Robert Wahlstedt

Grand Canyon University

DSC-510-O500

Professor Thakkar

December 1, 2021

1.1 The purpose of descriptive statistics is to provide basic information about gathered data and to show relationships within it (Nisbet, Elder, & Miner, 2019). Descriptive statistics organize raw data into useful information. 1.2 Common properties include box plots and clustering. 1.3 For example, descriptive statistics are used to summarize stock market movements following recent events. 1.4 A common test is the chi-square test.
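As a rough illustration of summarizing raw data, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The price list and the `describe` helper are hypothetical and exist only for this example.

```python
import statistics

# Hypothetical daily closing prices for one stock (illustrative values only).
closing_prices = [101.2, 99.8, 102.5, 103.1, 98.7, 104.0, 102.2]

def describe(values):
    """Return a small dictionary of descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three cut points: Q1, the median, and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = describe(closing_prices)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The quartiles and the minimum and maximum are the same five numbers a box plot draws.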

2.1 The purpose of inferential statistics is to infer facts about a population at large from a sample of participants (Lind, Marchal, & Mason, 2018). For example, inferential statistics could be used to determine whether people's age is related to contracting COVID-19, with the sampled ages displayed as a histogram. Since the sample is unlikely to include the entire group, any conclusion about the population is an inference. 2.2 Common properties are grouping and predicting. 2.3 The relationship could be tested using both hypothesis testing and parametric testing. 2.4 Another scenario is fraud detection in a supervised fraud detection system (Nisbet, Elder, & Miner, 2019). The purpose there is to mitigate risk by using a representative sample.

3.1 A frequency distribution is useful for estimating the likelihood of an event based on how often it has occurred in the past (Lind, Marchal, & Mason, 2018). 3.2 Its common properties relate to raw, or ungrouped, data, which are tallied into classes. 3.3 An example of a frequency distribution is the ages of billionaires in each state of the United States. The purpose of a frequency distribution is to show the spread of the data. 3.4 One way of examining a frequency distribution is with a box and whisker plot. Outliers appear as points beyond the whiskers, which by convention extend up to 1.5 times the interquartile range past the quartiles rather than a number of standard deviations. Another way of spotting outliers is with a stem and leaf plot.
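A minimal sketch of both ideas, tallying raw values into classes and flagging outliers with the usual box plot rule, might look like the following. The ages are invented, and the function names and the class width of 10 are assumptions made for this example.

```python
from collections import Counter
import statistics

# Hypothetical ages (illustrative values, not real data).
ages = [34, 41, 45, 47, 52, 55, 58, 58, 61, 63, 67, 70, 72, 95]

def frequency_table(values, width=10):
    """Count how many values fall into each class of the given width."""
    counts = Counter((v // width) * width for v in values)
    return {f"{low}-{low + width - 1}": counts[low] for low in sorted(counts)}

def iqr_outliers(values):
    """Return values more than 1.5 * IQR beyond the quartiles (box plot convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

print(frequency_table(ages))
print(iqr_outliers(ages))
```

Different quartile conventions can move the fences slightly, so a borderline point may be an outlier under one method and not under another.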

4.1 Bayesian statistics is useful for predicting future events based on previous events. 4.2 A hypothesis becomes more or less credible as test results arrive. Common properties are decision alternatives and acts that confirm or deny a hypothesis (Kurt, 2019). 4.3 An example of Bayesian statistics is predicting an outcome such as a student’s academic success from their SAT scores. Every quarter the student receives a grade, which raises or lowers the probability that the student will be successful in the future. The belief held before each new grade arrives is known as the prior (Kurt, 2019). 4.4 One way to test a hypothesis with Bayesian statistics is through an A/B test (Kurt, 2019).
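The quarterly updating described above can be sketched with Bayes' theorem. Every probability below is a made-up number chosen for illustration, and `bayes_update` is a hypothetical helper rather than anything from the cited text.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' theorem."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence

# All probabilities below are hypothetical, chosen only for illustration.
belief = 0.5                    # prior belief that the student will succeed
p_good_grade_if_success = 0.8   # chance of a good grade if the hypothesis is true
p_good_grade_if_not = 0.3       # chance of a good grade if the hypothesis is false

for quarter in range(1, 4):     # the student earns a good grade three quarters in a row
    belief = bayes_update(belief, p_good_grade_if_success, p_good_grade_if_not)
    print(f"after quarter {quarter}: {belief:.3f}")
```

Each posterior becomes the prior for the next quarter, which is why the belief keeps rising as good grades accumulate.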

Part 2

Simple random sampling draws a sample from a population such that each member has an equal chance of being selected (Lind, Marchal, & Mason, 2018). An example is a lottery. Another example is checking a box of apples that may be past date by picking one apple at random and discarding the entire box if that apple fails to meet the standard. Sampling is used when surveying the entire population would be costly or time consuming.
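A minimal sketch of simple random sampling, assuming a hypothetical population of 100 numbered participants, could use `random.sample` from the standard library, which draws without replacement:

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k members without replacement; every member is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for demonstration
    return rng.sample(population, k)

# Hypothetical population of 100 numbered survey participants.
participants = list(range(1, 101))
chosen = simple_random_sample(participants, k=10, seed=42)
print(chosen)
```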

In systematic random sampling, all the elements are listed, a random starting point is chosen, and every nth element after it is selected. It is particularly useful if the list is not determined ahead of time.
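One way to sketch systematic sampling is with list slicing. The random starting point and the name `systematic_sample` are assumptions of this example:

```python
import random

def systematic_sample(items, step, seed=None):
    """Pick a random start within the first `step` items, then take every step-th item."""
    if step < 1:
        raise ValueError("step must be at least 1")
    start = random.Random(seed).randrange(step)
    return items[start::step]

# Hypothetical list of 100 numbered elements; take every 10th one.
elements = list(range(100))
every_tenth = systematic_sample(elements, step=10, seed=1)
print(every_tenth)
```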

In stratified random sampling, the population is divided into groups, and representatives are randomly selected from each group. These groups are called strata. An example of stratified random sampling is grouping people by their last names for a poll and sampling from each group.
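The last-name example might be sketched as follows, assuming the strata are the first letters of each name. The voter list and the `stratified_sample` helper are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_stratum, seed=None):
    """Group records into strata by `key`, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical poll list; strata are the first letters of the last name.
voters = ["Adams", "Allen", "Avery", "Baker", "Brown", "Bell", "Clark", "Cole", "Cruz"]
picked = stratified_sample(voters, key=lambda name: name[0], per_stratum=2, seed=7)
print(picked)
```

Every stratum contributes to the sample, which is the defining feature of this method.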

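In cluster sampling, the groups are predetermined in advance, and representatives come out of these groups. This is like stratified sampling, except that cluster sampling relies on naturally occurring groups and typically samples only some of them, whereas stratified sampling draws from every stratum. An example of cluster sampling is the United States Senate, in which the states are the predetermined groups.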
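One common form of cluster sampling selects whole groups at random and surveys everyone inside them. The sketch below assumes that form; the city blocks and the `cluster_sample` helper are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample the cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: households grouped by city block.
blocks = {
    "block_a": ["h1", "h2", "h3"],
    "block_b": ["h4", "h5"],
    "block_c": ["h6", "h7", "h8"],
    "block_d": ["h9"],
}
surveyed = cluster_sample(blocks, n_clusters=2, seed=3)
print(surveyed)
```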

Categorical data analysis works with data that can take on two or more discrete values, which may or may not be ordered, such as eye color or race. Categorical values often have no numerical significance. A property of categorical data is the mode (Griffin, 2009). A chi-squared test can check whether two categorical variables are related (Triola, 1998). Categorical data can be represented in bar and pie charts.
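As a sketch of how the chi-squared test of independence works, the function below computes the Pearson statistic from a table of observed counts. The counts are hypothetical, and the function name is an assumption of this example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows are two eye colors, columns are two survey groups.
observed_counts = [
    [20, 30],
    [30, 20],
]
stat, dof = chi_square_statistic(observed_counts)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

With one degree of freedom, the critical value at the 0.05 level is about 3.84, so a statistic above that would lead to rejecting independence.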

A two-means analysis compares the means of two groups to see how the groups differ from each other. For example, a group of students might take the SAT at the end of 10th grade and again at the end of 12th grade to see which classes made the biggest difference. The properties of each group include its mean and sample size, as well as the grand mean. Two means can be represented graphically by two overlaid box and whisker diagrams. Other properties are the difference between the two means when the data are normally distributed, the null hypothesis, the selected level of significance, and the test statistic. The way to test two means is with a two-sample t-test (a paired t-test when the same subjects are measured twice, as in the SAT example).
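A difference between two group means is commonly checked with a t statistic. The sketch below computes Welch's version, which assumes two independent groups and does not assume equal variances; the scores and the `welch_t` name are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two independent sample means."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Standard error of the difference between the two means.
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical test scores for two separate groups of students.
grade_10 = [1010, 1050, 980, 1100, 1040]
grade_12 = [1120, 1160, 1090, 1200, 1150]
print(f"t = {welch_t(grade_12, grade_10):.2f}")
```

If the same students were measured twice, a paired test on the per-student differences would be the more appropriate choice.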

Several-means analysis, or ANOVA, is the study of which factors might influence a mean. For example, is it true that protein can make a person gain weight? To identify the determining factor, we compare the mean weights of groups that receive different treatments. We start with a null hypothesis and an alternative hypothesis. Next, the total variation is broken into two components, one due to the treatment and the other random. The variance between samples is an estimate of the variance due to treatment, such as eating protein (Triola, 1998). Then we compute the variance within samples, which could be due to error, possibly caused by outliers (Triola, 1998). Then we determine the degrees of freedom. We test ANOVA with the F-test, which compares the variance between samples to the variance within samples. ANOVA results can be represented with boxplots.
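The split into between-sample and within-sample variance can be sketched for a one-way ANOVA as follows. The diet groups and weight changes are invented, and `one_way_anova_f` is a hypothetical helper.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean (treatment effect).
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean (random error).
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical weight changes (kg) under three diets.
low_protein = [1, 2, 3]
medium_protein = [2, 3, 4]
high_protein = [5, 6, 7]
f_stat, df_b, df_w = one_way_anova_f([low_protein, medium_protein, high_protein])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ by more than the within-group scatter would explain.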

Linear regression investigates how much one quantity affects another quantity (Downing & Clark, 2009). It determines the relationship between two or more variables and quantifies that relationship. Fitting a linear regression means setting the parameters, which are coefficients for the candidate predictor variables (Nisbet et al., 2009) and tell us how much we can expect the response to change when the explanatory variable is observed to change (Spiegelhalter, 2019). A linear regression is evaluated by taking the sum of the squares of all the errors (Downing & Clark, 2009). Its properties consist of the fitted straight line that enables us to make a prediction, the number of data points, and the fact that the deterministic line is an imperfect representation because error skews it away from the actual world (Spiegelhalter, 2019). We choose the model parameters that best fit the training set. One approach, the normal equation, computes these parameters directly in closed form (Geron, 2019). There is also an approach called gradient descent that gradually adjusts the model parameters to minimize the cost function over the training set (Geron, 2019). An example is observing that tall fathers tend to have tall sons (Spiegelhalter, 2019). Linear regression is represented graphically by a straight line. Because it is difficult to separate the good data points from the outliers, outliers must be retained.
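For a single predictor, the least-squares line has a simple closed form, sketched below together with the sum of squared errors. The heights are invented for illustration, and the function names are assumptions of this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical heights in inches (illustrative values only).
father_heights = [65, 67, 69, 71, 73]
son_heights = [67, 68, 69.5, 70, 72]

intercept, slope = fit_line(father_heights, son_heights)
sse = sum_squared_errors(father_heights, son_heights, intercept, slope)
print(f"son = {intercept:.2f} + {slope:.2f} * father, SSE = {sse:.3f}")
```

A slope below 1 in data like this means sons of tall fathers are tall, but on average less extreme than their fathers.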

A full factorial ANOVA is helpful for two types of effects: the main effects and the interaction effects, which are secondary. To determine the main effects, the means are compared. How can we determine which factors are affecting the result? In multi-factor ANOVA there are multiple factors of influence, and each is important. The samples must be independent of each other. The test result is compared against a null hypothesis to see if there is an effect (The Visual Learner Statistics).

References

Geron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media.

Kurt, W. (2019). Bayesian Statistics The Fun Way. No Starch Press.

Lind, D. A., Marchal, W. G., & Mason, R. D. (2018). Statistical Techniques in Business and Economics. McGraw-Hill Irwin.

Nisbet, R., Elder, J., & Miner, G. (2019). Handbook of Statistical Analysis and Data Mining Applications. Elsevier.

Spiegelhalter, D. (2019). The Art of Statistics: How to Learn from Data. Basic Books.

The Visual Learner Statistics. (n.d.). Retrieved from https://lc.gcumedia.com/hlt362v/the-visual-learner/the-visual-learner-v2.1.html

Triola, M. F. (1998). Elementary Statistics. Addison Wesley.
