# STATISTICS

### MEASURES OF CENTRAL TENDENCY

An average, or central value, of a statistical series is the value of the variable that describes the characteristic of the entire distribution. In other words, the average of a distribution is the value of the variable that is representative of the entire distribution. The following measures of central tendency are commonly used:

- Mathematical averages
  - Arithmetic mean or mean
  - Geometric mean
  - Harmonic mean
- Averages of position
  - Median
  - Mode

### ARITHMETIC MEAN

ARITHMETIC MEAN FOR UNCLASSIFIED DATA

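If the $n$ numbers are $x_1, x_2, x_3, \dots, x_n$, then their arithmetic mean is

$$A = \frac{x_1 + x_2 + \dots + x_n}{n} \quad \text{or} \quad A = \frac{1}{n}\sum_{i=1}^{n} x_i$$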

ARITHMETIC MEAN FOR FREQUENCY DISTRIBUTION

Let $f_1, f_2, \dots, f_n$ be the frequencies corresponding to $x_1, x_2, \dots, x_n$. Then

$$A = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

ARITHMETIC MEAN FOR CLASSIFIED DATA

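Class mark of the class-interval $a$ – $b$: $x = \frac{a + b}{2}$

For classified data, we take the class-marks $x_1, x_2, \dots, x_n$ of the classes as the values of the variable and

$$A = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$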
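The three cases above (raw data, frequency distribution, classified data) can be sketched in Python as follows. The function names and the sample intervals are illustrative assumptions, not part of the notes.

```python
def arithmetic_mean(values):
    """A = (x1 + x2 + ... + xn) / n for raw (unclassified) data."""
    return sum(values) / len(values)


def frequency_mean(values, freqs):
    """A = sum(f_i * x_i) / sum(f_i) for a frequency distribution."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def class_marks(intervals):
    """The class mark of the interval a - b is (a + b) / 2."""
    return [(a + b) / 2 for a, b in intervals]


# Hypothetical classified data: class intervals and their frequencies.
intervals = [(0, 10), (10, 20), (20, 30)]
freqs = [2, 5, 3]
marks = class_marks(intervals)            # [5.0, 15.0, 25.0]

print(arithmetic_mean([2, 4, 6, 8]))      # 5.0
print(frequency_mean(marks, freqs))       # 16.0
```

For classified data the class marks simply stand in for the values of the variable, so the frequency formula is reused unchanged.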

Deviation of a Term

If the variable is $x$ and the A.M. is $A$, then the deviation of $x$ from $A$ is

$d$ (or $d_x$) $= x - A$

#### USE OF SHORT-CUT METHOD IN COMPUTING ARITHMETIC MEAN

We arbitrarily take a number as the mean, called the assumed mean. It should lie approximately midway between the greatest and the least values of the variable.

SHORT-CUT METHOD FOR SIMPLE DISTRIBUTION

$$A = a + \frac{\sum d}{n}$$

where $a$ = assumed mean, $d = x - a$, $n$ = number of terms

SHORT-CUT METHOD FOR UNCLASSIFIED FREQUENCY DISTRIBUTION

$$A = a + \frac{\sum f d}{\sum f}$$

where $a$ = assumed mean, $d = x - a$, $f$ = frequency of the variable $x$

SHORT-CUT METHOD IN CLASSIFIED FREQUENCY DISTRIBUTION

$$A = a + \frac{\sum f d}{\sum f}$$

where $a$ = assumed mean, $d = x - a$, $x$ = class-mark of the class-interval, $f$ = frequency of the class-interval
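A minimal sketch of the short-cut method, assuming the formula $A = a + \sum fd / \sum f$ with $d = x - a$. The function name and the data are hypothetical. For a simple distribution every frequency is 1.

```python
def shortcut_mean(values, freqs, a):
    """A = a + sum(f * d) / sum(f), where d = x - a and a is the assumed mean."""
    total_fd = sum(f * (x - a) for x, f in zip(values, freqs))
    return a + total_fd / sum(freqs)


# Hypothetical class marks and frequencies.
values = [5, 15, 25, 35]
freqs = [2, 5, 3, 10]

direct = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
print(direct)                               # 25.5
print(shortcut_mean(values, freqs, a=20))   # 25.5
```

The result does not depend on the choice of `a`; a value near the middle of the data only keeps the deviations small.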

STEP DEVIATION METHOD IN A CLASSIFIED FREQUENCY DISTRIBUTION

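This is not an independent method for finding the A.M.; it is an auxiliary to the short-cut method. In this method the class-intervals should be of equal length.

$$A = a + h\,\frac{\sum f_i u_i}{\sum f_i}, \qquad u_i = \frac{x_i - a}{h}$$

Sometimes the values of $d_i = x_i - a$ are divisible by a common number $h$ (say). In such a case the calculation of the arithmetic mean is reduced considerably by taking $u_i = \frac{d_i}{h}$, $i = 1, 2, \dots, n$. Then

$h u_i = x_i - a$, $i = 1, 2, \dots, n$

$h f_i u_i = f_i x_i - a f_i$, $i = 1, 2, \dots, n$

Summing over $i$ gives $h \sum f_i u_i = \sum f_i x_i - a \sum f_i$, and dividing by $\sum f_i$ yields the formula for $A$.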
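The step deviation method can be sketched as follows, assuming $u_i = (x_i - a)/h$ and $A = a + h \sum f_i u_i / \sum f_i$. The function name and the data are illustrative.

```python
def step_deviation_mean(marks, freqs, a, h):
    """A = a + h * sum(f * u) / sum(f), with u = (x - a) / h."""
    total_fu = sum(f * (x - a) / h for x, f in zip(marks, freqs))
    return a + h * total_fu / sum(freqs)


# Hypothetical class marks (class length h = 10) and frequencies.
marks = [5, 15, 25, 35]
freqs = [2, 5, 3, 10]

# u = -2, -1, 0, 1, so sum(f * u) = -4 - 5 + 0 + 10 = 1
print(step_deviation_mean(marks, freqs, a=25, h=10))   # 25.5
```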

#### COMBINED MEAN

If $X_1, X_2, \dots, X_r$ are $r$ groups of observations, the arithmetic mean of the combined group is called the combined mean of the observations:

$$A = \frac{n_1 A_1 + n_2 A_2 + \dots + n_r A_r}{n_1 + n_2 + \dots + n_r}$$

where $A$ = combined mean, $A_r$ = A.M. of the collection $X_r$, and

$n_r$ = total frequency of the collection $X_r$
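As a small illustration, the combined mean is the size-weighted average of the group means. The function name and the two groups below are hypothetical.

```python
def combined_mean(means, sizes):
    """A = (n1*A1 + n2*A2 + ... + nr*Ar) / (n1 + n2 + ... + nr)."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)


# Two hypothetical groups: 30 observations with mean 50, 20 with mean 60.
print(combined_mean([50, 60], [30, 20]))   # 54.0
```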

#### ALGEBRAIC PROPERTIES OF A.M.

- The algebraic sum of the deviations of all the values of the variable from their mean is zero:
  - for a simple distribution, $\sum (x_i - A) = 0$
  - for a frequency distribution, $\sum f_i (x_i - A) = 0$
- The mean of the sum of two variables is equal to the sum of their means. (This also holds for more than two variables.)
- Let $u$ and $\nu$ be two variables and $w = bu + c\nu$ a new variable, where $b$ and $c$ are constants. If the A.M.s of the $u$'s and $\nu$'s are $B$ and $C$ respectively and the A.M. of the $w$'s is $A$, then $A = bB + cC$.
- The sum of the squares of the deviations of the variable is minimum when taken about the A.M.
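These properties can be checked numerically. The sketch below uses small hypothetical data sets; it illustrates the properties and is not a proof.

```python
def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values, c):
    """Sum of squared deviations of the values from the point c."""
    return sum((x - c) ** 2 for x in values)


values = [3, 7, 8, 10, 12]
A = mean(values)                               # 8.0

# Deviations from the mean sum to zero.
deviation_sum = sum(x - A for x in values)
print(deviation_sum)                           # 0.0

# The sum of squared deviations is smallest about A.
print(sum_sq_dev(values, A))                   # 46.0
print(sum_sq_dev(values, A - 1), sum_sq_dev(values, A + 1))   # 51.0 51.0

# Mean of w = b*u + c*v equals b*mean(u) + c*mean(v).
u = [1, 2, 3]
v = [10, 20, 60]
b, c = 2, 3
w = [b * ui + c * vi for ui, vi in zip(u, v)]
print(mean(w), b * mean(u) + c * mean(v))      # 94.0 94.0
```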

#### WEIGHTED ARITHMETIC MEAN

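If $w$ is the weight of the variable $x$, then the weighted A.M. is

$$A_w = \frac{\sum w x}{\sum w}$$

Short-cut method:

$$A_w = a_w + \frac{\sum w d}{\sum w}$$

where $a_w$ = assumed mean, $d = x - a_w$, and

$\sum wd$ = sum of the products of the deviations and the corresponding weights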
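A minimal sketch of the weighted mean and its short-cut form, assuming $A_w = \sum wx / \sum w$ and $A_w = a_w + \sum wd / \sum w$ with $d = x - a_w$. The function names, marks and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """A_w = sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def weighted_mean_shortcut(values, weights, a):
    """A_w = a + sum(w * d) / sum(w), where d = x - a and a is the assumed mean."""
    total_wd = sum(w * (x - a) for x, w in zip(values, weights))
    return a + total_wd / sum(weights)


# Hypothetical marks with weights 3, 2 and 1.
marks = [80, 70, 90]
weights = [3, 2, 1]
print(weighted_mean(marks, weights))                 # 470 / 6, about 78.33
print(weighted_mean_shortcut(marks, weights, a=80))  # same value, up to rounding
```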

### GEOMETRIC MEAN

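If $x_1, x_2, \dots, x_n$ are $n$ values of the variable, then the geometric mean is

$$G = (x_1 x_2 \cdots x_n)^{1/n}$$

For a frequency distribution,

$$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \quad \text{where } N = \sum_{i=1}^{n} f_i$$

or, equivalently,

$$\log G = \frac{1}{N} \sum_{i=1}^{n} f_i \log x_i$$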
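The geometric mean can be sketched through its logarithmic form, $\log G = \frac{1}{N}\sum f_i \log x_i$, which avoids very large products. The function name is an assumption, and the values are assumed to be positive.

```python
import math


def geometric_mean(values, freqs=None):
    """G = exp(sum(f * log(x)) / N); all values are assumed positive."""
    if freqs is None:
        freqs = [1] * len(values)        # simple distribution: every f = 1
    n = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n)


print(geometric_mean([2, 8]))                  # about 4, since sqrt(2 * 8) = 4
print(geometric_mean([1, 4, 16], [1, 2, 1]))   # about 4
```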

### HARMONIC MEAN

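If $x_1, x_2, \dots, x_n$ are $n$ non-zero values of the variable, then the harmonic mean is

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

For a frequency distribution,

$$H = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}, \quad \text{where } N = \sum_{i=1}^{n} f_i$$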
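A short sketch of the harmonic mean, assuming the standard definition $H = N / \sum (f_i / x_i)$ with every $f_i = 1$ for a simple distribution. The function name and the speeds are hypothetical.

```python
def harmonic_mean(values, freqs=None):
    """H = N / sum(f / x); all values are assumed non-zero."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Hypothetical example: equal distances covered at 40 and 60 km/h.
print(harmonic_mean([40, 60]))   # about 48
```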

### MEDIAN

The median of a distribution is the value of the middle term when the values of the variable are arranged in ascending or descending order.

The median ($M$) is an average of position.

#### DETERMINATION OF MEDIAN

DETERMINATION OF MEDIAN FOR SIMPLE DISTRIBUTION

Arrange the terms in ascending or descending order and then find the number of terms n.

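- If $n$ is odd, then the $\left(\frac{n+1}{2}\right)$th term is the median.
- If $n$ is even, then there are two middle terms, namely the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2} + 1\right)$th terms. Hence the median is the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th terms.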
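The rule for a simple distribution can be sketched as follows: sort the terms, then take the middle term for odd $n$ or the mean of the two middle terms for even $n$. The function name is illustrative.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # 0-based index mid is the ((n + 1) / 2)th term
        return ordered[mid]
    # 0-based indices mid - 1 and mid are the (n / 2)th and (n / 2 + 1)th terms
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9, 1, 5]))   # 5
print(median([7, 3, 9, 1]))      # 5.0
```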

DETERMINATION OF MEDIAN FOR UNCLASSIFIED FREQUENCY DISTRIBUTION

- First find $\frac{N}{2}$, where $N = \sum f_i$.
- Find the cumulative frequency of each value of the variable and take the value of the variable whose cumulative frequency is equal to or just greater than $\frac{N}{2}$.
- This value of the variable is the median.
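The steps above can be sketched as a cumulative-frequency scan. This assumes the threshold is $N/2$ with $N = \sum f_i$; the function name and data are hypothetical.

```python
def frequency_median(values, freqs):
    """First value whose cumulative frequency is equal to or just greater than N / 2."""
    half = sum(freqs) / 2
    cumulative = 0
    for x, f in sorted(zip(values, freqs)):   # ascending order of the variable
        cumulative += f
        if cumulative >= half:
            return x


values = [1, 2, 3, 4, 5]
freqs = [3, 4, 7, 5, 1]      # N = 20, N / 2 = 10
# Cumulative frequencies: 3, 7, 14, 19, 20; the first one >= 10 belongs to x = 3.
print(frequency_median(values, freqs))   # 3
```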

DETERMINATION OF THE MEDIAN FOR CLASSIFIED DATA

- Class limit and boundary

Overlapping intervals, e.g. 10 – 20, 20 – 30, 30 – 40, ...

For the interval 10 – 20, upper limit = 20 and lower limit = 10.

Non-overlapping intervals, e.g. 10 – 19, 20 – 29, ...

Upper boundary of 10 – 19 $= \frac{19 + 20}{2} = 19.5$ = lower boundary of 20 – 29.

- Median Class

If the total frequency of a continuous distribution is $N$, then the class whose cumulative frequency is equal to or just greater than $\frac{N}{2}$ is called the median class.

For a continuous distribution, the median is

$$M = l + \frac{\frac{N}{2} - F}{f} \times C$$

where

$l$ = lower limit of the median class,

$f$ = frequency of the median class,

$N$ = total frequency $= \sum f_i$,

$F$ = cumulative frequency of the class just before the median class,

$C$ = length of the median class.
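A minimal sketch of the median for classified data, assuming the interpolation formula $M = l + \frac{N/2 - F}{f} \times C$ and classes given by their boundaries in ascending order. The function name and data are hypothetical.

```python
def grouped_median(boundaries, freqs):
    """boundaries: list of (lower, upper) class boundaries in ascending order."""
    N = sum(freqs)
    F = 0                                    # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if F + f >= N / 2:                   # this is the median class
            return lower + (N / 2 - F) / f * (upper - lower)
        F += f


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 5]                        # N = 30, N / 2 = 15
# Median class is 20 - 30 (cumulative frequency 25), with F = 13, f = 12, C = 10.
print(grouped_median(classes, freqs))        # 20 + (2 / 12) * 10, about 21.67
```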

DETERMINATION OF MEDIAN BY GRAPH

We draw the “less than” ogive and “more than” ogive for the distribution. The abscissa of the point of intersection of these ogives is the median

### QUARTILES

- The median divides the distribution in two equal parts. The distribution can similarly be divided in more equal parts (four, five, six etc.)
- The values of the three division points that divide the distribution into four equal parts are called quartiles. If the first, second and third division values are $Q_1$, $Q_2$ and $Q_3$, they are called the first quartile, the second quartile and the third quartile respectively.
- If the distribution is arranged in ascending order and $N$ is the total frequency, then the first quartile lies at about the $\left(\frac{N}{4}\right)$th value, the second quartile at about the $\left(\frac{N}{2}\right)$th value and the third quartile at about the $\left(\frac{3N}{4}\right)$th value of the variable.

Clearly the second quartile = Median

The first quartile is also called the lower quartile and the third quartile the upper quartile.

TO DETERMINE QUARTILES FOR A CONTINUOUS DISTRIBUTION

$$Q_1 = l + \frac{\frac{N}{4} - F}{f} \times C$$

where

$N$ = total frequency,

$l$ = lower limit of the first quartile class,

$f$ = frequency of the first quartile class,

$F$ = cumulative frequency of the class just before the first quartile class,

$C$ = length of the first quartile class.

Similarly,

$$Q_3 = l + \frac{\frac{3N}{4} - F}{f} \times C$$

where the symbols have the same meanings as above, with the third quartile class in place of the first quartile class.
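The quartiles of a continuous distribution can be sketched with one helper that interpolates at a chosen fraction of the total frequency ($\frac{1}{4}$ for $Q_1$, $\frac{1}{2}$ for the median, $\frac{3}{4}$ for $Q_3$). The name `grouped_quantile` and the data are hypothetical.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """l + (fraction * N - F) / f * C for the class containing the target position."""
    N = sum(freqs)
    target = fraction * N
    F = 0                                    # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if F + f >= target:
            return lower + (target - F) / f * (upper - lower)
        F += f


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [4, 6, 6, 4]                         # N = 20

q1 = grouped_quantile(classes, freqs, 0.25)  # 10 + (5 - 4) / 6 * 10
q2 = grouped_quantile(classes, freqs, 0.50)  # 10 + (10 - 4) / 6 * 10 = 20
q3 = grouped_quantile(classes, freqs, 0.75)  # 20 + (15 - 10) / 6 * 10
print(q1, q2, q3)
```

With this symmetric data set the second quartile coincides with the median, as stated above.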

### MODE

The mode ($M_0$) of a distribution is the value at the point about which the items tend to be most heavily concentrated. It is generally the value of the variable which occurs most frequently in the distribution.

MODE FOR RAW DATA

The mode of the values 70, 80, 90, 96, 70, 96, 96, 90 of a variable is 96, as 96 occurs the maximum number of times.
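For raw data the mode can be found by counting occurrences. The sketch below returns every value that attains the highest count, since raw data may have more than one mode; the function name is illustrative.

```python
from collections import Counter


def modes(values):
    """All values that occur the maximum number of times."""
    counts = Counter(values)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]


print(modes([70, 80, 90, 96, 70, 96, 96, 90]))   # [96]
```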

MODE FOR UNCLASSIFIED FREQUENCY DISTRIBUTION

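The mode is the value of the variable that has the greatest frequency. For example, if the greatest frequency in a frequency table is 13 and it is the frequency of the value 5, then mode = 5.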

MODE FOR CLASSIFIED DISTRIBUTION

The class having the maximum frequency is called the MODAL CLASS and the middle point of the modal class is called the CRUDE MODE.

The class just before the modal class is called PRE-MODAL CLASS and the class after the modal class is called the POST-MODAL CLASS

Determination of mode for classified data (continuous distribution)

$$M_0 = l + \frac{f_0 - f_{-1}}{2f_0 - f_{-1} - f_1} \times i$$

where

$l$ = lower limit of the modal class,

$f_0$ = frequency of the modal class,

$f_{-1}$ = frequency of the pre-modal class,

$f_1$ = frequency of the post-modal class,

$i$ = length of the class-interval.

IMPORTANT POINTS

- The lengths of the intervals should be equal.
- The intervals should be overlapping. If they are not, they should be converted to overlapping intervals.
- If more than one class has the maximum frequency, then determine the modal class using the grouping method.

Sometimes $2f_0 - f_{-1} - f_1$ becomes zero. We then write

$$M_0 = l + \frac{f_0 - f_{-1}}{|f_0 - f_{-1}| + |f_0 - f_1|} \times i$$

### SYMMETRIC AND SKEW DISTRIBUTION

A distribution is symmetric if equal frequencies are found at equal distances on either side of the mode. The frequency curve is bell-shaped and $A = M = M_0$.

In an asymmetric or skew distribution, the frequencies are not distributed symmetrically.

- If the frequencies increase sharply at the beginning and decrease slowly after the modal value, the skewness is said to be POSITIVE and $A > M > M_0$.

- If the frequencies increase slowly and decrease sharply after the modal value, the skewness is said to be NEGATIVE and $A < M < M_0$.

### RELATION BETWEEN MEAN, MEDIAN AND MODE

For a symmetric distribution, $M_0 = M = A$.

For a moderately skewed distribution, $A - M_0 = 3(A - M)$, i.e. $M_0 = 3M - 2A$.

This is an empirical formula, first described by Karl Pearson.
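Rearranging the empirical relation $A - M_0 = 3(A - M)$ gives an estimate of the mode from the mean and the median. The function name and the numbers are hypothetical, and the estimate is only approximate for moderately skewed data.

```python
def empirical_mode(mean, median):
    """From A - M0 = 3 * (A - M): M0 = 3 * M - 2 * A."""
    return 3 * median - 2 * mean


print(empirical_mode(mean=26, median=25))   # 23
```

Here mean > median > estimated mode, which matches the ordering for positive skewness.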

### MEAN DEVIATION AND STANDARD DEVIATION

#### DISPERSION

The degree to which numerical data tend to spread about an average value is called the VARIATION or DISPERSION of the data. (Its measurement is called DEVIATION.)

Deviation of a term from a measure of central tendency

If the term is $x$ and the mean is $A$, then the deviation $d$ (or $d_x$) $= x - A$.

$|d| = |x - A|$ is called the absolute deviation.

MEASURE OF DISPERSION

Popular methods are (i) Mean Deviation (ii) Standard Deviation

Some other methods are (iii) Range (iv) Quartile Deviation

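#### MEAN DEVIATION (δ)

The arithmetic mean of the absolute deviations of the values of the variable from a measure of central tendency (mean, median or mode) is called the Mean Deviation (M.D.), denoted by $\delta_A$, $\delta_M$ or $\delta_{M_0}$ according to the measure used.

- for a simple (discrete) distribution

$$\delta = \frac{\sum |x_i - z|}{n}$$

where $n$ = number of terms, $z$ = $A$ or $M$ or $M_0$

- for an unclassified frequency distribution

$$\delta = \frac{\sum f_i |x_i - z|}{\sum f_i}$$

- for a classified distribution

$$\delta = \frac{\sum f_i |x_i - z|}{\sum f_i}$$

where $x$ stands for the class-mark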

The mean deviation is the least when measured from the median.

COEFFICIENT OF MEAN DEVIATION

It is the ratio of the M.D. to the average from which the deviation is measured.

Thus the coefficient of M.D. $= \frac{\delta_A}{A}$ or $\frac{\delta_M}{M}$ or $\frac{\delta_{M_0}}{M_0}$
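A short sketch of the mean deviation and its coefficient, assuming $\delta = \sum f_i |x_i - z| / \sum f_i$ with $z$ the chosen average. The function name and data are hypothetical.

```python
def mean_deviation(values, z, freqs=None):
    """Mean of |x - z| weighted by frequency; z may be the mean, median or mode."""
    if freqs is None:
        freqs = [1] * len(values)            # simple distribution: every f = 1
    return sum(f * abs(x - z) for x, f in zip(values, freqs)) / sum(freqs)


values = [1, 2, 3, 4, 10]                    # hypothetical data, n = 5
A = sum(values) / len(values)                # mean = 4.0
M = sorted(values)[len(values) // 2]         # median = 3 (n is odd)

md_mean = mean_deviation(values, A)          # (3 + 2 + 1 + 0 + 6) / 5 = 12 / 5
md_median = mean_deviation(values, M)        # (2 + 1 + 0 + 1 + 7) / 5 = 11 / 5
coefficient = md_mean / A                    # coefficient of M.D. about the mean
print(md_mean, md_median, coefficient)
```

In this example the mean deviation about the median is the smaller of the two, consistent with the statement that it is least when measured from the median.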

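#### STANDARD DEVIATION (σ)

The S.D. ($\sigma$) is the square root of the arithmetic mean of the squares of the deviations of the terms from their A.M., i.e. the standard deviation is the positive square root of the variance.

- For a simple (discrete) distribution

$$\sigma = \sqrt{\frac{\sum (x_i - A)^2}{n}}$$

- For a frequency distribution

$$\sigma = \sqrt{\frac{\sum f_i (x_i - A)^2}{\sum f_i}}$$

- For classified data:

$$\sigma = \sqrt{\frac{\sum f_i (x_i - A)^2}{\sum f_i}}$$

where $x$ = class-mark of the interval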

IMPORTANT POINTS

- The square of the S.D., i.e. $\sigma^2$, is called the VARIANCE.
- The ratio of the S.D. ($\sigma$) to the A.M. ($A$) is called the coefficient of standard deviation.
- The percentage form of the coefficient of S.D., i.e. $\frac{\sigma}{A} \times 100$, is called the Coefficient of Variation.
- The distribution with the smaller coefficient of variation is said to be more consistent.
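The definitions above can be sketched for a frequency distribution as follows. The function names and data are hypothetical; for a simple distribution, pass a frequency of 1 for every value.

```python
import math


def mean(values, freqs):
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def variance(values, freqs):
    """sigma^2 = sum(f * (x - A)^2) / sum(f)."""
    A = mean(values, freqs)
    return sum(f * (x - A) ** 2 for x, f in zip(values, freqs)) / sum(freqs)


def std_dev(values, freqs):
    return math.sqrt(variance(values, freqs))


def coefficient_of_variation(values, freqs):
    """(sigma / A) * 100."""
    return std_dev(values, freqs) / mean(values, freqs) * 100


values = [2, 4, 5, 7, 9]
freqs = [1, 3, 2, 1, 1]                      # N = 8, mean = 40 / 8 = 5
print(variance(values, freqs))               # 4.0
print(std_dev(values, freqs))                # 2.0
print(coefficient_of_variation(values, freqs))   # 40.0
```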

#### Root Mean Square Deviation (S)

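The square root of the A.M. of the squares of the deviations from an assumed mean is called the root mean square deviation. Thus

- for a simple (discrete) distribution

$$S = \sqrt{\frac{\sum (x_i - a)^2}{n}}$$

where $a$ = assumed mean

- for a frequency distribution

$$S = \sqrt{\frac{\sum f_i (x_i - a)^2}{\sum f_i}}$$

If $a = A$, then $S = \sigma$.

Short-cut method for S.D.

$$\sigma = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2}$$

where $d = x - a$ and $a$ = assumed mean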

IMPORTANT POINTS

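1. The RMS deviation is the least when measured from the A.M.

2. The sum of the squares of the deviations of the values of the variable is the least when measured from the A.M.

3. $\sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2$

4. For a discrete distribution $f = 1$, thus $\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$.

5. The above formula is a short-cut for finding $\sigma$.

Variance = (mean of the squares) − (square of the mean)

6. The mean deviation about the mean is less than or equal to the S.D., i.e. $\delta_A \le \sigma$.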
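A minimal sketch of the RMS deviation and the short-cut method for the S.D., assuming $S = \sqrt{\sum f (x - a)^2 / \sum f}$ and $\sigma = \sqrt{\sum f d^2 / \sum f - (\sum f d / \sum f)^2}$ with $d = x - a$. The function names and data are hypothetical.

```python
import math


def rms_deviation(values, freqs, a):
    """Root mean square deviation about the assumed mean a."""
    n = sum(freqs)
    return math.sqrt(sum(f * (x - a) ** 2 for x, f in zip(values, freqs)) / n)


def std_dev_shortcut(values, freqs, a):
    """sigma = sqrt(mean of d^2 - (mean of d)^2), where d = x - a."""
    n = sum(freqs)
    mean_d = sum(f * (x - a) for x, f in zip(values, freqs)) / n
    mean_d2 = sum(f * (x - a) ** 2 for x, f in zip(values, freqs)) / n
    return math.sqrt(mean_d2 - mean_d ** 2)


values = [2, 4, 5, 7, 9]
freqs = [1, 3, 2, 1, 1]                       # mean A = 5

print(std_dev_shortcut(values, freqs, a=4))   # 2.0, from sqrt(5 - 1)
print(rms_deviation(values, freqs, a=5))      # 2.0, equals sigma because a = A
print(rms_deviation(values, freqs, a=4))      # sqrt(5), larger than sigma
```

The last two lines illustrate point 1: the RMS deviation about any assumed mean other than the A.M. exceeds the S.D.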

#### STEP DEVIATION METHOD

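$$\sigma = c\,\sqrt{\frac{\sum f u^2}{\sum f} - \left(\frac{\sum f u}{\sum f}\right)^2}$$

where $c$ = length of the class-interval and $u = \frac{x - a}{c}$, $a$ being the assumed mean.

This method makes the calculation simpler and is used primarily for classified frequency distributions.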
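The step deviation method for the S.D. can be sketched as below, assuming $u = (x - a)/c$ and $\sigma = c \sqrt{\sum f u^2 / \sum f - (\sum f u / \sum f)^2}$. The function names and data are hypothetical; the direct definition is included only to compare results.

```python
import math


def std_dev_step(marks, freqs, a, c):
    """sigma = c * sqrt(mean of u^2 - (mean of u)^2), where u = (x - a) / c."""
    n = sum(freqs)
    u = [(x - a) / c for x in marks]
    mean_u = sum(f * ui for ui, f in zip(u, freqs)) / n
    mean_u2 = sum(f * ui ** 2 for ui, f in zip(u, freqs)) / n
    return c * math.sqrt(mean_u2 - mean_u ** 2)


def std_dev_direct(marks, freqs):
    """sigma from the definition, used here as a cross-check."""
    n = sum(freqs)
    A = sum(f * x for x, f in zip(marks, freqs)) / n
    return math.sqrt(sum(f * (x - A) ** 2 for x, f in zip(marks, freqs)) / n)


marks = [5, 15, 25, 35]                      # class marks, class length c = 10
freqs = [2, 5, 3, 10]
print(std_dev_step(marks, freqs, a=25, c=10))   # about 10.71
print(std_dev_direct(marks, freqs))             # same value, up to rounding
```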