QUESTION 1: CLEAN THE DATA
The dataset contained 10 missing values and two outliers in the mean kilowatt-hours per day for the month (V8). There are many ways to clean these discrepancies from the data. Instead of using casewise deletion for the missing values, I followed the procedure below:
1. First, I plotted V4 and V5 against time, which made the pattern in the data clear. To impute the missing values, I assumed that in 1991 and 1992 gas was consumed in September but the company did not bill for it, whereas from 1993 onward the company sent bills in August and October, each covering 60 calendar days. I therefore filled the missing value for September 1991 with the average of August 1991 and October 1991, and did the same for September 1992. From 1993 onward, I divided the values of V4, V5 and V6 for August and October in half and assigned one half to the missing July and September values, respectively, as shown below (tables included as JPEG images for convenience).
(Imputation of missing data in 1991 and 1992)
(I could not embed the Excel files in this document, so each blank space marks where an Excel table belongs.)
(Imputation of missing data from 1993 onward)
2. To check for outliers in the dataset, I used Grubbs' test; the details of the test are in the Excel sheet wac027, placed in the designated folder on Indus. The test showed no outlier in the V4 data, while in the V8 data the values for September 1993 and March 1996 were suspected outliers (from the plot of V8 against time), which were later confirmed by Grubbs' test as ...
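The two imputation rules described in step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the spreadsheet procedure actually used: the (year, month) keys, the function names `shift` and `impute_series`, and the sample numbers are hypothetical, and the sketch assumes every missing month has observed neighbouring months. Halving a 60-day bill only makes sense if the halved variables (V4, V5 and V6) are totals over the billing period rather than per-day averages.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key format: (year, month), e.g. (1991, 9) for September 1991.
Month = Tuple[int, int]


def shift(key: Month, delta: int) -> Month:
    """Move a (year, month) key forward or backward by `delta` months."""
    year, month = key
    index = year * 12 + (month - 1) + delta
    return (index // 12, index % 12 + 1)


def impute_series(series: Dict[Month, Optional[float]]) -> Dict[Month, Optional[float]]:
    """Fill missing (None) months using the two rules from step 1.

    1991-1992: a missing month gets the mean of the months before and after it.
    1993 onward: the following month's bill is assumed to cover both months
    (about 60 days), so its value is split equally between the two months.
    """
    filled = dict(series)
    for key in sorted(series):
        if series[key] is not None:
            continue
        prev_key, next_key = shift(key, -1), shift(key, 1)
        if key[0] <= 1992:
            # Average of the neighbouring months, e.g. Sept = (Aug + Oct) / 2.
            filled[key] = (series[prev_key] + series[next_key]) / 2
        else:
            # Read the original (unsplit) bill so it is never halved twice.
            half = series[next_key] / 2
            filled[key] = half
            filled[next_key] = half
    return filled


if __name__ == "__main__":
    # Made-up values for illustration only.
    example = {
        (1991, 8): 100.0, (1991, 9): None, (1991, 10): 120.0,
        (1993, 6): 50.0, (1993, 7): None, (1993, 8): 80.0,
    }
    print(impute_series(example))
```

With these made-up numbers, September 1991 becomes the average of its neighbours, while the August 1993 bill is split equally between July and August 1993.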
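Grubbs' test compares the statistic G = max|x_i - mean| / s (s being the sample standard deviation) with a critical value G_crit = ((N - 1) / sqrt(N)) * sqrt(t^2 / (N - 2 + t^2)), where t is the upper alpha/(2N) quantile of Student's t distribution with N - 2 degrees of freedom. The sketch below is a standard-library-only Python illustration of the two-sided test, not a reproduction of the workbook wac027. The function names, the alpha = 0.05 default, and the repeat-until-no-outlier loop (one common way to look for more than one outlier) are assumptions. Because the standard library has no t quantile function, the tail probability is approximated with Simpson's rule and inverted by bisection.

```python
import math
import statistics


def t_upper_tail(x: float, df: int, steps: int = 1000) -> float:
    """Approximate P(T > x) for x >= 0 with Simpson's rule on [0, x]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def t_upper_quantile(p: float, df: int) -> float:
    """Find x with P(T > x) = p (0 < p < 0.5) by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(n: int, alpha: float = 0.05) -> float:
    """Two-sided Grubbs critical value for a sample of size n (n >= 3)."""
    t = t_upper_quantile(alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_test(values, alpha: float = 0.05):
    """Return (index, G, G_crit) for the value farthest from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    g = abs(values[idx] - mean) / sd
    return idx, g, grubbs_critical(len(values), alpha)


def iterative_grubbs(values, alpha: float = 0.05):
    """Repeatedly remove the most extreme value while G > G_crit."""
    remaining = list(values)
    outliers = []
    while len(remaining) > 2:
        idx, g, g_crit = grubbs_test(remaining, alpha)
        if g <= g_crit:
            break
        outliers.append(remaining.pop(idx))
    return outliers


if __name__ == "__main__":
    # Made-up readings for illustration only; 25.0 is planted as an outlier.
    sample = [10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 25.0, 10.1, 9.7, 10.0]
    print(iterative_grubbs(sample))
```

For N = 10 and alpha = 0.05, `grubbs_critical` gives about 2.29, which matches published two-sided Grubbs tables. Grubbs' test assumes the data are approximately normally distributed.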