Data Collection and Estimation

Introduction

This topic covers two main concepts: data collection and error. The first concerns how to gather information for our system analysis. One way is to use the Australian Bureau of Statistics, which publishes a large amount of reliable data about Australia. Another way is to run a survey. The main advantage of a survey is that we can get exactly the information we need by asking specific questions. With our own survey data, fewer assumptions need to be made, and more accurate and reliable information can be extracted for our system.

Back-of-the-envelope estimation

When we have little or no data for analysis, we need to make assumptions and estimate the data. In such situations we can use the Fermi estimation technique, which involves top-down or bottom-up estimation based on the little data we have or the assumptions we have made. For example, suppose we want to calculate the number of students who study at the ANU libraries, and we begin with a top-down estimate. Let's assume that Canberra has a population of 500,000 people and that 1 in 3 people is a student; this gives 500,000 * 1/3 = 166,667 students. We further assume that 1 in 3 of those students is an ANU student, which gives 55,556 people. Then we estimate that 1 in 6 ANU students would like to study at the ANU libraries. This gives about 9,259 students, which is not too far from the number obtained in the Scaling section on this page.
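The top-down chain above can be reproduced in a few lines of Python. This is only a sketch: the function name `top_down_estimate` is made up for illustration, and the population and fractions are the assumptions stated in the paragraph, not measured values.

```python
def top_down_estimate(population, fractions):
    """Multiply a starting population by a chain of assumed fractions."""
    estimate = population
    for fraction in fractions:
        estimate *= fraction
    return estimate


# Assumptions taken from the text above (not measured data).
canberra_population = 500_000
fractions = [
    1 / 3,  # share of Canberra residents who are students
    1 / 3,  # share of those students who attend ANU
    1 / 6,  # share of ANU students who like to study at the libraries
]

library_students = top_down_estimate(canberra_population, fractions)
print(round(library_students))
```

Changing any one fraction shows how sensitive the final figure is to each assumption, which is the main reason to write a Fermi estimate out step by step.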

The concept of error

Error can arise at any point when we work with data. Sometimes it comes from the assumptions we make for our analysis. Even when we run our own survey and collect the data ourselves, there is still a chance of error. The wording of the survey questions is one such source.

According to the Australian Bureau of Statistics, there are two types of error: sampling error and non-sampling error. Sampling error is related to the sample size, the randomness of the sampling and whether the sample represents the population in appropriate proportions. To keep this error small, the sample size should be at least 30 and the questions should be put to randomly chosen people, which means that asking only family members is not a good way to collect data. To minimize this error, I tried to collect as many responses as possible for my survey about library usage. I received 101 responses, which is a fairly large sample for limiting sampling error. The survey was distributed on WATTLE and Facebook and completed by random respondents, which helps to reduce bias in the answers. If I had instead given the same survey only to my friends, family or neighbors, the results would not correctly reflect the actual population.

Non-sampling error relates to question wording and the interpretation of data. The questions in the survey should not be biased or leading. An example of a leading question is "Do you think that studying at the library is a good idea?". A question should not push the respondent toward the conclusion that the analysis hopes to reach. I tried to avoid asking leading questions of this kind in my survey.

Data collection

I conducted a survey about how students use the libraries at ANU. All the calculations and data estimates in this portfolio are based on the 101 valid responses from the survey. According to the survey, 51.5% of respondents chose the library as their preferred place to study. The survey was distributed across different media.
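To give a rough sense of the sampling error attached to the 51.5% figure, the sketch below computes an approximate 95% margin of error for a proportion. It is an illustration only and not part of the original analysis: it assumes a simple random sample and the normal approximation, which an online survey shared on WATTLE and Facebook only approximates, and the function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation; z=1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


p_library = 0.515   # share of respondents preferring the library (from the survey)
n_responses = 101   # number of valid responses (from the survey)

moe = margin_of_error(p_library, n_responses)
low, high = p_library - moe, p_library + moe
print(f"{p_library:.1%} +/- {moe:.1%} (about {low:.1%} to {high:.1%})")
```

Under these assumptions the margin comes out at a little under 10 percentage points, so the survey share should be read as an approximate figure and not an exact one.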

The survey questions were not limited to a single library. ANU has 5 libraries, so the responses collected in this survey represent data across all 5 of them.

Figure 2: Survey results for preferred place to study

Survey Objectives

The objective of the survey is to obtain reliable and suitable data for the system analysis in this portfolio. The survey aims to:
- identify the population that uses the library
- understand how students use different library facilities during normal and exam periods
- understand the study habits of students

Survey Process

The survey asked users to state which library facilities they use, the amount of time they spend at the library during 2 different time periods, and so on. Users answered through multiple-choice questions, tick boxes and drop-down lists.

The survey was conducted in August 2013 and allowed users to complete it anonymously. Users were asked to answer the survey only once, and all the questions were mandatory. Anonymity and mandatory questions help to ensure that genuine data is collected from the participants. The survey could only be completed online.

Scaling

According to the ANU International Undergraduate Student Guide 2014, there are 10,200 undergraduate students and 8,200 graduate students, so there are 18,400 students at ANU in total. Scaling the survey percentages to this population (51.5% preferring the library) gives about 9,476 students who would like to study at the library, while about 8,924 students would like to study at home. The figure of 9,476 is not far from the number of students obtained from the top-down estimate made earlier on this page.
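The scaling step can be checked with a short Python sketch. The variable names are chosen for illustration; the student numbers and the 51.5% share come from the text, and the comparison value is recomputed from the top-down assumptions stated earlier (500,000 people, then 1 in 3, 1 in 3 and 1 in 6).

```python
# Student numbers from the ANU guide cited above.
total_students = 10_200 + 8_200

# Share of survey respondents who preferred the library.
library_share = 0.515

# Scale the survey share up to the whole student population.
library_students = round(total_students * library_share)
# The text treats the remaining students as preferring to study at home.
home_students = total_students - library_students

# Top-down (Fermi) estimate rebuilt from the assumptions stated earlier.
fermi_estimate = round(500_000 / 3 / 3 / 6)

relative_gap = abs(library_students - fermi_estimate) / library_students
print(library_students, home_students, fermi_estimate, f"{relative_gap:.1%}")
```

The two approaches differ by only a few percent, although the close agreement depends heavily on the fractions assumed in the top-down estimate.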