Chapter 1: Defining and Collecting Data Notes


 
                     Defining and Collecting Data 


Contents

1.1 Defining Variables

    Classifying Variables by Type

    Measurement Scales

1.2 Collecting Data

    Population and Sample

    Data Sources

1.3 Types of Sampling Methods

    Simple Random Sample

    Systematic Sample

    Stratified Sample

    Cluster Sample

1.4 Data Preparation

    Data Cleaning

    Data Formatting

    Stacked and Unstacked Variables

    Recoding Variables

1.5 Types of Survey Errors

    Coverage Error

    Nonresponse Error

    Sampling Error

    Measurement Error

    Ethical Issues About Surveys

====================================================================================================================


1.1 Defining Variables  


Variable - A characteristic of an item or individual that can take on different values.

Broadly, there are two types of variables:

1. Numerical Variables: Variables whose data represent a counted or measured quantity. Example - monthly sales.

2. Categorical Variables: Variables whose data represent categories. Example - gender, with its categories male and female.


Numerical variables are of two types:

A. Discrete Numerical Variables - These variables have data that arise from a counting process. Example - the number of books sold per month in a bookstore.

B. Continuous Numerical Variables - These variables have data that arise from a measuring process. Example - the time spent waiting for a bus, because its data represent timing measurements.


Measurement Scales 

A measurement scale defines the ordering of values and determines whether differences among pairs of values for a variable are equivalent and whether you can express one value in terms of another.

  • Interval Scale - An ordered scale in which the difference between measurements is meaningful but which does not include a true zero point.
  • Ratio Scale - An ordered scale in which the difference between measurements is meaningful and which includes a true zero point. If a numerical variable has a ratio scale, you can characterize one value in terms of another. For example, cost is measured on a ratio scale, so you can say that an item that costs $2 is twice as expensive as an item that costs $1.
  • Nominal Scale - For categorical data measured on a nominal scale, the category values express no order or ranking.
  • Ordinal Scale - For categorical data measured on an ordinal scale, an ordering or ranking of category values is implied. Ordinal scales give you information to compare values, but not as much as interval or ratio scales. For example, the ordinal scale poor, fair, good, excellent tells you that "good" is better than "poor" or "fair" and not better than "excellent". But unlike interval and ratio scales, it does not tell you whether the difference from poor to fair is the same as the difference from fair to good.
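The difference between the scales can be sketched in a few lines of Python. This is only an illustration: the prices repeat the $2 and $1 example above, while the temperature values and the Celsius-to-Kelvin comparison are assumptions added here to show an interval scale, and the variable names are made up.

```python
# Illustrative values only; the temperatures are assumptions, not data from the notes.

# Ratio scale: price has a true zero, so ratios are meaningful.
price_a, price_b = 2.00, 1.00
price_ratio = price_a / price_b        # 2.0: "twice as expensive" is a valid statement

# Interval scale: Celsius has no true zero, so only differences are meaningful.
temp_a_c, temp_b_c = 20.0, 10.0
temp_difference = temp_a_c - temp_b_c  # a 10-degree difference is meaningful
naive_ratio = temp_a_c / temp_b_c      # 2.0, but 20 C is not "twice as hot" as 10 C

# Kelvin has a true zero (a ratio scale), so this ratio is the meaningful one.
kelvin_ratio = (temp_a_c + 273.15) / (temp_b_c + 273.15)

# Ordinal scale: categories can be ranked, but the gaps between ranks are unknown.
ratings = ["poor", "fair", "good", "excellent"]
rank = {label: position for position, label in enumerate(ratings)}
good_beats_fair = rank["good"] > rank["fair"]
good_beats_excellent = rank["good"] > rank["excellent"]

print(price_ratio, temp_difference, round(kelvin_ratio, 3))
print(good_beats_fair, good_beats_excellent)
```

The rank numbers 0 to 3 exist only to compare positions. Subtracting them would suggest equal spacing between categories, which an ordinal scale does not guarantee.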

1.2 Collecting Data

Proper data collection avoids introducing biases and minimizes errors.

  • Population - A population contains all the items or individuals of interest that you seek to study.
  • Sample - A sample contains only a portion of a population of interest.
  • Statistic - A value that summarizes the data of a specific variable for sample data.
  • Parameter - A value that summarizes the data of a specific variable for an entire population.
  • Primary data source - When you perform the activity that collects the data, you are using a primary data source.
  • Secondary data source - When the data collection is done by someone else, you are using a secondary data source.
  • Treatment - Researchers who collect data are often looking for the effect of some change, called a treatment, on a variable of interest.
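The distinction between a parameter and a statistic can be sketched as follows. The population here is made-up data (hypothetical monthly sales for 1,000 stores), and the sample size of 50 and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)  # arbitrary fixed seed so the run is repeatable

# Hypothetical population: monthly sales figures for 1,000 stores (made-up data).
population = [random.randint(100, 500) for _ in range(1000)]

# Parameter: a value that summarizes the variable for the whole population.
population_mean = statistics.mean(population)

# Sample: a portion of the population (50 stores, drawn without replacement).
sample = random.sample(population, 50)

# Statistic: a value that summarizes the variable for the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The two means will usually be close but not equal. In practice the parameter is unknown, and the statistic is used to estimate it.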

1.3 Types of Sampling Methods

  • Frame - The frame is a complete or partial listing of the items that make up the population from which the sample will be selected.
  • Probability sample - In a probability sample, you select items based on known probabilities. Whenever possible, you should use a probability sample, because such a sample allows you to make inferences about the population being analyzed.
  • Nonprobability sample - In a nonprobability sample, you select items or individuals without knowing their probabilities of selection.
  • Convenience sample - In a convenience sample, you select items that are easy, inexpensive, or convenient to sample.
  • Judgment sample - In a judgment sample, you collect the opinions of preselected experts in the subject matter. Although the experts may be well informed, you cannot generalize their results to the population.
  • Simple Random Sample - In a simple random sample, every item from a frame has the same chance of selection as every other item. Simple random sampling is the most elementary random sampling technique. With simple random sampling, you use n to represent the sample size and N to represent the frame size. You number every item in the frame from 1 to N. The chance that you will select any particular member of the frame on the first selection is 1/N. There are two types:
  • (a) Simple random sampling with replacement - After you select an item, you return it to the frame, where it has the same probability of being selected again. Example - A bowl holds 10 white balls and 10 green balls. You choose one ball; the probability that it is white is 1/2, and the probability that it is green is 1/2. You put the ball back in the bowl and choose again. The probability of drawing a white ball or a green ball is 1/2 each, the same as before.
  • (b) Simple random sampling without replacement - Once you select an item, you cannot select it again, because you do not return it to the frame. Example - With the same bowl, you now keep each ball out after choosing it. On the first draw, the probability of a white ball is 1/2 and the probability of a green ball is 1/2. On the second draw the probabilities change. If the first ball was green, 19 balls remain (10 white and 9 green), so the probability of a green ball is now 9/19 and the probability of a white ball is 10/19.
  • Table of random numbers - A table of random numbers consists of a series of digits listed in a randomly generated sequence. To use a random number table for selecting a sample, you first need to assign code numbers to the individual items of the frame. Then you generate the random sample by reading the table of random numbers and selecting those individuals from the frame whose assigned code numbers match the digits found in the table. Because the number system uses 10 digits (0, 1, 2, ..., 9), the chance that you will randomly generate any particular digit is equal to the probability of generating any other digit. This probability is 1 out of 10. Hence, if you generate a sequence of 800 digits, you would expect about 80 to be the digit 0, 80 to be the digit 1, and so on. Because every digit or sequence of digits in the table is random, the table can be read either horizontally or vertically. The margins of the table designate row numbers and column numbers. The digits themselves are grouped into sequences of five in order to make reading the table easier.
  • Systematic Sample - In a systematic sample, you partition the N items in the frame into n groups of k items, where k = N/n. You round k to the nearest integer. To select a systematic sample, you choose the first item at random from the first k items in the frame. Then you select the remaining n - 1 items by taking every kth item after it.
  • Stratified Sample - In a stratified sample, you first subdivide the N items in the frame into separate subpopulations, or strata. A stratum is defined by some common characteristic, such as gender or year in school. You select a simple random sample within each of the strata and combine the results from the separate simple random samples. Stratified sampling is more efficient than either simple random sampling or systematic sampling because you are ensured of the representation of items across the entire population.
  • Cluster Sample - In a cluster sample, you divide the N items in the frame into clusters that contain several items. Clusters are often naturally occurring groups, such as counties, election districts, city blocks, households, or sales territories. You then take a random sample of one or more clusters and study all items in each selected cluster. Cluster sampling is often more cost-effective than simple random sampling, particularly if the population is spread over a wide geographic region. However, cluster sampling often requires a larger sample size to produce results as precise as those from simple random sampling or stratified sampling.
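The four probability sampling methods above can be sketched with Python's standard random module. Everything in this sketch is an assumption made for illustration: the frame of N = 20 items, the sample size n = 4, the strata labels "A" and "B", the clusters of five consecutive items, and the allocation of two items per stratum.

```python
import random

random.seed(42)  # arbitrary fixed seed so the run is repeatable

# Hypothetical frame of N = 20 items, each tagged with a stratum and a cluster.
frame = [
    {"id": i, "stratum": "A" if i <= 12 else "B", "cluster": (i - 1) // 5}
    for i in range(1, 21)
]
N, n = len(frame), 4

# Simple random sample without replacement: no item can be selected twice.
srs_without = random.sample(frame, n)

# Simple random sample with replacement: a selected item may be drawn again.
srs_with = random.choices(frame, k=n)

# Systematic sample: random start among the first k items, then every k-th item.
k = round(N / n)
start = random.randrange(k)
systematic = frame[start::k][:n]

# Stratified sample: a simple random sample inside each stratum, then combined.
# Two items per stratum is an arbitrary allocation chosen for this sketch.
stratified = []
for name in ("A", "B"):
    members = [item for item in frame if item["stratum"] == name]
    stratified.extend(random.sample(members, 2))

# Cluster sample: pick whole clusters at random and keep every item in them.
cluster_ids = sorted({item["cluster"] for item in frame})
chosen_clusters = random.sample(cluster_ids, 1)
cluster_sample = [item for item in frame if item["cluster"] in chosen_clusters]

for label, picked in [
    ("SRS without replacement", srs_without),
    ("SRS with replacement", srs_with),
    ("systematic", systematic),
    ("stratified", stratified),
    ("cluster", cluster_sample),
]:
    print(f"{label}: {[item['id'] for item in picked]}")
```

Here `random.sample` draws without replacement and `random.choices` draws with replacement, which mirrors the two kinds of simple random sampling. The cluster sample keeps all five items of the chosen cluster, so its size is set by the cluster rather than by n.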
