Common confidence levels are 90%, 95%, and 99%, corresponding to significance levels of 10%, 5%, and 1%
Type I error
Data analysis
Deviation and spread
P value
Alternative hypothesis; Null hypothesis
It means that there is substantial evidence against the null hypothesis, so we reject it at the 5% significance level
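As an illustration of the decision rule, here is a minimal sketch that assumes a two-sided z-test. The z values and the helper names `two_sided_p_value` and `reject_null` are hypothetical and only show how a p value is compared with the 5% level.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Two-sided p value for a z statistic under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Reject the null hypothesis when the p value is below the significance level."""
    return p_value < alpha


# Hypothetical test statistics, chosen only for illustration.
for z in (1.0, 2.5):
    p = two_sided_p_value(z)
    decision = "reject" if reject_null(p) else "fail to reject"
    print(f"z = {z}: p = {p:.4f}, {decision} the null hypothesis at 5%")
```

A p value below 0.05 leads to rejecting the null hypothesis; otherwise the evidence is treated as insufficient to reject it.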
Age, sex, income, occupation
It is used to emphasize that data mining, or searching data for support for a hypothesis, can reveal associations, but observational data alone is not enough to infer causation. To establish causation, one needs experimental data.
a Data Integration: Collect data from all the different sources.

b Data Selection: Select the data that will be useful for mining.

c Data Cleaning: Remove errors from the selected data.

d Data Extraction: Extract relevant information from the data.

e Data Interpretation: Interpret the results obtained.
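
The five steps above can be sketched as a small pipeline. This is only an illustrative sketch: the records, field names, and function names are hypothetical, and a real project would replace each step with something suited to its own data.

```python
from statistics import mean

# Hypothetical records from two different sources.
source_a = [
    {"age": 34, "income": 52000, "occupation": "nurse", "note": "a"},
    {"age": 45, "income": None, "occupation": "teacher", "note": "b"},
]
source_b = [
    {"age": 29, "income": 48000, "occupation": "nurse"},
    {"age": 51, "income": 61000, "occupation": "teacher"},
    {"age": 38, "income": -1, "occupation": "teacher"},
]


def integrate(*sources):
    """a. Combine records from all sources into one list."""
    return [dict(record) for source in sources for record in source]


def select(records, fields):
    """b. Keep only the fields that are useful for mining."""
    return [{f: r.get(f) for f in fields} for r in records]


def clean(records):
    """c. Drop records with missing values or an impossible (negative) income."""
    return [
        r for r in records
        if all(v is not None for v in r.values()) and r["income"] >= 0
    ]


def extract(records):
    """d. Extract the mean income per occupation."""
    groups = {}
    for r in records:
        groups.setdefault(r["occupation"], []).append(r["income"])
    return {occupation: mean(values) for occupation, values in groups.items()}


def interpret(summary):
    """e. Report the occupation with the highest mean income."""
    return max(summary, key=summary.get)


integrated = integrate(source_a, source_b)
selected = select(integrated, ("income", "occupation"))
cleaned = clean(selected)
summary = extract(cleaned)
top_occupation = interpret(summary)
print(summary, top_occupation)
```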

a Clustering

b Regression

c Decision trees

d Neural network

e Genetic algorithm

Data Mining

Data Mining

Big data

Randomized response

Correlation and regression



Interview and survey

The selected data can serve the purpose for which it was selected

A representative sample is a small subset that accurately reflects the characteristics of the larger population.
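
One common way to aim for a representative sample is proportional stratified sampling, sketched below. The population, the `group` field, and the function name `stratified_sample` are hypothetical, and stratification is only one of several possible techniques.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(population, key, fraction, seed=0):
    """Draw the same fraction from every stratum so group proportions are preserved."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 members of group A and 40 of group B.
population = [{"id": i, "group": "A" if i < 60 else "B"} for i in range(100)]

sample = stratified_sample(population, key=lambda p: p["group"], fraction=0.1)
counts = Counter(p["group"] for p in sample)
print(counts)
```

The sample keeps the 60:40 split of the population, which is what makes it representative with respect to `group`.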

The researcher can avoid bias by:

Doing preliminary research and asking open-ended questions

Clearly outlining the population for which the study is to be conducted

Having a complete understanding of the statistical techniques to be used before starting the research

The consequences of improperly collected data are:

The researcher will not be able to answer the research questions accurately.

The researcher will not be able to repeat and validate the study.

The researcher will lose trust and will not be consulted for further studies.

Data selection depends on the purpose for which the data will be used, its potential for reuse, the timeframe over which it will be used, and the budget available for data selection.


