Theories And Definitions Data Analysis Essay


Discuss the theories and definitions of data analysis.



Many statistical techniques are available for the analysis of data. Regression analysis and the confidence interval are two important ones. Statistical methods also distinguish two types of random variable: the continuous random variable and the discrete random variable (Struben et al. 2015).

This assignment briefly explains several of these techniques. It describes the theories and definitions related to regression techniques, discrete random variables, continuous random variables and confidence intervals, and outlines the procedure for using them. It also summarises the knowledge and understanding gained from these theories and definitions.

Regression technique

Regression techniques are used to find the relationship between a dependent variable and one or more independent variables (Kleinbaum et al. 2013). Regression analysis is used to predict a continuous dependent variable from one or more independent variables (Cameron and Trivedi 2013), and it shows the degree of influence that the independent variables have on the dependent variable. Seven types of regression technique can be used: linear regression, polynomial regression, stepwise regression, logistic regression, lasso regression, ridge regression and Elastic Net regression (Cameron and Trivedi 2013). The regression equation is given by $Y = \beta_0 + \sum_{i=1}^{k} \beta_i X_i$, where $i = 1, 2, \ldots, k$ indexes the independent variables (Kleinbaum et al. 2013). Regression techniques help to frame a model that can be used for prediction and forecasting and to infer causal relationships between the independent and dependent variables. The model is built by fitting a line or curve to the data points so that the distance between the data points and the fitted line is minimized (Draper and Smith 2014). Thus, regression analysis indicates whether there is a significant relationship between the independent variables and the dependent variable, and how strong the impact of the independent variables on the dependent variable is.
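The least-squares idea described above can be sketched for the simplest case of one independent variable. This is only an illustration: the function name and the data (hours and customer counts) are made up for the example and are not the data analysed in this assignment.

```python
from math import sqrt


def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope that minimizes squared vertical distances
    b0 = mean_y - b1 * mean_x     # intercept: the line passes through the means
    r = sxy / sqrt(sxx * syy)     # correlation coefficient; r * r is R squared
    return b0, b1, r


# Hypothetical illustration data, NOT the data from this assignment.
hours = [1, 2, 3, 4, 5]
customers = [9, 7, 8, 5, 4]

b0, b1, r = fit_simple_linear_regression(hours, customers)
print(f"intercept={b0:.3f}, slope={b1:.3f}, r={r:.3f}, R^2={r * r:.3f}")
```

A negative slope and a negative r both indicate that the dependent variable falls as the independent variable rises, and squaring r gives the share of variation explained by the fitted line.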

Discrete random variable

A variable that can take only a countable number of values is termed a discrete random variable. In probability theory, a random variable is defined as a function that maps the points of a probability space to the real line (Petrov 2012). When the set of values it can take is countable, it is known as a discrete random variable. The probability distribution of a discrete random variable is described by its probability mass function, which can be any graph, table or formula that gives each possible value along with its probability (Petrov 2012).
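As an illustration of a probability mass function, the sketch below tabulates a Poisson distribution, a common model for counts. The rate of 2 is an assumption, since the assignment does not state it; with that rate the probabilities for 2 and 5 customers round to the 0.2707 and 0.0361 reported in the conclusion.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) count equals exactly k."""
    return exp(-lam) * lam ** k / factorial(k)


lam = 2.0  # ASSUMED rate; not stated in the assignment

# The PMF as a table: each possible count with its probability.
pmf_table = {k: poisson_pmf(k, lam) for k in range(6)}
for k, p in pmf_table.items():
    print(f"P(X = {k}) = {p:.4f}")
```

Because the variable is discrete, each individual count has its own positive probability, and the probabilities over all possible counts add up to 1.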

Continuous random variable

A random variable that can take any value in an interval, and therefore uncountably many values, is known as a continuous random variable (Kay 2012). Its values come from a continuous probability distribution. Because the number of possible values is unlimited, a positive probability cannot be assigned to any single value; probabilities are assigned to intervals of values instead (Hogg et al. 2014). The probability distribution of a continuous random variable is described by its probability density function (Hogg et al. 2014).
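The difference from the discrete case can be sketched with a normal distribution, using `NormalDist` from Python's standard `statistics` module. The standard normal distribution and the helper name `interval_probability` are chosen only for illustration.

```python
from statistics import NormalDist


def interval_probability(dist, a, b):
    """P(a < X < b) for a continuous distribution, from its CDF."""
    return dist.cdf(b) - dist.cdf(a)


# Standard normal distribution, chosen only as an example.
X = NormalDist(mu=0.0, sigma=1.0)

p_interval = interval_probability(X, -1.96, 1.96)  # probability over an interval
p_point = interval_probability(X, 1.0, 1.0)        # a single value has zero width

print(f"P(-1.96 < X < 1.96) = {p_interval:.4f}")
print(f"P(X = 1.0) = {p_point:.4f}")
```

The probability comes from the area under the density between two limits, so an interval of zero width always has probability zero.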

Confidence interval

A confidence interval is defined as an interval estimate of a population parameter (Struben et al. 2015). It is calculated from the sample observations and therefore differs from sample to sample. A confidence interval is formed by calculating a lower confidence limit and an upper confidence limit from the sample (Cappelleri and Ting 2015). The region between the lower and upper confidence limits gives the confidence interval for the population parameter. The interval contains a range of plausible values for the parameter, so it conveys the uncertainty of the estimate rather than a single exact value. Confidence intervals are usually constructed at confidence levels of 99%, 95%, 90%, 85% or 80% (Perez et al. 2014). The confidence level is chosen from the desired level of significance $\alpha$, as $1 - \alpha$.
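The calculation of the lower and upper limits can be sketched for a population mean. This version assumes a normal approximation (a z-quantile from the standard `statistics` module); for small samples a t-quantile would normally be used instead. The sample values and the function name are hypothetical and are not the data from this assignment.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical sample, NOT the data from this assignment.
sample = [12, 15, 17, 18, 19, 21, 22, 23, 16, 14]

lower, upper = mean_confidence_interval(sample, confidence=0.95)
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```

A higher confidence level gives a wider interval, and a different sample from the same population would give different limits.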


It can be concluded that these statistical techniques are useful for understanding various dimensions of a sample and its population. The assignment described the dependency of the dependent variable on the independent variable, the two types of random variable and the confidence interval of population parameters, along with their utility in statistics. The mean of the variable was 17.491, the median was 18.63 and the mode was 22.75. Discrete and continuous random variables take different approaches to probability distribution. The discrete variables show that 22 male customers use the sauna service and 25 do not like it. Similarly, 20 female customers use the sauna service while 31 do not like it. The highest frequency is 8 customers (19%) in the first hour, and the joint second-highest frequency is 6 customers (14%) each in the fourth hour and the twentieth hour. The Poisson distribution shows that the probability of 2 customers visiting the sauna after the gym is 0.2707, while the probability falls to 0.0361 for 5 customers. The confidence interval was found to be (14.773, 18.774). The regression technique shows that there is a negative relationship between the timing hour and the number of customers visiting the sauna. The value of R² was found to be 0.264 and the value of R was found to be -0.514.


I have learnt that there are two kinds of random variable: the continuous random variable and the discrete random variable. The two have different approaches to probability distributions. I have learnt that regression techniques indicate whether there is a significant relationship between the independent variables and the dependent variable, and how strong that relationship is. I have also learnt that a confidence interval gives an interval estimate of a population parameter. Thus, this assignment has helped me to understand various statistical techniques and their utilities. I came to know that there are dependent variables and independent variables as well as discrete and continuous random variables, and I came to know about the importance of the relationship between them. I also learnt the importance of the confidence interval.


Cameron, A.C. and Trivedi, P.K., 2013. Regression analysis of count data (Vol. 53). Cambridge University Press.

Cappelleri, J.C. and Ting, N., 2015. Evaluation of a Confidence Interval Approach for Relative Agreement in a Crossed Three-Way Random Effects Model. In Applied Statistics in Biomedicine and Clinical Trials Design (pp. 381-392). Springer International Publishing.

Draper, N.R. and Smith, H., 2014. Applied regression analysis. John Wiley & Sons.

Hogg, R.V., Tanis, E. and Zimmerman, D., 2014. Probability and statistical inference. Pearson Higher Ed.

Kay, S.M., 2012. Continuous Random Variables. In Intuitive Probability and Random Processes Using MATLAB® (pp. 285-342). Springer US.

Kleinbaum, D.G., Kupper, L.L., Nizam, A. and Rosenberg, E.S., 2013. Applied regression analysis and other multivariable methods. Nelson Education.

Perez, A.E., Haskell, N.H. and Wells, J.D., 2014. Evaluating the utility of hexapod species for calculating a confidence interval about a succession based postmortem interval estimate. Forensic Science International, 241, pp.91-95.

Petrov, V., 2012. Sums of independent random variables (Vol. 82). Springer Science & Business Media.

Struben, J., Sterman, J. and Keith, D., 2015. Parameter and confidence interval estimation in dynamic models: maximum likelihood and bootstrapping methods. Analytical Handbook for Dynamic Modelers.
