The number of web users increased roughly tenfold between 1999 and 2016. The first billion users was reached in 2005, the second in 2010, the third in 2014, and the fourth billion is now close to being reached.
According to the above graph, one billion users were using the internet in 2005 and that number had doubled by 2010. This suggests that the number of internet users doubles every four to five years. The sources of this data include journals, e-commerce stores, web pages, person-to-person information exchange, wikis, e-businesses, online gaming, banking and other elements of this sort. Storing and handling this data is a further challenge. Organizations are increasingly focused on data purification, which means storing and analyzing useful information and discarding useless data.
These facts indicate that user-generated data can be extremely valuable if data gathering, storage, transformation, cleansing and knowledge generation are carried out with up-to-date data handling approaches. The data is then fed into decision support systems that help managers make business decisions. E-commerce stores are a prime example of where such decision support systems are applied: organizations collect user-generated data from best-selling products, user geolocation and geographic data, and user profiles. This data serves as the initial stage of the decision support system, yielding profitability analysis, business patterns, and predictions of future trends.
Health care can likewise be improved by collecting, analyzing, managing and assimilating the enormous volumes of unstructured and structured information already being generated by health care systems. The problem is how to handle this user-created content: where to store it, and how to process and analyze it efficiently so that it can be converted into productive information. This challenge has driven researchers to investigate a feasible model for handling data transformation and extracting productive information from it, and the result is called "Big Data". The term is relatively new, but the methods behind it are much older and still evolving, and as they evolve the definition of Big Data changes with them; no two stakeholders define the term in quite the same way. Big Data combines two main stages: the first is information storage, and the second is analysis of the stored information to build knowledge repositories. Initially, Big Data was defined by three V's, i.e. Volume, Velocity and Variety; a fourth V, Veracity, was later introduced by IBM.
Big Data is strongly influencing all information sectors, such as health care, government identification departments, and businesses of all sorts. Handling data through Big Data involves two segments: the first is keeping the data clean, meaning free of redundancy; the second is handling unstructured data, which involves cleaning it and optimizing it for querying and decision support. Consider health care as an application of Big Data: with revolutionary improvement in the health care industry, its data volume is expected to grow exponentially. It is therefore critical for health care departments to adopt the latest tools, models, techniques and hardware to handle big data efficiently; failure to do so risks losing millions of dollars and compromising the accuracy of medical records.
Similarly, in the field of geography, Big Data is a vital part of the global village, providing numerous opportunities along with strong insight into diverse data. Traditional systems for handling data fail when the data is diverse, unformatted or in mixed formats, or stored in different physical locations. Companies generally view Big Data as a solution to sheer data volume, but it can in fact handle any volume and almost any format. Despite these impressive advantages, Big Data's flaws cannot be ignored; left unmanaged, they pose a great potential risk to any organization. Data sets from internet sources are often unreliable, error-prone and unstructured. Jesper Anderson (co-founder of the open financial data store FreeRisk) explains further that combining data from different internet sources can be risky, since those sources can be error-prone and the errors are only magnified when multiple datasets are combined. Critics accept that all four V's of Big Data can improve any organization's decision support compared with older data warehousing techniques. Still, a further "Value" addition is required in the health care sector: analyzing data for decisions can be hard for managers without adding value to it for competitive advantage.
A Formal Risk Assessment
The basis of risk assessment is an explanation of the facts. The process consists of analyzing diverse stages to distinguish between risks, threats and opportunities. Once the possible risks and threats have been analyzed, appropriate measures are required to minimize the risks, which in turn directly minimizes the loss. Risk analysis has three stages: Risk Identification, Risk Classification and Risk Analysis.
Researchers follow different methods to conduct risk analysis. The choice among them depends on the size and magnitude of the problem. These methods are:
- Qualitative method.
- Quantitative method.
- Semi-Quantitative method.
First in the queue is the Qualitative method. It is used when the potential of threats is low and organizations are looking only for basic parameters for managerial decisions; for instance, merging business ventures may rely on qualitative risk assessment provided that neither business carries significant internal risk on its own. Quantitative methods, by contrast, present numbers, graphs and figures and are helpful for finding the probability and impact of a risk or threat; managers can prioritize risks for further analysis, which helps organizations reduce uncertainty and stay focused on high-priority tasks. The Semi-Quantitative method sits between the qualitative and quantitative methods: its representation combines text and numbers, and each risk is characterized as Low, Medium or High. While scaling the risks, researchers must pay close attention, since false prioritization can cause the whole process to fail. The risks involved in Big Data are as follows:
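The semi-quantitative scaling described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the 1-5 probability and impact scales, the band thresholds, and the example risks are all assumptions made for the sketch.

```python
# Semi-quantitative risk scaling sketch: a numeric probability x impact
# product is mapped back onto the textual Low/Medium/High categories.
# The 1-5 scales and band thresholds below are illustrative assumptions.

def classify_risk(probability: int, impact: int) -> str:
    """Map a probability (1-5) times impact (1-5) score onto a category."""
    score = probability * impact  # ranges from 1 to 25
    if score <= 6:
        return "Low"
    elif score <= 14:
        return "Medium"
    return "High"

# Hypothetical example risks, scored on the assumed 1-5 scales.
risks = {
    "data loss during reformatting": (3, 5),
    "vendor incompatibility": (2, 3),
    "natural disaster": (1, 5),
}

for name, (p, i) in risks.items():
    print(f"{name}: {classify_risk(p, i)}")
```

The textual output keeps the method readable for managers (the qualitative side) while the underlying numeric score supports prioritization (the quantitative side).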
Risk of Data Loss
The first and foremost risk to consider is the loss of productive data, which can occur in two ways. When reformatting data from each data source, some of the data can be lost; to avoid this, organizations must study which formats Big Data platforms can support and avoid reformatting wherever there is a possibility of data loss. Combining data from all sources and converting unstructured data into structured form can also cause data loss.
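One simple safeguard against silent loss during reformatting is to compare record counts before and after a conversion. The sketch below is illustrative; the helper names and the whole-batch conversion interface are assumptions, not anything prescribed by the text.

```python
# Guard against silent data loss during format conversion: a batch
# conversion is only accepted if no records were dropped along the way.

def safe_convert(records, batch_convert):
    """Apply a whole-batch conversion; fail loudly if records were dropped."""
    converted = batch_convert(records)
    if len(converted) != len(records):
        dropped = len(records) - len(converted)
        raise ValueError(f"possible data loss: {dropped} record(s) dropped")
    return converted

rows = [{"Name": "a"}, {"Name": "b"}]

# A lossless conversion (lowercasing field names) passes the check...
ok = safe_convert(rows, lambda rs: [{k.lower(): v for k, v in r.items()} for r in rs])

# ...while a conversion that silently filters records is rejected.
try:
    safe_convert(rows, lambda rs: rs[:1])
except ValueError as e:
    print(e)
```

Count checks catch dropped records but not corrupted field values; a content digest (as in the corruption example further below) would be needed for that.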
Economic Constraints for Small Organization
A massive difference can be observed between merging, analyzing and applying decision support to data with an old system versus the latest technologies. In other words, accurate results always come at a cost, and that cost can be prohibitive for small organizations.
Technical Uncertainty in Technologies
Some risks stem from the technologies used. Combining old, new and cutting-edge technologies can pose a greater risk in handling data, since Big Data is not a single module: its technologies are developed by different vendors.
Risk to Handle Corrupt Data
There is a risk in handling data stored in the cloud or in physically separate storage systems. Since the data can be in different formats and duplication is always present, combining it carries a possibility of data corruption.
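A common way to manage the duplication described above is to hash each record's content when merging sources, keeping one copy of each distinct record. This is a hypothetical sketch under the assumption that records are JSON-serializable dictionaries; it handles exact duplicates only, not near-duplicates or semantic conflicts.

```python
# Deduplicate while merging records from several storage locations:
# a content hash identifies each record, so exact duplicates are dropped.
import hashlib
import json

def record_hash(record: dict) -> str:
    """Stable content hash of a record (key order does not matter)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def merge_sources(*sources):
    """Merge lists of records, keeping one copy of each distinct record."""
    seen, merged = set(), []
    for source in sources:
        for record in source:
            h = record_hash(record)
            if h not in seen:
                seen.add(h)
                merged.append(record)
    return merged

a = [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}]
b = [{"id": 2, "v": "y"}, {"id": 3, "v": "z"}]
print(len(merge_sources(a, b)))  # 3 distinct records survive the merge
```

The same per-record hashes, stored alongside the data, can later be rechecked to flag records whose content no longer matches, one signal of the corruption risk mentioned above.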
Issue of Data Analysis
Data analysis can become a great potential risk when data formatting and storage are already riddled with risks. An organization's failure to handle risks in the initial stages of Big Data can cause bigger risks and false data analysis in later stages.
Risk of Uncontrollable Events
Events such as natural disasters, accidents and fire can damage data stored in different locations. Although frameworks like Hadoop work with data nodes on which each block is replicated across several other nodes, a possibility of data loss still remains.
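The residual risk under replication can be made concrete with a back-of-the-envelope calculation: a block is lost only if every replica fails at once. The failure probability below is an illustrative figure, not a real estimate, and the model assumes independent node failures, which a site-wide disaster of the kind described above would violate.

```python
# Back-of-the-envelope residual loss under replication: assuming independent
# node failures with probability p, a block is lost only when all replicas
# fail simultaneously. The p value here is illustrative, not an estimate.

def block_loss_probability(p_node_failure: float, replication: int) -> float:
    """Probability that every replica of a block fails at once."""
    return p_node_failure ** replication

for r in (1, 3, 5):
    print(f"replication={r}: loss probability {block_loss_probability(0.01, r):.2e}")
```

The calculation shows why replication shrinks, but never eliminates, the loss probability, and why correlated failures (fire, flood) still demand geographically separate backups.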
Communication of Risk
After the risk assessment, prioritizing risks is the next task. Managers use the same Low, Medium and High scale to deal with these risks, and resources are deployed according to each risk's priority and complexity. Organizations set standard values for each risk scale, and from these factors a total risk score is calculated for each risk.
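The prioritization step can be sketched as below. The standard values assigned to each scale and the 1-5 complexity factor are assumptions chosen for illustration; an organization would substitute its own standard values.

```python
# Prioritization sketch: each textual scale maps to an assumed standard
# value, a total score combines scale and complexity, and risks are
# handled in descending score order.
SCALE_VALUES = {"Low": 1, "Medium": 3, "High": 9}  # assumed standard values

def prioritize(risks):
    """risks: list of (name, scale, complexity 1-5); returns (name, score)
    tuples sorted so the highest-scoring risk is handled first."""
    scored = [(name, SCALE_VALUES[scale] * complexity)
              for name, scale, complexity in risks]
    return sorted(scored, key=lambda item: item[1], reverse=True)

risks = [("data corruption", "High", 4),
         ("vendor lock-in", "Medium", 2),
         ("fire damage", "High", 1)]
for name, score in prioritize(risks):
    print(name, score)
```

Note that a complex Medium risk can outrank a simple High one under such a scheme, which is why the next paragraph's rule, placing data security above all else, has to be applied as an override rather than left to the arithmetic.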
Organizations follow traditional techniques to communicate risk. In the case of Big Data, the only difference is that data security is prioritized above all other risks: if the data is under serious threat, that threat moves to the top, and once data security has been dealt with, the usual risk priority is followed.
Big Data has great potential for handling unstructured, multi-format data stored in a distributed environment, and handling its risks can bring a massive change in an organization's strategy. Techniques for gathering data based on user requirements, decision making, and analyzing internal and external user needs are also part of the process of avoiding risks.
To handle each risk related to Big Data, the organization should broadcast the change before dealing with each risk and threat; managers and the technology department need to be consulted so that the message reaches every relevant member of the organization. Handling multi-format data is the distinguishing feature of Big Data, so data analysis should be done meticulously before implementing it: losing a single data node with no backup, or generating patterns from data that is itself unclean, can break the backbone of any data-centered organization. In the modern world it is paramount for organizations to understand the importance of data, especially the right data (data that can be used for decision support). It has also been shown that dealing with the risks involved in Big Data can not only sustain organizations but also guarantee competitive advantages. Organizations interested in short-term data analysis (weekly or fortnightly) need to focus more on the other attributes of Big Data; for that reason it is recommended to follow other models for handling Big Data, such as handling thousands of data nodes and applying analysis based on need. This approach can reduce the cost of new infrastructure and also speed up the process.