Saturday, September 12, 2015

Applications of Big Data in different domains

The term "big data" has been around for decades; a Quora posting provides an example of its usage dating back to 1987. In the 1990s, technologists used "big data" to describe the growth in data volume, pointing to a relatively new data source known as the Internet, and discussed its impact on storage systems. Thanks to Moore's Law, computational power and storage became cheaper and more accessible, making it feasible to keep rather than discard data.

In the 2000s, the emphasis was on meaningful integration of data from different sources. By this time, many processes across functions such as supply chain, market research and strategic planning were getting directly or indirectly connected to tangible, quantifiable information. For instance, strategic plans were starting to be backed up by historical evidence in the form of data rather than the qualitative judgements of a few experienced managers. The number of Big Data sources has multiplied in the last few years, and their capability to generate continuous data has increased exponentially.

Below, we present some use-cases and perspectives for Big Data application across different areas.

Environment

The environment, in general, can be classified into (a) the micro-environment, the indoor spaces where humans spend most of their time, be it offices, homes or community centers, and (b) the macro-environment, the ecological system. Sensors have long been used for both types of environments, from satellites monitoring global weather changes to household thermostats. There are exciting applications of Big Data analytics in this domain, such as IBM and the University of Alberta's real-time analysis of the environment (video) or Microsoft China's air-quality monitoring in smart buildings (article).

Sensors can minutely record human movements, air quality, light and several other factors. Indoor environmental conditions are closely tied to the human ecosystem: our mental state and wellbeing are closely related to the characteristics of the environment we are exposed to. Workers in factory settings often complain of various health problems due to extreme environments, and workspaces with more natural light and better ventilation have been hypothesized to improve employee satisfaction as well as efficiency. Now imagine collecting data about a person's state and relating it to everything she is exposed to during a typical day in her life. This can be conceptualized as adding the human component to the IoT paradigm, in which devices currently interact only with each other. In such a human-environment interactive system, devices not only interact with each other but also accept and deliver signals to humans in real time.

This opens up the possibility of futuristic applications that use Big Data to assist human well-being. Imagine a thermostat that, in addition to controlling room temperature, senses unusual variability in the ambient temperature around a person. Such a device could be designed to predict whether a person is about to fall ill, based on inputs from medical records and the temporal pattern of body temperature (assuming near-body temperature variability is proportional to inner body temperature). These analytical applications complement applications supported by improving technology, such as alert systems, and they draw on only a subset of the possible Big Data applications for environment data. Anomaly detection, unusual-pattern detection, prediction of health conditions and natural-disaster prediction are top-of-the-mind applications of Big Data analysis for the environment, and they will be driven by rapid progress in environment-sensing technologies as well as data science algorithms. Big Data for the environment and assisted human living is a projected future and an integral part of the datafication revolution that we are currently experiencing.
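
To make the thermostat example concrete, here is a minimal sketch (Python with pandas) of the kind of anomaly detection such a device could run: it flags temperature readings that deviate sharply from a recent rolling baseline. The readings, window size and threshold below are hypothetical illustrations, not a description of any real product.

```python
import numpy as np
import pandas as pd

def flag_temperature_anomalies(readings, window=24, threshold=3.0):
    """Flag readings that deviate from the recent rolling mean by more
    than `threshold` rolling standard deviations (a simple z-score test)."""
    s = pd.Series(readings, dtype=float)
    rolling_mean = s.rolling(window, min_periods=window).mean()
    rolling_std = s.rolling(window, min_periods=window).std()
    z = (s - rolling_mean) / rolling_std
    return z.abs() > threshold

# Hypothetical hourly near-body temperature readings (degrees Celsius):
# two days of normal readings followed by a fever-like spike.
np.random.seed(0)
temps = np.concatenate([np.random.normal(36.8, 0.1, 48),
                        np.random.normal(38.2, 0.3, 6)])

anomalies = flag_temperature_anomalies(temps)
print("Hours flagged as unusual:", list(anomalies[anomalies].index))
```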

Epidemiology

Epidemiology is a branch of medical science closely related to public health. 'Epi' means upon or befall, 'demo' means the people, and 'ology' means the study of; so, literally, epidemiology is the study of what befalls the people. The classic definition of epidemiology is the study of the distribution and determinants of disease frequency in human populations. It predates the broader spectrum of environmental studies on human well-being, and its underlying theories are therefore more mature. Epidemiology is a comparative discipline: by making comparisons among different groups, an epidemiologist can identify causal factors of disease.

Big Data techniques can be used in creative empirical studies in epidemiology. Epidemiological data has high variability, volume and veracity. For example, spatial and temporal information is available for a large range of epidemiological factors. The populations at risk for any health outcome can be broken down into groups using the timestamps and location coordinates in the datasets. As part of developing innovative Big Data solutions, the medical symptoms or concerns among those groups could be collected by data crawlers over the web. Data analysis models comparing these spatio-temporally identified groups could provide evidence for developing new theories. Big Data methodology such as network thinking could be used to integrate different sources of information to explore causes, detect outbreaks, and provide surveillance of epidemic disease. For instance, Google Flu Trends uses the frequency of certain search keywords rather than the survey data provided by the CDC, detecting and tracking influenza outbreaks several weeks faster. Another project, Toronto-based BioDiaspora, models the spread of infection in a different way, using global airline data to predict and track the spread of diseases based on the origins, travel routes, and destinations of commercial flights. These are a few applications demonstrating how Big Data could remodel the way we have traditionally approached epidemiological research problems with experiments and observational studies.
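
As a toy illustration of the spatio-temporal grouping described above, the following sketch (assuming pandas and an entirely hypothetical line-list of case reports) breaks cases into weekly, per-region groups and computes incidence rates that could then be compared across groups.

```python
import pandas as pd

# Entirely hypothetical line-list of reported cases (one row per report).
cases = pd.DataFrame({
    "timestamp": pd.to_datetime(["2015-06-01", "2015-06-03", "2015-06-10",
                                 "2015-06-11", "2015-06-12", "2015-06-20"]),
    "region":    ["north", "north", "south", "south", "south", "north"],
})

# Hypothetical population at risk per region (the denominator).
population = pd.Series({"north": 120_000, "south": 80_000})

# Break cases into weekly, per-region groups and compute incidence rates.
weekly = (cases
          .groupby([pd.Grouper(key="timestamp", freq="W"), "region"])
          .size()
          .rename("cases")
          .reset_index())
weekly["rate_per_100k"] = weekly.apply(
    lambda row: 100_000 * row["cases"] / population[row["region"]], axis=1)

# Groups with unusually high rates are candidates for closer investigation.
print(weekly)
```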

However, there are some concerns about the use of Big Data among epidemiologists. In his speech "Big Data in Public Health Research: Will it Kill Epidemiology?", Dr. Antoine Flahault presented several challenges and threats that epidemiology would face with the introduction of big data in research [2]. The most concerning were the ethical issues around confidentiality and privacy, and data quality issues arising from the fact that the data was not originally collected for epidemiological research. Nevertheless, Big Data does help in improving the quality and efficiency of the healthcare system, creating new job demand in computer science, mathematics, and public health, and inventing new paradigms in public health research. Using Big Data in epidemiological research and applications is an emerging trend. Let's wait and see what happens!

Government / Economy

In today's world it is very hard to find good sources of data. However, the government sector is a vast yet overlooked and underutilized source of granular data, and economists have been sophisticated data users for a long time. Big Data techniques can be used to analyze the large administrative datasets collected from sources like healthcare, finance (tax), insurance and census records. The patterns and findings discovered from this analysis can be used to form or alter economic policies and improve government operations.

John Wennberg and colleagues at Dartmouth analyzed large samples of Medicare claims to discover that Medicare spending per enrollee does not depend on health status or prices and is not correlated with measured health outcomes. This research was pivotal in the debate around the Affordable Care Act and has become leading evidence of inefficiency in the US healthcare system. In a similar way, big data and predictive modeling can be used to improve the targeting of government services. Imagine a Medicare system where every individual has a healthcare score based on their likely response to a treatment, and a claims policy that covers the treatment only if this score exceeds a particular threshold.
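
A minimal sketch of what such threshold-based targeting might look like is shown below; the enrollee attributes, weights and threshold are made up for illustration, whereas a real system would estimate them from historical claims data.

```python
from dataclasses import dataclass

@dataclass
class Enrollee:
    age: int
    prior_conditions: int
    adherence: float  # share of past prescriptions filled, 0..1

def treatment_response_score(e: Enrollee) -> float:
    """Toy score in [0, 1] for how likely an enrollee is to benefit from a
    given treatment. The weights are invented for illustration; a real
    system would fit them from historical claims data."""
    score = 0.5
    score += 0.3 * e.adherence
    score -= 0.004 * max(e.age - 65, 0)
    score -= 0.05 * e.prior_conditions
    return min(max(score, 0.0), 1.0)

THRESHOLD = 0.6  # hypothetical coverage cut-off

def covers_treatment(e: Enrollee) -> bool:
    """Cover the treatment only if the predicted score clears the threshold."""
    return treatment_response_score(e) >= THRESHOLD

print(covers_treatment(Enrollee(age=70, prior_conditions=1, adherence=0.9)))
```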

These large-scale administrative data sets allow better measurement of economic effects and outcomes. Chetty, Friedman, and Rockoff conducted an interesting study in 2011 on the long-term effects of better teachers. They analyzed the records of 2.5 million New York City schoolchildren and their earnings 20 years later. The aim of the study was to check whether a teacher's "value added" had a lifelong impact on the earnings of their students, where value added was measured by the improvement in students' test scores. The study produced a striking result: replacing a teacher in the bottom 5% with an average teacher raises the lifetime earnings of a classroom of students by about a quarter of a million dollars ($250,000) in present-value terms. This is just one of the many real-world case studies mentioned in "The Data Revolution and Economic Analysis" by Liran Einav and Jonathan Levin [1]. Imagine the endless possibilities and changes in economic policies if more government data were made available to researchers. Considering these examples, one can imagine governments in the not-so-distant future being run by analysts exploring OpenData.
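
As a rough illustration of how "value added" can be measured from test scores, the sketch below computes each teacher's average student gain relative to the overall average gain; actual value-added estimators, including the one in the Chetty, Friedman, and Rockoff study, control for many more student and classroom characteristics, and the data here is invented.

```python
import pandas as pd

# Hypothetical student records: prior-year and current-year test scores.
scores = pd.DataFrame({
    "teacher": ["A", "A", "A", "B", "B", "B"],
    "prior":   [62, 70, 55, 64, 71, 58],
    "current": [70, 75, 63, 65, 70, 60],
})

# Simple value-added proxy: a teacher's average student gain relative to
# the average gain across all classes.
scores["gain"] = scores["current"] - scores["prior"]
overall_gain = scores["gain"].mean()
value_added = scores.groupby("teacher")["gain"].mean() - overall_gain

print(value_added.sort_values(ascending=False))
```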

Higher Education Analytics

There is growing interest in Big Data in the field of higher education. On the administrative side, colleges and universities are harnessing the power of Big Data and predictive analytics to improve student performance, increase institutional effectiveness, and launch the online college experience. We have also recently seen a trend of using Big Data across numerous student engagement points of the college experience, from recruitment all the way to understanding alumni giving (see Schmarzo's blog post on this topic).

There has also been a recent surge in the use of data science methodologies to launch, expand, and improve online education. Online education changes postsecondary education's core function of teaching by restructuring the access to and delivery of courses. It is a radical innovation that disruptively departs from the existing practices and processes of traditional face-to-face higher education, and to this day there are large debates about whether online education is as effective as traditional on-campus schooling. While this debate is likely to persist for many years to come, the vast amount of user-interaction data that online education generates has given universities and colleges the ability to improve the online learning experience in real time. From measuring how long students take to complete an exam to identifying the topics that students are struggling with, online education is producing a vast amount of data that was previously unavailable to those studying education - data that, if harnessed, has the potential to better deliver curricula, allow institutions to personalize a student's educational experience, and improve student learning. See Big Data and Online Education for more on the possibility of applying big data techniques to online educational models as a way to refine the learning experience of online students.
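
As a small illustration of the analyses mentioned above (time spent on questions, topics students struggle with), here is a minimal sketch that surfaces "struggle" candidates from online-course logs; the log schema and cut-offs are hypothetical.

```python
import pandas as pd

# Hypothetical per-question log from an online course platform.
attempts = pd.DataFrame({
    "student": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "topic":   ["recursion", "loops", "recursion", "loops", "recursion", "loops"],
    "correct": [0, 1, 0, 1, 1, 1],
    "seconds": [240, 60, 300, 75, 180, 50],
})

# Summarize accuracy and time spent per topic, then flag topics where
# students are both slow and frequently wrong as "struggle" candidates.
summary = attempts.groupby("topic").agg(
    accuracy=("correct", "mean"),
    median_seconds=("seconds", "median"),
)
struggling = summary[(summary["accuracy"] < 0.6) & (summary["median_seconds"] > 120)]
print(struggling)
```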

Manufacturing

The manufacturing industry was oblivious to data for some time, since the focus was on process optimization from an operations standpoint. Now, however, a paradigm shift is happening in the manufacturing industry too, and data is considered an asset. One such example is the chip manufacturing giant Intel, which uses Big Data in chip validation. Validation involves extensive testing of the chip design, during which hundreds of sensors continuously collect data. This huge amount of structured and unstructured data is used by Intel to optimize the design process and the time-to-production/time-to-market.

To delve a little into detail, there are no ideal rules defining when a chip should be launched in the market. If a chip is under-tested it will ship with bugs, and if it is excessively tested it will be delayed to market and the company might lose its edge. By using sensor data on the physical and logical state of a processor, engineers can understand how the testing tools are performing. Big Data analysis can help the debug process by clustering defects and performing root-cause analysis on the massive amount of historical sensor data. These insights give a better idea of how to improve the design and the testing process of a chip.
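
A minimal sketch of the defect-clustering idea is shown below, using scikit-learn's k-means on a few made-up test-log features; the feature set and data are hypothetical, not a description of Intel's actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical features extracted from validation test logs for failing
# units: [core voltage at failure (V), die temperature (C), failing stage].
failures = np.array([
    [1.05, 78, 3], [1.04, 80, 3], [1.06, 79, 3],   # one apparent failure mode
    [0.92, 95, 7], [0.91, 97, 7], [0.93, 96, 7],   # a second failure mode
])

# Cluster failing units so each cluster (a recurring failure signature)
# can be handed to a debug team for root-cause analysis.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(failures)
for cluster_id in range(2):
    members = np.where(model.labels_ == cluster_id)[0]
    print(f"cluster {cluster_id}: failing units {members.tolist()}")
```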

Aviation

Aviation is one of the few domains that was dealing with big data even before the term 'Big Data' gained traction, thanks to the tons of data collected from sensors fitted on aircraft and the sheer volume of flights and passengers. What has changed in the last decade or so is how the data is being used. From improving operational efficiency to attracting more customers, big data is used in a big way by aerospace companies and airlines. For example, Boeing uses the ecoDemonstrator Program to harness big data by testing ways to use data to save fuel and flying time. Southwest Airlines analyzes passenger traffic to determine which services to offer on specific routes to attract more customers.



According to an article based on an analysis by the International Air Transport Association (IATA), the leading cause of flight delays is airline-controlled processes like maintenance. For every hour that an aircraft is grounded, the airline stands to lose an average of $10,000. With such large monetary repercussions at stake, it is all the more important for airlines to prevent the need for unscheduled maintenance. The amount of data generated by the sensors on an aircraft is staggering: the super-sized Airbus A380-1000, for example, is fitted with 10,000 sensors on each of its wings, and a Boeing 737 generates 20 terabytes of engine information every hour. By collecting in-flight aircraft information and relaying it to maintenance personnel on the ground, maintenance crews can be ready with the parts and information to quickly make any necessary repairs when the plane arrives at the gate. The data from the sensors also helps identify recurring faults and trends so that future maintenance can be planned proactively. Pilots, too, can use insights from satellites, weather sensors and ground data to make real-time decisions that save fuel and improve safety.
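
As a small illustration of spotting recurring faults from relayed fault messages, here is a sketch in Python; the fault codes, tail numbers and threshold are invented for illustration.

```python
from collections import Counter

# Hypothetical in-flight fault messages relayed to the ground:
# (tail number, fault code), one entry per message.
fault_log = [
    ("N101", "ENG2_OIL_PRESS"), ("N101", "ENG2_OIL_PRESS"),
    ("N101", "APU_TEMP"),       ("N202", "ENG1_VIBRATION"),
    ("N101", "ENG2_OIL_PRESS"), ("N202", "ENG1_VIBRATION"),
]

RECURRENCE_THRESHOLD = 3  # invented cut-off for calling a fault "recurring"

counts = Counter(fault_log)
recurring = {key: n for key, n in counts.items() if n >= RECURRENCE_THRESHOLD}

# Maintenance crews can pre-stage parts for these before the plane reaches the gate.
for (tail, code), n in recurring.items():
    print(f"{tail}: {code} reported {n} times -> schedule proactive maintenance")
```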

Aviation companies have always wanted to make better real-time decisions based on insights from information collected. The recent progress in big data technologies and tools has empowered them to better process the information collected and make smarter fact-driven decisions.

References

1. Liran Einav and Jonathan Levin, "The Data Revolution and Economic Analysis," Stanford University and NBER.
