1 Big Data Analytics | Sem 8 | facebook mail instagram - mr.samrattayade | twitter linkedin youtube -samrattayade

Data Mining

Data mining is the process of extracting usable information from a larger set of raw data. It is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events.

Database System

A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex, they are often developed using formal design and modeling techniques.

Algorithms

In mathematics and computer science, an algorithm is a finite sequence of well-defined, computer-implementable instructions, typically to solve a class of problems or to perform a computation. In other words, an algorithm is a step-by-step procedure for solving logical and mathematical problems.
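
As a small illustration of a finite, well-defined sequence of instructions, here is a minimal sketch of Euclid's algorithm for the greatest common divisor. The function name `gcd` and the sample inputs are chosen for this example only.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    # Each pass replaces (a, b) with (b, a mod b). The second value
    # strictly decreases, so the loop ends after finitely many steps.
    while b != 0:
        a, b = b, a % b
    return a


print(gcd(48, 18))  # 6
```

Every step is unambiguous and the procedure always terminates, which is what separates an algorithm from a loose description of a method.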

INTRODUCTION TO BIG DATA ANALYTICS :

Introduction to Big Data :

Big data refers to the massive datasets that are collected from a variety of data sources for business needs, to reveal new insights for optimized decision making. According to IBM sources, e-business and consumer life create 2.5 exabytes (1 exabyte = 10^18 bytes) of data per day. It was predicted that 8 zettabytes (1 zettabyte = 10^21 bytes) of data would be produced by 2015, and that 90% of it would come from the preceding 5 years. These data have to be stored and analyzed to reveal hidden correlations and patterns; this analysis is termed Big Data Analytics.

Big data is also the field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. An example of big data might be petabytes (1 petabyte = 1,024 terabytes) or exabytes (1 exabyte = 1,024 petabytes) of data consisting of billions to trillions of records of millions of people, all from different sources (e.g. Web, sales, customer contact center, social media, mobile data and so on).
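
To get a feel for these magnitudes, the short sketch below converts the quoted daily volume into a per-second rate. It assumes the 2.5 exabytes per day figure is decimal (powers of 10), while the 1,024-based unit definitions are binary; the text uses both conventions, and all names here are illustrative.

```python
# Binary storage units, matching "1 petabyte = 1,024 terabytes" in the text.
TB = 1024 ** 4
PB = 1024 * TB
EB = 1024 * PB

# Decimal figure quoted in the text: 2.5 exabytes = 2.5 * 10^18 bytes per day.
DAILY_BYTES = 2.5e18
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(daily_bytes: float) -> float:
    """Average byte rate implied by a daily volume."""
    return daily_bytes / SECONDS_PER_DAY


rate_tb = bytes_per_second(DAILY_BYTES) / 1e12  # decimal terabytes per second
print(f"{rate_tb:.1f} TB of new data per second")
print(f"{DAILY_BYTES / EB:.2f} binary exabytes per day")
```

The gap between the decimal and binary definitions grows with each unit step, so at exabyte scale it is already around 15%.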

Big Data Characteristics

Big data has five major characteristics: Volume, Velocity, Variety, Veracity and Value.
The first three (Volume, Velocity and Variety) are commonly called the 3Vs.

1) Volume
Volume refers to the huge amounts of data that are generated on a daily basis from various sources such as social media platforms, business processes, machines and networks.

2) Variety
Variety refers to the structured, unstructured and semi-structured data that is gathered from multiple sources.

3) Velocity
Velocity refers to the speed at which data is being created in real time.

Types Of Big Data

1) Structured data
Structured data conforms to a data model, has a well-defined structure, follows a consistent order and can be easily accessed and used by a person or a computer program.
Structured data is usually stored in well-defined schemas such as databases. It is generally tabular, with columns and rows that clearly define its attributes.
SQL (Structured Query Language) is often used to manage structured data stored in databases.

2) Unstructured data
Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner, so it has no easily identifiable structure and cannot easily be used by a computer program. It is typically text-heavy, but may also contain data such as dates, numbers and facts.
For the same reason, it is not a good fit for a mainstream relational database.

3) Semi-structured data
Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database (although this can be very hard for some kinds of semi-structured data), but the semi-structured form exists to save space. Example: XML data.
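
The three types can be contrasted with a minimal sketch using only the Python standard library. The table, XML document and review text below are made-up sample data, and the regular expression is just one simple way to pull a date out of free text.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# 1) Structured: rows conform to a fixed schema and are queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "asha", 250.0), (2, "ravi", 99.5), (3, "asha", 40.0)],
)
total_for_asha = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("asha",)
).fetchone()[0]

# 2) Semi-structured: tags give some organization, but records may differ
#    (the second order has a <note> and no <amount>).
xml_doc = """<orders>
  <order id="1"><customer>asha</customer><amount>250.0</amount></order>
  <order id="2"><customer>ravi</customer><note>gift wrap</note></order>
</orders>"""
root = ET.fromstring(xml_doc)
customers = [order.findtext("customer") for order in root.findall("order")]
amounts = [order.findtext("amount") for order in root.findall("order")]

# 3) Unstructured: free text with no data model; facts must be extracted.
review = "Ordered on 2021-03-14, arrived late. Paid 250 rupees, not happy."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", review)

print(total_for_asha)  # 290.0
print(customers, amounts)
print(dates)
```

The missing `<amount>` comes back as `None`. This is the kind of irregularity a fixed relational schema would have to handle with a nullable column or a separate table.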

Traditional Vs Big Data Business Approach :

Traditional data management and analytics store structured data in data marts and data warehouses. Traditional data management could also handle a huge volume of transactions, but only up to an extent; for example, billions of credit-card transactions worldwide could be handled, but not petabytes or zettabytes of data, and certainly not in a variety of formats.

The most important problem for a business organization is to understand the true customer experience. Business organizations have multiple customer inputs, including transactional systems, customer service (call centers), web sites, online chat services, retail stores and partner services. Customers may use one or many of these systems. What is necessary is to find the overall customer experience, that is, to understand the combined effects of all these systems.

Traditional Data Warehouse Approach

Hundreds of these systems are distributed throughout the organization and its partners. Each of these systems has its own silo of data, and many of these silos may contain information about customer experience that is required for making business decisions. The traditional data warehouse approach requires extensive data definitions for each of these systems and vast transfers of data between them. Many of the data sources do not use the same definitions. Copying all the data from each of these systems to a centralized location and keeping it updated is not an easy task. Moreover, sampling the data will not serve the purpose of extracting the required information. The objective of big data is to construct a customer experience view over a period of time from all the events that took place. Implementing such a project takes at least 1 year with traditional systems.

Big Data Approach

The alternative is the big data approach. Many IT tools are available for big data projects. The storage requirements of big data are taken care of by the Hadoop cluster. Apache Spark is capable of stream processing (e.g., advertisement data). When used, these tools can dramatically reduce the time-to-value, in most cases from more than 2 years to less than 4 months. The benefit is that many speculative projects can be approved or abandoned based on the result.

Organizations whose data workloads are constant and predictable are better served by the traditional database, whereas organizations challenged by increasing data demands will want to take advantage of Hadoop’s scalable infrastructure. Scalability allows servers to be added on demand to accommodate growing workloads. Hybrid systems, which integrate Hadoop platforms with traditional (relational) databases, are gaining popularity as cost-effective systems that let organizations leverage the benefits of both platforms.
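
The idea of a customer experience view built from all events across silos can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the three silos, their field names (`cust_id`, `customer`, `user` and so on) and the sample records are hypothetical, and a real project would run the same normalize-then-merge step on a cluster rather than in memory.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical exports from three silos; each uses its own field names.
transactions = [{"cust_id": "C1", "ts": "2021-03-01T10:00:00", "amount": 250.0}]
call_center = [{"customer": "C1", "time": "2021-03-03T09:30:00", "topic": "late delivery"}]
web_chat = [
    {"user": "C1", "when": "2021-03-02T18:45:00", "message": "where is my order?"},
    {"user": "C2", "when": "2021-03-02T19:00:00", "message": "return policy?"},
]


def normalize(records, source, id_key, time_key):
    """Map one silo's records onto a shared (customer, time, source, details) shape."""
    for rec in records:
        details = {k: v for k, v in rec.items() if k not in (id_key, time_key)}
        yield {
            "customer": rec[id_key],
            "time": datetime.fromisoformat(rec[time_key]),
            "source": source,
            "details": details,
        }


def build_timelines(*event_streams):
    """Group normalized events by customer and sort each group by time."""
    timelines = defaultdict(list)
    for stream in event_streams:
        for event in stream:
            timelines[event["customer"]].append(event)
    for events in timelines.values():
        events.sort(key=lambda e: e["time"])
    return dict(timelines)


timelines = build_timelines(
    normalize(transactions, "transactions", "cust_id", "ts"),
    normalize(call_center, "call_center", "customer", "time"),
    normalize(web_chat, "web_chat", "user", "when"),
)
for event in timelines["C1"]:
    print(event["time"].isoformat(), event["source"], event["details"])
```

The per-silo `normalize` call is where the differing data definitions are reconciled. That is the part the text describes as expensive in the warehouse approach.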

Big Data Challenges

Some of the Big Data challenges are:
  1. Sharing and Accessing Data:
    • Perhaps the most frequent challenge in big data efforts is the inaccessibility of data sets from external sources.
    • Sharing data can cause substantial challenges.
    • It includes the need for inter- and intra-institutional legal documents.
    • Accessing data from public repositories leads to multiple difficulties.
    • Data must be available in an accurate, complete and timely manner if the data in a company's information system is to be used to make accurate decisions in time.
  2. Privacy and Security:
    • Privacy and security is another major challenge with Big Data. It has sensitive, conceptual, technical and legal dimensions.
    • Most organizations are unable to maintain regular checks because of the large amounts of data generated. However, security checks and observation should be performed in real time, where they are most beneficial.
    • Some information about a person, when combined with large external data sets, may reveal facts that the person considers private and would not want the data owner to know.
    • Some organizations collect information about people in order to add value to their business. They do this by deriving insights into people's lives that those people are unaware of.
  3. Analytical Challenges:
    • Big data raises some major analytical questions, such as: how should a problem be handled if the data volume gets too large?
    • Or how to find out the important data points?
    • Or how to use data to the best advantage?
    • The large amounts of data to be analyzed can be structured (organized data), semi-structured (semi-organized data) or unstructured (unorganized data). There are two approaches to decision making:
      • Either incorporate massive data volumes in the analysis.
      • Or determine upfront which Big data is relevant.
  4. Technical challenges:
    • Quality of data:
      • Collecting and storing a large amount of data comes at a cost. Big companies, business leaders and IT leaders always want large data storage.
      • For better results and conclusions, Big Data focuses on storing quality data rather than irrelevant data.
      • This raises further questions: how can it be ensured that the data is relevant, how much data is enough for decision making, and is the stored data accurate?
    • Fault tolerance:
      • Fault tolerance is another technical challenge; fault-tolerant computing is extremely hard and involves intricate algorithms.
      • Newer technologies such as cloud computing and big data are designed so that, whenever a failure occurs, the damage stays within an acceptable threshold; that is, the whole task should not have to begin again from scratch.
    • Scalability:
      • Big data projects can grow and evolve rapidly. The scalability issue of Big Data has led towards cloud computing.
      • This leads to various challenges, such as how to run and execute various jobs so that the goal of each workload is achieved cost-effectively.
      • It also requires dealing with system failures efficiently. This again raises the big question of what kinds of storage devices should be used.
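
The fault-tolerance point above, that a failed task should not have to begin again from scratch, is commonly addressed with checkpointing. The following is a minimal single-machine sketch, not how any particular big data framework implements it; the function name, the JSON checkpoint format and the `fail_at` parameter used to simulate a crash are all assumptions made for illustration.

```python
import json
import os
import tempfile


def process_with_checkpoint(items, checkpoint_path, fail_at=None):
    """Sum `items`, saving progress after each one so a rerun can resume."""
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # A previous run left progress behind: resume from it.
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at item {i}")
        state["total"] += items[i]
        state["next_index"] = i + 1
        # Write to a temporary file and rename it, so a crash in the
        # middle of a write cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]


items = list(range(1, 11))
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    try:
        process_with_checkpoint(items, path, fail_at=6)  # first run crashes
    except RuntimeError:
        pass
    total = process_with_checkpoint(items, path)  # second run resumes at item 6

print(total)  # 55
```

Only the four items after the failure point are reprocessed in the second run. The trade-off is extra I/O on every step, which real systems reduce by checkpointing less often.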



Big Data Real-Time Examples

1) Shopping Sites
2) Food Delivery Sites
3) Entertainment


Big Data Applications

1) Healthcare
Big Data has already started to make a huge difference in the healthcare sector. With the help of predictive analytics, medical professionals and healthcare providers (HCPs) are now able to provide personalized healthcare services to individual patients. Apart from that, fitness wearables, telemedicine and remote monitoring, all powered by Big Data and AI, are helping change lives for the better.
2) Academia
Big Data is also helping enhance education today. Education is no longer limited to the physical bounds of the classroom; there are numerous online educational courses to learn from. Academic institutions are investing in digital courses powered by Big Data technologies to aid the all-round development of budding learners.
3) Banking
The banking sector relies on Big Data for fraud detection. Big Data tools can efficiently detect fraudulent acts in real time, such as misuse of credit/debit cards, archival of inspection tracks, faulty alteration in customer stats, etc.
4) Manufacturing
According to the TCS Global Trend Study, the most significant benefit of Big Data in manufacturing is improving supply strategies and product quality. In the manufacturing sector, Big Data helps create a transparent infrastructure, which makes it possible to predict uncertainties and incompetencies that can affect the business adversely.
5) IT
IT companies around the world are among the largest users of Big Data, using it to optimize their functioning, enhance employee productivity and minimize risks in business operations. By combining Big Data technologies with ML and AI, the IT sector is continually powering innovation to find solutions even for the most complex of problems.


Website / Blog : samrattayade.com : samrattayade.blogspot.com
