
Data is the New Oil of Industry?

Let's go back to the 18th century, when industrial development was taking its first steps. Oil came to be seen as the lifeblood of the Industrial Revolution and was the most valuable asset of that era. Now let's come back to the present: in the 21st century, data is widely called the foundation of the information revolution. But the question that arises is, why are we really calling data the new oil? Let's find out.


Now let's compare data vs. oil:

  1. Data is an essential resource that powers the information economy in much the way that oil has fueled the industrial economy.
  2. Once upon a time, the wealthiest were those with the most natural resources; now it’s a knowledge economy, where how much you know is proportional to how much data you have.
  3. Information can be extracted from data just as energy can be extracted from oil.
  4. Traditional oil powered the transportation era; in the same way, data as the new oil is powering emerging transportation options like driverless cars and the hyperloop (1,200 km/h), which are based on an advanced synthesis of data in the form of algorithms and cognitive knowledge, without the use of fossil fuels.
  5. Traditional oil is finite; data availability seems infinite.
  6. Data flows like oil, but we must “drill down” into data to extract value from it. Data promises a plethora of new uses (diagnosis of diseases, direction of traffic patterns, etc.) just as oil has produced useful plastics, petrochemicals, lubricants, gasoline, and home heating. A toy sketch of this “refining” follows the list.
  7. Oil is a scarce resource. Data isn’t just abundant; it is a cumulative resource.
  8. If oil is being used, the same oil cannot be used somewhere else, because it’s a rival good. This results in a natural tension over who controls oil. If data is being used, the same data can be used elsewhere, because it’s a non-rival good.
  9. As a tangible product, oil faces high friction, transportation, and storage costs. As an intangible product, data has much lower friction, transportation, and storage costs.
  10. The life cycle of oil is defined by process: extraction, refining, distribution. The life cycle of data is defined by relationships: with other data, with context, and with itself via feedback loops.
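To make item 6 concrete, here is a minimal Python sketch of what “drilling down” into data can look like: messy raw records (the “crude”) are filtered and aggregated (the “refining”) until usable information comes out, in this case average traffic speed per city. The records, city names, and numbers are all made up for illustration.

```python
from statistics import mean

# Hypothetical raw trip records: the "crude oil". One reading is missing,
# which is typical of the noise found in freshly "extracted" data.
raw_trips = [
    {"city": "Pune",   "speed_kmh": 42},
    {"city": "Pune",   "speed_kmh": None},  # broken sensor reading
    {"city": "Mumbai", "speed_kmh": 35},
    {"city": "Mumbai", "speed_kmh": 28},
    {"city": "Pune",   "speed_kmh": 51},
]

# Refining step 1: drop records we cannot use.
clean = [trip for trip in raw_trips if trip["speed_kmh"] is not None]

# Refining step 2: aggregate raw readings into information,
# here the average speed per city (a crude traffic-pattern signal).
avg_speed = {
    city: mean(t["speed_kmh"] for t in clean if t["city"] == city)
    for city in {t["city"] for t in clean}
}

print(avg_speed)  # e.g. {'Pune': 46.5, 'Mumbai': 31.5}
```

The same pipeline shape (extract, clean, aggregate) scales up to real tools like SQL or pandas; only the plumbing changes.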
Data is valuable and can be ‘mined’ and refined, like oil. But there are many ways in which the analogy breaks down:
  • Oil is a finite resource that we are drawing down. Data is growing at an exponential rate.
  • Oil is consumed when it is used. Data is not. We can make copies of data.
  • Oil is stored physically and is not easily replicable. Data is stored digitally and is readily replicated.
  • Oil is a commodity. Data is highly context dependent.
There are lots of other analogies for data as well. For example:
  1. Data is like currency (a medium of exchange, as when we trade our data for ‘free’ services)
  2. Data is like water (abundant and essential for our survival, but requiring cleaning)
  3. Data is a weapon (dormant, but with the potential to cause harm)
However, each of these analogies shows only some aspects of data while editing out the others. Ultimately, all analogies break down, and it may be futile to look for a single phrase that captures the multi-faceted nature of data.
As for me, this is subjective; everyone has their own explanation 😉

Happy Learning...!!

