Thursday, May 31, 2012

Please do not take weights lightly in the Big Data world

Don't forget the weights if you want to play in the Big Data jungle. I have not seen the topic of "weights" given enough weight by many experts and practitioners, or by the Big Data companies. As I comb through the literature and attend talks and panels on Big Data, the topic seems to be either ignored or simply not mentioned.

Weights could be used at every step: data collection and storage, structuring or semi-structuring, data migration, consolidation and synchronization, Map-Reduce type processing, and the creation of various kinds of metadata. Most importantly, they belong in data analysis with data science and business intelligence techniques, and surely in the data visualization stage.
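To make the Map-Reduce point concrete, here is a minimal Python sketch of a weighted mean computed in separate map and reduce steps. The record layout, keys, weights and function names are all hypothetical, and positive weights are assumed. The map step emits (weight times value, weight) pairs and the reduce step adds them component-wise, so partial results from different partitions can be merged before the final division.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical records: (key, value, weight). The weights are illustrative
# assumptions, e.g. how much each record is trusted relative to the others.
records = [
    ("sales", 100.0, 1.0),
    ("sales", 200.0, 3.0),
    ("visits", 50.0, 2.0),
    ("visits", 70.0, 2.0),
]


def map_phase(record):
    """Map step: emit (key, (weight * value, weight)) for one record."""
    key, value, weight = record
    return key, (weight * value, weight)


def combine(a, b):
    """Reduce step: add weighted sums and weight totals component-wise.

    The operation is associative, so partial sums from different partitions
    can be merged in any order before the final division.
    """
    return a[0] + b[0], a[1] + b[1]


def weighted_means(rows):
    """Group mapped pairs by key, reduce each group, then divide."""
    grouped = defaultdict(list)
    for key, pair in map(map_phase, rows):
        grouped[key].append(pair)
    means = {}
    for key, pairs in grouped.items():
        weighted_sum, weight_total = reduce(combine, pairs)
        if weight_total <= 0:
            raise ValueError(f"non-positive total weight for key {key!r}")
        means[key] = weighted_sum / weight_total
    return means


print(weighted_means(records))
```

Here the weighted mean for `sales` is 175.0, whereas treating both records equally would give 150.0: the same data, a different answer once weights are taken seriously.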

Weights are very important because not all data are born and processed equally. In a way, the Big Data world has followed a "data democracy" paradigm, in which every piece of data counts the same. I believe in democracy for people, even though not all parts of the world believe in or practice 'one person, one vote'. Big Data practitioners should wake up and start using weights as soon as possible.

Some of you may argue that the concept of weights is basically subjective, and I agree. However, the practice of weighting is important for the following reasons:


  • Just by assigning different weights to different sources and types of data, we recognize that not all data are born or processed equally.
  • Different data have different origins, contexts and utility, and different value in the aggregation value chain or in the process of combining.
  • The moment we start thinking of the weights associated with the data, we will consciously try to understand their weighted value in the analysis process.
  • The weights do not have to be fixed. They could be calculated based on a variety of contributing factors.
  • The weights, the weight calculation formulas, and the formulas for the different levels of aggregation and combination could be refined as our understanding of them improves.
  • We could use various iterative, recursive, correlation, confidence-level and statistical-distribution techniques. You may contact me for a detailed discussion.

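As one possible reading of the last three points, here is a Python sketch in which each source's weight is calculated from two contributing factors and then refined iteratively. Everything here is an assumption for illustration: the source names, the `reliability` and `age_days` fields, the seven-day half-life, the `scale` parameter, and the choice of Cauchy-style iteratively reweighted averaging, which is a standard robust-statistics technique rather than a method prescribed in this post.

```python
# Hypothetical sources that report the same metric. "reliability" (0 to 1)
# and "age_days" stand in for contributing factors; both are assumptions.
SOURCES = {
    "crm": {"value": 102.0, "reliability": 0.9, "age_days": 1},
    "web_log": {"value": 98.0, "reliability": 0.7, "age_days": 0},
    "survey": {"value": 130.0, "reliability": 0.4, "age_days": 30},
}


def prior_weight(src, half_life_days=7.0):
    """Reliability multiplied by a recency decay that halves every half_life_days."""
    return src["reliability"] * 0.5 ** (src["age_days"] / half_life_days)


def weighted_mean(values, weights):
    """Plain weighted average; weights must sum to a positive number."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def refine(sources, scale=5.0, iterations=10):
    """Iteratively reweighted mean with Cauchy-style weights.

    Each round multiplies every source's prior weight by 1 / (1 + (r / scale)**2),
    where r is the source's distance from the current estimate, so sources that
    disagree with the consensus lose influence. Returns the final estimate and
    the normalized weight (share) of each source.
    """
    names = list(sources)
    values = [sources[n]["value"] for n in names]
    priors = [prior_weight(sources[n]) for n in names]
    weights = priors[:]
    estimate = weighted_mean(values, weights)
    for _ in range(iterations):
        weights = [
            p / (1.0 + ((v - estimate) / scale) ** 2)
            for p, v in zip(priors, values)
        ]
        estimate = weighted_mean(values, weights)
    total = sum(weights)
    return estimate, {n: w / total for n, w in zip(names, weights)}


if __name__ == "__main__":
    estimate, shares = refine(SOURCES)
    print(round(estimate, 2), {k: round(v, 3) for k, v in shares.items()})
```

Because the survey is both old and far from the other two sources, it ends up with almost no influence; changing `scale` or `half_life_days` changes how aggressively that happens.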
As soon as you get up in the morning, you should lift some weights to be a Big Data champion.
