I recently read Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz. It opens up a chapter in data science that we tend to overlook over time. It simplifies the complex subject of big data in a way that is easy for everyone to understand, not only subject matter experts. It also illustrates how things change over time: a variable that once held true may no longer be applicable for data analysis or better decision making.
Data – facts and statistics collected together for reference or analysis (Oxford Dictionary). Data can also be used to forecast things like stock prices, yield growth, investment returns, etc. However, data was not widely available in the past; there were no computers, no internet, no landlines, or anything of the sort. In the information age, there is an abundance of data available to access, since more and more people spend time online.
Companies that love data
Data is heavily used in tech companies like Facebook, Google, Twitter, Netflix, and Instagram, to name a few. It is popular there because of the huge number of clicks, likes, shares, searches, retweets, bookmarks, etc. Each activity is recorded in a database as primary data for further analysis. With the help of data analysis, those companies can select the right ads, posts, and search results for specific users. This is how the term “Big Data” began to gain momentum, especially in the tech field.
What is Big Data?
According to Seth Stephens-Davidowitz, Big Data has four characteristics: new types of data, honest data, the ability to zoom in on small subsets of people, and causal experiments.
New Type of Data – traditionally, data was collected through surveys and manual data entry, which is usually slow and inaccurate. The internet and technology have made new kinds of data available, for example the likes and unlikes on other people's posts, or the number of searches done on a particular topic. This data is collected around the clock for every move made on the web or on mobile. Other technologies let us scan the organs of living animals to examine how one animal differs from another.
Honest Data – because the internet is instantly available and the interaction is private, users tend to enter honest data about themselves. There is no one else there; it’s just you and the internet. That takes away the insecurity of being judged by others.
Zoom In on Small Subsets of People – data analysis has gotten easier and often needs just a few lines of code. Data scientists can zoom in on a database based on race, gender, age, occupation, and so on to narrow their study. That gives a huge advantage when comparing and contrasting groups to figure out whether there is a correlation or pattern.
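To make "a few lines of code" concrete, here is a minimal sketch of zooming in on a subset, using plain Python. The records, the field names (age, gender, occupation, clicked_ad), and the helper functions are all made up for illustration; they are not from the book or from any real data set.

```python
from collections import defaultdict

# Hypothetical records: every field and value here is invented for illustration.
records = [
    {"age": 23, "gender": "F", "occupation": "student", "clicked_ad": True},
    {"age": 31, "gender": "M", "occupation": "engineer", "clicked_ad": False},
    {"age": 27, "gender": "F", "occupation": "engineer", "clicked_ad": True},
    {"age": 45, "gender": "M", "occupation": "teacher", "clicked_ad": False},
    {"age": 36, "gender": "F", "occupation": "teacher", "clicked_ad": False},
    {"age": 29, "gender": "M", "occupation": "student", "clicked_ad": True},
]


def zoom_in(rows, **criteria):
    """Keep only the rows whose fields equal every given criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def click_rate(rows):
    """Share of rows where clicked_ad is True; None when the subset is empty."""
    if not rows:
        return None
    return sum(r["clicked_ad"] for r in rows) / len(rows)


def click_rate_by(rows, field):
    """Split the rows by one field and compute the click rate of each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[field]].append(r)
    return {value: click_rate(group) for value, group in groups.items()}


if __name__ == "__main__":
    women = zoom_in(records, gender="F")
    print(len(women), click_rate(women))
    print(click_rate_by(records, "occupation"))
```

With real data the subsets would be far larger, and a difference between groups would still only be a pattern to investigate, not proof of a cause.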
Causal Experiments – because large data sets are readily available, data scientists can run many causal experiments to find out whether a correlation reflects a real cause. The author talked about racehorse auctions: the famous buyer doesn't look at just one piece of data (breed) but also at other data, such as organs, to find out whether there is a correlation that distinguishes a good horse from the best horse. Another example is a marketing campaign or, more precisely, A/B testing.
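As a rough illustration of A/B testing, the sketch below compares the conversion rates of two versions of a campaign with a two-proportion z-test. This is one common way to evaluate such a test, not necessarily what any company mentioned here does; the function name ab_test and the visitor and conversion counts are assumptions made up for the example.

```python
import math


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test on conversion counts.

    conv_a, conv_b: number of conversions in groups A and B.
    n_a, n_b: number of users shown version A and version B.
    Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the normal distribution: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Hypothetical counts, invented for illustration only.
    rate_a, rate_b, z, p = ab_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
    print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p:.3f}")
```

A small p-value suggests the difference between the two versions is unlikely to be chance alone. Because users are assigned to A or B at random, a difference can be read as an effect of the change itself, which is what makes the experiment causal rather than merely correlational.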