Chapter 5 Business Analytics at the Data Warehouse Level
During the last couple of years, a lot of changes have happened at the data warehouse level, and
we can expect many more changes in the future. One of the major changes is captured by the
phrase Big Data. The report that created this term came from the McKinsey Global Institute in June 2011. The report also addressed the concern about a future lack of skilled analysts, but this we
will discuss in the next chapter. In this chapter we will focus only on the data warehousing
aspects of the Big Data term.
The Big Data phrase was coined to put focus on the fact that there is more data available for
organizations to store and commercially benefit from than ever before. Just think of the huge
amount of data provided by Facebook, Twitter, and Google. Often, this oversupply of data is
summed up in three Vs, standing for high volumes of data, a high variety of data types, and high velocity in data generation. More cynical minds may add that this has always been the case. It is just clearer to us now that we know what we can use the data for, due to the
digitalization of the process landscape.
The huge amount of data may lead to problems. One concrete example of data problems
most companies are facing is multiple data systems, which leads to data-driven optimization
made per process and never across the full value chain. This means that large companies, which
are the ones that relatively invest the most in data, cannot realize their scale advantages based on
data. Additionally, many companies still suffer from low data quality, which makes the business
reluctant to trust the data provided by its data warehouse section. In addition, the business
typically does not realize that its data warehouse section only stores the data on behalf of the
business, and that the data quality issue is hence a problem that the business must solve
itself. The trend is, however, positive, and we see more and more cases where the
ownership of each individual column in a data warehouse is assigned to a named,
responsible business unit, based on who will suffer the most if the data quality is low.
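To make the idea of column-level ownership concrete, here is a minimal sketch in Python. It assumes a simple ownership registry and uses the share of missing values as a stand-in for data quality. The table name, column names, business units, and sample rows are all hypothetical and serve only as illustration; a real data warehouse would keep this registry in its metadata layer and would measure quality in more ways than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnOwner:
    table: str
    column: str
    business_unit: str  # hypothetical unit that suffers most from bad data


# Hypothetical ownership registry: one named business unit per column.
OWNERS = [
    ColumnOwner("customer", "email", "Marketing"),
    ColumnOwner("customer", "credit_limit", "Finance"),
]


def missing_rate(rows, column):
    """Share of rows where the column is None or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def quality_report(table, rows, owners):
    """Group the missing rate of each owned column by responsible unit."""
    report = {}
    for owner in owners:
        if owner.table == table:
            unit_report = report.setdefault(owner.business_unit, {})
            unit_report[owner.column] = missing_rate(rows, owner.column)
    return report


# Hypothetical sample rows from a "customer" table.
customers = [
    {"email": "a@example.com", "credit_limit": 1000},
    {"email": "", "credit_limit": 500},
    {"email": None, "credit_limit": None},
    {"email": "d@example.com", "credit_limit": 2000},
]

report = quality_report("customer", customers, OWNERS)
for unit, columns in report.items():
    print(unit, columns)
```

The point of the sketch is that each quality figure is reported to the business unit named as owner, rather than to the data warehouse section that merely stores the data.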
Another trend we see is symbolized by the arrival of a little yellow toy elephant called Hadoop.
This open-source distributed file system is free and allows organizations to store and process
huge amounts of raw data at a relatively low cost. Accessing the data stored via these distributed
file systems is, however, not easy, which means that there are still additional costs
associated with using the data for traditional BI reporting and operational systems. But at least
organizations can now join the era of Big Data and store social media information, Web logs,
reports, external databases dumped locally, and the like, and analyze this data before investing
more into it.
Another newer area is