What Is the Difference Between a Data Lake and a Data Warehouse?
By Dave Kellermanns
The data warehouse and data lake are two different types of data storage
repository. The data warehouse integrates data from different sources and
suits business reporting. The data lake stores raw structured and
unstructured data in whatever form the data source provides. It does not
require prior knowledge of the analyses you think you want to perform.
What is a Data Lake?
A data lake is a storage repository that holds a vast amount of raw data in
its native format until it is needed. While a hierarchical data warehouse
stores data in files or folders, a data lake uses a flat architecture to
What is a Data Warehouse?
A core component of business intelligence, the data warehouse is a central
repository of integrated data from one or more disparate sources, and it's
used ... (more)
Data Lake and Data Refinery – Gartner Controversy!
There has been much discussion of the new phrase ‘data lake’. Gartner
wrote a report on the ‘Data Lake’ fallacy, saying to be careful about
‘data lake’ or ‘data swamp’. Then Andrew Oliver wrote in
InfoWorld, beginning with these words: “For $200, Gartner tells you ‘data
lakes’ are bad and advises you to try real hard, plan far in advance, and
get governance correct”. Wow, what an insight!
During my days at IBM and Oracle, Gartner wanted to get time on my calendar
to talk about database futures. Afterwards, I realized that I had paid a
significant fee to attend the Gartner conference to hear back what I had told
them. It is a good business: gathering information and selling it back. Without
meaning any disrespect, many analysts like to make controversial statements
to stay relevant. Here is such a case with Gartner.
The ... (more)
Data Lake Phenomenon Among Enterprises
Over the past few years, there has been an explosion in the volume of data.
To tackle this big data explosion, there has been a rise in the number of
successful Hadoop projects in enterprises. The large volumes of data,
the emergence of Hadoop technology, and the need to store all siloed data in
one place have prompted a phenomenon among enterprises called the data lake.
Is the Data Lake an effective catchment for all of the enterprise data?
Yes and no. Data lakes are good for housing current, inter-related data, but
they don’t address the need for an enterprise-wide data management system.
Since the data lake holds raw data of different types, the business user
cannot have controlled access to risk-free, secure, governed and curated data
with semantic consistency, as in the case of an enterprise data warehouse.
Enterprise data t... (more)
The Data Lake Has Landed
I'm in hi-tech marketing. I live in a sea of buzzwords, business jargon, and
acronyms (most of which are actually abbreviations, but I've learned to let
that one slide). They spread faster than a virus in a daycare center. I hear
people on conference calls saying things like "Dave, let's double click on
that thought and explore it further." Seriously? Do what? Do I sound like
that? Or, I'll read marketing materials that say things such as "...Our full
stack enterprise-grade cloud solution for business acceleration speeds
time-to-value, increases margins, enhances performance, and reduces risk."
Yeah...thanks for clarifying that - at first, I didn't know what you meant.
The biggest, and possibly the most hated, buzzwords to happen to IT in the
last 10 years have been... Cloud, Internet of Things, and drumroll... Big
Data (featuring its ugly ste... (more)
It's not hard to find technology trade press commentary on the subject of Big
Data.
Variously defined (in non-technical terms) as the cluttered old shoebox of
all data - and again (in more technical terms) as that amount of data that
does not comfortably fit into a standard relational database for storage,
processing and analytics within the normal constraints of processing, memory
and data transport technologies - we can say that Big Data is an
oft-mentioned and sometimes misunderstood subject.
Three key Big Data control factors
Good advice for CIOs faced with this new planet of data types driven by
everything from ecommerce to the Internet of Things (IoT) is to look for
technologies that provide three key controlling factors and functions:
Management
Automation
Enhancement
Without management, automation and enhancement controls, Big Data starts to
feel like blindin... (more)
2016 will be the year of the data lake. But I expect that much of the 2016 data
lake effort will be focused on activities and projects that save the company
more money. That is okay from a foundation perspective, but IT and Business
will both miss the bigger opportunity to leverage the data lake (and its
associated analytics) to make the company more money.
This blog examines an approach that allows organizations to quickly achieve
some “save me more money” cost benefits from their data lake without
losing sight of the bigger “make me more money” payoff – by coupling
the data lake with data science to optimize key business processes, uncover
new monetization opportunities and create a more compelling and
differentiated customer experience.
Let’s start by quickly reviewing the concept of a data lake.
The Data Lake
The data lake is a centralized repository for all the org... (more)
Don’t Jump in the Data Lake
32. 47. 19. 7. 85.
Congratulations! I just gave you five very important, valuable numbers. Or
If they were tomorrow's winning Powerball numbers, then certainly. But maybe
they're monthly income numbers. Or sports scores. Or temperatures. Who knows?
Such is the problem of context. Without the appropriate context, data are
inherently worthless. Separate data from their metadata, and you've just
killed the Golden Data Goose.
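To make the point concrete, here is a minimal Python sketch of the same five numbers with and without their context. The `Measurement` class, its `unit` and `meaning` fields, and the choice to treat the numbers as temperatures are all assumptions made for illustration; the original does not say what the numbers are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A value that carries its own context (hypothetical structure)."""
    value: int
    unit: str      # assumed metadata field
    meaning: str   # assumed metadata field


# The bare numbers from the example: no way to tell what they represent.
bare_numbers = [32, 47, 19, 7, 85]

# Illustrative assumption: the numbers are daily high temperatures.
described = [
    Measurement(n, "degrees Fahrenheit", "daily high temperature")
    for n in bare_numbers
]


def describe(item: Measurement) -> str:
    """Render a value together with the metadata that makes it usable."""
    return f"{item.value} {item.unit} ({item.meaning})"


first = describe(described[0])
print(bare_numbers[0])  # a number with no context
print(first)            # the same number with its context attached
```

Strip the `unit` and `meaning` fields away and all that remains is the list of integers, which is exactly the situation the paragraph above describes.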
If we scale up this example, we shine the light on the core challenge
of data lakes. There are a few common definitions of data lake, but
perhaps the most straightforward is a large object-based storage repository
that holds data in its native format until it is needed or perhaps a massive,
easily accessible, centralized repository of large volumes of structured and
True, there may be metadata... (more)
Data lakes are among the hottest topics in the enterprise big data world
today. While data warehouses have provided value for many years, they require
careful preparation and formatting of data before loading it into the
warehouse. With data lakes, in contrast, people can load all types and
structures of data first, in the hopes that someone will be able to get value
by transforming and analyzing the data at some point in the future.
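The contrast between preparing data before loading and loading first can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `WAREHOUSE_SCHEMA` mapping, the function names, and the sample records are hypothetical, and in-memory lists stand in for real storage systems.

```python
import json

# Hypothetical warehouse schema: column name -> required Python type.
WAREHOUSE_SCHEMA = {"order_id": int, "amount": float}


def load_into_warehouse(table, record):
    """Warehouse style: validate and coerce before storing; reject bad rows."""
    row = {}
    for column, expected_type in WAREHOUSE_SCHEMA.items():
        if column not in record:
            raise ValueError(f"missing column: {column}")
        row[column] = expected_type(record[column])
    table.append(row)


def load_into_lake(lake, raw_payload):
    """Lake style: store the payload untouched, in its native form."""
    lake.append(raw_payload)


def read_from_lake(lake):
    """Apply structure only at read time; skip payloads that do not parse."""
    for raw_payload in lake:
        try:
            yield json.loads(raw_payload)
        except json.JSONDecodeError:
            continue


warehouse_table, lake = [], []
load_into_warehouse(warehouse_table, {"order_id": "7", "amount": "19.5"})
load_into_lake(lake, '{"order_id": 7, "amount": 19.5, "note": "gift"}')
load_into_lake(lake, "free-form log line with no structure")

parsed = list(read_from_lake(lake))
print(warehouse_table)
print(parsed)
```

The warehouse path pays the preparation cost up front and drops anything outside the schema, such as the `note` field. The lake path accepts everything, including the unstructured log line, and leaves the work of extracting value to whoever reads the data later.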
Data lakes are becoming increasingly essential to today’s enterprise
digital strategies, so EAs should understand their strengths and weaknesses,
and how to facilitate their proper adoption across the organization.
A useful and concise resource for understanding data lakes is the white paper
How to Build an Enterprise Data Lake: Important Considerations Before Jumping
In, by Mark Madsen of Third Nature.
In this paper, Madsen first define... (more)
Water, water everywhere and nothing to drink. Today I traveled from Boston
to San Jose, CA. With stunningly clear weather and a window seat, I observed
the transition from a frozen blanket of white covering the entire Northeast
and Great Lakes, to the dry and rugged Rockies that are oddly snow-free, to
the nearly empty reservoirs of California with their bleached sidewalls that
reveal our failure to control our supply and demand for natural resources.
The picture here is the Utah Wasatch range that’s home to Snowbird and
Alta, which usually have among the most snow of any U.S. ski area (looks more
like May than February right now). This year, you’ll find far more snow in
New England. This trip brings me to the biggest gathering of big data
practitioners of the year and although I see empty reservoirs, I see lots of
In fact, from looking... (more)
Organizations can now efficiently and securely set up, deploy and manage a
Hadoop-based data lake with the new version of Podium, the enterprise-ready
management platform from Podium Data.
The new version of Podium meets real-world requirements of enterprises
planning to use a data lake to establish "Data as a Platform" within their
organizations through three business-critical capabilities: advanced
security, the data development lifecycle and accessibility.
"At CIGNA, we are building an enterprise data lake as a next generation data
management platform that will allow us to exponentially expand data delivery
to the business," said Don Gray, Chief Data Officer at CIGNA. "Nothing is more
important to us than ensuring the security of data in the lake and we place a
huge premium on those data lake technologies that are both committed and
capable of delivering truly harde... (more)
Over the past two decades, we have spent considerable time and effort trying to
perfect the world of data warehousing. We took the technology that we were
given and the data that would fit into that technology, and tried to provide
our business constituents with the reports and dashboards necessary to run
It was a lot of hard work, and we had to perform many “unnatural” acts to get
these OLTP (Online Transaction Processing)-centric technologies to work:
aggregate tables, a plethora of indices, user-defined functions (UDFs) in
PL/SQL, and materialized views, just to name a few. Kudos to us!
Now as we get ready for the full onslaught of the data lake, what lessons can
we take away from our data warehousing experiences? I don’t have all the
insights, but I offer this blog in hopes that others will comment and
contribute. In the end, we want to learn from our data... (more)