Analytics, big data and enterprise data management that make the promise of Hadoop a reality

Data Lakes

Top Stories

What Is the Difference Between a Data Lake and a Data Warehouse? By Dave Kellermanns

The data warehouse and the data lake are two different types of data storage repository. The data warehouse integrates data from different sources and suits business reporting. The data lake stores raw structured and unstructured data in whatever form the data source provides, and it does not require prior knowledge of the analyses you think you want to perform.

What is a Data Lake? A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.

What is a Data Warehouse? A core component of business intelligence, the data warehouse is a central repository of integrated data from one or more disparate sources, and it's used ...
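
The contrast above is often summarized as schema-on-write (the warehouse shapes data before loading it) versus schema-on-read (the lake keeps data raw and applies a structure only when someone asks a question). The Python sketch below is a toy illustration under stated assumptions, not the article's method: in-memory lists stand in for real storage, and `WAREHOUSE_SCHEMA`, `RAW` and the helper functions are hypothetical names made up for the example.

```python
import json

# Hypothetical schema for one warehouse table: column name -> Python type.
WAREHOUSE_SCHEMA = {"customer_id": int, "amount": float}

# Made-up source records: two sales rows and one unrelated sensor reading.
RAW = [
    {"customer_id": "42", "amount": "19.99"},
    {"customer_id": 7, "amount": 5},
    {"sensor": "t-1", "reading": 21.5},
]


def conform(record, schema):
    """Cast a record to the schema, or return None if it does not fit."""
    try:
        return {col: typ(record[col]) for col, typ in schema.items()}
    except (KeyError, TypeError, ValueError):
        return None


def load_into_warehouse(records, table, schema=WAREHOUSE_SCHEMA):
    """Schema-on-write: shape every record before storing it; reject misfits."""
    rejected = []
    for rec in records:
        row = conform(rec, schema)
        if row is None:
            rejected.append(rec)
        else:
            table.append(row)
    return rejected


def load_into_lake(records, lake):
    """Keep every record in its native form (serialized as JSON text here)."""
    lake.extend(json.dumps(rec) for rec in records)


def read_from_lake(lake, schema):
    """Schema-on-read: apply a schema only when a question is finally asked."""
    rows = []
    for blob in lake:
        row = conform(json.loads(blob), schema)
        if row is not None:
            rows.append(row)
    return rows


if __name__ == "__main__":
    warehouse, lake = [], []
    print("rejected by warehouse:", load_into_warehouse(RAW, warehouse))
    load_into_lake(RAW, lake)
    print("sensor view from lake:", read_from_lake(lake, {"sensor": str, "reading": float}))
```

The warehouse path drops the sensor record at load time because it does not fit the table, while the lake keeps all three records and can still answer a sensor question later. That flexibility is also why the lake needs governance and metadata, a point several of the stories below return to.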

Data Lake and Data Refinery

Gartner Controversy! There has been much discussion about the new phrase ‘data lake’. Gartner wrote a report on the ‘Data Lake’ fallacy, warning readers to be careful about the ‘data lake’, or ‘data swamp’. Then Andrew Oliver opened his InfoWorld piece with these words: “For $200, Gartner tells you ‘data lakes’ are bad and advises you to try real hard, plan far in advance, and get governance correct”. Wow, what an insight! During my days at IBM and Oracle, Gartner wanted to get time on my calendar to talk about database futures. Afterwards, I realized that I had paid a significant fee to attend the Gartner conference only to hear back what I had told them. It is a good business: gather information, then sell it back. Without meaning any disrespect, many analysts like to make controversial statements to stay relevant. Here is such a case with Gartner. The ...

Data Lake Phenomenon

Over the past few years, the volume of data has exploded. To tackle this big data explosion, enterprises have run a growing number of successful Hadoop projects. The large volumes of data, the emergence of Hadoop technology, and the need to store all siloed data in one place have prompted a phenomenon among enterprises called the data lake. Is the data lake an effective catchment for all of the enterprise data? Yes and no. Data lakes are good at housing current, inter-related data, but they don’t address the need for an enterprise-wide data management system. Since the data lake holds raw data of different types, the business user cannot get controlled access to risk-free, secure, governed and curated data with semantic consistency, as in the case of an enterprise data warehouse. Enterprise data t...

The Data Lake Has Landed

I'm in hi-tech marketing. I live in a sea of buzzwords, business jargon, and acronyms (most of which are actually abbreviations, but I've learned to let that one slide). They spread faster than a virus in a daycare center. I hear people on conference calls saying things like "Dave, let's double click on that thought and explore it further." Seriously? Do what? Do I sound like that? Or I'll read marketing materials that say things such as "...Our full stack enterprise-grade cloud solution for business acceleration speeds time-to-value, increase margins, enhances performance, and reduces risk." Yeah...thanks for clarifying that; at first, I didn't know what you meant. The biggest, and possibly the most hated, buzzwords to happen to IT in the last 10 years have been... Cloud, Internet of Things, and drumroll... Big Data (featuring its ugly ste...

CIOs Must Beware Big Data Blindness By @ABridgwater

It's not hard to find technology trade press commentary on the subject of Big Data. Variously defined (in non-technical terms) as the cluttered old shoebox of all data and (in more technical terms) as the amount of data that does not comfortably fit into a standard relational database for storage, processing and analytics within the normal constraints of processing, memory and data transport technologies, Big Data is an oft-mentioned and sometimes misunderstood subject.

Three key Big Data control factors

Good advice for CIOs faced with this new planet of data types, driven by everything from ecommerce to the Internet of Things (IoT), is to look for technologies that provide three key controlling factors and functions: management, automation and enhancement. Without management, automation and enhancement controls, Big Data starts to feel like blindin...

Data Lake: Save Me More Money vs. Make Me More Money By @Schmarzo

2016 will be the year of the data lake. But I expect that much of the 2016 data lake effort will focus on activities and projects that save the company more money. That is okay from a foundation perspective, but IT and Business will both miss the bigger opportunity to leverage the data lake (and its associated analytics) to make the company more money. This blog examines an approach that allows organizations to quickly achieve some “save me more money” cost benefits from their data lake without losing sight of the bigger “make me more money” payoff: coupling the data lake with data science to optimize key business processes, uncover new monetization opportunities and create a more compelling and differentiated customer experience. Let’s start by quickly reviewing the concept of a data lake.

The Data Lake

The data lake is a centralized repository for all the org...

Don’t Jump in the Data Lake

32. 47. 19. 7. 85. Congratulations! I just gave you five very important, valuable numbers. Or did I? If they were tomorrow's winning Powerball numbers, then certainly. But maybe they're monthly income numbers. Or sports scores. Or temperatures. Who knows? Such is the problem of context. Without the appropriate context, data are inherently worthless. Separate data from their metadata, and you've just killed the Golden Data Goose. If we scale up this example, we shine a light on the core challenge of data lakes. There are a few common definitions of a data lake, but perhaps the most straightforward is a large object-based storage repository that holds data in its native format until it is needed, or perhaps a massive, easily accessible, centralized repository of large volumes of structured and unstructured data. True, there may be metadata...
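
One common way to keep the Golden Data Goose alive, sketched here as an assumption rather than anything the article prescribes, is to refuse to land data in the lake without a metadata record and to make that catalog, not file names, the way people find data. `CatalogEntry`, `TinyLake` and the required fields (`description`, `unit`) are hypothetical choices for this toy Python example.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record stored next to each raw dataset."""
    name: str
    description: str
    unit: str
    source: str
    tags: set = field(default_factory=set)


class TinyLake:
    """Toy lake: raw payloads kept in native form plus a metadata catalog."""

    def __init__(self):
        self._data = {}
        self._catalog = {}

    def ingest(self, payload, entry):
        # Refuse data that arrives without the context needed to interpret it.
        if not entry.description or not entry.unit:
            raise ValueError(f"{entry.name}: description and unit are required")
        self._data[entry.name] = payload
        self._catalog[entry.name] = entry

    def find(self, tag):
        """Discover datasets through their metadata instead of guessing."""
        return sorted(e.name for e in self._catalog.values() if tag in e.tags)

    def read(self, name):
        """Return the payload together with the metadata that explains it."""
        return self._data[name], self._catalog[name]


if __name__ == "__main__":
    lake = TinyLake()
    lake.ingest(
        [32, 47, 19, 7, 85],
        CatalogEntry("draw_0001", "Lottery numbers (illustrative)", "ball number",
                     "example", {"lottery"}),
    )
    print(lake.find("lottery"))
```

With the entry attached, the same five numbers become findable and interpretable; without a description and unit, `ingest` rejects them. Real catalogs track far more (lineage, owners, access rules), but the principle of never separating data from its metadata is the same.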

EA Communique: Data Lake Considerations for the Enterprise Architect

Data lakes are among the hottest topics in the enterprise big data world today. While data warehouses have provided value for many years, they require careful preparation and formatting of data before loading it into the warehouse. With data lakes, in contrast, people can load all types and structures of data first, in the hopes that someone will be able to get value by transforming and analyzing the data at some point in the future. Data lakes are becoming increasingly essential to today’s enterprise digital strategies, so EAs should understand their strengths and weaknesses, and how to facilitate their proper adoption across the organization. A useful and concise resource for understanding data lakes is the white paper How to Build an Enterprise Data Lake: Important Considerations Before Jumping In, by Mark Madsen of Third Nature. In this paper, Madsen first define...

Live From Strata + Hadoop World: Dry Lakes, Salt Lakes, Data Lakes

By Jeffrey Abbott

Water, water everywhere and nothing to drink. Today I traveled from Boston to San Jose, CA. With stunningly clear weather and a window seat, I observed the transition from a frozen blanket of white covering the entire Northeast and Great Lakes, to the dry and rugged Rockies that are oddly snow-free, to the nearly empty reservoirs of California with their bleached sidewalls that reveal our failure to control our supply and demand for natural resources. The picture here is the Utah Wasatch range that’s home to Snowbird and Alta, which usually have among the most snow of any U.S. ski area (looks more like May than February right now). This year, you’ll find far more snow in New England. This trip brings me to the biggest gathering of big data practitioners of the year, and although I see empty reservoirs, I also see lots of data lakes. In fact, from looking...

New Enterprise Data Lake Management Platform Delivers Enterprise-Grade Hadoop

Organizations can now efficiently and securely set up, deploy and manage a Hadoop-based data lake with the new version of Podium, the enterprise-ready management platform from Podium Data. The new version of Podium meets real-world requirements of enterprises planning to use a data lake to establish "Data as a Platform" within their organizations through three business-critical capabilities: advanced security, the data development lifecycle and accessibility. "At CIGNA, we are building an enterprise data lake as a next generation data management platform that will allow us to exponentially expand data delivery to the business," said Don Gray, Chief Data Officer, CIGNA. "Nothing is more important to us than ensuring the security of data in the lake and we place a huge premium on those data lake technologies that are both committed and capable of delivering truly harde...

Data Warehousing Lessons for A Data Lake World

Over the past two decades, we have spent considerable time and effort trying to perfect the world of data warehousing. We took the technology that we were given and the data that would fit into that technology, and tried to provide our business constituents with the reports and dashboards necessary to run the business. It was a lot of hard work and we had to do many “unnatural” acts to get these OLTP (Online Transaction Processing)-centric technologies to work: aggregated tables, a plethora of indices, user-defined functions (UDFs) in PL/SQL, and materialized views, just to name a few. Kudos to us!! Now as we get ready for the full onslaught of the data lake, what lessons can we take away from our data warehousing experiences? I don’t have all the insights, but I offer this blog in hopes that others will comment and contribute. In the end, we want to learn from our data...
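
Several of the “unnatural” acts listed above, aggregated tables and materialized views in particular, boil down to precomputing answers so that reports do not rescan every raw transaction. The following Python sketch illustrates that idea under stated assumptions: the transactions are made up, a dictionary stands in for a real summary table, and `build_summary`, `refresh_summary` and `daily_total` are hypothetical names.

```python
from collections import defaultdict

# Made-up base transactions: (day, store, sale amount).
TRANSACTIONS = [
    ("2016-01-04", "boston", 10.0),
    ("2016-01-04", "boston", 5.5),
    ("2016-01-04", "san_jose", 2.5),
    ("2016-01-05", "boston", 4.0),
]


def build_summary(transactions):
    """Pre-aggregate sales per (day, store), like an aggregated table."""
    summary = defaultdict(float)
    for day, store, amount in transactions:
        summary[(day, store)] += amount
    return summary


def refresh_summary(summary, new_transactions):
    """Incrementally fold new rows in instead of rescanning the base data."""
    for day, store, amount in new_transactions:
        summary[(day, store)] += amount
    return summary


def daily_total(summary, day):
    """A dashboard query answered from the small summary, not the base rows."""
    return sum(total for (d, _), total in summary.items() if d == day)


if __name__ == "__main__":
    summary = build_summary(TRANSACTIONS)
    print(daily_total(summary, "2016-01-04"))  # prints 18.0
```

The trade-off is the one warehouse teams know well: queries get cheap, but every summary must be designed up front and kept in sync with its base data as new rows arrive.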