Analytics, big data and enterprise data management that make the promise of Hadoop a reality

Data Lakes



Top Stories

Data Lake and Data Refinery – Gartner Controversy!

There has been much discussion of the new phrase ‘data lake’. Gartner wrote a report on the ‘Data Lake’ fallacy, cautioning readers about the ‘data lake’ or ‘data swamp’. Then Andrew Oliver opened a piece in InfoWorld with these words: “For $200, Gartner tells you ‘data lakes’ are bad and advises you to try real hard, plan far in advance, and get governance correct”. Wow, what an insight! During my days at IBM and Oracle, Gartner wanted to get time on my calendar to talk about database futures. Afterwards, I realized that I had paid a significant fee to attend the Gartner conference to hear back what I had told them. It is a good business: gathering information and selling it back. Without meaning any disrespect, many analysts like to make controversial statements to stay relevant. Here is such a case with Gartner. The ... (more)

The Data Lake Has Landed | @ThingsExpo #BigData #DevOps #IoT #M2M #API

I'm in hi-tech marketing. I live in a sea of buzzwords, business jargon, and acronyms (most of which are actually abbreviations, but I've learned to let that one slide). They spread faster than a virus in a daycare center. I hear people on conference calls saying things like "Dave, let's double click on that thought and explore it further." Seriously? Do what? Do I sound like that? Or, I'll read marketing materials that say things such as "...Our full stack enterprise-grade cloud solution for business acceleration speeds time-to-value, increase margins, enhances performance, and reduces risk." Yeah...thanks for clarifying that - at first, I didn't know what you meant. The biggest, and possibly the most hated, buzzwords to happen to IT in the last 10 years have been... Cloud, Internet of Things, and drumroll... Big Data (featuring its ugly ste... (more)

Data Lake Phenomenon | @ThingsExpo #IoT #M2M #BigData #Microservices

Data Lake Phenomenon Among Enterprises

Over the past few years, there has been an explosion in the volume of data. To tackle this big data explosion, there has been a rise in the number of successful Hadoop projects in enterprises. The large volumes of data, the emergence of Hadoop technology, and the need to store all siloed data in one place have prompted a phenomenon among enterprises called the Data Lake. Is the Data Lake an effective catchment for all of the enterprise data? Yes and no. Data lakes are good for housing current, inter-related data, but they don’t address the need for an enterprise-wide data management system. Since the data lake holds raw data of different types, the business user cannot have controlled access to risk-free, secure, governed and curated data with semantic consistency, as in the case of an enterprise data warehouse. Enterprise data t... (more)

Data Lake: Save Me More Money vs. Make Me More Money By @Schmarzo | @BigDataExpo #BigData

2016 will be the year of the data lake. But I expect that much of 2016 data lake efforts will be focused on activities and projects that save the company more money. That is okay from a foundation perspective, but IT and Business will both miss the bigger opportunity to leverage the data lake (and its associated analytics) to make the company more money. This blog examines an approach that allows organizations to quickly achieve some “save me more money” cost benefits from their data lake without losing sight of the bigger “make me more money” payoff – by coupling the data lake with data science to optimize key business processes, uncover new monetization opportunities and create a more compelling and differentiated customer experience. Let’s start by quickly reviewing the concept of a data lake.

The Data Lake

The data lake is a centralized repository for all the org... (more)

A Hybrid Data Pipeline | @CloudExpo @ProgressSW #BigData #AI #DataLake

Building a Hybrid Data Pipeline for Salesforce and Hadoop

My team embarked on building a data lake for our sales and marketing data to better understand customer journeys. This required building a hybrid data pipeline to connect our cloud CRM with the new Hadoop Data Lake. One challenge was that IT was not in a position to provide support until we proved value, and marketing did not have the experience, so we embarked on the journey ourselves within the product marketing team for our line of business within Progress. In his session at @BigDataExpo, Sumit Sarkar, Product Marketing Engineer at Progress, will discuss how the key to delivering on this was using standard interfaces in a bi-directional data pipeline to connect the systems. On the Salesforce side, we were able to get frictionless access to the data lake using clicks-not-code via OData. On the Hadoop side,... (more)

Difference Between a Data Lake and a Data Warehouse | @BigDataExpo #BigData #DataLake #Storage

What Is the Difference Between a Data Lake and a Data Warehouse?

By Dave Kellermanns

The data warehouse and data lake are two different types of data storage repository. The data warehouse integrates data from different sources and suits business reporting. The data lake stores raw structured and unstructured data in whatever form the data source provides. It does not require prior knowledge of the analyses you think you want to perform. What is a Data Lake? A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. What is a Data Warehouse? A core component of business intelligence, the data warehouse is a central repository of integrated data from one or more disparate sources, and it's used ... (more)

Data Science Monetization: Focus on Innovation, Not Effectiveness | @BigDataExpo #BigData #Analytics

“I have over 1,200 Data Analysts, so we have it nailed.” When I heard this uttered by the head of their “analytics” group, I knew the meeting was over. I knew that I could safely close my laptop, put away my notebook, and gracefully thank them for their time. It didn’t matter that others in the room didn’t agree with that assessment. It didn’t matter that others could see the benefit of a “think differently” collaborative engagement with key business stakeholders in envisioning how to broaden the organization’s thinking with respect to how to leverage data and analytics to power the business. Nope, their analytics leader made the statement with such authority and confidence that any further conversation was just going to frustrate both him and me. He already had all the answers, even to pr... (more)

CIOs Must Beware Big Data Blindness By @ABridgwater | @BigDataExpo #BigData

It's not hard to find technology trade press commentary on the subject of Big Data. Variously defined (in non-technical terms) as the cluttered old shoebox of all data - and again (in more technical terms) as that amount of data that does not comfortably fit into a standard relational database for storage, processing and analytics within the normal constraints of processing, memory and data transport technologies - we can say that Big Data is an oft mentioned and sometimes misunderstood subject.

Three key Big Data control factors

Good advice for CIOs faced with this new planet of data types driven by everything from ecommerce to the Internet of Things (IoT) is to look for technologies that provide three key controlling factors and functions: management, automation and enhancement. Without management, automation and enhancement controls, Big Data starts to feel like blindin... (more)

Don’t Jump in the Data Lake

32. 47. 19. 7. 85. Congratulations! I just gave you five very important, valuable numbers. Or did I? If they were tomorrow's winning Powerball numbers, then certainly. But maybe they're monthly income numbers. Or sports scores. Or temperatures. Who knows? Such is the problem of context. Without the appropriate context, data are inherently worthless. Separate data from their metadata, and you've just killed the Golden Data Goose. If we scale up this example, we shine the light on the core challenge of data lakes. There are a few common definitions of data lake, but perhaps the most straightforward is a large object-based storage repository that holds data in its native format until it is needed, or perhaps a massive, easily accessible, centralized repository of large volumes of structured and unstructured data. True, there may be metadata... (more)

New Enterprise Data Lake Management Platform Delivers Enterprise-Grade Hadoop

Organizations can now efficiently and securely set up, deploy and manage a Hadoop-based data lake with the new version of Podium, the enterprise-ready management platform from Podium Data. The new version of Podium meets real-world requirements of enterprises planning to use a data lake to establish "Data as a Platform" within their organizations through three business-critical capabilities: advanced security, the data development lifecycle and accessibility. "At CIGNA, we are building an enterprise data lake as a next generation data management platform that will allow us to exponentially expand data delivery to the business," said Don Gray, Chief Data Officer, CIGNA. "Nothing is more important to us than ensuring the security of data in the lake and we place a huge premium on those data lake technologies that are both committed and capable of delivering truly harde... (more)

EA Communique: Data Lake Considerations for the Enterprise Architect

Data lakes are among the hottest topics in the enterprise big data world today. While data warehouses have provided value for many years, they require careful preparation and formatting of data before loading it into the warehouse. With data lakes, in contrast, people can load all types and structures of data first, in the hopes that someone will be able to get value by transforming and analyzing the data at some point in the future. Data lakes are becoming increasingly essential to today’s enterprise digital strategies, so EAs should understand their strengths and weaknesses, and how to facilitate their proper adoption across the organization. A useful and concise resource for understanding data lakes is the white paper How to Build an Enterprise Data Lake: Important Considerations Before Jumping In, by Mark Madsen of Third Nature. In this paper, Madsen first define... (more)