Building a Hybrid Data Pipeline for Salesforce and Hadoop
My team embarked on building a data lake for our sales and marketing data to
better understand customer journeys. This required building a hybrid data
pipeline to connect our cloud CRM with the new Hadoop Data Lake. One
challenge is that IT was not in a position to provide support until we proved
value and marketing did not have the experience, so we embarked on the
journey ourselves within the product marketing team for our line of business.
In his session at @BigDataExpo, Sumit Sarkar, Product Marketing Engineer at
Progress, will discuss how the key to delivering on this was using standard
interfaces in a bi-directional data pipeline to connect the systems. On
the Salesforce side, we were able to get frictionless access to the data lake
using clicks-not-code via OData. On the Hadoop side,... (more)
Data Science Monetization: Focus on Innovation, Not Effectiveness
“I have over 1,200 Data Analysts, so we have it nailed.”
When I heard this being uttered by the head of their “analytics” group, I
knew the meeting was over. I knew that I could safely close my laptop, put
away my notebook, and gracefully thank them for their time.
It didn’t matter that others in the room didn’t agree with that
assessment. It didn’t matter that others could see the benefit of a
“think differently” collaborative engagement with key business
stakeholders in envisioning how to broaden the organization’s thinking with
respect to how to leverage data and analytics to power the business.
Nope, their analytics leader made the statement with such authority and
confidence that any further conversation was just going to frustrate both him
and me. He already had all the answers, even to pr... (more)
Data Lake and Data Refinery – Gartner Controversy!
There has been much discussion of the new phrase “data lake.” Gartner
wrote a report on the ‘Data Lake’ fallacy, warning readers to be careful about
the ‘data lake’ or ‘data swamp’. Then Andrew Oliver opened his
InfoWorld article with these words: “For $200, Gartner tells you ‘data
lakes’ are bad and advises you to try real hard, plan far in advance, and
get governance correct”. Wow, what an insight!
During my days at IBM and Oracle, Gartner wanted to get time on my calendar
to talk about database futures. Afterwards, I realized that I had paid a
significant fee to attend the Gartner conference to hear back what I had told
them. It is a good business: gathering information and selling it back. Without
meaning any disrespect, many analysts like to create controversial statements
to stay relevant. Here is such a case with Gartner.
The ... (more)
2016 will be the year of the data lake. But I expect that many 2016 data
lake efforts will be focused on activities and projects that save the company
more money. That is okay from a foundation perspective, but IT and Business
will both miss the bigger opportunity to leverage the data lake (and its
associated analytics) to make the company more money.
This blog examines an approach that allows organizations to quickly achieve
some “save me more money” cost benefits from their data lake without
losing sight of the bigger “make me more money” payoff – by coupling
the data lake with data science to optimize key business processes, uncover
new monetization opportunities and create a more compelling and
differentiated customer experience.
Let’s start by quickly reviewing the concept of a data lake.
The Data Lake
The data lake is a centralized repository for all the org... (more)
The Data Lake Has Landed
I'm in hi-tech marketing. I live in a sea of buzzwords, business jargon, and
acronyms (most of which are actually abbreviations, but I've learned to let
that one slide). They spread faster than a virus in a daycare center. I hear
people on conference calls saying things like "Dave, let's double click on
that thought and explore it further." Seriously? Do what? Do I sound like
that? Or, I'll read marketing materials that say things such as "...Our full
stack enterprise-grade cloud solution for business acceleration speeds
time-to-value, increases margins, enhances performance, and reduces risk."
Yeah...thanks for clarifying that - at first, I didn't know what you meant.
The biggest, and possibly the most hated buzzwords to happen to IT in the
last 10 years have been... Cloud, Internet of Things, and drumroll... Big
Data (featuring its ugly ste... (more)
Data Lake Phenomenon Among Enterprises
Over the past few years, there has been an explosion in the volume of data.
To tackle this big data explosion, there has been a rise in the number of
successful Hadoop projects in enterprises. The large volumes of data,
the emergence of Hadoop technology, and the need to store all siloed data in
one place have prompted a phenomenon among enterprises called the Data Lake.
Is the Data Lake an effective catchment for all of the enterprise data?
Yes and no. Data lakes are good for housing current, inter-related data, but
they don’t address the need for an enterprise-wide data management system.
Since the data lake holds raw data of different types, the business user
cannot have controlled access to risk-free, secure, governed and curated data
with semantic consistency, as in the case of an enterprise data warehouse.
Enterprise data t... (more)
What Is the Difference Between a Data Lake and a Data Warehouse?
By Dave Kellermanns
The data warehouse and data lake are two different types of data storage
repository. The data warehouse integrates data from different sources and
suits business reporting. The data lake stores raw structured and
unstructured data in whatever form the data source provides. It does not
require prior knowledge of the analyses you think you want to perform.
What is a Data Lake?
A data lake is a storage repository that holds a vast amount of raw data in
its native format until it is needed. While a hierarchical data warehouse
stores data in files or folders, a data lake uses a flat architecture to
What is a Data Warehouse?
A core component of business intelligence, the data warehouse is a central
repository of integrated data from one or more disparate sources, and it's
used ... (more)
The data lake is gaining lots of momentum across the different customers to
whom I talk. Every, and I mean every, organization wants to learn why and
how to implement a data lake. But “because it is a cheaper way to
store/manage data” is not a good reason to adopt a data lake. The “Why
do I need a data lake?” answer is much more powerful than just having the
IT organization save some money.
The data lake is a powerful data architecture that leverages the economics of
big data (where it is 20x to 50x cheaper to store, manage and analyze data as
compared to traditional data warehouse technologies). And new big data
processing and analytics capabilities help organizations address business and
operational challenges that were difficult to address using conventional
Business Intelligence and data warehousing technologies.
The data lake has the potential ... (more)
It's not hard to find technology trade press commentary on the subject of Big
Data.
Variously defined (in non-technical terms) as the cluttered old shoebox of
all data - and again (in more technical terms) as that amount of data that
does not comfortably fit into a standard relational database for storage,
processing and analytics within the normal constraints of processing, memory
and data transport technologies - we can say that Big Data is an oft
mentioned and sometimes misunderstood subject.
Three key Big Data control factors
Good advice for CIOs faced with this new planet of data types driven by
everything from ecommerce to the Internet of Things (IoT) is to look for
technologies that provide three key controlling factors and functions:
Management
Automation
Enhancement
Without management, automation and enhancement controls, Big Data starts to
feel like blindin... (more)
Don’t Jump in the Data Lake
32. 47. 19. 7. 85.
Congratulations! I just gave you five very important, valuable numbers. Or
did I?
If they were tomorrow's winning Powerball numbers, then certainly. But maybe
they're monthly income numbers. Or sports scores. Or temperatures. Who knows?
Such is the problem of context. Without the appropriate context, data are
inherently worthless. Separate data from their metadata, and you've just
killed the Golden Data Goose.
If we scale up this example, we shine the light on the core challenge
of data lakes. There are a few common definitions of data lake, but
perhaps the most straightforward is a large object-based storage repository
that holds data in its native format until it is needed or perhaps a massive,
easily accessible, centralized repository of large volumes of structured and
True, there may be metadata... (more)
Data lakes are among the hottest topics in the enterprise big data world
today. While data warehouses have provided value for many years, they require
careful preparation and formatting of data before loading it into the
warehouse. With data lakes, in contrast, people can load all types and
structures of data first, in the hopes that someone will be able to get value
by transforming and analyzing the data at some point in the future.
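To make the contrast concrete, here is a minimal sketch of the two loading styles. It is an illustration only: the sample events, the three-column warehouse schema, and the function names are all hypothetical, and a real warehouse or lake would involve far more than in-memory lists.

```python
import json

# Hypothetical raw events, landed exactly as the sources emitted them.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": 5, "coupon": "SPRING"}',
    '{"user": "c3"}',
]

# Assumed warehouse schema: every row must supply these columns.
WAREHOUSE_COLUMNS = ("user", "amount", "channel")


def load_into_warehouse(raw_lines):
    """Warehouse style: validate and shape each record before loading."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if not all(col in record for col in WAREHOUSE_COLUMNS):
            continue  # rejected at load time because it does not fit the schema
        rows.append((record["user"], float(record["amount"]), record["channel"]))
    return rows


def load_into_lake(raw_lines):
    """Lake style: keep every record in its native form, untouched."""
    return list(raw_lines)


def total_amount(lake):
    """Structure is applied only now, at read time, by whoever asks the question."""
    return sum(float(json.loads(line).get("amount", 0)) for line in lake)


warehouse = load_into_warehouse(RAW_EVENTS)
lake = load_into_lake(RAW_EVENTS)
print(len(warehouse), "rows in the warehouse;", len(lake), "records in the lake")
print("total amount read from the lake:", round(total_amount(lake), 2))
```

In this sketch the warehouse loader drops any record that does not fit the schema, while the lake keeps all of them and leaves the work of interpreting missing or oddly typed fields to the reader of the data.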
Data lakes are becoming increasingly essential to today’s enterprise
digital strategies, so EAs should understand their strengths and weaknesses,
and how to facilitate their proper adoption across the organization.
A useful and concise resource for understanding data lakes is the white paper
How to Build an Enterprise Data Lake: Important Considerations Before Jumping
In, by Mark Madsen of Third Nature.
In this paper, Madsen first define... (more)