The Clearly Podcast

Data Lakes



This week we talk about data lakes.  Essentially, a data lake is a mechanism to store large quantities of (typically) raw data, both structured and unstructured, bringing together data from across an organisation.

In a "traditional" data warehouse solution, we tend to think about an "Extract, Transform and Load" (ETL) process: extracting the data from source, transforming it for analysis, and loading it into the data warehouse.  With a data lake, the approach tends to be "Extract, Load and Transform" (ELT): data is extracted from source, loaded into the data lake, then transformed when needed.
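
To make the difference in ordering concrete, here is a minimal sketch in Python. It is purely illustrative: the sample records, the function names and the use of in-memory lists to stand in for a warehouse and a lake are all assumptions for the example, not how any particular product works.

```python
import json

# Hypothetical raw records, as they might arrive from a source system.
SOURCE_ROWS = [
    '{"customer": "Acme", "amount": "120.50", "currency": "GBP"}',
    '{"customer": "Globex", "amount": "80.00", "currency": "USD"}',
]


def transform(raw_row):
    """Parse one raw JSON string into an analysis-ready record."""
    record = json.loads(raw_row)
    return {"customer": record["customer"], "amount": float(record["amount"])}


def etl(source_rows):
    """Extract, Transform, Load: only transformed rows reach the warehouse."""
    return [transform(row) for row in source_rows]


def extract_and_load(source_rows):
    """Extract, Load: the lake keeps the raw rows exactly as they arrived."""
    return list(source_rows)


def total_amount_from_lake(lake):
    """The deferred Transform step, run at read time for one specific question."""
    return sum(transform(row)["amount"] for row in lake)


warehouse = etl(SOURCE_ROWS)            # transformed at build time
lake = extract_and_load(SOURCE_ROWS)    # raw data, transformed later
```

In this sketch the warehouse rows have already lost the currency field, because the transformation decided up front what was needed. The lake still holds the full raw record, so a later question about currency can be answered, at the cost of parsing the data each time it is read.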

This can simplify the process, as there is no need to transform the data for every scenario at build time, so we can speed up implementation.  The downside, of course, is that we have to do more work at run time.  As such, this is probably not an either/or choice between data lakes and more structured systems.

The flexibility of data lakes makes it tempting to dump anything and everything into the data lake.  If this starts to happen without any curation, you are likely to end up in more of a data swamp.  Data lakes are not a way to avoid governance.

The main cloud players all offer some sort of data lake:
Azure Data Lake
AWS Data Lake
Google Data Lake

If you already use Power BI, or are considering it, we strongly recommend you join your local Power BI user group here.

To find out more about our services and the help we can offer, contact us at one of the websites below:
UK and Europe: https://www.clearlycloudy.co.uk/
North America: https://www.clearlysolutions.net/


The Clearly Podcast, by Clearly Podcasting
