A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed.
While the data lake concept can be applied more broadly to include other types of systems, it most frequently involves storing data in the Hadoop Distributed File System (HDFS) across a set of clustered compute nodes based on commodity server hardware. The reliance on HDFS has, over time, been supplemented with data stores using object storage technology, but non-HDFS Hadoop ecosystem components typically are part of the enterprise data lake implementation.