Hi,
Apache Spark is a distributed data processing engine that can read data from many different sources. It creates distributed datasets (RDDs and DataFrames) from whatever storage system holds your data. Commonly used sources include HDFS and Amazon S3 (file and object storage), as well as HBase and Cassandra (databases, typically accessed through connectors).
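
To make this concrete, here is a minimal PySpark sketch of creating a distributed dataset from a storage location. The paths, host name, bucket name, and the load_lines helper are all hypothetical placeholders, not anything from a real cluster. It also assumes that reading s3a:// paths works in your setup, which normally requires the hadoop-aws module on the classpath.

```python
# Hypothetical locations; replace them with paths that exist in your environment.
EXAMPLE_URIS = {
    "local": "file:///tmp/events.txt",
    "hdfs": "hdfs://namenode:8020/data/events.txt",
    "s3": "s3a://example-bucket/data/events.txt",
}


def load_lines(spark, uri):
    """Return an RDD of text lines read from the given storage URI."""
    # SparkContext.textFile takes a URI; the scheme (file, hdfs, s3a)
    # decides which storage system the data is read from.
    return spark.sparkContext.textFile(uri)


def main():
    # Imported here so the helper above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("storage-demo").getOrCreate()
    try:
        rdd = load_lines(spark, EXAMPLE_URIS["local"])
        print(rdd.count())  # number of lines in the file
    finally:
        spark.stop()
```

Calling main() on a machine with Spark installed reads the local file. Swapping in the "hdfs" or "s3" entry changes only the URI, not the code. HBase and Cassandra are not read this way; they usually go through a separate connector library.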