Read multiple XML files in Spark

Spark is used for streaming. Suppose there are 500 XML files. How can I read all 500 XML files in Spark?
Jul 25 in Apache Spark by Zeba

You can do this using globbing. See the "load" method of Spark's DataFrameReader. It can take a single path string, a sequence of paths, or no argument for data sources that don't have paths (i.e. sources that are not HDFS, S3, or another file system).

val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "address") // the XML element to treat as a row
  .load("/path/to/files/*.xml") // a glob matches every XML file in the directory

load can also take one long string of comma-separated paths:

  .load("/path/to/files/File1.xml,/path/to/files/File2.xml")
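If the file names are already in a collection, the comma-separated string can be built rather than typed by hand. Below is a minimal sketch in plain Scala; XmlPaths and joinPaths are hypothetical names made up for illustration, not part of Spark or spark-xml. It trims each entry because a stray space after a comma may be treated as part of the next path.

```scala
// Hypothetical helper, not part of Spark or spark-xml.
object XmlPaths {
  // Joins file paths into one comma-separated string with no
  // surrounding whitespace, dropping empty entries.
  def joinPaths(paths: Seq[String]): String =
    paths.map(_.trim).filter(_.nonEmpty).mkString(",")
}
```

For example, XmlPaths.joinPaths(Seq("/path/to/files/File1.xml", " /path/to/files/File2.xml")) returns "/path/to/files/File1.xml,/path/to/files/File2.xml", which can then be passed to load.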
answered Jul 25 by Jack

