Read multiple xml files in Spark

Spark is used for streaming. Suppose I have 500 XML files. How can I read all 500 XML files in Spark?
Jul 25 in Apache Spark by Zeba

1 answer to this question.

You can do this using globbing. See the load method of Spark's DataFrameReader. load can take a single path string, a sequence of paths, or no argument at all for data sources that don't have paths (i.e. not HDFS or S3 or other file systems).

val df = sqlContext.read.format("com.databricks.spark.xml")
  .option("inferSchema", "true")
  .option("rowTag", "address") // the XML element to treat as a row
  .load("/path/to/files/*.xml")

load can also take one long string of comma-separated paths. Avoid spaces around the commas, since a stray space would become part of the next path:

  .load("/path/to/files/File1.xml,/path/to/files/File2.xml")
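
If the files are listed explicitly rather than matched by a glob, the comma-separated string can be built programmatically. The following is only a minimal sketch, not a definitive implementation: it assumes Spark 2.0 or later (so SparkSession is used in place of sqlContext) and that the spark-xml package is on the classpath. The object name ReadManyXmlFiles and the helpers commaSeparatedPaths and readXml are hypothetical names introduced for illustration; the directory, file names and row tag are the placeholders from the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadManyXmlFiles {
  // Hypothetical helper: joins file names under one directory into a single
  // comma-separated path string, with no spaces around the commas.
  def commaSeparatedPaths(dir: String, fileNames: Seq[String]): String =
    fileNames.map(name => s"${dir.stripSuffix("/")}/$name").mkString(",")

  // Assumes the spark-xml data source (com.databricks.spark.xml) is available.
  def readXml(spark: SparkSession, paths: String, rowTag: String): DataFrame =
    spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", rowTag) // the XML element to treat as a row
      .load(paths)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("read-many-xml").getOrCreate()
    // Placeholder directory and file names, as in the answer.
    val paths = commaSeparatedPaths("/path/to/files", Seq("File1.xml", "File2.xml"))
    val df = readXml(spark, paths, rowTag = "address")
    df.printSchema()
    spark.stop()
  }
}
```

For 500 files in one directory the glob form is usually simpler. Building the list by hand is mainly useful when only some of the files in a directory should be read.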
answered Jul 25 by Jack
