Read multiple xml files in Spark

0 votes
Spark is used for streaming. Suppose there are 500 XML files. How can I read all 500 XML files in Spark?
Jul 25, 2019 in Apache Spark by Zeba
5,278 views

1 answer to this question.

0 votes

You can do this using globbing. See the Spark DataFrameReader "load" method. load can take a single path string, a sequence of paths, or no argument for data sources that don't have paths (i.e. sources that are not HDFS, S3, or another file system).

val df = sqlContext.read.format("com.databricks.spark.xml")
  .option("inferSchema", "true")
  .option("rowTag", "address") // the XML element to be treated as a row
  .load("/path/to/files/*.xml")

load can also take a single string of comma-separated paths. Avoid spaces after the commas, since a space may be read as part of the next path:

.load("/path/to/files/File1.xml,/path/to/files/File2.xml")
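If the file names are already held in a list, the comma-separated string can be built rather than typed by hand. The following is only a minimal sketch: XmlPaths and joinPaths are hypothetical names, not part of Spark or spark-xml, and the usage in the comments assumes the same sqlContext and spark-xml package as the example above.

```scala
// Hypothetical helper (not part of Spark or spark-xml): builds the single
// comma-separated string that is passed to load().
object XmlPaths {
  // Trim stray whitespace, drop empty entries, and join with commas only.
  def joinPaths(paths: Seq[String]): String =
    paths.map(_.trim).filter(_.nonEmpty).mkString(",")
}

// Usage sketch, assuming a SQLContext named sqlContext and the spark-xml
// package on the classpath:
//   val files = Seq("/path/to/files/File1.xml", "/path/to/files/File2.xml")
//   val df = sqlContext.read.format("com.databricks.spark.xml")
//     .option("rowTag", "address")
//     .load(XmlPaths.joinPaths(files))
```

Joining with a bare comma keeps leading or trailing spaces out of the individual paths.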
answered Jul 25, 2019 by Jack
