Read multiple XML files in Spark

0 votes
I am using Spark for streaming. Suppose there are 500 XML files; how can I read all of them in Spark?
Jul 25 in Apache Spark by Zeba
471 views

1 answer to this question.

0 votes

You can do this with globbing. See the Spark DataFrameReader's load method: it can take a single path string, multiple paths, or no argument at all for data sources that don't read from paths (e.g. JDBC, as opposed to HDFS, S3, or other file systems).
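
As a quick sketch of those three forms (using Spark 2.x's spark session; on older versions sqlContext.read behaves the same, and the paths and JDBC settings below are placeholder values, not from this answer):

// 1. A single path
spark.read.format("com.databricks.spark.xml").load("/data/one.xml")
// 2. Multiple paths (load is a varargs method)
spark.read.format("com.databricks.spark.xml").load("/data/a.xml", "/data/b.xml")
// 3. No path, for sources that are not file-based (e.g. JDBC)
spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host/db") // placeholder URL
  .option("dbtable", "my_table")              // placeholder table
  .load()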

For your XML files, point the reader at a glob pattern:

val df = sqlContext.read.format("com.databricks.spark.xml")
  .option("inferSchema", "true")
  .option("rowTag", "address") // the XML element to be treated as one row
  .load("/path/to/files/*.xml")

load can also take multiple paths as separate arguments (it is a varargs method), so you can list the files explicitly:

.load("/path/to/files/File1.xml", "/path/to/files/File2.xml")
answered Jul 25 by Jack
