Read multiple xml files in Spark

Spark is used for streaming. Suppose there are 500 XML files. How can I read all 500 XML files in Spark?
Jul 25 in Apache Spark by Zeba

1 answer to this question.

You can do this using globbing. See the "load" method of Spark's DataFrameReader. load can take a single path string, a sequence of paths, or no argument for data sources that don't have paths (i.e. sources that are not HDFS, S3, or another file system).

val df = sqlContext.read.format("com.databricks.spark.xml")
  .option("inferSchema", "true")
  .option("rowTag", "address") // the XML element to be treated as a row
  .load("/path/to/files/*.xml")

load can also take a single long string of comma-separated paths (with no spaces around the commas, so that no path starts with a blank):

  .load("/path/to/files/File1.xml,/path/to/files/File2.xml")
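
If the file names come from a list rather than a glob, the comma-separated string can be built programmatically. The following is a minimal sketch, not part of Spark or spark-xml: the object XmlPaths and its method commaSeparated are hypothetical names introduced for illustration, and the file paths are placeholders. It assumes no individual path contains a comma.

```scala
// Hypothetical helper (not part of Spark or spark-xml): joins individual
// file paths into one comma-separated string suitable for load(...).
// Assumes no path itself contains a comma.
object XmlPaths {
  def commaSeparated(paths: Seq[String]): String =
    paths
      .map(_.trim)          // drop stray whitespace around each path
      .filter(_.nonEmpty)   // skip empty entries
      .mkString(",")        // join with commas and no spaces

  def main(args: Array[String]): Unit = {
    // Placeholder paths; substitute the real list of files.
    val files = Seq("/path/to/files/File1.xml", " /path/to/files/File2.xml ")
    println(commaSeparated(files))
  }
}
```

The resulting string would then be passed to the reader shown in the answer, for example .load(XmlPaths.commaSeparated(files)). For a directory where every file should be read, the glob form is shorter.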
answered Jul 25 by Jack
