How to read more than one file in Apache Spark?

I have a directory that contains multiple files:

$ pwd
/home/user/student
$ ls
a.csv b.csv c.csv

I want to read all these files at once. 

I am trying to do it like this:

val text = sc.wholeTextFiles("student")
text.collect()

But I am getting this error:

java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:591)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:283)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:243)
    at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:267)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1779)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:885)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:884)

How can I solve this?

Dec 11, 2018 in Big Data Hadoop by digger

1 answer to this question.

Pass a glob pattern that matches the files inside the directory, instead of the bare directory name:

val text = sc.wholeTextFiles("student/*")
text.collect()
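For context, here is a minimal sketch of the same idea as a standalone application rather than a shell session. It assumes spark-core is on the classpath and that Spark runs in local mode; the object name ReadManyFiles and the helper readAll are hypothetical, and the file:// scheme with an absolute path is an assumption based on the files living under /home/user/student on the local disk, as the ls output shows. wholeTextFiles returns one (path, content) pair per matched file, while textFile returns the lines of all matched files as a single RDD[String].

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReadManyFiles {
  // Reads every file matched by `pattern` (a path or glob) and returns
  // (path, content) pairs, sorted by path so the output is deterministic.
  def readAll(sc: SparkContext, pattern: String): Array[(String, String)] =
    sc.wholeTextFiles(pattern).collect().sortBy(_._1)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode for a quick test; on a cluster, leave out
    // setMaster and let spark-submit supply the master URL.
    val conf = new SparkConf().setAppName("ReadManyFiles").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: the files are on the local filesystem. An explicit scheme
      // plus an absolute path avoids depending on how a relative path such
      // as "student" gets resolved.
      val dir = args.headOption.getOrElse("/home/user/student")

      // One (path, content) pair per file matched by the glob.
      val files = readAll(sc, s"file://$dir/*")
      files.foreach { case (path, content) =>
        println(s"$path: ${content.length} chars")
      }

      // textFile also accepts a glob (or a comma-separated list of paths)
      // and returns the lines of all matched files as one RDD[String].
      val lines = sc.textFile(s"file://$dir/*.csv")
      println(s"total lines: ${lines.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

With the three files from the question, readAll would return three pairs, one each for a.csv, b.csv and c.csv. If you run this on a cluster instead of locally, point the path at a filesystem every executor can reach (for example HDFS), since a file:// path must exist on each node.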
answered Dec 11, 2018 by Omkar

Related Questions In Big Data Hadoop

How will the Fair Scheduler handle more than one Job?

Spark Scala: How to list all folders in directory

How to save Spark dataframe as dynamic partitioned table in Hive?

Hadoop Mapreduce word count Program

hadoop.mapred vs hadoop.mapreduce?

hadoop fs -put command?

How to read HDFS and local files with the same code in Java?

How to read multiple files in hdfs?
