How to read an mp4 video file stored in HDFS using PySpark

+1 vote
Is there any way to read the video file (mp4) using spark? I need to read the file and extract frame by frame data.
May 12, 2020 in Apache Spark by Amey
• 210 points
455 views

1 answer to this question.

0 votes

Hi @Amey,

You can enable WebHDFS for this task. Follow the steps below.

  1. Enable WebHDFS in the HDFS configuration file (hdfs-site.xml):
    set dfs.webhdfs.enabled to true.

  2. Restart the HDFS daemons.

  3. HDFS is now accessible through the WebHDFS REST API.

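Now you can fetch your video with a curl command. The OPEN operation reads the file, and -L follows the redirect from the NameNode to the DataNode that serves the data.

$ curl -L -o video.mp4 "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN"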
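The same request can be made from Python. The following is only a minimal sketch that uses the standard library and the OPEN operation of the WebHDFS REST API. The function names webhdfs_open_url and read_hdfs_file, the host name "namenode", the port 9870 and the file path are placeholders I made up for illustration, so replace them with the values for your cluster.

```python
from urllib.parse import quote
from urllib.request import urlopen


def webhdfs_open_url(host, port, hdfs_path):
    """Build the WebHDFS URL that reads the file at hdfs_path.

    hdfs_path is expected to be absolute, for example "/videos/sample.mp4".
    """
    # quote() escapes spaces and other special characters but keeps "/".
    return "http://{}:{}/webhdfs/v1{}?op=OPEN".format(host, port, quote(hdfs_path))


def read_hdfs_file(host, port, hdfs_path):
    """Return the raw bytes of the file.

    The NameNode answers an OPEN request with a redirect to a DataNode;
    urlopen follows that redirect automatically.
    """
    with urlopen(webhdfs_open_url(host, port, hdfs_path)) as response:
        return response.read()


# Hypothetical usage (needs a reachable cluster, so it is left commented out):
# video_bytes = read_hdfs_file("namenode", 9870, "/videos/sample.mp4")
# print(len(video_bytes))
```

Note that this loads the whole file into memory at once, which is probably fine for short clips but not for very large videos.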

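A video is essentially a sequence of frames, and each decoded frame can be represented as an array of pixel values. So read your video file into a variable and then decode it frame by frame.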
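If you would rather stay inside PySpark, one possible alternative to WebHDFS is SparkContext.binaryFiles, which returns (path, bytes) pairs. The sketch below is not part of the steps above. It assumes OpenCV (the cv2 package) is installed on every executor, that each video fits in executor memory, and that the HDFS glob "hdfs:///path/to/videos/*.mp4" is replaced with your own location. The helper names spill_to_temp_file, extract_frames and run_job are made up for this example.

```python
import os
import tempfile


def spill_to_temp_file(content, suffix=".mp4"):
    """Write raw bytes to a local temporary file and return its path.

    OpenCV's VideoCapture reads from a file path, not from bytes in memory.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(content)
        return tmp.name


def extract_frames(path_and_bytes):
    """Turn one (path, bytes) pair into (path, frame_index, frame_shape) records."""
    import cv2  # assumption: OpenCV is available on the executors

    path, content = path_and_bytes
    local_path = spill_to_temp_file(content)
    records = []
    try:
        capture = cv2.VideoCapture(local_path)
        index = 0
        while True:
            ok, frame = capture.read()  # frame is a NumPy array (height, width, channels)
            if not ok:
                break
            records.append((path, index, frame.shape))
            index += 1
        capture.release()
    finally:
        os.remove(local_path)
    return records


def run_job(video_glob="hdfs:///path/to/videos/*.mp4"):
    """Read every matching video from HDFS and print a few frame records."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mp4-frames").getOrCreate()
    videos = spark.sparkContext.binaryFiles(video_glob)  # RDD of (path, bytes)
    frames = videos.flatMap(extract_frames)
    print(frames.take(5))
    spark.stop()


# Hypothetical usage, for example from a script passed to spark-submit:
# run_job()
```

The records here only carry the frame shape to keep the output small. You could return the frame array itself, but every frame is then shipped through Spark, so it is usually better to do the per-frame processing inside extract_frames.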

answered May 29, 2020 by MD
• 95,180 points
