How to read an .mp4 (video file) stored in HDFS using PySpark?

+1 vote
Is there any way to read a video file (mp4) using Spark? I need to read the file and extract its data frame by frame.
May 12 in Apache Spark by Amey
• 210 points
183 views

1 answer to this question.

0 votes

Hi @Amey,

You can enable WebHDFS to access the file over HTTP. Follow the steps below.

  1. Enable WebHDFS in the HDFS configuration file (hdfs-site.xml):
    set dfs.webhdfs.enabled to true.

  2. Restart the HDFS daemons.

  3. You can now access HDFS through the WebHDFS REST API.

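Now you can fetch your video with the curl command. WebHDFS requires an operation in the query string: op=OPEN reads the file, and -L makes curl follow the redirect from the NameNode to the DataNode that serves the data.

$ curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN"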
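
The same request can be made from Python. The following is a minimal sketch using only the standard library, not a definitive implementation. The function names webhdfs_open_url and read_hdfs_file are hypothetical helpers introduced here for illustration, and the host, port, and path are placeholders you would replace with your own cluster's values.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_open_url(host, port, hdfs_path, user=None):
    """Build the WebHDFS URL that reads a file (op=OPEN).

    hdfs_path is expected to be absolute, e.g. "/videos/sample.mp4".
    """
    params = {"op": "OPEN"}
    if user:
        # Optional: identify the HDFS user when security is off.
        params["user.name"] = user
    return "http://{}:{}/webhdfs/v1{}?{}".format(
        host, port, quote(hdfs_path), urlencode(params)
    )


def read_hdfs_file(host, port, hdfs_path, user=None):
    """Return the file's raw bytes.

    urlopen follows the NameNode's redirect to a DataNode automatically.
    The whole file is held in memory, so this suits small videos only.
    """
    with urlopen(webhdfs_open_url(host, port, hdfs_path, user)) as resp:
        return resp.read()
```

For example, read_hdfs_file("<HOST>", <PORT>, "/<PATH>") would return the contents of the video as a bytes object, assuming WebHDFS is reachable from the machine running the script.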

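A video file is a sequence of frames, and each decoded frame can be represented as an array of pixel values. So read the bytes of your video file into a variable, then decode them frame by frame.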
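
One possible way to do the frame-by-frame step inside Spark itself is sketched below. It is an illustration under stated assumptions, not the method described in this answer: it assumes PySpark's binaryFiles is used to load each video as bytes, that OpenCV (the cv2 package) is installed on every executor, and that each video fits in an executor's memory. The names bytes_to_temp_file, frame_shapes, and main, and the path hdfs:///videos/*.mp4, are hypothetical.

```python
import os
import tempfile


def bytes_to_temp_file(data, suffix=".mp4"):
    """Write raw bytes to a local temp file and return its path.

    OpenCV's VideoCapture opens files by path, so the bytes read
    from HDFS are first written to local disk.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def frame_shapes(record):
    """Yield (hdfs_path, frame_index, frame_shape) for one video.

    record is a (path, bytes) pair as produced by sc.binaryFiles.
    """
    import cv2  # imported here so it is loaded on the executor

    hdfs_path, data = record
    local_path = bytes_to_temp_file(data)
    capture = cv2.VideoCapture(local_path)
    try:
        index = 0
        while True:
            ok, frame = capture.read()  # frame is a NumPy array
            if not ok:
                break
            yield (hdfs_path, index, frame.shape)
            index += 1
    finally:
        capture.release()
        os.remove(local_path)


def main():
    from pyspark import SparkContext

    sc = SparkContext(appName="VideoFrames")
    # Each element is (file path, full file contents as bytes).
    videos = sc.binaryFiles("hdfs:///videos/*.mp4")
    print(videos.flatMap(frame_shapes).take(5))
    sc.stop()
```

Calling main() from a script submitted with spark-submit would print the shape of the first few frames. Replace frame.shape with whatever per-frame processing you need; returning full frames to the driver is usually too expensive.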

answered May 29 by MD
• 56,480 points
