How to read an mp4 video file stored in HDFS using PySpark

+1 vote
Is there any way to read a video file (mp4) using Spark? I need to read the file and extract its data frame by frame.
May 12, 2020 in Apache Spark by Amey
• 210 points
869 views

1 answer to this question.

0 votes

Hi @Amey,

You can do this by enabling WebHDFS. Follow the steps below.

  1. Enable WebHDFS in the HDFS configuration file (hdfs-site.xml):
    set dfs.webhdfs.enabled to true.

  2. Restart the HDFS daemons.

  3. You can now access HDFS through the WebHDFS API.

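Now you can fetch your video with the curl command. The op=OPEN parameter tells WebHDFS to read the file, and -L follows the redirect to the DataNode that serves it.

$ curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN"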
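
The same request can be made from Python. The following is a minimal sketch using only the standard library, not a definitive client. The function names webhdfs_open_url and read_hdfs_file are made up for illustration, and the optional user.name parameter only applies to clusters that use simple (non-Kerberos) authentication.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_open_url(host, port, hdfs_path, user=None):
    """Build the WebHDFS URL that opens (reads) a file. Hypothetical helper."""
    params = {"op": "OPEN"}
    if user:
        # Only meaningful with simple authentication (an assumption here).
        params["user.name"] = user
    # quote() keeps "/" as-is and escapes characters such as spaces.
    return "http://{}:{}/webhdfs/v1/{}?{}".format(
        host, port, quote(hdfs_path.lstrip("/")), urlencode(params)
    )


def read_hdfs_file(host, port, hdfs_path, user=None):
    """Return the whole file as bytes. Hypothetical helper."""
    # urlopen follows the NameNode's redirect to a DataNode automatically.
    with urlopen(webhdfs_open_url(host, port, hdfs_path, user)) as response:
        return response.read()
```

For example, read_hdfs_file("<HOST>", <PORT>, "<PATH>") would return the mp4 contents as a bytes object, with the host, port, and path filled in for your cluster.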

Once decoded, a video is a sequence of frames, and each frame is an array of pixel values. So read your video file into a variable and then decode it frame by frame.
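
Since the question asks about PySpark, here is one possible sketch of the frame-by-frame step. It is not part of the WebHDFS approach above: it swaps in SparkContext.binaryFiles to load the file directly from HDFS and OpenCV (cv2) to decode it. It assumes opencv-python is installed on every executor and that each video is small enough to fit in executor memory, since binaryFiles loads a whole file as one record. The names decode_frames and video_frames are made up for illustration.

```python
import os
import tempfile


def decode_frames(path_and_bytes):
    """Yield (path, frame_index, frame) for every frame of one video.

    Runs on a Spark executor. cv2 is imported inside the function so it
    only has to be installed where the function actually executes.
    """
    import cv2  # assumption: opencv-python is available on the executors

    path, data = path_and_bytes
    # cv2.VideoCapture reads from a file path, so write the bytes to disk first.
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        tmp.write(data)
        local_path = tmp.name

    capture = cv2.VideoCapture(local_path)
    try:
        index = 0
        while True:
            ok, frame = capture.read()  # frame is a NumPy array of pixels
            if not ok:
                break
            yield path, index, frame
            index += 1
    finally:
        capture.release()
        os.remove(local_path)


def video_frames(sc, hdfs_path):
    """Return an RDD of (path, frame_index, frame) for the files at hdfs_path."""
    # binaryFiles produces one (path, bytes) record per file.
    return sc.binaryFiles(hdfs_path).flatMap(decode_frames)
```

With an existing SparkSession named spark, a call such as video_frames(spark.sparkContext, "hdfs:///<PATH>") would give an RDD you can count, filter, or map over per frame. Because all frames of a file are decoded by a single task, this parallelizes across files rather than within one video.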

Hope this helps!

Thanks.

answered May 29, 2020 by MD
• 95,360 points
