Apache Spark - Nested JSON array to flatten columns

–1 vote

Hello,
I have a JSON document that is nested and contains nested arrays. How can I use an Apache Spark Python script to flatten it into columns, so that I can process it with AWS Glue and query the data with AWS Athena or AWS Redshift?

Jan 2, 2019 in Big Data Hadoop by digger
• 26,740 points
5,290 views

1 answer to this question.

0 votes

It depends on the structure of your JSON file, but here is some code you can refer to:

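import json

import pandas as pd

# Load the nested JSON document from disk.
with open('user.txt') as f:
    json_data = json.load(f)


def flatten_json(y):
    """Flatten nested dicts and lists into one dict keyed by underscore-joined paths."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # Descend into each key, extending the column-name prefix.
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            # Array elements are keyed by their position in the list.
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing underscore from the prefix.
            out[name[:-1]] = x

    flatten(y)
    return out


flat = flatten_json(json_data)
# pd.json_normalize is the top-level name in pandas 1.0 and later.
dt = pd.json_normalize(flat)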

dt is a pandas DataFrame with a single row holding the flattened JSON, one column per leaf value.
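
To make the column naming concrete, here is a minimal sketch that runs the same flattening logic on a small in-memory record instead of user.txt. The sample record and its field names are made up for illustration and are not from the question, and the pandas step is left out so the example has no dependencies.

```python
def flatten_json(y):
    """Flatten nested dicts and lists into one dict keyed by underscore-joined paths."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: strip the trailing underscore from the prefix.
            out[name[:-1]] = x

    flatten(y)
    return out


# Hypothetical sample record (an assumption, not data from the question).
sample = {
    "id": 1,
    "user": {"name": "Ann", "tags": ["a", "b"]},
    "orders": [{"sku": "X1", "qty": 2}, {"sku": "Y2", "qty": 1}],
}

flat = flatten_json(sample)
for key, value in flat.items():
    print(key, '=', value)
```

Each array index becomes part of the column name (orders_0_sku, orders_1_sku, and so on), so records whose arrays differ in length end up with different sets of columns. If you need a fixed schema for Athena or Redshift, producing one row per array element may suit you better than one wide row.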

answered Jan 2, 2019 by Omkar
• 69,210 points
