Apache Spark - Nested JSON array to flatten columns

0 votes

Hello,
I have a JSON which is nested and have Nested arrays. How could I use Apache Spark Python script to flatten it in a columnar manner so that I could use it via AWS Glue and use AWS Athena or AWS redshift to query the data?

Jan 2 in Big Data Hadoop by digger
• 27,620 points
594 views

1 answer to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
0 votes

It depends on the structure of your JSon file but here I have posted a code that you can refer:

import pandas as pd
from pandas.io.json import json_normalize
import json
with open('user.txt') as f:
json_data = json.load(f)


def flatten_json(y):
out = {}

def flatten(x, name=''):
if type(x) is dict:
for a in x:
flatten(x[a], name + a + '_')
elif type(x) is list:
i = 0
for a in x:
flatten(a, name + str(i) + '_')
i += 1
else:
out[name[:-1]] = x
flatten(y)
return out

flat = flatten_json(json_data)
dt=json_normalize(flat)

dt is your data frame object containing flattened json.

answered Jan 2 by Omkar
• 65,840 points

Related Questions In Big Data Hadoop

0 votes
1 answer

Is it possible to run Apache Spark without Hadoop?

Though Spark and Hadoop were the frameworks designed ...READ MORE

answered May 2 in Big Data Hadoop by ravikiran
• 1,460 points
16 views
0 votes
1 answer

How do I connect my Spark based HDInsight cluster to my blob storage?

Go through this blog: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-blob-storage#access-blobs I went through this ...READ MORE

answered Apr 15, 2018 in Big Data Hadoop by Shubham
• 12,150 points
400 views
0 votes
1 answer

How to tune Spark jobs & optimize the performance?

You need to know the cluster properly ...READ MORE

answered Apr 18, 2018 in Big Data Hadoop by coldcode
• 1,980 points
303 views
0 votes
1 answer

How to transfer data from Netezza to HDFS using Apache Sqoop?

Remove the --direct option. It gives issue ...READ MORE

answered Apr 23, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
181 views
0 votes
0 answers
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 9,030 points
1,651 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 9,030 points
130 views
0 votes
10 answers

hadoop fs -put command?

copy command can be used to copy files ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Sujay
7,990 views
0 votes
1 answer

Apache Spark gives "Failed to load native-hadoop with error"

Seems like hadoop path is missing in java.library.path. ...READ MORE

answered Nov 22, 2018 in Big Data Hadoop by Omkar
• 65,840 points
174 views
0 votes
1 answer

How to read more than one files in Apache Spark?

Try this: val text = sc.wholeTextFiles("student/*") text.collect() READ MORE

answered Dec 11, 2018 in Big Data Hadoop by Omkar
• 65,840 points
105 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.