How can I flatten a struct in a Spark DataFrame?

0 votes
Can anyone help me understand how to flatten a struct column in a Spark DataFrame?
May 24, 2018 in Apache Spark by code799
864 views

2 answers to this question.

0 votes
You can go ahead and use the flatMap method, which maps each input row to zero or more output rows. For a struct column specifically, you can also flatten it directly by selecting its nested fields, e.g. df.select("structCol.*").
answered May 24, 2018 by Shubham
• 13,290 points
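To show what flatMap does without pulling in a Spark cluster, here is a minimal plain-Java sketch using java.util.stream, whose flatMap follows the same one-to-many mapping idea as Spark's Dataset.flatMap (the class and method names below are illustrative, not part of any Spark API):

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain java.util.stream sketch of the flatMap idea: each inner
// list is expanded into its elements in the flat output list.
public class FlatMapSketch {

    // Flattens one level of nesting.
    static List<Integer> flattenOneLevel(List<List<Integer>> nested) {
        return nested.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> nested = List.of(List.of(1, 2), List.of(3, 4));
        System.out.println(flattenOneLevel(nested)); // [1, 2, 3, 4]
    }
}
```

In Spark the mapping function would emit rows instead of integers, but the shape of the operation is the same.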
+1 vote
// Collect data from the input Avro file and create a Dataset
Dataset<Row> inputRecordsCollection = spark.read().format("avro").load(inputFile);

// Register the Dataset as a temp view so the nested schema can be flattened with SQL
inputRecordsCollection.createOrReplaceTempView("inputFileTable");

// Build the flattened SELECT list from the nested schema
String fileSQL = flattenSchema(inputRecordsCollection.schema(), null);
Dataset<Row> inputFlattRecords = spark.sql("SELECT " + fileSQL + " FROM inputFileTable");
inputFlattRecords.show(10);

public static String flattenSchema(StructType schema, String prefix) {
    final StringBuilder selectSQLQuery = new StringBuilder();

    for (StructField field : schema.fields()) {
        final String fieldName = field.name();

        // Skip metadata fields whose names start with "@"
        if (fieldName.startsWith("@")) {
            continue;
        }

        String colName = prefix == null ? fieldName : prefix + "." + fieldName;
        // Nested fields become top-level columns named parent_child
        String colNameTarget = colName.replace("[0].", "_").replace(".", "_");

        DataType dtype = field.dataType();
        if (dtype instanceof ArrayType) {
            // Flatten arrays by taking their first element
            dtype = ((ArrayType) dtype).elementType();
            colName = colName + "[0]";
        }

        if (dtype instanceof StructType) {
            // Recurse into nested structs, carrying the path built so far
            selectSQLQuery.append(flattenSchema((StructType) dtype, colName));
        } else {
            selectSQLQuery.append(colName)
                          .append(" as ")
                          .append(colNameTarget);
        }

        selectSQLQuery.append(",");
    }

    // Drop the trailing comma
    if (selectSQLQuery.length() > 0) {
        selectSQLQuery.deleteCharAt(selectSQLQuery.length() - 1);
    }

    return selectSQLQuery.toString();
}
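The recursion above is easiest to verify without a Spark session. Here is a minimal plain-Java sketch of the same idea, using nested Maps to stand in for StructType (the class and its use of Maps are illustrative stand-ins, not Spark APIs), so the SELECT-list construction can be run and tested on its own:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of the flattenSchema recursion, with nested
// Maps standing in for StructType so it runs without Spark.
public class FlattenSketch {

    // Builds a comma-separated SELECT list; nested fields become
    // "parent.child as parent_child", mirroring colNameTarget above.
    static String flattenSchema(Map<String, Object> schema, String prefix) {
        StringBuilder select = new StringBuilder();
        for (Map.Entry<String, Object> field : schema.entrySet()) {
            String colName = prefix == null ? field.getKey() : prefix + "." + field.getKey();
            if (field.getValue() instanceof Map) {
                // A Map value plays the role of a nested struct: recurse
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) field.getValue();
                select.append(flattenSchema(nested, colName));
            } else {
                select.append(colName).append(" as ").append(colName.replace(".", "_"));
            }
            select.append(",");
        }
        if (select.length() > 0) {
            select.deleteCharAt(select.length() - 1); // drop trailing comma
        }
        return select.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "string");
        address.put("zip", "string");
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("id", "long");
        schema.put("address", address);
        System.out.println(flattenSchema(schema, null));
        // id as id,address.city as address_city,address.zip as address_zip
    }
}
```

The generated string is exactly what gets spliced into "SELECT ... FROM inputFileTable" in the answer above.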
answered Jul 4 by Dhara dhruve
