How to use the length function inside substring in Spark

+2 votes
I'm using Spark 2.1.

Using the length function inside substring on a DataFrame column gives me a type mismatch error:

val SSDF = testDF.withColumn("newcol", substring($"col", 1, length($"col")-1))
// error: type mismatch; found: org.apache.spark.sql.Column, required: Int

May 3, 2018 in Apache Spark by Data_Nerd

4 answers to this question.

+1 vote

You can use the expr function:

import org.apache.spark.sql.functions.expr
import spark.implicits._  // for toDF

val data = List("..", "...", "...")
val df = spark.sparkContext.parallelize(data).toDF("value")
val result = df.withColumn("cutted", expr("substring(value, 1, length(value)-1)"))
result.show(false)
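This works because the whole substring(value, 1, length(value)-1) string is parsed as a SQL expression, so length(value) is resolved as a SQL function at plan time instead of being forced through the Scala API's Int parameter.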


This might help

answered May 3, 2018 by kurt_cobain
Can you provide some working examples?
0 votes

You can try this:

import org.apache.spark.sql.functions.{length, lit}

val substrDF = testDF.withColumn("newcol", $"col".substr(lit(1), length($"col") - 1))
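This works because Column.substr has an overload that takes Column arguments for both the start position and the length (substr(startPos: Column, len: Column): Column), so length($"col") - 1 is accepted there, unlike in functions.substring.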

answered Dec 10, 2018 by Devatha
0 votes

You have passed the wrong parameter types. The signature of functions.substring is:

substring(str: Column, pos: Int, len: Int): Column

Both pos and len must be plain Ints, so passing a Column such as length($"col") - 1 as len is what causes the type mismatch.
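A minimal sketch of the difference, assuming the testDF and col names from the question and that spark.implicits._ is in scope for the $ syntax:

import org.apache.spark.sql.functions.{length, substring}

// Compiles: pos and len are Int literals
testDF.withColumn("newcol", substring($"col", 1, 3))

// Does not compile: length($"col") - 1 is a Column, but len must be an Int
// testDF.withColumn("newcol", substring($"col", 1, length($"col") - 1))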
answered Dec 10, 2018 by Saloni
0 votes

You can also sidestep substring entirely and strip the last character with the built-in regexp_replace:

import org.apache.spark.sql.functions.regexp_replace

testDF.withColumn("newcol", regexp_replace($"col", ".$", "")).show()
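If you specifically want a UDF, a minimal sketch could look like this (dropLast is a hypothetical name, not part of any answer above):

import org.apache.spark.sql.functions.udf

// Hypothetical helper: null-safe drop of the last character
val dropLast = udf((s: String) => if (s == null) null else s.dropRight(1))

testDF.withColumn("newcol", dropLast($"col")).show()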

answered Dec 10, 2018 by Foane

