I have code that I want to use to alter values in two columns of my dataset

Question

I have a dataset that has "Speed" as one of its columns (features). The column contains both zero and non-zero values. I want to randomly set 10% of the non-zero values to zero. Every row whose speed is set to zero should also have its class label set to zero. I have written the code below, but it fails with the error shown after it.

import pandas as pd

file_path = 'Processed_data/data1.csv'
df = pd.read_csv(file_path)
per_change = 0.1
attr = 'Speed'
target = 'Class'
df_spd = df[df['Speed'] > 0.]

num_rows_to_change = int(df.shape[0] * per_change)
num_with_zero_initial = df[df[attr] == 0].shape[0]
assert df_spd.shape[0] > num_rows_to_change, \
    'Number of rows with non-zero speed is less than 10% of the original dataset.'
df_update = df_spd.sample(num_rows_to_change)
df_update[attr] = 0.
df_update[target] = 0.
df.update(df_update)
update_list = df_update.index.tolist()
num_with_zero_final = df[df['Speed'] == 0].shape[0]
assert num_with_zero_final == num_with_zero_initial + num_rows_to_change, \
    'Number of rows needed to change not equal to number of rows changed.'
df.to_csv('changed.csv')

AssertionError                            Traceback (most recent call last)
<ipython-input-11-f93535705bac> in <module>
      1 assert num_with_zero_final == num_with_zero_initial + num_rows_to_change, \
----> 2 'Number of rows needed to change not equal to number of rows changed.'
AssertionError: Number of rows needed to change not equal to number of rows changed.

Omkar · Answer 1 · Mar 17, 2019

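Hi @elvin. I read your script and think the approach is more complex than it needs to be. Here is a simpler script that does the job: it samples 10% of the rows where Speed is greater than zero and sets both Speed and Class to 0 for those rows. Try this: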

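import pandas as pd

file_path = 'Processed_data/data1.csv'  # path to your file (taken from the question)
df = pd.read_csv(file_path)

# Index labels of a random 10% of the rows whose Speed is greater than zero
change = df.query('Speed > 0').sample(frac=.1).index
# Zero out both the feature and its class label for the sampled rows
df.loc[change, 'Speed'] = 0
df.loc[change, 'Class'] = 0

# index=False keeps the row index out of the output file
df.to_csv('data1.csv', header=True, index=False)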
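
If you want to confirm that the counts add up, which is what the assertion in the question was checking, here is a minimal sketch of the same idea. It runs on made-up data rather than your CSV. The helper name `zero_out_fraction`, the `seed` argument, and the synthetic `demo` frame are my own assumptions for illustration and are not part of your script. Passing `random_state` to `sample` only makes the random choice repeatable between runs.

```python
import numpy as np
import pandas as pd


def zero_out_fraction(df, attr="Speed", target="Class", frac=0.1, seed=0):
    """Return a modified copy of df plus the index labels that were changed.

    A fraction `frac` of the rows where `attr` > 0 is picked at random,
    and both `attr` and `target` are set to 0 in those rows.
    """
    out = df.copy()  # leave the caller's frame untouched
    change = out[out[attr] > 0].sample(frac=frac, random_state=seed).index
    out.loc[change, [attr, target]] = 0
    return out, change


# Synthetic stand-in for the real CSV (hypothetical values).
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "Speed": rng.choice([0.0, 5.0, 12.5, 30.0], size=200),
    "Class": rng.integers(1, 4, size=200),  # labels 1..3, never 0
})

zeros_before = int((demo["Speed"] == 0).sum())
changed_df, changed_idx = zero_out_fraction(demo)
zeros_after = int((changed_df["Speed"] == 0).sum())

# The number of new zeros equals the number of rows actually sampled.
assert zeros_after == zeros_before + len(changed_idx)
print(f"zeros before: {zeros_before}, changed: {len(changed_idx)}, zeros after: {zeros_after}")
```

Note that the check compares against `len(changed_idx)`, the number of rows that were really sampled, instead of a count computed separately beforehand. It also runs in the same pass as the update, so the "before" count cannot go stale.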