I have code that I want to use to alter values in two columns of my dataset

0 votes

I have a dataset with "Speed" as one of its columns (features). The column contains both zero and non-zero values. I want to randomly set 10% of the non-zero values to zero. For every row changed this way, the corresponding class label should also become zero. My attempt is below, but it raises the error shown after the code.

import pandas as pd

file_path = 'Processed_data/data1.csv'
df = pd.read_csv(file_path)
per_change = 0.1
attr = 'Speed'
target = 'Class'
df_spd = df[df['Speed'] > 0.]  # rows with non-zero speed

# number of rows to change: 10% of all rows in the dataset
num_rows_to_change = int(df.shape[0] * per_change)
num_with_zero_initial = df[df[attr] == 0].shape[0]
assert df_spd.shape[0] > num_rows_to_change, \
    'Number of rows with non-zero speed is less than 10% of the original dataset.'
# sample the rows to change and zero out speed and class
df_update = df_spd.sample(num_rows_to_change)
df_update[attr] = 0.
df_update[target] = 0.
df.update(df_update)
update_list = df_update.index.tolist()
num_with_zero_final = df[df['Speed'] == 0].shape[0]
assert num_with_zero_final == num_with_zero_initial + num_rows_to_change, \
    'Number of rows needed to change not equal to number of rows changed.'
df.to_csv('changed.csv')

AssertionError                            Traceback (most recent call last)
<ipython-input-11-f93535705bac> in <module>
      1 assert num_with_zero_final == num_with_zero_initial + num_rows_to_change, \
----> 2 'Number of rows needed to change not equal to number of rows changed.'

AssertionError: Number of rows needed to change not equal to number of rows changed.

Mar 17 in Python by elvin
• 130 points
17 views

1 answer to this question.

+1 vote

Hi @elvin. I read your script and found your approach a little complex. Here is a simpler script that does the same job. Try this:

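import pandas as pd

file_path = 'path/to/your/file.csv'  # path to your file
df = pd.read_csv(file_path)

# randomly pick 10% of the rows with non-zero speed
change = df.query('Speed>0').sample(frac=.1).index
# set both the speed and the class label of those rows to zero
df.loc[change, 'Speed'] = 0
df.loc[change, 'Class'] = 0

df.to_csv('data1.csv', header=True, index=False)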
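
If you want to check the result before touching your real file, here is a minimal sketch on made-up data. The synthetic DataFrame, the seed values, and the names `zeros_before` and `zeros_after` are assumptions for illustration only; just the `Speed` and `Class` column names come from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real CSV (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
speed = rng.uniform(1.0, 100.0, size=200)
speed[:50] = 0.0  # 50 rows already have zero speed
df = pd.DataFrame({"Speed": speed, "Class": np.where(speed > 0, 1, 0)})

zeros_before = int((df["Speed"] == 0).sum())

# Pick 10% of the non-zero rows; random_state makes the draw repeatable.
change = df.query("Speed > 0").sample(frac=0.1, random_state=42).index

# Zero out the speed and the class label of the selected rows in one step.
df.loc[change, ["Speed", "Class"]] = 0

zeros_after = int((df["Speed"] == 0).sum())

# The number of zero-speed rows should grow by exactly the number of rows picked.
assert zeros_after == zeros_before + len(change)
```

Note that `sample(frac=0.1)` on the filtered frame takes 10% of the non-zero rows, which is what the question asks for, whereas `int(df.shape[0] * per_change)` in the question's script is 10% of all rows.
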

Let me know if this does what you want.

answered Mar 17 by Omkar
• 65,850 points
Thanks a lot, Omkar. The code did the job perfectly.
