Machine Learning and Python Code


I recently started learning machine learning and coding in Python. I've been trying to understand the `partition` method used on the Amazon Fine Food Reviews data from Kaggle, and the code around it. What I also can't understand is the purpose of the last 3 lines of code.

    %matplotlib inline
    import sqlite3
    import pandas as pd
    import numpy as np
    import nltk
    import string
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import confusion_matrix
    from sklearn import metrics
    from sklearn.metrics import roc_curve, auc
    from nltk.stem.porter import PorterStemmer

    # using the SQLite Table to read data.
    con = sqlite3.connect('./amazon-fine-food-reviews/database.sqlite') 

    #filtering only positive and negative reviews i.e. 
    # not taking into consideration those reviews with Score=3
    filtered_data = pd.read_sql_query("""
    SELECT *
    FROM Reviews
    WHERE Score != 3
    """, con) 

    # Give reviews with Score>3 a positive rating, and reviews with a
    # Score<3 a negative rating.
    def partition(x):
        if x < 3:
            return 'negative'
        return 'positive'

    # changing reviews with score less than 3 to be negative and vice-versa
    actualScore = filtered_data['Score']
    positiveNegative = actualScore.map(partition)
    filtered_data['Score'] = positiveNegative

Any help would be greatly appreciated. Thanks.

Dec 13, 2018 in Data Analytics by Upasana

You can create a Series called actualScore from the Score column of filtered_data:

    actualScore = filtered_data['Score']

Then create positiveNegative by mapping partition over it, which codes values less than 3 as negative and values greater than 3 as positive (reviews with Score = 3 were already excluded by the SQL query):

    positiveNegative = actualScore.map(partition)
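To see what this mapping step does in isolation, here is a minimal sketch on a few scores. The sample values are hypothetical, not taken from the Kaggle data; the `partition` function is the same one from the question, and pandas' `Series.map` calls it once per element and returns a new Series of the results.

```python
import pandas as pd

def partition(x):
    # Scores below 3 become 'negative'; anything else ('4' or '5' here,
    # since Score = 3 was filtered out earlier) becomes 'positive'
    if x < 3:
        return 'negative'
    return 'positive'

# Hypothetical sample scores standing in for filtered_data['Score']
actualScore = pd.Series([5, 1, 4, 2, 5])

# Apply partition to every element; the original Series is left unchanged
positiveNegative = actualScore.map(partition)

print(positiveNegative.tolist())
# ['positive', 'negative', 'positive', 'negative', 'positive']
```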

Then you can overwrite the old Score column with the new coded values:

    filtered_data['Score'] = positiveNegative
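Putting the steps together, the sketch below runs the question's query and the last three lines end to end against an in-memory SQLite database instead of the Kaggle `database.sqlite` file. The `Reviews` table here is hypothetical: it has only `Id` and `Score` columns and five made-up rows. An `ORDER BY Id` clause is added only so the printed order is predictable.

```python
import sqlite3
import pandas as pd

# In-memory database standing in for database.sqlite (made-up rows)
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE Reviews (Id INTEGER PRIMARY KEY, Score INTEGER)")
con.executemany(
    "INSERT INTO Reviews (Id, Score) VALUES (?, ?)",
    [(1, 5), (2, 3), (3, 1), (4, 4), (5, 2)],
)
con.commit()

# Same filter as the question: drop neutral reviews with Score = 3
filtered_data = pd.read_sql_query("""
SELECT *
FROM Reviews
WHERE Score != 3
ORDER BY Id
""", con)

def partition(x):
    # Below 3 -> negative, otherwise positive
    if x < 3:
        return 'negative'
    return 'positive'

# The three lines from the question
actualScore = filtered_data['Score']
positiveNegative = actualScore.map(partition)
filtered_data['Score'] = positiveNegative

print(filtered_data['Score'].tolist())
# ['positive', 'negative', 'positive', 'negative']
```

After the last line, the Score column holds the strings 'positive' and 'negative' instead of integers, so it can be used directly as a class label. A vectorized alternative, `np.where(actualScore < 3, 'negative', 'positive')`, produces the same labels as a NumPy array without calling a Python function per row.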
answered Dec 13, 2018 by Shubham

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.