How can I reformat value counts analysis in Pandas for large number of columns

0 votes
I have a DF that has 100 rows and 1000 columns

[code]
In [119]:
df.columns
Out[119]:
Index(['column 1', 'column2',
       ...
       'column 100'],
      dtype='object', name='var_name')
[/code]

Usually I did value_counts() for every single column to check their individual distributions

In [121]:
a = df['column1'].value_counts()
In [122]:
a
Out[122]:
1     77494
2      5389
0      2016
3       878
Name: column 1, dtype: int64

The problem is, if I were to do the same for this particualr dataframe it would make my notebook extremely messy. How can I automate this using some function?

If you have other information, all my data is int64, but I hope the best answer can give solution that works in every cases. I want to make the solution answer in pandas dataframe.

df

id  column1  column2 column3
1         3        1       7
2         3        2       8
3         2        3       7
4         2        1       8
5         1        2       7
and my expected output is:

column 1   count
1          1
2          2
3          2
column 2   count
1          2
2          2
3          1
column 3   count
7          3
8          2
3          1
Apr 17, 2018 in Python by aryya
• 7,460 points
6,717 views

1 answer to this question.

0 votes
If I were you, I'd do it this way

In [83]: df.drop('id',1).apply(lambda c: c.value_counts().to_dict())
Out[83]:
column1    {3: 2, 2: 2, 1: 1}
column2    {2: 2, 1: 2, 3: 1}
column3          {7: 3, 8: 2}
dtype: object

or

In [84]: for c in df.drop('id',1):
    ...:     print(df[c].value_counts())
    ...:
3    2
2    2
1    1
Name: column1, dtype: int64   # <----- column name
2    2
1    2
3    1
Name: column2, dtype: int64
7    3
8    2
Name: column3, dtype: int64

or

you could produce your desired value_counts sequentially, convert to dataframes and write to csv:

import pandas as pd

with open('out.csv', 'w') as out:

    for col in df.columns[1:]:

        res = df[col].value_counts()\
                     .reset_index()\
                     .rename(columns={col: 'count', 'index': col})\

        res.to_csv(out, index=False)
answered Apr 17, 2018 by anonymous

Related Questions In Python

0 votes
1 answer

Reformat value_counts() analysis in Pandas for large number of columns

I'd do it this way: In [83]: df.drop('id',1).apply(lambda ...READ MORE

answered Sep 25, 2018 in Python by Priyaj
• 58,020 points
2,027 views
0 votes
1 answer

How can I find the square of a number in python?

You can use the exponentiation operator or ...READ MORE

answered May 21, 2019 in Python by Mohammad
• 3,230 points
1,267 views
+2 votes
4 answers

How can I replace values with 'none' in a dataframe using pandas

Actually in later versions of pandas this ...READ MORE

answered Aug 13, 2018 in Python by bug_seeker
• 15,510 points
122,483 views
+2 votes
2 answers

In a list of dictionaries, how can I find the minimum calue in a common dictionary field.

There are several options. Here is a ...READ MORE

answered Apr 10, 2018 in Python by charlie_brown
• 7,720 points
1,433 views
0 votes
2 answers
+1 vote
2 answers

how can i count the items in a list?

Syntax :            list. count(value) Code: colors = ['red', 'green', ...READ MORE

answered Jul 7, 2019 in Python by Neha
• 330 points

edited Jul 8, 2019 by Kalgi 4,466 views
0 votes
1 answer
0 votes
1 answer

How can I convert a list of dictionaries from a CSV into a JSON object in Python?

You could try using the AST module. ...READ MORE

answered Apr 17, 2018 in Python by anonymous
3,482 views
0 votes
1 answer

How can I iterate through two lists in Parallel

You have to use the zip function ...READ MORE

answered Apr 17, 2018 in Python by anonymous
1,449 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP