Reformat value_counts() analysis in Pandas for a large number of columns

I have a dataset that consists of hundreds of columns and thousands of rows.

In [119]:
df.columns
Out[119]:
Index(['column 1', 'column2',
       ...
       'column 100'],
      dtype='object', name='var_name')

I usually run value_counts() on each column individually to see its distribution.

In [121]:
a = df['column1'].value_counts()
In [122]:
a
Out[122]:
1     77494
2      5389
0      2016
3       878
Name: column 1, dtype: int64

But doing this for every column of this dataframe would make my notebook very messy. How can I automate it? Is there a function that helps?

In case it matters, all my data is int64, but I would prefer an answer that works in every case. I want the result as a pandas dataframe.

Based on @MaxU's suggestion, here is a simplified version of my dataframe:

df

id  column1  column2 column3
1         3        1       7
2         3        2       8
3         2        3       7
4         2        1       8
5         1        2       7

and my expected output is:

column 1   count
1          1
2          2
3          2
column 2   count
1          2
2          2
3          1
column 3   count
7          3
8          2

Sep 25, 2018 in Python by bug_seeker

1 answer to this question.

I'd do it this way:

In [83]: df.drop(columns='id').apply(lambda c: c.value_counts().to_dict())
Out[83]:
column1    {3: 2, 2: 2, 1: 1}
column2    {2: 2, 1: 2, 3: 1}
column3          {7: 3, 8: 2}
dtype: object

or:

In [84]: for c in df.drop(columns='id'):
    ...:     print(df[c].value_counts())
    ...:
3    2
2    2
1    1
Name: column1, dtype: int64   # <----- column name
2    2
1    2
3    1
Name: column2, dtype: int64
7    3
8    2
Name: column3, dtype: int64
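
Since the question asks for the result as a single dataframe, here is a minimal sketch of one way to get a long table with one row per (column, value) pair. The helper name value_counts_long and the output labels "column", "value" and "count" are my own choices for illustration, not anything built into pandas. It assumes all columns can be melted into one value column, which holds for the all-int64 data described in the question.

```python
import pandas as pd


def value_counts_long(df, exclude=()):
    """Return one row per (column, value) pair with its count.

    `exclude` lists columns (such as an id) to leave out of the counts.
    """
    # Stack every remaining column into two columns: its name and its value.
    melted = df.drop(columns=list(exclude)).melt(
        var_name="column", value_name="value"
    )
    # Count how often each value occurs within each original column.
    return (
        melted.groupby(["column", "value"], sort=True)
        .size()
        .reset_index(name="count")
    )


# The simplified dataframe from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "column1": [3, 3, 2, 2, 1],
    "column2": [1, 2, 3, 1, 2],
    "column3": [7, 8, 7, 8, 7],
})

counts = value_counts_long(df, exclude=["id"])
print(counts)
```

The rows come out sorted by column name and then by value, which matches the layout of the expected output in the question. Note that groupby drops missing values by default, so NaN entries would not be counted unless dropna=False is passed to groupby.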
answered Sep 25, 2018 by Priyaj
