How can I reformat value counts analysis in Pandas for large number of columns

Question

I have a DF that has 100 rows and 1000 columns

[code]
In [119]:
df.columns
Out[119]:
Index(['column 1', 'column2',
       ...
       'column 100'],
      dtype='object', name='var_name')
[/code]

Usually I did value_counts() for every single column to check their individual distributions

In [121]:
a = df['column1'].value_counts()
In [122]:
a
Out[122]:
1     77494
2      5389
0      2016
3       878
Name: column 1, dtype: int64

The problem is, if I were to do the same for this particualr dataframe it would make my notebook extremely messy. How can I automate this using some function?

If you have other information, all my data is int64, but I hope the best answer can give solution that works in every cases. I want to make the solution answer in pandas dataframe.

df

id column1 column2 column3
1         3        1       7
2         3        2       8
3         2        3       7
4         2        1       8
5         1        2       7
and my expected output is:

column 1   count
1          1
2          2
3          2
column 2   count
1          2
2          2
3          1
column 3   count
7          3
8          2
3          1

score 0 · Answer 1 · Apr 17, 2018

If I were you, I'd do it this way

In [83]: df.drop('id',1).apply(lambda c: c.value_counts().to_dict())
Out[83]:
column1    {3: 2, 2: 2, 1: 1}
column2    {2: 2, 1: 2, 3: 1}
column3          {7: 3, 8: 2}
dtype: object

or

In [84]: for c in df.drop('id',1):
    ...:     print(df[c].value_counts())
    ...:
3    2
2    2
1    1
Name: column1, dtype: int64   # <----- column name
2    2
1    2
3    1
Name: column2, dtype: int64
7    3
8    2
Name: column3, dtype: int64

or

you could produce your desired value_counts sequentially, convert to dataframes and write to csv:

import pandas as pd

with open('out.csv', 'w') as out:

    for col in df.columns[1:]:

        res = df[col].value_counts()\
                     .reset_index()\
                     .rename(columns={col: 'count', 'index': col})\

        res.to_csv(out, index=False)