How can I reformat value_counts() analysis in Pandas for large number of columns?

0 votes
I have a DF that has 100 rows and 1000 columns

[code]
In [119]:
df.columns
Out[119]:
Index(['column 1', 'column2',
       ...
       'column 100'],
      dtype='object', name='var_name')
[/code]

Usually I did value_counts() for every single column to check their individual distributions

In [121]:
a = df['column1'].value_counts()
In [122]:
a
Out[122]:
1     77494
2      5389
0      2016
3       878
Name: column 1, dtype: int64

The problem is, if I were to do the same for this particualr dataframe it would make my notebook extremely messy. How can I automate this using some function?

If you have other information, all my data is int64, but I hope the best answer can give solution that works in every cases. I want to make the solution answer in pandas dataframe.

df

id  column1  column2 column3
1         3        1       7
2         3        2       8
3         2        3       7
4         2        1       8
5         1        2       7
and my expected output is:

column 1   count
1          1
2          2
3          2
column 2   count
1          2
2          2
3          1
column 3   count
7          3
8          2
3          1
Apr 17, 2018 in Python by ariaholic
• 7,320 points
1,755 views

1 answer to this question.

0 votes
If I were you, I'd do it this way

In [83]: df.drop('id',1).apply(lambda c: c.value_counts().to_dict())
Out[83]:
column1    {3: 2, 2: 2, 1: 1}
column2    {2: 2, 1: 2, 3: 1}
column3          {7: 3, 8: 2}
dtype: object

or

In [84]: for c in df.drop('id',1):
    ...:     print(df[c].value_counts())
    ...:
3    2
2    2
1    1
Name: column1, dtype: int64   # <----- column name
2    2
1    2
3    1
Name: column2, dtype: int64
7    3
8    2
Name: column3, dtype: int64

or

you could produce your desired value_counts sequentially, convert to dataframes and write to csv:

import pandas as pd

with open('out.csv', 'w') as out:

    for col in df.columns[1:]:

        res = df[col].value_counts()\
                     .reset_index()\
                     .rename(columns={col: 'count', 'index': col})\

        res.to_csv(out, index=False)
answered Apr 17, 2018 by anonymous

Related Questions In Python

0 votes
1 answer

Reformat value_counts() analysis in Pandas for large number of columns

I'd do it this way: In [83]: df.drop('id',1).apply(lambda ...READ MORE

answered Sep 25, 2018 in Python by Priyaj
• 56,160 points
261 views
0 votes
1 answer

How can I find the square of a number in python?

You can use the exponentiation operator or ...READ MORE

answered May 20 in Python by Mohammad
• 1,500 points
20 views
+2 votes
2 answers

How can I replace values with 'none' in a dataframe using pandas

Actually in later versions of pandas this ...READ MORE

answered Aug 13, 2018 in Python by bug_seeker
• 15,310 points
3,788 views
+2 votes
2 answers

In a list of dictionaries, how can I find the minimum calue in a common dictionary field.

There are several options. Here is a ...READ MORE

answered Apr 10, 2018 in Python by charlie_brown
• 7,710 points
43 views
+1 vote
2 answers

how can i count the items in a list?

Syntax :            list. count(value) Code: colors = ['red', 'green', ...READ MORE

answered Jul 6 in Python by Neha
• 330 points

edited Jul 8 by Kalgi 121 views
0 votes
1 answer

How can I convert a list of dictionaries from a CSV into a JSON object in Python?

You could try using the AST module. ...READ MORE

answered Apr 17, 2018 in Python by anonymous
513 views
0 votes
1 answer

How can I iterate through two lists in Parallel

You have to use the zip function ...READ MORE

answered Apr 17, 2018 in Python by anonymous
53 views