Which data format should I use for large files in R?

0 votes

I produce a very large data file with Python. It consists mostly of 0 (false) values, with only a few 1 (true) values. It has about 700,000 columns and 15,000 rows, and thus a size of about 10.5 GB. The first row is the header.
This file then needs to be read and visualized in R.

So which data format is best suited for this purpose?
Would it also make sense to compress (zip) it?

Example of my file:

id,col1,col2,col3,col4,col5,...
1,0,0,0,1,0,...
2,1,0,0,0,1,...
3,0,1,0,0,1,...
4,...

Jul 16 in Python by ana1504.k
• 7,870 points
16 views

1 answer to this question.

0 votes
Zipping won't help you, as you'll have to unzip it to process it. If you could post the code that generates the file, that might help a lot. Also, what do you want to accomplish in R? Might it be faster to visualize it in Python, avoiding the read/write of 10.5 GB?

Perhaps rethinking how you store the data (e.g., storing only the coordinates of the 1's, if there are very few) would be a better angle here.

For instance, instead of storing a 15K-by-700K table of all zeroes except for a 1 in row 10786, column 600492, I might just store the tuple (10786, 600492) and achieve the same visualization in R.
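
To make the coordinate idea concrete, here is a minimal Python sketch, not the asker's actual code. It assumes the dense layout shown in the question (a header row, then an id followed by 0/1 values) and writes one "row,col" line per 1. The function names `dense_rows_to_coords` and `write_coords` and the output column names `row` and `col` are made up for illustration. Ideally the generating script would write these pairs directly and never produce the dense file at all.

```python
import csv
import io


def dense_rows_to_coords(rows):
    """Yield (row_id, column_index) for every cell equal to "1".

    rows: iterable of lists like [id, v1, v2, ...], header already skipped.
    column_index is 1-based so it lines up with R's indexing.
    """
    for row in rows:
        row_id = row[0]
        for col_index, value in enumerate(row[1:], start=1):
            if value == "1":
                yield (row_id, col_index)


def write_coords(dense_csv, coords_csv):
    """Stream a dense 0/1 CSV and write only the coordinates of the 1's.

    Both arguments are open text file objects. Returns the number of 1's.
    """
    reader = csv.reader(dense_csv)
    next(reader)  # skip the header row
    writer = csv.writer(coords_csv, lineterminator="\n")
    writer.writerow(["row", "col"])  # hypothetical column names
    count = 0
    for pair in dense_rows_to_coords(reader):
        writer.writerow(pair)
        count += 1
    return count


# Small in-memory demo using the first rows of the example from the question.
dense = io.StringIO(
    "id,col1,col2,col3,col4,col5\n"
    "1,0,0,0,1,0\n"
    "2,1,0,0,0,1\n"
    "3,0,1,0,0,1\n"
)
sparse = io.StringIO()
n_ones = write_coords(dense, sparse)
print(sparse.getvalue())
```

The output size grows with the number of 1's rather than with rows times columns, which is the point of the suggestion. In R, a two-column file like this can be read as an ordinary small table and, if a matrix is needed, passed as the i and j indices to sparseMatrix() from the Matrix package.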
answered Jul 16 by SDeb
• 13,160 points
