Which data format should I use for large files in R?

I produce a very large data file with Python that consists mostly of 0s (false) with only a few 1s (true). It has about 700,000 columns and 15,000 rows, for a size of about 10.5 GB. The first row is the header.
This file then needs to be read and visualized in R.

Which data format is best suited for this purpose?
Would it also make sense to compress (zip) the file?

Example of my file:

id,col1,col2,col3,col4,col5,...
1,0,0,0,1,0,...
2,1,0,0,0,1,...
3,0,1,0,0,1,...
4,...

Jul 16 in Python by ana1504.k

1 answer to this question.

Zipping won't help much, since the file still has to be decompressed before it can be processed. If you could post the code that generates the file, that might help a lot. Also, what do you want to accomplish in R? Might it be faster to visualize the data in Python, avoiding the read/write of 10.5 GB?

Perhaps rethinking how you store the data (e.g., storing only the coordinates of the 1s, since there are very few) would be a better angle here.

For instance, instead of storing a 15K by 700K table of all zeroes except for a 1 in row 10786, column 600492, I might just store the tuple (10786, 600492) and achieve the same visualization in R.
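
As a rough sketch of that coordinate idea, the snippet below streams a dense 0/1 CSV and writes one "row,col" line per 1. It assumes the layout shown in the question (a header row, then an id column followed by the 0/1 columns). The function name `dense_to_coordinates` and the sample data are made up for illustration, and the indices are written 1-based purely as a convenience for R.

```python
import csv
import io


def dense_to_coordinates(dense_lines, out):
    """Stream a dense 0/1 CSV and write one "row,col" line per 1.

    Assumes a header row, then an id column followed by 0/1 columns.
    Indices are 1-based so they can be used directly as R indices.
    Returns the number of 1s found.
    """
    reader = csv.reader(dense_lines)
    next(reader)  # skip the header row
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["row", "col"])
    count = 0
    for row_idx, record in enumerate(reader, start=1):
        # record[0] is the id column; the data columns start at position 1
        for col_idx, value in enumerate(record[1:], start=1):
            if value == "1":
                writer.writerow([row_idx, col_idx])
                count += 1
    return count


# Tiny stand-in for the real file (hypothetical data)
sample = io.StringIO(
    "id,col1,col2,col3,col4,col5\n"
    "1,0,0,0,1,0\n"
    "2,1,0,0,0,1\n"
    "3,0,1,0,0,1\n"
)
result = io.StringIO()
n_ones = dense_to_coordinates(sample, result)
print(result.getvalue())
```

If the generating code can be changed, it would likely be better to emit the coordinates directly and never write the dense file at all. On the R side, the two columns could then be read with `read.csv` and, if a matrix is needed, passed to something like `Matrix::sparseMatrix(i = ..., j = ...)`, depending on what the visualization requires.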
answered Jul 16 by SDeb
