What data format for large files in R

0 votes

I produce a very large data file with Python, consisting mostly of 0 (false) with only a few 1 (true) values. It has about 700,000 columns and 15,000 rows and thus a size of 10.5GB. The first row is the header.
This file then needs to be read and visualized in R.

So which data format is best suited for this purpose?
Would it also make sense to compress (zip) it?

Example of my file:

id,col1,col2,col3,col4,col5,...
1,0,0,0,1,0,...
2,1,0,0,0,1,...
3,0,1,0,0,1,...
4,...

Jul 16, 2019 in Python by ana1504.k
• 7,910 points
566 views

1 answer to this question.

0 votes
Zipping won't help you, as you'll have to unzip it to process it. If you could post the code that generates the file, that might help a lot. Also, what do you want to accomplish in R? Might it be faster to visualize it in Python, avoiding the read/write of 10.5GB?

Perhaps rethinking how you store the data (e.g., storing only the coordinates of the 1's, if there are very few) might be a better angle here.

For instance, instead of storing a 15K by 700K table of all zeroes except for a 1 in row 10786, column 600492, I might just store the tuple (10786, 600492) and achieve the same visualization in R.
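
The coordinate idea can be sketched roughly as follows. This is only an illustration, assuming the dense CSV layout shown in the question (an `id` column followed by 0/1 columns); the function names `dense_rows_to_coordinates` and `write_coordinates` are made up for this example, and the in-memory sample stands in for the real 10.5GB file.

```python
import csv
import io


def dense_rows_to_coordinates(rows):
    """Yield (row_id, column_name) for every cell equal to "1".

    `rows` is an iterable of dicts as produced by csv.DictReader, with an
    'id' column followed by 0/1 columns (the layout from the question).
    """
    for row in rows:
        row_id = row["id"]
        for name, value in row.items():
            if name != "id" and value == "1":
                yield row_id, name


def write_coordinates(dense_file, sparse_file):
    """Stream a dense 0/1 CSV into a two-column CSV listing only the 1's."""
    reader = csv.DictReader(dense_file)
    writer = csv.writer(sparse_file, lineterminator="\n")
    writer.writerow(["id", "column"])
    count = 0
    for pair in dense_rows_to_coordinates(reader):
        writer.writerow(pair)
        count += 1
    return count


# Small stand-in for the real file: the first rows of the example,
# truncated to five columns.
sample = io.StringIO(
    "id,col1,col2,col3,col4,col5\n"
    "1,0,0,0,1,0\n"
    "2,1,0,0,0,1\n"
    "3,0,1,0,0,1\n"
)
out = io.StringIO()
n_ones = write_coordinates(sample, out)
print(n_ones)          # number of (id, column) pairs written
print(out.getvalue())  # the coordinate list itself
```

If the generating code can be changed, it would probably be simpler to write the (id, column) pairs directly instead of converting a dense file afterwards. On the R side, a two-column file like this could be read with `read.csv` and, if a matrix is needed, turned into a sparse one, for example with `Matrix::sparseMatrix`; whether that fits depends on the visualization you have in mind.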
answered Jul 16, 2019 by SDeb
• 13,300 points
