Django upload and process multiple files failing with libreoffice

0 votes

I'm developing a Django programme that uses excel files. It only functions with xlsx files, however if you upload an xls or ods file, I'll convert it to an xlsx file beforehand so that it may be used with the processed file. Several files may be uploaded using my application's form. Each file that is uploaded is successfully saved into a database model with the field status set to "Created." A new Thread that processes files is started by a post-save model function to process those files in the background. Files are saved with a status of "Error" or "Processed" after processing. A second option to reprocess files has also been added.The issue arises when I attempt to upload several files that aren't xlsx; these files must be converted to xlsx before I can proceed with my own processing. I'm using a Python subprocess to run libreoffice —convert-to xlsx filename —headless for that purpose. When uploading one or two files simultaneously, this works perfectly. But, if I upload several files at once, some of them fail to process while others do, and there is no consistent pattern among the files. If I submit each testing file individually or even if I reprocess those files, everything functions as intended.

The error is given by libreoffice, because if I upload multiple files which are already xlsx files are being processed successfully too. When this happens, libreoffice returns 1 an no stdout nor stderr.

models.py

class Document(models.Model):
    docfile = models.FileField(upload_to='documents/%Y/%m/%d')
    date_creation = models.DateTimeField(auto_now_add=True)
    document_type = models.TextField(max_length=256)
    status = models.TextField(max_length=256, default="Created")
    bank_account = models.TextField(max_length=256, null=True)

    def filename(self):
        return os.path.basename(self.docfile.name)


@receiver(models.signals.post_save, sender=Document)
def process_file(sender, instance, **kwargs):
    t = threading.Thread(target=process_file_function,args=[sender,instance],daemon=True)
    t.start()

functions.py

def process_file_function(sender, instance, **kwargs):
    from accounting.models import Asiento, Apunte, FiltroBanco
    import pytz

    if instance.status == "Created" or instance.status == "Reprocess":
        filename = file = instance.docfile.name
        instance.status='Processing'
        instance.save(update_fields=['status'])

        print(f"Starting processing file: {file}")

        try:
            if filename.endswith('.ods') or filename.endswith('xls'):
                import os
                print(os.stat(filename))
                output = subprocess.run(["libreoffice", "--convert-to", "xlsx", filename, "--headless", "--outdir", "/tmp/sage/"], capture_output=True)
                print(output)
                filename = f"/tmp/sage/{filename.split('/')[-1].replace('xls', 'xlsx').replace('.ods', '.xlsx')}"

            wb = load_workbook(filename=filename, data_only=True)

            # Do my stuff

            instance.status='Processed'
            instance.save()
            print(f"Finished processing file: {file}")
        except Exception as e:
            instance.status='Error'
            instance.save()

Otuput example of a successful file:

Starting processing file: documents/2023/02/19/filename02.ods
os.stat_result(st_mode=33188, st_ino=901900, st_dev=40, st_nlink=1, st_uid=1000, st_gid=1000, st_size=29771, st_atime=1676805630, st_mtime=1676805630, st_ctime=1676805630)
CompletedProcess(args=['libreoffice', '--convert-to', 'xlsx', 'documents/2023/02/19/filename02.ods', '--headless', '--outdir', '/tmp/sage/'], returncode=0, stdout=b'convert /home/ajulian/Documents/code/python/facturasweb/documents/2023/02/19/filename02.ods -> /tmp/sage/filename02.xlsx using filter : Calc Office Open XML\n', stderr=b'')
Finished processing file: documents/2023/02/19/filename02.ods

Output example of a error file:

Starting processing file: documents/2023/02/19/filename01.ods
os.stat_result(st_mode=33188, st_ino=901899, st_dev=40, st_nlink=1, st_uid=1000, st_gid=1000, st_size=21469, st_atime=1676805630, st_mtime=1676805630, st_ctime=1676805630)
CompletedProcess(args=['libreoffice', '--convert-to', 'xlsx', 'documents/2023/02/19/filename01.ods', '--headless', '--outdir', '/tmp/sage/'], returncode=1, stdout=b'', stderr=b'')
ERROR: Error processing file: documents/2023/02/19/filename01.ods
-----
Traceback (most recent call last):
  File "/home/ajulian/Documents/code/python/facturasweb/accounting/functions.py", line 50, in process_file_function
    wb = load_workbook(filename=filename, data_only=True)
  File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 315, in load_workbook
    reader = ExcelReader(filename, read_only, keep_vba,
  File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 124, in __init__
    self.archive = _validate_archive(fn)
  File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 96, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "/usr/lib/python3.9/zipfile.py", line 1239, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/sage/filename01.xlsx'

Take note of the variations in the output of the libreoffice subprocess. Filename01.ods was successfully converted in other executions, so don't hold it against it. Just multiple file uploads are affected; not all files.

What might the issue be? Why does this occur occasionally but not always? Why does LibreOffice merely return 1 and produce nothing else?

Feb 20, 2023 in Others by narikkadan
• 63,420 points
753 views

1 answer to this question.

0 votes

The problem happens when LibreOffice tries to open the same user configuration at the same time. Solved by creating a new userspace for each file: "-env:UserInstallation=file://{tmpfile}"

tmpfile = f"/tmp/sage/sessions/{filename.split('/')[-1].split('.')[0]}"
subprocess.run(["libreoffice", "--convert-to", "xlsx", filename, "--headless", "--outdir", "/tmp/sage/", f"-env:UserInstallation=file://{tmpfile}"], capture_output=True)

answered Feb 20, 2023 by Kithuzzz
• 38,010 points

Related Questions In Others

–2 votes
0 answers
0 votes
1 answer

Reading multiple Excel files and merge them sheet-wise

This will analyse each Excel and combine ...READ MORE

answered Dec 15, 2022 in Others by narikkadan
• 63,420 points
548 views
0 votes
1 answer

Crawling through multiple excel files, match and copy data to master file

One application only. It would be quicker ...READ MORE

answered Jan 12, 2023 in Others by narikkadan
• 63,420 points
373 views
0 votes
1 answer

I want to compare two Excel files and highlight the differences with VBA

The workbook doesn't have the UsedRange property ...READ MORE

answered Jan 13, 2023 in Others by narikkadan
• 63,420 points
1,981 views
0 votes
1 answer
0 votes
1 answer

how to download and install Django rest framework?

To install Django, you can simply open ...READ MORE

answered Apr 24, 2018 in Python by Christine
• 15,790 points
1,605 views
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer

Export multiple worksheets without formula with the ability to select exact sheets and location

Try this: Sub ExportSheets() Dim ...READ MORE

answered Mar 24, 2023 in Others by Kithuzzz
• 38,010 points
324 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP