Validating content of Django file uploads

Published September 24, 2019 by Tommy George

There’s an internal tool at my work, built with the Django web framework, that accepts small CSV file uploads.

I recently wanted to do some extra validation on the content of those files during the upload, so the app could return more helpful and specific validation error messages, just like any other form field.

Usually, you don’t want to read entire uploaded files into memory just to start “validating” their content. You can see warnings and workarounds for that in the Django docs, like this one in the description of UploadedFile.read():

Be careful with this method: if the uploaded file is huge it can overwhelm your system if you try to read it into memory. You’ll probably want to use chunks() instead;

For this particular app, that isn’t really a concern, because the uploaded files are small.

I didn’t see an obvious example of validating file content this way, but it turns out you can wire it up just like any other form field validator.

In this case, there’s an extremely simple form with a single FileField for uploading the CSV files. Something like this:

# forms.py

from django import forms

from validators import csv_content_validator

class CSVUploadForm(forms.Form):
    # Validators listed here run as part of the form's normal validation.
    csv_file = forms.FileField(
        validators=[csv_content_validator]
    )

And here’s an example of the “validator” that’s imported there, that looks at the CSV file content itself to determine validity:

# validators.py

from django import forms

def csv_content_validator(csv_file):
    csv_file_content = csv_file.read()  # <-- Boom. CSV file content! (as bytes)
    validation_error_messages = []
    # ...
    # Stuff happens here to verify all our "rows" and "columns"
    # contain exactly what we expect.
    # We add specific messages to the `validation_error_messages`
    # list for anything that isn't correct.
    # This way our validation error message can pinpoint specific
    # rows/columns in the file, so the user knows exactly what
    # needs fixing.
    # ...
    if validation_error_messages:
        raise forms.ValidationError(" ".join(validation_error_messages))

As the comments in that snippet point out, we keep track of specific errors as we read through the CSV content and deliver them all together in a single validation message.

Why build up a list of validation error messages?

Because if we raise an exception for the first one we find, the user has to make changes and re-attempt the upload before finding out there were more problems. In our case, it’s much more efficient to tell the end user about all of the problems we can on the first go-around!

Internally, the app uses Python’s built-in csv reader to parse the rows from a StringIO stream of the CSV content.

Reading a whole upload into memory like that is probably not a good idea for most apps that take file uploads.

In our case it’s a controlled, internal app with just a handful of users. It’s not something that accepts uploads from the public internet.
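The parsing described above might look roughly like the following. This is only a minimal sketch, not the app’s actual code: the `collect_csv_errors` helper, the `EXPECTED_HEADERS` list, and the “quantity must be a whole number” rule are all made up for illustration, and it assumes the upload is UTF-8 text.

```python
import csv
import io

# Hypothetical header row, chosen only for illustration.
EXPECTED_HEADERS = ["name", "email", "quantity"]


def collect_csv_errors(raw_content):
    """Return a list of readable problems found in CSV bytes (empty if none)."""
    try:
        text = raw_content.decode("utf-8")
    except UnicodeDecodeError:
        return ["The file is not valid UTF-8 text."]

    errors = []
    reader = csv.reader(io.StringIO(text, newline=""))

    headers = next(reader, None)
    if headers != EXPECTED_HEADERS:
        # Without the right headers, the row checks below are meaningless.
        return ["Row 1: expected headers %s." % ", ".join(EXPECTED_HEADERS)]

    # Row numbers start at 2 because row 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADERS):
            errors.append(
                "Row %d: expected %d columns, found %d."
                % (row_number, len(EXPECTED_HEADERS), len(row))
            )
            continue
        if not row[2].strip().isdigit():
            errors.append(
                "Row %d, column 'quantity': %r is not a whole number."
                % (row_number, row[2])
            )
    return errors
```

A validator could call something like this on the result of `csv_file.read()` and raise a single `ValidationError` if the returned list is not empty.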

When would you use this?

This can be used (even safely, with chunks) to validate that:

A CSV/TSV file upload contains the expected headers,
An image file upload contains a certain header,
A text document has a certain line count,
… and so on.
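Here is a rough sketch of what the chunked versions of a couple of those checks could look like. The helper names are hypothetical, and `FakeUpload` is just a stand-in so the snippet runs without Django; the only thing it assumes about a real uploaded file is that `chunks()` yields bytes, as Django’s `UploadedFile.chunks()` does.

```python
# The 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def count_newlines(uploaded_file):
    """Count newline bytes one chunk at a time, never holding the whole file."""
    total = 0
    for chunk in uploaded_file.chunks():
        total += chunk.count(b"\n")
    return total


def starts_with_png_signature(uploaded_file):
    """Look only at the first chunk to check the file's leading bytes."""
    first_chunk = next(iter(uploaded_file.chunks()), b"")
    return first_chunk.startswith(PNG_SIGNATURE)


class FakeUpload:
    """Hypothetical stand-in for an uploaded file, for trying this out locally."""

    def __init__(self, data, chunk_size=16):
        self.data = data
        self.chunk_size = chunk_size

    def chunks(self):
        for start in range(0, len(self.data), self.chunk_size):
            yield self.data[start:start + self.chunk_size]
```

Inside a validator, either helper would receive the same file object that the form field passes in, and the validator would raise a `ValidationError` when the check fails.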

Just remember not to read large files into memory all at once!

Is there a better way to do this? Probably! Tell me about it!