Wasp•2mo ago

Best practices to verify files before uploading to S3?

I want users to upload PDF files, and I want to verify the integrity of those files and their content before uploading them, but I don't know how. What are the best practices for doing so?

10 Replies

kapa.ai•2mo ago

Hi @pipas. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

----------------------

Based on the knowledge sources provided, there isn't specific information about verifying file integrity or content validation before uploading to S3 in the Wasp framework. The sources mainly discuss the general approach to file uploads in Wasp applications, focusing on the upload process itself rather than validation.

From what I can gather, Wasp doesn't have built-in functionality specifically for file validation before S3 uploads. The knowledge sources show that file uploads in Wasp typically involve:

1. Getting a pre-signed URL from S3
2. Uploading directly from the client to S3 (bypassing the server to avoid unnecessary traffic)
3. Storing file references in your database

For your specific need to verify PDF files before uploading, you would likely need to implement custom validation logic. This might involve:

1. Checking file type/extension on the client side
2. Potentially using a library to verify the PDF structure
3. Implementing size limits or other basic validations

If you need more specific guidance on implementing PDF validation in a Wasp application, you might want to check the Wasp documentation or ask in the Wasp Discord community, as the provided knowledge sources don't contain detailed information about file validation best practices.

I'd recommend checking out the Discord conversation about file uploads mentioned in the sources for more context on how others have implemented file uploads in Wasp. GitHub: Wasp Lang

Is there anything specific about file validation in Wasp that you'd like me to help with further?

----------------------

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

----------------------

Sorry @pipas, I seem to be having a bit of trouble. Don't worry though, I've pinged my support and they will try to fix me up! :technologist: Perhaps try again in 15mins?

pipasOP•2mo ago

G35x•2mo ago

Hey @pipas, it's probably good to do some server-side validation depending on your security requirements, like checking the MIME type and throwing an error if it isn't 'application/pdf'. I haven't done this with my own Wasp project, but you could modify the s3utils file from the template to do something like this (GPT generated, so this won't work as is):

import * as fs from 'fs';
import * as fileType from 'file-type';
// other imports
// existing s3utils code

const allowedMimeTypes = ['application/pdf']; // Specify allowed MIME types

// Function to check the MIME type of the uploaded file
const checkMimeType = async (filePath: string): Promise<boolean> => {
  const buffer = fs.readFileSync(filePath); // Read file content into a buffer
  // Note: fromBuffer is the file-type v16 API; newer releases export fileTypeFromBuffer instead
  const type = await fileType.fromBuffer(buffer); // Detect MIME type based on content
  if (!type) {
    console.log('File type could not be determined');
    return false; // Unable to determine MIME type
  }
  return allowedMimeTypes.includes(type.mime); // Check if it's an allowed MIME type
};

export const getUploadFileSignedURLFromS3 = async ({ fileName, fileType, userId }: S3Upload) => {
  const tempFilePath = `/tmp/${fileName}`; // Assuming the file is being uploaded to a temporary location
  const isValidMimeType = await checkMimeType(tempFilePath);
  if (!isValidMimeType) {
    throw new Error('Invalid MIME type');
  }
  // rest of your s3utils code

You could also add some additional integrity checks that read portions of the PDF and verify that it isn't malformed and is actually structured as a PDF, so that the content matches the MIME type, since the MIME type could be spoofed. If you're worried about malicious uploads, you could even use the VirusTotal API to programmatically scan files before storing them in your S3 bucket, but there could be privacy considerations to account for in doing so.
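To make the "is it structured as a PDF" idea concrete, here is a minimal sketch of a content check that needs no extra library. It only looks for the `%PDF-` header at the start of the data and an `%%EOF` marker near the end, so it catches renamed or truncated files but is not a full parser. `looksLikePdf` is a hypothetical helper name, not something from Wasp or the Open SaaS template.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: cheap structural check, not a full PDF parser.
export function looksLikePdf(data: Buffer): boolean {
  // A PDF file begins with the header "%PDF-" followed by a version number.
  const header = data.subarray(0, 5).toString('latin1');
  if (header !== '%PDF-') {
    return false;
  }
  // A complete PDF ends with an "%%EOF" marker; search only the last 1 KiB.
  const tailStart = Math.max(0, data.length - 1024);
  const tail = data.subarray(tailStart).toString('latin1');
  return tail.includes('%%EOF');
}
```

This is stricter than some PDF readers, which tolerate a few bytes of junk before the header, so you may want to loosen it if legitimate files get rejected. A file that passes can still be corrupt inside, which is where actually parsing it comes in.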

Filip•2mo ago

Hey @pipas. @G35x gave a great answer (thanks!) Other than that, could you elaborate what you mean by "the integrity of the PDF files"?

pipasOP•2mo ago

Hey, thank you for your reply. I mean checking that the file really is a PDF, that it can be opened, and that text can be extracted properly. The thing is, I'm worried about front-end validation because it can be spoofed, and I don't want someone to spam corrupt or incorrect files to S3, given its pay-as-you-go nature. So I need to verify the file server-side before it is even uploaded to S3, and I don't really know how.

pipasOP•2mo ago

Should the file be temporarily uploaded to the server before uploading it to S3? If so, how can I restrict the file so that it can only be read and not executed (in case someone uploads a virus)?

NEROX•2mo ago

I use multer to upload files to a temp folder on the server (ensuring they have a .pdf extension) through an API (using import { tmpdir } from 'os';):

import multer from 'multer';
import { tmpdir } from 'os';

// Configure multer for file uploads
const upload = multer({
  dest: tmpdir(), // Use system's temporary directory
  limits: { fileSize: 10 * 1024 * 1024 } // 10MB size limit
});

export const uploadFileApi = async (req, res, context) => {
  // Check if user is authenticated
  if (!context.user) {
    return res.status(401).json({ message: 'Unauthorized' });
  }

  // Set up multer middleware for single file upload
  const singleUpload = upload.single('file');

  singleUpload(req, res, async function(err) {
    // Handle upload errors
    if (err) {
      return res.status(400).json({ message: `File upload error: ${err.message}` });
    }

    // Check if file was provided
    if (!req.file) {
      return res.status(400).json({ message: 'No file was provided' });
    }

    try {
      // File successfully uploaded to temporary location
      console.log('File uploaded successfully:', req.file.originalname);
      
      // Return success response
      return res.json({ 
        success: true,
        filename: req.file.originalname,
        path: req.file.path
      });
    } catch (error) {
      // Handle any other errors
      console.error('Error processing file:', error);
      return res.status(500).json({ 
        message: 'Error processing uploaded file' 
      });
    }
  });
};
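On the question of keeping an uploaded file from being executed: a file sitting in a temp directory only runs if something runs it, so the usual precautions are to store it without execute permission, under a name the server picks, and to delete it when you're done. Below is a minimal sketch, assuming a POSIX filesystem and that you have the bytes in memory (for example via multer's `memoryStorage()` instead of the `dest` option used above). `saveUntrustedUpload` and `removeUntrustedUpload` are hypothetical names.

```typescript
import { randomUUID } from 'node:crypto';
import { promises as fs } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

// Hypothetical helper: write untrusted bytes to a fresh temp directory.
export async function saveUntrustedUpload(data: Buffer): Promise<string> {
  // A new directory per upload keeps files from colliding with each other.
  const dir = await fs.mkdtemp(join(tmpdir(), 'upload-'));
  // Use a server-generated name; never reuse the client-supplied file name.
  const filePath = join(dir, randomUUID());
  // 0o600 = read/write for the owner only, no execute bits for anyone.
  await fs.writeFile(filePath, data, { mode: 0o600 });
  return filePath;
}

// Hypothetical helper: remove the file together with its temp directory.
export async function removeUntrustedUpload(filePath: string): Promise<void> {
  await fs.rm(dirname(filePath), { recursive: true, force: true });
}
```

On Windows the mode argument is mostly ignored, so treat the permission part as Linux/macOS only. Parsing the file with a PDF library reads it as data and does not execute it, though a parser bug is still a possible risk with hostile input.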

After that, I read the PDF content with pdf.js-extract. Maybe you can check that the reading process finishes successfully before uploading to S3.

// PDF extraction function that only checks if extraction is successful
import fs from 'fs';
import { PDFExtract } from 'pdf.js-extract';

export const extractPdf = async (filePath) => {
  try {
    // Check if file exists
    if (!fs.existsSync(filePath)) {
      return {
        success: false,
        message: 'File not found'
      };
    }

    // Read the PDF file
    const pdfBuffer = fs.readFileSync(filePath);
    
    // Initialize PDF extractor
    const pdfExtract = new PDFExtract();
    
    // Attempt to extract content
    const pdfData = await pdfExtract.extractBuffer(pdfBuffer);
    
    // If we got here, extraction was successful
    return {
      success: true,
      message: 'PDF extraction successful (Safe to upload it to S3)',
      pageCount: pdfData.pages.length
    };
    
  } catch (error) {
    // Handle specific errors
    if (error.name === 'TypeError') {
      return {
        success: false,
        message: 'Invalid PDF format',
        error: error.message
      };
    }
    
    // Handle any other errors
    return {
      success: false,
      message: 'PDF extraction failed',
      error: error.message
    };
  }
};
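To tie the two pieces together, here is a rough sketch of the order of operations: validate the temp file, upload only if validation passed, and delete the temp file either way. The validator is passed in, so `extractPdf` above would fit, and `upload` stands in for whatever S3 upload function you already have. `validateThenUpload` and both type names are hypothetical.

```typescript
import { promises as fs } from 'node:fs';

// Shape returned by a validator such as extractPdf.
type Validator = (filePath: string) => Promise<{ success: boolean; message: string }>;
// Placeholder for your own S3 upload function (an assumption, not a real API).
type Uploader = (filePath: string) => Promise<void>;

export async function validateThenUpload(
  filePath: string,
  validate: Validator,
  upload: Uploader,
): Promise<{ uploaded: boolean; message: string }> {
  try {
    const result = await validate(filePath);
    if (!result.success) {
      // Rejected files never reach S3, so they cost nothing to store.
      return { uploaded: false, message: result.message };
    }
    await upload(filePath);
    return { uploaded: true, message: 'File validated and uploaded' };
  } finally {
    // Runs on success, rejection, and thrown errors alike.
    await fs.rm(filePath, { force: true });
  }
}
```

Putting the cleanup in `finally` means a failed validation or a failed upload can't leave files piling up in the temp directory.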

Vinny (@Wasp)•2mo ago

No, you don't have to. I think you'd want to validate the file type on the server first, though. We just merged a PR that takes care of this. Maybe this helps you: https://github.com/wasp-lang/open-saas/blob/main/template/app/src/file-upload/fileUploading.ts#L41-L54
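For anyone reading later, here is a rough sketch of what a server-side check before issuing a pre-signed URL could look like. This is not a copy of the linked code: `assertUploadAllowed`, `UploadRequest`, and the limits are hypothetical. Keep in mind that with pre-signed URLs the server never sees the file bytes, so at this point it can only check what the client declares. Checking the actual content has to happen after the upload or by routing the file through the server first.

```typescript
// Hypothetical allowlist and limit; adjust to your own requirements.
const ALLOWED_MIME_TYPES: readonly string[] = ['application/pdf'];
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB

export interface UploadRequest {
  fileName: string;
  fileType: string; // MIME type as declared by the client
  fileSize: number; // size in bytes as declared by the client
}

// Throws if the declared metadata is not acceptable; returns nothing otherwise.
export function assertUploadAllowed(request: UploadRequest): void {
  if (!ALLOWED_MIME_TYPES.includes(request.fileType)) {
    throw new Error(`File type '${request.fileType}' is not allowed`);
  }
  if (!request.fileName.toLowerCase().endsWith('.pdf')) {
    throw new Error('File name must end with .pdf');
  }
  const size = request.fileSize;
  if (!Number.isInteger(size) || size <= 0 || size > MAX_FILE_SIZE_BYTES) {
    throw new Error('File size is missing or exceeds the limit');
  }
}
```

You would call this at the top of the action that creates the pre-signed URL and only continue if it doesn't throw.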

pipasOP•2mo ago

ok thank you!
