Wasp 3w ago
pipas

Best practices to verify files before uploading to S3?

I want users to upload PDF files, and I want to verify the integrity of those files and their content before uploading, but I don't know how. What are the best practices for doing so?
10 Replies
kapa.ai 3w ago
Hi @pipas. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
----------------------
Based on the knowledge sources provided, there isn't specific information about verifying file integrity or content validation before uploading to S3 in the Wasp framework. The sources mainly discuss the general approach to file uploads in Wasp applications, focusing on the upload process itself rather than validation.

From what I can gather, Wasp doesn't have built-in functionality specifically for file validation before S3 uploads. The knowledge sources show that file uploads in Wasp typically involve:
1. Getting a pre-signed URL from S3
2. Uploading directly from the client to S3 (bypassing the server to avoid unnecessary traffic)
3. Storing file references in your database

For your specific need to verify PDF files before uploading, you would likely need to implement custom validation logic. This might involve:
1. Checking file type/extension on the client side
2. Potentially using a library to verify the PDF structure
3. Implementing size limits or other basic validations

If you need more specific guidance on implementing PDF validation in a Wasp application, you might want to check the Wasp documentation or ask in the Wasp Discord community, as the provided knowledge sources don't contain detailed information about file validation best practices.

I'd recommend checking out the Discord conversation about file uploads mentioned in the sources for more context on how others have implemented file uploads in Wasp.

GitHub: Wasp Lang

Is there anything specific about file validation in Wasp that you'd like me to help with further?
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
Sorry @pipas, I seem to be having a bit of trouble. Don't worry though, I've pinged my support and they will try to fix me up!
:technologist: Perhaps try again in 15mins?
pipas OP 3w ago
?
G35x 3w ago
Hey @pipas, it's probably good to do some server-side validation depending on your security requirements, like checking the MIME type and throwing an error if it isn't 'application/pdf'. I haven't done this with my own Wasp project, but you could modify the s3utils file from the template to do something like this (GPT generated, so this won't work as is):

    import * as fs from 'fs';
    import * as fileType from 'file-type';
    // other imports

    // existing s3utils code

    const allowedMimeTypes = ['application/pdf']; // Specify allowed MIME types

    // Function to check the MIME type of the uploaded file
    const checkMimeType = async (filePath: string): Promise<boolean> => {
      const buffer = fs.readFileSync(filePath); // Read file content into a buffer
      const type = await fileType.fromBuffer(buffer); // Detect MIME type based on content

      if (!type) {
        console.log('File type could not be determined');
        return false; // Unable to determine MIME type
      }

      return allowedMimeTypes.includes(type.mime); // Check if it's an allowed MIME type
    };

    export const getUploadFileSignedURLFromS3 = async ({ fileName, fileType, userId }: S3Upload) => {
      const tempFilePath = `/tmp/${fileName}`; // Assuming the file is being uploaded to a temporary location
      const isValidMimeType = await checkMimeType(tempFilePath);

      if (!isValidMimeType) {
        throw new Error('Invalid MIME type');
      }

      // rest of your s3utils code

You could also add some additional integrity checks that read portions of the PDF and verify that it isn't malformed and is actually structured as a PDF, so that the content matches the MIME type, since the MIME type could be spoofed. If you're worried about malicious uploads, you could even use the VirusTotal API to programmatically scan files before storing them in your S3 bucket, but there could be privacy considerations to account for in doing so.
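The "read portions of the PDF" idea can be sketched roughly as follows. This is a minimal illustration, not part of the template: `looksLikePdf` is a hypothetical helper name, and it assumes the file bytes are already available in a Buffer on the server. It only checks for the `%PDF-` header and an `%%EOF` marker near the end, so it catches obviously wrong files but does not prove the PDF is well-formed or safe.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper (not from the template): a cheap structural check.
// A PDF normally starts with "%PDF-" and ends with an "%%EOF" marker.
export function looksLikePdf(data: Buffer): boolean {
  // Strict header check: the very first bytes must be "%PDF-".
  const header = data.subarray(0, 5).toString('latin1');
  if (header !== '%PDF-') {
    return false;
  }

  // Look for the end-of-file marker in the last 1024 bytes only,
  // so large files are not scanned in full.
  const tailStart = Math.max(0, data.length - 1024);
  const tail = data.subarray(tailStart).toString('latin1');
  return tail.includes('%%EOF');
}
```

A file that passes this check can still be corrupt or malicious, so it works best as a fast first filter before a real parse or a scan.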
Filip 3w ago
Hey @pipas. @G35x gave a great answer (thanks!). Other than that, could you elaborate on what you mean by "the integrity of the PDF files"?
pipas OP 3w ago
Hey, thank you for your reply. I mean checking that the file really is a PDF, that it can be opened, and that text can be extracted properly. I'm worried about front-end validation because it can be spoofed, and I don't want someone to spam corrupt or incorrect files to S3, given its pay-as-you-go nature. So I need to verify the file server-side before it is even uploaded to S3, and I don't really know how.
pipas OP 3w ago
Should the file be temporarily uploaded to the server before uploading it to S3? If so, how can I restrict the file so that it can only be read and not executed (in case someone uploads a virus)?
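One way to approach the "read but not execute" part, sketched under assumptions: the upload is already in memory as a Buffer, the server runs on a POSIX system, and the helper names (`saveUploadForInspection`, `hasExecuteBit`, `removeUpload`) are hypothetical. The file is written into a fresh private temp directory under a random name with mode `0o600`, which grants the owner read and write access and sets no execute bit for anyone.

```typescript
import { Buffer } from 'node:buffer';
import { randomUUID } from 'node:crypto';
import { mkdtempSync, rmSync, statSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

// Hypothetical helper: store an upload for inspection without execute permission.
export function saveUploadForInspection(data: Buffer): string {
  // Fresh directory per upload, so user-supplied names never touch the path.
  const dir = mkdtempSync(join(tmpdir(), 'upload-'));
  const filePath = join(dir, `${randomUUID()}.bin`);
  // mode 0o600: owner read/write only, no execute bit.
  // flag 'wx': fail instead of overwriting if the path somehow exists.
  writeFileSync(filePath, data, { mode: 0o600, flag: 'wx' });
  return filePath;
}

// Hypothetical helper: true if any execute bit is set on the file.
export function hasExecuteBit(filePath: string): boolean {
  return (statSync(filePath).mode & 0o111) !== 0;
}

// Hypothetical helper: remove the file together with its private directory.
export function removeUpload(filePath: string): void {
  rmSync(dirname(filePath), { recursive: true, force: true });
}
```

File permissions are only one layer: an uploaded file does nothing until some program runs or parses it, so the parser that inspects it matters as well. Keeping the upload in memory (for example with multer's memory storage) avoids the temp file entirely, at the cost of holding the whole file in RAM.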
NEROX 2w ago
I use multer to upload files to a temp folder on the server (ensuring they have a *.pdf extension) through an API (using import { tmpdir } from 'os';):

    import multer from 'multer';
    import { tmpdir } from 'os';

    // Configure multer for file uploads
    const upload = multer({
      dest: tmpdir(), // Use system's temporary directory
      limits: { fileSize: 10 * 1024 * 1024 } // 10MB size limit
    });

    export const uploadFileApi = async (req, res, context) => {
      // Check if user is authenticated
      if (!context.user) {
        return res.status(401).json({ message: 'Unauthorized' });
      }

      // Set up multer middleware for single file upload
      const singleUpload = upload.single('file');

      singleUpload(req, res, async function(err) {
        // Handle upload errors
        if (err) {
          return res.status(400).json({ message: `File upload error: ${err.message}` });
        }

        // Check if file was provided
        if (!req.file) {
          return res.status(400).json({ message: 'No file was provided' });
        }

        try {
          // File successfully uploaded to temporary location
          console.log('File uploaded successfully:', req.file.originalname);

          // Return success response
          return res.json({
            success: true,
            filename: req.file.originalname,
            path: req.file.path
          });
        } catch (error) {
          // Handle any other errors
          console.error('Error processing file:', error);
          return res.status(500).json({ message: 'Error processing uploaded file' });
        }
      });
    };

After that I read the PDF content with pdf.js-extract. Maybe you can check whether the reading process finishes successfully before uploading to S3.

    // PDF extraction function that only checks if extraction is successful
    import fs from 'fs';
    import { PDFExtract } from 'pdf.js-extract';

    export const extractPdf = async (filePath) => {
      try {
        // Check if file exists
        if (!fs.existsSync(filePath)) {
          return { success: false, message: 'File not found' };
        }

        // Read the PDF file
        const pdfBuffer = fs.readFileSync(filePath);

        // Initialize PDF extractor
        const pdfExtract = new PDFExtract();

        // Attempt to extract content
        const pdfData = await pdfExtract.extractBuffer(pdfBuffer);

        // If we got here, extraction was successful
        return {
          success: true,
          message: 'PDF extraction successful (Safe to upload it to S3)',
          pageCount: pdfData.pages.length
        };
      } catch (error) {
        // Handle specific errors
        if (error.name === 'TypeError') {
          return { success: false, message: 'Invalid PDF format', error: error.message };
        }

        // Handle any other errors
        return { success: false, message: 'PDF extraction failed', error: error.message };
      }
    };
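To tie the two steps together, a rough sketch of gating the S3 upload on the extraction result could look like this. `validateThenUpload` is a hypothetical helper, and the `validate` and `upload` callbacks are assumptions standing in for something like `extractPdf` above and whatever S3 upload function the app uses. The temp file is deleted in a `finally` block, so rejected files do not pile up on the server.

```typescript
import { promises as fs } from 'node:fs';

type ValidationResult = { success: boolean; message: string };

// Hypothetical glue code: upload only if validation succeeds, always clean up.
export async function validateThenUpload(
  filePath: string,
  validate: (path: string) => Promise<ValidationResult>,
  upload: (path: string) => Promise<void>,
): Promise<ValidationResult> {
  try {
    const result = await validate(filePath);
    if (!result.success) {
      // Validation failed: report why and skip the upload entirely.
      return result;
    }
    await upload(filePath);
    return { success: true, message: 'File validated and uploaded' };
  } finally {
    // Runs on success, on rejection, and on thrown errors alike.
    await fs.rm(filePath, { force: true });
  }
}
```

In the multer handler this would be called with `req.file.path` once the upload to the temp folder has finished.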
Vinny (@Wasp) 2w ago
No you don't have to. I think you'd want to validate the file type on the server first though. We just merged a PR that takes care of this. Maybe this helps you: https://github.com/wasp-lang/open-saas/blob/main/template/app/src/file-upload/fileUploading.ts#L41-L54
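A minimal sketch of what validating on the server before handing out a pre-signed URL might look like. This is not the code from the linked file: `validateUploadRequest`, `UploadRequest`, the allow-list, and the 10 MB limit (borrowed from the multer example earlier in the thread) are all assumptions for illustration.

```typescript
// Hypothetical allow-list and size limit, chosen for illustration only.
const ALLOWED_FILE_TYPES = ['application/pdf'];
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB

// Hypothetical shape of what the client sends when asking for an upload URL.
export type UploadRequest = {
  fileName: string;
  fileType: string;
  fileSizeBytes: number;
};

// Returns an error message, or null when the request looks acceptable.
export function validateUploadRequest(req: UploadRequest): string | null {
  if (!ALLOWED_FILE_TYPES.includes(req.fileType)) {
    return `File type '${req.fileType}' is not allowed`;
  }
  if (!req.fileName.toLowerCase().endsWith('.pdf')) {
    return 'File name must end with .pdf';
  }
  if (!Number.isInteger(req.fileSizeBytes) || req.fileSizeBytes <= 0) {
    return 'File size must be a positive integer';
  }
  if (req.fileSizeBytes > MAX_FILE_SIZE_BYTES) {
    return 'File exceeds the maximum allowed size';
  }
  return null;
}
```

The server would call this before generating the pre-signed URL and throw if it returns a message. These values are still declared by the client, so this only filters honest mistakes and lazy abuse. If S3 itself should enforce the size, a pre-signed POST policy with a `content-length-range` condition is one option to look into.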
pipas OP 2w ago
ok thank you!
