Best practices to verify files before uploading to S3?
I want users to upload PDF files, and I want to verify the integrity of those files and their content before uploading, but I don't know how. What are the best practices for doing so?
10 Replies
Hi @pipas. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
----------------------
Based on the knowledge sources provided, there isn't specific information about verifying file integrity or content validation before uploading to S3 in the Wasp framework. The sources mainly discuss the general approach to file uploads in Wasp applications, focusing on the upload process itself rather than validation.
From what I can gather, Wasp doesn't have built-in functionality specifically for file validation before S3 uploads. The knowledge sources show that file uploads in Wasp typically involve:
1. Getting a pre-signed URL from S3
2. Uploading directly from the client to S3 (bypassing the server to avoid unnecessary traffic)
3. Storing file references in your database
For your specific need to verify PDF files before uploading, you would likely need to implement custom validation logic. This might involve:
1. Checking file type/extension on the client side
2. Potentially using a library to verify the PDF structure
3. Implementing size limits or other basic validations
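The basic checks listed above could be sketched roughly as follows. This is only an illustration, not something taken from the Wasp docs: `UploadRequest`, `validateUploadRequest`, and the 10 MB limit are hypothetical names and values. It assumes the checks run on the server before a pre-signed URL is issued, and that the file name, MIME type, and size are whatever the client declared, so they can be spoofed and do not replace a check of the actual file content.

```typescript
// Hypothetical names and limits, chosen for illustration only.
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024; // assumed 10 MB limit
const ALLOWED_MIME_TYPES = ['application/pdf'];

interface UploadRequest {
  fileName: string;
  fileType: string; // MIME type declared by the client (can be spoofed)
  fileSize: number; // size in bytes declared by the client
}

// Returns a list of problems; an empty list means the request passed these basic checks.
function validateUploadRequest(req: UploadRequest): string[] {
  const errors: string[] = [];
  if (!req.fileName.toLowerCase().endsWith('.pdf')) {
    errors.push('File name must end with .pdf');
  }
  if (!ALLOWED_MIME_TYPES.includes(req.fileType)) {
    errors.push(`Unsupported MIME type: ${req.fileType}`);
  }
  if (!Number.isInteger(req.fileSize) || req.fileSize <= 0 || req.fileSize > MAX_FILE_SIZE_BYTES) {
    errors.push(`File size must be between 1 and ${MAX_FILE_SIZE_BYTES} bytes`);
  }
  return errors;
}
```

A server action would typically refuse to hand out the pre-signed URL whenever the returned list is non-empty.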
If you need more specific guidance on implementing PDF validation in a Wasp application, you might want to check the Wasp documentation or ask in the Wasp Discord community, as the provided knowledge sources don't contain detailed information about file validation best practices.
I'd recommend checking out the Discord conversation about file uploads mentioned in the sources for more context on how others have implemented file uploads in Wasp.
GitHub: Wasp Lang
Is there anything specific about file validation in Wasp that you'd like me to help with further?
----------------------
?
Hey @pipas, it's probably good to do some server-side validation depending on your security requirements, like checking the MIME type and throwing an error if it isn't 'application/pdf'.
I haven't done this with my own Wasp project, but you could modify the s3utils file from the template to do something like this (GPT generated, so this won't work as is):
import * as fs from 'fs';
// Assumes file-type v16, which exposes fromBuffer; newer releases export fileTypeFromBuffer instead.
// Imported as FileType so it does not clash with the fileType parameter below.
import * as FileType from 'file-type';
// other imports
// existing s3utils code
const allowedMimeTypes = ['application/pdf']; // Specify allowed MIME types
// Function to check the MIME type of the uploaded file
const checkMimeType = async (filePath: string): Promise<boolean> => {
  const buffer = fs.readFileSync(filePath); // Read file content into a buffer
  const type = await FileType.fromBuffer(buffer); // Detect MIME type based on content
  if (!type) {
    console.log('File type could not be determined');
    return false; // Unable to determine MIME type
  }
  return allowedMimeTypes.includes(type.mime); // Check if it's an allowed MIME type
};
export const getUploadFileSignedURLFromS3 = async ({ fileName, fileType, userId }: S3Upload) => {
  const tempFilePath = `/tmp/${fileName}`; // Assuming the file is being uploaded to a temporary location
  const isValidMimeType = await checkMimeType(tempFilePath);
  if (!isValidMimeType) {
    throw new Error('Invalid MIME type');
  }
  // rest of your s3utils code
};
You could also add some integrity checks that read portions of the PDF and verify it is structured as a PDF and isn't malformed, so that the content is confirmed to match the MIME type, since the MIME type can be spoofed.
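As a rough sketch of that kind of structural check: a PDF normally starts with the header `%PDF-` and has an `%%EOF` marker near its end. The helper below, `looksLikePdf`, is a hypothetical name and not part of Wasp or any library. It assumes the file is already in memory as a `Buffer` and only checks those two markers, so a file can pass and still be corrupt.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper for illustration; it checks markers only, not the full PDF structure.
function looksLikePdf(data: Buffer): boolean {
  // PDF files begin with the header "%PDF-" followed by a version number.
  const hasHeader = data.subarray(0, 5).toString('latin1') === '%PDF-';
  // The end-of-file marker "%%EOF" is expected somewhere in the last part of the file.
  const tail = data.subarray(Math.max(0, data.length - 1024)).toString('latin1');
  const hasEofMarker = tail.includes('%%EOF');
  return hasHeader && hasEofMarker;
}
```

This is cheap enough to run before a full parse, which would then catch files that have the right markers but broken content.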
If you're worried about malicious uploads, you could even use the VirusTotal API to programmatically scan files before storing them in your S3 bucket, but there could be privacy considerations to account for in doing so.
Hey @pipas. @G35x gave a great answer (thanks!)
Other than that, could you elaborate what you mean by "the integrity of the PDF files"?
Hey, thank you for your reply
Checking if the file is really a PDF, and checking that it can be opened and its text extracted properly. The thing is, I'm worried about front-end validation because it can be spoofed, and I don't want someone to spam corrupt or incorrect files to S3 because of its pay-as-you-go nature. So I need to verify the file server-side before it is even uploaded to S3, and I don't really know how.
Should the file be temporarily uploaded to the server before uploading it to S3? If so, how can I restrict the file so it can only be read and not executed (in case someone uploads a virus)?
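On the read-but-not-execute question, one possible approach is sketched below. Reading or parsing a file does not execute it, and the file can additionally be written without any execute permission bit and under a name chosen by the server. The helper names (`saveUntrustedUpload`, `isExecutable`, `removeUpload`) are hypothetical, and the sketch assumes a POSIX system, since file modes have limited effect on Windows.

```typescript
import { Buffer } from 'node:buffer';
import { mkdtempSync, writeFileSync, statSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: writes untrusted bytes into a fresh private temp directory,
// readable and writable by the owner only (mode 0o600, so no execute bit).
function saveUntrustedUpload(data: Buffer): { dir: string; filePath: string } {
  const dir = mkdtempSync(join(tmpdir(), 'upload-'));
  const filePath = join(dir, 'upload.pdf'); // server-chosen name, never the client's file name
  writeFileSync(filePath, data, { mode: 0o600 });
  return { dir, filePath };
}

// True if any execute bit (owner, group, or other) is set on the file.
function isExecutable(filePath: string): boolean {
  return (statSync(filePath).mode & 0o111) !== 0;
}

// Removes the temp directory and the file inside it once validation is done.
function removeUpload(dir: string): void {
  rmSync(dir, { recursive: true, force: true });
}
```

Using a server-chosen file name also avoids path tricks such as `../` in a client-supplied name.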
I use multer to upload files to a temp folder on the server (ensuring a *.pdf extension) through an API (using import { tmpdir } from 'os';).
import multer from 'multer';
import { tmpdir } from 'os';

// Configure multer for file uploads
const upload = multer({
  dest: tmpdir(), // Use system's temporary directory
  limits: { fileSize: 10 * 1024 * 1024 } // 10MB size limit
});

export const uploadFileApi = async (req, res, context) => {
  // Check if user is authenticated
  if (!context.user) {
    return res.status(401).json({ message: 'Unauthorized' });
  }
  // Set up multer middleware for single file upload
  const singleUpload = upload.single('file');
  singleUpload(req, res, async function(err) {
    // Handle upload errors
    if (err) {
      return res.status(400).json({ message: `File upload error: ${err.message}` });
    }
    // Check if file was provided
    if (!req.file) {
      return res.status(400).json({ message: 'No file was provided' });
    }
    try {
      // File successfully uploaded to temporary location
      console.log('File uploaded successfully:', req.file.originalname);
      // Return success response
      return res.json({
        success: true,
        filename: req.file.originalname,
        path: req.file.path
      });
    } catch (error) {
      // Handle any other errors
      console.error('Error processing file:', error);
      return res.status(500).json({
        message: 'Error processing uploaded file'
      });
    }
  });
};
After that I read the PDF content with pdf.js-extract. Maybe you can check that the reading process finishes successfully before uploading to S3.
// PDF extraction function that only checks if extraction is successful
import fs from 'fs';
import { PDFExtract } from 'pdf.js-extract';

export const extractPdf = async (filePath) => {
  try {
    // Check if file exists
    if (!fs.existsSync(filePath)) {
      return {
        success: false,
        message: 'File not found'
      };
    }
    // Read the PDF file
    const pdfBuffer = fs.readFileSync(filePath);
    // Initialize PDF extractor
    const pdfExtract = new PDFExtract();
    // Attempt to extract content
    const pdfData = await pdfExtract.extractBuffer(pdfBuffer);
    // If we got here, extraction was successful
    return {
      success: true,
      message: 'PDF extraction successful (Safe to upload it to S3)',
      pageCount: pdfData.pages.length
    };
  } catch (error) {
    // Handle specific errors
    if (error.name === 'TypeError') {
      return {
        success: false,
        message: 'Invalid PDF format',
        error: error.message
      };
    }
    // Handle any other errors
    return {
      success: false,
      message: 'PDF extraction failed',
      error: error.message
    };
  }
};
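To tie the two steps together, the temp file could be validated, uploaded only if validation succeeds, and deleted in every case. The sketch below is an assumption about how that might look rather than part of the code above: `validateThenUpload` and `ValidationResult` are hypothetical names, `validate` stands in for a function like extractPdf, and `upload` stands in for whatever sends the file to S3.

```typescript
import { unlink } from 'node:fs/promises';

// Hypothetical shape, mirroring the objects returned by a validator such as extractPdf.
interface ValidationResult {
  success: boolean;
  message: string;
}

async function validateThenUpload(
  filePath: string,
  validate: (path: string) => Promise<ValidationResult>,
  upload: (path: string) => Promise<void>,
): Promise<ValidationResult> {
  try {
    const result = await validate(filePath);
    if (!result.success) return result; // rejected: nothing is sent to S3
    await upload(filePath);
    return result;
  } finally {
    // Always try to remove the temp file, whether validation passed, failed, or threw.
    await unlink(filePath).catch(() => undefined);
  }
}
```

Passing the validator and uploader in as functions keeps the sketch independent of any particular PDF library or S3 client.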
No you don't have to. I think you'd want to validate the file type on the server first though.
We just merged a PR that takes care of this. Maybe this helps you: https://github.com/wasp-lang/open-saas/blob/main/template/app/src/file-upload/fileUploading.ts#L41-L54
ok thank you!