Import action freezes browser window when importing 86k rows of data. How to troubleshoot?
Hello, I am trying to understand how Filament's import action works so that I may troubleshoot it further.
Could someone point me in the direction of where I could understand the Filament import action's process of importing a CSV file? Does it load the entire file into memory, or does it read it in parts? I know there is the chunking functionality, but if I am not mistaken, that is only for the part where it sends chunks of data to be processed in batches for import. I'm more curious about the loading and reading of the file before it is chunked for processing/import, because an app I am working on seems to freeze in the browser for larger imports.
I am able to import smaller files that are about 1,000 rows long without issue. However, when I attempt to import a larger file that is ~86k rows (8 MB), the window freezes. I need to be able to import files that have up to a million or more rows (40+ MB).
I have set PHP's memory_limit to 2G and the chunk size of the import action to 100. I know 100 is the default, but I wanted to be sure.
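For reference, the action itself is configured roughly like this (CustomerImporter is just a placeholder for my own importer class):

```php
ImportAction::make()
    ->importer(CustomerImporter::class) // placeholder name for my importer
    ->chunkSize(100); // 100 is the default, set explicitly just to be sure
```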
I am relatively new to PHP, Laravel and Filament, and I am not sure where to look. I have tried reading through the docs, and unless I am missing something, I don't see anything to either manage or troubleshoot the reading/loading of the file. I see stuff like lifecycle hooks, and most of them seem to be related to the lifecycle after the read.
Is there a way to not load the entire CSV file into memory and instead load/read the number of rows specified in chunks?
Any help or insight would be appreciated.
Did you use Queues?
Yes, I have, but do Queues break up the read/load of the CSV? Or is it only queuing up the batches for import *after* the file has been read/loaded into memory and batched into chunks? I thought it would be the latter.
Update: I haven't solved the issue yet, but I did find out about Laravel's Lazy Collections feature which seems tailor made for just such a scenario as I described.
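Based on the docs, a rough sketch of what I think that looks like, reading one row at a time (the path and chunk size are just placeholders):

```php
use Illuminate\Support\LazyCollection;

// Yield rows one at a time so only the current row (and the current chunk)
// ever sits in memory, instead of the whole CSV.
$rows = LazyCollection::make(function () {
    $handle = fopen(storage_path('app/imports/customers.csv'), 'r');

    $header = fgetcsv($handle);

    while (($line = fgetcsv($handle)) !== false) {
        yield array_combine($header, $line);
    }

    fclose($handle);
});

$rows->chunk(100)->each(function (LazyCollection $chunk) {
    // hand each chunk of 100 rows off for processing here
});
```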
It seems like I might have to create a custom import action to leverage this. I'm hoping I can just use the Filament pre-built import action as a starting point.
I feel this is more of an issue outside of Filament and has more to do with the server and its allocation of memory. But if the server takes too long, eventually it will time out. So, if you are importing this many rows, it would be more prudent to handle these via a custom queued job that maybe offloads the process to a dedicated server.
I wouldn't have thought the importer should lock up the browser though. I assume that happens in order to provide the feedback about the number of rows that will be imported, but it would make sense to at least provide the option to forego that in favour of letting a background job take care of the entire thing.
There are lots of import jobs that would run fine on the same server as the website, but will still take longer to read in than is reasonable for the user to see feedback.
Valid point.
It is odd the browser is crashing before the server considering it’s just waiting on a response.
@Rohan, you can definitely use the default action as a starting point, but you'd need to create a new action that re-implements the CanImportRecords trait. You'd want the action part of it to save the uploaded file to the disk and dispatch a new queued job that then carries out the reading of the file and dispatching the jobs to process each chunk (the Bus::batch section of the code).
Currently the initial request reads the entire CSV in, and then chunks it up into individual jobs for each 100 records.
In your queued job you'd want to instead read the file in as a LazyCollection and then fire off a job for however many rows is reasonable for your needs, to be processed in a single job.
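Very roughly, something like this, with all the class and job names made up (none of this is Filament's actual API, just a sketch of the shape):

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Bus;
use Illuminate\Support\LazyCollection;

// Hypothetical "reader" job: dispatched by the custom action with the path
// to the stored CSV, it streams the file and batches one job per chunk.
class ReadImportCsv implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        protected string $path,
        protected array $columnMap,
        protected int $chunkSize = 100,
    ) {}

    public function handle(): void
    {
        $jobs = LazyCollection::make(function () {
            $handle = fopen($this->path, 'r');
            $header = fgetcsv($handle);

            while (($line = fgetcsv($handle)) !== false) {
                yield array_combine($header, $line);
            }

            fclose($handle);
        })
            ->chunk($this->chunkSize)
            // ProcessImportChunk is a made-up job that would do the actual
            // record creation for its rows.
            ->map(fn (LazyCollection $chunk) => new ProcessImportChunk($chunk->all(), $this->columnMap))
            ->all();

        // Equivalent to the Bus::batch section in CanImportRecords. The batch
        // still holds one small job object per chunk, but not the raw CSV.
        Bus::batch($jobs)->dispatch();
    }
}
```

Job batching needs the job batches table, but if the stock ImportAction has been working for your smaller files, that's already in place.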
You should be fine using the built-in Importer job to handle the actual import.
Thank you @Tetracyclic. I am a newb so I'm using ChatGPT to help me understand and implement this, and it doesn't seem to propose the trait you mentioned, so this is very helpful. I will learn about it and implement the solution.
Just stick with it. You’ll get there. This is a somewhat advanced topic. So don’t apologize for it.
I might be a little too new to submit a PR, but IMHO, this approach with LazyCollections seems to deliver a more consistent experience, so I think it might be useful to have in the main framework.
The current approach has the slight advantage on smaller imports that it can give immediate feedback to the user that the CSV has been read successfully and the number of rows that will be imported.
The advantage with LazyCollection is solely that it uses less memory, but it will still take just as long to return a result. So on a large import, if it happens in the initial request, it will still lock up the response until it's read the entire CSV file in.
Sorry, I don't understand what you mean here:
> So on a large import, if it happens in the initial request, it will still lock up the response until it's read the entire CSV file in.
By "lock up the response", do you mean it will not start importing until it has read and chunked the entire file? Based on my current understanding and knowledge, I find the LazyCollection approach very attractive, not just because it uses less memory, but because I can generally ensure a relatively consistent use of memory, rather than spikes, which I imagine helps with stability. Also, when the import completes is less of an issue as long as I can measure and communicate a timeframe to a user to set expectations, IMO.
Yep. This is also where queues come into play.
Currently when you upload the CSV file and click submit, Filament will immediately read the entire file in, creating jobs for each chunk of however many rows, and then send the response back to the browser saying it's processing them.
If you just switched to a LazyCollection, it will still read the entire file in and dispatch the jobs; it just won't keep the old chunks in memory as it goes. But it will take just as long to return the response back to the user (stopping them from doing anything in the meantime).
What I'm suggesting is that when you submit the file, it should just save the file to the server and dispatch a queued job with the filename and then immediately send a response to the user.
The job you dispatch can then use a LazyCollection to read the file in chunks, and dispatch a job to process each of those chunks, or you could even just process the chunks one by one in that job.
You can send database notifications from that background job to inform the user about the progress, although I don't think there is any way to determine the number of rows in the file without reading the entire thing into memory.
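Putting the action part together, inside your custom action's setUp() the callback could be as small as this. Again, ReadImportCsv is the made-up job from earlier, and I'm assuming $data['file'] ends up as a path to a file stored on disk, which depends on how you configure the FileUpload in your copy:

```php
use App\Jobs\ReadImportCsv;
use Filament\Notifications\Notification;
use Illuminate\Support\Facades\Storage;

$this->action(function (array $data) {
    // Persist the upload somewhere the queue worker can read it from.
    // How you get the uploaded file out of $data depends on your form setup.
    $path = Storage::disk('local')->path($data['file']);

    // 'columnMap' is a placeholder key for whatever your column-mapping
    // form state is called.
    ReadImportCsv::dispatch($path, $data['columnMap'] ?? []);

    // Respond immediately instead of reading the CSV in this request.
    Notification::make()
        ->title('Import queued')
        ->body('Your file has been received and will be processed in the background.')
        ->success()
        ->send();
});
```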
> What I'm suggesting is that when you submit the file, it should just save the file to the server and dispatch a queued job with the filename and then immediately send a response to the user.
Ahh, yes, that's definitely what I want. I want it to save the file to disk and then over time read it in parts, not load it all into memory.
> The job you dispatch can then use a LazyCollection to read the file in chunks, and dispatch a job to process each of those chunks, or you could even just process the chunks one by one in that job. You can send database notifications from that background job to inform the user about the progress, although I don't think there is any way to determine the number of rows in the file without reading the entire thing into memory.
Perfect. This is exactly what I'm hoping for. I will work on this. Thank you so much for your help @Tetracyclic @awcodes! I wonder if I can provide an estimate to the user based on the file size, maybe some historical data of past imports, and maybe even the overall number of jobs in the queue on the server, rather than the number of rows 🤔 I'll save that rabbit hole for later. Thanks again!
No problem.
This is the code for the base ImportAction class that you insert into your resource. All it does is include that CanImportRecords trait, so you could just make a copy of that and convert it into your own custom action. And then you'll likely just need to modify the $this->action callback to fire off a custom queued job that does most of what is already there, just using LazyCollections instead.
Apologies if this is a dumb question, but does this mean that I would use a regular file upload action first, and then have that trigger an import action that I can use for the mapping and creation of the read and import queues? Does that make sense?
Wait, maybe I'm overthinking it... the regular import action already does the upload piece... I think...
Yep, it does.
But this line stops it from saving the file: https://github.com/filamentphp/filament/blob/6799de116ceac33778566a8aee533c0c3f97cd74/packages/actions/src/Concerns/CanImportRecords.php#L139
AH
When you upload the file, instead of saving it, the closure in the afterStateUpdated property above reads in just the header row, and updates the Fieldset that comes after to present the options for which columns to import.
Then when you submit the form, it's saved as a temporary file and the closure in $this->action reads that in as a stream and creates the chunked jobs. I think it actually probably has similar performance to LazyCollection, as it's reading it in chunks; it's just doing it all in that initial request to the server. You'd basically want to save that file permanently, and then move most of the logic in the action into a queued job.
Right, got it. This is super helpful. Thank you so much!
I need to learn how to explore the Filament codebase like you have. I keep resorting to just the docs, which is pretty silly, now that I think of it.
It's using Filament's ChunkIterator class to read in the file, which works similarly to a LazyCollection.
It's really helpful for learning how to use Filament normally, as most of it is implemented in the same way you'd implement things yourself using Filament.
Filament has its conventions, but it's still just Laravel, so anything you can do in Laravel can still be done in Filament.
I.e. any action just runs a callback, so in that callback you can use any Laravel or native PHP. It doesn't matter; it's just running a function that you can define.
Ah, so if it works similarly to LazyCollection, the difference maker is that the approach you are proposing doesn't read it in one request, but rather queues the reading of the chunks, right?
Yep, exactly.
As soon as the file is uploaded, you'd return the response to the user with a notification saying it's being processed, and then the queued job can kick in and do all the actual processing from the saved file.
Perfect! Thank you so much!
And when the processing is done, you can emit a database notification that tells the user it's done, which they will receive if they are still logged in.
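Something like this from the end of the queued job, for example. Filament's Notification class is real; how you get hold of the user is up to you (here I'm assuming the job stored the user's ID when it was dispatched):

```php
use App\Models\User;
use Filament\Notifications\Notification;

// $this->userId would have been set on the job when the import was queued.
Notification::make()
    ->title('Import finished')
    ->body('All rows from your CSV have been imported.')
    ->success()
    ->sendToDatabase(User::find($this->userId));
```

Database notifications also need to be enabled on the panel for the user to actually see it.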
It would probably be a worthwhile improvement to the ImportAction in core so that you can instruct it to work like this.
I will either submit it after I've finished the code, or put it here so someone who has more experience than me can look it over and submit it themselves.