Import action freezes browser window when importing 86k rows of data. How to troubleshoot?

Hello, I'm trying to understand how Filament's import action works so that I can troubleshoot it further. Could someone point me in the direction of where I could learn about the import action's process for importing a CSV file? Does it load the entire file into memory, or does it read it in parts? I know there is chunking functionality, but if I'm not mistaken that only applies to the part where chunks of data are sent off to be processed in batches for the import. I'm more curious about the loading and reading of the file before it is chunked for processing, because an app I'm working on seems to freeze in the browser for larger imports.

I can import smaller files of about 1,000 rows without issue. However, when I attempt to import a larger file of ~86k rows (8 MB), the window freezes. I need to be able to import files with up to a million or more rows (40+ MB). I have set the PHP memory_limit to 2G and the chunk size of the import action to 100 (I know that's the default, but I wanted to be sure).

I'm relatively new to PHP, Laravel and Filament, and I'm not sure where to look. I have tried reading through the docs and, unless I'm missing something, I don't see anything for either managing or troubleshooting the reading/loading of the file. I see things like lifecycle hooks, but most of them seem to relate to the lifecycle after the read. Is there a way to not load the entire CSV file into memory, and instead read only the number of rows specified per chunk? Any help or insight would be appreciated.
30 Replies
Toni (6mo ago)
Did you use Queues?
KnightOfRohan (OP, 6mo ago)
Yes, I have, but do queues break up the read/load of the CSV? Or is it only queuing up the batches for import *after* the file has been read/loaded into memory and batched into chunks? I thought it would be the latter.
KnightOfRohan (OP, 6mo ago)
Update: I haven't solved the issue yet, but I did find out about Laravel's Lazy Collections feature, which seems tailor-made for exactly the scenario I described.
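From what I can tell, a minimal sketch of using it for a CSV would look something like this (the path and chunk size here are just examples):
```php
use Illuminate\Support\LazyCollection;

LazyCollection::make(function () {
    // Stream the file row by row instead of loading it all at once.
    $handle = fopen(storage_path('app/imports/big.csv'), 'r');

    while (($row = fgetcsv($handle)) !== false) {
        yield $row;
    }

    fclose($handle);
})
    ->skip(1)    // skip the header row
    ->chunk(100) // same size as the import action's chunk size
    ->each(function (LazyCollection $chunk) {
        // process 100 rows at a time without holding the whole file in memory
    });
```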
KnightOfRohan (OP, 6mo ago)
It seems like I might have to create a custom import action to leverage this. I'm hoping I can just use the Filament pre-built import action as a starting point.
awcodes (6mo ago)
I feel this is more of an issue outside of Filament that has more to do with the server and its allocation of memory. But if the server takes too long, eventually it will time out. So, if you are importing this many rows, it would be more prudent to handle them via a custom queued job that perhaps offloads the process to a dedicated server.
Tetracyclic (6mo ago)
I wouldn't have thought the importer should lock up the browser, though. I assume that happens in order to provide the feedback about the number of rows that will be imported, but it would make sense to at least provide the option to forgo that in favour of letting a background job take care of the entire thing. There are lots of import jobs that would run fine on the same server as the website, but would still take longer to read in than it is reasonable to make the user wait for feedback.
awcodes (6mo ago)
Valid point. It is odd the browser is crashing before the server considering it’s just waiting on a response.
Tetracyclic (6mo ago)
@Rohan, you can definitely use the default action as a starting point, but you'd need to create a new action that re-implements the CanImportRecords trait. You'd want the action part of it to save the uploaded file to disk and dispatch a new queued job that then handles reading the file and dispatching the jobs that process each chunk (the Bus::batch section of the code).

Currently the initial request reads the entire CSV in and then chunks it up into individual jobs of 100 records each. In your queued job you'd instead want to read the file in as a LazyCollection and then fire off a job for however many rows are reasonable for your needs, to be processed in a single job. You should be fine using the built-in Importer job to handle the actual import.
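As a very rough sketch of the action side (the field names and the ProcessCsvImport job are made up for illustration, and this assumes the FileUpload field has already stored the file on a disk):
```php
use Filament\Actions\Action;
use Filament\Forms\Components\FileUpload;

Action::make('import')
    ->form([
        FileUpload::make('file')
            ->disk('local')
            ->directory('imports'),
        // ... plus whatever column-mapping fields you need
    ])
    ->action(function (array $data) {
        // At this point $data['file'] should be the stored path on the 'local' disk.
        // Do no reading here: just hand the path to a queued job and return,
        // so the browser gets its response straight away.
        // ProcessCsvImport is a hypothetical job, sketched further down the thread.
        ProcessCsvImport::dispatch($data['file']);
    });
```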
KnightOfRohan (OP, 6mo ago)
Thank you @Tetracyclic. I am a newb, so I'm using ChatGPT to help me understand and implement this, and it didn't propose the trait you mentioned, so this is very helpful. I will learn about it and implement the solution.
awcodes (6mo ago)
Just stick with it. You’ll get there. This is a somewhat advanced topic. So don’t apologize for it.
KnightOfRohan (OP, 6mo ago)
I might be a little too new to submit a PR, but IMHO, this approach with LazyCollections seems to deliver a more consistent experience, so I think it might be useful to have in the main framework.
Tetracyclic (6mo ago)
The current approach has the slight advantage on smaller imports that it can give immediate feedback to the user that the CSV has been read successfully and the number of rows that will be imported. The advantage with LazyCollection is solely that it uses less memory, but it will still take just as long to return a result. So on a large import, if it happens in the initial request, it will still lock up the response until it's read the entire CSV file in.
KnightOfRohan (OP, 6mo ago)
Sorry I don't understand what you mean here:
> So on a large import, if it happens in the initial request, it will still lock up the response until it's read the entire CSV file in.
By "lock up the response", do you mean it will not start importing until it has read and chunked the entire file? Based on my current understanding, I find the LazyCollection approach very attractive, not just because it uses less memory, but because I can generally ensure a relatively consistent use of memory rather than spikes, which I imagine helps with stability. Also, when the import completes is less of an issue as long as I can measure and communicate a timeframe to the user to set expectations, IMO.
awcodes (6mo ago)
Yep. This is also where queues come into play.
Tetracyclic (6mo ago)
Currently, when you upload the CSV file and click submit, Filament immediately reads the entire file in, creates jobs for each chunk of however many rows, and then sends the response back to the browser saying it's processing them. If you just switched to a LazyCollection, it would still read the entire file in and dispatch the jobs; it just wouldn't keep the old chunks in memory as it goes, but it would take just as long to return the response to the user (stopping them doing anything in the meantime).

What I'm suggesting is that when you submit the file, it should just save the file to the server and dispatch a queued job with the filename, and then immediately send a response to the user. The job you dispatch can then use a LazyCollection to read the file in chunks and dispatch a job to process each of those chunks, or you could even just process the chunks one by one in that job. You can send database notifications from that background job to inform the user about the progress, although I don't think there is any way to determine the number of rows in the file without reading through the entire thing.
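Sketched out, that queued job might look roughly like this (ProcessCsvImport and ImportChunk are placeholder names, and the CSV handling is deliberately simplified):
```php
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Storage;
use Illuminate\Support\LazyCollection;

class ProcessCsvImport implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // $path is the file's path relative to the 'local' disk.
    public function __construct(public string $path)
    {
    }

    public function handle(): void
    {
        LazyCollection::make(function () {
            $handle = fopen(Storage::disk('local')->path($this->path), 'r');

            while (($row = fgetcsv($handle)) !== false) {
                yield $row;
            }

            fclose($handle);
        })
            ->skip(1)     // header row
            ->chunk(1000) // whatever chunk size the server handles comfortably
            ->each(function (LazyCollection $chunk) {
                // Dispatch one job per chunk, or just process the rows right here.
                // ImportChunk is a hypothetical job for processing a single chunk.
                ImportChunk::dispatch($chunk->values()->all());
            });
    }
}
```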
KnightOfRohan (OP, 6mo ago)
> What I'm suggesting is that when you submit the file, it should just save the file to the server and dispatch a queued job with the filename, and then immediately send a response to the user.
Ahh, yes, that's definitely what I want. I want it to save the file to disk and then over time read it in parts and not load it all to memory.
> The job you dispatch can then use a LazyCollection to read the file in chunks and dispatch a job to process each of those chunks, or you could even just process the chunks one by one in that job. You can send database notifications from that background job to inform the user about the progress, although I don't think there is any way to determine the number of rows in the file without reading through the entire thing.
Perfect. This is exactly what I'm hoping for. I will work on this. Thank you so much for your help @Tetracyclic @awcodes! I wonder if I can provide an estimate to the user based on the file size, some historical data from past imports, and maybe even the overall number of jobs in the queue on the server, rather than the number of rows 🤔 I'll save that rabbit hole for later. Thanks again!
Tetracyclic (6mo ago)
No problem. This is the code for the base ImportAction class that you add to your resource; all it does is include that CanImportRecords trait, so you could just make a copy of it and convert it into your own custom action. Then you'll likely just need to modify the $this->action callback to fire off a custom queued job that does most of what is already there, just using LazyCollections instead.
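For reference, the shape of it is roughly this (the trait's namespace here is what I'd expect in Filament v3, so double-check it against your vendor directory):
```php
namespace App\Filament\Actions;

use Filament\Actions\Action;
use Filament\Actions\Concerns\CanImportRecords;

// Your own copy of Filament's ImportAction: owning the class lets you
// override the $this->action(...) callback that CanImportRecords sets up.
class QueuedImportAction extends Action
{
    use CanImportRecords;
}
```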
KnightOfRohan (OP, 6mo ago)
Apologies if this is a dumb question, but does this mean that I would use a regular file upload action first, and then have that trigger an import action that I can use for the mapping and creation of the read and import queues? Does that make sense? Wait, maybe I'm overthinking it... the regular import action already does the upload piece... I think...
KnightOfRohan (OP, 6mo ago)
AH
Tetracyclic (6mo ago)
When you upload the file, instead of saving it, the closure in the afterStateUpdated property above reads in just the header row and updates the Fieldset that comes after it to present the options for which columns to import. Then, when you submit the form, it's saved as a temporary file and the closure in $this->action reads that in as a stream and creates the chunked jobs. I think it actually has similar performance to a LazyCollection, as it's reading in chunks; it's just doing it all in that initial request to the server. You'd basically want to save that file permanently, and then move most of the logic in the action into a queued job.
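Reading just the header row is cheap; e.g. something like this inside that callback (simplified, with resolution of the temporary upload path left out):
```php
// Grab only the first row to discover the column names;
// the rest of the file is never touched at this point.
$handle = fopen($pathToUploadedCsv, 'r');
$csvHeaders = fgetcsv($handle);
fclose($handle);
```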
KnightOfRohan (OP, 6mo ago)
Right, got it. This is super helpful. Thank you so much! I need to learn how to explore the Filament codebase like you have. I keep resorting to just the docs, which is pretty silly, now that I think of it.
Tetracyclic (6mo ago)
It's using Filament's ChunkIterator class to read in the file, which works similarly to a LazyCollection. Reading the codebase is really helpful for learning how to use Filament normally, too, as most of it is implemented in the same way you'd implement things yourself using Filament.
awcodes (6mo ago)
Filament has its conventions, but it's still just Laravel, so anything you can do in Laravel can still be done in Filament. I.e., any action just runs a callback, so in that callback you can use any Laravel or native PHP. It doesn't matter; it's just running a function that you define.
KnightOfRohan (OP, 6mo ago)
Ah, so if it works similarly to LazyCollection, the difference is that the approach you are proposing doesn't read it all in one request, but rather queues the reading of the chunks, right?
Tetracyclic (6mo ago)
Yep, exactly. As soon as the file is uploaded, you'd return a response to the user with a notification saying it's being processed, and then the queued job can kick in and do all the actual processing from the saved file.
KnightOfRohan (OP, 6mo ago)
Perfect! Thank you so much!
awcodes (6mo ago)
And when the processing is done, you can send a database notification telling the user that it's finished, which they will receive if they are still logged in.
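Something like this at the end of the job's handle() method (assuming you passed the importing user into the job's constructor):
```php
use Filament\Notifications\Notification;

// Store a notification for the user who started the import; they'll see it
// in the panel's database notifications.
Notification::make()
    ->title('Import complete')
    ->body('Your CSV file has finished importing.')
    ->success()
    ->sendToDatabase($this->user);
```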
Tetracyclic (6mo ago)
It would probably be a worthwhile improvement to the ImportAction in core so that you can instruct it to work like this.
KnightOfRohan (OP, 6mo ago)
I will either submit it after I've finished the code, or post it here so someone with more experience than me can look it over and submit it themselves.