It seems my bucket has ground to a near halt. I was previously uploading at 130 MiB/s+ and now I'm seeing
Completed 312.0 MiB/4.7 GiB (4.1 MiB/s) with 1 file(s) remaining
from the aws cli.
df681cd207a9e2afa394d260c486fd1e
bucket tts-data
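for reference, these are the only aws cli knobs I know of for upload throughput, in case that matters (values below are just examples, not what I'm running):
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 64MB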
I have a handful of multipart uploads I abandoned, not sure if that matters
but with 7.8T to upload, this is a major problem lol
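in case it matters, I think this is how I'd clean those up with the cli (bucket tts-data, the key/upload-id come from the list output, endpoint flags omitted):
aws s3api list-multipart-uploads --bucket tts-data
aws s3api abort-multipart-upload --bucket tts-data --key <key> --upload-id <upload-id>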
it seems to be going fast again after I deleted the few files that were in there
interesting, will use that in the future. Should I use the sync command or something else?
pinging reply
gotcha, thanks will try that!
do you know if it can resume partial uploads @Space - Ping in replies ?
well s3 has resumable multipart uploads no?
Because copy skips existing files, if I cancel the current aws s3 cp command it will resume where it left off, minus the abandoned upload?
Copy files from source to dest, skipping identical files.
sure, I mean like if I have file abc already uploaded to dest and it's on disk, when I run rclone copy it will skip copying that bc it sees it is already in dest right?
ok cool that's great then I can just kill my aws scripts haha
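so the plan is roughly this I guess (assuming I call the remote r2 and the data is in /mnt/data, those names are just placeholders):
rclone copy /mnt/data r2:tts-data --progress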
that'd be great!
im moving from local to r2 actually haha
oh duh you said that
will this make a noticeable improvement over default settings @Space - Ping in replies ?
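for context this is roughly what I was going to run on top of the defaults (values are a guess, not tuned):
rclone copy /mnt/data r2:tts-data --progress --transfers 8 --checkers 16 --s3-upload-concurrency 8 --s3-chunk-size 64M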
im seeing just a ton of 0%'s
like 500 of these lol
preallocating where?
can I do that without writing the extra files? I don't want them there if they don't have to be
on r2 and making sure to write extra metadata hashes etc
what does this mean then? does this process take a while?
oh wow ok well that might not end up being faster lol
says it's checked 159 though?
I don't have too much disk left though lol
Is there progress on that reading I can show?
This isn't a very fast disk (network volume)
I also added the
no_check_bucket = true
option to the config because of the keys
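this is roughly what the remote config looks like now (keys redacted, account id is a placeholder):
[r2]
type = s3
provider = Cloudflare
access_key_id = <redacted>
secret_access_key = <redacted>
endpoint = https://<account-id>.r2.cloudflarestorage.com
no_check_bucket = true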
no it's a network volume on DO
sorry, like a block storage
not locally attached storage, but still a "disk"
digitalocean magic
no it shows like a local disk
it's just not as fast as a locally attached disk
I just wish I could see the progress of the preallocation, it's been going for 9 min and no % changes lol
will the preallocation need to restart though?
I guess so, since it shows 4 now lol. I guess it preallocates 4 at a time then?
oh yeah there goes the first one
@Space - Ping in replies it only looks like it's doing 1 file at a time though
and this is on a 4-10gbit machine
oh ok now some are starting to run concurrently, maybe this disk is just a lot slower than I thought lol
the single files don't seem to be uploading faster than just dumb concurrent uploads from aws s3 cp though, those ran ~110MiB/s each
yeah I think this is slower because of all the file reading it has to do first hahaha damn
well not if it's limited by disk speed instead of network right?
but it has to read every file twice effectively
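I think there's a flag that skips the pre-upload hash pass on big files so each file only gets read once, might try that next (not sure skipping the checksum is a good idea though):
rclone copy /mnt/data r2:tts-data --progress --s3-disable-checksum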
with rclone, is there a way to copy a list of web files directly to S3?
R2
using DO just as a download/upload machine since bandwidth is cheaper and I can get a "10gbit" machine which really seems to be 4 lol
I'm trying to download a 7.7TB data set and upload into my own R2
some web server with a list of direct tar files
like I have the list of links
rclone copyurl?
I don't think the server provides file listings though, but I have them in a txt file
sec
https://dl.fbaipublicfiles.com/voxpopuli/audios/
they have code to generate the list of links
but I don't think I can do them concurrently with a single rclone command, I'd have to run a bunch in parallel
issue is if that breaks, it won't know to skip already uploaded ones right?
I don't see on the docs for that page that it would
ah see that now
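so something like this then maybe? single command, concurrent, and copy should skip what's already uploaded (assuming links.txt has the paths relative to that base URL, totally untested):
rclone copy --http-url https://dl.fbaipublicfiles.com/voxpopuli/audios/ :http: r2:tts-data --files-from links.txt --transfers 10 --progress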
ill test somethin
@Space - Ping in replies I was getting 400MiB/s earlier, but sometimes get like 50
You think I could launch 552 of these concurrently? Or would that be dumb?
I don't think there's an easy way to do like 10 concurrently without writing code to manage it
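actually maybe plain xargs could manage it, rough sketch (assuming one URL per line in links.txt):
xargs -P 10 -I {} rclone copyurl {} r2:tts-data --auto-filename < links.txt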
let me try that
honestly that's fine, I've already been messing up for 2 days XD
that way I can save $1/hr on not having a 10TB disk XD
wdym?
LMAO
christ how
are you able to saturate even close to that from the source with concurrent downloads?
I don't want to bum your resources, I'm making a new node now to try running this
let me see how fast I can get this one node to transfer real quick and if it's slow I'll send you the details, just made a max size hetzner VM lol