RunPod • 3mo ago
Untrack4d

Training Flux Schnell on serverless

Hi there, I am using your pods to run ostris/ai-toolkit to train Flux on custom images. Now I want to use your serverless endpoints instead. Can you help me out? Do you have some kind of template or guide on how to do it?
68 Replies
navin_hariharan • 3mo ago
@Untrack4d Hi! I already have dev running on serverless! I'll update it for schnell soon.
Untrack4d (OP) • 3mo ago
Do you have a demo, or can I test it out?
navin_hariharan • 3mo ago
Give me 30 min.
Untrack4d (OP) • 3mo ago
Ok man, thx. What are you using to train it?
navin_hariharan • 3mo ago
{
  "input": {
    "lora_file_name": "laksheya-geraldine_viswanathan-FLUX",
    "trigger_word": "geraldine viswanathan",
    "gender": "woman",
    "data_url": "dataset_zip url"
  },
  "s3Config": {
    "accessId": "accessId",
    "accessSecret": "accessSecret",
    "bucketName": "flux-lora",
    "endpointUrl": "https://minio-api.cloud.com"
  }
}
@Untrack4d
Untrack4d (OP) • 3mo ago
Thanks for sharing, I will check it out. What does this image contain?
FROM navinhariharan/flux-lora:latest
How are you handling the long-running process of training a model?
navin_hariharan • 3mo ago
Disable this for long-running processes
navin_hariharan • 3mo ago
FROM navinhariharan/flux-lora:latest
This contains the Flux models, dev and schnell.
Untrack4d (OP) • 3mo ago
Thank you for the help 🫑
navin_hariharan • 3mo ago
Anytime 🙂 So the LoRA is trained and then sent to your S3 bucket!
Untrack4d (OP) • 3mo ago
I will be hosting it on a server of mine to reduce costs.
navin_hariharan • 3mo ago
I use MinIO!
Untrack4d (OP) • 3mo ago
Never heard of it.
navin_hariharan • 3mo ago
Open-source S3!
navin_hariharan • 3mo ago
MinIO
MinIO | S3 Compatible Storage for AI
MinIO's High Performance Object Storage is Open Source, Amazon S3 compatible, Kubernetes Native and is designed for cloud native workloads like AI.
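Because MinIO speaks the S3 API, the `s3Config` fields from the payload can be handed to an ordinary S3 client by overriding the endpoint URL. A sketch assuming `boto3` is installed in the worker image (imported lazily, so the mapping helpers run without it); `upload_lora` and the `loras/<name>.safetensors` key layout are hypothetical, not taken from navin's image:

```python
def s3_client_kwargs(s3_config: dict) -> dict:
    """Map the thread's s3Config keys onto boto3.client("s3", ...) keyword arguments."""
    return {
        "endpoint_url": s3_config["endpointUrl"],  # e.g. a self-hosted MinIO server
        "aws_access_key_id": s3_config["accessId"],
        "aws_secret_access_key": s3_config["accessSecret"],
    }


def lora_object_key(lora_file_name: str, prefix: str = "loras") -> str:
    """Hypothetical object key layout: <prefix>/<name>.safetensors."""
    return f"{prefix}/{lora_file_name}.safetensors"


def upload_lora(local_path: str, s3_config: dict, lora_file_name: str) -> str:
    """Upload a trained LoRA file to the bucket named in s3Config; return the object key."""
    import boto3  # lazy import: only needed on the worker

    client = boto3.client("s3", **s3_client_kwargs(s3_config))
    key = lora_object_key(lora_file_name)
    client.upload_file(local_path, s3_config["bucketName"], key)
    return key
```

The same code works against AWS S3 itself by pointing `endpointUrl` at the AWS endpoint, which is what makes self-hosting on MinIO a drop-in swap.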
Untrack4d (OP) • 3mo ago
I will take a look
navin_hariharan • 3mo ago
Sure! If you run into issues, let me know! I'll be happy to help!
Untrack4d (OP) • 3mo ago
Do you have any tips to get better results? Or to make it train faster?
navin_hariharan • 3mo ago
A sample dataset with the default params works!
navin_hariharan • 3mo ago
It takes 2 hours! The Civitai LoRA trainer is faster!
Untrack4d (OP) • 3mo ago
I was using ai-toolkit. What hardware are you using?
Untrack4d (OP) • 3mo ago
Does it work for schnell? Is it faster than ai-toolkit?
navin_hariharan • 3mo ago
You can deploy this to get started!
navin_hariharan • 3mo ago
Yes! Yes! The LoRA file is small too, without loss of quality! navinhariharan/flux-lora:latest
Untrack4d (OP) • 3mo ago
With ai-toolkit I am getting about 30-40 min per 1000 steps.
navin_hariharan • 3mo ago
I do 2000 steps!
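Putting the two numbers side by side: at 30-40 min per 1000 steps, a 2000-step run would take roughly 60-80 min on Untrack4d's setup, versus the ~2 hours navin reports (the hardware differs, so this is only a rough comparison). A small helper, assuming run time scales linearly with step count, is handy for sizing an endpoint's execution timeout; the 1.5× safety margin is an arbitrary assumption:

```python
def estimate_minutes(steps: int, ref_steps: int, ref_minutes: float) -> float:
    """Linearly scale a measured run time to a different step count."""
    return ref_minutes * steps / ref_steps


def timeout_seconds(steps: int, ref_steps: int, ref_minutes: float, margin: float = 1.5) -> int:
    """Suggested execution timeout in seconds, padded by a safety margin (assumed 1.5x)."""
    return int(estimate_minutes(steps, ref_steps, ref_minutes) * 60 * margin)


# Untrack4d's measurement: 30-40 min per 1000 steps.
low = estimate_minutes(2000, 1000, 30)   # 60.0
high = estimate_minutes(2000, 1000, 40)  # 80.0
print(f"2000 steps: ~{low:.0f}-{high:.0f} min")
print("timeout for the slow case:", timeout_seconds(2000, 1000, 40), "s")  # 7200 s
```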
Untrack4d (OP) • 3mo ago
Ok, that makes sense. Are you doing some kind of image selection/preprocessing?
navin_hariharan • 3mo ago
Yep! The captions!
Untrack4d (OP) • 3mo ago
I am using Florence-2 for that. You aren't excluding low-quality ones, resizing, etc.?
navin_hariharan • 3mo ago
The images, you mean? I mix a bit of everything!
Untrack4d (OP) • 3mo ago
I have noticed that low-quality ones can completely mess up your output. What have you put in this image, navinhariharan/flux-lora:latest? I want to customize it; can you share the source?
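Florence-2 covers the captions, but dropping low-quality images is a separate selection step. One crude, dependency-free proxy for quality is resolution: a sketch that keeps only images whose shorter side meets a minimum, assuming a 512 px cutoff (an arbitrary choice). It works on `(name, width, height)` tuples; in practice the sizes could come from Pillow's `Image.open(path).size`:

```python
from typing import Iterable, List, Tuple

ImageInfo = Tuple[str, int, int]  # (file name, width, height)


def filter_by_resolution(images: Iterable[ImageInfo], min_side: int = 512) -> Tuple[List[str], List[str]]:
    """Split images into (kept, rejected) by the length of their shorter side."""
    kept: List[str] = []
    rejected: List[str] = []
    for name, width, height in images:
        (kept if min(width, height) >= min_side else rejected).append(name)
    return kept, rejected


dataset = [("a.jpg", 1024, 1536), ("b.jpg", 400, 600), ("c.png", 512, 512)]
kept, rejected = filter_by_resolution(dataset)
print(kept, rejected)  # ['a.jpg', 'c.png'] ['b.jpg']
```

Resolution says nothing about blur or compression artifacts, so this only catches the most obvious rejects; a manual pass is still worthwhile for small datasets.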
navin_hariharan • 3mo ago
black-forest-labs/FLUX.1-schnell
black-forest-labs/FLUX.1-dev
These are auto-downloaded by ai-toolkit! Instead of exporting HF_TOKEN as an env variable, I downloaded them and baked them into the Docker image. They live here: /huggingface/
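Baking the weights into the image at build time can be done with `huggingface_hub.snapshot_download`. A sketch, assuming a `/huggingface` target directory (mirroring the path navin mentions) and an `HF_TOKEN` available only during the build, since FLUX.1-dev is a gated repo; `plan_downloads` and `download_all` are illustrative helpers, not code from navin's image:

```python
import os
from typing import List, Tuple

FLUX_REPOS = ["black-forest-labs/FLUX.1-schnell", "black-forest-labs/FLUX.1-dev"]


def plan_downloads(target_root: str = "/huggingface") -> List[Tuple[str, str]]:
    """Pair each repo id with a local directory under target_root (assumed layout)."""
    return [(repo, os.path.join(target_root, repo.split("/")[-1])) for repo in FLUX_REPOS]


def download_all(target_root: str = "/huggingface") -> None:
    """Run once at image build time, e.g. from a RUN step in the Dockerfile."""
    from huggingface_hub import snapshot_download  # lazy import: build-time dependency

    token = os.environ.get("HF_TOKEN")  # FLUX.1-dev is gated: accept its license first
    for repo_id, local_dir in plan_downloads(target_root):
        snapshot_download(repo_id=repo_id, local_dir=local_dir, token=token)
```

The training config would then need to reference these local directories instead of the repo ids; whether ai-toolkit's model loading accepts a local path there is an assumption worth verifying before relying on it.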
Untrack4d (OP) • 3mo ago
I want to store those models on a network volume, so they can be shared between serverless instances.
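On serverless, an attached network volume is mounted at `/runpod-volume` (pods mount it at `/workspace` instead). A sketch that points the Hugging Face cache at the volume when it is mounted and falls back to a local directory otherwise. It has to run before `huggingface_hub`/`diffusers` are imported, since they read `HF_HOME` at import time; the fallback path and the `huggingface-cache` subdirectory are assumptions:

```python
import os
from pathlib import Path


def configure_hf_cache(volume_root: str = "/runpod-volume", fallback: str = "/tmp/hf-cache") -> Path:
    """Point HF_HOME at the network volume if it is mounted, else at a local fallback dir."""
    root = Path(volume_root)
    base = root / "huggingface-cache" if root.is_dir() else Path(fallback)
    base.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(base)  # the hub cache then defaults to $HF_HOME/hub
    return base
```

Calling `configure_hf_cache()` as the first statement of the handler module means the first worker downloads the models onto the volume and later workers reuse them.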
navin_hariharan • 3mo ago
That's the best!
Untrack4d (OP) • 3mo ago
The thing is, I didn't understand how to choose where it's stored. Another thing:
def train_lora(job):
    if 's3Config' in job:
        s3_config = job["s3Config"]
    job_input = job["input"]
    job_input = download(job_input)
    if edityaml(job_input) == True:
        if job_input['gender'].lower() in ['woman','female','girl']:
            job = get_job('config/woman.yaml', None)
        elif job_input['gender'].lower() in ['man','male','boy']:
            job = get_job('config/man.yaml', None)
        job.run()
How are you able to run the job? Where does the get_job function come from?
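For reference, `get_job` is not a RunPod function; it appears to come from ai-toolkit itself, whose `run.py` imports it from `toolkit.job` and calls `get_job(config_path, name)` followed by `job.run()`. Note also that the snippet above rebinds `job` (the RunPod request dict) to the training job, so a gender outside both lists would reach `job.run()` on a plain dict and fail. A sketch that keeps the two apart, assuming the worker runs from the ai-toolkit repo root so `toolkit.job` is importable (imported lazily here); the config paths are the ones from the snippet:

```python
from typing import Optional

# Gender keywords mapped to the YAML configs used in the thread's handler.
CONFIG_BY_GENDER = {
    **dict.fromkeys(["woman", "female", "girl"], "config/woman.yaml"),
    **dict.fromkeys(["man", "male", "boy"], "config/man.yaml"),
}


def select_config(gender: str) -> str:
    """Return the YAML path for a gender keyword, or raise on anything unexpected."""
    try:
        return CONFIG_BY_GENDER[gender.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported gender value: {gender!r}") from None


def run_training(job_input: dict, name: Optional[str] = None) -> None:
    """Build and run the ai-toolkit job without shadowing the RunPod `job` dict."""
    from toolkit.job import get_job  # from ai-toolkit; requires the repo on sys.path

    training_job = get_job(select_config(job_input["gender"]), name)
    training_job.run()
```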
navin_hariharan • 3mo ago
The handler, bro!
Untrack4d (OP) • 3mo ago
Yes, but then you call job.run()
navin_hariharan • 3mo ago
runpod.serverless.start({"handler": train_lora})
This will call the function train_lora with the input JSON! That is...
job = {
  "input": {
    "lora_file_name": "laksheya-geraldine_viswanathan-FLUX",
    "trigger_word": "geraldine viswanathan",
    "gender": "woman",
    "data_url": "dataset_zip url"
  },
  "s3Config": {
    "accessId": "accessId",
    "accessSecret": "accessSecret",
    "bucketName": "flux-lora",
    "endpointUrl": "https://minio-api.cloud.com"
  }
}
@Untrack4d
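`runpod.serverless.start` registers the handler, and each queued request reaches it as a dict carrying the job `id` and the request's `input` (navin's handler also reads an `s3Config` sent alongside `input` from the same dict). Because the handler is a plain function, it can be exercised locally with the same JSON before deploying. A sketch with the training step stubbed out; `fake_train` is hypothetical, and the real `start` call is left as a comment so the example runs without the SDK:

```python
from typing import Any, Dict

REQUIRED_KEYS = ("lora_file_name", "trigger_word", "gender", "data_url")


def fake_train(job_input: Dict[str, Any]) -> str:
    """Hypothetical stand-in for the real dataset download + ai-toolkit training step."""
    return f"output/{job_input['lora_file_name']}.safetensors"


def train_lora(job: Dict[str, Any]) -> Dict[str, Any]:
    """Handler: the request body sits under job["input"]."""
    job_input = job["input"]
    missing = [key for key in REQUIRED_KEYS if key not in job_input]
    if missing:
        # Report bad input as a result instead of crashing the worker.
        return {"error": f"missing input keys: {missing}"}
    s3_config = job.get("s3Config")  # optional; present when the request includes it
    return {"lora_path": fake_train(job_input), "has_s3_config": s3_config is not None}


# On the worker: import runpod; runpod.serverless.start({"handler": train_lora})

# Local dry run with the thread's payload shape (placeholder values).
local_job = {
    "id": "local-test",
    "input": {
        "lora_file_name": "laksheya-geraldine_viswanathan-FLUX",
        "trigger_word": "geraldine viswanathan",
        "gender": "woman",
        "data_url": "dataset_zip url",
    },
    "s3Config": {"bucketName": "flux-lora"},
}
print(train_lora(local_job))
```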
Untrack4d (OP) • 3mo ago
And where is that function? The train_lora?
navin_hariharan • 3mo ago
@Untrack4d Line 31
Untrack4d (OP) • 3mo ago
Sorry man, it was a pretty stupid question. That's what I get for trying to do n things at a time, ahaha.
navin_hariharan • 3mo ago
No issues, man! We are all learning 😄
Untrack4d (OP) • 3mo ago
Have you managed to successfully use network volumes in serverless?
navin_hariharan • 3mo ago
I've never tried them! It shouldn't be difficult, though.
Sandeep • 3mo ago
Is this due to the container size? And may I know the inference time it takes to generate an image on an A100 or any other GPU? For me it's taking 15 seconds. @navin_hariharan
navin_hariharan • 3mo ago
@Sandeep What is your input? Please remove any credentials before sending it. Looks like an error while downloading the dataset.
Sandeep • 3mo ago
I am using Flux and SDXL models in this deployment. Whenever a user sends a Flux LoRA request, I generate with that Flux LoRA; the same applies to SDXL. The input is the LoRA blob URL and the model type. What should the container size be?
navin_hariharan • 3mo ago
That's all fine! How are you sending in the training dataset? @Sandeep
Sandeep • 3mo ago
This system doesn't need datasets; it just uses the models from Hugging Face. It loads the models from Hugging Face, downloads the LoRA, and uses that LoRA for inference.
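For a setup like Sandeep's (base models from Hugging Face, a LoRA fetched per request), caching downloaded LoRAs by URL avoids re-fetching the same file on every request a warm worker handles. A standard-library sketch; the cache directory, the `.safetensors` extension, and the SHA-256-of-URL file naming are assumptions, and loading the file into the Flux/SDXL pipeline is left out:

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_lora(url: str, cache_dir: str = "/tmp/lora-cache") -> Path:
    """Download a LoRA file once per URL and return its local path."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Name the file after a hash of the URL so different blobs never collide.
    target = cache / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".safetensors")
    if target.exists():
        return target  # cache hit: a warm worker reuses the earlier download
    tmp = target.with_suffix(".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(target)  # rename only after a complete download
    return target
```

Keying on the URL assumes a given blob URL always points at the same file; if LoRAs can be overwritten in place, a version or content hash in the request would be a safer key.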
navin_hariharan • 3mo ago
Could you please send the worker files so that I can take a look? And also do not forget to remove sensitive info before sending!
Sandeep • 3mo ago
Getting this error when I am using runpod-volume:
Sandeep • 3mo ago
# Use a more specific base image for efficiency
FROM runpod/base:0.6.2-cuda12.2.0

# Set environment variables
ENV HF_HUB_ENABLE_HF_TRANSFER=0 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    HF_HOME=/runpod-volume/huggingface-cache \
    HUGGINGFACE_HUB_CACHE=/runpod-volume/huggingface-cache/hub \
    WORKSPACE=/runpod-volume

RUN ls -a /

# Create necessary directories
RUN mkdir -p ${WORKSPACE}/app ${HF_HOME}

# Copy requirements first to leverage Docker cache for dependencies
COPY requirements.txt ${WORKSPACE}/

# Install dependencies in a single RUN statement to reduce layers
RUN python3.11 -m pip install --no-cache-dir --upgrade pip && \
    python3.11 -m pip install --no-cache-dir -r ${WORKSPACE}/requirements.txt && \
    rm ${WORKSPACE}/requirements.txt

# Copy source code to /runpod-volume/app
COPY test_input.json ${WORKSPACE}/app/
COPY src ${WORKSPACE}/app/src

# Set the working directory
WORKDIR ${WORKSPACE}/app/src

# Use the built-in handler script from the source
CMD ["python3.11", "-u", "runpod_handler.py"]
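One plausible cause of that error (the screenshot isn't available, so this is an assumption): the Dockerfile copies the app into `/runpod-volume` at build time, but on serverless the network volume is mounted at that same path at run time, which typically hides whatever the image put there, including `runpod_handler.py`. Keeping code in the image (for example under `/app`) and only caches on the volume avoids the clash. A tiny build-time lint that flags image paths a volume mount would hide; the mount point and example paths are assumptions:

```python
from pathlib import PurePosixPath
from typing import Iterable, List

VOLUME_MOUNT = "/runpod-volume"  # where serverless mounts an attached network volume


def shadowed_by_volume(image_paths: Iterable[str], mount: str = VOLUME_MOUNT) -> List[str]:
    """Return the image paths that a volume mounted at `mount` would hide at run time."""
    root = PurePosixPath(mount)
    # Component-wise check: "/runpod-volumes/x" is not under "/runpod-volume".
    return [p for p in image_paths if PurePosixPath(p).is_relative_to(root)]


# Build-time destinations from the Dockerfile above (with WORKSPACE expanded), plus an /app path.
build_time_paths = [
    "/runpod-volume/app/test_input.json",
    "/runpod-volume/app/src",
    "/app/src",
]
print(shadowed_by_volume(build_time_paths))
```

`PurePosixPath.is_relative_to` needs Python 3.9 or newer.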
Zuck • 3w ago
@Sandeep @navin_hariharan Did you guys ever get this working? I'm trying to do the same thing with ai-toolkit and the Flux dev model. Any code you can share? There are some things in your Docker image, @navin_hariharan, that I'd love to be able to edit. Thank you!! 😭😭
navin_hariharan • 3w ago
@Zuck I have lost the Dockerfile for https://hub.docker.com/r/navinhariharan/flux-lora/tags
Zuck • 3w ago
That's okay! I should be able to reverse-engineer it 🙂 Thank you so much!!
navin_hariharan • 3w ago
Please send it here if you manage to do it!
Zuck • 3w ago
Deal, sounds good!
navin_hariharan • 3w ago
@Zuck Are you free now?
navin_hariharan • 3w ago
Give this a test! Hopefully it works!
Zuck • 3w ago
@navin_hariharan Amazing, okay, thanks!! I uploaded the contents of the Docker image to a private GitHub repo. Do you want me to share it with you privately?
navin_hariharan • 3w ago
Here it is, everything working! 🙂
navin_hariharan • 3w ago
You can make it public! No issues! Many people may benefit from it!
Removed unnecessary code!
- It's just the models that the FROM is pulling!
- AI toolkit will now be downloaded in this Dockerfile!
TO-DO: Support the schnell config
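For the schnell TO-DO: ai-toolkit's example schnell training config differs from the dev one mainly in the model path, a required training adapter, and the sampling settings. A sketch that patches an already-loaded dev config dict accordingly; the key names (`config.process[0].model.name_or_path`, `assistant_lora_path`, `sample.guidance_scale`, `sample.sample_steps`) follow ai-toolkit's example YAML as best understood here and should be checked against the current repo before relying on this:

```python
import copy
from typing import Any, Dict


def dev_to_schnell(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a FLUX.1-dev ai-toolkit config adjusted for FLUX.1-schnell."""
    cfg = copy.deepcopy(config)  # leave the caller's dev config untouched
    process = cfg["config"]["process"][0]
    model = process.setdefault("model", {})
    model["name_or_path"] = "black-forest-labs/FLUX.1-schnell"
    # Schnell is step-distilled; ai-toolkit's schnell example trains with this adapter.
    model["assistant_lora_path"] = "ostris/FLUX.1-schnell-training-adapter"
    sample = process.setdefault("sample", {})
    sample["guidance_scale"] = 1  # schnell does not use guidance
    sample["sample_steps"] = 4    # schnell samples in 1-4 steps
    return cfg
```

With PyYAML, `yaml.safe_load` and `yaml.safe_dump` can round-trip the YAML file around this function, so a `config/woman_schnell.yaml` (hypothetical name) could be generated from the existing dev configs.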
Zuck • 2w ago
GitHub - newideas99/flux-training-docker