RunPod•2mo ago
Untrack4d

Training Flux Schnell on serverless

Hi there, I am using your pods to run ostris/ai-toolkit to train Flux on custom images. Now I want to use your serverless endpoint capabilities. Can you help me out? Do you have some kind of template or guide on how to do it?
56 Replies
navin_hariharan
navin_hariharan•2mo ago
@Untrack4d Hi! I have the dev serverless working already! I'll update schnell soon
Untrack4d
Untrack4d•2mo ago
Do you have some demo or can I test it out?
navin_hariharan
navin_hariharan•2mo ago
Give me 30min
Untrack4d
Untrack4d•2mo ago
Ok man, thx! What are you using to train it?
navin_hariharan
navin_hariharan•2mo ago
```json
{
  "input": {
    "lora_file_name": "laksheya-geraldine_viswanathan-FLUX",
    "trigger_word": "geraldine viswanathan",
    "gender": "woman",
    "data_url": "dataset_zip url"
  },
  "s3Config": {
    "accessId": "accessId",
    "accessSecret": "accessSecret",
    "bucketName": "flux-lora",
    "endpointUrl": "https://minio-api.cloud.com"
  }
}
```
@Untrack4d
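For reference, a request body like the one above can be assembled in plain Python and posted to a serverless endpoint's /run URL. This is only a sketch: the endpoint ID and API key are placeholders, not values from this thread.

```python
import json

# Hypothetical endpoint ID -- replace with your own RunPod endpoint.
ENDPOINT_ID = "your-endpoint-id"
RUN_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run"

payload = {
    "input": {
        "lora_file_name": "laksheya-geraldine_viswanathan-FLUX",
        "trigger_word": "geraldine viswanathan",
        "gender": "woman",
        "data_url": "dataset_zip url",
    },
    "s3Config": {
        "accessId": "accessId",
        "accessSecret": "accessSecret",
        "bucketName": "flux-lora",
        "endpointUrl": "https://minio-api.cloud.com",
    },
}

# Serialize and send with any HTTP client, e.g.:
#   requests.post(RUN_URL, json=payload,
#                 headers={"Authorization": "Bearer <YOUR_API_KEY>"})
body = json.dumps(payload)
```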
Untrack4d
Untrack4d•2mo ago
Thanks for sharing, I will check it out. What does this image contain? FROM navinhariharan/flux-lora:latest How are you handling the long-running training process?
navin_hariharan
navin_hariharan•2mo ago
Disable this for long-running processes
(image attached)
navin_hariharan
navin_hariharan•2mo ago
FROM navinhariharan/flux-lora:latest This image contains the Flux models, dev and schnell
Untrack4d
Untrack4d•2mo ago
Thank you for the help 🫡
navin_hariharan
navin_hariharan•2mo ago
Anytime 🙂 So the lora is trained and sent to your s3 bucket!
Untrack4d
Untrack4d•2mo ago
I will be hosting it in a server of mine to reduce costs
navin_hariharan
navin_hariharan•2mo ago
I use minio!
Untrack4d
Untrack4d•2mo ago
Never heard of it
navin_hariharan
navin_hariharan•2mo ago
Open-source S3-compatible storage
navin_hariharan
navin_hariharan•2mo ago
MinIO
MinIO | S3 Compatible Storage for AI
MinIO's High Performance Object Storage is Open Source, Amazon S3 compatible, Kubernetes Native and is designed for cloud native workloads like AI.
Untrack4d
Untrack4d•2mo ago
I will take a look
navin_hariharan
navin_hariharan•2mo ago
Sure! If you have issues let me know! I'll be happy to help!
Untrack4d
Untrack4d•2mo ago
Do you have any tips to get better results? Or to make it train faster?
navin_hariharan
navin_hariharan•2mo ago
A sample dataset with the default params works!
navin_hariharan
navin_hariharan•2mo ago
It takes 2 hours! The Civitai LoRA trainer is faster!
Untrack4d
Untrack4d•2mo ago
I was using ai-toolkit. What hardware are you using?
navin_hariharan
navin_hariharan•2mo ago
(image attached)
Untrack4d
Untrack4d•2mo ago
Does it work for schnell? Is it faster than ai-toolkit?
navin_hariharan
navin_hariharan•2mo ago
You can deploy this to get started!
(image attached)
navin_hariharan
navin_hariharan•2mo ago
Yes! Yes! The LoRA size is small too, without loss of quality! navinhariharan/flux-lora:latest
Untrack4d
Untrack4d•2mo ago
With ai-toolkit i am getting about 30-40 min for 1000 steps
navin_hariharan
navin_hariharan•2mo ago
I do 2000 steps!
Untrack4d
Untrack4d•2mo ago
Ok, that makes sense. Are you doing some kind of image selection/preprocessing?
navin_hariharan
navin_hariharan•2mo ago
Yep! The captions!
Untrack4d
Untrack4d•2mo ago
I am using Florence-2 for that. You aren't excluding low-quality ones, resizing, etc.?
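A minimal preprocessing pass along these lines could drop obviously bad files before captioning. This is only a sketch with a hypothetical size threshold; real filtering would also check resolution, blur, faces, and so on.

```python
import os

# Hypothetical threshold: tiny files are often thumbnails or corrupt downloads.
MIN_BYTES = 50 * 1024
ALLOWED_EXT = {".jpg", ".jpeg", ".png", ".webp"}

def filter_dataset(folder):
    """Return image paths that pass basic sanity checks (extension + size)."""
    keep = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        if ext in ALLOWED_EXT and os.path.getsize(path) >= MIN_BYTES:
            keep.append(path)
    return keep
```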
navin_hariharan
navin_hariharan•2mo ago
The images you mean? I mix a bit of everything!
Untrack4d
Untrack4d•2mo ago
I have noticed that low-quality ones can completely mess up your output. What have you put in this image, navinhariharan/flux-lora:latest? I want to customize it, can you share the source?
navin_hariharan
navin_hariharan•2mo ago
black-forest-labs/FLUX.1-schnell and black-forest-labs/FLUX.1-dev are auto-downloaded by ai-toolkit! Instead of exporting an HF_TOKEN env var, I downloaded them and baked them into the docker image. They live in /huggingface/
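One way to bake the weights into an image at build time, sketched as a Dockerfile fragment. This assumes `huggingface_hub` is installed in the image and the token is passed as a build arg rather than stored in a layer; the local_dir paths are hypothetical.

```dockerfile
# Hypothetical build-time download; pass HF_TOKEN as a build arg, don't bake it in.
ARG HF_TOKEN
RUN python3 -c "from huggingface_hub import snapshot_download; \
    snapshot_download('black-forest-labs/FLUX.1-dev', token='${HF_TOKEN}', local_dir='/huggingface/FLUX.1-dev'); \
    snapshot_download('black-forest-labs/FLUX.1-schnell', token='${HF_TOKEN}', local_dir='/huggingface/FLUX.1-schnell')"
```

Note that build args still leave traces in image history, so a network volume or runtime download may be preferable when the image will be shared publicly.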
Untrack4d
Untrack4d•2mo ago
I want to store those models in a network volume, so they can be shared between serverless instances
navin_hariharan
navin_hariharan•2mo ago
That's the best!
Untrack4d
Untrack4d•2mo ago
The thing is, I didn't understand how to choose where it's stored. Another thing:

```python
def train_lora(job):
    if 's3Config' in job:
        s3_config = job["s3Config"]
    job_input = job["input"]
    job_input = download(job_input)
    if edityaml(job_input) == True:
        if job_input['gender'].lower() in ['woman','female','girl']:
            job = get_job('config/woman.yaml', None)
        elif job_input['gender'].lower() in ['man','male','boy']:
            job = get_job('config/man.yaml', None)
        job.run()
```

How are you able to run the job? Where does the get_job function come from?
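As a side note, the gender branching in that handler can be flattened into a lookup table. A sketch only, reusing the same hypothetical config paths from the snippet:

```python
# Map gender keywords to the (hypothetical) ai-toolkit config files used above.
GENDER_CONFIGS = {
    "woman": "config/woman.yaml", "female": "config/woman.yaml", "girl": "config/woman.yaml",
    "man": "config/man.yaml", "male": "config/man.yaml", "boy": "config/man.yaml",
}

def config_for_gender(gender):
    """Return the training config path for a gender keyword, or None if unknown."""
    return GENDER_CONFIGS.get(gender.lower())
```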
navin_hariharan
navin_hariharan•2mo ago
The handler bro!
Untrack4d
Untrack4d•2mo ago
Yes but then you call job.run
navin_hariharan
navin_hariharan•2mo ago
runpod.serverless.start({"handler": train_lora}) will call the train_lora function with the input JSON, that is:

```python
job = {
    "input": {
        "lora_file_name": "laksheya-geraldine_viswanathan-FLUX",
        "trigger_word": "geraldine viswanathan",
        "gender": "woman",
        "data_url": "dataset_zip url"
    },
    "s3Config": {
        "accessId": "accessId",
        "accessSecret": "accessSecret",
        "bucketName": "flux-lora",
        "endpointUrl": "https://minio-api.cloud.com"
    }
}
```
@Untrack4d
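To sanity-check a handler like this locally before deploying, you can call it directly with a job dict. The handler below is a stub standing in for the real training code, so no runpod import or GPU is needed:

```python
def train_lora(job):
    # Stub of the real handler: just validates the shape of the job dict.
    job_input = job["input"]
    assert "data_url" in job_input and "trigger_word" in job_input
    return {"status": "ok", "lora": job_input["lora_file_name"]}

# In production this would be registered with:
#   runpod.serverless.start({"handler": train_lora})
# Locally, just invoke it directly with a sample job:
result = train_lora({
    "input": {
        "lora_file_name": "laksheya-geraldine_viswanathan-FLUX",
        "trigger_word": "geraldine viswanathan",
        "gender": "woman",
        "data_url": "dataset_zip url",
    },
})
```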
Untrack4d
Untrack4d•2mo ago
And where is that function, train_lora?
navin_hariharan
navin_hariharan•2mo ago
@Untrack4d Line 31
(image attached)
Untrack4d
Untrack4d•2mo ago
Sorry man, it was a pretty stupid question. That's what I get for trying to do n things at a time, ahaha
navin_hariharan
navin_hariharan•2mo ago
No issues man! We are all learning 😄
Untrack4d
Untrack4d•2mo ago
Have you managed to successfully use network volumes in serverless?
navin_hariharan
navin_hariharan•2mo ago
I've never tried them! It shouldn't be difficult though
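On serverless, an attached network volume shows up at /runpod-volume. A common pattern, sketched here with a hypothetical local fallback so it also runs outside RunPod, is to pick cache paths based on whether the volume is mounted:

```python
import os

def cache_root(volume="/runpod-volume", fallback="/tmp/cache"):
    """Use the network volume when mounted, otherwise a local fallback dir."""
    root = volume if os.path.isdir(volume) else fallback
    os.makedirs(root, exist_ok=True)
    return root

# Point Hugging Face's cache at the shared location so all workers
# attached to the same volume reuse the downloaded weights.
os.environ["HF_HOME"] = os.path.join(cache_root(), "huggingface-cache")
```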
Sandeep
Sandeep•2mo ago
Is this due to the container size? And may I know the inference time it takes to generate an image on an A100 or other GPUs? For me it's taking 15 seconds. @navin_hariharan
navin_hariharan
navin_hariharan•2mo ago
@Sandeep What is your input? Please remove any credentials you have and send it. It looks like an error while downloading the dataset.
Sandeep
Sandeep•2mo ago
I am using Flux and SDXL models in this deployment. Whenever a user sends a Flux LoRA request, I generate with the Flux LoRA; the same applies to SDXL. The input is a LoRA blob URL and a model type. What should the container size be?
navin_hariharan
navin_hariharan•2mo ago
That's all fine! How are you sending in the training dataset? @Sandeep
Sandeep
Sandeep•2mo ago
This system doesn't need datasets; it just uses the models from Hugging Face. It imports the models from Hugging Face, downloads the LoRA, and uses that LoRA for inference.
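The flow described here, picking a base pipeline by model type and attaching the requested LoRA, can be sketched as a dispatch table. The loader functions below are stubs standing in for real diffusers calls, and the input keys are assumptions about the worker's schema:

```python
def load_flux(lora_url):
    # Stub: the real worker would load FLUX from the HF cache
    # and apply the LoRA downloaded from lora_url.
    return {"base": "flux", "lora": lora_url}

def load_sdxl(lora_url):
    # Stub: same idea for SDXL.
    return {"base": "sdxl", "lora": lora_url}

LOADERS = {"flux": load_flux, "sdxl": load_sdxl}

def handle(job_input):
    """Dispatch on modeltype; unknown types return an error dict."""
    loader = LOADERS.get(job_input["modeltype"].lower())
    if loader is None:
        return {"error": f"unknown modeltype {job_input['modeltype']}"}
    return loader(job_input["lora_url"])
```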
navin_hariharan
navin_hariharan•2mo ago
Could you please send the worker files so that I can take a look? And also do not forget to remove sensitive info before sending!
Sandeep
Sandeep•2mo ago
I'm getting this error when I am using runpod-volume
(image attached)
Sandeep
Sandeep•2mo ago
```dockerfile
# Use a more specific base image for efficiency
FROM runpod/base:0.6.2-cuda12.2.0

# Set environment variables
ENV HF_HUB_ENABLE_HF_TRANSFER=0 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    HF_HOME=/runpod-volume/huggingface-cache \
    HUGGINGFACE_HUB_CACHE=/runpod-volume/huggingface-cache/hub \
    WORKSPACE=/runpod-volume

RUN ls -a /

# Create necessary directories
RUN mkdir -p ${WORKSPACE}/app ${HF_HOME}

# Copy requirements first to leverage Docker cache for dependencies
COPY requirements.txt ${WORKSPACE}/

# Install dependencies in a single RUN statement to reduce layers
RUN python3.11 -m pip install --no-cache-dir --upgrade pip && \
    python3.11 -m pip install --no-cache-dir -r ${WORKSPACE}/requirements.txt && \
    rm ${WORKSPACE}/requirements.txt

# Copy source code to /runpod-volume/app
COPY test_input.json ${WORKSPACE}/app/
COPY src ${WORKSPACE}/app/src

# Set the working directory
WORKDIR ${WORKSPACE}/app/src

# Use the built-in handler script from the source
CMD ["python3.11", "-u", "runpod_handler.py"]
```
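One likely cause of the error: the network volume is mounted over /runpod-volume when the worker starts, hiding anything COPYed there at build time. A possible fix, sketched below with an assumed /app layout, is to keep app code inside the image and reserve the volume for runtime caches only:

```dockerfile
FROM runpod/base:0.6.2-cuda12.2.0

# Caches can live on the runtime-mounted network volume...
ENV HF_HOME=/runpod-volume/huggingface-cache \
    HUGGINGFACE_HUB_CACHE=/runpod-volume/huggingface-cache/hub

# ...but app code must live in the image itself, since /runpod-volume
# is overlaid by the network volume when the worker starts.
COPY requirements.txt /app/
RUN python3.11 -m pip install --no-cache-dir -r /app/requirements.txt
COPY test_input.json /app/
COPY src /app/src

WORKDIR /app/src
CMD ["python3.11", "-u", "runpod_handler.py"]
```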