Siamak
Run LoRAX on RunPod (Serverless)
@ashleyk
ubuntu@150-136-88-165:~/sia/test_docker$ sudo docker run --gpus all -it --rm -p 8080:8080 --name generator runpod_test --help
2024-02-21T05:41:12.402252Z INFO lorax_launcher: Args { model_id: "TheBloke/Llama-2-13B-chat-AWQ", adapter_id: None, source: "hub", adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Awq), compile: false, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 4096, max_total_tokens: 5096, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 128, adapter_cycle_time_s: 2, hostname: "127.0.0.1", port: 8080, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-02-21T05:41:12.402366Z INFO download: lorax_launcher: Starting download process.
--- Starting Serverless Worker | Version 1.6.2 ---
INFO | Using test_input.json as job input.
DEBUG | Retrieved local job: {'input': {'prompt': 'Hi, How are you?'}, 'id': 'local_test'}
INFO | local_test | Started.
ERROR | local_test | Captured Handler Exception
ERROR | {
"error_type": "<class 'requests.exceptions.ConnectionError'>",
"error_message": "HTTPConnectionPool(host='127.0.0.1', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3bd135f040>: Failed to establish a new connection: [Errno 111] Connection refused'))",
I solved the connection issue by:
lorax-launcher --model-id $model --quantize awq --max-input-length=4096 --max-total-tokens=5096 --huggingface-hub-cache=/data --hostname=127.0.0.1 --port=8080
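Once the server started by that command is listening, the handler only needs to forward the job's prompt to it. A rough sketch, assuming LoRAX exposes the TGI-style POST /generate taking {"inputs": ..., "parameters": {...}} and returning "generated_text"; the handler name and the max_new_tokens default are assumptions, not something from this thread:

import requests
import runpod

LORAX_URL = "http://127.0.0.1:8080"

def handler(job):
    """Forward the RunPod job prompt to the local LoRAX server."""
    job_input = job["input"]
    payload = {
        "inputs": job_input["prompt"],
        "parameters": {
            "max_new_tokens": job_input.get("max_new_tokens", 256),
        },
    }
    # Endpoint and payload shape follow the TGI-style API that LoRAX uses;
    # adjust if your LoRAX version differs.
    resp = requests.post(f"{LORAX_URL}/generate", json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["generated_text"]

runpod.serverless.start({"handler": handler})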
But I am getting this error: