Hey there,
I'm running magic-llama-3.1 on a Threadripper 1920X and an RTX 3070 Ti with Ubuntu 24.04.1 LTS.
I have a few errors/questions:
- with "magic run serve --huggingface-repo-id modularai/llama-3.1"
i got this right at the Beginning "INFO: MainThread: root:
Estimated memory consumption:
Weights: 4693 MiB
KVCache allocation: 128 MiB
Total estimated: 4821 MiB used / unknown MiB free
Current batch size: 1
Current max sequence length: 512
Max recommended batch size for current sequence length: unknown
"
Why does the memory estimate report "unknown" for the free memory? Same for the recommended batch size...
- Why is only one of my NUMA nodes in use without the gpu flag?
- The gpu flag isn't working on driver 550.120.
Regarding the "unknown" memory measurement: in the 24.6 release, the MAX Driver API couldn't yet read memory statistics for a local CPU host. This has been added in the recent MAX nightlies, so if you switch to the nightly branch for our examples you should now get accurate memory measurements when running on CPU.
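If you're working from a local clone of our examples repository, switching branches is all it takes (paths assumed; adjust to wherever you cloned):

# From inside your clone of the examples repository:
git fetch origin
git checkout nightly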
While MAX will distribute work across the CPU cores within a NUMA node, a MAX graph at present will only run on an individual NUMA node. You can manually dispatch multiple compute graphs on multiple NUMA nodes using the CPU(id: [node]) constructor for a Driver API Device. Multi-device distribution of a single MAX graph is on our roadmap.
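If it helps, here's a minimal sketch of that pattern using the Python Driver API. The exact spelling of the id parameter and the devices argument may vary by release, so treat the names below as assumptions and check the API docs for your version:

# Illustrative sketch only: pin two copies of a graph to different NUMA
# nodes via Driver API devices, per the note above. Exact parameter
# spellings (positional id vs. id=) may differ between MAX releases.
from max.driver import CPU
from max.engine import InferenceSession

node0 = CPU(0)  # Driver API device bound to NUMA node 0
node1 = CPU(1)  # Driver API device bound to NUMA node 1

# One session per device, so each compiled graph executes on its own node.
# (devices= is assumed here; verify against the docs for your release.)
session0 = InferenceSession(devices=[node0])
session1 = InferenceSession(devices=[node1])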
Apologies for not properly documenting this (we are working to do so), but the minimum NVIDIA driver version MAX supports is 555. That version or newer is needed for some of the PTX features used in MAX. If you upgrade to that version, you should be able to access your GPU via MAX.
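To see what you're currently running and move to a newer series, something like the following should work on Ubuntu 24.04 (the PPA and exact package name are assumptions; any 555-or-newer driver should do):

nvidia-smi                            # shows the currently loaded driver version
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-driver-555    # assumes the 555 series is published in the PPA
sudo reboot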