tinygrad commit f7f67e0cc5 by chenyu: simple fix llama shard with quantize (#3882)
Copy the scale to all devices for now; naive sharding of the scale does not work because it would need an expand to actually save memory.
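A minimal sketch of the idea (not the actual llama.py change), assuming tinygrad's `Tensor.shard` API and hypothetical device names and shapes:

```python
# Split the quantized int8 weight across GPUs, but copy the per-channel scale
# to every device (axis=None), since a naively sharded scale would need an
# expand anyway and would not save memory.
from tinygrad import Tensor, dtypes

GPUS = tuple(f"HSA:{i}" for i in range(6))  # hypothetical 6-GPU device list

weight = Tensor.ones(4096, 4096, dtype=dtypes.int8)  # stand-in quantized weight
scale  = Tensor.ones(4096, dtype=dtypes.float16)     # stand-in per-channel scale

weight = weight.shard(GPUS, axis=0)    # shard the int8 weight across the GPUs
scale  = scale.shard(GPUS, axis=None)  # replicate the scale on all devices for now
```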

70B does not work due to HSA_STATUS_ERROR_OUT_OF_RESOURCES.

`python3 examples/llama.py --gen 2 --size 13B --shard 6 --prompt "Hello." --count 10 --temperature 0 --timing --quantize`

13B on 6 GPUs uses 47 GB unquantized vs. 34 GB quantized.
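As a rough back-of-the-envelope check (not from the commit itself): 13B parameters at fp16 are about 13e9 × 2 B ≈ 26 GB of weights, and int8 quantization roughly halves that to ≈ 13 GB, consistent with the ~13 GB drop from 47 GB to 34 GB.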
2024-03-22 18:15:37 -04:00