Author | Commit | Message | Date
Billy | b9972be7f1 | Merge branch 'model-classification-api' into stripped-models | 2025-03-18 14:57:23 +11:00
Billy | e61c5a3f26 | Merge | 2025-03-18 14:55:11 +11:00
Ryan Dick | 9a389e6b93 | Add a LLaVA OneVision starter model. | 2025-03-18 11:53:06 +11:00
Ryan Dick | 2ef1ecf381 | Fix copy-paste errors. | 2025-03-18 11:53:06 +11:00
Ryan Dick | e9714fe476 | Add LLaVA Onevision model loading and inference support. | 2025-03-18 11:53:06 +11:00
Ryan Dick | 3f29293e39 | Add LlavaOnevision model type and probing logic. | 2025-03-18 11:53:06 +11:00
Billy | 3469fc9843 | Ruff | 2025-03-18 09:22:16 +11:00
Billy | 7cdd4187a9 | Update classify script | 2025-03-18 09:21:38 +11:00
Billy | 24218b34bf | Make ruff happy | 2025-03-17 12:04:26 +11:00
Billy | d970c6d6d5 | Use override fixture | 2025-03-17 11:58:13 +11:00
Billy | 8bcd9fe4b7 | Extend ModelOnDisk | 2025-03-17 09:18:51 +11:00
Billy | 4377158503 | Variant | 2025-03-13 13:32:57 +11:00
Billy | d8b9a8d0dd | Merge branch 'main' into model-classification-api | 2025-03-13 13:03:51 +11:00
Billy | 39a4608d15 | Fix annotations compatibility 3.11 | 2025-03-13 13:01:19 +11:00
Billy | b86ac5e049 | Explicit union | 2025-03-13 10:28:07 +11:00
Billy | 665236bb79 | Type hints | 2025-03-13 09:21:58 +11:00
Billy | f45400a275 | Remove hash algo | 2025-03-12 18:39:29 +11:00
psychedelicious | e35537e60a | fix(mm): move flux_redux starter model to the flux bundle, make siglip a dependency of it | 2025-03-11 11:17:19 +11:00
Billy | d86b392bfd | Remove redundant hash_algo field | 2025-03-11 09:16:59 +11:00
Billy | 3e9e45b177 | Update comments | 2025-03-11 09:04:19 +11:00
Billy | 907d960745 | PR suggestions | 2025-03-11 08:37:43 +11:00
Billy | bfdace6437 | New API for model classification | 2025-03-11 08:34:34 +11:00
psychedelicious | cf0cbaf0ae | chore: ruff (more) | 2025-03-06 10:57:54 +11:00
psychedelicious | ac6fc6eccb | chore: ruff | 2025-03-06 10:57:54 +11:00
Ryan Dick | 8e28888bc4 | Fix SigLipPipeline model size calculation. | 2025-03-06 10:31:17 +11:00
Ryan Dick | f1fde792ee | Get FLUX Redux working: model loading and inference. | 2025-03-06 10:31:17 +11:00
Ryan Dick | e82393f7ed | Add FLUX Redux to starter models list. | 2025-03-06 10:31:17 +11:00
Ryan Dick | d5211a8088 | Add FluxRedux model type and probing logic. | 2025-03-06 10:31:17 +11:00
Ryan Dick | 3b095b5945 | Add SigLIP starter model. | 2025-03-06 10:31:17 +11:00
Ryan Dick | 34959ef573 | Add SigLIP model type and probing. | 2025-03-06 10:31:17 +11:00
Billy | f2689598c0 | Formatting | 2025-03-06 09:11:00 +11:00
Ryan Dick | cc9d215a9b | Add endpoint for emptying the model cache. Also, adds a threading lock to the ModelCache to make it thread-safe. | 2025-01-30 09:18:28 -05:00
Ryan Dick | f7315f0432 | Make the default max RAM cache size more conservative. | 2025-01-30 08:46:59 -05:00
Ryan Dick | 229834a5e8 | Performance optimizations for LoRAs applied on top of GGML-quantized tensors. | 2025-01-28 14:51:35 +00:00
Ryan Dick | 5d472ac1b8 | Move quantized weight handling for patch layers up from ConcatenatedLoRALayer to CustomModuleMixin. | 2025-01-28 14:51:35 +00:00
Ryan Dick | 28514ba59a | Update ConcatenatedLoRALayer to work with all sub-layer types. | 2025-01-28 14:51:35 +00:00
Ryan Dick | 0db6639b4b | Add FLUX OneTrainer model probing. | 2025-01-28 14:51:35 +00:00
Ryan Dick | 0cf51cefe8 | Revise the logic for calculating the RAM model cache limit. | 2025-01-16 23:46:07 +00:00
Ryan Dick | da589b3f1f | Memory optimization to load state dicts one module at a time in CachedModelWithPartialLoad when we are not storing a CPU copy of the state dict (i.e. when keep_ram_copy_of_weights=False). | 2025-01-16 17:00:33 +00:00
Ryan Dick | 36a3869af0 | Add keep_ram_copy_of_weights config option. | 2025-01-16 15:35:25 +00:00
Ryan Dick | c76d08d1fd | Add keep_ram_copy option to CachedModelOnlyFullLoad. | 2025-01-16 15:08:23 +00:00
Ryan Dick | 04087c38ce | Add keep_ram_copy option to CachedModelWithPartialLoad. | 2025-01-16 14:51:44 +00:00
Ryan Dick | b2bb359d47 | Update the model loading logic for several of the large FLUX-related models to ensure that the model is initialized on the meta device prior to loading the state dict into it. This helps to keep peak memory down. | 2025-01-16 02:30:28 +00:00
Ryan Dick | d7ab464176 | Offload the current model when locking if it is already partially loaded and we have insufficient VRAM. | 2025-01-07 02:53:44 +00:00
Ryan Dick | 5b42b7bd45 | Add a utility to help with determining the working memory required for expensive operations. | 2025-01-07 01:20:15 +00:00
Ryan Dick | b343f81644 | Use torch.cuda.memory_allocated() rather than torch.cuda.memory_reserved() to be more conservative in setting dynamic VRAM cache limits. | 2025-01-07 01:20:15 +00:00
Ryan Dick | fc4a22fe78 | Allow expensive operations to request more working memory. | 2025-01-07 01:20:13 +00:00
Ryan Dick | a167632f09 | Calculate model cache size limits dynamically based on the available RAM / VRAM. | 2025-01-07 01:14:20 +00:00
Ryan Dick | 6a9de1fcf3 | Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated(). | 2025-01-07 00:31:53 +00:00
Ryan Dick | e5180c4e6b | Add get_effective_device(...) utility to aid in determining the effective device of models that are partially loaded. | 2025-01-07 00:31:00 +00:00