mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-01-08 22:48:25 -05:00
* load llama3-1B to WEBGPU device
* include compile script for loading llama3 to WEBGPU
* parametrize max_context in build_transformer fxn
* jit_model with two different args sets
* compile for webgpu, split weights
* load model weight parts in browser
* export all tensors from initialized transformer
* run transformer inference in browser
* enable tiktoken with llama bpe in browser
* count total tokens on client with tiktoken.js
* full client-side chat streaming, eliminate server
* revert change that enabled jitting with 2 argsets
* llama without Variable or cache_kv, for webgpu
* have client use mask tokens / whole context
* cleanup staged weights
* add tiktoken.js build script, README
* export CLANG for Q6_k to float32 decompression
* fix and test exported CLANG code for Q6_k to fp32
* revert changes to jit and export_model
* isolate clang export
* test Q6_K to float32 decompression in browser
* gguf_load now also returns t_infos and data_start
* prepare llama-1B Q6_K gguf chunks for browser
* cache and decompress quantized llama in browser
* enable separate deployment of large files
* fix kv cache and symbolic with llama wgpu
* eliminate browser lag during decompression
* hash metadata and weight chunks
* delete obsolete indexeddb cache to free disk
* add progress bar, track model download/decompress
* refactor progress callback
* skip buffer hash verification for speed
* Display progress for entire loading scope
* Report page load errors to user
* actually display errors
* skip prompt tokens already seen by model
* skip prefilling with last assistant message tokens
* on page load tell user if webgpu not enabled
* push deployed URL root to window.history
* make note of bug sources with TODO items
* isolate bug in CLANG with BEAM=2
* remove clang_bug.py from diff
* decompress q6k to f32 on webgpu instead of clang
* remove unused code
* inter-weight decomp with larger wgpu kernels
* parallelize decompression submissions
* refactor dequantize scheduling
* add progress bar back
* fix bug
* temp fix for loading GGUF Q6_K to fp16 not fp32
* fix rendering of exported CLANG
* remove weight casts, sketch js functions for clang
* get symbolic vars from jit_cache for model export
* include symbolic vars in exported CLANG
* render js for clang transformer
* toggle clang/webgpu deployment; refactor decomp
* compile and render clang Q6_K->fp16 and int8 quant
* fix rendered clang for abs(fp16), to work in wasm
* simplify clang js wrapping
* run compiled clang in worker
* prepare llama weights in workers, q6k to int8/fp16
* tinychat on clang in browser, f32/int8 weights
* move wasm inference to (now flexible) worker
* don't load redundant embeddings
* modest wasm perf gain with compile flags
* set default backend, enable backend choice/backup
* render symbolic vars in exported WEBGPU
* quantize webgpu llama to int8/f32
* improve UX arising from rendered WEBGPU
* clean up webgpu launch
* new weights split: smaller chunks, tinygrad quant.
* switch webgpu inference to int8 quant
* remove unneeded clang decompression
* eliminate unneeded kv cache transfer to wasm
* use 1 worker for simplified clang decompression
* display launch errors
* refactor: stream load weight chunks to WebGPU
* show loading chunk completion
* quantize embeddings to int8
* test float16 as input for quantization
* webgpu: use f16 source, int8 embed, eliminate q6k
* simplify split weights prep: all from state_dict
* revert change to nn.state.gguf_load
* remove unneeded decompression from webgpu client
* remove unneeded code
* decrease dl chunks from 47 to 16 MiB
* improve stability of webgpu loading on mobile
* autodetect mobile, improve load stability
* refactor: progress closure
* refactor: one unified progress bar
* remove unneeded code
* revert changes to tinygrad core library
* enforce ios18.3 nerfed max buf size
* BEAM=3 webgpu
* cache integrity, mobile save throttling
* improve mobile UX - no autozoom on prompt box
* clang: int8 from f16, remove q6k
* reduce concurrent dls on mobile to 2 for stability
* refactor: wasm backend with stream loading
* prevent race between wasm load and indexedb save
* split wasm kernels into separate modules
* js wrapper for multiple wasm module inference
* revert multi-module wasm to single module
* make mobile wasm load more stable/fast
* refactor: copy weights into wasm without crashes
* fix bug in download queue; increase mobile dls
* refactor exported clang wrapper, split weights
* remove unnecessary code
* greatly improve int8 quant quality with rounding
* eliminate mobile throttling
* increase webgpu context to 4096 tokens
* export webgpu js functions
* enable separate hosted weights for mobile/pc
* enable prompt-thread switching during generation
* stop generation when max_context is reached
* show progress bar for prefill
* tell user if webgpu fails, while wasm loads
* make loading messages more concise
* update font
* revert changes to tinychat python app launch
* cleanup quantization, add scale_dtype param
* cleanup kv cache code
* cleanup compile code
* link tok_embeddings with output in webgpu export
* refactor: export_model webgpu: symbolic vars
* refactor: export_model weight loading
* forgot to commit export_model.py
* change CLANG to CPU
* deal with pylint incorrectly failing tests
* simplify f-strings for older CI python version
* fix pre-python3.12 parser errors
* [Int32Array] not Int32Array
* cleanup webgpu compile after refactor export_model
* refactor WASM export into export_model
* merge WebGPU/WASM compile scripts
* simplify max_contexts for local deployment
* fix parser issues and whitespace
* deduplicate variable defs for non-wasm clang export
* cleanup code
* cleanup compile scripts
* simplify wasm inference wrapping
* simplify webgpu symbolic vars export
* refactor: unify export of symbolic variables
* simplify WASM export
* simplify clang/wasm export
* update README and build scripts
* separate files for browser/python apps
* restore original python tinychat app files
* browser and python tinychats share assets
* minor cleanup
* isolate app layer diff
* add .gitignore for generated files
* validate CPU/WEBGPU models in python
* prevent infinite generation if validation fails
* check if exported weight files are unique

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
322 lines
5.2 KiB
CSS
/* define colors */
:root {
  --primary-color: #fff;
  --secondary-color: #2a2a2a;
  --secondary-color-transparent: #ffffff66;
  --primary-bg-color: #1a1a1a;
  --foreground-color: #f0f0f0;
}

main {
  width: 100%;
  height: 100%;

  display: flex;
  flex-direction: column;

  place-items: center;
}

.home {
  width: 100%;
  height: 90%;

  margin-bottom: 10rem;
}

.title {
  font-size: 3rem;
  margin: 1rem 0;
  margin-top: 3rem;
}

.histories-container-container {
  width: 100%;
  max-height: 75%;

  position: relative;
}

.histories-container {
  overflow-y: auto;
  overflow-x: hidden;
  width: 100%;
  height: 100%;

  display: flex;
  flex-direction: column;
  gap: 1rem;
  align-items: center;

  margin: 0;
  padding: 3rem 1rem;
}

.histories-start {
  height: 3rem;
  width: 100%;

  z-index: 999;
  top: 0;
  position: absolute;

  background: linear-gradient(
    180deg,
    var(--primary-bg-color) 0%,
    transparent 100%
  );
}
.histories-end {
  height: 3rem;
  width: 100%;

  z-index: 999;
  bottom: 0;
  position: absolute;

  background: linear-gradient(
    0deg,
    var(--primary-bg-color) 0%,
    transparent 100%
  );
}

.history {
  padding: 1rem;
  width: 100%;
  max-width: 40rem;

  background-color: var(--secondary-color);
  border-radius: 10px;
  border-left: 2px solid var(--primary-color);

  cursor: pointer;

  transform: translateX(calc(1px * var(--tx, 0)));
  opacity: var(--opacity, 1);
}
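/* Usage sketch (assumption: --tx and --opacity are set from outside this
   stylesheet, for example by script during a swipe; the class name below is
   hypothetical and not part of the app):
     .history-swiping { --tx: -40; --opacity: 0.5; }
   An element carrying both .history and .history-swiping would then render at
   translateX(-40px) with opacity 0.5. With neither variable set, the var()
   fallbacks 0 and 1 apply, so the card stays in place and fully opaque. */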
.history:hover {
  background-color: var(--secondary-color);
}

.history-delete-button {
  position: absolute;
  top: 0;
  right: 0;
  padding: 0.5rem;
  margin: 0;
  outline: none;
  border: none;
  background-color: var(--secondary-color);
  color: var(--foreground-color);
  border-radius: 0 0 0 10px;
  cursor: pointer;
  transition: 0.2s;
}
.history-delete-button:hover {
  background-color: var(--secondary-color);
  padding: 0.75rem;
}

.messages {
  overflow-y: auto;
  height: 100%;
  width: 100%;
  max-width: 1200px;

  display: flex;
  flex-direction: column;
  gap: 1rem;
  align-items: center;
  padding-top: 1rem;
  padding-bottom: 11rem;
}

.message {
  max-width: 75%;
  padding: 0.5rem 1rem;
  border-radius: 20px;
}
.message-role-assistant {
  background-color: var(--secondary-color);
  margin-right: auto;
  color: #fff;
}
.message-role-user {
  margin-left: auto;
  background-color: var(--primary-color);
  color: #000;
}

.message > pre {
  white-space: pre-wrap;
}

.hljs {
  width: 100%;
  position: relative;
  border-radius: 10px;
  /* wrap code blocks */
  white-space: pre-wrap;
}
/* put clipboard button in the top right corner of the code block */
.clipboard-button {
  position: absolute;
  top: 0;
  right: 0;
  padding: 0.5rem;
  margin: 0;
  outline: none;
  border: none;
  background-color: var(--secondary-color);
  color: var(--foreground-color);
  border-radius: 0 0 0 10px;
  cursor: pointer;
  transition: 0.2s;
}
.clipboard-button:hover {
  background-color: var(--secondary-color);
  padding: 0.75rem;
}

.input-container {
  position: absolute;
  bottom: 0;

  /* linear gradient from background-color to transparent on the top */
  background: linear-gradient(
    0deg,
    var(--primary-bg-color) 55%,
    transparent 100%
  );

  width: 100%;
  max-width: 1200px;
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
  z-index: 999;
}

.input-performance {
  margin-top: 4rem;

  display: flex;
  flex-direction: row;
  gap: 1rem;
}

.input-performance-point {
  display: flex;
  flex-direction: row;
  place-items: center;
  gap: 0.5rem;
}
.input-performance-point > p {
  height: 1rem;
  line-height: normal;
}

.input {
  width: 90%;
  min-height: 3rem;
  flex-shrink: 0;

  display: flex;
  flex-direction: row;
  justify-content: center;
  gap: 0.5rem;

  align-items: flex-end;
  margin-bottom: 2rem;
}

.input-form {
  width: 100%;
  padding: 1rem;
  min-height: 3rem;
  max-height: 8rem;

  background-color: var(--secondary-color);
  color: var(--foreground-color);
  border-radius: 10px;
  border: none;
  resize: none;
  outline: none;
}
.mobile .input-form { /* prevent auto-zoom on touching prompt box */
  font-size: 16px;
}

.input-button {
  height: 3rem;
  width: 4rem;

  background-color: var(--primary-color);
  color: var(--secondary-color);
  border-radius: 10px;
  padding: 0.5rem;
  cursor: pointer;
}
.input-button:hover {
  background-color: var(--secondary-color-transparent);
}
.input-button:disabled {
  background-color: var(--secondary-color);
  cursor: not-allowed;
}

/* wrap text */
p {
  white-space: pre-wrap;
}

/* fonts */
.megrim-regular {
  font-family: monospace;
  font-weight: 400;
  font-style: normal;
}

.monospace {
  font-family: monospace;
}

.loading-bar {
  display: flex;
  flex-direction: row;
  align-items: center;
  gap: 0.5rem;
  width: 100%;
  min-height: 3rem;
  margin-bottom: 2rem;
}

.loading-text {
  color: var(--foreground-color);
  font-size: 1rem;
  white-space: nowrap;
}

#progress-percentage {
  color: var(--foreground-color);
  font-size: 1rem;
  white-space: nowrap;
}

.progress-bar {
  flex-grow: 1;
  height: 0.5rem;
  background-color: var(--secondary-color);
  border-radius: 5px;
  overflow: hidden;
  position: relative;
}

.progress {
  width: 0%;
  height: 100%;
  background-color: var(--primary-color);
  transition: width 0.2s ease-in-out;
}
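
/* Usage sketch (assumption: the fill width is updated from outside this
   stylesheet as loading advances; the 42% value is illustrative only):
     .progress { width: 42%; }
   Any change to width is animated over 0.2s by the transition declared on
   .progress, and .progress-bar clips the fill to its rounded track through
   overflow: hidden. */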