Commit Graph

4618 Commits

geohotstan
874dfc556c update setitem tests to test for currently supported cases (#4334)
* tests, tests, tests

* one more test

* tests tests tests tests

* t e s t

* a few more
2024-05-05 11:59:13 -04:00
David Hou
c0a048c044 batchnorm d(var)/d(mean) = 0 (#4430)
* d(var)/d(mean) = 0

* drop the number in test_schedule!
2024-05-05 00:25:45 -04:00
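A quick check of the identity this commit relies on (a sketch of the calculus only, not the tinygrad code): with $\mu = \frac{1}{n}\sum_i x_i$ and $\sigma^2 = \frac{1}{n}\sum_i (x_i - \mu)^2$, treating $\mu$ as an independent argument of the variance gives

$$\frac{\partial \sigma^2}{\partial \mu} = -\frac{2}{n}\sum_i (x_i - \mu) = -\frac{2}{n}\Big(\sum_i x_i - n\mu\Big) = 0,$$

so the d(var)/d(mean) term can simply be dropped from the batchnorm backward.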
George Hotz
cb7289f9c9 remove clang program header (#4422)
* remove clang program header

* proper max

* bools are numbers

* fix compile enet
2024-05-04 08:38:01 -07:00
qazal
267bbb57f9 Revert "Add insert_before to Linearizer Functions (#4320)" (#4421)
This reverts commit 664b563c91.
2024-05-04 17:50:21 +03:00
qazal
5f3bae378f search children in fusion (#4322)
* scheduler diff

* tests diff

* new changes

* realizes

* chores

* assign

* kind of r3

* forced_realize wont do it

* with forced_realize

* start with children

* test search

* r3 with parents

* diff cleanup

* add children

* crossing assign

* late fuse descendants

* update kernel counts

* assign diff doesnt belong here
2024-05-04 17:22:15 +03:00
qazal
249cadd106 fusing crossing diamond assign (#4403)
* refactor scheduler parents search

* assign target

* unit test

* can't chase this
2024-05-04 15:19:48 +03:00
George Hotz
9fc4465557 subbuffer support (#4397)
* subbuffer support

* diskbuffer offset

* cuda subbuffer works

* use subbuffer

* more subbuffer tests

* consecutive

* cast

* consec

* offset

* view is a better name

* offset is in nbytes

* fix view + memory planner

* delete unused DiskRunner

* reverse order

* no subbuffers on unrealized consts

* only enabled for disk

* don't reverse memory

* view supported devices

* pickle buffer view

* ring jit

* support extra view inputs in jit

* fix JIT=2 issue

* test copy jit

* p2p isn't an option anymore

* fix dep tracking issue

* fix mypy

* fix pickle

* from_nv is contents now
2024-05-03 18:05:57 -07:00
qazal
3401734e54 infra for scheduler process replay (#4405)
* use getenv

* capture ast

* fix graph

* replay schedules

* exec
2024-05-03 20:29:13 +03:00
George-the-1st
0627e26140 Added missing unittest execution code (#4400)
Same code as in every other test file; it was just missing from this one for some reason.
2024-05-02 22:34:30 -04:00
qazal
0deaaf2bc8 partial fusion spec (#4398) 2024-05-03 04:14:23 +03:00
Francis Lam
5c5b40880f search: fix edge cases on screening potential ops (#4394)
* search: fix edge cases on screening potential ops

won't change correctness, but will save a little python time by
properly deduplicating potential actions

* check for de-duplication instead of exact valid actions

* refactor long line
2024-05-02 14:53:05 -04:00
George Hotz
2786dff26d new disk tensor tests (#4393) 2024-05-02 08:54:44 -07:00
George Hotz
c8a2047377 testing for all reduce (#4387) 2024-05-02 06:34:10 -07:00
qazal
0b47818e0f simpler reduceop children chasing (#4350)
* simplest case

* midreduce case

* all tests

* pending things

* unify tests
2024-05-02 15:15:30 +03:00
George Hotz
f635c4d273 fix define global (#4383)
* fix define global

* remove name from DEFINE_GLOBAL

* fix fuzzing

* fix ptx

* fix python
2024-05-01 22:32:56 -04:00
chenyu
826cccd54d fix mean underflow for half tensor (#4377)
* fix mean underflow for half tensor

divide by only the reduce factor. Added a unit test and a non-NaN assertion in resnet training. Also added a failing test case for symbolic shape var.

* skip for python backend
2024-05-01 13:38:57 -04:00
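A toy numpy illustration of the failure mode (a hypothetical sketch, not the tinygrad code): in float16, scaling by more than the reduce factor pushes values below the smallest subnormal (~6e-8) and the mean collapses to zero, while dividing by only the reduced dimension keeps it intact.

```python
import numpy as np

x = np.full((32, 2048), 1e-4, dtype=np.float16)    # true mean over the last axis is 1e-4

# hypothetical buggy path: pre-scale by the full element count instead of the reduce factor
bad = (x / x.size).sum(axis=-1) * 32               # 1e-4 / 65536 underflows float16 -> 0.0

# divide by the reduce factor only, accumulating in float32 (as the half sum already does)
good = x.sum(axis=-1, dtype=np.float32) / x.shape[-1]

print(bad[0], good[0])                             # 0.0 vs ~1e-4
```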
chenyu
077ea6926c remove downcast_half in sum (#4376)
breaks boolean mean and other stuff
2024-05-01 11:46:44 -04:00
George Hotz
bd49d2854a hotfix: skip fetch tests always 2024-05-01 08:43:26 -07:00
qazal
ea06f657df fusion tests from test_opt (#4357)
* opt tests

* more sgd

* batchnorm

* models stay in external
2024-05-01 16:44:12 +03:00
George Hotz
27ee49bf30 tensor variable (#4362)
* tensor variable support

* consttype without variable?

* __setitem__

* symbolic mean works

* arange test

* more tests

* a few more tests
2024-04-30 14:08:57 -07:00
Francis Lam
0d33c54d99 kernel: change PADTO check to allow up to 4x padding (#4354)
* kernel: change PADTO check to allow up to 4x padding

also optionally remove PADTO from the search action space with
BEAM_PADTO=0.

* fix test_linearizer test_tensor_cores_padded tests

* update resnet runs to use SPLIT_REDUCEOP=1

* fix up search TC axis and amt checking

* fix up the dimensions of the TC tests
2024-04-30 15:29:34 -04:00
Elias Wahl
babe87a8ae BERT: Checkpoint loading tests (#4359)
* Move checkpoint init to helpers. Add test

* linters

* Move the steps outside of the main train loop

* Move data_get

* data_get belongs to helpers
2024-04-30 14:43:41 -04:00
Francis Lam
c12bcabb07 search: fix actions space checks to ignore TC axis and amt (#4360)
* search: fix actions space checks to ignore TC axis and amt

* add test for number of actions in get_linearizer_actions
2024-04-30 14:02:22 -04:00
George Hotz
d325be2540 update docs (#4356)
* update docs

* nn.md

* mnist cleanups

* rhip test is very slow
2024-04-30 16:51:42 +09:00
Francis Lam
a9a1fa6bbf wmma: add reduce axis choice to TC action space (#4328)
* wmma: add reduce axis choice to TC action space

* add test for TC multi-reduce axis choice
2024-04-29 19:15:39 -04:00
chenyu
93abcd3113 fix function.py sum backward without downcast_half (#4353)
without downcast_half, sum output dtype can be different from input dtype. cast back to input dtype in function.py
2024-04-29 17:53:02 -04:00
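A minimal sketch of the idea (toy numpy autograd, not tinygrad's function.py): when the forward sum accumulates in a wider dtype, the backward broadcasts the gradient and casts it back to the input dtype so gradients match the parameters they update.

```python
import numpy as np

class Sum:
    """Toy autograd node: forward may widen the dtype (e.g. float16 accumulated in
    float32), so backward casts the broadcast gradient back to the input dtype."""
    def forward(self, x):
        self.in_dtype, self.in_shape = x.dtype, x.shape
        return x.sum(dtype=np.float32)        # output dtype can differ from input dtype

    def backward(self, grad):
        # d(sum)/dx is 1 everywhere: broadcast grad, then cast back to the input dtype
        return np.broadcast_to(grad, self.in_shape).astype(self.in_dtype)

s = Sum()
x = np.ones((4, 4), dtype=np.float16)
y = s.forward(x)                               # float32 scalar
dx = s.backward(np.float32(1.0))               # float16 again, same dtype as x
```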
Francis Lam
18c61ce077 test/fuzz_linearizer: add --atol/rtol and change half distribution (#4352) 2024-04-29 15:53:59 -04:00
Elias Wahl
27613dd881 MLPerf BERT: Main training loop (#4288)
* BERT language modeling head + trunc normal initializers

* add train loop + helpers

* shuffle in dataloaders + slight changes in main loop

* beam change

* Minor changes

* random.shuffle

* HParam update

* Use deque for dataloader

* wandb bert project name

* half fixes

* BENCHMARK + remove epoch

* cast + print()

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
qazal
cc1797673e all fusion opportunities (#4348) 2024-04-29 19:32:23 +03:00
chenyu
f363f39e83 fix dtype of const folded sum (#4349)
a const-folded sum should return the same dtype as a regular sum, which can be different from the input dtype
2024-04-29 11:40:45 -04:00
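A numpy analogue of the dtype rule being enforced here (illustration only, not tinygrad code): a reduce's output dtype can legitimately differ from its input dtype, so a constant-folded sum has to pick the reduce's output dtype rather than the input's.

```python
import numpy as np

x = np.array([True, True, False])
print(x.dtype, x.sum(), x.sum().dtype)   # bool 2 int64: the sum's dtype is not the input's
```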
qazal
774a9b0bca override assign_target in fuzz_schedule (#4342)
* store assign_targets

* cleanup

* override target
2024-04-29 11:04:04 +03:00
Francis Lata
bb849a57d1 [MLPerf] UNet3D dataloader (#4343)
* add support for train/val datasets for kits19

* split dataset into train and val sets

* add tests for kits19 dataloader

* add MLPerf dataset tests to CI

* update unet3d model_eval script

* fix linting

* add nibabel

* fix how mock dataset gets created

* update ref implementation with permalink and no edits

* clean up test and update rand_flip implementation

* cleanups
2024-04-28 22:34:18 -04:00
chenyu
c1d8d425eb fix mean of half tensor if sum is greater than half.max (#4327)
sum of half already accumulates in float32; add an arg to not downcast to half and use that in mean
2024-04-28 18:04:54 -04:00
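A small numpy illustration of why the float32 accumulation matters (illustration only, not the tinygrad code): float16 tops out at 65504, so a plain half sum can overflow even when the mean itself fits comfortably in half.

```python
import numpy as np

x = np.ones(100_000, dtype=np.float16)
print(x.sum())                             # inf: the float16 accumulator overflows past 65504
print(x.sum(dtype=np.float32))             # 100000.0: accumulate in float32 instead...
print(x.sum(dtype=np.float32) / x.size)    # ...and the mean (1.0) fits back into half fine
```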
qazal
23445db2b9 no skipped tests in RHIP (#4337)
* delete skip

* delete split skip

* remu dev

* compiler fails here

* Revert "remu dev"

This reverts commit 28b933d4eb.
2024-04-28 12:23:05 -04:00
Obada Khalili
e4befa41d7 Fix in _reshape_mask (#4332)
* handle reshape with remainder in _reshape_mask

* remove trailing whitespace

* use helper_test_op to generate tensors from shapes

* test in shapetracker too

* remove whitespace

* revert property name in other class tests
2024-04-28 11:57:39 -04:00
Timmy
664b563c91 Add insert_before to Linearizer Functions (#4320)
* adding insert_before to linearizer functions

* uop insert_before test case

* formatting

* more formatting

* more formatting

* syntax

* removing self.cast

* addressing err

* removing noqa s
2024-04-28 11:38:36 -04:00
qazal
3372bea322 reduce children fusion tests (#4321)
* base tests

* real-world tests
2024-04-28 11:14:02 -04:00
chenyu
24a6342950 add mem/s to external_benchmark_resnet (#4309) 2024-04-26 20:07:17 -04:00
Szymon Ożóg
de832d26c6 disable bfloat16 from ptx tests (#4305) 2024-04-26 01:20:10 -04:00
Szymon Ożóg
f1ebcffb87 Ptx beam fix (#4296)
* Fix beam search for PTX

* fix ptr arm test
2024-04-25 15:39:39 -04:00
qazal
9a47ed0705 test crossing diamond assigns (#4298) 2024-04-25 21:52:05 +03:00
chenyu
5ae252ae83 use at least float32 for optim.lr (#4297)
* use at least float32 for optim.lr

when doing mixed-precision training (float32 weights, default_float=half), still use float32 to store lr.
it would have been upcast later in the actual weight update anyway, but would have lost precision.
this improved resnet convergence significantly

* undo type annotation
2024-04-25 14:42:28 -04:00
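A rough numpy sketch of why lr stays in float32 (the update shown here is my assumption, not tinygrad's optimizer code): small learning rates land in float16's subnormal range and pick up large rounding error, so the optimizer keeps lr in float32 and does the update math in float32 before casting the weights back to half.

```python
import numpy as np

lr = 2e-6
print(np.float16(lr))   # ~2.03e-06: subnormal in float16, noticeable rounding error
print(np.float32(lr))   # 2e-06 as expected

# mixed-precision update sketch: half weights/grads, float32 lr and update math
w = np.random.randn(1024).astype(np.float16)
g = np.random.randn(1024).astype(np.float16)
w = (w.astype(np.float32) - np.float32(lr) * g.astype(np.float32)).astype(np.float16)
```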
David Hou
6f792b727b More improvements for resnet layer bench (#4272)
* fix first layer size, new schedule stuff

* estimates

* get different conv layers

* \r for estimated times

* E501

* space after comma
2024-04-25 12:40:49 -04:00
qazal
74a1be88f5 test reduce graph permutations (#4291) 2024-04-25 11:34:44 +03:00
George Hotz
acb32e1766 hotfix: PM4 supports timing 2024-04-24 08:38:59 +00:00
George Hotz
ad28fdecb1 si.inputs+outputs -> bufs (#4279) 2024-04-24 15:12:34 +08:00
George Hotz
38f97aa0fe rename rawbufs to bufs in ExecItem (#4274) 2024-04-24 11:27:27 +08:00
geohotstan
17328ded7d setitem no return value (#4266)
* no ret value and just force contiguous

* ok revert contiguous stuff

* actually do force it contiguous

* revert again lol

* add simple regression test

* add assert for MLB

* guess we're contiguous everything from now on

* lol ugly af empty return...

* don't change order cuz i don't get disk
2024-04-23 16:28:14 -04:00
Elias Wahl
69341144ba Wikipedia preprocessing script (#4229)
* Preprocessing script

* short seq prob

* comments + env vars

* Add preprocessing reference. Add test

* lint fix + add eval test support

* whitespaces

* point to commit

* comment

* rename

* better comments
2024-04-23 10:28:01 -04:00
George Hotz
967638f0d5 update docs, remove corealize (#4264)
* update docs, remove corealize

* handle 0 line count

* tensor schedule
2024-04-23 12:05:29 +04:00