* use scaled attn from Tensor * add a test for bert * linter * no more tokenizer * without loading weights * remove prints * tribute to linter lords * smaller input and less runs * small bert