This function clears the GPU thread-local state. It is mainly useful as a workaround for the 'bug' where a rayon thread pool does not wait for its threads to exit, which causes synchronization problems between the CUDA driver and the CPU threads' thread_local data.
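
As a rough illustration of the idea, the sketch below uses only the standard library. The names `GpuHandle`, `GPU_LOCAL`, `use_gpu`, `clear_gpu_thread_locals` and `released_during_worker` are hypothetical stand-ins, not the actual API, and a plain `std::thread` takes the place of a rayon worker. It shows that clearing the thread-local explicitly releases the resource at a known point, while the thread is still running, instead of leaving it to the thread-exit destructor.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many (hypothetical) GPU handles have been released so far.
static RELEASED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a per-thread GPU resource such as a stream.
struct GpuHandle;

impl Drop for GpuHandle {
    fn drop(&mut self) {
        // Real code would call into the CUDA driver here; this sketch
        // only records that the release happened.
        RELEASED.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Lazily created per-thread GPU state (assumed layout, for illustration).
    static GPU_LOCAL: RefCell<Option<GpuHandle>> = RefCell::new(None);
}

// Simulates GPU work that creates the thread-local handle on first use.
fn use_gpu() {
    GPU_LOCAL.with(|slot| {
        slot.borrow_mut().get_or_insert_with(|| GpuHandle);
    });
}

// Drops the calling thread's GPU state immediately instead of waiting
// for the thread-exit destructor. Calling it again is a no-op.
fn clear_gpu_thread_locals() {
    GPU_LOCAL.with(|slot| {
        *slot.borrow_mut() = None;
    });
}

// Runs a worker that uses the GPU, clears its thread-locals, and reports
// how many handles were released before the worker returned.
fn released_during_worker() -> usize {
    let before = RELEASED.load(Ordering::SeqCst);
    thread::spawn(move || {
        use_gpu();
        clear_gpu_thread_locals();
        clear_gpu_thread_locals(); // second call releases nothing more
        RELEASED.load(Ordering::SeqCst) - before
    })
    .join()
    .expect("worker thread panicked")
}
```

With a rayon pool, the clearing function would presumably have to run on every worker thread before the pool is dropped (for instance through something like `ThreadPool::broadcast`), so that no GPU resource is left to be released by a thread that is still exiting after the pool handle is gone.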