• vivendi@programming.dev
        1 day ago

        This is unironically a technique for catching LLM errors and also for speeding up generation.

        For example, speculative decoding and mixture-of-experts architectures use these kinds of setups. A rough sketch of the speculative-decoding version is below.
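
        Here's a minimal sketch of how that verify-the-cheaper-model idea works in speculative decoding. The two "models" are made-up stubs (not real LLM calls), just to show the control flow: a small draft model proposes a few tokens cheaply, the larger target model checks them, and only tokens the target agrees with are kept.

        ```python
        import random

        # Toy vocab plus two stand-in "models": a fast draft model and a slower,
        # more accurate target model. Both are hypothetical stubs for illustration.
        VOCAB = ["the", "cat", "sat", "on", "mat", "."]

        def target_model(context):
            # Pretend "ground truth" next token: deterministic in context length.
            return VOCAB[len(context) % len(VOCAB)]

        def draft_model(context):
            # Cheap proposer: usually agrees with the target, sometimes guesses.
            return target_model(context) if random.random() < 0.8 else random.choice(VOCAB)

        def speculative_step(context, k=4):
            # 1) Draft model proposes k tokens cheaply.
            proposed, ctx = [], list(context)
            for _ in range(k):
                tok = draft_model(ctx)
                proposed.append(tok)
                ctx.append(tok)

            # 2) Target model verifies the proposals; keep the longest agreeing prefix.
            accepted, ctx = [], list(context)
            for tok in proposed:
                if target_model(ctx) == tok:
                    accepted.append(tok)
                    ctx.append(tok)
                else:
                    # First mismatch: drop the rest and emit the target's own token,
                    # so every emitted token is one the target model endorses.
                    accepted.append(target_model(ctx))
                    break
            else:
                # All k proposals accepted; target contributes one bonus token.
                accepted.append(target_model(ctx))
            return accepted

        context = ["the"]
        for _ in range(3):
            step = speculative_step(context)
            context.extend(step)
            print("accepted:", step)
        print("output:", " ".join(context))
        ```

        The speedup comes from the same place as the error catching: when the draft model's guesses pass verification you get several tokens for one target-model pass, and when they don't, the wrong tokens never make it into the output.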