• vivendi@programming.dev · 1 day ago

    This is unironically a technique for catching LLM errors and also for speeding up generation.

    For example, setups like this are used in speculative decoding and in mixture-of-experts architectures.
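
    Rough sketch of the speculative-decoding part, just to show the draft-then-verify loop (function and model names here are made up for the example, and a real implementation verifies the whole draft in one batched forward pass instead of token by token):

    ```python
    # Sketch of greedy speculative decoding: a cheap "draft" model proposes a
    # short run of tokens, the expensive "target" model checks them and keeps
    # only the prefix it agrees with. That check is where draft errors get
    # caught, and accepting several tokens per target pass is the speedup.

    from typing import Callable, List

    Token = int
    NextToken = Callable[[List[Token]], Token]  # greedy next-token predictor


    def speculative_decode(
        draft: NextToken,
        target: NextToken,
        prompt: List[Token],
        max_new_tokens: int,
        draft_len: int = 4,
    ) -> List[Token]:
        out = list(prompt)
        while len(out) - len(prompt) < max_new_tokens:
            # 1) Draft model speculates a few tokens cheaply.
            proposed: List[Token] = []
            ctx = list(out)
            for _ in range(draft_len):
                t = draft(ctx)
                proposed.append(t)
                ctx.append(t)

            # 2) Target model verifies the proposals; accept the agreeing
            #    prefix, and at the first mismatch keep the target's own
            #    token instead (sequential here only for clarity).
            for t in proposed:
                verified = target(out)
                out.append(verified)
                if verified != t or len(out) - len(prompt) >= max_new_tokens:
                    break
        return out[: len(prompt) + max_new_tokens]


    if __name__ == "__main__":
        # Toy stand-ins: the target counts up by 1, the draft mostly agrees
        # but occasionally guesses wrong, so the loop has errors to catch.
        target_model: NextToken = lambda seq: seq[-1] + 1
        draft_model: NextToken = lambda seq: seq[-1] + (2 if len(seq) % 5 == 0 else 1)
        print(speculative_decode(draft_model, target_model, [0], max_new_tokens=10))
    ```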