• 0 Posts
  • 136 Comments
Joined 2 years ago
Cake day: June 19th, 2023



  • Apple TV’s hardware is just so much more capable than other platforms’ that Apple has been coasting along for the last several generations of “Apple TV 4K”. Our more than seven-year-old Gen 1 is still super capable, and the only reason we picked up a Gen 3 is to get the Thread radio in a centralized location. As an Apple user, I’m extremely glad there’s going to be a new competitor in the space, which will hopefully push Apple further along the innovation path.


  • Ask it for a second opinion on medical conditions.

    Sounds insane, but they’re leaps and bounds better than blindly Googling and self-diagnosing every condition under the sun when the symptoms only vaguely match.

    Once the LLM helps you narrow in on a couple of possible conditions based on your symptoms, you can dig deeper into those specific ones, learn more about them, and have a slightly more informed conversation with your medical practitioner.

    They’re not a replacement for your actual doctor, but they can help you learn and have better discussions with them.
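
    For what it’s worth, here’s a minimal sketch of that kind of prompt against a local model, assuming the ollama Python client; the model tag and symptom list are illustrative, and the output is a starting point for research, not a diagnosis:

    ```python
    # Ask a local model for a short list of conditions to read up on before
    # a doctor's visit. Model tag and symptoms are illustrative assumptions.
    import ollama

    symptoms = "persistent dry cough for three weeks, mild fever, fatigue"
    prompt = (
        f"Given these symptoms: {symptoms}. "
        "List a few possible conditions worth reading about and discussing "
        "with a doctor, with one sentence of reasoning each."
    )

    response = ollama.chat(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response["message"]["content"])
    ```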

  • Yep! Give Granite a try. I think it would be perfect for this use case, both in terms of being able to answer your queries and doing so quickly, without a GPU, just using a modern CPU. I was getting over 30 tokens per second on my 10th-gen i5, which kind of blew my mind.

    Thinking models like R1 will be better at things like troubleshooting a faulty furnace or other user problems, so there are benefits to pushing those envelopes. However, if all you need is to give basic instructions, have it infer your intent, and have it perform the desired tasks, then smaller mixture-of-experts models should be passable even without a GPU.
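
    For reference, a minimal sketch of one way to measure a tokens-per-second figure like the one above, assuming the ollama Python client and a locally pulled granite3.1-moe:3b (the prompt is just an example):

    ```python
    # Rough CPU-only throughput check via Ollama. Assumes `pip install ollama`
    # and `ollama pull granite3.1-moe:3b` have been run.
    import time
    import ollama

    start = time.time()
    response = ollama.chat(
        model="granite3.1-moe:3b",
        messages=[{"role": "user", "content": "In two sentences, what is a mixture-of-experts model?"}],
    )
    elapsed = time.time() - start

    # eval_count is Ollama's count of generated tokens; dividing by
    # wall-clock time gives an approximate tokens-per-second figure.
    print(response["message"]["content"])
    print(f"~{response['eval_count'] / elapsed:.0f} tokens/sec")
    ```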



  • Depending on what you want to do with it and what your expectations are, the smaller distilled versions could work on a CPU, but they’ll most likely need extra help on top, just like other similarly sized models.

    This being a reasoning model, you might get more well-thought-out results out of it, but at the end of the day, a smaller parameter space (easiest to think of as ‘less vocabulary’) means smaller capabilities.

    If you just want something to chat back and forth with very quickly on a CPU, try IBM’s granite3.1-moe:3b, which is very fast even on a modern CPU but doesn’t really excel at complex problems without additional support (e.g., RAG or tool use).
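
    As an illustration of that kind of additional support, here’s a minimal RAG sketch, assuming the ollama Python client, granite3.1-moe:3b for chat, and nomic-embed-text for embeddings (the notes and model tags are illustrative):

    ```python
    # Minimal retrieval-augmented generation: embed some notes, retrieve the
    # one closest to the question, and hand it to the small model as context.
    import math
    import ollama

    notes = [
        "The furnace pilot light is behind the lower front panel.",
        "Replace the HVAC filter every three months.",
        "The thermostat takes two AA batteries under the faceplate.",
    ]

    def embed(text):
        # nomic-embed-text is a common Ollama embedding model tag.
        return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    question = "Where is the pilot light on the furnace?"
    q_vec = embed(question)
    context = max(notes, key=lambda note: cosine(q_vec, embed(note)))

    response = ollama.chat(
        model="granite3.1-moe:3b",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}],
    )
    print(response["message"]["content"])
    ```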


  • The 8B-parameter tag is the distilled Llama 3.1 model, which should be great for general writing. The 7B is distilled Qwen 2.5 Math, and the 14B is distilled Qwen 2.5 (general purpose but good at coding). They have the entire table called out on their Hugging Face page, which is handy for knowing which one to use for specific purposes.

    The full model is 671B parameters and unfortunately isn’t going to work on most consumer hardware, so it’s still tethered to the cloud for most people.

    Also, being a made-in-China model, it has some degree of mandated censorship, so depending on your use case, that may be a point of consideration, too.

    Overall, it’s super cool to see something at this level become generally available, especially with all the technical details out in the open. Hopefully we’ll see more models with this level of capability so there’s even more choice and competition.
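
    As a sketch of how you might switch between those distills locally, assuming Ollama publishes them under deepseek-r1 size tags (the task-to-tag mapping just mirrors the table described above; double-check it against the Hugging Face page):

    ```python
    # Pick a DeepSeek-R1 distill per task and run it locally via Ollama.
    # The mapping is an assumption based on the size/base-model table above.
    import ollama

    DISTILL_FOR_TASK = {
        "writing": "deepseek-r1:8b",  # distilled Llama 3.1
        "math": "deepseek-r1:7b",     # distilled Qwen 2.5 Math
        "coding": "deepseek-r1:14b",  # distilled Qwen 2.5
    }

    def ask(task, prompt):
        response = ollama.chat(
            model=DISTILL_FOR_TASK[task],
            messages=[{"role": "user", "content": prompt}],
        )
        return response["message"]["content"]

    print(ask("coding", "Write a Python one-liner that reverses a string."))
    ```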