As a reminder, NVIDIA's official driver stack uses the same (closed-source) user-space components for OpenGL / OpenCL / Vulkan / CUDA regardless of which kernel driver option you choose.
CUDA hell remains. :(
AMD needs to get their ducks in a row. They already have the advantage of not being Nvidia.
They already have the advantage of not being Nvidia
That’s just because they release worse products.
If AMD had Nvidia’s marketshare, they would be just as scummy as the business climate allows.
In fact, AMD piggybacks off of Nvidia’s scumbaggery to charge more for their GPUs rather than engage in an actual price war.
Who would’ve thunk that big for-profit tech companies don’t care about us :T
It’s all by design.
It’s breaking down. PyTorch supports ROCm now.
ROCm is its own hell (unless they finally put some resources into it in the past couple of years).
They put in the absolute minimum amount of resources for it.
It’s also littered with bugs, as the ZLUDA project has noted.
Yes, CUDA is the only reason I consider NVIDIA. I really hate this company, but the AMD tech stack is really inferior.
I’ve heard this but don’t really understand it… At a high level, what makes cuda so much better?
So is CUDA good or bad?
I keep reading it’s hell, but the best. Apparently it’s the single one reason why Nvidia is so big with AI, but it sucks.
What is it?
Both.
The good: CUDA is required for maximum performance and compatibility with machine learning (ML) frameworks and applications. It is a legitimate reason to choose Nvidia, and if you have an Nvidia card you will want to make sure you have CUDA acceleration working for any compatible ML workloads.
The bad: Getting CUDA to actually install and run correctly is a giant pain in the ass for anything but the absolute most basic use case. You will likely need to maintain multiple framework versions, because new ones are not backwards-compatible. You’ll need to source custom versions of Python modules compiled against specific versions of CUDA, which opens a whole new circle of Dependency Hell. And you know how everyone and their dog publishes shit with Docker now? Yeah, have fun with that.
That said, AMD’s equivalent (ROCm) is just as bad, and AMD is lagging about a full generation behind Nvidia in terms of ML performance.
The easy way is to just use OpenCL. But that’s not going to give you the best performance, and it’s not going to be compatible with everything out there.
Almost sounds like god doesn’t want us doing machine learning.
I think this will change. Nvidia hired devs to work on Nouveau, NVK is coming along, etc.
Last I checked, there is no evidence Nvidia has hired anyone to work on Nouveau.
Right, I’m well aware that that article is the reason why a bunch of people have been making the unsubstantiated claim that Nvidia has hired people to work on Nouveau.
Nvidia hired the former lead Nouveau maintainer and he contributed a bunch of patches a couple of months ago after they hired him. That was his first contribution since stepping down and I’m fairly certain it was his last because there’s no way Phoronix would miss the opportunity to milk this some more if they could. He had said when stepping down that he was open to contributing every once in a while, so this wasn’t very surprising either way. To be clear, it is not evidence that he or anyone else was hired by Nvidia to work on Nouveau. Otherwise, I’d like to ask what he’s been doing since, because that was over three months ago.
Well… it is an out-of-tree kernel driver that is made by the same company, and the userspace drivers are still proprietary.
This says NOTHING other than “wow, NVIDIA can write good code (open source) that doesn’t suck”?
I’ve been using the open kernel driver on my Debian workstation, and it has worked far better than the default driver with the Debian backports kernel. I installed it from Nvidia’s CUDA repo.
Performance parity? Heck no, not until this bug with the GSP firmware is solved: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/538
How is it different? Wouldn’t it just be the same software with the source code available?
It’s not, they’re not open sourcing their driver. They’ve made an open source driver.
Is there a reason to reinvent the wheel?
Usually this is done for licensing reasons. They probably don’t want the old code caught up in the open license they’re shipping the new driver under.
My understanding is that the new open driver separates proprietary code into a black box binary blob that isn’t distributed under an open source license. I’m guessing that they’ve been very careful not to include anything they want to keep closed into the new open driver, whereas the old driver wasn’t written with this separation in mind.
I was wondering about what they were doing with their “secret sauce”, thanks for explaining.
Control, precedent, bean counter analysis etc. Pick your poison.
Some of it probably comes from other companies that are unable or unwilling to relicense it, even if Nvidia wanted to.
Anyone tried this beta version yet? Any idea how stable it is?