[ARTICLE] reverse-engineering AGX (Apple M1 GPU) drivers: the Impossible Bug

That’s because the machine learning community has been spending millions (if not billions) on Nvidia devices, like the Summit supercomputer. Don’t get too caught up on the ML capabilities of GPUs: they’re just fancy chips for matrix mathematics. The ML components aren’t used for display, since that would drastically increase power consumption even for small tasks, and it would make them unavailable to the programs and researchers trying to use them.

Using ML to increase performance is an interesting thought, but there are too many variables here to model performance for literally every use-case. Second, any ML-assisted performance boost would by definition be slower than the GPU and other hardware chips on the device: ML is software doing modeling with lots of math, which isn’t going to outrun the billions of transistors in the GPU itself.

This is an example of not understanding ML & AI. It’s just not possible, in the same way AI/ML isn’t going to build Terminators and take over the world. (But maybe get some robot insurance, just in case?)


Well, if it bothers you so much that I added fuel to the fire by expanding on the information, I’ll stop here; I certainly don’t want to upset anyone.
I made it explicit in past comments that I disagree with Alyssa’s ideas, that these are not strange bugs, and that in my opinion she has created workarounds for what Apple used to “create performance improvements” … so you’d be happier if I excluded the word AI.
There is a kernel of truth in what I said that you want to exclude a priori.
I followed Alyssa through her Panfrost and Lima work. As I said, I didn’t immediately notice it when I read the article; then I saw her profile and recognized that she was the author … why shouldn’t you believe me? I posted a link to a forum on Armbian where we tested the Lima driver for Mali GPUs years ago.

I won’t comment anymore; there’s no need to get nervous.

The Linux driver reverse-engineered by Alyssa has nothing to do with AI, and neither does Apple’s custom GPU.

She explains in the article how she found the problem: by reverse engineering the Apple driver to understand how they use the hardware. And what did she find there? A perfectly logical explanation that does not involve a completely unrelated part of the hardware that no one understands how to use in the way you describe.

At this point you are only embarrassing yourself by insisting on your theory despite everyone telling you that it makes no sense. If you want to prove us wrong, you need to do the same thing Alyssa did: reverse engineer the code from Apple and show us where they are using their neural network cores to optimize something. Good luck, because it’s very likely that they don’t.


@PulkoMandy, just one line of reasoning: the 3D graphics acceleration driver on macOS is Metal, right? Just do me a favor and watch the last video I posted on Blender, and I think you will understand everything, because I am convinced of what I say.

No, Metal is an API, not a driver.

There is a saying in German: if someone tells you you are a donkey, ignore them. If everyone says you are a donkey, find a barn.


Well, at least give me a little margin, since neither English nor German is my first language, and I have a hard time making my intuitions explicit while trying to avoid confusion.
It’s evident that I’m not too good at making myself understood.
It remains that I want to see the performance of Alyssa’s driver.
At least let me do that before accepting that I’m a donkey because many are saying it.

By the way, they are looking for driver engineers for their GPU; here are the requirements …


The Apple Silicon GPU Driver team needs engineers to design, develop, and support the Metal GPU drivers that power high-performance 3D rendering engines, neural networks and computational photography algorithms. We collaborate with industry specialists across Apple to design future generations of the Metal API, shading language, and Apple Silicon graphics processors. You will apply your knowledge of computer graphics and machine learning to implement the high performance software that drives Apple-designed GPUs. You will build expert-level Metal knowledge to guide Metal developers to tune their applications for maximum performance on Apple Silicon.

Well, it seems that ARM GPUs’ open drivers (and their troubles?) are on the scene this year!


The initial driver bug: this territory was covered by the GPU manufacturers around 20+ years ago. Most of it falls under GPU profiling, as opposed to AI or ML programming in the general sense; the other part falls under shader programming. A few books covered this topic as well (from 10–20+ years ago), and profiling tools for this type of driver work are available from the GPU manufacturers. The writer didn’t dwell on AI-related usage through deep GPU profiling (or pre-existing tools). So maybe the confusion among other readers/devs started here?

Metal API - it accesses the other functional areas (i.e. DirectX-like, ahem!), but the original topic wasn’t focused on any ‘wading or deep diving’ into AI or neural engines within the M1 (or Metal API usage of them).

Blender - OK… but this gets into application development features more so than driver development (or Metal API specifics (well, kinda (code-wise))).

Note: I used and evaluated the Mac Studio with Final Cut Pro, Blender, and a few other apps. :wink:


I have been away from this thread for a few days, to let the tension that was created dissipate.
It all started with my mistake of too lightly publishing personal opinions that I have assimilated over recent years according to my background knowledge. Your comment was less aggressive because, as you pointed out, you have a somewhat clearer picture of the impressive levels we have reached in terms of computational performance, which are perhaps most evident in photorealistic three-dimensional graphics, SFX, video editing, compositing and simulations …
Honestly, I didn’t expect so much hostility, but I should have foreseen it a little. Actually, here in this community, even though there are pioneers that I admire, there is also, of course, a good dose of conservatism and stagnation, which obviously reflects a certain refusal to see the real technological advances achieved.
In my first comment I posted a video demonstrating that, with the use of AI algorithms that managed performance in some way (perhaps eliminating bottlenecks?), while maintaining the same performance, the energy needs, the necessary GHz, and even the necessary cores were reduced to a quarter. These techniques were pioneered well before Apple announced the release of the M1.

I am convinced that Nvidia is using similar techniques in its GPUs to achieve such high performance, especially those that have Tensor Cores.
This is also why I think Nvidia decided to simplify things by releasing its open-source drivers, which have just been rewritten. It means that, as is obvious, the competition in basic performance (Vulkan, OpenGL, video management in general) is over; it is superfluous and expensive. But the part concerning the management of these neural modules in relation to the GPU is kept secret and precious, by putting all of that into the firmware.
Obviously I can’t prove all this, but I am convinced that in the next few years my “intuition”, let’s call it, will become clearer and clearer.


The problem isn’t a refusal to see technological advances, but rather that the developers here know on an architectural level how GPUs and their drivers work. The article talks about a problem and lays out the solution, and it has nothing whatsoever to do with what you brought up about AI cores.

Does Apple use AI cores for some stuff? Definitely.

Do they use them in any way related to the bug the article mentioned? Definitely not. Especially because the “bug” was in the open-source driver that is being developed, not in Apple’s code or hardware.


After the M2 announcement conference, with MetalFX upscaling technology also coming to the M1, I no longer have any doubt that they are using their neural modules to accelerate 3D graphics. It is the same type of technology that impressively accelerates modern rendering engines using deep neural networks (used not only by Apple but also by Nvidia and Intel; about AMD I’m not informed). Obviously I’m referring to Apple’s Metal and not the Linux drivers.
Boost performance with MetalFX Upscaling - WWDC22 - Videos - Apple Developer

Upscaling seems like a step backwards, honestly; if you can’t render your scene at the native resolution, maybe there is too much stuff in it. Pretending it is fine by upscaling it seems weird to me.

Well, MetalFX feels like oneAPI to me…

…but Intel’s is open source!


I do not want to contradict what you think, but I attached a video from Apple aimed at developers in my previous comment. I think you will change your mind after you take a look at it; we are talking about upscaling managed by artificial intelligence.

Exactly. Intel started all of this precisely with Open Image Denoise, a piece of oneAPI …
Then Nvidia arrived with the OptiX denoiser, though I’m not quite sure which came later.
Actually, Open Image Denoise should also be usable with Blender 3 on Haiku, since it runs on the Intel CPU (a little slower, depending on CPU generation) …

So? Clearly the scene is being made overly complex if the detail produced at a much lower resolution is “fine”. AI doesn’t produce more detail as such. If I ask you to redraw a blurred image, you can probably reproduce an image that is not blurred and has the same “content”, but there are countless variants you could draw: for a lower-entropy image there is no way to know or define what a higher-entropy or higher-resolution version of the same image would look like. This is basically what AI upscaling entails, and it is a huge step backwards for controlling the rendered output properly, IMO.
Apple already used techniques like this for post-processing their images; it may look “fine”, but the AI simply can’t add more detail. I have some nice examples on my phone where zoomed-in images have elements fused in ways that make no sense in reality (for example, a branch fused with a guy standing behind it).

In essence, the whole magic of “AI”, “neural engines”, etc. is this:
given f(x) = y with a set of known x and known y, produce a function that produces a “matching” y for unknown x.

This is mathematically useful, but the hype around everything being better with AI makes no sense to me.
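To make the f(x) = y idea above concrete, here is a toy sketch (my own illustration, not anything from the article): fitting a straight line to known (x, y) pairs by least squares, then using it to predict y for an unseen x. A neural network does the same thing with far more parameters and nonlinear functions, but the principle is identical.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a*x + b to known samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) / variance(x); intercept follows from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Known x and known y: the "training data" (here generated from y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)

# Produce a "matching" y for an unknown x.
print(a * 10.0 + b)  # → 21.0
```

The learned function is only as good as the data it was fitted to, which is exactly the limitation discussed above: it extrapolates, it doesn’t know.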

I have seen the WWDC keynote, yes, but this technology existed before Apple used it, for AMD and Nvidia GPUs too. I still don’t think it makes sense.

It does make sense in certain use cases. For example, in gaming, if you go VR, your render time instantly doubles because you have to render two images, each at roughly the resolution of your monitor. Rendering at a lower resolution and AI-upscaling allows you to hit the framerate target with little chance of somebody noticing (especially since HMDs all have blur one way or another).

The other use-case is allowing weaker systems to still enjoy a heavier scene. Here the user opts for slightly degraded image quality at the prospect of an acceptable framerate.

Another use-case is those going for 3–4 monitor gaming. Granted, these setups are “extreme”, but there you can use this technique too, to reduce the load on the GPU(s). In the end it’s about choice: quality versus performance.


That’s a use case for upscaling, but is it really a use case for throwing artificial intelligence at it? Does that really help?

Yes, sure; upscaling by setting the monitor to a lower resolution has already existed before. My point is that if you had designed the scene with the constraints in mind, you would get a better-quality picture within those constraints without upscaling. The whole “this improved your performance” part just doesn’t make sense to me.

Whether it helps can be debated. As mentioned above, you have limited information from which you extrapolate. A simple upscaler like the one I use in my game engine is just a bilinear (box) filter: the higher-resolution image is simply a blurrier version. This does not take any surrounding screen content into consideration, and thus the same render pixel will give you the same 2x2 upscaled pixels each time it runs. The AI one is trained on tons of actual in-game renderings. By taking surrounding pixels into account, the obtained result has a higher likelihood of being closer to the full-scale rendering. Of course this breaks down if the rendered game scene does not line up with the training data. Still, the chance of looking better than plain upscaling is higher.
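The “same render pixel gives the same 2x2 upscaled pixels” behavior described above can be sketched in a few lines. This is a hypothetical toy version (box/pixel-replication; a true bilinear filter would additionally blend neighboring pixels), not the poster’s actual engine code:

```python
def upscale_box_2x(img):
    """Naive 2x upscale: replicate each source pixel into a 2x2 block.

    `img` is a list of rows, each row a list of pixel values. The output
    depends only on the individual source pixel, never on its neighbors,
    which is why this kind of upscaler cannot add detail.
    """
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]  # duplicate horizontally
        out.append(wide)
        out.append(list(wide))                     # duplicate vertically
    return out

small = [[1, 2],
         [3, 4]]
print(upscale_box_2x(small))
# → [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

An AI upscaler replaces this fixed per-pixel rule with a function learned from full-resolution renderings, so the output for a pixel also depends on its surroundings.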

Now, that is the Nvidia way (DLSS). AMD went a different way with FSR 2 (open-sourced): it’s a temporal algorithm, so you look back at what you rendered moments ago and use that information to make a more educated guess about which upscaled pixels fit better. In the end it’s a similar approach: use more information on screen (or back in time) to raise the chance of getting a better result.
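The temporal idea above can be reduced to a toy core: blend the accumulated history with the current frame, pixel by pixel. This is only an illustrative sketch of the accumulation step; a real temporal upscaler like FSR 2 also reprojects the history using motion vectors and rejects stale pixels:

```python
def temporal_accumulate(history, current, alpha=0.1):
    """Per-pixel exponential moving average of accumulated history and the
    newly rendered frame. Both arguments are lists of rows of floats.
    A small alpha trusts history more; alpha=1.0 ignores history entirely."""
    return [[(1 - alpha) * h + alpha * c for h, c in zip(hist_row, cur_row)]
            for hist_row, cur_row in zip(history, current)]

# Over many frames the accumulated image converges toward the true signal,
# even though each individual frame is only a jittered low-resolution sample.
frame = [[0.0, 0.0]]
for _ in range(50):
    frame = temporal_accumulate(frame, [[1.0, 1.0]])
print(frame)  # values approach 1.0
```

The extra information “back in time” is what lets the temporal approach reconstruct detail that a single low-resolution frame does not contain, which is the similarity to the learned (DLSS-style) approach noted above.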