Vulkan lavapipe software rendering is working on Haiku

Absolutely amazing stuff. I agree with the overall sentiment that Vulkan is the only API that is necessary at the driver level. OpenGL and Direct3D can be used through wrappers.

Bundle Zink and DXVK with Haiku and the user won’t notice the difference.

5 Likes

Some progress report: I improved architecture by introducing reference counted GPU buffer class (BReference<BufferObject>) and GPU memory manager. Memory manager can allocate memory of 3 types: VRAM mappable to CPU, VRAM not mappable to CPU, CPU memory mapped to GPU (GTT). GTT buffer is based on Haiku area.

I implemented GART page table that maps CPU memory to GPU GTT memory range. I used Haiku Poke driver to get area pages physical address and write it to GART page table. So CPU mamory mapping is implemented completely in userland without special kernel driver. I tested that GPU DMA engine can write to GTT memory.

Also I implemented and tested indirect buffers (IB). Indirect buffers allows to execute commands on ring buffer without cpying it to ring buffer. Instead execute indirect buffer command is written to ring buffer with intirect buffer adress and size as parameter. Vulkan driver prepares and sends commands in indirect buffers.

I currently experience some instability problems: sometimes DMA write operations have no effect and sometimes DMA engine completely stops until reboot. I probably doing something wrong and miss some initialization code.

Linux driver have firmware files loaded during GPU initialization. I am not sure what that firmwares are doing exactly, I currently don’t use it.

GTT buffer test. bufAdr is Haiku area address mapped to GPU by GART and written by GPU DMA engine.

/dev/graphics/framebuffer
  signature: framebuffer.accelerant
/dev/graphics/intel_extreme_000200
  signature: intel_extreme.accelerant
/dev/graphics/radeon_hd_010000
  signature: radeon_hd.accelerant
  RADEON_GET_PRIVATE_DATA
gSharedInfo->frame_buffer_size: 0x40000
regs[DMA_CNTL]: 0x8210400
regs[DMA_RB_CNTL]: 0x1015
regs[DMA_IB_CNTL]: 0x1
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
regs[DMA_RB_RPTR_ADDR]: 0
rptr: 384
wptr: 384
disabling rings
init ring
regs[DMA_CNTL]: 0x8210400
regs[DMA_RB_CNTL]: 0x1015
regs[DMA_IB_CNTL]: 0x1
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
regs[DMA_RB_RPTR_ADDR]: 0
rptr: 0
wptr: 0
TestGart()
(1)
GartMap()
  0: 0x12e1b8000
  0x1000: 0x12e1ba000
  0x2000: 0x12e1bb000
  0x3000: 0x12e1bc000
  0x4000: 0x12e1bd000
  0x5000: 0x12e1be000
  0x6000: 0x12e1bf000
  0x7000: 0x12e1c0000
  0x8000: 0x12e1c1000
  0x9000: 0x12e1c2000
  0xa000: 0x12e1c3000
  0xb000: 0x12e1c4000
  0xc000: 0x12e1c5000
  0xd000: 0x12e1c6000
  0xe000: 0x12e1c7000
  0xf000: 0x12e1c8000
bufAdr: 0xa680c59000
buf->gpuPhysAdr: 0x80000000
(2)
(3)
(1) bufAdr[0]: -1
(1) bufAdr[1]: -1
(1) bufAdr[2]: -1
(1) bufAdr[3]: -1
(1) bufAdr[4]: -1
(1) bufAdr[5]: -1
(1) bufAdr[6]: -1
(1) bufAdr[7]: -1
[!] attempts expired when waiting fence
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
rptr: 192
wptr: 192
*fenceAdr: 1
fenceVal: 1
(2) bufAdr[0]: -1
(2) bufAdr[1]: -1
(2) bufAdr[2]: -1
(2) bufAdr[3]: -1
(2) bufAdr[4]: -1
(2) bufAdr[5]: -1
(2) bufAdr[6]: -1
(2) bufAdr[7]: -1
rptr: 192
wptr: 192
(1) bufAdr[8]: -1
(1) bufAdr[9]: -1
(1) bufAdr[10]: -1
(1) bufAdr[11]: -1
(1) bufAdr[12]: -1
(1) bufAdr[13]: -1
(1) bufAdr[14]: -1
(1) bufAdr[15]: -1
[!] attempts expired when waiting fence
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
rptr: 384
wptr: 384
*fenceAdr: 2
fenceVal: 2
(2) bufAdr[8]: -1
(2) bufAdr[9]: -1
(2) bufAdr[10]: -1
(2) bufAdr[11]: -1
(2) bufAdr[12]: -1
(2) bufAdr[13]: -1
(2) bufAdr[14]: -1
(2) bufAdr[15]: -1
rptr: 384
wptr: 384
(1) bufAdr[16]: -1
(1) bufAdr[17]: -1
(1) bufAdr[18]: -1
(1) bufAdr[19]: -1
(1) bufAdr[20]: -1
(1) bufAdr[21]: -1
(1) bufAdr[22]: -1
(1) bufAdr[23]: -1
(2) bufAdr[16]: 16
(2) bufAdr[17]: 17
(2) bufAdr[18]: 18
(2) bufAdr[19]: 19
(2) bufAdr[20]: 20
(2) bufAdr[21]: 21
(2) bufAdr[22]: 22
(2) bufAdr[23]: 23
rptr: 576
wptr: 576
(1) bufAdr[24]: -1
(1) bufAdr[25]: -1
(1) bufAdr[26]: -1
(1) bufAdr[27]: -1
(1) bufAdr[28]: -1
(1) bufAdr[29]: -1
(1) bufAdr[30]: -1
(1) bufAdr[31]: -1
(2) bufAdr[24]: 24
(2) bufAdr[25]: 25
(2) bufAdr[26]: 26
(2) bufAdr[27]: 27
(2) bufAdr[28]: 28
(2) bufAdr[29]: 29
(2) bufAdr[30]: 30
(2) bufAdr[31]: 31
rptr: 768
wptr: 768
31 Likes

Typically, the atombios headers are compiled into the driver, this pulls card information from gpu bios, which iirc, passes some binary info to the driver

Also my working knowledge is limited and 7years old, afaik fwiw

Probably best to pm john bridgman at phoronix.com

Also, well done

1 Like

I fixed DMA write instability, the reason was wrong ring buffer memory type, it should be GTT, not VRAM. VRAM was used because GTT memory was not implemented when ring buffer was implemented. Linux driver also use GTT memory for ring buffers. Now I can go next.

GTT memory is CPU memory mapped to GPU address space if someone don’t know yet. I use Haiku areas with locked pages and Poke driver to implement GTT memory for Radeon GPU. It is already working good enough.

28 Likes

That’s all I understand, but sounds good for me. So go ahead!
Nice to follow your progress.

3 Likes

Awesome, so what steps remain before some basic level of 3d accel ? Obviously the driver will be a wip in need of optimization etc, but just curious what the road ahead still holds ???

Also is there a git repo we can follow along with ?

I think the open threaded command submission model means that any app written using this can use available GPU resources. So the challenge is that apps must be re-written to be accelerated. Also, older, PCs won’t benefit. I’m guessing you need multiple CPUs on your GPU to get the acceleration.

Hardware rendering produce some output!

screenshot37

It should be gears. Probably something is wrong with buffer stride settings.

38 Likes

This is exciting news. I agree with you, it does look like stride settings, most likely when presenting to BView from Vulkan swap chain image. Keep in mind that the tiling format is implementation specific.

If you need a single vulkan based “Save Screenshot” function (from the swapchain image), have a look at https://github.com/SaschaWillems/Vulkan/blob/master/examples/screenshot/screenshot.cpp, the function saveScreenshot() on line 186. It can be added with almost no other dependancies. This function works if the swapchain image is created with VK_IMAGE_USAGE_TRANSFER_SRC_BIT flag.

Here is a self contained snip from my Engine using the above code.

/* PROJECT: Yarra
COPYRIGHT: 2017-2020, Zen Yes Pty Ltd, Australia
AUTHORS: Zenja Solaja
DESCRIPTION: https://github.com/SaschaWillems/Vulkan/blob/master/examples/screenshot/screenshot.cpp
Take a screenshot from the current swapchain image
This is done using a blit from the swapchain image to a linear image whose memory content is then saved as a ppm image
Getting the image date directly from a swapchain image wouldn’t work as they’re usually stored in an implementation dependent optimal tiling format
Note: This requires the swapchain images to be created with the VK_IMAGE_USAGE_TRANSFER_SRC_BIT flag (see VulkanSwapChain::create)
*/

#include
#include

#include “stb/stb_image_write.h”
#include “webp/webp/encode.h”
#include “Yarra/Render/RenderManager.h”

namespace yrender
{

static VkImageMemoryBarrier imageMemoryBarrier()
{
VkImageMemoryBarrier imageMemoryBarrier {};
imageMemoryBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
imageMemoryBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
imageMemoryBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
return imageMemoryBarrier;
}
static void insertImageMemoryBarrier(VkCommandBuffer cmdbuffer, VkImage image, VkAccessFlags srcAccessMask, VkAccessFlags dstAccessMask,
VkImageLayout oldImageLayout, VkImageLayout newImageLayout,
VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask,
VkImageSubresourceRange subresourceRange)
{
VkImageMemoryBarrier _imageMemoryBarrier = imageMemoryBarrier();
_imageMemoryBarrier.srcAccessMask = srcAccessMask;
_imageMemoryBarrier.dstAccessMask = dstAccessMask;
_imageMemoryBarrier.oldLayout = oldImageLayout;
_imageMemoryBarrier.newLayout = newImageLayout;
_imageMemoryBarrier.image = image;
_imageMemoryBarrier.subresourceRange = subresourceRange;

vkCmdPipelineBarrier(
	cmdbuffer,
	srcStageMask,
	dstStageMask,
	0,
	0, nullptr,
	0, nullptr,
	1, &_imageMemoryBarrier);

}

/* FUNCTION: RenderManager :: SaveScreenshot
ARGUMENTS: filename (no extension)
encode_format
RETURN: n/a
DESCRIPTION: Save screenshot
*/
void RenderManager :: SaveScreenshot(const char *filename, EncodeFormat encode_format)
{
#if VULKAN_ALLOW_SCREENSHOT
bool supportsBlit = true;

// Check blit support for source and destination

// Check if the device supports blitting from optimal images (the swapchain images are in optimal format)
vk::FormatProperties formatProps = fVulkanPhysicalDevice.getFormatProperties(fVulkanSwapChainFormat);
if (!(formatProps.optimalTilingFeatures & vk::FormatFeatureFlagBits::eBlitSrc))
{
	yplatform::Debug("[SaveScreenshot] Device does not support blitting from optimal tiled images, using copy instead of blit!\n");
	supportsBlit = false;
}

// Check if the device supports blitting to linear images
formatProps = fVulkanPhysicalDevice.getFormatProperties(vk::Format::eR8G8B8A8Unorm);
if (!(formatProps.linearTilingFeatures & vk::FormatFeatureFlagBits::eBlitDst))
{
	yplatform::Debug("[SaveScreenshot] Device does not support blitting to linear tiled images, using copy instead of blit!\n");
	supportsBlit = false;
}

// Source for the copy is the last rendered swapchain image
vk::Image srcImage = fVulkanSwapChainImages[fCurrentFrameIndex];

// Create the linear tiled destination image to copy to and to read the memory from
vk::ImageCreateInfo imageCreateCI;
imageCreateCI.imageType = vk::ImageType::e2D;
// Note that vkCmdBlitImage (if supported) will also do format conversions if the swapchain color format would differ
imageCreateCI.format = vk::Format::eR8G8B8A8Unorm;
imageCreateCI.extent.width = fVulkanSwapChainExtent.width;
imageCreateCI.extent.height = fVulkanSwapChainExtent.height;
imageCreateCI.extent.depth = 1;
imageCreateCI.arrayLayers = 1;
imageCreateCI.mipLevels = 1;
imageCreateCI.initialLayout = vk::ImageLayout::eUndefined;
imageCreateCI.samples = vk::SampleCountFlagBits::e1;
imageCreateCI.tiling = vk::ImageTiling::eLinear;
imageCreateCI.usage = vk::ImageUsageFlagBits::eTransferDst;
// Create the image
vk::Image dstImage = fVulkanDevice.createImage(imageCreateCI, nullptr);
// Create memory to back up the image
vk::MemoryRequirements memRequirements = fVulkanDevice.getImageMemoryRequirements(dstImage);
vk::MemoryAllocateInfo memAllocInfo;
memAllocInfo.allocationSize = memRequirements.size;
// Memory must be host visible to copy from
memAllocInfo.memoryTypeIndex = FindMemoryType(memRequirements.memoryTypeBits, vk::MemoryPropertyFlagBits::eHostVisible | vk::MemoryPropertyFlagBits::eHostCoherent);
vk::DeviceMemory dstImageMemory = fVulkanDevice.allocateMemory(memAllocInfo, nullptr);
fVulkanDevice.bindImageMemory(dstImage, dstImageMemory, 0);

// Do the actual blit from the swapchain image to our host visible destination image
//vk::CommandBuffer copyCmd = vulkanDevice->createCommandBuffer(VK_COMMAND_BUFFER_LEVEL_PRIMARY, true);
vk::CommandBufferAllocateInfo allocInfo;
allocInfo.level = vk::CommandBufferLevel::ePrimary;
allocInfo.commandPool = fVulkanCommandPool;
allocInfo.commandBufferCount = 1;

vk::CommandBuffer copyCmd;
vk::Result res = fVulkanDevice.allocateCommandBuffers(&allocInfo, &copyCmd);
if (res != vk::Result::eSuccess)
	throw std::runtime_error(vk::to_string(res));
vk::CommandBufferBeginInfo beginInfo;
copyCmd.begin(beginInfo);

// Transition destination image to transfer destination layout
insertImageMemoryBarrier(
		copyCmd,
		dstImage,
		0,
		VK_ACCESS_TRANSFER_WRITE_BIT,
		VK_IMAGE_LAYOUT_UNDEFINED,
		VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

// Transition swapchain image from present to transfer source layout
insertImageMemoryBarrier(
		copyCmd,
		srcImage,
		VK_ACCESS_MEMORY_READ_BIT,
		VK_ACCESS_TRANSFER_READ_BIT,
		VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
		VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

// If source and destination support blit we'll blit as this also does automatic format conversion (e.g. from BGR to RGB)
if (supportsBlit)
{
	// Define the region to blit (we will blit the whole swapchain image)
	VkOffset3D blitSize;
	blitSize.x = fVulkanSwapChainExtent.width;
	blitSize.y = fVulkanSwapChainExtent.height;
	blitSize.z = 1;
	VkImageBlit imageBlitRegion{};
	imageBlitRegion.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageBlitRegion.srcSubresource.layerCount = 1;
	imageBlitRegion.srcOffsets[1] = blitSize;
	imageBlitRegion.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageBlitRegion.dstSubresource.layerCount = 1;
	imageBlitRegion.dstOffsets[1] = blitSize;

	// Issue the blit command
	vkCmdBlitImage(
		copyCmd,
		srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		dstImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		1,
		&imageBlitRegion,
		VK_FILTER_NEAREST);
}
else
{
	// Otherwise use image copy (requires us to manually flip components)
	VkImageCopy imageCopyRegion{};
	imageCopyRegion.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageCopyRegion.srcSubresource.layerCount = 1;
	imageCopyRegion.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageCopyRegion.dstSubresource.layerCount = 1;
	imageCopyRegion.extent.width = fVulkanSwapChainExtent.width;
	imageCopyRegion.extent.height = fVulkanSwapChainExtent.height;
	imageCopyRegion.extent.depth = 1;

	// Issue the copy command
	vkCmdCopyImage(
		copyCmd,
		srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		dstImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		1,
		&imageCopyRegion);
}

// Transition destination image to general layout, which is the required layout for mapping the image memory later on
insertImageMemoryBarrier(
		copyCmd,
		dstImage,
		VK_ACCESS_TRANSFER_WRITE_BIT,
		VK_ACCESS_MEMORY_READ_BIT,
		VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		VK_IMAGE_LAYOUT_GENERAL,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

// Transition back the swap chain image after the blit is done
insertImageMemoryBarrier(
		copyCmd,
		srcImage,
		VK_ACCESS_TRANSFER_READ_BIT,
		VK_ACCESS_MEMORY_READ_BIT,
		VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

copyCmd.end();
vk::SubmitInfo submitInfo;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &copyCmd;
vk::FenceCreateInfo fenceInfo;
vk::Fence fence = fVulkanDevice.createFence(fenceInfo, nullptr);

res = fVulkanGraphicsQueue.submit(1, &submitInfo, nullptr);
if (res != vk::Result::eSuccess)
	throw std::runtime_error(vk::to_string(res));
fVulkanGraphicsQueue.waitIdle();
fVulkanDevice.destroyFence(fence);
fVulkanDevice.freeCommandBuffers(fVulkanCommandPool, 1, &copyCmd);


// Get layout of the image (including row pitch)
VkImageSubresource subResource { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0 };
VkSubresourceLayout subResourceLayout = fVulkanDevice.getImageSubresourceLayout(dstImage, subResource);

// Map image memory so we can start copying from it
const char* data;
vkMapMemory(fVulkanDevice, dstImageMemory, 0, VK_WHOLE_SIZE, 0, (void**)&data);
data += subResourceLayout.offset;

std::string name(filename);

if (encode_format == EncodeFormat::ePng)
{
	name.append(".png");
	stbi_write_png(name.c_str(), fVulkanSwapChainExtent.width, fVulkanSwapChainExtent.height, 4, data, (int) subResourceLayout.rowPitch);
}
else if (encode_format == EncodeFormat::eWebp)
{
	name.append(".webp");
	uint8_t *output;
	size_t sz = WebPEncodeRGBA((const uint8_t *)data, fVulkanSwapChainExtent.width, fVulkanSwapChainExtent.height, (int) subResourceLayout.rowPitch, 100, &output);
	std::ofstream file(name.c_str(), std::ios::out | std::ios::binary);
	file.write((char *)output, sz);
	file.close();
	WebPFree(output);

}
else
{
	name.append(".ppm");
	std::ofstream file(name.c_str(), std::ios::out | std::ios::binary);

	// ppm header
	file << "P6\n" << fVulkanSwapChainExtent.width << "\n" << fVulkanSwapChainExtent.height << "\n" << 255 << "\n";

	// If source is BGR (destination is always RGB) and we can't use blit (which does automatic conversion), we'll have to manually swizzle color components
	bool colorSwizzle = false;
	// Check if source is BGR
	// Note: Not complete, only contains most common and basic BGR surface formats for demonstration purposes
	if (!supportsBlit)
	{
		std::vector<vk::Format> formatsBGR = { vk::Format::eB8G8R8A8Srgb, vk::Format::eB8G8R8A8Unorm, vk::Format::eB8G8R8A8Snorm};
		colorSwizzle = (std::find(formatsBGR.begin(), formatsBGR.end(), fVulkanSwapChainFormat) != formatsBGR.end());
	}
	// ppm binary pixel data
	for (uint32_t y = 0; y < fVulkanSwapChainExtent.height; y++)
	{
		unsigned int *row = (unsigned int*)data;
		for (uint32_t x = 0; x < fVulkanSwapChainExtent.width; x++)
		{
			if (colorSwizzle)
			{
				file.write((char*)row+2, 1);
				file.write((char*)row+1, 1);
				file.write((char*)row, 1);
			}
			else
			{
				file.write((char*)row, 3);
			}
			row++;
		}
		data += subResourceLayout.rowPitch;
	}
	file.close();
}

std::cout << "Screenshot saved to disk" << std::endl;

// Clean up resources
vkUnmapMemory(fVulkanDevice, dstImageMemory);
vkFreeMemory(fVulkanDevice, dstImageMemory, nullptr);
vkDestroyImage(fVulkanDevice, dstImage, nullptr);

#endif
}

}; // namespace yrender

8 Likes

I love watching this thread. Seeing this progress a piece at a time is so cool.

7 Likes

Same. Just like the RiscV64 thread I’m coming here just to see if there is an update. Happy to see it’s getting closer to hardware accelerated support :slight_smile:

Actually, this is gears. I can see blue gear connected with red gear, which is connected with green gear. The only problem is wrong resolution on vertical / horizontal axes. Something like rasterization assumes 640 pixels width but is drawn on 1920 pixels (so, 3+ rows into one) then leaving 3+ rows blank (black).

4 Likes

Don’t forgot about donating to x512: PayPal.Me

6 Likes

Correct output is produced when using saveScreenshot() function from screenshot sample. It currenly produce to file, not window. FPS is low because WSI module is doing copy from VRAM to CPU memory by memcpy() that is terribly slow, DMA engine should be used.

screenshot30

Posted by Falkon and QtWebEngine.

27 Likes

Hi, Glad to see you post again :wink:
Please checkout the engine’s stride restrictions on the card, and compare that to the mode stride being set by the haiku driver: it might well be that the haiku driver needs to be updated to take the engine restrictions into account (by adding / modifying the slopspace).

As a intermediate test you can try if another horizontal resolution mode would work correctly. If it does, that’s a hint in this direction.

Congrats on this very important step!

EDIT: also a higher colordepth can help: on some engines the stride granularity is not in pixels, but in bytes…

8 Likes

Graphics card is currently working only for acceleration, display output is not used. Display is connected to embedded Intel GPU. Render output is copied to CPU memory and displayed as regular BBitmap. Vulkan API have ability to convert buffer to linear format and copy it to CPU memory.

2 Likes

Ah, OK. So the software copy to the screen is not the problem if all is right then. Still, the viewport might? But I’d assume that’s set by the same driver as does the 3d accel calcs… Anyhow, you need to copy line by line I’d guess to not be fooled by differences in the granularity at some point in the ‘line’.

Problem is already solved by using Vulkan API to copy video buffer to CPU memory instead of memcpy() that not takes tiling in account.

5 Likes

Still, a screenshot to file does not have slopspace per definition I’d say, while output on some desktop often does have this I think.

What does it mean? “Slopspace”.