Vulkan lavapipe software rendering is working on Haiku

This is exciting news. I agree with you: it does look like a stride problem, most likely when presenting the Vulkan swapchain image to a BView. Keep in mind that the tiling format is implementation-specific.

If you need a single Vulkan-based “Save Screenshot” function (from the swapchain image), have a look at https://github.com/SaschaWillems/Vulkan/blob/master/examples/screenshot/screenshot.cpp, specifically the function saveScreenshot() on line 186. It can be added with almost no other dependencies. This function works if the swapchain image is created with the VK_IMAGE_USAGE_TRANSFER_SRC_BIT flag.

Here is a self-contained snippet from my engine using the above code.

/* PROJECT: Yarra
COPYRIGHT: 2017-2020, Zen Yes Pty Ltd, Australia
AUTHORS: Zenja Solaja
DESCRIPTION: https://github.com/SaschaWillems/Vulkan/blob/master/examples/screenshot/screenshot.cpp
Take a screenshot from the current swapchain image
This is done using a blit from the swapchain image to a linear image whose memory content is then saved as a ppm image
Getting the image data directly from a swapchain image wouldn't work as they're usually stored in an implementation dependent optimal tiling format
Note: This requires the swapchain images to be created with the VK_IMAGE_USAGE_TRANSFER_SRC_BIT flag (see VulkanSwapChain::create)
*/

#include <algorithm>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

#include "stb/stb_image_write.h"
#include "webp/webp/encode.h"
#include "Yarra/Render/RenderManager.h"

namespace yrender
{

static VkImageMemoryBarrier imageMemoryBarrier()
{
VkImageMemoryBarrier imageMemoryBarrier {};
imageMemoryBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
imageMemoryBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
imageMemoryBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
return imageMemoryBarrier;
}
static void insertImageMemoryBarrier(VkCommandBuffer cmdbuffer, VkImage image, VkAccessFlags srcAccessMask, VkAccessFlags dstAccessMask,
VkImageLayout oldImageLayout, VkImageLayout newImageLayout,
VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask,
VkImageSubresourceRange subresourceRange)
{
VkImageMemoryBarrier _imageMemoryBarrier = imageMemoryBarrier();
_imageMemoryBarrier.srcAccessMask = srcAccessMask;
_imageMemoryBarrier.dstAccessMask = dstAccessMask;
_imageMemoryBarrier.oldLayout = oldImageLayout;
_imageMemoryBarrier.newLayout = newImageLayout;
_imageMemoryBarrier.image = image;
_imageMemoryBarrier.subresourceRange = subresourceRange;

vkCmdPipelineBarrier(
	cmdbuffer,
	srcStageMask,
	dstStageMask,
	0,
	0, nullptr,
	0, nullptr,
	1, &_imageMemoryBarrier);

}

/* FUNCTION: RenderManager :: SaveScreenshot
ARGUMENTS: filename (no extension)
encode_format
RETURN: n/a
DESCRIPTION: Save screenshot
*/
void RenderManager :: SaveScreenshot(const char *filename, EncodeFormat encode_format)
{
#if VULKAN_ALLOW_SCREENSHOT
bool supportsBlit = true;

// Check blit support for source and destination

// Check if the device supports blitting from optimal images (the swapchain images are in optimal format)
vk::FormatProperties formatProps = fVulkanPhysicalDevice.getFormatProperties(fVulkanSwapChainFormat);
if (!(formatProps.optimalTilingFeatures & vk::FormatFeatureFlagBits::eBlitSrc))
{
	yplatform::Debug("[SaveScreenshot] Device does not support blitting from optimal tiled images, using copy instead of blit!\n");
	supportsBlit = false;
}

// Check if the device supports blitting to linear images
formatProps = fVulkanPhysicalDevice.getFormatProperties(vk::Format::eR8G8B8A8Unorm);
if (!(formatProps.linearTilingFeatures & vk::FormatFeatureFlagBits::eBlitDst))
{
	yplatform::Debug("[SaveScreenshot] Device does not support blitting to linear tiled images, using copy instead of blit!\n");
	supportsBlit = false;
}

// Source for the copy is the last rendered swapchain image
vk::Image srcImage = fVulkanSwapChainImages[fCurrentFrameIndex];

// Create the linear tiled destination image to copy to and to read the memory from
vk::ImageCreateInfo imageCreateCI;
imageCreateCI.imageType = vk::ImageType::e2D;
// Note that vkCmdBlitImage (if supported) will also do format conversions if the swapchain color format would differ
imageCreateCI.format = vk::Format::eR8G8B8A8Unorm;
imageCreateCI.extent.width = fVulkanSwapChainExtent.width;
imageCreateCI.extent.height = fVulkanSwapChainExtent.height;
imageCreateCI.extent.depth = 1;
imageCreateCI.arrayLayers = 1;
imageCreateCI.mipLevels = 1;
imageCreateCI.initialLayout = vk::ImageLayout::eUndefined;
imageCreateCI.samples = vk::SampleCountFlagBits::e1;
imageCreateCI.tiling = vk::ImageTiling::eLinear;
imageCreateCI.usage = vk::ImageUsageFlagBits::eTransferDst;
// Create the image
vk::Image dstImage = fVulkanDevice.createImage(imageCreateCI, nullptr);
// Create memory to back up the image
vk::MemoryRequirements memRequirements = fVulkanDevice.getImageMemoryRequirements(dstImage);
vk::MemoryAllocateInfo memAllocInfo;
memAllocInfo.allocationSize = memRequirements.size;
// Memory must be host visible to copy from
memAllocInfo.memoryTypeIndex = FindMemoryType(memRequirements.memoryTypeBits, vk::MemoryPropertyFlagBits::eHostVisible | vk::MemoryPropertyFlagBits::eHostCoherent);
vk::DeviceMemory dstImageMemory = fVulkanDevice.allocateMemory(memAllocInfo, nullptr);
fVulkanDevice.bindImageMemory(dstImage, dstImageMemory, 0);

// Do the actual blit from the swapchain image to our host visible destination image
//vk::CommandBuffer copyCmd = vulkanDevice->createCommandBuffer(VK_COMMAND_BUFFER_LEVEL_PRIMARY, true);
vk::CommandBufferAllocateInfo allocInfo;
allocInfo.level = vk::CommandBufferLevel::ePrimary;
allocInfo.commandPool = fVulkanCommandPool;
allocInfo.commandBufferCount = 1;

vk::CommandBuffer copyCmd;
vk::Result res = fVulkanDevice.allocateCommandBuffers(&allocInfo, &copyCmd);
if (res != vk::Result::eSuccess)
	throw std::runtime_error(vk::to_string(res));
vk::CommandBufferBeginInfo beginInfo;
copyCmd.begin(beginInfo);

// Transition destination image to transfer destination layout
insertImageMemoryBarrier(
		copyCmd,
		dstImage,
		0,
		VK_ACCESS_TRANSFER_WRITE_BIT,
		VK_IMAGE_LAYOUT_UNDEFINED,
		VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

// Transition swapchain image from present to transfer source layout
insertImageMemoryBarrier(
		copyCmd,
		srcImage,
		VK_ACCESS_MEMORY_READ_BIT,
		VK_ACCESS_TRANSFER_READ_BIT,
		VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
		VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

// If source and destination support blit we'll blit as this also does automatic format conversion (e.g. from BGR to RGB)
if (supportsBlit)
{
	// Define the region to blit (we will blit the whole swapchain image)
	VkOffset3D blitSize;
	blitSize.x = fVulkanSwapChainExtent.width;
	blitSize.y = fVulkanSwapChainExtent.height;
	blitSize.z = 1;
	VkImageBlit imageBlitRegion{};
	imageBlitRegion.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageBlitRegion.srcSubresource.layerCount = 1;
	imageBlitRegion.srcOffsets[1] = blitSize;
	imageBlitRegion.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageBlitRegion.dstSubresource.layerCount = 1;
	imageBlitRegion.dstOffsets[1] = blitSize;

	// Issue the blit command
	vkCmdBlitImage(
		copyCmd,
		srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		dstImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		1,
		&imageBlitRegion,
		VK_FILTER_NEAREST);
}
else
{
	// Otherwise use image copy (requires us to manually flip components)
	VkImageCopy imageCopyRegion{};
	imageCopyRegion.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageCopyRegion.srcSubresource.layerCount = 1;
	imageCopyRegion.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageCopyRegion.dstSubresource.layerCount = 1;
	imageCopyRegion.extent.width = fVulkanSwapChainExtent.width;
	imageCopyRegion.extent.height = fVulkanSwapChainExtent.height;
	imageCopyRegion.extent.depth = 1;

	// Issue the copy command
	vkCmdCopyImage(
		copyCmd,
		srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		dstImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		1,
		&imageCopyRegion);
}

// Transition destination image to general layout, which is the required layout for mapping the image memory later on
insertImageMemoryBarrier(
		copyCmd,
		dstImage,
		VK_ACCESS_TRANSFER_WRITE_BIT,
		VK_ACCESS_MEMORY_READ_BIT,
		VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		VK_IMAGE_LAYOUT_GENERAL,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

// Transition back the swap chain image after the blit is done
insertImageMemoryBarrier(
		copyCmd,
		srcImage,
		VK_ACCESS_TRANSFER_READ_BIT,
		VK_ACCESS_MEMORY_READ_BIT,
		VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

copyCmd.end();
vk::SubmitInfo submitInfo;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &copyCmd;
vk::FenceCreateInfo fenceInfo;
vk::Fence fence = fVulkanDevice.createFence(fenceInfo, nullptr);

res = fVulkanGraphicsQueue.submit(1, &submitInfo, nullptr);
if (res != vk::Result::eSuccess)
	throw std::runtime_error(vk::to_string(res));
fVulkanGraphicsQueue.waitIdle();
fVulkanDevice.destroyFence(fence);
fVulkanDevice.freeCommandBuffers(fVulkanCommandPool, 1, &copyCmd);


// Get layout of the image (including row pitch)
VkImageSubresource subResource { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0 };
VkSubresourceLayout subResourceLayout = fVulkanDevice.getImageSubresourceLayout(dstImage, subResource);

// Map image memory so we can start copying from it
const char* data;
vkMapMemory(fVulkanDevice, dstImageMemory, 0, VK_WHOLE_SIZE, 0, (void**)&data);
data += subResourceLayout.offset;

std::string name(filename);

if (encode_format == EncodeFormat::ePng)
{
	name.append(".png");
	stbi_write_png(name.c_str(), fVulkanSwapChainExtent.width, fVulkanSwapChainExtent.height, 4, data, (int) subResourceLayout.rowPitch);
}
else if (encode_format == EncodeFormat::eWebp)
{
	name.append(".webp");
	uint8_t *output;
	size_t sz = WebPEncodeRGBA((const uint8_t *)data, fVulkanSwapChainExtent.width, fVulkanSwapChainExtent.height, (int) subResourceLayout.rowPitch, 100, &output);
	std::ofstream file(name.c_str(), std::ios::out | std::ios::binary);
	file.write((char *)output, sz);
	file.close();
	WebPFree(output);

}
else
{
	name.append(".ppm");
	std::ofstream file(name.c_str(), std::ios::out | std::ios::binary);

	// ppm header
	file << "P6\n" << fVulkanSwapChainExtent.width << "\n" << fVulkanSwapChainExtent.height << "\n" << 255 << "\n";

	// If source is BGR (destination is always RGB) and we can't use blit (which does automatic conversion), we'll have to manually swizzle color components
	bool colorSwizzle = false;
	// Check if source is BGR
	// Note: Not complete, only contains most common and basic BGR surface formats for demonstration purposes
	if (!supportsBlit)
	{
		std::vector<vk::Format> formatsBGR = { vk::Format::eB8G8R8A8Srgb, vk::Format::eB8G8R8A8Unorm, vk::Format::eB8G8R8A8Snorm};
		colorSwizzle = (std::find(formatsBGR.begin(), formatsBGR.end(), fVulkanSwapChainFormat) != formatsBGR.end());
	}
	// ppm binary pixel data
	for (uint32_t y = 0; y < fVulkanSwapChainExtent.height; y++)
	{
		unsigned int *row = (unsigned int*)data;
		for (uint32_t x = 0; x < fVulkanSwapChainExtent.width; x++)
		{
			if (colorSwizzle)
			{
				file.write((char*)row+2, 1);
				file.write((char*)row+1, 1);
				file.write((char*)row, 1);
			}
			else
			{
				file.write((char*)row, 3);
			}
			row++;
		}
		data += subResourceLayout.rowPitch;
	}
	file.close();
}

std::cout << "Screenshot saved to disk" << std::endl;

// Clean up resources
vkUnmapMemory(fVulkanDevice, dstImageMemory);
vkFreeMemory(fVulkanDevice, dstImageMemory, nullptr);
vkDestroyImage(fVulkanDevice, dstImage, nullptr);

#endif
}

} // namespace yrender

I love watching this thread. Seeing this progress a piece at a time is so cool.

Same. Just like the RiscV64 thread I’m coming here just to see if there is an update. Happy to see it’s getting closer to hardware accelerated support :slight_smile:

Actually, these are gears. I can see the blue gear meshed with the red gear, which is meshed with the green gear. The only problem is the wrong resolution on the vertical and horizontal axes. It looks as if rasterization assumes a width of 640 pixels but the result is drawn 1920 pixels wide (so 3+ rows are packed into one), which then leaves 3+ rows blank (black).
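To make that guess concrete, here is a minimal sketch of what happens when a producer and a consumer disagree about the row stride of the same linear buffer. The function name `apparentPosition` and the struct `PixelPos` are made up for illustration, strides are assumed to be in pixels, and 640 and 1920 are just the numbers guessed above, not measured values.

```cpp
#include <cassert>
#include <cstdint>

// Where a pixel ends up from the consumer's point of view.
struct PixelPos {
    std::uint32_t x;
    std::uint32_t y;
};

// The producer writes pixel (x, y) assuming `writeStride` pixels per row.
// The consumer reads the same linear buffer assuming `readStride` pixels
// per row. Returns the position at which the consumer sees that pixel.
inline PixelPos apparentPosition(std::uint32_t x, std::uint32_t y,
                                 std::uint32_t writeStride,
                                 std::uint32_t readStride)
{
    assert(writeStride > 0 && readStride > 0 && x < writeStride);
    const std::uint64_t linearIndex = std::uint64_t(y) * writeStride + x;
    return PixelPos{ std::uint32_t(linearIndex % readStride),
                     std::uint32_t(linearIndex / readStride) };
}
```

With a write stride of 640 and a read stride of 1920, source rows 0, 1 and 2 all land on display row 0, and source row 3 starts display row 1, which matches the "three rows into one" look.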

Don’t forget about donating to x512: PayPal.Me

Correct output is produced when using the saveScreenshot() function from the screenshot sample. It currently outputs to a file, not to a window. FPS is low because the WSI module copies from VRAM to CPU memory with memcpy(), which is terribly slow; the DMA engine should be used instead.

[Image: screenshot30]

Posted by Falkon and QtWebEngine.

Hi, Glad to see you post again :wink:
Please check out the engine’s stride restrictions on the card and compare them to the mode stride being set by the Haiku driver. It might well be that the Haiku driver needs to be updated to take the engine restrictions into account (by adding or modifying the slopspace).

As an intermediate test, you can check whether another horizontal resolution mode works correctly. If it does, that’s a hint in this direction.

Congrats on this very important step!

EDIT: also a higher colordepth can help: on some engines the stride granularity is not in pixels, but in bytes…

The graphics card is currently used only for acceleration; its display output is not used. The display is connected to the embedded Intel GPU. Render output is copied to CPU memory and displayed as a regular BBitmap. The Vulkan API is able to convert a buffer to linear format and copy it to CPU memory.

Ah, OK. So the software copy to the screen is not the problem, if all is right. Still, might the viewport be? But I’d assume that’s set by the same driver that does the 3D acceleration calculations… Anyhow, I’d guess you need to copy line by line so as not to be fooled by differences in granularity at some point in the ‘line’.
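As a rough sketch of the line-by-line copy, assuming both buffers are linear and only differ in pitch (bytes from the start of one row to the start of the next): the helper name `copyRows` and its parameters are hypothetical, not taken from any driver. Note that this only deals with row padding. It does nothing for tiled memory, which has to be linearized by the API first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `rows` rows of `rowBytes` visible bytes each. The source rows start
// `srcPitch` bytes apart and the destination rows `dstPitch` bytes apart,
// so any padding at the end of a row is skipped rather than copied.
inline void copyRows(const std::uint8_t* src, std::size_t srcPitch,
                     std::uint8_t* dst, std::size_t dstPitch,
                     std::size_t rowBytes, std::size_t rows)
{
    assert(rowBytes <= srcPitch && rowBytes <= dstPitch);
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
    }
}
```

A single memcpy() of the whole buffer is only equivalent when both pitches are equal; as soon as they differ, every row after the first is shifted.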

The problem is already solved by using the Vulkan API to copy the video buffer to CPU memory instead of memcpy(), which does not take tiling into account.

Still, I’d say a screenshot written to a file has no slopspace by definition, while output to a desktop often does, I think.

What does “slopspace” mean?

Slopspace is unused space at the right side of a screen. If your desktop has a resolution that’s not divisible by the engine granularity, the desktop is created ‘oversized’ (horizontally) to make it workable for the engine, and also for CRTCs to do their ‘burst’-like accesses.
The DAC output (or, these days, the digital link) sends different timing than the desktop is configured with, so to the user it seems the requested resolution is set, while in fact it is not.

Different parts of graphics cards have different restrictions:

  • CRTC (horizontally): often 8 pixels, but frequently more
  • 2D engine
  • 3D engine (could be the same engine)
  • backend scaler
  • hardware colorspace converters

The trick is to find the largest granularity among the components in use and use that to set the mode; otherwise things will fail. (For instance, VLC got this wrong a long time ago: I had to send a patch specifically for NVIDIA graphics cards, since these had a larger granularity than anything they had seen before.)
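The rule above could be sketched roughly like this. The names `roundUp` and `paddedWidth` are invented for the example, granularities are assumed to be in pixels, and the granularity values in the usage note are placeholders rather than numbers for any real card. Taking the largest value is only sufficient if every smaller granularity divides it (true when they are all powers of two); otherwise the least common multiple would be needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Round `width` up to the next multiple of `granularity`.
inline std::uint32_t roundUp(std::uint32_t width, std::uint32_t granularity)
{
    assert(granularity > 0);
    return (width + granularity - 1) / granularity * granularity;
}

// Width the mode should really be created with, given the granularity
// of every unit that touches the frame buffer (CRTC, 2D engine, ...).
inline std::uint32_t paddedWidth(std::uint32_t width,
                                 std::initializer_list<std::uint32_t> granularities)
{
    assert(granularities.size() > 0);
    const std::uint32_t largest = std::max(granularities);
    return roundUp(width, largest);
}
```

For example, `paddedWidth(1366, {8, 16, 64})` gives 1408, so the last 42 pixels of every row would be slopspace, while `paddedWidth(1920, {8, 64})` stays at 1920.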

@x512: by the way, you can see the amount of slopspace set for a mode by comparing the mode’s ‘BytesPerRow’ with the set ‘horizontal resolution * ceil(colordepth div 8)’.
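Spelled out as code, the comparison might look like the sketch below. The function names are made up, the color depth is assumed to be given in bits per pixel, and the bytes-per-row value is whatever the mode reports.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one pixel: ceil(colorDepthBits / 8).
inline std::uint32_t bytesPerPixel(std::uint32_t colorDepthBits)
{
    return (colorDepthBits + 7) / 8;
}

// Slopspace per row in bytes: the row length reported for the mode minus
// the bytes actually needed for the visible pixels.
inline std::uint32_t slopBytesPerRow(std::uint32_t bytesPerRow,
                                     std::uint32_t width,
                                     std::uint32_t colorDepthBits)
{
    const std::uint32_t visibleBytes = width * bytesPerPixel(colorDepthBits);
    assert(bytesPerRow >= visibleBytes);
    return bytesPerRow - visibleBytes;
}
```

A result of 0 means the mode has no slopspace; anything else is padding that a plain full-buffer copy would wrongly treat as pixels.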

I wish there was another way than PayPal. I just don’t do PayPal.

If you’re okay with donating to the project as a whole, Haiku, Inc. does accept Liberapay (an alternative to PayPal and Patreon), checks, and bitcoin as alternative payment methods.
More information here: Please make a tax-deductible gift today. - Haiku, Inc.

Wow, any idea about the path to direct card output for rendering? I see the first step for GPU compute is here.

Awesome progress

I have a regular donation going to Haiku as a whole. I just don’t use PayPal, and that means donating directly to x512 for this specific work is… well… I would love to but just can’t. I do, however, really appreciate him taking on this task.

Hmm, @X512, would you be willing to set up a Liberapay account for this case? https://liberapay.com/

That way the people who don’t use PayPal have an alternative platform they can donate via if they choose to directly support you.

More samples are working. The memory copy speed problem is not yet fixed, so there is a limit of ~11 FPS.

[Images: screenshot38, screenshot39, screenshot40, screenshot41, screenshot42]
