Vulkan lavapipe software rendering is working on Haiku

amonpaike · October 23, 2021, 3:05pm

we hope for the good

cosmogatokat · October 23, 2021, 6:42pm

can it be use with other old GPU i have a good old intel igp on my laptop with haiku, this gpu dont support vulkan officially but is very powerful with some emulators and games even yet.

Maybe if an old rpi gpu can handle extraoffically with vulkan, some of those old but powerful gpus(more than a rpi) can handle it too?.

amonpaike · October 23, 2021, 7:10pm

Actually, I can’t wait to see the first results of some of these potential good experiments.
Whatever it is, on any gpu that emerges.

X512 · October 24, 2021, 8:12pm

I made a class for DMA ring and run some test DMA ring code thar writes constants to GPU addresses. Next thing to investigate is GPU virtual memory and paging. GPU virtual memory is required for indirect buffer execution, that are produced by Vulkan driver.

uint64 testGpuAdr = lastGpuAdr;
uint32* testCpuAdr = (uint32*)(gSharedInfo->frame_buffer + testGpuAdr);
lastGpuAdr += 4*4;
uint64 fenceGpuAdr = lastGpuAdr;
uint32* fenceCpuAdr = (uint32*)(gSharedInfo->frame_buffer + fenceGpuAdr);
lastGpuAdr += 4;

*fenceCpuAdr = 0;
printf("*fenceCpuAdr: %#" B_PRIx32 "\n", *fenceCpuAdr);

testCpuAdr[0] = 0x10;
testCpuAdr[1] = 0x11;
testCpuAdr[2] = 0x12;
testCpuAdr[3] = 0x13;
printf("testCpuAdr[0]: %#" B_PRIx32 "\n", testCpuAdr[0]);
printf("testCpuAdr[1]: %#" B_PRIx32 "\n", testCpuAdr[1]);
printf("testCpuAdr[2]: %#" B_PRIx32 "\n", testCpuAdr[2]);
printf("testCpuAdr[3]: %#" B_PRIx32 "\n", testCpuAdr[3]);

uint32 dstVals[] = {0x20, 0x21, 0x22, 0x23};
GenDmaPacketWriteMultiple(ring, testGpuAdr + 0*4, dstVals, 4);
GenDmaPacketFence(ring, fenceGpuAdr, 100);
ring.Commit();

for (int attempt = 0; ; attempt++) {
	uint32 val = *fenceCpuAdr;
	printf("*fenceCpuAdr: %" B_PRIu32 "\n", val);
	if (val != 0) break;
	if (!(attempt < 100)) return B_ERROR;
}
ring.UpdateRptr();
printf("testCpuAdr[0]: %#" B_PRIx32 "\n", testCpuAdr[0]);
printf("testCpuAdr[1]: %#" B_PRIx32 "\n", testCpuAdr[1]);
printf("testCpuAdr[2]: %#" B_PRIx32 "\n", testCpuAdr[2]);
printf("testCpuAdr[3]: %#" B_PRIx32 "\n", testCpuAdr[3]);

Output:

*fenceCpuAdr: 0
testCpuAdr[0]: 0x10
testCpuAdr[1]: 0x11
testCpuAdr[2]: 0x12
testCpuAdr[3]: 0x13
*fenceCpuAdr: 0
*fenceCpuAdr: 0
*fenceCpuAdr: 100
testCpuAdr[0]: 0x20
testCpuAdr[1]: 0x21
testCpuAdr[2]: 0x22
testCpuAdr[3]: 0x23

SCollins · October 24, 2021, 8:45pm

Amazing, the GPU virtual memory, is that in the reserved main system ram ?? Or is that in the gpu vram ??

Mostly just curious on my part

X512 · October 24, 2021, 8:52pm

Radeon GPU has it own MMU unit so each process using 3D acceleration have its own GPU virtual address space. GPU use 2 level page translation table to map virtual addresses to GPU physical addresses.

On Southern Islands Radeon GPU physical memory layout is:

0…VRAM size: VRAM (video RAM)
VRAM size…GTT end: GTT (CPU memory mapped to GPU)

SCollins · October 24, 2021, 9:02pm

Got it, so ring buffer needs to manage the read writes to from sys ram mmu to gpu mmu with compiled shaders and drawing commands to be frd down gpu pipelines

You’ve made some incredible progress in a very short time frame.

I’m amazed and grateful

x68k · October 26, 2021, 9:04pm

Absolutely amazing stuff. I agree with the overall sentiment that Vulkan is the only API that is necessary at the driver level. OpenGL and Direct3D can be used through wrappers.

Bundle Zink and DXVK with Haiku and the user won’t notice the difference.

X512 · October 28, 2021, 4:31pm

Some progress report: I improved architecture by introducing reference counted GPU buffer class (BReference<BufferObject>) and GPU memory manager. Memory manager can allocate memory of 3 types: VRAM mappable to CPU, VRAM not mappable to CPU, CPU memory mapped to GPU (GTT). GTT buffer is based on Haiku area.

I implemented GART page table that maps CPU memory to GPU GTT memory range. I used Haiku Poke driver to get area pages physical address and write it to GART page table. So CPU mamory mapping is implemented completely in userland without special kernel driver. I tested that GPU DMA engine can write to GTT memory.

Also I implemented and tested indirect buffers (IB). Indirect buffers allows to execute commands on ring buffer without cpying it to ring buffer. Instead execute indirect buffer command is written to ring buffer with intirect buffer adress and size as parameter. Vulkan driver prepares and sends commands in indirect buffers.

I currently experience some instability problems: sometimes DMA write operations have no effect and sometimes DMA engine completely stops until reboot. I probably doing something wrong and miss some initialization code.

Linux driver have firmware files loaded during GPU initialization. I am not sure what that firmwares are doing exactly, I currently don’t use it.

GTT buffer test. bufAdr is Haiku area address mapped to GPU by GART and written by GPU DMA engine.

/dev/graphics/framebuffer
  signature: framebuffer.accelerant
/dev/graphics/intel_extreme_000200
  signature: intel_extreme.accelerant
/dev/graphics/radeon_hd_010000
  signature: radeon_hd.accelerant
  RADEON_GET_PRIVATE_DATA
gSharedInfo->frame_buffer_size: 0x40000
regs[DMA_CNTL]: 0x8210400
regs[DMA_RB_CNTL]: 0x1015
regs[DMA_IB_CNTL]: 0x1
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
regs[DMA_RB_RPTR_ADDR]: 0
rptr: 384
wptr: 384
disabling rings
init ring
regs[DMA_CNTL]: 0x8210400
regs[DMA_RB_CNTL]: 0x1015
regs[DMA_IB_CNTL]: 0x1
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
regs[DMA_RB_RPTR_ADDR]: 0
rptr: 0
wptr: 0
TestGart()
(1)
GartMap()
  0: 0x12e1b8000
  0x1000: 0x12e1ba000
  0x2000: 0x12e1bb000
  0x3000: 0x12e1bc000
  0x4000: 0x12e1bd000
  0x5000: 0x12e1be000
  0x6000: 0x12e1bf000
  0x7000: 0x12e1c0000
  0x8000: 0x12e1c1000
  0x9000: 0x12e1c2000
  0xa000: 0x12e1c3000
  0xb000: 0x12e1c4000
  0xc000: 0x12e1c5000
  0xd000: 0x12e1c6000
  0xe000: 0x12e1c7000
  0xf000: 0x12e1c8000
bufAdr: 0xa680c59000
buf->gpuPhysAdr: 0x80000000
(2)
(3)
(1) bufAdr[0]: -1
(1) bufAdr[1]: -1
(1) bufAdr[2]: -1
(1) bufAdr[3]: -1
(1) bufAdr[4]: -1
(1) bufAdr[5]: -1
(1) bufAdr[6]: -1
(1) bufAdr[7]: -1
[!] attempts expired when waiting fence
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
rptr: 192
wptr: 192
*fenceAdr: 1
fenceVal: 1
(2) bufAdr[0]: -1
(2) bufAdr[1]: -1
(2) bufAdr[2]: -1
(2) bufAdr[3]: -1
(2) bufAdr[4]: -1
(2) bufAdr[5]: -1
(2) bufAdr[6]: -1
(2) bufAdr[7]: -1
rptr: 192
wptr: 192
(1) bufAdr[8]: -1
(1) bufAdr[9]: -1
(1) bufAdr[10]: -1
(1) bufAdr[11]: -1
(1) bufAdr[12]: -1
(1) bufAdr[13]: -1
(1) bufAdr[14]: -1
(1) bufAdr[15]: -1
[!] attempts expired when waiting fence
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
rptr: 384
wptr: 384
*fenceAdr: 2
fenceVal: 2
(2) bufAdr[8]: -1
(2) bufAdr[9]: -1
(2) bufAdr[10]: -1
(2) bufAdr[11]: -1
(2) bufAdr[12]: -1
(2) bufAdr[13]: -1
(2) bufAdr[14]: -1
(2) bufAdr[15]: -1
rptr: 384
wptr: 384
(1) bufAdr[16]: -1
(1) bufAdr[17]: -1
(1) bufAdr[18]: -1
(1) bufAdr[19]: -1
(1) bufAdr[20]: -1
(1) bufAdr[21]: -1
(1) bufAdr[22]: -1
(1) bufAdr[23]: -1
(2) bufAdr[16]: 16
(2) bufAdr[17]: 17
(2) bufAdr[18]: 18
(2) bufAdr[19]: 19
(2) bufAdr[20]: 20
(2) bufAdr[21]: 21
(2) bufAdr[22]: 22
(2) bufAdr[23]: 23
rptr: 576
wptr: 576
(1) bufAdr[24]: -1
(1) bufAdr[25]: -1
(1) bufAdr[26]: -1
(1) bufAdr[27]: -1
(1) bufAdr[28]: -1
(1) bufAdr[29]: -1
(1) bufAdr[30]: -1
(1) bufAdr[31]: -1
(2) bufAdr[24]: 24
(2) bufAdr[25]: 25
(2) bufAdr[26]: 26
(2) bufAdr[27]: 27
(2) bufAdr[28]: 28
(2) bufAdr[29]: 29
(2) bufAdr[30]: 30
(2) bufAdr[31]: 31
rptr: 768
wptr: 768

SCollins · October 28, 2021, 9:57pm

Typically, the atombios headers are compiled into the driver, this pulls card information from gpu bios, which iirc, passes some binary info to the driver

Also my working knowledge is limited and 7years old, afaik fwiw

Probably best to pm john bridgman at phoronix.com

Also, well done

X512 · November 5, 2021, 3:43pm

I fixed DMA write instability, the reason was wrong ring buffer memory type, it should be GTT, not VRAM. VRAM was used because GTT memory was not implemented when ring buffer was implemented. Linux driver also use GTT memory for ring buffers. Now I can go next.

GTT memory is CPU memory mapped to GPU address space if someone don’t know yet. I use Haiku areas with locked pages and Poke driver to implement GTT memory for Radeon GPU. It is already working good enough.

brunobastardi · November 5, 2021, 6:50pm

That’s all I understand, but sounds good for me. So go ahead!
Nice to follow your progress.

SCollins · November 7, 2021, 6:47pm

Awesome, so what steps remain before some basic level of 3d accel ? Obviously the driver will be a wip in need of optimization etc, but just curious what the road ahead still holds ???

Also is there a git repo we can follow along with ?

AndrewZ · November 15, 2021, 4:30pm

I think the open threaded command submission model means that any app written using this can use available GPU resources. So the challenge is that apps must be re-written to be accelerated. Also, older, PCs won’t benefit. I’m guessing you need multiple CPUs on your GPU to get the acceleration.

X512 · November 21, 2021, 10:39pm

Hardware rendering produce some output!

screenshot37

It should be gears. Probably something is wrong with buffer stride settings.

Zenja · November 21, 2021, 11:43pm

This is exciting news. I agree with you, it does look like stride settings, most likely when presenting to BView from Vulkan swap chain image. Keep in mind that the tiling format is implementation specific.

If you need a single vulkan based “Save Screenshot” function (from the swapchain image), have a look at https://github.com/SaschaWillems/Vulkan/blob/master/examples/screenshot/screenshot.cpp, the function saveScreenshot() on line 186. It can be added with almost no other dependancies. This function works if the swapchain image is created with VK_IMAGE_USAGE_TRANSFER_SRC_BIT flag.

Here is a self contained snip from my Engine using the above code.

/* PROJECT: Yarra
COPYRIGHT: 2017-2020, Zen Yes Pty Ltd, Australia
AUTHORS: Zenja Solaja
DESCRIPTION: https://github.com/SaschaWillems/Vulkan/blob/master/examples/screenshot/screenshot.cpp
Take a screenshot from the current swapchain image
This is done using a blit from the swapchain image to a linear image whose memory content is then saved as a ppm image
Getting the image date directly from a swapchain image wouldn’t work as they’re usually stored in an implementation dependent optimal tiling format
Note: This requires the swapchain images to be created with the VK_IMAGE_USAGE_TRANSFER_SRC_BIT flag (see VulkanSwapChain::create)
*/

#include
#include

#include “stb/stb_image_write.h”
#include “webp/webp/encode.h”
#include “Yarra/Render/RenderManager.h”

namespace yrender
{

static VkImageMemoryBarrier imageMemoryBarrier()
{
VkImageMemoryBarrier imageMemoryBarrier {};
imageMemoryBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
imageMemoryBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
imageMemoryBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
return imageMemoryBarrier;
}
static void insertImageMemoryBarrier(VkCommandBuffer cmdbuffer, VkImage image, VkAccessFlags srcAccessMask, VkAccessFlags dstAccessMask,
VkImageLayout oldImageLayout, VkImageLayout newImageLayout,
VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask,
VkImageSubresourceRange subresourceRange)
{
VkImageMemoryBarrier _imageMemoryBarrier = imageMemoryBarrier();
_imageMemoryBarrier.srcAccessMask = srcAccessMask;
_imageMemoryBarrier.dstAccessMask = dstAccessMask;
_imageMemoryBarrier.oldLayout = oldImageLayout;
_imageMemoryBarrier.newLayout = newImageLayout;
_imageMemoryBarrier.image = image;
_imageMemoryBarrier.subresourceRange = subresourceRange;

vkCmdPipelineBarrier(
	cmdbuffer,
	srcStageMask,
	dstStageMask,
	0,
	0, nullptr,
	0, nullptr,
	1, &_imageMemoryBarrier);

}

/* FUNCTION: RenderManager :: SaveScreenshot
ARGUMENTS: filename (no extension)
encode_format
RETURN: n/a
DESCRIPTION: Save screenshot
*/
void RenderManager :: SaveScreenshot(const char *filename, EncodeFormat encode_format)
{
#if VULKAN_ALLOW_SCREENSHOT
bool supportsBlit = true;

// Check blit support for source and destination

// Check if the device supports blitting from optimal images (the swapchain images are in optimal format)
vk::FormatProperties formatProps = fVulkanPhysicalDevice.getFormatProperties(fVulkanSwapChainFormat);
if (!(formatProps.optimalTilingFeatures & vk::FormatFeatureFlagBits::eBlitSrc))
{
	yplatform::Debug("[SaveScreenshot] Device does not support blitting from optimal tiled images, using copy instead of blit!\n");
	supportsBlit = false;
}

// Check if the device supports blitting to linear images
formatProps = fVulkanPhysicalDevice.getFormatProperties(vk::Format::eR8G8B8A8Unorm);
if (!(formatProps.linearTilingFeatures & vk::FormatFeatureFlagBits::eBlitDst))
{
	yplatform::Debug("[SaveScreenshot] Device does not support blitting to linear tiled images, using copy instead of blit!\n");
	supportsBlit = false;
}

// Source for the copy is the last rendered swapchain image
vk::Image srcImage = fVulkanSwapChainImages[fCurrentFrameIndex];

// Create the linear tiled destination image to copy to and to read the memory from
vk::ImageCreateInfo imageCreateCI;
imageCreateCI.imageType = vk::ImageType::e2D;
// Note that vkCmdBlitImage (if supported) will also do format conversions if the swapchain color format would differ
imageCreateCI.format = vk::Format::eR8G8B8A8Unorm;
imageCreateCI.extent.width = fVulkanSwapChainExtent.width;
imageCreateCI.extent.height = fVulkanSwapChainExtent.height;
imageCreateCI.extent.depth = 1;
imageCreateCI.arrayLayers = 1;
imageCreateCI.mipLevels = 1;
imageCreateCI.initialLayout = vk::ImageLayout::eUndefined;
imageCreateCI.samples = vk::SampleCountFlagBits::e1;
imageCreateCI.tiling = vk::ImageTiling::eLinear;
imageCreateCI.usage = vk::ImageUsageFlagBits::eTransferDst;
// Create the image
vk::Image dstImage = fVulkanDevice.createImage(imageCreateCI, nullptr);
// Create memory to back up the image
vk::MemoryRequirements memRequirements = fVulkanDevice.getImageMemoryRequirements(dstImage);
vk::MemoryAllocateInfo memAllocInfo;
memAllocInfo.allocationSize = memRequirements.size;
// Memory must be host visible to copy from
memAllocInfo.memoryTypeIndex = FindMemoryType(memRequirements.memoryTypeBits, vk::MemoryPropertyFlagBits::eHostVisible | vk::MemoryPropertyFlagBits::eHostCoherent);
vk::DeviceMemory dstImageMemory = fVulkanDevice.allocateMemory(memAllocInfo, nullptr);
fVulkanDevice.bindImageMemory(dstImage, dstImageMemory, 0);

// Do the actual blit from the swapchain image to our host visible destination image
//vk::CommandBuffer copyCmd = vulkanDevice->createCommandBuffer(VK_COMMAND_BUFFER_LEVEL_PRIMARY, true);
vk::CommandBufferAllocateInfo allocInfo;
allocInfo.level = vk::CommandBufferLevel::ePrimary;
allocInfo.commandPool = fVulkanCommandPool;
allocInfo.commandBufferCount = 1;

vk::CommandBuffer copyCmd;
vk::Result res = fVulkanDevice.allocateCommandBuffers(&allocInfo, &copyCmd);
if (res != vk::Result::eSuccess)
	throw std::runtime_error(vk::to_string(res));
vk::CommandBufferBeginInfo beginInfo;
copyCmd.begin(beginInfo);

// Transition destination image to transfer destination layout
insertImageMemoryBarrier(
		copyCmd,
		dstImage,
		0,
		VK_ACCESS_TRANSFER_WRITE_BIT,
		VK_IMAGE_LAYOUT_UNDEFINED,
		VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

// Transition swapchain image from present to transfer source layout
insertImageMemoryBarrier(
		copyCmd,
		srcImage,
		VK_ACCESS_MEMORY_READ_BIT,
		VK_ACCESS_TRANSFER_READ_BIT,
		VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
		VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

// If source and destination support blit we'll blit as this also does automatic format conversion (e.g. from BGR to RGB)
if (supportsBlit)
{
	// Define the region to blit (we will blit the whole swapchain image)
	VkOffset3D blitSize;
	blitSize.x = fVulkanSwapChainExtent.width;
	blitSize.y = fVulkanSwapChainExtent.height;
	blitSize.z = 1;
	VkImageBlit imageBlitRegion{};
	imageBlitRegion.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageBlitRegion.srcSubresource.layerCount = 1;
	imageBlitRegion.srcOffsets[1] = blitSize;
	imageBlitRegion.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageBlitRegion.dstSubresource.layerCount = 1;
	imageBlitRegion.dstOffsets[1] = blitSize;

	// Issue the blit command
	vkCmdBlitImage(
		copyCmd,
		srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		dstImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		1,
		&imageBlitRegion,
		VK_FILTER_NEAREST);
}
else
{
	// Otherwise use image copy (requires us to manually flip components)
	VkImageCopy imageCopyRegion{};
	imageCopyRegion.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageCopyRegion.srcSubresource.layerCount = 1;
	imageCopyRegion.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	imageCopyRegion.dstSubresource.layerCount = 1;
	imageCopyRegion.extent.width = fVulkanSwapChainExtent.width;
	imageCopyRegion.extent.height = fVulkanSwapChainExtent.height;
	imageCopyRegion.extent.depth = 1;

	// Issue the copy command
	vkCmdCopyImage(
		copyCmd,
		srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		dstImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		1,
		&imageCopyRegion);
}

// Transition destination image to general layout, which is the required layout for mapping the image memory later on
insertImageMemoryBarrier(
		copyCmd,
		dstImage,
		VK_ACCESS_TRANSFER_WRITE_BIT,
		VK_ACCESS_MEMORY_READ_BIT,
		VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
		VK_IMAGE_LAYOUT_GENERAL,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

// Transition back the swap chain image after the blit is done
insertImageMemoryBarrier(
		copyCmd,
		srcImage,
		VK_ACCESS_TRANSFER_READ_BIT,
		VK_ACCESS_MEMORY_READ_BIT,
		VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
		VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VK_PIPELINE_STAGE_TRANSFER_BIT,
		VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });

copyCmd.end();
vk::SubmitInfo submitInfo;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &copyCmd;
vk::FenceCreateInfo fenceInfo;
vk::Fence fence = fVulkanDevice.createFence(fenceInfo, nullptr);

res = fVulkanGraphicsQueue.submit(1, &submitInfo, nullptr);
if (res != vk::Result::eSuccess)
	throw std::runtime_error(vk::to_string(res));
fVulkanGraphicsQueue.waitIdle();
fVulkanDevice.destroyFence(fence);
fVulkanDevice.freeCommandBuffers(fVulkanCommandPool, 1, &copyCmd);


// Get layout of the image (including row pitch)
VkImageSubresource subResource { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0 };
VkSubresourceLayout subResourceLayout = fVulkanDevice.getImageSubresourceLayout(dstImage, subResource);

// Map image memory so we can start copying from it
const char* data;
vkMapMemory(fVulkanDevice, dstImageMemory, 0, VK_WHOLE_SIZE, 0, (void**)&data);
data += subResourceLayout.offset;

std::string name(filename);

if (encode_format == EncodeFormat::ePng)
{
	name.append(".png");
	stbi_write_png(name.c_str(), fVulkanSwapChainExtent.width, fVulkanSwapChainExtent.height, 4, data, (int) subResourceLayout.rowPitch);
}
else if (encode_format == EncodeFormat::eWebp)
{
	name.append(".webp");
	uint8_t *output;
	size_t sz = WebPEncodeRGBA((const uint8_t *)data, fVulkanSwapChainExtent.width, fVulkanSwapChainExtent.height, (int) subResourceLayout.rowPitch, 100, &output);
	std::ofstream file(name.c_str(), std::ios::out | std::ios::binary);
	file.write((char *)output, sz);
	file.close();
	WebPFree(output);

}
else
{
	name.append(".ppm");
	std::ofstream file(name.c_str(), std::ios::out | std::ios::binary);

	// ppm header
	file << "P6\n" << fVulkanSwapChainExtent.width << "\n" << fVulkanSwapChainExtent.height << "\n" << 255 << "\n";

	// If source is BGR (destination is always RGB) and we can't use blit (which does automatic conversion), we'll have to manually swizzle color components
	bool colorSwizzle = false;
	// Check if source is BGR
	// Note: Not complete, only contains most common and basic BGR surface formats for demonstration purposes
	if (!supportsBlit)
	{
		std::vector<vk::Format> formatsBGR = { vk::Format::eB8G8R8A8Srgb, vk::Format::eB8G8R8A8Unorm, vk::Format::eB8G8R8A8Snorm};
		colorSwizzle = (std::find(formatsBGR.begin(), formatsBGR.end(), fVulkanSwapChainFormat) != formatsBGR.end());
	}
	// ppm binary pixel data
	for (uint32_t y = 0; y < fVulkanSwapChainExtent.height; y++)
	{
		unsigned int *row = (unsigned int*)data;
		for (uint32_t x = 0; x < fVulkanSwapChainExtent.width; x++)
		{
			if (colorSwizzle)
			{
				file.write((char*)row+2, 1);
				file.write((char*)row+1, 1);
				file.write((char*)row, 1);
			}
			else
			{
				file.write((char*)row, 3);
			}
			row++;
		}
		data += subResourceLayout.rowPitch;
	}
	file.close();
}

std::cout << "Screenshot saved to disk" << std::endl;

// Clean up resources
vkUnmapMemory(fVulkanDevice, dstImageMemory);
vkFreeMemory(fVulkanDevice, dstImageMemory, nullptr);
vkDestroyImage(fVulkanDevice, dstImage, nullptr);

#endif
}

}; // namespace yrender

DFergFLA · November 21, 2021, 11:46pm

I love watching this thread. Seeing this progress a piece at a time is so cool.

leo75 · November 22, 2021, 6:18am

Same. Just like the RiscV64 thread I’m coming here just to see if there is an update. Happy to see it’s getting closer to hardware accelerated support

alpopa · November 22, 2021, 8:10am

Actually, this is gears. I can see blue gear connected with red gear, which is connected with green gear. The only problem is wrong resolution on vertical / horizontal axes. Something like rasterization assumes 640 pixels width but is drawn on 1920 pixels (so, 3+ rows into one) then leaving 3+ rows blank (black).

RISC · November 22, 2021, 9:16am

Don’t forgot about donating to x512: PayPal.Me