we hope for the good
can it be use with other old GPU i have a good old intel igp on my laptop with haiku, this gpu dont support vulkan officially but is very powerful with some emulators and games even yet.
Maybe if an old rpi gpu can handle extraoffically with vulkan, some of those old but powerful gpus(more than a rpi) can handle it too?.
Actually, I canât wait to see the first results of some of these potential good experiments.
Whatever it is, on any gpu that emerges.
I made a class for DMA ring and run some test DMA ring code thar writes constants to GPU addresses. Next thing to investigate is GPU virtual memory and paging. GPU virtual memory is required for indirect buffer execution, that are produced by Vulkan driver.
uint64 testGpuAdr = lastGpuAdr;
uint32* testCpuAdr = (uint32*)(gSharedInfo->frame_buffer + testGpuAdr);
lastGpuAdr += 4*4;
uint64 fenceGpuAdr = lastGpuAdr;
uint32* fenceCpuAdr = (uint32*)(gSharedInfo->frame_buffer + fenceGpuAdr);
lastGpuAdr += 4;
*fenceCpuAdr = 0;
printf("*fenceCpuAdr: %#" B_PRIx32 "\n", *fenceCpuAdr);
testCpuAdr[0] = 0x10;
testCpuAdr[1] = 0x11;
testCpuAdr[2] = 0x12;
testCpuAdr[3] = 0x13;
printf("testCpuAdr[0]: %#" B_PRIx32 "\n", testCpuAdr[0]);
printf("testCpuAdr[1]: %#" B_PRIx32 "\n", testCpuAdr[1]);
printf("testCpuAdr[2]: %#" B_PRIx32 "\n", testCpuAdr[2]);
printf("testCpuAdr[3]: %#" B_PRIx32 "\n", testCpuAdr[3]);
uint32 dstVals[] = {0x20, 0x21, 0x22, 0x23};
GenDmaPacketWriteMultiple(ring, testGpuAdr + 0*4, dstVals, 4);
GenDmaPacketFence(ring, fenceGpuAdr, 100);
ring.Commit();
for (int attempt = 0; ; attempt++) {
uint32 val = *fenceCpuAdr;
printf("*fenceCpuAdr: %" B_PRIu32 "\n", val);
if (val != 0) break;
if (!(attempt < 100)) return B_ERROR;
}
ring.UpdateRptr();
printf("testCpuAdr[0]: %#" B_PRIx32 "\n", testCpuAdr[0]);
printf("testCpuAdr[1]: %#" B_PRIx32 "\n", testCpuAdr[1]);
printf("testCpuAdr[2]: %#" B_PRIx32 "\n", testCpuAdr[2]);
printf("testCpuAdr[3]: %#" B_PRIx32 "\n", testCpuAdr[3]);
Output:
*fenceCpuAdr: 0
testCpuAdr[0]: 0x10
testCpuAdr[1]: 0x11
testCpuAdr[2]: 0x12
testCpuAdr[3]: 0x13
*fenceCpuAdr: 0
*fenceCpuAdr: 0
*fenceCpuAdr: 100
testCpuAdr[0]: 0x20
testCpuAdr[1]: 0x21
testCpuAdr[2]: 0x22
testCpuAdr[3]: 0x23
Amazing, the GPU virtual memory, is that in the reserved main system ram ?? Or is that in the gpu vram ??
Mostly just curious on my part
Radeon GPU has it own MMU unit so each process using 3D acceleration have its own GPU virtual address space. GPU use 2 level page translation table to map virtual addresses to GPU physical addresses.
On Southern Islands Radeon GPU physical memory layout is:
- 0âŚVRAM size: VRAM (video RAM)
- VRAM sizeâŚGTT end: GTT (CPU memory mapped to GPU)
Got it, so ring buffer needs to manage the read writes to from sys ram mmu to gpu mmu with compiled shaders and drawing commands to be frd down gpu pipelines
Youâve made some incredible progress in a very short time frame.
Iâm amazed and grateful
Absolutely amazing stuff. I agree with the overall sentiment that Vulkan is the only API that is necessary at the driver level. OpenGL and Direct3D can be used through wrappers.
Bundle Zink and DXVK with Haiku and the user wonât notice the difference.
Some progress report: I improved architecture by introducing reference counted GPU buffer class (BReference<BufferObject>
) and GPU memory manager. Memory manager can allocate memory of 3 types: VRAM mappable to CPU, VRAM not mappable to CPU, CPU memory mapped to GPU (GTT). GTT buffer is based on Haiku area.
I implemented GART page table that maps CPU memory to GPU GTT memory range. I used Haiku Poke driver to get area pages physical address and write it to GART page table. So CPU mamory mapping is implemented completely in userland without special kernel driver. I tested that GPU DMA engine can write to GTT memory.
Also I implemented and tested indirect buffers (IB). Indirect buffers allows to execute commands on ring buffer without cpying it to ring buffer. Instead execute indirect buffer command is written to ring buffer with intirect buffer adress and size as parameter. Vulkan driver prepares and sends commands in indirect buffers.
I currently experience some instability problems: sometimes DMA write operations have no effect and sometimes DMA engine completely stops until reboot. I probably doing something wrong and miss some initialization code.
Linux driver have firmware files loaded during GPU initialization. I am not sure what that firmwares are doing exactly, I currently donât use it.
GTT buffer test. bufAdr
is Haiku area address mapped to GPU by GART and written by GPU DMA engine.
/dev/graphics/framebuffer
signature: framebuffer.accelerant
/dev/graphics/intel_extreme_000200
signature: intel_extreme.accelerant
/dev/graphics/radeon_hd_010000
signature: radeon_hd.accelerant
RADEON_GET_PRIVATE_DATA
gSharedInfo->frame_buffer_size: 0x40000
regs[DMA_CNTL]: 0x8210400
regs[DMA_RB_CNTL]: 0x1015
regs[DMA_IB_CNTL]: 0x1
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
regs[DMA_RB_RPTR_ADDR]: 0
rptr: 384
wptr: 384
disabling rings
init ring
regs[DMA_CNTL]: 0x8210400
regs[DMA_RB_CNTL]: 0x1015
regs[DMA_IB_CNTL]: 0x1
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
regs[DMA_RB_RPTR_ADDR]: 0
rptr: 0
wptr: 0
TestGart()
(1)
GartMap()
0: 0x12e1b8000
0x1000: 0x12e1ba000
0x2000: 0x12e1bb000
0x3000: 0x12e1bc000
0x4000: 0x12e1bd000
0x5000: 0x12e1be000
0x6000: 0x12e1bf000
0x7000: 0x12e1c0000
0x8000: 0x12e1c1000
0x9000: 0x12e1c2000
0xa000: 0x12e1c3000
0xb000: 0x12e1c4000
0xc000: 0x12e1c5000
0xd000: 0x12e1c6000
0xe000: 0x12e1c7000
0xf000: 0x12e1c8000
bufAdr: 0xa680c59000
buf->gpuPhysAdr: 0x80000000
(2)
(3)
(1) bufAdr[0]: -1
(1) bufAdr[1]: -1
(1) bufAdr[2]: -1
(1) bufAdr[3]: -1
(1) bufAdr[4]: -1
(1) bufAdr[5]: -1
(1) bufAdr[6]: -1
(1) bufAdr[7]: -1
[!] attempts expired when waiting fence
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
rptr: 192
wptr: 192
*fenceAdr: 1
fenceVal: 1
(2) bufAdr[0]: -1
(2) bufAdr[1]: -1
(2) bufAdr[2]: -1
(2) bufAdr[3]: -1
(2) bufAdr[4]: -1
(2) bufAdr[5]: -1
(2) bufAdr[6]: -1
(2) bufAdr[7]: -1
rptr: 192
wptr: 192
(1) bufAdr[8]: -1
(1) bufAdr[9]: -1
(1) bufAdr[10]: -1
(1) bufAdr[11]: -1
(1) bufAdr[12]: -1
(1) bufAdr[13]: -1
(1) bufAdr[14]: -1
(1) bufAdr[15]: -1
[!] attempts expired when waiting fence
regs[DMA_STATUS_REG]: 0x44c83d57({0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 19, 22, 23, 26, 30})
rptr: 384
wptr: 384
*fenceAdr: 2
fenceVal: 2
(2) bufAdr[8]: -1
(2) bufAdr[9]: -1
(2) bufAdr[10]: -1
(2) bufAdr[11]: -1
(2) bufAdr[12]: -1
(2) bufAdr[13]: -1
(2) bufAdr[14]: -1
(2) bufAdr[15]: -1
rptr: 384
wptr: 384
(1) bufAdr[16]: -1
(1) bufAdr[17]: -1
(1) bufAdr[18]: -1
(1) bufAdr[19]: -1
(1) bufAdr[20]: -1
(1) bufAdr[21]: -1
(1) bufAdr[22]: -1
(1) bufAdr[23]: -1
(2) bufAdr[16]: 16
(2) bufAdr[17]: 17
(2) bufAdr[18]: 18
(2) bufAdr[19]: 19
(2) bufAdr[20]: 20
(2) bufAdr[21]: 21
(2) bufAdr[22]: 22
(2) bufAdr[23]: 23
rptr: 576
wptr: 576
(1) bufAdr[24]: -1
(1) bufAdr[25]: -1
(1) bufAdr[26]: -1
(1) bufAdr[27]: -1
(1) bufAdr[28]: -1
(1) bufAdr[29]: -1
(1) bufAdr[30]: -1
(1) bufAdr[31]: -1
(2) bufAdr[24]: 24
(2) bufAdr[25]: 25
(2) bufAdr[26]: 26
(2) bufAdr[27]: 27
(2) bufAdr[28]: 28
(2) bufAdr[29]: 29
(2) bufAdr[30]: 30
(2) bufAdr[31]: 31
rptr: 768
wptr: 768
Typically, the atombios headers are compiled into the driver, this pulls card information from gpu bios, which iirc, passes some binary info to the driver
Also my working knowledge is limited and 7years old, afaik fwiw
Probably best to pm john bridgman at phoronix.com
Also, well done
I fixed DMA write instability, the reason was wrong ring buffer memory type, it should be GTT, not VRAM. VRAM was used because GTT memory was not implemented when ring buffer was implemented. Linux driver also use GTT memory for ring buffers. Now I can go next.
GTT memory is CPU memory mapped to GPU address space if someone donât know yet. I use Haiku areas with locked pages and Poke driver to implement GTT memory for Radeon GPU. It is already working good enough.
Thatâs all I understand, but sounds good for me. So go ahead!
Nice to follow your progress.
Awesome, so what steps remain before some basic level of 3d accel ? Obviously the driver will be a wip in need of optimization etc, but just curious what the road ahead still holds ???
Also is there a git repo we can follow along with ?
I think the open threaded command submission model means that any app written using this can use available GPU resources. So the challenge is that apps must be re-written to be accelerated. Also, older, PCs wonât benefit. Iâm guessing you need multiple CPUs on your GPU to get the acceleration.
Hardware rendering produce some output!
It should be gears. Probably something is wrong with buffer stride settings.
This is exciting news. I agree with you, it does look like stride settings, most likely when presenting to BView from Vulkan swap chain image. Keep in mind that the tiling format is implementation specific.
If you need a single vulkan based âSave Screenshotâ function (from the swapchain image), have a look at https://github.com/SaschaWillems/Vulkan/blob/master/examples/screenshot/screenshot.cpp, the function saveScreenshot() on line 186. It can be added with almost no other dependancies. This function works if the swapchain image is created with VK_IMAGE_USAGE_TRANSFER_SRC_BIT flag.
Here is a self contained snip from my Engine using the above code.
/* PROJECT: Yarra
COPYRIGHT: 2017-2020, Zen Yes Pty Ltd, Australia
AUTHORS: Zenja Solaja
DESCRIPTION: https://github.com/SaschaWillems/Vulkan/blob/master/examples/screenshot/screenshot.cpp
Take a screenshot from the current swapchain image
This is done using a blit from the swapchain image to a linear image whose memory content is then saved as a ppm image
Getting the image date directly from a swapchain image wouldnât work as theyâre usually stored in an implementation dependent optimal tiling format
Note: This requires the swapchain images to be created with the VK_IMAGE_USAGE_TRANSFER_SRC_BIT flag (see VulkanSwapChain::create)
*/
#include
#include
#include âstb/stb_image_write.hâ
#include âwebp/webp/encode.hâ
#include âYarra/Render/RenderManager.hâ
namespace yrender
{
static VkImageMemoryBarrier imageMemoryBarrier()
{
VkImageMemoryBarrier imageMemoryBarrier {};
imageMemoryBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
imageMemoryBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
imageMemoryBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
return imageMemoryBarrier;
}
static void insertImageMemoryBarrier(VkCommandBuffer cmdbuffer, VkImage image, VkAccessFlags srcAccessMask, VkAccessFlags dstAccessMask,
VkImageLayout oldImageLayout, VkImageLayout newImageLayout,
VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask,
VkImageSubresourceRange subresourceRange)
{
VkImageMemoryBarrier _imageMemoryBarrier = imageMemoryBarrier();
_imageMemoryBarrier.srcAccessMask = srcAccessMask;
_imageMemoryBarrier.dstAccessMask = dstAccessMask;
_imageMemoryBarrier.oldLayout = oldImageLayout;
_imageMemoryBarrier.newLayout = newImageLayout;
_imageMemoryBarrier.image = image;
_imageMemoryBarrier.subresourceRange = subresourceRange;
vkCmdPipelineBarrier(
cmdbuffer,
srcStageMask,
dstStageMask,
0,
0, nullptr,
0, nullptr,
1, &_imageMemoryBarrier);
}
/* FUNCTION: RenderManager :: SaveScreenshot
ARGUMENTS: filename (no extension)
encode_format
RETURN: n/a
DESCRIPTION: Save screenshot
*/
void RenderManager :: SaveScreenshot(const char *filename, EncodeFormat encode_format)
{
#if VULKAN_ALLOW_SCREENSHOT
bool supportsBlit = true;
// Check blit support for source and destination
// Check if the device supports blitting from optimal images (the swapchain images are in optimal format)
vk::FormatProperties formatProps = fVulkanPhysicalDevice.getFormatProperties(fVulkanSwapChainFormat);
if (!(formatProps.optimalTilingFeatures & vk::FormatFeatureFlagBits::eBlitSrc))
{
yplatform::Debug("[SaveScreenshot] Device does not support blitting from optimal tiled images, using copy instead of blit!\n");
supportsBlit = false;
}
// Check if the device supports blitting to linear images
formatProps = fVulkanPhysicalDevice.getFormatProperties(vk::Format::eR8G8B8A8Unorm);
if (!(formatProps.linearTilingFeatures & vk::FormatFeatureFlagBits::eBlitDst))
{
yplatform::Debug("[SaveScreenshot] Device does not support blitting to linear tiled images, using copy instead of blit!\n");
supportsBlit = false;
}
// Source for the copy is the last rendered swapchain image
vk::Image srcImage = fVulkanSwapChainImages[fCurrentFrameIndex];
// Create the linear tiled destination image to copy to and to read the memory from
vk::ImageCreateInfo imageCreateCI;
imageCreateCI.imageType = vk::ImageType::e2D;
// Note that vkCmdBlitImage (if supported) will also do format conversions if the swapchain color format would differ
imageCreateCI.format = vk::Format::eR8G8B8A8Unorm;
imageCreateCI.extent.width = fVulkanSwapChainExtent.width;
imageCreateCI.extent.height = fVulkanSwapChainExtent.height;
imageCreateCI.extent.depth = 1;
imageCreateCI.arrayLayers = 1;
imageCreateCI.mipLevels = 1;
imageCreateCI.initialLayout = vk::ImageLayout::eUndefined;
imageCreateCI.samples = vk::SampleCountFlagBits::e1;
imageCreateCI.tiling = vk::ImageTiling::eLinear;
imageCreateCI.usage = vk::ImageUsageFlagBits::eTransferDst;
// Create the image
vk::Image dstImage = fVulkanDevice.createImage(imageCreateCI, nullptr);
// Create memory to back up the image
vk::MemoryRequirements memRequirements = fVulkanDevice.getImageMemoryRequirements(dstImage);
vk::MemoryAllocateInfo memAllocInfo;
memAllocInfo.allocationSize = memRequirements.size;
// Memory must be host visible to copy from
memAllocInfo.memoryTypeIndex = FindMemoryType(memRequirements.memoryTypeBits, vk::MemoryPropertyFlagBits::eHostVisible | vk::MemoryPropertyFlagBits::eHostCoherent);
vk::DeviceMemory dstImageMemory = fVulkanDevice.allocateMemory(memAllocInfo, nullptr);
fVulkanDevice.bindImageMemory(dstImage, dstImageMemory, 0);
// Do the actual blit from the swapchain image to our host visible destination image
//vk::CommandBuffer copyCmd = vulkanDevice->createCommandBuffer(VK_COMMAND_BUFFER_LEVEL_PRIMARY, true);
vk::CommandBufferAllocateInfo allocInfo;
allocInfo.level = vk::CommandBufferLevel::ePrimary;
allocInfo.commandPool = fVulkanCommandPool;
allocInfo.commandBufferCount = 1;
vk::CommandBuffer copyCmd;
vk::Result res = fVulkanDevice.allocateCommandBuffers(&allocInfo, ©Cmd);
if (res != vk::Result::eSuccess)
throw std::runtime_error(vk::to_string(res));
vk::CommandBufferBeginInfo beginInfo;
copyCmd.begin(beginInfo);
// Transition destination image to transfer destination layout
insertImageMemoryBarrier(
copyCmd,
dstImage,
0,
VK_ACCESS_TRANSFER_WRITE_BIT,
VK_IMAGE_LAYOUT_UNDEFINED,
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
VK_PIPELINE_STAGE_TRANSFER_BIT,
VK_PIPELINE_STAGE_TRANSFER_BIT,
VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });
// Transition swapchain image from present to transfer source layout
insertImageMemoryBarrier(
copyCmd,
srcImage,
VK_ACCESS_MEMORY_READ_BIT,
VK_ACCESS_TRANSFER_READ_BIT,
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
VK_PIPELINE_STAGE_TRANSFER_BIT,
VK_PIPELINE_STAGE_TRANSFER_BIT,
VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });
// If source and destination support blit we'll blit as this also does automatic format conversion (e.g. from BGR to RGB)
if (supportsBlit)
{
// Define the region to blit (we will blit the whole swapchain image)
VkOffset3D blitSize;
blitSize.x = fVulkanSwapChainExtent.width;
blitSize.y = fVulkanSwapChainExtent.height;
blitSize.z = 1;
VkImageBlit imageBlitRegion{};
imageBlitRegion.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imageBlitRegion.srcSubresource.layerCount = 1;
imageBlitRegion.srcOffsets[1] = blitSize;
imageBlitRegion.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imageBlitRegion.dstSubresource.layerCount = 1;
imageBlitRegion.dstOffsets[1] = blitSize;
// Issue the blit command
vkCmdBlitImage(
copyCmd,
srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
dstImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
1,
&imageBlitRegion,
VK_FILTER_NEAREST);
}
else
{
// Otherwise use image copy (requires us to manually flip components)
VkImageCopy imageCopyRegion{};
imageCopyRegion.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imageCopyRegion.srcSubresource.layerCount = 1;
imageCopyRegion.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imageCopyRegion.dstSubresource.layerCount = 1;
imageCopyRegion.extent.width = fVulkanSwapChainExtent.width;
imageCopyRegion.extent.height = fVulkanSwapChainExtent.height;
imageCopyRegion.extent.depth = 1;
// Issue the copy command
vkCmdCopyImage(
copyCmd,
srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
dstImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
1,
&imageCopyRegion);
}
// Transition destination image to general layout, which is the required layout for mapping the image memory later on
insertImageMemoryBarrier(
copyCmd,
dstImage,
VK_ACCESS_TRANSFER_WRITE_BIT,
VK_ACCESS_MEMORY_READ_BIT,
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
VK_IMAGE_LAYOUT_GENERAL,
VK_PIPELINE_STAGE_TRANSFER_BIT,
VK_PIPELINE_STAGE_TRANSFER_BIT,
VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });
// Transition back the swap chain image after the blit is done
insertImageMemoryBarrier(
copyCmd,
srcImage,
VK_ACCESS_TRANSFER_READ_BIT,
VK_ACCESS_MEMORY_READ_BIT,
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
VK_PIPELINE_STAGE_TRANSFER_BIT,
VK_PIPELINE_STAGE_TRANSFER_BIT,
VkImageSubresourceRange{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 });
copyCmd.end();
vk::SubmitInfo submitInfo;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = ©Cmd;
vk::FenceCreateInfo fenceInfo;
vk::Fence fence = fVulkanDevice.createFence(fenceInfo, nullptr);
res = fVulkanGraphicsQueue.submit(1, &submitInfo, nullptr);
if (res != vk::Result::eSuccess)
throw std::runtime_error(vk::to_string(res));
fVulkanGraphicsQueue.waitIdle();
fVulkanDevice.destroyFence(fence);
fVulkanDevice.freeCommandBuffers(fVulkanCommandPool, 1, ©Cmd);
// Get layout of the image (including row pitch)
VkImageSubresource subResource { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0 };
VkSubresourceLayout subResourceLayout = fVulkanDevice.getImageSubresourceLayout(dstImage, subResource);
// Map image memory so we can start copying from it
const char* data;
vkMapMemory(fVulkanDevice, dstImageMemory, 0, VK_WHOLE_SIZE, 0, (void**)&data);
data += subResourceLayout.offset;
std::string name(filename);
if (encode_format == EncodeFormat::ePng)
{
name.append(".png");
stbi_write_png(name.c_str(), fVulkanSwapChainExtent.width, fVulkanSwapChainExtent.height, 4, data, (int) subResourceLayout.rowPitch);
}
else if (encode_format == EncodeFormat::eWebp)
{
name.append(".webp");
uint8_t *output;
size_t sz = WebPEncodeRGBA((const uint8_t *)data, fVulkanSwapChainExtent.width, fVulkanSwapChainExtent.height, (int) subResourceLayout.rowPitch, 100, &output);
std::ofstream file(name.c_str(), std::ios::out | std::ios::binary);
file.write((char *)output, sz);
file.close();
WebPFree(output);
}
else
{
name.append(".ppm");
std::ofstream file(name.c_str(), std::ios::out | std::ios::binary);
// ppm header
file << "P6\n" << fVulkanSwapChainExtent.width << "\n" << fVulkanSwapChainExtent.height << "\n" << 255 << "\n";
// If source is BGR (destination is always RGB) and we can't use blit (which does automatic conversion), we'll have to manually swizzle color components
bool colorSwizzle = false;
// Check if source is BGR
// Note: Not complete, only contains most common and basic BGR surface formats for demonstration purposes
if (!supportsBlit)
{
std::vector<vk::Format> formatsBGR = { vk::Format::eB8G8R8A8Srgb, vk::Format::eB8G8R8A8Unorm, vk::Format::eB8G8R8A8Snorm};
colorSwizzle = (std::find(formatsBGR.begin(), formatsBGR.end(), fVulkanSwapChainFormat) != formatsBGR.end());
}
// ppm binary pixel data
for (uint32_t y = 0; y < fVulkanSwapChainExtent.height; y++)
{
unsigned int *row = (unsigned int*)data;
for (uint32_t x = 0; x < fVulkanSwapChainExtent.width; x++)
{
if (colorSwizzle)
{
file.write((char*)row+2, 1);
file.write((char*)row+1, 1);
file.write((char*)row, 1);
}
else
{
file.write((char*)row, 3);
}
row++;
}
data += subResourceLayout.rowPitch;
}
file.close();
}
std::cout << "Screenshot saved to disk" << std::endl;
// Clean up resources
vkUnmapMemory(fVulkanDevice, dstImageMemory);
vkFreeMemory(fVulkanDevice, dstImageMemory, nullptr);
vkDestroyImage(fVulkanDevice, dstImage, nullptr);
#endif
}
}; // namespace yrender
I love watching this thread. Seeing this progress a piece at a time is so cool.
Same. Just like the RiscV64 thread Iâm coming here just to see if there is an update. Happy to see itâs getting closer to hardware accelerated support
Actually, this is gears. I can see blue gear connected with red gear, which is connected with green gear. The only problem is wrong resolution on vertical / horizontal axes. Something like rasterization assumes 640 pixels width but is drawn on 1920 pixels (so, 3+ rows into one) then leaving 3+ rows blank (black).
Donât forgot about donating to x512: PayPal.Me