A few questions about SSD memory page size and BFS

As far as I understand, the page (cell) size is usually 4 KB or even larger on SSDs.
When installing Haiku, the suggested (recommended) page size for BFS formatting is 2 KB. That recommendation appears to be outdated.
Maybe someone could explain the situation and give some recommendations.

1 Like

There is a compromise to make.

You cannot allocate less than a block to a single file. If you use a large block size (4K or more), all file sizes are rounded up to that size. All inodes and other filesystem data structures are rounded up to that size as well. This would result in quite a bit of wasted space.
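The rounding effect can be estimated with a quick sketch. The file sizes below are made-up examples, not measurements; the overhead per file is just its size rounded up to the next whole block:

```python
# Estimate space wasted by rounding each file up to a whole number of blocks.
# File sizes are hypothetical examples.

def wasted_bytes(file_sizes, block_size):
    """Bytes lost to rounding each file up to a whole number of blocks."""
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)  # ceiling division
        total += blocks * block_size - size
    return total

files = [100, 1_500, 3_000, 5_000, 70_000]  # bytes, hypothetical

for bs in (2048, 4096, 8192):
    print(f"{bs:5d} B blocks: {wasted_bytes(files, bs):6d} B wasted")
```

The waste grows with the block size, and it is worst for directories full of tiny files.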

The SSD itself has its own block size. Due to the way flash memory works, you have to erase and then rewrite a whole block; partial writes are not possible. So, if you want to write only 2K of data in a 4K block, you have to:

  • Read the first 2K (the part you don’t want to change) from the block
  • Combine it with the new data in the SSD's internal working memory
  • Erase the 4K block
  • Write the complete data to the 4K block

However, if you write a full 4K, you don’t need the read: you can directly erase and write.
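The steps above can be sketched as a toy model (pure Python bookkeeping, not real hardware; the function names and the `ERASED` marker are made up for illustration):

```python
# Toy model of a flash block: a partial write needs a read-modify-erase-write
# cycle, while a full-block write can skip the read.

BLOCK = 4096
ERASED = b"\xff"  # erased NAND cells read back as all-ones

def write_partial(block: bytes, offset: int, data: bytes):
    ops = ["read"]                        # fetch the part we want to keep
    merged = block[:offset] + data + block[offset + len(data):]
    ops += ["erase", "write"]             # the whole block must be erased first
    return merged, ops

def write_full(data: bytes):
    assert len(data) == BLOCK
    return data, ["erase", "write"]       # no read needed

block = ERASED * BLOCK
block, ops = write_partial(block, 2048, b"a" * 2048)
print(ops)   # the 2K partial write includes a read step

block, ops = write_full(b"b" * BLOCK)
print(ops)   # the full 4K write skips the read
```

The point is only the extra "read" step: the partial write has to preserve the other 2K, the full write does not.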

Modern SSDs do a lot of caching and other things (wear levelling, which means moving data around and periodically rewriting it so it doesn’t degrade over time). So it’s hard to tell whether changing the BFS block size will have a big effect on performance. You will have to try and see what happens on your hardware.

2 Likes
  • Are there any plans to add a 16KB page formatting option to DriveSetup?
  • Could DriveSetup automatically set media page size? (does the media tell the OS its page size?)

I have Haiku on an M.2 SSD, and I didn’t do anything special when installing it, so the default seems to work OK…

1 Like

My system uses two NVMe drives, and both have a page size of 16 KB.
I usually format BFS with a 2 KB page size for Haiku.
But it might not be the best option for such SSDs?

I tried to do some tests. I copied a 6.4 GiB folder of about 16,000 files to a BFS partition from another drive. Before copying the files, I reformatted the partition with a different block size. The data is very inconsistent; whether this is due to Haiku or to the medium itself is unclear.

block_size - attempt 1 / attempt 2 / …
2048 - 11s / 6s
4096 - 5s / 7s
8192 - 6s / 27s / 27s / 8s

(Measurement error up to 0.5 s)

The weirdest results are with a block size of 8192…


I tried to run some more accurate tests:
I restarted the computer and reformatted the partition before each copy of the files.

2048 - 12s / 11s / 11s / 11s
4096 - 12s / 11s / 34s / 11s
8192 - 31s / 13s / 13s / 13s

So far, 2KB seems to be the best choice, although the number of tests may be a bit too small for such a conclusion.

In theory yes, but in reality everything pretends to use 512-byte sectors because some OSes don’t know how to handle anything else.

Reformatting will not be enough; you would also need to tell the NVMe drive that the sectors are not used, using the fstrim command.

You can also use bonnie++ for filesystem benchmarks. It will do quite large transfers and repeat them several times to try to get more reliable results.

2 Likes

Tested on system:

OS: Haiku beta4 64 bit
CPU: AMD Ryzen 7500G
RAM: 16GB
Motherboard: MSI B550-A PRO

Tested SSD (NVMe): Samsung 970 EVO Plus 500GB
BFS partition ~100GB (empty)

The SSD has several partitions; almost half of the SSD is filled with data.

Haiku is installed on another SSD.

Testing utility:
~> bonnie++ -d /test -s 32G -f -b -u user

Before each test Haiku was rebooted and executed:
~> fstrim /test

“2KB-1” means “BFS system with 2KB block size, first test”, and so on.

Data obtained by testing:

*Chunk Size* — 32GB
*Num files* — 16


*Sequential Output* (Speed of block writes)
----------------------------------------------------------------------------
                  2KB-1; 2KB-2; 2KB-3 ; 4KB-1; 4KB-2; 4KB-3 ; 8KB-1; 8KB-2; 8KB-3;
      Write (sec): 466m; 483m; 463m ; 405m; 393m; 405m ; 403m; 378m; 358m;
            % CPU: 38; 39; 38 ; 32; 33; 33 ; 33; 33; 34;
          Latency: 45ms; 35ms; 36ms ; 60ms; 56ms; 44ms ; 57ms; 35ms; 35ms;
    
*Sequential Rewrite* (Speed of rewriting data)
----------------------------------------------------------------------------
                  2KB-1; 2KB-2; 2KB-3 ; 4KB-1; 4KB-2; 4KB-3 ; 8KB-1; 8KB-2; 8KB-3;
    Rewrite (sec): 345m; 354m; 358m ; 363m; 359m; 361m ; 362m; 362m; 360m;
            % CPU: 24; 23; 23 ; 24; 24; 23 ; 23; 23; 23;
          Latency: 46ms; 58ms; 62ms ; 52ms; 108ms; 78ms ; 66ms; 84ms; 38ms;
   
*Sequential Input* (Speed of block reads)
----------------------------------------------------------------------------
                  2KB-1; 2KB-2; 2KB-3 ; 4KB-1; 4KB-2; 4KB-3 ; 8KB-1; 8KB-2; 8KB-3;
       Read (sec): 538m; 539m; 544m ; 552m; 547m; 551m ; 552m; 550m; 550m;
            % CPU: 30; 29; 29 ; 30; 30; 30 ; 30; 30; 30;
          Latency: 65ms; 39ms; 64ms ; 42ms; 67ms; 64ms ; 63ms; 65ms; 42ms;
 
*Random Seeks* (Number of random seeks per second)
----------------------------------------------------------------------------
                  2KB-1; 2KB-2; 2KB-3 ; 4KB-1; 4KB-2; 4KB-3 ; 8KB-1; 8KB-2; 8KB-3;
      Seeks (sec): 594.8; 618.4; 645.2 ; 606.8; 556.4; 593.1 ; 552.4; 655.2; 622.5;
            % CPU: 493; 493; 492 ; 493; 493; 493 ; 494; 493; 493; — (Is this a bug? Are per mille used instead of percent?)
          Latency: 96ms; 81ms; 108ms ; 86ms; 90ms; 102ms ; 85ms; 77ms; 87ms
 
*Sequential Create* (Speed of creating files sequentially)
----------------------------------------------------------------------------
                  2KB-1; 2KB-2; 2KB-3 ; 4KB-1; 4KB-2; 4KB-3 ; 8KB-1; 8KB-2; 8KB-3;
     Create (sec): 16384; 16384; 16384 ; 16384; 16384; 16384 ; 16384; 16384; 16384; — (What do those "16384" values mean? Some error?)
            % CPU: 28; 30; 29 ; 28; 29; 31 ; 32; 29; 29;
          Latency: 45ms; 45ms; 44ms ; 47ms; 48ms; 47ms ; 50ms; 52ms; 52ms;
    
       Read (sec): +++++; +++++; +++++ ; +++++; +++++; +++++ ; +++++; +++++; +++++;
            % CPU: +++; +++; +++ ; +++; +++; +++ ; +++; +++; +++;
          Latency: 28µs; 24µs; 25µs ; 30µs; 31µs; 29µs ; 18µs; 18µs; 22µs;
    
     Delete (sec): 16384; 16384; 16384 ; 16384; 16384; 16384 ; 16384; 16384; 16384; — (What do those "16384" values mean? Some error?)
            % CPU: 27; 29; 29 ; 27; 28; 30 ; 29; 28; 27;
          Latency: 30ms; 30ms; 30ms ; 30ms; 32ms; 47ms ; 51ms; 53ms; 52ms;
    
*Random Create* (Speed of creating files randomly)
----------------------------------------------------------------------------
                  2KB-1; 2KB-2; 2KB-3 ; 4KB-1; 4KB-2; 4KB-3 ; 8KB-1; 8KB-2; 8KB-3;
     Create (sec): 16384; 16384; 16384 ; 16384; 16384; 16384 ; 16384; 16384; 16384; — (What do those "16384" values mean? Some error?)
            % CPU: 29; 31; 30 ; 29; 30; 32 ; 32; 31; 30;
          Latency: 44ms; 43ms; 42ms ; 47ms; 47ms; 46ms ; 49ms; 52ms; 52ms;
    
       Read (sec): +++++; +++++; +++++ ; +++++; +++++; +++++ ; +++++; +++++; +++++;
            % CPU: +++; +++; +++ ; +++; +++; +++ ; +++; +++; +++;
          Latency: 15µs; 8µs; 10µs ; 14µs; 15µs; 15µs ; 9µs; 7µs; 15µs;
    
     Delete (sec): 16384; 16384; 16384 ; 16384; 16384; 16384 ; 16384; 16384; 16384; — (What do those "16384" values mean? Some error?)
            % CPU: 29; 32; 31 ; 30; 31; 33 ; 33; 33; 32;
          Latency: 41ms; 41ms; 40ms ; 41ms; 41ms; 40ms ; 47ms; 49ms; 49ms;
         
----------------------------------------------------------------------------

  • Some µs values are rounded to ms.
  • “+++++” and “+++” mean “no data” (the action completed too quickly to measure)

Because BFS cannot be formatted with a 16KB block size, the speculation that matching the BFS block size to the SSD’s internal block size would provide some advantage cannot yet be proven or disproven on the SSDs I have.

Maybe, but it will mostly be related to the prevalent size of the files one uses. When working with big files, a bigger block ends up being written anyway, so some performance is gained from doing just one 16K write instead of eight 2K ones, for example.
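The write-count arithmetic can be checked directly (the 16K internal page size comes from the earlier posts; real drives coalesce writes, so this only counts filesystem-level blocks):

```python
# Number of filesystem blocks needed to cover one 16K SSD page:
# with 2K blocks the same 16K of data is issued as 8 block writes,
# with a 16K block as a single one.
import math

SSD_PAGE = 16 * 1024

for fs_block in (2048, 4096, 8192, 16384):
    writes = math.ceil(SSD_PAGE / fs_block)
    print(f"{fs_block:5d} B FS blocks -> {writes} writes per 16K page")
```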

With many small files, a 2K block may be better. Since speeds are good on SSDs, the easiest solution ends up being a compromise: a reasonable value that serves most cases.

2 Likes