SnapRAID Change Data Disk

November 15th 2020

Overview

This guide covers how to replace a working drive in a functioning SnapRAID array with another working drive. The new drive can be the same size as or larger than the old drive, as long as it’s not larger than the largest parity volume.

Before starting, it is advisable to test the new drive. Badblocks is a good drive-testing utility for Linux.
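
The size rule above can be checked before any data is moved. The snippet below is only a sketch: fits_under_parity is a hypothetical helper, not part of SnapRAID, and it simply compares two sizes given in bytes.

```shell
#!/bin/sh
# Hypothetical helper (not a SnapRAID command): succeed when the new data
# drive is no larger than the parity drive. Both arguments are sizes in bytes.
fits_under_parity() {
    new_bytes=$1
    parity_bytes=$2
    [ "$new_bytes" -le "$parity_bytes" ]
}
```

The byte sizes can be read with lsblk, for example fits_under_parity "$(lsblk -bdno SIZE /dev/sdb)" "$(lsblk -bdno SIZE /dev/sdd)", assuming sdb is the new drive and sdd is the parity drive as in the listings later in this guide.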

Synopsis

  1. Make sure nothing is writing to the array.
  2. Run the SnapRAID commands diff and sync until the diff command results in 0 changes.
  3. Identify the drive in the array to be replaced.
  4. Copy all the files, including hidden files, from the old drive to the new drive.
  5. Remove the old drive from the SnapRAID config file.
  6. Add the new drive to the SnapRAID config file.
  7. Run the SnapRAID command diff. It should report that a drive has changed, but no data should have changed.
  8. Return to business as usual with the array.

SnapRAID

I run SnapRAID as a plugin on an OpenMediaVault virtual machine running in Proxmox. The drives used for storage are connected to two LSI MegaRAID cards, both of which are passed through to the OpenMediaVault VM. In addition I use MergerFS through the OMV Union Filesystems plugin to pool all the disks together into a single volume. This setup requires a few more steps before we can change a SnapRAID data drive.

Note that SnapRAID also has the ability to pool drives, but it’s not as robust as MergerFS.

Prepare

  1. Attach the new drive to an external USB dock that is accessible from the Proxmox server. Most of the instructions here are done on the CLI; UI steps will be noted as such.

  2. Identify the new drive using lsblk. The example below shows sdb as a 3.7TB disk with no associated partitions. A brand new disk would look this way, but if you’re re-using a drive it could have partitions.

    
    lsblk -o NAME,PATH,FSTYPE,MOUNTPOINT,LABEL,UUID,SIZE
    
     
     NAME   PATH       FSTYPE      MOUNTPOINT     LABEL         UUID                                     SIZE
     sda    /dev/sda                                                                                     2.8T
     └─sda1 /dev/sda1  ext4                       201907d001    cdee8e5f-54b3-4f1f-afeb-4d414b7e94aa     2.8T
     sdb    /dev/sdb                                                                                     3.7T
     sdc    /dev/sdc                                                                                     2.8T
     └─sdc1 /dev/sdc1  ext4                       201907d002    0cdf8cd4-5c49-460a-a65a-6e340b53a2a4     2.8T
     sdd    /dev/sdd                                                                                     3.7T
     └─sdd1 /dev/sdd1  ext4                       202003d003    d5a72cf7-8b04-4475-9860-9c48735b3f23     3.7T
     
     

    There might also be one or more drives with many partitions. Take note of these drives and do not touch them in any way! They are usually used by Proxmox and are easier to manage through the UI.

  3. Use Badblocks to verify the disk is free of errors. The command below runs a non-destructive read-write test (the n flag). Replacing n with w runs a destructive write test that erases everything on the disk, so only use it on a drive that holds no data you need. Either test can take hours on small drives and days on large drives. The 4TB refurbished enterprise drives I use take about 2 days to finish.

     
         sudo badblocks -nsv /dev/sdb >badsectors.log
     
  4. Set up a partition on the drive using fdisk.

    
         sudo fdisk /dev/sdb
     

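    The output might look like the example below, which comes from a small 8 GiB disk.
    The steps to create a new partition begin at line 8.
    Note that a DOS (MBR) disklabel cannot address partitions beyond 2 TiB. On a larger drive, such as the 3.7T disk used here, enter g at the first prompt to create a GPT disklabel before entering n. With GPT, fdisk skips the primary/extended question.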

     
         Welcome to fdisk (util-linux 2.33.1).
         Changes will remain in memory only, until you decide to write them.
         Be careful before using the write command.
         
         Device does not contain a recognized partition table.
         Created a new DOS disklabel with disk identifier 0xe96324a1.
         
         Command (m for help): n
         Partition type
             p   primary (0 primary, 0 extended, 4 free)
             e   extended (container for logical partitions)
         Select (default p): p
         Partition number (1-4, default 1): 1
         First sector (2048-16777215, default 2048):
         Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-16777215, default 16777215):
    
         Created a new partition 1 of type 'Linux' and of size 8 GiB.
         Command (m for help): w
         The partition table has been altered.
         Calling ioctl() to re-read partition table.
         Syncing disks.
     
     
    Line 8: the command n creates a new partition.
    Line 12: the command p makes it a primary partition.
    Lines 13 to 15: the default values are fine, so these can be left blank.
    Line 18: the command w writes the changes to disk.
  5. Format the partition using mkfs. Ext4 is a good enough option for most cases. The L option gives the partition a label, here newdatadrive, for easy identification later.

     
        sudo mkfs.ext4 -L newdatadrive /dev/sdb1
    
    
        mke2fs 1.44.5 (15-Dec-2018)
        Discarding device blocks: done
        Creating filesystem with 2096896 4k blocks and 524288 inodes
        Filesystem UUID: f4adcf58-380b-4c6c-b991-feb47e02b15a
        Superblock backups stored on blocks:
                32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
    
        Allocating group tables: done
        Writing inode tables: done
        Creating journal (16384 blocks): done
        Writing superblocks and filesystem accounting information: done
    
    
  6. Use lsblk to verify the new partition.

    
    lsblk -o NAME,PATH,FSTYPE,MOUNTPOINT,LABEL,UUID,SIZE
    
    
    NAME   PATH       FSTYPE      MOUNTPOINT     LABEL         UUID                                     SIZE
    sda    /dev/sda                                                                                     2.8T
    └─sda1 /dev/sda1  ext4                       201907d001    cdee8e5f-54b3-4f1f-afeb-4d414b7e94aa     2.8T
    sdb    /dev/sdb                                                                                     3.7T
    └─sdb1 /dev/sdb1  ext4                       newdatadrive  f4adcf58-380b-4c6c-b991-feb47e02b15a     3.7T
    sdc    /dev/sdc                                                                                     2.8T
    └─sdc1 /dev/sdc1  ext4                       201907d002    0cdf8cd4-5c49-460a-a65a-6e340b53a2a4     2.8T
    sdd    /dev/sdd                                                                                     3.7T
    └─sdd1 /dev/sdd1  ext4                       202003d003    d5a72cf7-8b04-4475-9860-9c48735b3f23     3.7T
    
  7. We need to access this new volume later. To mount it, first create a directory to mount it to.

    
    sudo mkdir /mnt/sdb/
    
  8. Mount the new volume using mount.

    
    sudo mount /dev/sdb1 /mnt/sdb
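
Reading device names off the lsblk table by eye is where mistakes creep in, so a label lookup can help. This is a minimal sketch: device_for_label is a hypothetical helper that reads the raw output of lsblk -rno PATH,LABEL on stdin and prints the path that carries the given label.

```shell
#!/bin/sh
# Hypothetical helper: print the device path whose filesystem label is $1.
# Expects "PATH LABEL" pairs on stdin, as printed by: lsblk -rno PATH,LABEL
device_for_label() {
    # Appending "" forces a string comparison, even for numeric-looking labels.
    awk -v want="$1" '($2 "") == (want "") { print $1 }'
}
```

Used as lsblk -rno PATH,LABEL | device_for_label newdatadrive, it would print /dev/sdb1 for the example layout above. Labels that contain spaces are escaped in raw lsblk output, so this sketch assumes simple labels like the ones used in this guide.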
    

Open Media Vault

The steps below are done in the Open Media Vault CLI through ssh.

  1. Run a SnapRAID diff command on the array.

    
    sudo snapraid diff
    

    Example of diff results.

    
    Loading state from /srv/dev-disk-by-label-201907d001/snapraid.content...
    Comparing...
    ...
    ...
    ...
      125315 equal
          10 added
           7 removed
           0 updated
           0 moved
           0 copied
           0 restored
    There are differences!
    
  2. Next run the SnapRAID sync command, then repeat steps 1 and 2 until step 1 returns no differences.

    
    sudo snapraid sync
    

    Example of sync results.

    
        Self test...
        Loading state from /srv/dev-disk-by-label-201907d001/snapraid.content...
        Scanning disk 201907d001...
        Scanning disk 201907d002...
        Using 2235 MiB of memory for the file-system.
        Initializing...
        Resizing...
        Saving state to /srv/dev-disk-by-label-201907d001/snapraid.content...
        Saving state to /srv/dev-disk-by-label-201907d002/snapraid.content...
        Verifying /srv/dev-disk-by-label-201907d001/snapraid.content...
        Verifying /srv/dev-disk-by-label-201907d002/snapraid.content...
        Verified /srv/dev-disk-by-label-201907d001/snapraid.content in 14 seconds
        Verified /srv/dev-disk-by-label-201907d002/snapraid.content in 15 seconds
        Syncing...
        Using 96 MiB of memory for 32 cached blocks.
        100% completed, 113784 MB accessed in 0:07     0:00 ETA
    
            201907d001  0% |
            201907d002  0% |
                parity  2% | *
                raid  1% | *
                hash  5% | ***
                sched  7% | ****
                misc  0% |
                        |_______________________________________________________
                                    wait time (total, less is better)
        Everything OK
        Saving state to /srv/dev-disk-by-label-201907d001/snapraid.content...
        Saving state to /srv/dev-disk-by-label-201907d002/snapraid.content...
        Verifying /srv/dev-disk-by-label-201907d001/snapraid.content...
        Verifying /srv/dev-disk-by-label-201907d002/snapraid.content...
        Verified /srv/dev-disk-by-label-201907d001/snapraid.content in 15 seconds
        Verified /srv/dev-disk-by-label-201907d002/snapraid.content in 16 seconds
    
    
  3. Use lsblk with a few additional options to find the UUID, label, and mount point of each disk in the SnapRAID array. The goal is to identify the disk you want to remove within OMV and relate it to the physical disk in Proxmox. Setting a good label when formatting (step 5 of the Prepare section) makes the identification easier.

    lsblk -o NAME,PATH,FSTYPE,MOUNTPOINT,LABEL,UUID,SIZE
    
        NAME   PATH      FSTYPE MOUNTPOINT                                   LABEL          UUID                                  SIZE
        sda    /dev/sda                                                                                                           16G
        ├─sda1 /dev/sda1 vfat   /boot/efi                                                  52A1-DABE                             512M
        ├─sda2 /dev/sda2 ext4   /                                                          b919071a-3a7a-4349-8bb3-6d04a2654964 14.5G
        └─sda3 /dev/sda3 swap   [SWAP]                                                     5d40e0e9-c5a5-4277-bf6a-d11a5daeb693 1020M
        vda    /dev/vda                                                                                                          2.8T
        └─vda1 /dev/vda1 ext4   /srv/dev-disk-by-label-201907d001            201907d001    c089c2df-df90-487e-a380-acf581c2a734  2.8T
        vdb    /dev/vdb                                                                                                          2.8T
        └─vdb1 /dev/vdb1 ext4   /srv/dev-disk-by-label-201907d002            201907d002    30ae1493-c449-4ec1-9825-752f88f2962e  2.8T
        vdc    /dev/vdc                                                                                                          3.7T
        └─vdc1 /dev/vdc1 ext4   /srv/dev-disk-by-label-202003d003            202003d003    95fbd539-383b-4fe2-88da-81f65c17c82f  3.7T
    
    
  4. Take note of the details of the disk to be removed, vdb. The third disk, vdc, at 3.7T is the parity drive. The existing data drives, vda and vdb, are smaller at 2.8T. The new 3.7T drive doesn’t appear here because it’s not attached to the OMV VM yet. Some of this info can be found using the OMV UI too.

    
        Name: vdb
        Partition: vdb1
        Partition Label: 201907d002
        Path: /dev/vdb
        Mount Point: /srv/dev-disk-by-label-201907d002
        UUID: 30ae1493-c449-4ec1-9825-752f88f2962e
        Size: 2.8T
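
The diff and sync loop from steps 1 and 2 of this section can be scripted instead of checked by eye. The sketch below assumes the summary format shown in the diff example above (a count followed by a word such as added or removed). diff_is_clean is a hypothetical helper, not a SnapRAID command.

```shell
#!/bin/sh
# Hypothetical helper: read `snapraid diff` output on stdin and succeed only
# when every change counter (added, removed, updated, moved, copied, restored)
# is zero. The "equal" counter is ignored. Input with no counters at all fails.
diff_is_clean() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 ~ /^(added|removed|updated|moved|copied|restored)$/ {
            changes += $1
            seen = 1
        }
        END { exit !(seen && changes == 0) }
    '
}

# Example loop (not run here); it stops early if a sync fails:
#   until sudo snapraid diff | diff_is_clean; do sudo snapraid sync || break; done
```

If something is still writing to the array the loop never settles, which is why the first item in the synopsis is to stop all writes.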
    

Proxmox

This section takes place on the Proxmox host CLI.

  1. Use the details from the previous section and the lsblk command below to identify vdb on the Proxmox node. A good label, set when the disk was formatted, makes this much easier. In the previous section vdb’s partition label is 201907d002, which belongs to sdc1 here.

    
    lsblk -o NAME,PATH,FSTYPE,MOUNTPOINT,LABEL,UUID,SIZE
    
    NAME   PATH       FSTYPE      MOUNTPOINT     LABEL         UUID                                     SIZE
    sda    /dev/sda                                                                                     2.8T
    └─sda1 /dev/sda1  ext4                       201907d001    cdee8e5f-54b3-4f1f-afeb-4d414b7e94aa     2.8T
    sdb    /dev/sdb                                                                                     3.7T
    └─sdb1 /dev/sdb1  ext4                       newdatadrive  f4adcf58-380b-4c6c-b991-feb47e02b15a     3.7T
    sdc    /dev/sdc                                                                                     2.8T
    └─sdc1 /dev/sdc1  ext4                       201907d002    0cdf8cd4-5c49-460a-a65a-6e340b53a2a4     2.8T
    sdd    /dev/sdd                                                                                     3.7T
    └─sdd1 /dev/sdd1  ext4                       202003d003    d5a72cf7-8b04-4475-9860-9c48735b3f23     3.7T    
    
  2. Mount the disk to be removed. This is similar to steps 7 and 8 of the Prepare section: create a directory, then mount the drive to it.

    
    sudo mkdir /mnt/sdc
    sudo mount /dev/sdc1 /mnt/sdc
    
  3. Use rsync to copy all the files from the old data disk, sdc, to the new data disk, sdb. The n flag does a dry run and doesn’t actually copy anything; remove it when you’re sure everything is good. The trailing / in /mnt/sdc/ is intentional: without it the command would create an sdc directory inside /mnt/sdb. The a flag (archive mode) copies recursively, including hidden files, and preserves permissions, ownership, and timestamps. This will take a while, so sit back with a hot drink.

     
        sudo rsync -avn /mnt/sdc/ /mnt/sdb
    
        sending incremental file list
        ./
        file.1
        file.2
        file.3
        file.4
        file.5
        folder/
        folder/file.1.1
        folder/file.1.2
        sent 266 bytes  received 44 bytes  620.00 bytes/sec
        total size is 0  speedup is 0.00 (DRY RUN)
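
Once the real copy has finished, it is worth confirming that both disks hold the same files before touching the SnapRAID config. The sketch below is one possible check and is not part of the original procedure: trees_match is a hypothetical helper built on diff -rq, which reads every file on both disks and will therefore take a long time on large drives.

```shell
#!/bin/sh
# Hypothetical helper: succeed when two directory trees contain the same
# relative paths with identical file contents. Ownership, permissions, and
# timestamps are NOT compared.
trees_match() {
    diff -rq "$1" "$2" >/dev/null
}
```

Called from a root shell as trees_match /mnt/sdc /mnt/sdb, it returns success only when nothing is missing or different on the new disk. A quicker but weaker alternative is to repeat the rsync dry run and confirm it lists no files.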
    
    

SnapRAID

Back to the Open Media Vault CLI to run the SnapRAID command below.

  1. Run a diff on the SnapRAID array to verify nothing was written. This is the same command as step 1 of the first Open Media Vault section.

     
    sudo snapraid diff
    
  2. If there are differences, double-check the mounted drives to ensure you’re copying from and to the correct disks. This step should not be necessary, but if it is, slow down and recheck the previous steps. You might have written to the wrong disk, and if so you will need to check the SnapRAID docs to restore the affected files.

Open Media Vault

The steps below are done on the Open Media Vault web GUI.

  1. Log in to the OMV UI and go to the SnapRAID section.
  2. Find the old drive in the Drives tab and remove it from the array. Apply the changes to OMV.
  3. Move to the Union Filesystems section.
  4. Remove the old drive from the union filesystem it belongs to. Deselect it from the list. Save and apply changes.
  5. Move to the File Systems section in OMV.
  6. Make sure the drive is not referenced, then unmount it. Apply the changes.

Replace

  1. If hot swap is not supported, safely turn off the Proxmox node.

  2. Carefully remove the physical drive, and replace it with the new data drive.

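  3. Restart the Proxmox node, then get the model and serial of the new disk using lsblk. Device names can change after a reboot, so identify the disk by its label. In the example below the new disk is sdb, labelled newdatadrive.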

    
    lsblk -o name,label,model,serial
    
    
    NAME          LABEL          MODEL                   SERIAL
    sda                          WDC_WD4000FYYZ-05UL1B0  WD-WCC124631734
    └─sda1        201907d001
    sdb                          Hitachi_HUS724040ALE640 PK23654PAG7BP8T
    └─sdb1        newdatadrive
    sdc                          WDC_WD40EFRX-68N32N0    WD-WCC7K45XJ4LJ
    └─sdc1        201907d002
    sdd                          Hitachi_HUS724040ALE640 PK13241PAG6RZBS
    └─sdd1        202003d003
    
  4. Find the disk by its ID in /dev/disk/by-id/ using the Model and Serial.

    
    ls /dev/disk/by-id/
    
    
    ata-Hitachi_HUS724040ALE640_PK23654PAG7BP8T
    ata-Hitachi_HUS724040ALE640_PK13241PAG6RZBS
    ata-WDC_WD4000FYYZ-05UL1B0_WD-WCC124631734
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K45XJ4LJ
    
  5. Make sure the OMV VM is turned off, then pass the disk through to the OMV VM. 100 is the VM ID, and -virtio7 is the virtio slot the disk is attached to. Attaching another disk requires a different, unused slot, such as -virtio8.

    
    sudo qm set 100 -virtio7 /dev/disk/by-id/ata-Hitachi_HUS724040ALE640_PK23654PAG7BP8T
    
  6. Restart and log back into the OMV VM, then go to the filesystems section.

  7. Move back to the Union Filesystems section and edit the existing file system. Select to add the new drive in the list. Save and apply changes.

  8. Move to the SnapRAID section and go to the Drive tab. Select the new drive in the list of data drives. Apply the changes.

  9. Run a snapraid diff. SnapRAID should detect the disk change, but it shouldn’t find any differences. If it reports anything other than equal, restored, or copied, check the directory structure on the new drive.

    
    sudo snapraid diff
    
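  10. Run a snapraid check to verify the files on the new disk. The a flag only checks file hashes and skips the parity data, and the d flag limits the check to one disk. The name after d must be the drive’s name in the SnapRAID config, which is assumed here to be newdatadrive.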

    
    sudo snapraid check -a -d newdatadrive
    
  11. Run a snapraid sync to finish up.

    
    sudo snapraid sync
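
Matching model and serial strings against /dev/disk/by-id/ by eye (steps 3 and 4 above) can also be done by resolving the symlinks. This is a minimal sketch: by_id_links is a hypothetical helper, and its optional second argument exists only so it can be tried against a test directory.

```shell
#!/bin/sh
# Hypothetical helper: print every link in directory $2 (default
# /dev/disk/by-id) that resolves to the device node given as $1.
by_id_links() {
    target=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

Running by_id_links /dev/sdb on the Proxmox host may print more than one path, since a disk can have several by-id links. The ata- entry is the one used with qm set in this guide.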
    

That’s it! At this point write access can be given back to the SnapRAID array.

This post is written by Gouthaman Raveendran, licensed under CC BY-NC 4.0.