Typical NetApp snapshots
The current “ordinal” snapshot naming policy creates snapshots with names like
nightly.0
nightly.1

What’s wrong with it?
Everybody is using them!

Here is the output of the snap list command:

Volume xxxx
working...
 
  %/used       %/total  date          name
----------  ----------  ------------  --------
  1% ( 1%)    0% ( 0%)  Aug 17 00:00  nightly.0     
  2% ( 1%)    0% ( 0%)  Aug 16 00:00  nightly.1     
  3% ( 1%)    0% ( 0%)  Aug 15 00:00  nightly.2     
  6% ( 4%)    0% ( 0%)  Aug 14 00:00  nightly.3     
  7% ( 1%)    0% ( 0%)  Aug 13 00:00  nightly.4     
  8% ( 1%)    0% ( 0%)  Aug 12 00:00  nightly.5     
 58% (57%)    0% ( 0%)  Aug 11 00:00  nightly.6     
 58% ( 0%)    0% ( 0%)  Aug 10 00:00  nightly.7     
 58% ( 0%)    0% ( 0%)  Aug 09 00:01  nightly.8     
 58% ( 0%)    0% ( 0%)  Aug 08 00:00  nightly.9     
 58% ( 0%)    0% ( 0%)  Aug 07 00:00  nightly.10    
 58% ( 0%)    0% ( 0%)  Aug 06 00:00  nightly.11    
 58% ( 0%)    0% ( 0%)  Aug 05 00:00  nightly.12    
 58% ( 0%)    0% ( 0%)  Aug 04 00:00  nightly.13    
 58% ( 0%)    0% ( 0%)  Aug 03 00:00  nightly.14    
 58% ( 0%)    0% ( 0%)  Aug 02 00:00  nightly.15    
 58% ( 0%)    0% ( 0%)  Aug 01 00:00  nightly.16    
 58% ( 0%)    0% ( 0%)  Jul 31 00:00  nightly.17

You, as the NAS admin, can see exactly when every snapshot was taken, and you know which snapshot folder to use for a restore from a particular date/time.

What’s wrong with the current state?

When, for example, VMware admins want to restore data as it was at 2015-07-31 17:52, they should find it in the snapshot taken at 2015-08-01 00:00.

But which snapshot folder is it? Is it nightly.16? nightly.17?
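
To make the problem concrete, here is a small Python sketch of the arithmetic somebody has to do by hand to answer that question. It assumes exactly one nightly snapshot per day with no gaps; the function name ordinal_name is hypothetical and only for illustration.

```python
from datetime import date

def ordinal_name(snapshot_day: date, newest_day: date, prefix: str = "nightly") -> str:
    """Return the ordinal name of the snapshot taken on snapshot_day.

    Assumes one snapshot per day with no gaps, where newest_day is the
    date on which the current <prefix>.0 snapshot was taken.
    """
    age = (newest_day - snapshot_day).days
    if age < 0:
        raise ValueError("snapshot_day is later than the newest snapshot")
    return f"{prefix}.{age}"

# In the snap list output above, nightly.0 was taken on Aug 17.
print(ordinal_name(date(2015, 8, 1), date(2015, 8, 17)))  # nightly.16
# One day later the very same snapshot carries a different name.
print(ordinal_name(date(2015, 8, 1), date(2015, 8, 18)))  # nightly.17
```

The answer depends on the day you ask, and without the snap list output the date of nightly.0 is not even visible.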

The VMware team has no tool (nothing like SnapManager/SnapDrive, especially on NFS datastores) to find the right snapshot. When they access the .snapshot folder they see something like this:

-rw-r--r--  1 root  root  4096 May 11 13:07 nightly.0
-rw-r--r--  1 root  root  4096 May 11 13:07 nightly.1
-rw-r--r--  1 root  root  4096 May 11 13:07 nightly.2
-rw-r--r--  1 root  root  4096 May 11 13:07 nightly.3
-rw-r--r--  1 root  root  4096 May 11 13:07 nightly.4
-rw-r--r--  1 root  root  4096 May 11 13:07 nightly.5
-rw-r--r--  1 root  root  4096 May 11 13:07 nightly.6

The dates/times listed on the filesystem are not the snapshot dates/times. They are the last modification dates/times of the datastore’s root folder, i.e. the last time the datastore content changed (a vmdk was added, a vmdk was deleted, ...).

The VMware team has to ask the NAS team which snapshot folder to use for the restore, which is an unnecessary complication.

Even worse: the NAS team will respond with (e.g.) “use the nightly.6 folder for the restore” and pass the incident back to the VMware team.

The VMware team will open the ticket update the NEXT day, but at midnight the folders were renamed: nightly.6 became nightly.7 and nightly.5 became nightly.6.

The VMware team will use the wrong folder for the restore!!!

Solution

Set the schedsnapname volume option to create_time on every volume (vol options <volname> schedsnapname create_time).

The snapshot names then become

nightly.2015-08-17_0000
nightly.2015-08-16_0000
nightly.2015-08-15_0000
nightly.2015-08-14_0000
nightly.2015-08-13_0000
nightly.2015-08-12_0000
nightly.2015-08-11_0000

It also works for weekly snapshots (e.g. weekly.2015-08-16_0000) and hourly ones (hourly.2015-08-17_1100).
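
With names like these, anybody who can list the .snapshot folder can pick the right snapshot without asking the NAS team. The following Python sketch shows one way to do it. It assumes the yyyy-mm-dd_hhmm suffix shown in the examples above; the helper names snapshot_time and snapshot_for are hypothetical.

```python
from datetime import datetime

def snapshot_time(name: str) -> datetime:
    """Parse the creation time from a name like 'nightly.2015-08-17_0000'."""
    _, stamp = name.split(".", 1)
    return datetime.strptime(stamp, "%Y-%m-%d_%H%M")

def snapshot_for(wanted: datetime, names: list[str]) -> str:
    """Return the earliest snapshot taken at or after the wanted time."""
    candidates = [n for n in names if snapshot_time(n) >= wanted]
    if not candidates:
        raise LookupError("no snapshot covers the requested time")
    return min(candidates, key=snapshot_time)

names = [
    "nightly.2015-08-02_0000",
    "nightly.2015-08-01_0000",
    "nightly.2015-07-31_0000",
]
# The restore request from the example: state as of 2015-07-31 17:52.
print(snapshot_for(datetime(2015, 7, 31, 17, 52), names))
# nightly.2015-08-01_0000
```

The chosen name stays valid the next day, because nothing gets renamed at midnight.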

Don’t worry about automatic deletion of snapshots beyond the retention period. They will still be deleted properly.

Is there a reason to use the default snapshot names?

Yes. There are cases where the ordinal naming convention is needed.
There can be situations where scripts act on the last nightly snapshot (mount it to a temporary directory, run a filesystem backup, or do some other data processing).

Then it is easy to use the nightly.0 snapshot directory. You know that this name always points to the latest valid snapshot.
Also, some recovery procedures can be based on the latest valid snapshot (either nightly.0 or hourly.0).
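
A script may not strictly need the ordinal names for this, though. As a minimal sketch, assuming the zero-padded yyyy-mm-dd_hhmm suffix shown earlier, the newest snapshot is simply the largest name when sorted as text. The function name latest_snapshot is hypothetical; in a real script the list of names would come from listing the .snapshot directory (for example with os.listdir).

```python
def latest_snapshot(names, prefix="nightly"):
    """Return the newest snapshot name for the given schedule prefix."""
    matching = [n for n in names if n.startswith(prefix + ".")]
    if not matching:
        raise LookupError(f"no {prefix} snapshots found")
    # Zero-padded yyyy-mm-dd_hhmm stamps sort chronologically as plain strings.
    return max(matching)

names = [
    "nightly.2015-08-15_0000",
    "nightly.2015-08-17_0000",
    "hourly.2015-08-17_1100",
    "nightly.2015-08-16_0000",
]
print(latest_snapshot(names))  # nightly.2015-08-17_0000
```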

Still, no recovery procedure should select data based on an automatic assumption. hourly.0 may already contain corrupted data. There should always be a human deciding what to restore.

Apart from such specific use cases, I always see the non-default NetApp snapshot names as the better option.

UPDATE 2016-09-14: It seems somebody smart at NetApp had enough influence to make this better naming convention the only one in NetApp cDOT.