Why I don’t want my clients to use SANs for Dynamics NAV (Navision)
I am sure I am not the only person who has had to argue with a client about how hardware needs to be configured in a Dynamics NAV environment. Bigger companies with larger internal IT support generally have their own way of doing things, and it's often very hard to sway them to do things "The Navision Way". As well as with new implementations, I am often asked by a client whether they should switch to a SAN. This blog is really about any kind of "virtual storage" system, and to some extent even partitioning disks.
What is a SAN?
Basically a SAN (Storage Area Network) is a box (or boxes) full of hard drives that are bundled together into what are called LUNs. These LUNs are accessed by the server, typically over Fibre Channel. The server can then create drives on those LUNs, much as in a DAS (Direct Attached Storage) environment you may have logical drives created on partitions of a physical drive. If you are not familiar with SANs and DAS, then a quick trip to Google might be good at this time :) It will also help if you understand a little about latency, transfer rate, seek time and number of platters.
From the point of view of this Blog, the key difference between a SAN and DAS is that a SAN can be shared across multiple Servers with ease. A DAS is really only connected to one active server at a time.
Why do users want to use SANs?
SANs generally offer the following advantages over DAS:
- Cost is much lower in multi server environments
- Better utilization of total physical volume space
- More effective disaster recovery process
- Easier maintenance
- Simpler backup strategies in complex environments.
The first two points are generally linked, since the cost reduction is mainly due to better utilization of the actual space on the disk drives. Though of course having fewer boxes and power supplies, and even lower air-conditioning requirements, also makes SANs cheaper.
The last three points relate to the fact that you have everything in one box, so you can easily switch things around as required.
The SAN magic
Before we go much further, I must say that SANs really are the greatest thing since sliced bread, and every big organization should be using SANs; just not for Navision.
Spindles and heads
So let's get to the issues. In any transaction-based database environment, pure disk throughput, and thus posting speed, comes down to the number of spindles and the positioning of heads on the platters. If we look at a typical medium to large Navision SQL setup (say a 100+ GB database and 100+ users), we could have a scenario where 16 drives are configured as 8 RAID 1 pairs with 8 NDB files on those drives, 1 per RAID array, or we might have one big NDB file and the 16 drives configured as one large RAID 10 array.
There are advantages and disadvantages to both, but that is not relevant to this discussion. The key issue here is that the database is accessing 8 spindles. This means that we are writing 8 streams of data in parallel, so the transfer rate (and thus commit speed) to this array is about 8 times faster than if we had one large RAID 1 drive. Now let's say the database is 100 GB, and we have 16 x 73 GB drives, a total of 1,168 GB of which we use less than 10%. Half of that raw space goes to redundancy, so really we are using a little under 20% of roughly 580 GB of mirrored space. We do this for performance, and I think most people are aware of this. OK, so if we have all the spindles we need, what can we do with that free space? Well, not much. We might say that the Navision server is dedicated to SQL for Navision and we don't want any other applications running on that server, but that is not the real issue. The real issue is that we want all 8 spindles being used by Navision only. If we decide to put something else on the drives, even a small application, then the data throughput is no longer dedicated to Navision, so the system will slow down.
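The capacity arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures from the example (16 drives of 73 GB, mirrored, holding a 100 GB database); the function and key names are made up for illustration.

```python
# Figures taken from the example above.
DRIVE_COUNT = 16
DRIVE_SIZE_GB = 73
DATABASE_SIZE_GB = 100


def capacity_summary(drive_count, drive_size_gb, database_size_gb):
    """Return raw and mirrored capacity, plus the share the database uses."""
    raw_gb = drive_count * drive_size_gb
    # RAID 1 pairs and RAID 10 both mirror every drive,
    # so only half of the raw space holds unique data.
    usable_gb = raw_gb / 2
    return {
        "raw_gb": raw_gb,
        "usable_gb": usable_gb,
        "mirrored_pairs": drive_count // 2,
        "pct_of_raw": 100 * database_size_gb / raw_gb,
        "pct_of_usable": 100 * database_size_gb / usable_gb,
    }


summary = capacity_summary(DRIVE_COUNT, DRIVE_SIZE_GB, DATABASE_SIZE_GB)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The point of the numbers is that the low utilization is deliberate: the drives are bought for the 8 mirrored pairs of spindles, not for the space.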
In general this is pretty easy to follow, has applied since the early days of Navision, and applies equally to SQL. A harder concept to grasp is the LOG file, so let's look at that.
The first thing we notice about a LOG file, as compared to the database, is that the drives are very much quieter. Typically in our scenario we might have one or two log files on either a RAID 1 or RAID 10 array with 2 or 4 drives. Again, let's not go into the difference between one or two log files for now, but stick to the principle in general. So let's say we have 2 log files on one RAID 1 array.
The log file works very differently from the database. A log file is written to and read from sequentially, so the number of spindles really does not have a huge impact on performance. What really matters is keeping other activity on the drive as low as possible so as to reduce the effects of latency and seek time. In fact, if we can keep the heads writing a single linear stream of transactions, then write time will be a lot faster. The reason is that if the heads move, we pay seek time and latency to get back to where we were. So the key to the log file is to have that spindle and head working ONLY on Navision's log file. Even the smallest other file being accessed on the drive will degrade performance significantly.
In an ideal scenario, the head sits virtually still and the drive rotates. Assuming latency is matched to rotation speed, the drive will simply continue to write to the track; once it completes a full track, the head moves only a minuscule amount to the next track and continues on.
Now let's look at a scenario where we share a small part of the drive with another program. We have performance issues to contend with now.
Each time we access the other program, the head has to move, AND we have to wait for the platter to rotate to the correct position. Then we have to do this again to go back to the log file. In this scenario, seek time can increase by a factor of 20, which will absolutely kill SQL performance.
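To get a feel for the cost of sharing the log drive, here is a deliberately simplified model in Python. The seek time, rotation speed and interruption ratio are assumed values chosen for illustration, not measurements, and the model ignores caching, track-to-track seeks and everything a real controller does. It only counts the extra head movement and rotational waiting described above.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the right sector to come around: half a revolution."""
    return 0.5 * 60_000 / rpm


def positioning_overhead_ms(log_writes, rpm, avg_seek_ms, foreign_access_ratio):
    """Extra positioning time (ms) caused by other files on the log drive.

    foreign_access_ratio is the assumed fraction of log writes that are
    interrupted by an access to some other file on the same spindle.
    """
    interruptions = log_writes * foreign_access_ratio
    # Each interruption costs a seek away and a seek back,
    # with a rotational wait at each end.
    per_interruption_ms = 2 * (avg_seek_ms + avg_rotational_latency_ms(rpm))
    return interruptions * per_interruption_ms


# Assumed drive: 15,000 rpm with a 4 ms average seek (illustrative only).
dedicated = positioning_overhead_ms(1000, rpm=15_000, avg_seek_ms=4.0,
                                    foreign_access_ratio=0.0)
shared = positioning_overhead_ms(1000, rpm=15_000, avg_seek_ms=4.0,
                                 foreign_access_ratio=0.1)
print(f"dedicated log drive: {dedicated:.0f} ms of extra positioning")
print(f"shared log drive:    {shared:.0f} ms of extra positioning")
```

Even with only one log write in ten interrupted, the shared drive spends a noticeable amount of time just moving the head back and forth, while the dedicated drive in this model spends none.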
The simple version
Basically what I am trying to say here is that if you want Navision to really fly (and this applies to Native as well as SQL), then you MUST have multiple dedicated spindles for the database, and a dedicated head for the transaction log that does nothing but the transaction log.
Why SANs are bad
But of course, by now everyone is asking what this has to do with SANs, so let's get to that. The point is that there is nothing wrong with a SAN per se. The issue is in configuring the SAN, and in that there are three questions.
- Do you really know how to configure the SAN?
- Is it even possible to configure it correctly?
- Will you be tempted to use up all that free space?
I have done a lot of Navision implementations, and I have recovered a lot of disasters for users. I must say that in my many years, I have NEVER once seen a SAN that was correctly configured, generally due to one of the three reasons above. The first two really fit in the same box: often the IT department truly believes that the SAN is configured to have multiple dedicated spindles for the database and dedicated heads for the LOG file, whereas in reality the LUNs are spread over multiple drives without any specific control of what is placed where. So even though they think that the LUN they are presenting as a RAID 10 drive is made up of 16 drives, it may be made up of just something that the SAN presents as 8 drives. And in fact the LOG file may be sitting on the end of one of the drives that has a part of the database on it.
But the killer in most cases is the third point. It's often hard to tell the bean counters that of the 1 terabyte of disk they bought, they will only use 100 GB and the rest will just sit idle. So suddenly we find that a month after Navision go-live, things have slowed down. With the way SANs work, the Navision specialist will look at the drives and see 16 drives dedicated to the DB, 4 drives dedicated to the LOG file, and a TEMP DB sitting on its own RAID 1, but for some reason 2 DB parts are very slow and the log file is crawling. The client will blame Navision, and it will take a while before someone discovers that the SAN is also being used to host Exchange and Active Directory, since all this is totally invisible to the Navision SQL server.
Yes, SANs can be used in a Navision environment. But if you are going to use a SAN, then be fully aware of all the implications. Most importantly, you need a SAN dedicated to Navision, because if it is not, then eventually, somehow, your log file will share a spindle with a 200-user Exchange server, performance will go down, locking will be more prevalent, and the client will say that it's a Navision error.
So if a client wants to go for a SAN then I just ask for two things.
- The SAN is dedicated to Navision
- It's very clear that the user can configure the specific mapping of spindles to LUNs
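If the SAN does let you see which physical drives back each LUN, checking the second requirement is simple in principle. The Python sketch below works on a hand-written, hypothetical layout (the LUN names, workloads and drive ids are invented); how you actually obtain this mapping depends entirely on the SAN vendor's tools, and some SANs may not expose it at all.

```python
from collections import defaultdict

# Hypothetical layout: LUN name -> (workload, physical drive ids behind it).
LUN_LAYOUT = {
    "nav_data": ("navision", ["d01", "d02", "d03", "d04"]),
    "nav_log": ("navision", ["d05", "d06"]),
    "exchange": ("exchange", ["d06", "d07"]),
}


def find_shared_spindles(layout):
    """Return {drive id: sorted LUN names} for drives backing more than one LUN."""
    luns_by_drive = defaultdict(set)
    for lun, (_workload, drives) in layout.items():
        for drive in drives:
            luns_by_drive[drive].add(lun)
    return {
        drive: sorted(luns)
        for drive, luns in luns_by_drive.items()
        if len(luns) > 1
    }


for drive, luns in find_shared_spindles(LUN_LAYOUT).items():
    print(f"drive {drive} is shared by: {', '.join(luns)}")
```

In this made-up layout the log LUN shares a drive with Exchange, which is exactly the situation described above. Any drive that shows up in the result means a spindle is not dedicated.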
PS: for those who haven't guessed yet, I am currently working with three separate Navision clients, each of which has performance issues, and so far only one of them has acknowledged that the SAN is probably not configured correctly. They are (I hope) reading this blog :)