I am sure I am not the only person who has had to argue with a client about how hardware needs to be configured in a Dynamics NAV environment. Typically, bigger companies with larger internal IT departments have their own way of doing things, and it is often very hard to sway them to do things "the Navision way". As well as on new implementations, I am often asked by clients whether they should switch to SANs. This blog really is about any kind of "virtual storage" system, and to some extent even partitioning disks.
Basically, a SAN (Storage Area Network) is a box (or boxes) full of hard drives that are bundled together into what are called LUNs. These LUNs are accessed by the server, typically over Fibre Channel. The server can then create drives on those LUNs, much as in a DAS (Direct Attached Storage) environment you might create logical drives on partitions of a physical disk. If you are not familiar with SANs and DAS, then a quick trip to Google might be a good idea at this point. It will also help if you understand a little about latency, transfer rate, seek time and number of platters.
From the point of view of this Blog, the key difference between a SAN and DAS is that a SAN can be shared across multiple Servers with ease. A DAS is really only connected to one active server at a time.
SANs generally offer the following advantages over DAS:
Advantages 1 and 2 are generally linked, since the cost reduction is mainly due to better utilization of the actual space on each disk drive. Of course, having fewer boxes, power supplies and so on, and even lower air-conditioning requirements, also makes SANs cheaper.
3, 4 and 5 relate to the fact that you have everything in one box, so you can easily switch things around as required.
Before we go much further, I must say that SANs really are the greatest thing since sliced bread, and every big organization should be using SANs; just not for Navision.
So let's get to the issues. In any transaction-based database environment, pure disk throughput, and thus posting speed, comes down to the number of spindles and the positioning of the heads on the platters. If we look at a typical medium to large Navision SQL setup (say a 100+ GB database and 100+ users), we could have a scenario where 16 drives are configured as 8 RAID 1 pairs with 8 NDB files on those drives, one per RAID array; or we might have one big NDB file with the 16 drives configured as one large RAID 10 array.
There are advantages and disadvantages to both, but that is not relevant to this discussion. The key point here is that the database is accessing 8 spindles. This means we are writing 8 streams of data in parallel, so the transfer rate (and thus commit speed) to this array is about 8 times faster than if we had one large RAID 1 drive. Now let's say the database is 100 GB and we have 16 x 73 GB drives: a total of 1,168 GB, of which we use less than 10%. Half of this is used for redundancy, so really we are using about 20% of roughly 500 GB of mirrored space. We do this for performance, and I think most people are aware of this. OK, so if we have all the spindles we need, what can we do with that free space? Well, not much. We might say that the server is dedicated to SQL on Navision and we don't want any other applications running on it, but that is not the real issue. The issue is that we want all 8 spindles being used by Navision. If we put something else on those drives, even a small application, then the data throughput is no longer dedicated to Navision, and the system will slow down.
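The capacity arithmetic above is easy to sanity-check. Here is a small back-of-envelope sketch (the drive count and sizes are the hypothetical figures from this scenario, not a recommendation):

```python
# Back-of-envelope check of the capacity figures in the scenario above.
DRIVES = 16       # number of physical drives in the array
DRIVE_GB = 73     # capacity per drive, GB
DB_GB = 100       # database size, GB

raw_gb = DRIVES * DRIVE_GB       # total raw capacity: 1,168 GB
usable_gb = raw_gb / 2           # RAID 1 / RAID 10 mirroring halves capacity: 584 GB

print(f"raw capacity: {raw_gb} GB, usable after mirroring: {usable_gb:.0f} GB")
print(f"database uses {DB_GB / raw_gb:.0%} of raw, {DB_GB / usable_gb:.0%} of usable")
```

So the database occupies under 10% of the raw disk and under 20% of the mirrored space; the "wasted" space is what buys the spindle count.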
In general this is pretty easy to follow; it has applied since the early days of Navision and applies equally to SQL. A more complex concept to grasp is the LOG file, so let's look at that.
The first thing we notice about a LOG file, as compared to the database, is that the drives are very much quieter. Typically in our scenario we might have one or two log files, either on a RAID 1 or a RAID 10 array with 2 or 4 drives. Again, let's not go into the difference between one and two log files for now, but look at the principle in general. So let's say we have 2 log files on one RAID 1 array:
The log file works very differently to the database. A log file is written and read sequentially, so the number of spindles really does not have a huge impact on performance. What really matters is keeping other activity on the drive as low as possible, so as to reduce the effects of latency and seek time. In fact, if we can keep the heads writing a single linear stream, writes will be a lot faster, because whenever the heads move we pay seek time and rotational latency to get back to where we were. So the key to the log file is to have that spindle and head working ONLY on Navision's log file. Even the smallest other file on the drive being accessed will degrade performance significantly.
In the ideal scenario, the head sits virtually still and the drive rotates beneath it. Assuming latency is matched to rotation speed, the drive simply continues writing to the track; once it completes a full revolution, the head moves only a minuscule amount to the next track and continues on.
Now let's look at a scenario where we share a small part of the drive with another program. Suddenly we have performance issues to contend with.
Each time the other program is accessed, the head has to move, AND we have to wait for the platter to rotate to the correct position. Then we have to do it all again to get back to the log file. In this scenario you can increase your seek time by 20 times, which will absolutely kill SQL performance.
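The effect is easy to model. Here is a toy cost model of sequential log writes versus writes interrupted by another file on the same spindle; the seek, latency and write times are illustrative round numbers, not the spec of any particular drive:

```python
# Toy model: why sharing the log-file spindle hurts (illustrative numbers only).
SEEK_MS = 8.0      # assumed average seek time
LATENCY_MS = 4.0   # assumed average rotational latency (~half a rev at 7200 rpm)
WRITE_MS = 0.1     # time to write one small log record once the head is in place

def total_ms(records, interruptions_per_record):
    # Each interruption costs a seek away plus a seek back,
    # with rotational latency paid both times.
    per_record = WRITE_MS + interruptions_per_record * 2 * (SEEK_MS + LATENCY_MS)
    return records * per_record

dedicated = total_ms(10_000, 0)    # head never leaves the log track
shared = total_ms(10_000, 0.1)     # another file pulls the head away every 10th record

print(f"dedicated: {dedicated/1000:.1f} s, shared: {shared/1000:.1f} s, "
      f"about {shared/dedicated:.0f}x slower")
```

Even with the head dragged away only once every ten records, the model shows write time blowing out by an order of magnitude, which is why the log spindle must be untouched by anything else.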
Basically, what I am trying to indicate here is that if you want Navision to really fly (and this applies to Native as well as SQL), then you MUST have multiple dedicated spindles for the database, and a dedicated head for the transaction log that does nothing but the transaction log.
But of course, by now everyone is asking what this has to do with SANs, so let's get to that. The point is that there is nothing wrong with a SAN per se. The issue is in configuring the SAN, and there are three problems there.
I have done a lot of Navision implementations, and I have recovered a lot of disasters for users. I must say that in all my years I have NEVER once seen a SAN that was correctly configured, generally due to one of the three reasons above. Reasons 1 and 2 really fit in the same box: often the IT department truly believes the SAN is configured with dedicated multiple spindles for the database and dedicated heads for the LOG file, whereas in reality the LUNs are spread over multiple drives without any specific control over what is placed where. So even though they think the LUN they are presenting as a RAID 10 drive is made up of 16 drives, it may be made up of something the SAN merely represents as 8 drives. And in fact the LOG file may be sitting on the end of one of the drives that has part of the database on it.
But the killer in most cases is reason 3 above. It is often hard to tell the bean counters that of the 1 terabyte of disk they bought, they will only use 100 GB and the rest will just sit idle. So suddenly we find that, a month after the Navision go-live, things have slowed down. With the way SANs work, the Navision specialist will look at the drives and see 16 drives dedicated to the DB, 4 drives dedicated to the LOG file, and a TEMP DB sitting on its own RAID 1, but for some reason 2 DB parts are very slow and the log file is crawling. The client will blame Navision, and it will take a while before someone discovers that the SAN is also hosting Exchange and Active Directory, since all of this is totally invisible to the Navision SQL server.
Yes, SANs can be used in a Navision environment. But if you are going to use a SAN, be fully aware of all the implications. Most importantly, you need a SAN dedicated to Navision, because if it is not, then eventually your log file will somehow end up sharing a spindle with a 200-user Exchange server; performance will go down, locking will be more prevalent, and the client will say that it's a Navision error.
So if a client wants to go for a SAN, then I just ask for two things.
PS: for those who haven't guessed yet, I am currently working with three separate Navision clients, each with performance issues, and so far only one of them has acknowledged that the SAN is probably not configured correctly. They are (I hope) reading this blog. :)
Are they gone? I can still see them!
Where have all my pictures gone?
No, that's not my concern. My concern is that there are people out there who think you can just install whatever you want on a SAN, that the SAN will take care of everything, and that they don't need to care about the configuration.
SANs can work magic, BUT only if configured correctly.
In terms of NAV and virtualization, I think Mark is the expert there; he has a blog about it.
If I understand your concern correctly, it is with clients deploying the full environment on a SAN? We are looking to implement NAV and intending to deploy the NAV application server on VMware ESX from a SAN, and then the databases on a separate, high-spec shared SQL 2005 server. What do you think?
You're missing one very important thing.
A SAN DOES cache transaction log writes, while locally attached disk does not, because SQL Server forces transaction log writes to go straight to disk. This enforcement is often ignored by lower-end SANs; bigger ones give you control over whether to cache writes or not, and how much RAM to use for the write cache.
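To make the commenter's point concrete, here is a minimal sketch (Python, POSIX, hypothetical file name) of what "forcing writes straight to disk" means at the operating-system level. On Windows, SQL Server achieves the same effect by opening the log with write-through flags; the danger is a SAN controller that acknowledges the write from its cache before the data is on stable storage:

```python
# Sketch: synchronous (write-through) log writes on a POSIX system.
# O_SYNC makes os.write() return only after the data reaches stable storage,
# which is the durability guarantee a transaction log relies on.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "toy.log")  # hypothetical log file

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
os.write(fd, b"BEGIN TRAN ...\n")  # blocks until the record is durable
os.close(fd)
```

A storage controller that completes such a write from volatile cache silently breaks the guarantee unless the cache is battery-backed, which is exactly why write caching on SAN-hosted log volumes needs deliberate configuration.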
Just yesterday I got a notice from one of my contacts, who recently changed to a SAN solution. We're talking about a native NAV DB of about 200 GB (!!!).
(the implementation of the SAN was the first step to migrating to SQL Server (2H2008))
Here are just some of the performance figures:
Backup DB (FBK): before 8.5 hours, after 45 minutes
Restore FBK: before 52 hours, after 5.5 hours
Generally transactions perform 50% faster, blocks almost vanished
Hello everybody. I think I have to say that David is right. We bought an HP BladeCenter with an EVA 4100. That was 8 weeks ago; now we have got an EVA 8100 from HP because our Navision is so slow. Our old system runs on a Siemens Primergy P250 with dual Xeon, 4 GB RAM and 2 disk shelves with 24 HDs in RAID 1. Now, with the same database on the EVA, we can create an empty database file faster, but when Navision starts to create the indexes it takes about 2 days on the new system. I hope David is not right, but if so we have spent a lot of money and time for ***.
Actually the drawings are taken from actual notes that I had made whilst explaining SANs and the importance of dedicated spindles to a client.
I have lots of such meeting notes, so you can expect more of these types of Blogs.
Basically my Blogs are normally based on real world experience, with actual customer scenarios.
I don't agree that this is mere theory, and I also don't agree that it doesn't apply to all SANs. Regardless of which SAN manufacturer you use, you have to set it up properly. I've seen a very slow NetApp because it wasn't properly configured. I even have a customer with their SAN on RAID 5 who claims their performance is excellent. I showed them where the I/Os were a problem, their SAN people modified the setup, and now the system is flying. That doesn't mean everyone should go out and use RAID 5; it only means that they have set theirs up properly.
Just making a general statement that this blog is not accurate is a bit... well... inaccurate :)
This blog explains the basics in an excellent way (except those drawings... ;)), and it even specifically states that SANs are not bad IF you configure them properly.
I think you missed the gist of this blog. Basically, the issue with SANs comes down to the fact that it's just too tempting to cheat a little. It doesn't matter how great the SAN is (and yes, I have worked with some very big NetApp SANs that absolutely fly).
But it's important for you to understand that you can't have a 200-user Exchange database sitting on the same spindles as your transaction log and not expect performance issues. You need to have a rethink if you think it's OK to do this.
Of course a high-performance NetApp box is going to outperform a simple RAID 10, but once the bean counters decide to use all that spare disk space, believe me, performance will drop.
I am currently working on a NetApp SAN and a very high-end IBM SAN that are just crawling because they have been configured wrongly.
It's very important that clients know these spindles need to be dedicated, no matter what amount of magic is going on in the background.
I am not sure if this is all accurate.
If you take NetApp, EMC or EVA storage, there is a virtualization layer in between.
I have a couple of customers running on high-performance EVAs and NetApps. In this case the SAN decides what goes where, and it does a better job than a person can do.
This theory only applies to the classic SAN solution.
Hey David, nice post. But did you hire a 5 year old to do those drawings?
It can be difficult to argue with some customers about hardware recommendations. The most common comment is "What do I do with all the free disk space?"
From an actual conversation with a customer:
"We have a 150000 dollar SAN running Sharepoint with 500 users, all on VMWare machines, and it isn't even breaking a sweat"
and a number of other applications, all of them running on VMware, many with SQL Server databases. They pretty much laughed at my recommendations. I started explaining the very same things and got interrupted.
You can't win these arguments, so I asked the customer to please let me know whether it runs properly. I am thinking about just keeping my mouth shut when it comes to hardware considerations.