This presentation was given at the SQLPASS 2010 Summit in Seattle and has been presented by Brent Ozar (of Brent Ozar PLF and SQLCruise fame) in several forms and variations at other SQL conferences such as SQLBits. Obviously the notes are most relevant when accompanying the specific source presentation video, however if you are not in possession of this, then you can should download the session he gave also in 2010 whilst presenting at SQLBits 7 in York (UK). I have now listened to his presentation around 3-4 times and have spent a little more time making them fully readable and usable without any accompanying material (just for you!) if that’s what you want…. Listen fella, it’s about what works for you best, right!?
Your CEO doesnt care that virtualization is slower!
Virtualization is not here to make things faster but is to make things cheaper.
It means less hardware, software, licensing, management time – but it almost never happens.
In virtualization, task manager shows you are running % CPU available.
How many virtual cores do we need? -many times the less virtual cores we have the quicker things will run, especially in the instance where virtual CPUs are underused.
How many idle real cores and how fast are they?
Hyper-visor will usually reuse cores when idle for different virtual machines and can tend to even reuse and wait for cores because it might then want to reuse CPU cache for those virtual machines, or it might decide to schedule on idle CPU cores and accept CPU cache miss.
Some cores share same level 3 cache like Nehalems therefore hyper-visor needs to know that it doesn’t matter which core.
Xeons have a power saving capability for when not under a heavy load all of time, therefore the virtual machine could slow down.
There is a HP Proliant DL385 bug where power saving gets stuck and processor does not run at specification.
CPU-z from cpuid.com can even use on virtual to show the box specifications (specification field) to show how fast CPU should go and then show the core (core speed) speed that it is actually going.
Always use newest version of hyper-visor.
Minimise the number of virtual cores used in a virtual machine -Brent has a blog post about this, which I believe is here.
Check the CPU speeds and cache with CPU-z, because if the SAN admin moves the virtual server across to another host then the performance capacity of the host hardware might change, providing worse virtual machine performance.
Strip out all unnecessary virtual hardware from the virtual machine configuration.
Remove all background services such as screen savers, Adobe Acrobat, filesync tools, and (perhaps) Antivirus?
Monitor for throttling of other guests.
Avoid affinity masking (only use in active nodes clustering).
If you are running Antivirus on hosts then you should still on guests. If you do then you should exclude data files, log files, trace files.
Brent has a SQL Server setup checklist on his blog, which is here.
Hyper-visor de-dupes and compresses memory in VMWare and XenServer – SQL memory generally cant be de-duped very well because it tends to consist of different pages in memory…therefore your shouldn’t really try to de-dupe.
The balloon driver fakes memory pressure in Hyper-V, VMWare and XenServer. The balloon driver lies to windows to put OS under memory pressure so that processes like SQL Server releases memory (and remember the SQL Server default minimum memory setting is zero!). Lock pages in memory is just for pages in the buffer cache and is probably not a good idea setting since SQL wont release those pages back so the result is that Windows will then get throttled and release its OS memory causing loads of OS paging (click here for more).
Set a memory reservation and think about having a high reservation i.e. 75%
Set SQL Server’s minimum and maximum memory appropriately and think about having quite a high minimum i.e. 50%
Monitor to make sure they’re not lying to you.
Avoid dynamic memory for SQL Servers today (fine for Virtual Desktops though).
Attend Allan Hirts Advanced session (at the SQL Pass 2010 Summit).
There are new perfmon counters provided for VMWare and they cannot be shut off.
With ESX3.5 they require a manual download and with vSphere 4 they are built in.
They require no extra permissions and VMWare sysadmins usually don’t even know about them…
VM Processor Counters
%Processor Time -shows % processor time for each core on a HOST
Host Processor Speed MHZ -speed can change if Admin moves you to slower host Live
Limit MHZ -VMWare sysadmins can set a MHZ limit to see that no cap has been set. It should be ZERO.
Reservation MHZ -monitor for CPU Reservation MHZ (reservation would prevent too many virtual machines).
VM Memory Counters
Memory Limit (MB) – shows limit on virtual machines so although virtual machines might look to have 16GB it might have limit set to 4GB which could cause the balloon driver to fire off all the time.
Memory Reservation (MB) – ensure this doesn’t change (especially down)! The reservation will prevent over using too many machines so the higher the better.
Memory Ballooned (MB) – anything other than zero means they are having to give up memory.
Memory Swapped (MB) – anything other than zero means they are having to give up memory.
SQL on SAN
SAN vendors provide documents on the correct way to configure the SAN for SQL e.g IBM have their Redbooks. For instance IBM has a document that explicitly states that RAID 1 or 10 must be used of database and logs.
Avoid SATA drives for databases.
They don’t have to do everything that the vendor states, but if they deviate you need to know WHY!
Put RAID10 everywhere but when you cant afford it, RAID 5 in this order:-
(Note logs are ALWAYS written sequentially and seqential writes suck on RAID 5)
- TEMPDB data files
- TEMPDB log files
- User database files
- User log files
- Backup target -this is actually one of the most important drives! (avoid RAID 0 )
Brent has more information for performant log files here. You should always only have one log file per database.
Data files should have ¼ to ½ TEMPDB data files per core according to Paul Randal.
SATA RAID 10 using 10K drives will usually outperform RAID 5 using 15K drives.
You should not thin provision (SAN) for log file drives because you need to guarantee the space.
Brent believes that thin provisioning for data drives is acceptable.
When using shared arrays for databases or logs, be aware that multiple logs on the same array will effectively cause Random IO and therefore will not be sequentially accessed meaning that RAID 1 or 10 is less important from that perspective.
Shared or Dedicated LUNs (where LUN = array)?
If you have all of the following, dedicated LUNs might be OK…
- A trusted SAN admin
- SAN performance monitoring software
- Correctly configured active/active multi-pathing
- More than a terabyte of data in one database
…Otherwise consider using a big shared pool of drives.
Share pools with dissimilar server load types, not similar load types and times
Know your neighbours backup and virus scan times.
Shared vs Dedicated LUNs (where LUN=drive letter)
Consider putting data and logs on the same LUN if :-
- You are using SAN snapshots and the manufacturer says to, and this needs to be designed from get go because your SAN vendor will have rules about whether data and logs can share same drive
- Because its all one big pool of drives anyway!
Consider using dedicated LUNs for data and logs if :-
- You are using filegroups and partitions
- Are considering filegroup backups
- You can use tiered storage (SATA+FC+SSD)
- Are going to micromanage performance (and thats a good thing!)
TEMPDB LUN guidelines (he skips over these slides)
Always use a separate LUN because you may need to improve its performance separately from the user databases and may need to avoid SAN replication for it.
Brent’s preference is for one big pool of SATA R10 for all his SQL Server TEMPDBs, because it is cheap, has a huge capacity and has similar load patterns.
You should always use the fastest interface you can, their speeds are :-
10GB iSCSI (<—I need to validate this Ed.)
The slower interface you have, the more you need multi-pathing.
Know your controllers connections.
Until you can prove connectivity isn’t the bottleneck don’t buy more drives!!