Data Compression

Just recently, Michael De Voe from the MS Dynamics NAV team published a great article about DATA COMPRESSION – a feature introduced with SQL Server 2008 (Enterprise Edition only).

I highly recommend reading this article first (thanks a lot, Michael!):
http://blogs.msdn.com/b/nav/archive/2011/02/11/microsoft-dynamics-nav-and-sql-server-database-compression.aspx

There are also several discussions about this feature on http://dynamicsusers.net and http://www.mibuso.com.

But I'd like to follow up and add my own thoughts and experiences here.

With DATA COMPRESSION – PAGE or ROW – the data is physically compressed on the data/index pages, and the compression rates are amazing: on average you could see compression of about 80%! For example, this could mean that a table of 10GB is compressed down to just 2GB!
Thus, this could save a tremendous amount of disk space! The data is also stored in RAM in its compressed state, so the available cache memory is used more efficiently. Only when data is queried by a client does it get decompressed – and this causes additional load on the CPU, which is the only real potential downside of this compression.
If a SQL Server is properly sized according to minimum requirements and best practices, then CPU is hardly ever a problem with NAV – I see quite a lot of systems where the average CPU load is somewhere between 5 and 15 percent. Hence, before implementing any compression, you need to know your CPU time baseline! If you have enough spare CPU capacity, then go for the compression.
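To get an idea about that baseline you could use the Performance Monitor, or – just as a rough sketch (not part of Michael's script), reading the scheduler monitor ring buffer which covers roughly the last few hours with one record per minute – query the recent CPU utilization directly:

-- Recent CPU utilization (SQL Server process vs. system idle), newest record first
SELECT TOP (30)
    record.value('(./Record/@id)[1]', 'int') AS [Record Id],
    record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS [SQL Server CPU %],
    record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [System Idle %]
FROM (
    SELECT CONVERT(xml, record) AS record
    FROM sys.dm_os_ring_buffers
    WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'
      AND record LIKE '%<SystemHealth>%'
) AS rb
ORDER BY [Record Id] DESC
GO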

Even with intensive usage of PAGE (or ROW) compression you may see an increase in CPU load of about 5 to 15 percentage points; meaning if your baseline was an average of 10% before, it could end up somewhere around 15 to 25% after compression.

So whenever you implement any compression, it is advisable to do it step by step, always keeping an eye on the CPU load!

Besides saving precious disk space, the compression could also reduce I/O issues: if data is compressed by, say, 80%, then more information fits on each page; e.g. if only 20% of the original space is used, then roughly 5 times more information could be placed on the same page (yes, I know that's not entirely accurate, but just to give an example). Hence, with every I/O more information is retrieved, thus less I/O is necessary. This especially affects queries that read huge amounts of data.
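By the way, to see this effect on a specific table you could – just as a rough sketch, the table name below is only a placeholder – compare the number of used pages before and after compressing it:

-- Used pages per table/index (run before and after compression)
SELECT o.name AS [Table Name], i.name AS [Index Name],
       SUM(ps.used_page_count) AS [Used Pages],
       SUM(ps.used_page_count) * 8 / 1024.0 AS [Used MB]
FROM sys.dm_db_partition_stats ps
    JOIN sys.objects o ON o.object_id = ps.object_id
    JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE o.name = 'CRONUS International Ltd_$Item'  -- placeholder: change to your table
GROUP BY o.name, i.name
GO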

The best candidates for compression are tables or indexes which are not heavily modified; meaning, once a record has been created it should not be updated much afterwards. Michael recommends compressing only objects with a Read/Write ratio of at least 80:20.

Classic candidates are all those "data graves" we have in NAV: posted Ledger Entry tables, posted Documents, posted Dimensions etc. – those tables are usually doomed to grow and grow and grow, gazillions of records are added/created which are not changed much afterwards. (Remark: it is not a problem if once in a while an "Open" flag is changed, or a "Due Date" adjusted etc. – piece of cake.)

Never implement compression on "hot" tables where data is permanently inserted, changed and deleted, e.g. active Documents or Dimensions, or Posting Journals.

But it also depends on where the data insertion or change happens: if data is inserted in between existing records, this could be more of a problem than if the records are added to the end of a table. Again: normally all those mentioned "data graves" indeed add records sequentially due to the Entry No. or No. Series assigned – perfect candidates!

To be sure, the script published by Michael helps to determine the Read/Write ratio.
Please find attached to this article my slightly modified version of this script, which I'd like to explain step by step below:

 

Server Uptime:

-- Server Uptime
SELECT
    CAST(DATEDIFF(hh, create_date, GETDATE()) / 24 AS varchar(3)) + ' days, '
  + CAST(DATEDIFF(hh, create_date, GETDATE()) % 24 AS varchar(2)) + ' hours'
    AS [SQL Server Service Uptime]
FROM sys.databases
WHERE name = 'tempdb'
GO

To get representative statistical data about the Read/Write ratio, the SQL Server service should have been running for a decent period. The query above displays the current server uptime. Alternatively, you could simply check the creation date of "tempdb" in Management Studio (database properties) or look at the date of the most recent SQL Server Error Log.
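Another quick alternative – assuming all you need is the service start time – is to read the login time of system session 1, which logs in when the service starts:

-- SQL Server Service start time
SELECT login_time AS [SQL Server Start Time]
FROM sys.dm_exec_sessions
WHERE session_id = 1
GO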

 

Querying the candidates:

-- Read/Write Ratio - Compression Candidates
SELECT
    --o.id, o.name,
    x.name AS [index_name],
    --i.index_id,
    x.type_desc,
    s.rowcnt,

    i.leaf_update_count * 100.0 /
        (i.range_scan_count + i.leaf_insert_count
            + i.leaf_delete_count + i.leaf_update_count
            + i.leaf_page_merge_count + i.singleton_lookup_count
        ) AS [Writes %],

    (i.range_scan_count + i.singleton_lookup_count) * 100.0 /
        (i.range_scan_count + i.leaf_insert_count
            + i.leaf_delete_count + i.leaf_update_count
            + i.leaf_page_merge_count + i.singleton_lookup_count
        ) AS [Reads %],

    CASE
        WHEN x.type_desc = 'CLUSTERED' THEN 'ALTER TABLE [' + o.name + '] REBUILD WITH (DATA_COMPRESSION = PAGE)'
        WHEN x.type_desc = 'NONCLUSTERED' THEN 'ALTER INDEX [' + x.name + '] ON [' + o.name + '] REBUILD WITH (DATA_COMPRESSION = PAGE)'
        WHEN x.type_desc = 'HEAP' THEN '-- No compression on Heaps!'
        ELSE '--'
    END AS [TSQL (Compression)]
FROM sys.dm_db_index_operational_stats (db_id(), NULL, NULL, NULL) i
    JOIN sys.sysobjects o ON o.id = i.object_id
    JOIN sys.indexes x ON x.object_id = i.object_id AND x.index_id = i.index_id
    JOIN sys.sysindexes s ON s.id = x.object_id AND s.indid = x.index_id


I have added the column "TSQL (Compression)" which displays the brief syntax to apply the compression. For CLUSTERED indexes it's "Table Compression", for NON-CLUSTERED indexes it's "Index Compression". I highly recommend not compressing any HEAP (= table without a clustered index); normally we don't have "Heaps" in NAV, except if you have a serious bug or maybe some system tables.
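If you want to double-check this, here is a small sketch which lists any heaps (user tables without a clustered index) in the current database:

-- List Heaps (user tables without a clustered index)
SELECT o.name AS [Table Name]
FROM sys.indexes i
    JOIN sys.objects o ON o.object_id = i.object_id
WHERE i.type_desc = 'HEAP'
  AND o.type = 'U'
ORDER BY o.name
GO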
This TSQL also uses PAGE compression, which is feasible in most cases. Michael advises using ROW compression instead if the difference in savings compared to PAGE is less than 10%. This could be the case, e.g. if tables contain BLOB fields which cannot be compressed. Again: this TSQL is just a proposal – change it as needed!

[

Comparing Compression rates (Example):

EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo'
    ,@object_name = 'CRONUS International Ltd_$Item'
    ,@index_id = 1
    ,@partition_number = NULL
    ,@data_compression = 'PAGE'
GO

EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo'
    ,@object_name = 'CRONUS International Ltd_$Item'
    ,@index_id = 1
    ,@partition_number = NULL
    ,@data_compression = 'ROW'
GO

This means PAGE compression reduces the size from 408KB down to 272KB (ca. -34%), while ROW compression goes from 408KB down to 288KB (ca. -30%). Thus, the difference in savings is about 4%, so in this case ROW compression is the better choice.
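In such a case you would simply adjust the generated statement accordingly, e.g. (using the example table from above):

-- Apply ROW compression instead of PAGE
ALTER TABLE [CRONUS International Ltd_$Item] REBUILD WITH (DATA_COMPRESSION = ROW)
GO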

]

 

WHERE (i.range_scan_count + i.leaf_insert_count
       + i.leaf_delete_count + i.leaf_update_count
       + i.leaf_page_merge_count + i.singleton_lookup_count) <> 0

    AND objectproperty(i.object_id, 'IsUserTable') = 1

    -- Thresholds
    AND i.leaf_update_count * 100.0 /
          (i.range_scan_count + i.leaf_insert_count
              + i.leaf_delete_count + i.leaf_update_count
              + i.leaf_page_merge_count + i.singleton_lookup_count
          ) < 20 -- Write ratio less than 20%

    AND (i.range_scan_count + i.singleton_lookup_count) * 100.0 /
          (i.range_scan_count + i.leaf_insert_count
              + i.leaf_delete_count + i.leaf_update_count
              + i.leaf_page_merge_count + i.singleton_lookup_count
          ) > 80 -- Read ratio greater than 80%

    AND s.rowcnt >= 500000 -- filter on large tables/indexes; change threshold as required

 

I have added some filters to reduce the output. Here I'm filtering on the Write and Read ratios, according to Michael's recommendation. I also suggest filtering on the number of records, e.g. to compress the larger tables first. Change the filters as required.

 

ORDER BY s.rowcnt DESC
GO

Last but not least, I've added a descending sort order to have the largest tables/indexes on top.

 

List of compressed objects:

-- Compressed Objects
SELECT
    o.name AS Table_Name,
    i.name AS Index_Name,
    x.type_desc,
    CASE
        WHEN p.data_compression = 1 THEN 'ROW'
        WHEN p.data_compression = 2 THEN 'PAGE'
        ELSE 'ERROR'
    END AS Compression_Type,
    CASE
        WHEN x.type_desc = 'CLUSTERED' THEN 'ALTER TABLE [' + o.name + '] REBUILD WITH (DATA_COMPRESSION = NONE)'
        WHEN x.type_desc = 'NONCLUSTERED' THEN 'ALTER INDEX [' + x.name + '] ON [' + o.name + '] REBUILD WITH (DATA_COMPRESSION = NONE)'
        ELSE '--'
    END AS [TSQL (Undo Compression)]
FROM sys.partitions p
    JOIN sys.objects o ON p.object_id = o.object_id
    JOIN sys.indexes x ON x.object_id = p.object_id AND x.index_id = p.index_id
    JOIN sys.sysindexes i ON o.object_id = i.id AND p.index_id = i.indid
WHERE p.data_compression IN (1, 2)
ORDER BY o.name, p.data_compression, p.index_id
GO

I just added another "TSQL (Undo Compression)" column with the syntax to undo the compression if necessary.

 

Implementing the compression:

Well, just copy & paste the "TSQL" output into a new query window. I'd rather execute the compressions one by one, or add a GO between the commands. Depending on the server sizing (CPU, RAM, Disks) and the object size it could take a couple of seconds or several minutes to compress (or decompress) a table or index; so I'd rather implement this outside business hours.
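Just as a sketch of how such a batch could look – the table and index names here are made-up NAV examples, and SORT_IN_TEMPDB is optional:

-- Example compression batch; the GO separators make each rebuild its own batch
ALTER TABLE [CRONUS International Ltd_$G_L Entry] REBUILD
    WITH (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON)
GO
ALTER INDEX [$1] ON [CRONUS International Ltd_$G_L Entry] REBUILD
    WITH (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON)
GO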

 

Thumbs Up!

In my opinion this DATA COMPRESSION is the coolest feature available in SQL Server 2008! It is only available with Enterprise Edition, but IMHO it is really great value for the higher license fee (together with Resource Governor, Asynchronous Mirroring and Database Snapshots)!

Happy compressing! Your feedback is very much appreciated!

PROVIDED AS IS. USE AT OWN RISK. NO WARRANTY, NO GUARANTEE, NO SUPPORT.


Comments

# re: Data Compression

Great addition, Jörg.

Tuesday, February 15, 2011 12:29 PM by Luc van Vugt

# re: Data Compression

For me the biggest issue is testing.

Most clients have Standard Edition, so how do you test to see if the benefit is worth the cost to move to Enterprise.

By the way, great blog post, thanks.

Tuesday, February 15, 2011 2:05 PM by David Singleton

# re: Data Compression

That's true. Actually the major "saving" due to compression is disk space: so maybe you could compare "storage costs" vs. "license costs"?! But this is usually only an issue for LARGE databases ... . The potential I/O impact really depends on the queries ... .

I don't think that just the compression feature alone is reason enough for EE, but it's something to consider when selecting the SQL Server edition!

Tuesday, February 15, 2011 2:20 PM by Jörg A. Stryk

# re: Data Compression

Great! But - our server has enough RAM (>64GB, compressed database file size 35 GB) to hold the complete database in RAM. Is the compression feature useful in this case?

Wednesday, February 23, 2011 4:12 PM by Hubertus Hettenkofer

# re: Data Compression

Well, first of all, it is not entirely true that really the full database is cached whenever the RAM is bigger than the database ... but anyway.

Regarding the Data Compression: well, I guess your db will grow! So maybe with DC you won't experience any benefit right now, but maybe in the future. So it might help to prevent problems before they actually arise ...

Wednesday, February 23, 2011 5:16 PM by Jörg A. Stryk

# re: Data Compression

Hi Jörg,

thanks for this very useful article and the provided scripts. We'd like to use it, but in our approx. 80 GB database we only achieve a compression of roughly 2%, even though it contains data for nearly 10 years, so there should be quite a number of tables which would be worth compressing.

Any hint(s) what might be the reason for this?

Regards,

Jens

Friday, March 22, 2013 12:05 PM by Jens Kruse

# re: Data Compression

Hi Jens, thanks for your comment. WHAT did you compress to get this 2% rate and which method did you use? A PAGE compression on the typical "data grave" tables usually has a rate of 80% ... ?!

Friday, March 22, 2013 1:37 PM by Jörg A. Stryk

# re: Data Compression

Updated/corrected scripts. Thanks to MZA for the support!

Friday, April 19, 2013 8:01 AM by Jörg A. Stryk