Thursday 31 July 2014

Poor performance in SSIS CDC Source component

Quick one... the SSIS CDC Source component is producing a suboptimal plan.

The SSIS CDC Source component appears to call the fn_cdc_get_net_changes_<capture_instance> function using sp_executesql, but passes the from_lsn and to_lsn as nvarchar(22). This results in a suboptimal plan and begins causing time-outs for what looks like a trivial task. This is obviously only an issue with a high volume of changes in the capture processing range.

The same statement but with binary(10) parameters produces a much better plan and completes in a more acceptable time frame given the workload.
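To illustrate, here is roughly what the well-behaved form looks like (a sketch only, against a hypothetical capture instance named dbo_Orders that was enabled with net changes support; the LSNs here are simply derived from the standard CDC helper functions):

```sql
-- Typed as binary(10), matching the function's signature, the call
-- avoids the implicit conversion and gets the better plan.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Orders');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

EXEC sp_executesql
    N'SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Orders(@from_lsn, @to_lsn, N''all'');',
    N'@from_lsn binary(10), @to_lsn binary(10)',
    @from_lsn = @from_lsn,
    @to_lsn   = @to_lsn;
```

Declaring those same two parameters as nvarchar(22), as the component appears to do, is what forces the implicit conversion and the poor plan.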

I suspect fn_cdc_get_all_changes_<capture_instance> has the same issue, but given that the work involved is lower, I have not had problems with it.

This was identified on SQL 2012 SP2 on VMware. Our only workaround has been to massively extend the timeout setting.

Scripts Library

Probably my most valued tool is my script library.  This is a collection of scripts that I've written and collected to help manage, audit, interrogate and repair SQL Server databases and services ever since becoming a DBA.  Actually, 'ever since becoming a DBA' is not quite accurate.  In my early days, I had no idea that a script I wrote or found might ever be needed again.

I am proud to say that most of the scripts in my library are written by me.  I say most, because there are a few occasions when I've needed something in a hurry and did not have a script for it.  On those occasions, a quick Google or a search through the many books I have will give me a script in no time.  I sometimes add these to my collection, but often rewrite them into a style that meets my needs (like adding output columns, performing calculations or removing information I'm not interested in).  The rewrite is partly to get an understanding of the logic and internals, but also to optimize the script for my needs and style.

I first kept my scripts just in my documents on the file system, but soon adopted the script templates area in SSMS.  This made it quick and easy to use a script by double-clicking it or dragging it into a query window.  I have considered using an SSMS project, but I have no source control plug-ins and never quite liked how projects worked.  I have also considered using Red Gate's script library; I'm most attracted to it because it's free.  But I've not yet bitten the bullet and given it a try.  (Perhaps I will, and I can then blog about my experience.)

Like I said... I'm proud most are written by me.  So I was feeling very uneasy with myself when one of my SQL idols blogged that we should not reinvent the wheel.  I do see why this might be good advice.  I don't know everything, and I don't always have the time to write a script from scratch.  Also, there are many cases where a short, simple solution will suffice.  However, I tend to be a careful DBA, and I'm not just going to run something on my production server without first understanding and testing it.  By writing my own scripts (especially the basics like backup and index management), I firstly get a deeper understanding of how stuff works, why it works, and how to get the result I'm looking for.  A downloaded backup script written by a professional might work great, but by writing it myself, I know I'm going to get the logic and features that satisfy my requirements.  I will also have a better understanding of how to tweak and tune it.  If I only ever relied on scripts written by others, how would I ever develop my skills in script writing and always get the right result?

So what is in my library?  Mostly queries to interrogate state and let me know what is happening on the instance.  These are scripts looking at things like current session requests, memory grants, file IO and (of course) wait stats.  But I also have: templates for common tasks, feature demo scripts, audit scripts, and configuration scripts.  There are a number of scripts I've forgotten about or never used since I first wrote them.
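To give a flavour of what I mean, here is a minimal wait stats query of the sort I keep (a sketch only; the list of benign wait types to exclude is far from complete):

```sql
-- Top waits since the last restart (or stats clear), ignoring a few
-- well-known benign wait types.
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0        AS wait_time_s,
       signal_wait_time_ms / 1000.0 AS signal_wait_s,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'CHECKPOINT_QUEUE', N'XE_TIMER_EVENT',
                        N'REQUEST_FOR_DEADLOCK_SEARCH', N'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;
```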

So what I'm trying to say is: get yourself a script library of your own.  The scripts don't have to be your own, but do take pride in any that are.  What is most important is that you know what they do before you run them.

Ownership Chaining

In SQL Server, when an object accesses another object and both objects have the same owner, permissions are not checked.  This has the effect of allowing a user with permissions on an object (such as a procedure) to read and write records in a table without any permissions directly on the table.

This behavior is known as 'Ownership Chaining'.  This is a feature unique to SQL Server; no other database engine has this behavior.  The behavior is intended to skip needless checks on every execution, so it is a performance feature.  SQL Server also supports cross-database ownership chaining, but this is off by default in modern versions of SQL Server.

The owner of an object defaults to the owner of the object's schema.  You can specify another owner, but this is not recommended.

An example of ownership chaining could be:  Matt owns table [Orders] and stored procedure [CreateOrder].  Matt grants Chris permission to execute the procedure.  When Chris executes the procedure, the procedure attempts to insert into the table.  Chris is not able to perform any direct DML against the table, but permissions checking is bypassed because Matt owns both the procedure and the table.  The insert is allowed and the procedure completes without error.
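A minimal sketch of that example (both objects created in the dbo schema here so they share an owner, and assuming a database user named Chris already exists):

```sql
CREATE TABLE dbo.Orders
(
    OrderId  int IDENTITY(1,1) PRIMARY KEY,
    Customer nvarchar(50) NOT NULL
);
GO
CREATE PROCEDURE dbo.CreateOrder @Customer nvarchar(50)
AS
    INSERT INTO dbo.Orders (Customer) VALUES (@Customer);
GO
-- Chris needs only EXECUTE on the procedure; no table permissions.
GRANT EXECUTE ON dbo.CreateOrder TO Chris;
GO
EXECUTE AS USER = 'Chris';
EXEC dbo.CreateOrder @Customer = N'Test';  -- succeeds via the ownership chain
-- A direct INSERT INTO dbo.Orders here would fail with a permissions error.
REVERT;
```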

It has some desirable and undesirable consequences.  On the positive side, users do not need direct permissions on tables.  CRUD operations can be controlled with a layer of T-SQL business logic and rules.

The biggest downside is that, because permissions are not checked, ownership chaining can circumvent DENYs.

We can also use USERS WITHOUT LOGINS and PROCEDURE SIGNING to delegate and limit permissions, whilst still allowing functionality.
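For instance, a user without a login can hold permissions that nobody can log in with directly, and a procedure can run under it (a sketch; the names are hypothetical, and module signing with certificates achieves something similar but is more involved):

```sql
-- A database principal with no server login: it can be granted
-- permissions, but cannot be used to connect.
CREATE USER ReportingProxy WITHOUT LOGIN;
GRANT SELECT ON dbo.Orders TO ReportingProxy;
GO
-- The procedure runs with the proxy's permissions, not the caller's.
CREATE PROCEDURE dbo.GetOrders
WITH EXECUTE AS 'ReportingProxy'
AS
    SELECT OrderId, Customer FROM dbo.Orders;
```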

Policy Based Management

When I first got my hands on SQL Server 2008, Policy Based Management was one of the features I was keen to make use of, especially as policies could be evaluated against the older versions which made up most of my estate.  However, I soon learned that it was instance based unless being evaluated manually or via a PowerShell script.  At the time, PowerShell scared me and I was looking for something automated to measure compliance levels.

I eventually discovered the Enterprise Policy Management Framework, a free-to-use solution for evaluating SQL Server policies across an enterprise and auditing compliance.  It consisted of a simple database, a PowerShell script to perform the evaluations, and some SSRS reports to show the compliance.  The script made use of a central management server (which fitted in perfectly with my setup) and was able to filter by policy category.

I used it for about three years across two employers and only ever made minor adjustments to the reporting layouts.  My biggest issues with it were that I wanted a daily audit of a few policies, and the amount of space taken by my small estate was outside my management data storage quota.  To get by, I had to keep only a short amount of history.  I realize I could have built a rolling summary to keep a basic view of compliance.

At my next employer, my estate had more than tripled in size, and historical reporting was much more important.  Rather than implement the solution again, I decided to reinvent it for my needs.

I first created a PowerShell script to run as an evaluation engine.  I again opted for a method that used a central management server, but built policy scheduling into the data model rather than filtering by category.

I now have data for 25 policies, mostly evaluated daily against 124 instances, going back over three years in less than 300MB.

ACID properties

The ACID properties of a transaction affect the consistency and concurrency of database systems. It's important to be aware of these properties and how different systems and options affect them.

ACID stands for Atomicity, Consistency, Isolation, Durability.

Atomicity - A transaction must be atomic.  It must be all or nothing.  In SQL Server, a data modification cannot partially succeed.
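For example (a sketch using a hypothetical dbo.Accounts table), wrapping two updates in one transaction makes a transfer all or nothing:

```sql
-- With XACT_ABORT ON, a run-time error rolls back the whole
-- transaction; money never leaves one account without arriving
-- in the other.
SET XACT_ABORT ON;
BEGIN TRANSACTION;
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountId = 2;
COMMIT TRANSACTION;
```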

Consistency - A transaction must not leave the database in an invalid state.  Rules such as nullability and constraints must hold.  SQL Server will ensure consistency and integrity of the data before allowing the transaction to commit.  However, it can only work with the schema rules as defined.

Isolation - A transaction must be isolated from other transactions.  The result of a transaction should not be affected by other concurrent transactions.  SQL Server's default transaction isolation level is READ COMMITTED, which uses a locking model to prevent one transaction from being affected by other concurrent transactions.  However, READ COMMITTED is not the strongest isolation level and can experience two well-known phenomena (more on this in a later post).
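If a session needs stronger guarantees than the default, the isolation level can be raised for that session, for example (again using a hypothetical dbo.Orders table):

```sql
-- SERIALIZABLE holds range locks until the transaction ends,
-- trading concurrency for stronger isolation.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.Orders WHERE Customer = N'Smith';
    -- Decisions based on this count are now safe from phantom rows.
COMMIT TRANSACTION;
```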

Durability - A committed transaction must not be lost and must be durable even in the event of a subsequent error, crash or power failure.  SQL Server achieves this using the transaction log.  In fact, when a transaction is committed, the data is not written to the data file, but only to the log file.  Data changes are written to the data files either on checkpoint or by the lazy writer.  SQL Server will redo any necessary transactions during the crash recovery part of start-up.

Saturday 5 July 2014

Taking Inventory

It's very important to know key bits of information about each database that we watch over.  Things like their purpose and RTO & RPO are obvious examples, but things like maintenance window and data owner are also vital to effective database services management.

SQL Server 2000 introduced the concept of extended properties for SQL Server objects (both at and below the database level).  These allow custom name and value pairs to be attributed to database objects.  There are many uses for these properties, such as self-documentation and control of maintenance (like index maintenance or backups).  However, our focus is on database-level properties for the purpose of self-describing.

My approach was to standardize the properties for describing the database into a policy covering both mandatory and optional properties.  By formulating them, their purpose and name were clear.  This formed a standard that I was able to roll out, and against which I could measure compliance across all the databases.

Before I go through the properties I defined, it is important to note that I also encouraged developers to implement their own extended properties.
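Setting a database-level property is straightforward; when sp_addextendedproperty is called with no level parameters, the property attaches to the database itself (the values here are placeholders):

```sql
-- With no @level0type/@level1type arguments, the property is
-- attached at the database level.
EXEC sys.sp_addextendedproperty
    @name  = N'Authorisers',
    @value = N'jane.doe@example.com;john.smith@example.com';
```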

Mandatory Properties

These properties must be defined against all user databases within the production environment.

Authorisers

A semicolon-separated list of email addresses of the people authorized to make decisions on behalf of the database (such as authorizing changes or permissions).  This is probably the most important property as it defines the data owners.  Note that I allow for more than one data owner for a database.

DatabaseType

Identifies the functional purpose of the database.  Valid values are: Internal Utility, Internal System, Customer System.

Description

A description of the database's purpose and function.

NotificationList

A semicolon-separated list of email addresses to be notified in the event of outages or system-wide configuration changes.

ProjectName

A descriptive and meaningful name of the project.


Optional Properties

These properties are considered optional.

ApplicationName

Name of the application.

ApplicationServers

A comma separated list of Application/Web Servers.

ApplicationURL

A comma separated list of associated URLs.

BusinessOwner

The email address of the person(s) considered to be the business owner and able to manage client communication.

Clients

A list of clients utilizing the service.

MaintenanceWindowDays

Day of week restrictions for the maintenance window.  Values can be:
Sun,Mon,Tue,Wed,Thu,Fri,Sat,WD,WE,Any (comma separated for multiples)

MaintenanceWindowDuration

Duration of time allowed for the maintenance window. (formatted [d.]hh:mm)

MaintenanceWindowStart

Time of day from which planned maintenance is acceptable. (formatted hh:mm:ss)

Platform

Used to signify a base platform (e.g. Liferay, DNN, etc.)

ProjectNumber

The internal project number assigned to the project.

ReviewDate

A date when the database should be reviewed and either archived or given a new review date. (formatted yyyymmdd)

RTO

The Recovery Time Objective for the database (formatted [d.]hh:mm:ss)

RPO

The Recovery Point Objective for the database (formatted [d.]hh:mm:ss)

SecurityClassification

The security classification of the database: Public, Internal, Confidential, Strictly Confidential

TechnicalLead

The email address of the person(s) considered to be the technical lead and able to answer technical questions about the database/project.

Version

A version identifier (preferably using a dot notation such as 1.2 or 2.1.2)
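To read the inventory back (for reporting or automated notification), the database-level properties can be queried from within each database:

```sql
-- Database-level extended properties have class = 0 in
-- sys.extended_properties.
SELECT name, value
FROM sys.extended_properties
WHERE class = 0
ORDER BY name;
```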

Some self-criticism

Email Addresses - The use of email addresses for specifying people can make it awkward to identify them.  However, the decision was made to allow automated email notification.

Offline Availability - By using extended properties, the information is only available when the database is online and accessible.  This can be a problem when you're dealing with an outage and want to notify all the database notification lists.

A simple Indexing strategy

Indexing is very important to database performance, but not everyone knows or cares how to do it right.  There are more indexing options with every version, and code developers aren't going to be familiar with all of them, so they end up making sub-optimal choices.

I like to recommend a simple three-phase indexing strategy.  This is not aimed at experts or at systems that need high levels of optimization; these are general guidelines.  This is for those who are building databases, need them to perform, but do not understand SQL Server indexing.  The main idea is that whilst it might not be the most optimal indexing, it will perform adequately and not require additional tuning.

Phase 1 - Clustering key selection
I recommend that every table has a clustered index.  You don't always have to choose this explicitly, because SQL Server will make your primary key the clustering key unless you specify otherwise.
(I also recommend every table should have a primary key, but that's a topic for another post)

A clustering key should exhibit all of the following principles:
  • Narrow - The combined size of the key should be kept to an absolute minimum
  • Static - None of the key values for a row should ever change
  • Unique - This makes the clustering key highly selective.
  • Ever Increasing - This reduces fragmentation and page splits, and creates a positive hot-spot.
  • Non-Null - This removes the need for a null bitmap

My recommendation is that every table has a surrogate key of integer type (int, smallint, etc) that is an IDENTITY field and is the primary key.  This meets all of the above principles and creates a suitable table structure.
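A minimal sketch of that pattern:

```sql
-- The surrogate key is narrow, static, unique, ever increasing
-- (IDENTITY) and non-null; as the primary key it becomes the
-- clustered index by default.
CREATE TABLE dbo.Customer
(
    CustomerId   int IDENTITY(1,1) NOT NULL,
    CustomerCode char(8)      NOT NULL,
    LastName     nvarchar(50) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerId)
);
```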

Phase 2 - Index Foreign keys
I recommend creating a non-clustered index for every foreign key.  These are often useful for joins and play a very important role with deletes.
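Continuing the sketch, a hypothetical dbo.[Order] table referencing dbo.Customer would get an index on its foreign key column:

```sql
CREATE TABLE dbo.[Order]
(
    OrderId     int IDENTITY(1,1) NOT NULL,
    CustomerId  int     NOT NULL,
    OrderNumber char(6) NOT NULL,
    CONSTRAINT PK_Order PRIMARY KEY (OrderId),
    CONSTRAINT FK_Order_Customer FOREIGN KEY (CustomerId)
        REFERENCES dbo.Customer (CustomerId)
);
GO
-- SQL Server does not create this automatically; it helps joins to
-- dbo.Customer and avoids scans when a customer row is deleted.
CREATE NONCLUSTERED INDEX IX_Order_CustomerId ON dbo.[Order] (CustomerId);
```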

Phase 3 - Index Search Fields
Surrogate keys and foreign keys are not very useful to applications looking to list customers with a last name of 'Smith', or to get the order lines for order number 'O43923'.  So we need to create indexes that serve our workloads.

I can't tell you what these are, but start with business keys and natural keys.  These are usually unique within a table and have meaning outside of the database: things like invoice number and order number, and also customer number and product code.

Also look for searchable fields; these are the things you will put in your WHERE clauses.  This phase is often based on the statements that you write (or that your ORM produces), as sketched below.
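Continuing the same sketch, the business key and a common search field might be indexed like this:

```sql
-- The order number is a business key: unique and meaningful outside
-- the database, so enforce uniqueness as well.
CREATE UNIQUE NONCLUSTERED INDEX UX_Order_OrderNumber
    ON dbo.[Order] (OrderNumber);

-- A plain search index for "customers with a last name of Smith".
CREATE NONCLUSTERED INDEX IX_Customer_LastName
    ON dbo.Customer (LastName);
```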

Finally
There is much more to indexing than these simple three phases.  But by following this simple strategy, you're off to a good start.

I hope to publish more detailed posts about indexing in future.

First of many

I've never written a blog post before and am unsure how long or how often I can keep this going.  But it's something I've wanted to try for a few years now.  My sudden inspiration to start came from reading 'Blogging-Write More' by Andy Warren.  He hit the nail right on the head.  I have a lot to say, but always feel that someone else has already said it, and said it better than I ever could.  I also struggle to think of anything as complete.  So every time I have tried this in the past, I never got past the draft stage of the first post...

This time, nothing will stand in my way.  I aim to write 5 blog posts by the end of next month (not including this one).  After that, it depends on what response I get and how much I enjoy it.  I am a SQL Server DBA, so this will be the focus of my posts.  I enjoy learning and using all things related to SQL Server, so expect some posts about SSAS, SSRS, SSIS, MDS, DQS, StreamInsight and even HDInsight.  I also have a .NET developer background, so I might occasionally feel the need to post code or complain about developers.  (We DBAs have a special love-hate relationship with developers...  We just love to hate each other.)

I've been looking for a way to share my passion and get involved with the SQL community for a few years now.  I began by holding short in-house presentations and teaching sessions.  These were mostly showcases of technology or demonstrations of better technologies and techniques.  They never met my own standards, and I never got the enthusiasm I was looking for from my captive audience.

My first SQL Bits in London left a big impression on me.  It was amazing, and I very soon wanted to become a speaker at the conference.  Time passed, and I did nothing about it.  I was busy, and I was scared.  Work piled up around me and it remained a distant dream.  One SQL Bits later, I was offered a chance to speak at a SQL user group.  I would have to wait a few months for a slot, which surprised me.  But, nervously, I accepted and soon began to feel excited.

However, it was not to be.  Sadly, my circumstances changed and I had to decline the opportunity.  I only hope another opportunity comes my way one day.

So this is where I am now, and why I'm starting to blog.  I'm a SQL Server DBA with a desire to share my passion for SQL Server.  Here and now, writing a blog seems the most suitable way for me to try and share my passion with others.

This post will undoubtedly be the most boring...  Sorry about that...  But things can only get better...