Learning to say NO

Thank you to Kendra Little who is hosting this month’s T-SQL Tuesday topic on “Interviewing Patterns and Anti-Patterns” – and what advice I might have for someone preparing for an interview. But instead of regurgitating the obvious advice you can find in other places, I’d rather talk about my own personal experiences, in-particular about a specific dark period in my professional career and how I avoided jumping into yet another bad experience.

I have written about my career lows elsewhere in these pages (and if you are really interested, you can read about them further in this post: SQLRally and beyond), but for your benefit, let’s just describe this particular stage of my career as me having hit a professional brick wall. The early promise and excitement of my (then) new job had turned into daily dread and anxiety as I drove into work knowing that I had 8 hours of unhappiness ahead of me each day.

How did this happen yet again? Another position, another problem, and another difficult situation to navigate and I was starting to think that it really *must* be me. There must be something wrong with me to be experiencing problems all over again.

One of the beautiful things about a long commute to work each day is that it gives you time to think. Time to mull over your problems, and time to think about potential solutions. How did I get here? Why are things the way they are? And then it dawned on me – I had never said NO to any job offer EVER. When I first interviewed for this specific position I had some niggling doubts. They weren’t big doubts, but nether-the-less these numerous niggling little doubts existed in the back of my mind. They were formed from subconscious observations of characters that I had met in the interview, things they said, the type of environment I would be working in, the skill level and skill sets of the team, and the management style of the people who would ultimately control my destiny. But my conscious mind took control and I had accepted the offer. I told myself that this really could be a great move if my subconscious was wrong…

The problem was that by the time I interviewed for positions I was already unhappy in my current job and was ready to leave, so when offers came my way I was usually so grateful for a new opportunity (and fresh start), that regardless of whether the job offer was a great move or not I said yes. I didn’t value myself highly enough, and perhaps my confidence was taking a beating.

I started thinking back, and wondered how things would have worked out had I said “thanks, but no thanks” to some of those offers.

inthecar

I decided that I needed to learn how to say NO in order to understand my own worth, so I set out on a mission to reject a job offer. Now of course, had I found and been offered the greatest job in the world, then I would probably be forced to delay my mission a little longer, but knowing that the slightest doubt in my mind about a position (and anything less than punching the air over the salary package) would result in me rejecting an offer gave me great strength of purpose, direction, and momentum. If I got one offer then another offer would surely only be around the corner.

So I started applying for jobs and attended my first interview fairly soon after. The company was a relatively small player in the global market, but the opportunity sounded quite good, however, I was not punching the air and there were a few minor niggles regarding some potential responsibilities that I’d have to take on. Whilst the position was an (apparent) improvement on the one I was currently doing, I decided that I needed to put my plan into effect.

I said NO.

I suddenly felt empowered and respected myself just a little more for doing so. I needed to do this again!

Within weeks I had been offered another position, and that did not tick every box. I readily turned it down and was starting to enjoy being able to choose the turns in my own career path rather than stumble blindly down it.

Several more offers later I eventually received one that I absolutely could not refuse and the rest (they say) is history. That is not to say that every move I make works out, but I am doing them for all the right reasons. Not through desperation to get out, but because they have potential. Now I frequently reject offers that are either not good enough for me or not suitable for me. I understand my own self-worth a little more and understand how (and when) to say NO.

Posted in Personal Development, SQLServerPedia Syndication | Tagged | 2 Comments

Using conditional COUNT(*)s

Whilst recently working with historic financial data, I ran across a situation that needed an aggregate view of transactional data grouped by a certain set of attributes in order to backfill some missing aggregate data sets. In front of me, I had transactional data from time immemorial along with a series of pre-built (validated) historic aggregates which had been created from a different (but now unknown) process.

My mission then, was to understand how this source transactional data must be aggregated so that it could be compared against the sets of pre-built aggregate data. Once I validated the process, I could (re)populate the missing aggregate tables in confidence.

The source transactional table (simplified for brevity) consisted of:
Transaction_Date, Credit_Card_Type, Transaction_Type, and Sales_Value.

The aggregated destination table consisted of:
Transaction_YYMM, Credit_Card, Number_Of_Sales, and Sales_Value

The task in itself appeared relatively trivial and looked to require a grouping on Transaction_YYMM (a calculated field from Transaction_Date) and Credit_Card_Type, with Number_Of_Sales being a simple COUNT(*) of each transaction. Therefore Sales_Value appeared to be a simple SUM of the transaction Sales_Value -easy right!

If we run the following query:

SELECT
	CAST(FORMAT(Transaction_Date, 'yyyy-MM') AS VARCHAR(max)) AS 'Transaction_YYMM',
	Credit_Card_Type,
	COUNT(*) AS 'Number_Of_Sales',
	SUM(Transaction_Amount) AS 'Sales_Value'
FROM #Transactions
GROUP BY
	CAST(FORMAT(Transaction_Date, 'yyyy-MM') AS VARCHAR(max)),
	Credit_Card_Type
GO

We get the following results:

results1

When I came to compare the results against aggregated data that I had, I noticed that the values were off and it became fairly obvious that the transactional data also contained refunds and rebates (positive values but logically reflected as negative by the Transaction_Type status) and these were not just causing inaccuracies for the SUM on Sales_Value, but were also causing the COUNT for Number_Of_Sales to be wrong. In other words, refunds and rebates must be removed from the SUM total and not aggregated in the Number_Of_Sales columns. Now at this stage, you might be thinking that we can do this by a simple WHERE clause to filter them from the aggregates, but not only is it wrong to “throw away” data, I realised that my target tables also contained aggregate columns for refunds and rebates.

Therefore our target table now consists of the following columns:
Transaction_Date, Credit_Card_Type, Transaction_Type, and Sales_Value, Number_Of_Refunds, Refund_Value

Conditional COUNT

In order to achieve this goal from our source data, it is obvious that we need a conditional statement to aggregate the payment data only within the payment columns and aggregate the refund data only in the refund columns. Implementing this logic within the SUM statement was easy and was as follows:

SELECT ...
SUM(CASE WHEN Transaction_Type IN ('PAYM') THEN Transaction_Amount
ELSE 0 END) AS 'Sales_Value'
FROM ...

In the function above, SUM will simply aggregate all Transaction_Amount(s) if they are of Transaction_Type PAYM, otherwise it will consider that specific value as zero (thereby excluding it from the aggregation). When it came to the COUNT statement, things were not quite so obvious. I confess to mostly using COUNT in its COUNT(*) guise in the past, but imagine my surprise after browsing the online pages for the COUNT statement, I notice the expression keyword (see below).

COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )

If expressions are also permitted inside the COUNT function (like SUM) then perhaps I can do something similar to the way I calculated the conditional SUM?

I tried the following function call for that aggregation:

SELECT ...
COUNT(CASE WHEN Transaction_Type IN ('PAYM') THEN 1
ELSE 0 END) AS 'Number_Of_Sales'
FROM ...

And while this code code ran, it did not make any difference to the counts returned (let’s demonstrate this using the full code for payments and refunds):

SELECT
    CAST(FORMAT(Transaction_Date, 'yyyy-MM') AS VARCHAR(7)) AS 'Transaction_YYMM',
    Credit_Card_Type,
    COUNT(CASE WHEN Transaction_Type IN ('PAYM') THEN 1
	ELSE 0 END) AS 'Number_Of_Sales',
    SUM(CASE WHEN Transaction_Type IN ('PAYM') THEN Transaction_Amount
	ELSE 0 END) AS 'Sales_Value',
    COUNT(CASE WHEN Transaction_Type IN ('REVR') THEN 1
	ELSE 0 END) AS 'Number_Of_Refunds',
    SUM(CASE WHEN Transaction_Type IN ('REVR') THEN Transaction_Amount
	ELSE 0 END) AS 'Refund_Value'
FROM #Transactions
GROUP BY
    CAST(FORMAT(Transaction_Date, 'yyyy-MM') AS VARCHAR(7)),
    Credit_Card_Type
GO

We get the following results:

results2

We can quite clearly see that this time the Sales_Value and Refund_Value aggregates (using SUM) are correct (see the difference for BARCLAYSCARD data), but the counts are obviously wrong -in fact the “conditional COUNT” appears to just be ignoring the expression.

However I realised that I could use SUM instead of COUNT as a workaround to this problem as follows:

SELECT
    CAST(FORMAT(Transaction_Date, 'yyyy-MM') AS VARCHAR(7)) AS 'Transaction_YYMM',
    Credit_Card_Type,
    SUM(CASE WHEN Transaction_Type IN ('PAYM') THEN 1
	ELSE 0 END) AS 'Number_Of_Sales',
    SUM(CASE WHEN Transaction_Type IN ('PAYM') THEN Transaction_Amount
	ELSE 0 END) AS 'Sales_Value',
    SUM(CASE WHEN Transaction_Type IN ('REVR') THEN 1
	ELSE 0 END) AS 'Number_Of_Refunds',
    SUM(CASE WHEN Transaction_Type IN ('REVR') THEN Transaction_Amount
	ELSE 0 END) AS 'Refund_Value'
FROM #Transactions
GROUP BY
    CAST(FORMAT(Transaction_Date, 'yyyy-MM') AS VARCHAR(7)),
    Credit_Card_Type
GO

And this time we get the results we were looking for:
results3

It was only after getting a working solution that I revisited using a conditional count to perform the aggregation. After thinking a little more about what COUNT is actually doing (increments a “count for every item in a group”), my expression logic didn’t quite make sense. In that case COUNT was incrementing regardless of whether a 1 or 0 was returned, so instead of returning a 0, this time I decided to return a NULL. The code fragment is as follows:

SELECT ...
COUNT(CASE WHEN Transaction_Type IN ('PAYM') THEN 1
ELSE NULL END) AS 'Number_Of_Sales'
FROM ...

The results were a success! However we are not entirely done since this code could be refactored further to remove the ELSE clause since a NULL would be implicitly returned if the CASE statement did not find a match.
The final solution is as follows:

SELECT
    CAST(FORMAT(Transaction_Date, 'yyyy-MM') AS VARCHAR(7)) AS 'Transaction_YYMM',
    Credit_Card_Type,
    COUNT(CASE WHEN Transaction_Type IN ('PAYM') THEN 1 END) AS 'Number_Of_Sales',
    SUM(CASE WHEN Transaction_Type IN ('PAYM') THEN Transaction_Amount
	ELSE 0 END) AS 'Sales_Value',
    COUNT(CASE WHEN Transaction_Type IN ('REVR') THEN 1 END) AS 'Number_Of_Refunds',
    SUM(CASE WHEN Transaction_Type IN ('REVR') THEN Transaction_Amount
	ELSE 0 END) AS 'Refund_Value'
FROM #Transactions
GROUP BY
    CAST(FORMAT(Transaction_Date, 'yyyy-MM') AS VARCHAR(7)),
    Credit_Card_Type
GO

And the results are identical as before.

Should we use SUM or COUNT?

While my head tells me that using the COUNT operator to perform conditional counts is the purest (and most logical) solution,  I personally think it is less readable (despite being less code) for the layman and instead quite like the SUM approach since it is self-explanatory.

Furthermore, while the query plans of both approaches appear to be identical upon first glance, I noticed that the COUNT approach had (very) slightly higher query costs. I haven’t compared execution times or other performance metrics, but I suspect there will be little difference between them. In other words, use whatever works for you!

Summary

After using the COUNT operator for well over 20 years, this exercise has proved to me that there are still functional nuances lurking that will bite me from time to time. This is not a SQL Server problem, it’s just a ME problem (in not always understanding operations correctly)!

If you would like to try this out yourself I have provided the setup code below:


CREATE TABLE #Transactions
	(Transaction_Date DATE,
	 Credit_Card_Type VARCHAR(16),
	 Transaction_Type CHAR(4),
	 Transaction_Amount INT)
GO

INSERT INTO #Transactions VALUES
	('2010-11-14',
	 'BARCLAYSCARD',
	 'PAYM',
	 12000
	),
	('2010-11-14',
	 'GOLDENFISH',
	 'PAYM',
	 10000
	),
	('2010-11-14',
	 'BARCLAYSCARD',
	 'REVR',
	 12000
	),
	('2010-11-15',
	 'BARCLAYSCARD',
	 'PAYM',
	 9000
	),
	('2010-11-18',
	 'AMERICANEXPRESSO',
	 'PAYM',
	 9000
	),
	('2010-11-18',
	 'AMERICANEXPRESSO',
	 'PAYM',
	 5610
	)
GO

 

Posted in SQL, SQLServerPedia Syndication | Tagged | 2 Comments

Compressing files from the SSIS Script Task

I hate SSIS. It seems to me that it is full of certain nuances and unless you are regularly developing SSIS packages, they are easy to forget or it is easy to miss specific important steps. I first started using SSIS back in 2005 when it was directly introduced to replace DTS, but even today I am constantly going around in circles whenever I have to return to write certain functionality.Therefore I have decided to put together a “Bitesize” series of posts that encapsulate simple operations in order to help not just you, but more importantly, remind me! Hopefully, this will save me time in the long run…

Ok, so before I get started I will caveat this quick post by saying that there is an easier (or preferred) way to perform compression on files in SSIS using the ZipFile Class in the recent version of the .NET framework, this is sadly not available if you are using any legacy deployments.

While I have not spent ages adapting the following code to perform what should be a simple thing to do, I have spent enough time to justify sharing what I have done so hopefully you will also find some benefit from the code below too.

    Before you get started, add in the namespaces System.IO and System.IO.Compression into the Namespaces region of your script task script.
    Define the static Compress method which takes a FileInfo type parameter (the FileInfo Class is essentially a wrapper class created from a file path).
    Utilize the Compress method by instantiating a new FileInfo object from a source file name.
#region Namespaces
...
...
using System.IO;
using System.IO.Compression;
#endregion

public static void Compress(FileInfo file)
{
	//Compress file
	using (FileStream originalFileStream = file.OpenRead())
	{
		if (File.GetAttributes(file.FullName) != FileAttributes.Hidden & file.Extension != ".gz")
		{
			using (FileStream compressedFileStream = File.Create(file.FullName + ".gz"))
			{
				using (GZipStream compressionStream = new GZipStream(compressedFileStream,
				   CompressionMode.Compress))
				{
					originalFileStream.CopyTo(compressionStream);
				}
			}
		}
	}
}

public void Main()
{
	// get the source file 
	string inputfile = (string)Dts.Variables["$Project::TemporaryFile"].Value;

	FileInfo file = new FileInfo(inputfile);
	Compress(file);

	Dts.TaskResult = (int)ScriptResults.Success;
}

In the code above I simply use a SSIS Project variable to pass in the required file path to create the compressed archive. Obviously you should modify the code to meet your needs, but his is enough for you to get started.

If you are interested in the reference article that I have adapted (and simplified) to produce this code, then visit How to: Compress Files.

Posted in SQLServerPedia Syndication, SSIS | Tagged | 1 Comment