
Querying Azure VM state with PowerShell

I was recently given the task of identifying the state of an Azure VM so that an automation script using the az vm run-command invoke command would not fail if the VM was down or in the middle of a reboot.

I initially thought the task would be really easy: surely a simple query of the VM using Get-AzVM would expose a running-state property. As it happens, the state is a little more abstracted than that.

Using Get-AzVM cmdlet

We will query our virtual machine's running state with the Get-AzVM cmdlet and its -Status switch.

Get-AzVM -resourcegroupname "Demo" -name "server1" -Status

Running this command will return an object of type PSVirtualMachineInstanceView that we can use to test the status of the VM. The output to the above command is as follows:

ResourceGroupName          : Demo
Name                       : server1
HyperVGeneration           : V1
BootDiagnostics            : 
  ConsoleScreenshotBlobUri : https://ortodiag.blob.core.windows.net/bootdiagnostics-server1-cb292d11-c4c2-4abd-8f28-c0de31e68
010/server1.cb292d11-c4c2-4abd-8f28-c0de31e68010.screenshot.bmp
  SerialConsoleLogBlobUri  : https://ortodiag.blob.core.windows.net/bootdiagnostics-server1-cb292d11-c4c2-4abd-8f28-c0de31e68
010/server1.cb292d11-c4c2-4abd-8f28-c0de31e68010.serialconsole.log
Disks[0]                   : 
  Name                     : server1_OsDisk_1_6311f5e2cf24419eb27ac5c3734e952b
  Statuses[0]              : 
    Code                   : ProvisioningState/succeeded
    Level                  : Info
    DisplayStatus          : Provisioning succeeded
    Time                   : 12/05/2020 12:37:11
Disks[1]                   : 
  Name                     : server1_log_drive
  Statuses[0]              : 
    Code                   : ProvisioningState/succeeded
    Level                  : Info
    DisplayStatus          : Provisioning succeeded
    Time                   : 12/05/2020 12:37:11
Statuses[0]                : 
  Code                     : ProvisioningState/succeeded
  Level                    : Info
  DisplayStatus            : Provisioning succeeded
  Time                     : 12/05/2020 12:37:11
Statuses[1]                : 
  Code                     : PowerState/deallocated
  Level                    : Info
  DisplayStatus            : VM deallocated

This is quite useful in that we get lots of information about our VM and its associated resource states. We can see that it is a Generation 1 VM, along with the state of its disks. More specifically, we can read the Statuses[1] element and its Code property to get our VM's power state.
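As an aside, if you would rather not rely on the power state always being the second element of the Statuses collection, a minimal sketch such as the following filters on the PowerState/ prefix instead (the position-independent filtering is simply a defensive choice on my part, not something the cmdlet requires):

# Retrieve the instance view of the VM
$vmStatus = Get-AzVM -ResourceGroupName "Demo" -Name "server1" -Status

# Pick out the status entry whose Code starts with "PowerState/"
# rather than assuming it is always Statuses[1]
$powerState = ($vmStatus.Statuses | Where-Object { $_.Code -like "PowerState/*" }).Code

Write-Host "Current power state:" $powerState   # e.g. PowerState/running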

The possible Code states are:
– PowerState/deallocated (VM is stopped and deallocated)
– PowerState/starting (VM is starting)
– PowerState/running (VM is running – the Azure Portal may show a VM as running before this state returns)
– PowerState/deallocating (VM is stopping)

We now have enough information to write our PowerShell code.

Querying Azure VM state

As touched upon earlier, we need to take the PSVirtualMachineInstanceView object and read the Statuses[1] element and its Code property. This gives us a $provisioningState value that we can test against our target state (in our case PowerState/running).

If the state is not running then we keep looping with a 5-second wait. The previous state is tracked purely as output sugar so that we only write to the screen on a state change:

$vmName = "server1"
$resourceGroup = "Demo"
$lastProvisioningState = ""
$provisioningState = (Get-AzVM -resourcegroupname $resourceGroup -name $vmName -Status).Statuses[1].Code
$condition = ($provisioningState -eq "PowerState/running")
while (!$condition){
    if ($lastProvisioningState -ne $provisioningState){
        write-host $vmName "under" $resourceGroup "is" $provisioningState "(waiting for state change)"
    }
    $lastProvisioningState = $provisioningState

    sleep -Seconds 5
    $provisioningState = (Get-AzVM -resourcegroupname $resourceGroup -name $vmName -Status).Statuses[1].Code

    $condition = ($provisioningState -eq "PowerState/running")
}
write-host $vmName "under" $resourceGroup "is" $provisioningState -fore green

Results

With the script executing, we stop and then (ultimately) start the server1 VM. The output from the script was as follows:
[Screenshot: VM State]
As can be seen, this simple backoff script correctly reported on the shutdown and startup of our VM.

Conclusion

Whilst I do not pretend for one moment that this script is particularly clever (or even optimally written), it does at least demonstrate how easy it is to query our Azure VM state (with appropriate backoffs) through PowerShell so that we can accurately and predictably perform follow-up actions on our VMs.

Obviously this can be adapted for your specific use-case.

Removing and maintaining Azure resource group deployments based upon deployment count

Whenever you create or update an Azure resource, a new deployment is created under the resource's configured resource group. This deployment history is retained ad infinitum until you eventually hit the hard limit of 800 deployments per resource group. You may think this figure is more than enough to accommodate all the possible resource changes that could ever be made in a resource group, but if you are running CICD pipelines to push out your Infrastructure as Code (IaC), or create lots of resources per resource group, then it is very likely you will exhaust this figure very quickly.

Every time a release pipeline runs, regardless of whether you are changing resources or not, all configured and enabled deployments in the pipeline will result in a new deployment record. You can view all historic deployments in the Azure Portal for each resource group by selecting its Deployments item under the Settings pane (see below).


In the example above you will note that only 4 deployments have been created in this resource group. When the hard limit is eventually hit, all subsequent deployments to that specific resource group will fail.
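If you want to check how close each resource group is to the limit from the command line rather than the Portal, a quick sketch along these lines (using the same Az cmdlets that appear later in this post) reports the current deployment count per resource group:

# Report the number of historic deployments held by each resource group
foreach ($resourceGroup in Get-AzResourceGroup){
    $deploymentCount = (Get-AzResourceGroupDeployment -ResourceGroupName $resourceGroup.ResourceGroupName).Count
    Write-Host $resourceGroup.ResourceGroupName "has" $deploymentCount "of 800 deployments"
}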

Microsoft’s solution

Microsoft provide a solution to this in the doc titled Resolve error when deployment count exceeds 800, which allows you to programmatically remove deployments (through Azure CLI or PowerShell Az) based upon deployment date; this is made possible by the Timestamp property. I have also seen many blog posts that simply regurgitate this Microsoft code, giving really just one solution: maintain deployments based on date.

This is all well and good if your deployments span many weeks or months and the counts are predictable based on date-time, but what happens if your resource groups are highly active, highly unpredictable, or contain a high number of resources?

Deployment count solution

Perhaps a far better solution would be to maintain a set deployment count that will allow each release to succeed every time. If you are only deploying a single resource, then clearly you would only have to ensure one spare deployment slot is available. If you are deploying many resources through a CICD pipeline, then you simply need to ensure that at least that many spare deployment slots are available.

Running from Azure CLI or PowerShell Az command-line

If you are manually running the maintenance code, either from a remote command-line session or directly within the Portal command-line itself, you will first have to set the context of the subscription you wish to maintain. We can do this easily in PowerShell by running the following code (ensuring that you change the subscription name to the one you wish to target):

$subscriptionName =  "ACMEPRODSUB"
$subscription = Get-AzSubscription -SubscriptionName $subscriptionName
Set-AzContext -Subscription $subscription | Out-Null

Once you have set your subscription you can then use the subsequent code (detailed in the Running from within an Azure DevOps Pipeline section).

Running from within an Azure DevOps Pipeline

I have generally found that running a maintenance step at the start of any infrastructure Release Pipeline is a good point of execution. It keeps the time needed to cycle through and delete any excess deployments to a minimum, and also ensures there are enough deployment slots to prevent release failure. For our pipelines, maintaining a deployment count of 700 is a good compromise: it leaves 100 spare slots for each run and plenty of past deployment history.

In the release pipeline, we can create an Azure PowerShell task within a release stage.

For convenience we can use our code as an inline script, though you may ultimately decide to parameterize the $retainCount variable and publish the script from a repo instead.

Use your common sense when setting the number of deployments you wish to retain.

$retainCount = 700

Since our Azure PowerShell task has explicit settings for the subscription that we wish to execute the script against, we do not need to worry about changing subscription context. All we are concerned about is the functionality of the code itself.

In the code below we first loop through all resource groups in the current subscription context. For each resource group we return all of its deployments, and any deployment beyond the threshold set by $retainCount (assuming there are any) is deleted.

foreach ($resourceGroup in Get-AzResourceGroup){
    $resourceGroupName = $resourceGroup.ResourceGroupName
    $deployments = Get-AzResourceGroupDeployment -ResourceGroupName $resourceGroupName

    write-host "Resource group" $resourceGroupName "has" $deployments.Count "deployments..."

    $iterationCount = 1
    foreach ($deployment in $deployments) { #deployments are returned sorted by age desc
        if ($iterationcount -gt $retainCount){

            Remove-AzResourceGroupDeployment -ResourceGroupName $resourceGroupName -Name $deployment.DeploymentName | Out-Null
            write-host "   Deleted deployment on" $deployment.Timestamp -fore magenta
        }

        $iterationCount = $iterationCount + 1
    }
    Write-Host "   Resource group deployment maintenance complete." -fore green
}

This results in the following output:

If you are using an Azure DevOps release task to execute this code you will not see coloured text in the task output.

Conclusion

If you are manually maintaining your resource group deployments or wish to automate it through Azure DevOps, the timestamp-based solution provided by Microsoft may not fit your requirements given the frequency of your deployments or other considerations. Given that deployments are returned in time-sorted descending order, we can easily delete deployments based upon deployment count, always leaving enough space for future deployments rather than removing them based upon date alone. Ensuring that this maintenance task runs prior to any automated infrastructure releases can improve the success rate of your release pipelines in highly active environments.

Inconsistent result sets and another case against pessimistic isolation

I’m a big fan of demonstrating when transaction processing goes bad in Database Management Systems because it is a reminder to us all to ensure that we know exactly what we are doing when we write our code. Not only should you test and execute your code multiple times in serial (for instance, running in sequence in isolation), but you should also execute it in parallel with itself or with other potentially dependent transactions.

Elsewhere in this blog you will find many other concurrency and consistency examples that will make your hair stand on end with fear and dread of the dangers that might lurk in your system, and whilst I would love to claim full credit for this latest toe-curling installment, I feel it is only fair to blow smoke towards the mighty Erland Sommarskog (b) for first raising a similar observation within a private forum many months ago. The comments and code, however, are mine.

As we all know, Database Management Systems are designed with consistency in mind. That is, the level of consistency we observe should adhere to the level of isolation we choose. You will no doubt be aware by now that the NOLOCK locking hint (or more specifically the READ UNCOMMITTED isolation level) is our way of telling the DBMS to favor concurrency over consistency, and it can lead to reading result sets that are either incomplete or not yet committed. In other words, we are able to read in-flight transactional changes.

Both pessimistic and optimistic isolation in SQL Server implement pre-defined behaviors using row, page, or object-level locks for the non-in-memory (traditional, on-disk) isolation levels. Even on-disk “optimistic” isolation takes out locks for writes (but uses row versioning for reads). Locking (and latching) is the root cause of many of the transaction processing bad behaviors that we find in SQL Server.


Scenario

We have a table called Cars which is pre-loaded with 10 records.

We will have two user sessions, both running under the (default) READ COMMITTED isolation level. The first session loops continuously, each time executing a single transaction that deletes all records in the Cars table, inserts 10 records, and finally commits. The second session loops continuously over a select statement that queries the table.

Given that the delete and insert in session 1 are isolated within a transaction, we would expect the query from session 2 to either be temporarily blocked (by session 1) OR return 10 records.

Implementation

Setup

My examples utilize the Cars table. You can set this up as follows:

CREATE TABLE Cars (
   id uniqueidentifier DEFAULT NEWID(),
   carname VARCHAR(20),
   lastservice datetime DEFAULT GETDATE(),
   SpeedMPH INT,
   Details CHAR(7000),
   CONSTRAINT [PK__Cars] PRIMARY KEY CLUSTERED ([id])
)

Now we need to configure two sessions that will run continuously until we hit a failure.

Session 1

In our first session, we start with 10 records in the Cars table, and all subsequent changes are made within a transaction scope that deletes and then inserts 10 records each time. The atomic guarantee of the transaction means that, whether it succeeds or fails, only ever 10 records should be visible outside of Session 1.

There is a tiny delay implemented within this first transaction, and whilst it is not the root cause of this problem, it helps to exacerbate the issue in testing. Feel free to test with or without it.

Our Session 1 code is as follows:

WHILE 1=1
BEGIN 
   BEGIN TRAN
      DELETE FROM Cars

      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('Ferrari', 170, '')
      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('Porsche', 150, '')
      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('Lamborghini', 175, '')
      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('Mini', 110, '')  
      WAITFOR DELAY '00:00:00.02'
      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('Datsun', 90, '')
      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('Ford', 125, '')
      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('Audi', 138, '')
      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('BMW', 120, '')
      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('Honda', 87, '')
      INSERT INTO Cars(Carname, SpeedMPH, Details) VALUES('Mercedes', 155, '')   
   COMMIT TRAN
END

Remember that out of the box (under the READ COMMITTED isolation level), SQL Server uses exclusive locks to serialize access to the table data and block other sessions from reading it until those locks are released. The exclusive locks are, of course, held until the end of the transaction.

Session 2

In our second session, the code will break out of its loop if we read anything other than 10 records from the Cars table. If we successfully read 10 records we increment a @ConsistentResults counter and attempt another read. We expect session 2 to keep running, but if it breaks we will see how many times we managed to read exactly 10 records.

It is worth pointing out that in session 2, the query results are inserted into a table variable. The only reason we do this is so that we can count how many records were read and return those rows afterwards.

Our Session 2 code is as follows:

DECLARE @Cars TABLE (id uniqueidentifier DEFAULT NEWID(), carname VARCHAR(20), 
    lastservice datetime DEFAULT getdate(), SpeedMPH INT, Details CHAR (7000));
DECLARE @ConsistentResults INT = 0
WHILE 1=1
BEGIN
    DELETE FROM @Cars
    INSERT INTO @Cars SELECT * FROM Cars
    IF @@ROWCOUNT <> 10
        BREAK

    SET @ConsistentResults = @ConsistentResults + 1
    WAITFOR DELAY '00:00:00.013'
END
SELECT @ConsistentResults AS SuccessfulPriorRuns
SELECT * FROM @Cars

Test results

Our first run of Session 2 code gives us:

SuccessfulPriorRuns
-------------------
4

id                                   carname              lastservice             SpeedMPH    Details
------------------------------------ -------------------- ----------------------- ----------- -------
4587384F-E79A-4D4C-9A69-1D969F31E431 Porsche              2020-04-19 18:58:25.627 150                                                                                                                                                                                                                                                                         
F9AC6FFF-51B7-44E6-B773-311B17F0C5D7 Lamborghini          2020-04-19 18:58:25.627 175                                                                                                                                                                                                                                                                         
3C0C196A-95DE-40F5-B6A3-432574CECBA5 Ford                 2020-04-19 18:58:25.640 125                                                                                                                                                                                                                                                                         
5F844E89-13A4-4D46-874A-55513F389A3C Audi                 2020-04-19 18:58:25.640 138                                                                                                                                                                                                                                                                         
BADB1001-C4F2-4C7E-A65C-5A4099C43244 BMW                  2020-04-19 18:58:25.640 120                                                                                                                                                                                                                                                                         
E0E4EA82-DF24-44A9-93BB-89BB43B8FEB6 Datsun               2020-04-19 18:58:25.640 90                                                                                                                                                                                                                                                                          
DD29F38F-01E2-459C-A14E-DE8D456B21FE Mercedes             2020-04-19 18:58:25.640 155                                                                                                                                                                                                                                                                         
32BF996D-6077-4AAE-B23F-E5BB3A99365F Ferrari              2020-04-19 18:58:25.627 170                                                                                                                                                                                                                                                                         
B5E854C9-382B-4EB8-8E63-FB76F45B6EB5 Mini                 2020-04-19 18:58:25.627 110                                                                                                                                                                                                                                                                         

Completion time: 2020-04-19T18:58:25.6962254+01:00

As you can see, we managed to successfully read 10 records in the Cars table four times before reading only 9 records.


Our second run of Session 2 code gives us:

SuccessfulPriorRuns
-------------------
35

id                                   carname              lastservice             SpeedMPH    Details
------------------------------------ -------------------- ----------------------- ----------- -------
0B537CEF-6C8A-49A4-9295-054D29FEADC2 Porsche              2020-04-19 19:01:01.947 150                                                                                                                                                                                                                                                                         
DFC2990C-E410-45AF-A225-34405AF83E17 Ford                 2020-04-19 19:01:01.993 125                                                                                                                                                                                                                                                                         
9AC2AF9B-25C1-44BD-B503-38BE8D1C656A Audi                 2020-04-19 19:01:01.993 138                                                                                                                                                                                                                                                                         
65D57774-BDE8-4072-BA3D-59A093AC4103 Honda                2020-04-19 19:01:01.993 87                                                                                                                                                                                                                                                                          
3BF7C449-7FCA-4A93-B2CD-6688141410D2 Lamborghini          2020-04-19 19:01:01.947 175                                                                                                                                                                                                                                                                         
1D5D62B3-11C7-4EC5-984F-67FB87296021 Mini                 2020-04-19 19:01:01.947 110                                                                                                                                                                                                                                                                         
68A32AE3-F2C7-41DA-BFAD-8FA0276F543A Datsun               2020-04-19 19:01:01.993 90                                                                                                                                                                                                                                                                

Completion time: 2020-04-19T19:01:02.1127856+01:00

This time we managed to successfully read 10 records in the Cars table thirty-five times before reading only 7 records.


Our final run of Session 2 code gives us:

SuccessfulPriorRuns
-------------------
0

id                                   carname              lastservice             SpeedMPH    Details
------------------------------------ -------------------- ----------------------- ----------- -------
85227E5F-4C0F-4863-9895-21BA0232CFF5 Porsche              2020-04-19 19:05:43.380 150                                                                                                                                                                                                                                                                         
F9AE7CE7-F092-4F09-9A69-25423CD0D9F4 Mercedes             2020-04-19 19:05:43.410 155                                                                                                                                                                                                                                                                         
8E0F5C2D-3744-422A-A2B0-27D50BC89A96 Datsun               2020-04-19 19:05:43.410 90                                                                                                                                                                                                                                                                          
D48E0EBD-9171-4278-A76B-32E2BCD64260 BMW                  2020-04-19 19:05:43.410 120                                                                                                                                                                                                                                                                      

Completion time: 2020-04-19T19:05:43.5190517+01:00

The results this time were even worse. Our code failed on its first read of the Cars table, returning only 4 records.

Isolation Levels and In-Memory OLTP

If we run session 2 under each of the pessimistic isolation levels (using the on-disk table), we get the following results:

Isolation Level   Outcome       Description
---------------   -------       -----------
READ UNCOMMITTED  FAIL          result set size varies widely
READ COMMITTED    FAIL          this was the default used in the examples
REPEATABLE READ   PARTIAL FAIL  result set size varies and the occasional deadlock
SERIALIZABLE      SUCCESS       either a successful run OR deadlock

If we now run session 2 under the optimistic isolation levels (again using the on-disk table), we get the following results:

Isolation Level          Outcome  Description
---------------          -------  -----------
SNAPSHOT                 SUCCESS  all runs successful
READ COMMITTED SNAPSHOT  SUCCESS  all runs successful

If we implement the Cars table as an In-Memory OLTP table and run session 2 we get the following results:

Isolation Level  Outcome  Description
---------------  -------  -----------
*                SUCCESS  all runs successful

Summary

As with nearly all inconsistencies and strange transactional behaviors, these problems usually surface at the lower transaction isolation levels, and this situation is no different. As our testing also demonstrates, we can usually fix these bad behaviors by using more restrictive levels. The problem still exists under REPEATABLE READ, although that level partially reduces it through the occasional deadlock (deadlocks exist for consistency reasons, so you must obviously handle them). SERIALIZABLE, the most restrictive isolation level of all, fully resolves the issue by deadlocking every single time a successful run isn't made; this will clearly be terminal for performance if your application handles and replays transactions on deadlock.

SNAPSHOT isolation comes (as expected) to the rescue and we do not see any consistency bad behaviors for any read, but quite surprisingly so does the optimistic implementation of READ COMMITTED (known to you and me as READ COMMITTED SNAPSHOT). I wasn’t really expecting that to work.
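If you want to repeat the tests under the optimistic implementations yourself, a minimal sketch along these lines switches a database to SNAPSHOT and READ COMMITTED SNAPSHOT. The SandboxDB database name and server1 instance are assumptions; substitute your own test database, and note that the RCSI switch needs exclusive access, hence the WITH ROLLBACK IMMEDIATE:

$serverInstance = "server1"
$sqlCommandText = @"
ALTER DATABASE SandboxDB SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE SandboxDB SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
"@

# Run the isolation-level changes against the test instance
Invoke-Sqlcmd -Query $sqlCommandText -ServerInstance $serverInstance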

Finally, when we look towards In-Memory OLTP (IMOLTP), given that under the covers IMOLTP uses an entirely different (and optimistic) concurrency model and is completely lock and latch free, it is perhaps no surprise that no issues were experienced there. These tests add yet more weight (if you even needed it) to a move towards (on-disk) optimistic or In-Memory OLTP.

When this issue was first reported by Erland, Paul Randal (b|t) of SQLSkills confirmed that it was a known behavior and offered up his post titled Read committed doesn’t guarantee much…. While a quick read of Paul’s post will, on the face of it, look like the same issue as the one I describe, the one thing that does not align is his use of a HEAP table, since we have added a clustered index in our example.

That aside, I believe the trigger for this specific problem is very similar, and that the following are pivotal to things going wrong:
– The use of a GUID clustered key on the table
– The use of mixed extents, and how SQL Server accesses them

Final Thought

I have been saying for a very long time now that no one should be using pessimistic isolation on their SQL Servers (despite it still being the default), with the only caveat being a 3rd-party support contract requirement (and if the 3rd party doesn’t support optimistic isolation, you should look to move to other software). Furthermore, in SQL Server 2016 onwards mixed extents are disabled by default, so I would expect this issue not to rear its ugly head – however, that is the only scenario I haven’t tested, so I cannot confirm it.

This issue WILL NOT occur with optimistic isolation or with In-Memory OLTP, so perhaps it is time for you to change!

How to restore a deleted Azure DevOps repository

If you are using Azure DevOps, you might be comforted that your Git repo is “in the Cloud” and automatically has availability and disaster recovery guarantees. However, you (or someone else) still have the ability to accidentally (or maliciously) delete repos from Azure DevOps Repos. Surprisingly, at the time of writing, there is no GUI-based option to restore your repo. This might initially instill a sense of panic as you frantically search for the latest local clone to replace your remote – but there is a better way.

When you delete an Azure DevOps repository, it is initially soft-deleted to the “recycle bin”. After a period of time (oddly, I have failed to find an official Microsoft reference stating exactly what this period is, but I believe it is 28 days) it is automatically purged and hard-deleted. Although there is no GUI support to restore your soft-deleted repositories, that ability is exposed through the Azure DevOps REST API; frustratingly though, the Microsoft Azure DevOps Services REST API Reference does not provide a worked example on the Repositories – Restore Repository From Recycle Bin API call page.

To make your life easier, I will provide the solution below!


Getting started with Azure DevOps REST API and PAT token

Within my blog so far I have provided several worked examples of making a REST API call to Azure DevOps. If you are new to this, I suggest you first check out my post titled Querying Azure DevOps REST API with PowerShell.

Once you have assigned your $header variable from an encoded PAT token (as documented in the aforementioned article) you are ready to roll!
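For completeness, a minimal sketch of building that $header variable from a PAT token (the same approach used in the other posts in this archive) looks like this; the token value is of course a placeholder:

# Encode the PAT token (placeholder value) and build the authorization header
$personalToken = "<your PAT token>"
$token = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes(":$($personalToken)"))
$header = @{authorization = "Basic $token"}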

Set your repository’s Organisation and project

Each project will contain its own set of Azure repositories. Ensure you provide the correct values for your organization and project, and that any names containing spaces have them replaced with %20 (so that a valid url can be formed).

$organization = "retracement"
$project = "ACME%20Corp"

REST API call to list repositories in the recycleBin

From the Microsoft Azure DevOps Services REST API Reference we can use the Repositories – List REST API call to return a list of all deleted repositories in the recycleBin for our organization's project.

We will first build up our $url using the variables set earlier.

$url = "https://dev.azure.com/$organization/$project/_apis/git/recycleBin/repositories?api-version=5.1-preview.1"

Now that all variables are set, we can make our REST API call and iterate over all the deleted repositories:

$deletedRepos = Invoke-RestMethod -Uri $url -Method Get -ContentType "application/json" -Headers $header

Write-Host "Deleted repositories"
Write-Host "--------------------"

# for each deleted repository, report its id, name, and deletion details
$deletedRepos.value | ForEach-Object {
    $repoId = $_.id
    $repoName = $_.name
    $deletedBy = $_.deletedBy.displayName
    $deletedDate = $_.deletedDate
    Write-Host "repoId:" $repoId $repoName "deleted on" $deletedDate "deleted by" $deletedBy
}

The following output is returned:

Deleted repositories
--------------------
repoId: 3b1bbfe0-470d-4724-bc69-6ec29ff88cb5 SuperImportantRepo deleted on 2020-04-08T13:44:21.807Z deleted by Mark Broadbent
repoId: 4c3abef0-520a-2461-ac70-1ad30ef11ab7 NotImportantRepo deleted on 2020-04-12T10:00:01.201Z deleted by Mark Broadbent

We have now identified that someone (me!) has deleted a super important repository by accident. Using its repoId we can restore it from the recycleBin.


Recover soft deleted repository

First we set a $repoId variable to the repoId of the deleted repository (SuperImportantRepo) that we identified earlier. This will be used in our next REST API call.

$repoId = "3b1bbfe0-470d-4724-bc69-6ec29ff88cb5"

Now we can return to the Repositories – Restore Repository From Recycle Bin REST API call page and use it to build out our new url.
As you will see, the url contains our $repoId, and we also create a $body variable set to a JSON key-value pair that sets the deleted property to false. This JSON body is passed into our REST API call using the Patch method.

$url = "https://dev.azure.com/$organization/$project/_apis/git/recycleBin/repositories/" + $repoId +"?api-version=5.1-preview.1"
$body = ConvertTo-Json @{“deleted”= "false"}
Invoke-RestMethod -Uri $url -Method Patch -Body ($body) -ContentType "application/json" -Headers $header

The output of our final REST API call results in:

id            : 3b1bbfe0-470d-4724-bc69-6ec29ff88cb5
name          : SuperImportantRepo
url           : https://dev.azure.com/retracement/e6fa212f-3520-4c30-8c28-d6bd88926ff2/_apis/git/repositories/3b1bbfe0-470d-4724-bc69-6ec29ff88cb5
project       : @{id=e6fa212f-3520-4c30-8c28-d6bd88926ff2; name=ACME%20Corp; description=Super Important Repository for mission critical systems; url=https://dev.azure.com/retracement/_apis/projects/e6fa212f-3520-4c30-8c28-d6bd88926ff2; state=wellFormed; revision=626; 
                visibility=private; lastUpdateTime=2019-11-20T15:49:09.773Z}
defaultBranch : refs/heads/master
size          : 731
remoteUrl     : https://retracement@dev.azure.com/retracement/ACME%20Corp/_git/SuperImportantRepo
sshUrl        : git@ssh.dev.azure.com:v3/retracement/ACME%20Corp/SuperImportantRepo
webUrl        : https://dev.azure.com/retracement/ACME%20Corp/_git/SuperImportantRepo

As we can see from the above output: success!


Summary

As I have shown, deleting a repository by accident in Azure DevOps does not have to be a disaster recovery situation, since the recycleBin and the Azure DevOps REST API make it relatively simple to view and restore it (when you know how!). However, it is worth pointing out that for Git repositories no similar safety net exists if you delete a branch (unlike with Tfs repos in Azure DevOps). So the moral of the story is to ensure you periodically back up all your remote repositories AND set branch policies to protect them against accidental deletion.
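If you want a simple way to take those periodic backups, one option (just a sketch, and certainly not the only approach) is to mirror-clone each remote, which captures all branches and tags; the remoteUrl below is the one returned in the restore output above:

# Mirror-clone the repository, including all branches and tags
git clone --mirror https://retracement@dev.azure.com/retracement/ACME%20Corp/_git/SuperImportantRepo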

Hope you enjoyed the post!

Cannot delete old build definitions in Azure DevOps

I have been experiencing a problem for quite a while now in my current environment, in that some of our old builds cannot be deleted. When you attempt to do so it results in the following error:

One or more builds associated with the requested pipeline(s) are retained by a release. The pipeline(s) and builds will not be deleted.

Many of our pipelines have undergone a lot of change over time, to the degree that it is no longer obvious why (or indeed where) these builds are being prevented from being dropped. The only thing that is clear is that until they can be deleted, the old build definitions will remain.

I have tried to set the Stop retaining the build setting for all builds associated with a build definition to no avail. The setting just does not seem to want to take in most cases.

I have also tried playing around with build retention policies and even tried tidying up the release pipelines (and releases) themselves. Unfortunately for me, those darn build pipelines do not want to delete.

Today I decided to put some of my recent PowerShell and Azure DevOps REST API experience (see previous posts in this blog) to the test and attempt to get to the bottom of the problem. As it turns out, there is a build property called retainedByRelease, exposed through the REST API, which is the reason why a build cannot be removed and which results in our irritating error.

Using the same technique that I wrote about in Querying Azure DevOps REST API with PowerShell, I first decided to try and report on this property. Please refer back to that post for more explanation on utilizing the REST API, but in short I realized I would need to make two REST API calls: the first to query one or more build definitions, and the second to query all builds for each build definition. More specifically, with this last call I would report on the retainedByRelease property.


Querying the build definition builds

In the first piece of code we create our authorization token.

$personalToken = "tiksj25oumfavuzr4316vhpxw2mywzbapxj7sw3x2xet3dml1ygy"
$token = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes(":$($personalToken)"))
$header = @{authorization = "Basic $token"}

Next we set our organization and project variables.

$organization = "retracement"
$project = "ACME%20Corp"

Our first REST API call queries all build definitions within the project.

#all build definitions
$url = "https://dev.azure.com/$organization/$project/_apis/build/definitions?api-version=6.0-preview.7"
$builddefinitions = Invoke-RestMethod -Uri $url -Method Get -ContentType "application/json" -Headers $header
$builddefinitions.value | Sort-Object id | ForEach-Object {
    Write-Host $_.id $_.name $_.queueStatus

    #all builds for a definition
    $url = "https://dev.azure.com/$organization/$project/_apis/build/builds?definitions=" + $_.id + "&api-version=6.0-preview.5"
    $builds = Invoke-RestMethod -Uri $url -Method Get -ContentType "application/json" -Headers $header

    $builds.value | Sort-Object id | ForEach-Object {
        #report on retain status
        Write-Host " BuildId" $_.id "- retainedByRelease:" $_.retainedByRelease
    }
    Write-Host
}

For brevity I provide only a subset of the results:

339 SQL Dacpac Build enabled
BuildId 43045 - retainedByRelease: False
BuildId 43051 - retainedByRelease: False
BuildId 43053 - retainedByRelease: True
BuildId 43307 - retainedByRelease: True
BuildId 43325 - retainedByRelease: True

366 Databricks Notebooks Build enabled
BuildId 45338 - retainedByRelease: False
BuildId 45340 - retainedByRelease: False
BuildId 45346 - retainedByRelease: True
BuildId 46032 - retainedByRelease: True

375 ARM Templates Build enabled
BuildId 46452 - retainedByRelease: False
BuildId 46454 - retainedByRelease: True

As you can see, from the three active build definitions listed, each one has at least one build that is marked for retention by release.


Setting the build retainedByRelease property

Now that we have a procedure in place to query the retainedByRelease property, it is just as easy to set it. If you are trying to remove a specific build definition (or its builds), you can implement a filter in the build definitions iterator. So:

$builddefinitions.value | Sort-Object id|ForEach-Object {

Would now become:

$builddefinitions.value | where {$_.name -eq "ARM Templates Build"}|Sort-Object id|ForEach-Object {

In the above example we are filtering on a single build definition, but feel free to use the filter of your choosing.
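For instance, a broader filter matching any definition whose name contains "Templates" (the name pattern here is purely illustrative) could look like this:

$builddefinitions.value | where {$_.name -like "*Templates*"}|Sort-Object id|ForEach-Object {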

The final thing we need to do is make a REST API call to update each build returned by this filtered build definition. We can do this by adding the following line inside our build iterator (noting that, as the full script below shows, $url must first be repointed at the individual build):

Invoke-RestMethod -Uri $url -Method Patch -Body (ConvertTo-Json @{"retainedByRelease"='false'}) -ContentType "application/json" -Headers $header

You will note the use of -Method Patch within this call (rather than -Method Get), together with a JSON body. The Patch method allows us to partially update a resource (in this case a single field) with the JSON body provided.
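For clarity, the ConvertTo-Json call in that line turns the hashtable into a request body along these lines (exact whitespace may vary between PowerShell versions):

{
    "retainedByRelease":  "false"
}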


Putting it all together

So if we wanted to update the builds of one specific Build Definition called ARM Templates Build we would run the following code:

$personalToken = "tiksj25oumfavuzr4316vhpxw2mywzbapxj7sw3x2xet3dml1ygy"
$token = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes(":$($personalToken)"))
$header = @{authorization = "Basic $token"}

$organization = "retracement"
$project = "ACME%20Corp"

#all build definitions
$url = "https://dev.azure.com/$organization/$project/_apis/build/definitions?api-version=6.0-preview.7"
$builddefinitions = Invoke-RestMethod -Uri $url -Method Get -ContentType "application/json" -Headers $header

$builddefinitions.value | where {$_.name -eq "ARM Templates Build"} | Sort-Object id | ForEach-Object {
    Write-Host $_.id $_.name $_.queueStatus

    #all builds for a definition
    $url = "https://dev.azure.com/$organization/$project/_apis/build/builds?definitions=" + $_.id + "&api-version=6.0-preview.5"
    $builds = Invoke-RestMethod -Uri $url -Method Get -ContentType "application/json" -Headers $header

    $builds.value | Sort-Object id | ForEach-Object {
        #report on retain status
        Write-Host " BuildId" $_.id "- retainedByRelease:" $_.retainedByRelease

        #api call for a build
        $url = "https://dev.azure.com/$organization/$project/_apis/build/builds/" + $_.id + "?api-version=6.0-preview.5"

        #set retainedByRelease property to false
        Invoke-RestMethod -Uri $url -Method Patch -Body (ConvertTo-Json @{"retainedByRelease"='false'}) -ContentType "application/json" -Headers $header
    }
    Write-Host
}

Now that the retainedByRelease flag has been cleared on every build of the ARM Templates Build definition, you should be able to remove this build definition without further error (you do not need to remove its builds first).


Summary

There are certain issues that you might experience in Azure DevOps which seem almost impossible to resolve through the GUI, but yet again the Azure DevOps API can come to our rescue. In this specific example we have easily queried aspects of DevOps through PowerShell, and this time even changed information through it to resolve our problem.

I hope you find this post useful for this rather frustrating problem!

Azure subscription is not registered to use Cosmos DB namespace

It is usually the simplest things that leave me feeling like a complete dummy. I ran into one such issue fairly recently when trying to deploy Cosmos DB into an Azure subscription.

From Azure DevOps I received the following error:

2019-05-31T16:28:55.4288261Z ##[error]MissingSubscriptionRegistration : The subscription is not registered to use namespace ‘Microsoft.DocumentDB’. See https://aka.ms/rps-not-found for how to register subscriptions.
2019-05-31T16:28:55.5005526Z ##[section]Finishing: Deploy Cosmos Account and Database with Shared Capacity

I initially assumed that the error was Azure DevOps related so I attempted to deploy using PowerShell and ran into an almost identical error.

I had deployed this Cosmos DB template successfully many times in our other subscriptions and could not understand why a simple deployment to an alternative subscription would fail. Looking back at the error message, I followed the link provided, which took me to the Microsoft doc titled Troubleshoot common Azure deployment errors with Azure Resource Manager, and from a link within it I ended up on Resolve errors for resource provider registration.

It turns out that Azure resource providers, which you can almost think of as class libraries, can (like their programmatic counterparts) be in either a Registered or a NotRegistered state. When a provider is in the NotRegistered state, we are unable to call it to create its resource types (such as Cosmos DB in my case).

We can use PowerShell Az or the Azure CLI (both covered elsewhere in this blog) to report which resource providers are available. In this specific example, I am going to return all providers that match the wildcard pattern Microsoft.D*. The code searches for and sets the relevant subscription using Azure Cloud Shell (for simplicity's sake), but you can do this through a remote Azure CLI or PowerShell Az session if you prefer (and connectivity allows).

$subscription = Get-AzSubscription | Where-Object Name -Match "MySubscription1*"
Select-AzSubscription $subscription 
$providers = Get-AzResourceProvider -listavailable |Select-Object ProviderNamespace, RegistrationState
$providers  | Where-Object ProviderNamespace -Match "Microsoft.D*"

We get the following results:

ProviderNamespace               RegistrationState
-----------------               -----------------
Microsoft.DBforPostgreSQL       Registered
Microsoft.DevTestLab            Registered
Microsoft.Databricks            Registered
Microsoft.DataLakeStore         Registered
Microsoft.DataLakeAnalytics     Registered
Microsoft.DBforMySQL            Registered
Microsoft.DevSpaces             Registered
Microsoft.Devices               Registered
Microsoft.DataFactory           Registered
Microsoft.DataBox               NotRegistered
Microsoft.DataBoxEdge           NotRegistered
Microsoft.DataCatalog           NotRegistered
Microsoft.DataMigration         NotRegistered
Microsoft.DataShare             NotRegistered
Microsoft.DBforMariaDB          NotRegistered
Microsoft.DeploymentManager     NotRegistered
Microsoft.DesktopVirtualization NotRegistered
Microsoft.DevOps                NotRegistered
Microsoft.DigitalTwins          NotRegistered
Microsoft.DocumentDB            NotRegistered
Microsoft.DomainRegistration    NotRegistered

Notice that the Microsoft.DocumentDB namespace is NotRegistered. If you are wondering, DocumentDB was the precursor name of the Cosmos DB SQL API (before Cosmos DB supported multiple APIs). As with many other Microsoft products, early names tend to stick :).

To register this namespace we can simply run the following line of code against the subscription using the Register-AzResourceProvider cmdlet.

Register-AzResourceProvider -ProviderNamespace Microsoft.DocumentDB

The following output is returned:

ProviderNamespace : Microsoft.DocumentDB
RegistrationState : Registering
ResourceTypes     : {databaseAccounts, databaseAccountNames, operations, operationResults…}
Locations         : {Australia Central, Australia East, Australia Southeast, Canada Central…}

If it is not obvious, you would unregister a provider namespace (if you wanted to make it unavailable) using the Unregister-AzResourceProvider cmdlet as follows:

Unregister-AzResourceProvider -ProviderNamespace Microsoft.DocumentDB

Once I had registered the Microsoft.DocumentDB namespace, I was able to deploy my Cosmos DB template into my subscription without error!

Summary

Depending upon your subscription and region, your enabled provider namespaces may vary; in my case, however, someone had explicitly unregistered Microsoft.DocumentDB from the subscription. Why would someone do that? Well, it is a good way to prevent deployments of certain resource types if they go against your company policy.

As you can see, if you run into a similar problem or want to start using resource types that are by default NotRegistered you can register and start using them incredibly easily.

Custom Job scheduling, remote queries, and avoiding false negatives using PowerShell and SQL Agent

Say what you want about the SQL Server Agent, it is the lifeblood of SQL Server for scheduling maintenance routines, data loading, and other such ongoing operations. It has been around since the dawn of time and has grown into a rich (but simple) job execution scheduler, yet at the same time it still lacks certain capabilities, which mildly annoys me.

Scheduling exclusions

One such annoyance is the inability to create intelligent exclusions for job schedules; in other words, the “don’t run this job if” scenarios that you might commonly run into at retailers, in the education or health sectors, at financial institutions, and in other environments with time-sensitive or complex execution logic.

Now I know what you are thinking. You are thinking that you have created plenty of job schedules in the past and managed to add exceptions to the schedule, and with this, you are partially correct.

In the SQL Agent Job Schedule dialog above, we have a weekly schedule that is set to run every Monday to Friday, only twice a day, starting at 00:00:00. By definition, this schedule will not run on Saturday or Sunday (and therefore we have an exception), but we are constrained within the bounds of the dialog interface and its available fields and logic.

Complex scheduling logic requirements

Consider an environment that consumes lots of ETL feeds from third parties on a (working) daily basis. The feeds, however, are not created during holidays, meaning that the dependent job(s) would fail if they ran on those days. There are three obvious ways to work around this situation:

  1. Allow failures to occur (rolling back changes on failure)
  2. Embed feed existence logic *1 (and/or exception logic) within the code/ package (to prevent the package execution from failing)
  3. Prevent job execution on-condition within a job step

Option 1 (quite clearly) is not a great solution: it would generate false negative reporting (in systems such as SCOM, since the error would be exactly that) and require human interaction to review. Jobs would fail, but the failure would only be a result of missing feeds and nothing more. Recovery in this instance would (hopefully!) only require allowing the job to run on schedule once the feeds are delivered the following day.

Option 2 would address the problem of failure in this instance but does not implement schedule exception logic. In other words, the job (and its package) would execute successfully, but this in itself creates false success reporting (the package did not perform its usual workload). Implementing schedule exception logic inside the code/package is also far from ideal since its function is easily lost in the logic of the actual job step and maintaining and building upon this logic could become unwieldy over time.

Option 3 is perhaps the best choice: it provides isolation between the job schedule logic and the job function itself (because requirements can change), as well as improved visibility (feedback) of why the job did not run. We still need to understand, however, how to implement this for situations that are outside the capability of the Job Schedule dialog.

*1 Whilst it is fair to argue that existence logic is not the solution to our problem, it should be obvious to the reader that creating packages that gracefully handle failure is a given. In reality, all package dependencies should be tested for and reported on error, and I tend to favor the “bubbling up” approach to failure.

Remote queries for exclusion lookups

In this specific scenario, we need to implement custom logic to perform a holiday test and raise an error if the current day is a holiday (we will address the handling of this error shortly). In our environment, we maintain a centralized calendar (in a table, of course!) of all important dates and public holidays up to five years in advance, and this makes for an excellent reference point for our test query. We want to store this table on a different (but highly available) SQL instance so it can be used by the Agent jobs of all our SQL instances, but its location introduces yet another challenge: how do we query this remote data from our Agent job?

If you automatically jumped to the conclusion that we should use a linked server, then I can tell you are a bit of a masochist. One of the (several) problems with this approach is that we would have to create a linked server to our calendar data instance within every instance that needs to reference it. In an enterprise environment, this could be a problem to deploy and maintain.

SQL Server 2008 R2 introduced PowerShell Agent job steps, and this gives us a very easy mechanism to programmatically run our remote query without a linked server.

In the following piece of code, we are simply querying our remote server hosting the calendar table ([DBAAdmin].[dbo].[FullCalendar]) and searching for today’s date.

$serverinstance = "server1"
$sqlCommandText = @"
DECLARE @testdate DATE = GETDATE()
SELECT IsHoliday
FROM [DBAAdmin].[dbo].[FullCalendar]
WHERE DateKey=@testdate
"@

if ((Invoke-Sqlcmd $sqlCommandText -ServerInstance $serverinstance).IsHoliday) {
    throw "Current date is a Holiday"
}

If today’s date is classed in the table as a holiday (see the evaluation against the returned dataset’s IsHoliday property), then an exception will be thrown and the step will be failed.
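For context, the script assumes a calendar table with (at least) a date key and a holiday flag. A minimal sketch of such a table is shown below; the column names match those used in the query, but the types are my assumption and the real [DBAAdmin].[dbo].[FullCalendar] table will likely carry many more attributes:

# Assumed minimal schema for the centralized calendar table (illustrative only)
$createCalendar = @"
CREATE TABLE [DBAAdmin].[dbo].[FullCalendar] (
    DateKey   DATE NOT NULL PRIMARY KEY,
    IsHoliday BIT  NOT NULL DEFAULT 0
);
"@

Invoke-Sqlcmd $createCalendar -ServerInstance "server1"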

On step failure success

The only problem with failing the job when today *is* a holiday is that we are essentially raising a false negative, which is not much better than just allowing the job to fail because a feed wasn’t delivered for that day. In an ideal world, we need the job to quit, but to terminate in a way that is distinct from both a “success” (i.e. today is not a holiday) and a “failure” (i.e. the job has failed for some other reason that needs investigation).

The trick here is to change the job step On failure action to “Quit the job reporting success” via the Job Step/ Advanced Properties as follows:

So your job steps are like this:

By doing so, your skip logic will result in the job step failure showing the job execution as an alert (rather than a failure). Only the step itself will show as a failure, which, I think you will agree, is pretty cool! This means that not only do you have a visual indicator that the job was canceled by design, but you should also be able to programmatically exclude these alerts in SCOM.

Summary

In this post, we have done several things in order to deliver a richer scheduling solution for our agent jobs. These are as follows:

  1. We have prevented our custom scheduling-logic job step from causing false negatives by setting its On Failure action to “Quit the job reporting success”, so that the job reports only an alert and only the job step itself is failed.
  2. We have created a centralized table to store custom metadata to allow all SQL instances to query it for scheduling logic scale-out (where you might need to!).
  3. We have utilized PowerShell to query our remote metadata in order to avoid the need to create (and maintain) instance-specific linked servers. This is clearly a good thing for the deployment of standardized scripts.

I hope you can see from this very simple example that not only can we now implement very rich custom scheduling solutions for our jobs, but we can also use some of these techniques to avoid the unnecessary use of linked servers and other things. Enjoy!