Reading very BIG text files using PowerShell

I frequently have to troubleshoot a variety of data feeds (populated from very large files) that are regularly ingested into SQL Server for reporting and data validation purposes. From time to time these feeds break because of malformed or invalid data supplied by third parties, and the files are often too large to load into a text viewer or editor.

I recently ran into this very problem, with bcp reporting the following error:

An error occurred while processing the file “D:\MyBigFeedFile.txt” on data row 123798766555678.

Given that the text file was far in excess of Notepad++ limits (or even the limits of comfortable human interaction), I decided not to waste time hunting for an editor that could cope. Instead I looked for a way to pull a batch of lines out of the file programmatically, so that I could compare the good rows with the bad.

I turned to PowerShell and came across the Get-Content cmdlet to read the file, which looked like it would give me at least part of what I wanted. There are quite a few ways to look at specific lines in a file, but to see the line in question together with its surrounding lines I could only come up with one neat solution: by passing the pipeline output into the Select-Object cmdlet I could use its -Skip and -First parameters.

Please note that the following code examples use PowerShell’s backtick (`) line continuation purely for readability on this web page. Normally I’d put each statement on a single line (minus the ` continuation characters).

The following reads the file in question, skips the first 123798766555672 lines and returns the next 6 (so my error row is the last line in the set, with 5 good lines above it). Note that -Skip and -First are 32-bit integer parameters, so a real row number needs to fit within that range (at most 2147483647):

Get-Content D:\MyBigFeedFile.txt | `
Select-Object -skip 123798766555672 -first 6

OK, so that’s cool, and it works very quickly given the size of the file (in this case it took approximately 3 minutes to return!). The final thing we might want to do is pipe the output into another text file for repeat viewing (so we don’t have to keep re-reading the huge file), or perhaps into the Out-GridView cmdlet. For either option, all we need to do is direct the pipeline output into the respective cmdlet.

Therefore the full solution to my specific problem is:

Get-Content D:\MyBigFeedFile.txt | `
Select-Object -skip 123798766555672 -first 6 | `
Out-File Output.txt
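The skip/first arithmetic can also be checked outside PowerShell. The following is only a sketch, assuming a Unix-like shell with tail and head is available (for example a Linux host that can see the file), which is not something this post relies on. The function name line_window is hypothetical. It prints a window of lines ending at the failing row, in the same way as -Skip (row - count) followed by -First count.

```shell
# Print `count` lines of `file`, ending at (and including) line `row`.
# Shell counterpart of:
#   Get-Content file | Select-Object -Skip (row - count) -First count
line_window() {
    file=$1
    row=$2
    count=$3
    skip=$((row - count))
    # Near the top of the file fewer than `count` lines precede the row.
    if [ "$skip" -lt 0 ]; then
        skip=0
        count=$row
    fi
    # tail -n +K starts output at line K, so K = skip + 1.
    tail -n "+$((skip + 1))" "$file" | head -n "$count"
}
```

For example, line_window feed.txt 12 6 would print lines 7 to 12 of a (hypothetical) feed.txt, with line 12 being the suspect row.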

Conclusion

I think this final solution is pretty cool, fast, and works like a dream. It appears much quicker than anything else I could put together (or use out of the box) and seems to avoid the limitations (particularly in 32-bit applications) of GUI-based editors. However, I have not monitored memory consumption whilst running this code on very large files, so I cannot confirm how efficient or safe it would be on a mission-critical server. It is my belief that Get-Content streams lines through the pipeline rather than loading the whole file, so it should be safe. Either way, you should do your own testing, folks!


Installing Docker on Linux Mint

OK, so first things first. This is not a ground-shaking post of revelation, and ultimately all the information you need can be found directly from Docker, but this post is intended to address any confusion or ambiguity you may find when installing Docker on Linux Mint, and to join all the dots for you.

A web search will almost certainly point you to lots of similar posts, most (if not all) of which start by instructing you to add unofficial or unrecognized sources, keys, etc. My intention with this post is therefore not to replace the official documentation, but to make the process as simple as possible whilst still pointing to the official documentation, so that you can be confident you are not breaking security or other such things!

You can head over to the following Docker page Get Docker CE for Ubuntu for the initial setup and updates, but for simplicity, you can follow along below.

First, run the script below to update your local package lists from the configured repositories, install the required packages, and add the official Docker GPG key.

# Ensure your repositories are up to date
sudo apt-get update

# Install required packages
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

# Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Check that the GPG key was added successfully (this command should print the key details)
sudo apt-key fingerprint 0EBFCD88
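If you would rather not eyeball the fingerprint, the check can be scripted. This is a minimal sketch rather than anything from Docker’s own instructions. The function name has_docker_fingerprint is hypothetical, and the full fingerprint in the variable is the one Docker publishes for this key (it ends in 0EBFCD88). Please compare it against the official documentation yourself before trusting it.

```shell
# Docker's published key fingerprint with the spaces removed.
# Assumption: verify this value against the official Docker docs.
DOCKER_FPR="9DC858229FC7DD38854AE2D88D81803C0EBFCD88"

# Succeeds if the text passed as $1 contains the expected fingerprint,
# ignoring the spaces that gpg prints between the groups of digits.
has_docker_fingerprint() {
    printf '%s\n' "$1" | tr -d ' ' | grep -q "$DOCKER_FPR"
}

# Usage (hypothetical wiring):
# has_docker_fingerprint "$(sudo apt-key fingerprint 0EBFCD88 2>/dev/null)" \
#     && echo "Docker key present" || echo "Docker key NOT found"
```

Stripping the spaces first means the check does not depend on exactly how your version of apt-key lays out the fingerprint line.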

Next up is perhaps the most important step (in terms of potential problems), and that is adding the correct repository for your version of Linux Mint. The issue you face is that Linux Mint uses its own release codenames, so the default script provided by Docker picks up the Mint codename rather than the required Ubuntu one (it’s the $(lsb_release -cs) piece of code in their script). Instead, you will need to find out your Mint release name and substitute the Ubuntu package base it is built on.

Find out your Linux Mint short codename by running the following:

lsb_release -cs

In my case I find that I am running Linux Mint Serena. Next, you need to find out the short codename of the Ubuntu base build that your edition of Mint is derived from. To do this, visit the Linux Mint Releases page.

From this page, I can see that Serena uses the Xenial package base.

So now all we need to do is add the right repository for the right package base (note I have added xenial to the script below). In your case, you may be using a newer or older edition of Mint, so simply replace the word “xenial” in the script with the package base for the version of Mint you are using.

# Add the Docker repository for the Xenial build
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   xenial \
   stable"

Now that the package repository has been added, you can install Docker Community Edition from apt as follows:

sudo apt-get update
sudo apt-get install docker-ce
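As an aside, the manual codename lookup described above can often be scripted. The sketch below assumes that your Mint release records its Ubuntu base as UBUNTU_CODENAME in /etc/os-release. That is an assumption about your system rather than something this post verified, so if the function prints nothing, fall back to the Linux Mint Releases page. The function name ubuntu_base_codename is hypothetical.

```shell
# Print the Ubuntu base codename recorded in an os-release style file.
# $1: path to the file (defaults to /etc/os-release).
# Prints nothing if the file has no UBUNTU_CODENAME entry.
ubuntu_base_codename() {
    file=${1:-/etc/os-release}
    # Keep only the value of the UBUNTU_CODENAME line, drop any quotes,
    # and take the first match in case the key is repeated.
    sed -n 's/^UBUNTU_CODENAME=//p' "$file" | tr -d '"' | head -n 1
}

# Usage (hypothetical): base=$(ubuntu_base_codename)
# then use "$base" in place of the hard-coded release name in the
# add-apt-repository line.
```

On a Serena install this would be expected to report the same package base as the Releases page, but do cross-check the two the first time you use it.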

With Docker installed, you then need to perform the Docker post-installation tasks, which you can find here. These tasks are really there to save you from having to run every Docker command through the privileged sudo command. For instance, without going any further you *could* already run the following command to list all currently downloaded docker images (there should be none).

# List docker images (using privileged mode)
sudo docker image ls

But we can avoid having to keep specifying sudo by running the following:

# Create a new docker group
sudo groupadd docker

# Add your user to the docker group.
# This script assumes that your current user
# is the one you want to be a docker admin
sudo usermod -aG docker $USER

It is now important that you log out of your session and back in again to pick up the new group membership; otherwise you may be greeted with the following text when attempting to run a docker command without sudo:

retracement@mylinuxhost ~ $ docker images
Got permission denied while trying to connect to the
Docker daemon socket at unix:///var/run/docker.sock:
Get http://%2Fvar%2Frun%2Fdocker.sock/v1.35/images/json:
dial unix /var/run/docker.sock: connect: permission denied
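If you hit this error, it can help to confirm whether the current session has actually picked up the docker group. Here is a small sketch (the function name has_group is hypothetical). It relies on id -nG with no user argument reporting the groups of the running session, whereas id -nG with a user name reports what the account database says.

```shell
# Succeeds if the space-separated group list in $1 contains group $2.
has_group() {
    # $1 is deliberately unquoted so the shell splits it into words.
    for g in $1; do
        if [ "$g" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

# Usage (hypothetical wiring):
# has_group "$(id -nG)" docker         || echo "session lacks docker group: log out and back in"
# has_group "$(id -nG "$USER")" docker || echo "user was never added: re-run usermod"
```

If the second check passes but the first does not, the usermod step worked and only the fresh login is missing.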

If you have followed the instructions correctly, you should now be able to list docker images (or run any other docker command) without sudo, as follows:

# List docker images (non-privileged mode)
docker image ls

And that’s it. Docker is now ready for you to run containers on your shiny Linux Mint desktop.


4 days of Dockercon – Day 4

After doing some more work in the evening for Microsoft and watching my football match, I didn’t get to sleep until midnight, so I woke with my usual conference groggy feeling. At my age, I really need more hours in bed! Thankfully I had done most of my packing the night before, so I didn’t have an awful lot to do. However, given that I had skipped dinner/supper the night before, I was determined to grab a big breakfast. I’d skipped my evening meal since I really couldn’t be bothered to wander down to the shopping mall and waste 40 minutes on the round trip, nor did I fancy spending 2 hours messing around over a meal in the hotel restaurant.

This morning I was out of my room by 8 am. I promptly checked out, booked my transit back to the airport, checked my luggage into the conference drop-off point and headed to the post-conference Summits. I had somehow managed to register for two (the Moby Project Summit and the Enterprise Summit), and whilst I suspected that the Moby Project would hold slightly more interest for me, I felt that it would probably hold less business value than the Enterprise option. So I deregistered myself from Moby, and went to get myself a freshly-born bouncing plate of bacon, eggs, beans, and… cheese! (yes these Danish folks are nut-cases!!!).

The Enterprise Summit kicked off talking even more about the MTA program and perhaps labored a little too long on its overview (especially since we had witnessed much of this material during the conference and keynotes). A few demos later, I am convinced that MTA and Docker Enterprise are a very good business proposition for most (if not all) businesses, though I really must try these things out myself on some problematic apps. You know the saying “if it sounds too good to be true….”? That’s partially how I feel at the moment, and I would rather hear about the serious problems encountered and failures experienced. That (I believe) would be more useful for understanding the limitations of this service. The Enterprise Summit was in truth an extended series of sessions and regurgitated material, and it took us up to lunchtime (only 3 hours after breakfast, seriously guys?!). I decided to eat anyway, given how far the hotel is from anything else and taking into account my arranged departure time for the airport (5.30 pm). The afternoon section of the “Enterprise Summit” consisted of completing the lab exercises, so I decided to finish lunch early and head over to my favorite spot.

Unfortunately for me, by 3 pm I was kicked out of the conference center since Dockercon was “officially over”, and therefore had no option but to head back to the hotel and continue the labs from the bar area, where I managed to do some cool stuff in swarm (playing around with node failures, container scaling and failed upgrades/rollbacks).

This. Is. The. Future. Folks. (And the future is now…).

Thankfully I’d just finished up when I grabbed my ride to the airport, and 15 minutes later I hit check-in, security, etc. After refuelling on airport pizza I decided to crack open the labs one last time, and 2 hours later remembered it was probably a good time to hit the button on this blog post :).

All in all, a very productive first Dockercon, and I have got much more return on investment than I could ever dream of from certain other conferences (which shan’t be named!). I did miss bumping into many of my friends and familiar faces during this conference, and certainly found the (presumably) Danes very reserved and hard to have a conversation with at the dinner tables, so after several abortive attempts to get them speaking I ultimately gave up. I already have another (bucket list) conference firmly on my watch list for next year, but really hope I can also add Dockercon US/EU into my budget.

Other posts in this series
4 days of Dockercon – Day 3
4 days of Dockercon – Day 2
4 days of Dockercon – Day 1
4 days of Dockercon – Day 0
