Thursday 9 October 2014

Copying records between tables in different Azure accounts: The Next Generation

A few months back I posted a small piece of code for copying objects between Azure Table Storage Accounts. I have 2 bug-bears with the code I previously posted.

  1. It's all very custom: you need to give it the type that is in the table and add lambdas to make sure it doesn't try to select the wrong objects. 
  2. Due to a change in the Azure Storage Library, it no longer works. 

 I hadn't used this script in a while, but now I have an impending need for something like it, only better: something that will let me copy the contents of all the tables, or a defined subset of tables, in a given account. Enter DynamicTableEntity, which allows you to get an entry from a table without knowing its concrete type up front.
To run the code, open up Linqpad and add a reference to the Windows Azure Storage library; the easiest way to do this is via NuGet.

void Main()
{
  var srcClient = CreateClient("source account connection string");
  var destClient = CreateClient("destination account connection string");
  
  var mappings = new List<Tuple<string,string>>();
  
  //Manually setup mappings.
  //mappings.Add(new Tuple<string,string>("table1","table1copy"));
  //mappings.Add(new Tuple<string,string>("table2","table2copy"));
  
  //Copy all tables from the src account in to identically named tables in the destination account.
  var tables = srcClient.ListTables(null, new TableRequestOptions(){PayloadFormat = TablePayloadFormat.JsonNoMetadata});
  mappings = tables.Select (t => new Tuple<string,string>(t.Name,t.Name)).ToList();

  Copy(srcClient,destClient,mappings);
}

public void Copy(CloudTableClient src, CloudTableClient dest, List<Tuple<string,string>> mappings) {

  mappings.ForEach(x=>{
    var st = src.GetTableReference(x.Item1);
    var dt = dest.GetTableReference(x.Item2);
    dt.CreateIfNotExists();
    
    var query = new TableQuery<DynamicTableEntity>();

    foreach (var entity in st.ExecuteQuery(query))
    {
      dt.Execute(TableOperation.InsertOrReplace(entity));
    }
  });
}

public CloudTableClient CreateClient(string connString){
  var account = CloudStorageAccount.Parse(connString);
  return account.CreateCloudTableClient();
}

In the above code, we create CloudTableClients to represent the source and destination accounts, then we build a mapping of source and destination tables.
We can do this manually if we only want to copy some tables and/or we want the destination tables to have different names to the source tables.
Alternatively, we can get a list of all the tables from the source and use that to build a 1:1 map, which has the effect of copying every item in every table from the source account to the destination.
The Copy method takes the clients and the mappings, iterates over each mapping, reads every entity from the source table and upserts it into the destination table.

Note: The code above is horribly inefficient for copying large amounts of data, as it issues a separate insert request for each entity. In a follow up post, I'll make this more efficient by making use of the TableBatchOperation.
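As a taste of what that will look like, here's a rough sketch of a batched version of the Copy method. It's untested and glosses over the 4MB batch payload limit, but it shows the constraints that matter: every operation in a batch must share a PartitionKey, and a batch can contain at most 100 operations.

public void CopyBatched(CloudTableClient src, CloudTableClient dest, List<Tuple<string,string>> mappings) {

  mappings.ForEach(x=>{
    var st = src.GetTableReference(x.Item1);
    var dt = dest.GetTableReference(x.Item2);
    dt.CreateIfNotExists();

    var query = new TableQuery<DynamicTableEntity>();

    //Batches can only contain entities with the same PartitionKey, so group first.
    //Note: GroupBy buffers the whole table in memory, which is fine for modest tables.
    foreach (var partition in st.ExecuteQuery(query).GroupBy(e => e.PartitionKey))
    {
      var batch = new TableBatchOperation();
      foreach (var entity in partition)
      {
        batch.InsertOrReplace(entity);
        //Flush every 100 operations, the maximum a single batch allows.
        if (batch.Count == 100)
        {
          dt.ExecuteBatch(batch);
          batch = new TableBatchOperation();
        }
      }
      if (batch.Count > 0)
      {
        dt.ExecuteBatch(batch);
      }
    }
  });
}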

Sunday 31 August 2014

Using Custom Model Binders in ASP.Net MVC

I answered a question on Reddit this week from someone starting out in MVC who had read an article about model binding which was mostly correct, but which made custom binders look like they require more code than they actually do, so I thought it was worth a post to clear that up.

What is (Custom) Model Binding?

Model Binding is the process through which MVC takes a form post and maps all of the form values in to a custom object, allowing you to have a POST action method which takes in a ViewModel and have it automagically populated for you. Custom Model Binders allow you to insert your own binders for particular scenarios where the default binding won't quite cut it.

Creating our custom binder

We have the following typical example ViewModel:
    public class MyViewModel
    {
        public string MyStringProperty { get; set; }
    }
It's just a class, nothing special about it at all. Now we want to manually handle the binding of this model because we want to add some text to the end of MyStringProperty when it gets bound. This is unlikely to be something you would want to do in real life, but we're just proving the point here.
This is our binder:
    public class MyViewModelBinder:IModelBinder
    {
        protected System.Collections.Specialized.NameValueCollection Form { get; set; }

        private void Initialise(ControllerContext controllerContext, ModelBindingContext bindingContext)
        {
            Form = controllerContext.HttpContext.Request.Form;
        }

        public object BindModel(ControllerContext controllerContext, ModelBindingContext bindingContext)
        {
            Initialise(controllerContext, bindingContext);
            var msp = Form["MyStringProperty"];
            return new MyViewModel {MyStringProperty = msp + " from my custom binder"};
        }
    }
Model Binders need to implement IModelBinder and have a BindModel method. This gives you access to the controllerContext from which you can access HttpContext and the bindingContext, which admittedly I have never had to use.
In our binder, we just manually pick up the MyStringProperty value from the form, add it to a new instance of our object and return it, adding our incredibly important piece of text to the end of the retrieved value.

Using our Custom Binder

There are 2 ways we can use our custom binder; which one we use depends on the scenario. If we need to override the binding of a class for a particular action method, we can use the ModelBinder attribute on the relevant parameter of the action method:
        [HttpPost]
        public ActionResult Index([ModelBinder(typeof(MyViewModelBinder))]MyViewModel model)
        {
            return View(model);
        }
This will apply our custom binder to this parameter (of type MyViewModel) for this action only; no other actions or controllers will be affected.
Alternatively, if we want to apply our custom binder to MyViewModel globally within the application, we can add the following line to Application_Start in global.asax.cs:
        protected void Application_Start()
        {
            AreaRegistration.RegisterAllAreas();
            FilterConfig.RegisterGlobalFilters(GlobalFilters.Filters);
            RouteConfig.RegisterRoutes(RouteTable.Routes);
            BundleConfig.RegisterBundles(BundleTable.Bundles);

            ModelBinders.Binders[typeof(MyViewModel)] = new MyViewModelBinder();
        }
Using this method, everywhere a parameter of type MyViewModel is encountered on an action method, our custom binder will be invoked instead of the standard one. Because this applies globally, we do not need the ModelBinder attribute on our action method, so the overridden behaviour is completely transparent to the controller, promoting code reuse and keeping model binding logic where it belongs.
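For example, with the global registration in place, an action can take a MyViewModel parameter with no attribute at all and our custom binder is still the one doing the work:
        [HttpPost]
        public ActionResult Index(MyViewModel model)
        {
            // MyViewModelBinder is invoked automatically because of the
            // ModelBinders.Binders registration in Application_Start.
            return View(model);
        }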

Wednesday 6 August 2014

API Head-to-head: AWS S3 Vs Windows Azure Table Storage

Recently, I was experimenting with using S3 as a tertiary backup for my photos, an honour which eventually went to Azure because it was cheaper and I'm more familiar with the Azure APIs from my day job.

I thought I’d take a deeper look at both APIs and see how they compare. I’ll go through some standard operations, comparing the amount of code required to perform the operation.

If you want a comparison of features, there are plenty of blog posts on the subject; just Bingle it.

All the code in this test is being run in Linqpad, using the AWS SDK for .Net and Windows Azure Storage Nuget packages.

Create the client

Both Azure and S3 have the concept of a client; this represents the service itself and is where you provide credentials for accessing it.

Azure

var account = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse("connectionstring");
var client = account.CreateCloudBlobClient();

S3

var client = AWSClientFactory.CreateAmazonS3Client("accessKey", "secret",RegionEndpoint.EUWest1);
S3 wins on lines of code but I don’t like having to declare the datacenter the account is in. In my opinion, the application shouldn’t be aware of this. 1 point to Azure.
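If you do want to keep the region out of the code, one option is to resolve it from configuration at runtime. This is just a sketch; "AWSRegion" is an appSettings key I've invented, not something the SDK provides:

//Requires the System.Configuration, Amazon and Amazon.S3 namespaces.
//"AWSRegion" is a hypothetical appSettings key, e.g. <add key="AWSRegion" value="eu-west-1" />
var region = RegionEndpoint.GetBySystemName(ConfigurationManager.AppSettings["AWSRegion"]);
var client = AWSClientFactory.CreateAmazonS3Client("accessKey", "secret", region);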

Creating a container

This is a folder; Azure refers to it as a container, S3 calls it a bucket.

Azure

var container = client.GetContainerReference("test-container");
container.CreateIfNotExists();

S3

try
{         
 client.PutBucket(new PutBucketRequest { BucketName = "my-testing-bucket-123456", UseClientRegion = true});
}
catch (AmazonS3Exception ex)
{
 if(ex.ErrorCode != "BucketAlreadyOwnedByYou") {
  throw;
 }
}

S3 loses big time on simplicity here. To my knowledge, this is the only way to do a blind create of a container, that is, creating it without knowing up front whether it already exists. Azure makes this trivial with CreateIfNotExists. 2 points to Azure.
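If you want Azure-style ergonomics, you could wrap that try/catch in a small helper. A sketch, assuming the IAmazonS3 interface from the SDK version used above; PutBucketIfNotExists is my own name, not part of the SDK:

//Hypothetical helper to give S3 a CreateIfNotExists-style call.
//Swallows only the "bucket already owned by you" error; anything else still throws.
public void PutBucketIfNotExists(IAmazonS3 client, string bucketName)
{
 try
 {
  client.PutBucket(new PutBucketRequest { BucketName = bucketName, UseClientRegion = true });
 }
 catch (AmazonS3Exception ex)
 {
  if (ex.ErrorCode != "BucketAlreadyOwnedByYou")
  {
   throw;
  }
 }
}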

Uploading a file

Azure

var container = client.GetContainerReference("test-container");
var blob = container.GetBlockBlobReference("testfile");
blob.UploadFromFile(@"M:\testfile1.txt",FileMode.OpenOrCreate);

S3

var putObjectRequest = new PutObjectRequest {BucketName = "my-testing-bucket-123456", FilePath = @"M:\testfile.txt", Key = "testfile", GenerateMD5Digest = true, Timeout=-1};
var upload = client.PutObject(putObjectRequest);
They’re pretty much equal here, but the S3 code is more verbose. I like the idea of getting a reference to a blob while not knowing if it actually exists or not.

List Blobs

Azure

var container = client.GetContainerReference("test-container");
var blobs = container.ListBlobs(null, true, BlobListingDetails.Metadata);
blobs.OfType<CloudBlockBlob>().Select (cbb => cbb.Name).Dump();

S3

var listRequest = new ListObjectsRequest(){ BucketName = "my-testing-bucket-123456"};
client.ListObjects(listRequest).S3Objects.Select (so => so.Key).Dump();

In terms of complexity, they’re pretty even here too. Azure has one more line but it’s not a difficult one. Notice that whereas with Azure, we get a reference to a container and then perform operations against that reference, with AWS all requests are individual so you end up having to explicitly tell the client for every operation what the bucket name is. Point to Azure.

Deleting a Blob

Azure

var dblob = container.GetBlockBlobReference("testfile");
dblob.Delete();

S3

var delRequest = new DeleteObjectRequest(){ BucketName = "my-testing-bucket-123456", Key="testfile"};
client.DeleteObject(delRequest);
Neither piece of code is particularly complicated here, but I prefer Azure's simplicity with the container and blob reference model, so another point to Azure.

Delete a Container

Azure

var container = client.GetContainerReference("test-container");
container.Delete();

S3

var delBucket = new DeleteBucketRequest(){ BucketName = "my-testing-bucket-123456"};
client.DeleteBucket(delBucket);
Again, pretty equal. To micro-analyse the lines, you could say that for Azure, you’ve got one potentially reusable line, and one throw-away line. With S3, they’re both throw away. But in reality, unless you’re doing thousands of consecutive operations, it doesn’t really matter.

Conclusion

In terms of complexity, Azure's and S3's APIs are pretty much equal, but it's easy to see where they each have their uses. Azure's API is a much thicker abstraction over REST, whereas the S3 API is such a thin veneer that you could imagine a home-grown API not turning out that differently (though most likely not as reliable).

In my mind, if you’re doing lots of operations against lots of different blobs and containers then S3’s API is more suitable as each operation is self-contained and there are no references to containers or blobs hanging around.

If you’re doing operations which share common elements, such as performing numerous operations on a blob or working with lots of blobs within a few containers, Azure’s API seems better suited as you create the references and then reuse them, reducing the amount of repeated code.

Bonus Section

If you could be bothered to read past my conclusion, congratulations on your determination! The comparative speed of Azure and AWS has been done to death, but I couldn’t resist getting my own stats.

These are ridiculously simple stats, essentially Stopwatch calls wrapped around the code in this post. The file I am uploading is only 6k. The simple reason for this is that everyone tests how these services handle lots of large objects, but no one seems to cover the probably more common scenario of users uploading very small files. The average size is probably higher than 6kb, but this is what I’ve got hanging around so this is what I’m using.
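For the record, "Stopwatch calls wrapped around the code" means something along these lines; Time is a little helper of my own, not a library method:

//Minimal timing helper: runs the operation once and reports elapsed milliseconds.
public long Time(string label, Action operation)
{
 var sw = System.Diagnostics.Stopwatch.StartNew();
 operation();
 sw.Stop();
 Console.WriteLine("{0}: {1}ms", label, sw.ElapsedMilliseconds);
 return sw.ElapsedMilliseconds;
}

//Usage: Time("Azure - Create Container", () => container.CreateIfNotExists());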

So here are my extremely simple and probably not at all reliable benchmarks.

Operation          S3    Azure
Create Container   573   279
Upload 6Kb file    99    55
List Blobs (1)     41    103
Delete Blob        55    45
Delete Container   221   38
All times are in milliseconds. I've got to admit, I was expecting a more even spread here. Azure is significantly faster at creating and deleting containers and at uploading the file. It is also faster at deleting a blob, but the difference is insignificant. S3 wins significantly at listing blobs.

Not covered in this post: both APIs also have the Begin/End style of async operations, and Azure has the bonus of async operations based on the async/await pattern. I may do another post on that in the future.
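As a taste of the async/await flavour, an upload with the Azure library looks roughly like this. It's a sketch that assumes the Task-based *Async methods available in the Storage library version used above, plus the System.IO and System.Threading.Tasks namespaces:

//Async upload: the same container/blob reference model as before, just awaited.
public async Task UploadAsync(CloudBlobClient client)
{
 var container = client.GetContainerReference("test-container");
 await container.CreateIfNotExistsAsync();

 var blob = container.GetBlockBlobReference("testfile");
 using (var stream = File.OpenRead(@"M:\testfile1.txt"))
 {
  await blob.UploadFromStreamAsync(stream);
 }
}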

TL;DR: Azure's API is, in my opinion, a better abstraction and it's faster for most operations.

Friday 18 July 2014

Upgraded to Azure Storage Emulator 3.2, where have all my tables gone?

In an attempt to solve a 400 error accessing tables on the Azure Storage Emulator 3.0 today, I upgraded to 3.2 using the Web Platform Installer. This resulted in a kind of good news, bad news situation.

Good - The error stopped happening.
Bad - Where the f**k have all my tables gone!

I'll be buggered if I'm recreating and repopulating them all, so I went hunting. I found that the emulator databases live in C:\Users\<username>\. In that directory you'll find mdf files called WAStorageEmulatorDb**.mdf, where ** is the version number. I had ones ending in 22, 30 and 32. Each is accompanied by a _log file.

I loaded them up in Linqpad and the schemas looked the same, so as a punt I just renamed the files ending in 32 to something else and then renamed the files ending in 30 to 32.

Start up the emulator and everything is present again. That saved me a few hours!

Wednesday 19 March 2014

Copying records between tables in different Azure accounts

Today I had to quickly throw up a new instance of a customer's service in Hong Kong as they've got a big demo event coming up and want things to be as quick as possible. I haven't quite got things to a point where I can have multiple geographically distributed instances of the service all happily talking to each other and sharing data, so this instance is its own little island, completely separate from the main one in the EU.

Deploying the new Cloud Service was easy.
Taking a backup of the EU database and deploying it to Hong Kong was also easy.

However, recently I've been making increasing use of Azure Table storage for trivial data storage scenarios where the data isn't relational and will eventually need to be shared amongst multiple instances without waiting for a database sync. It was at this point that I realised I had no way of copying data from one storage account to another.
Time to correct that!
public void Transfer<T>(Microsoft.WindowsAzure.Storage.CloudStorageAccount fromAcc, Microsoft.WindowsAzure.Storage.CloudStorageAccount toAcc, string table, Expression<Func<T,bool>> expr) where T: TableServiceEntity {
  
  var fromTC = fromAcc.CreateCloudTableClient();
  var fromT = fromTC.GetTableReference(table);
  
  var toTC = toAcc.CreateCloudTableClient();
  var toT = toTC.GetTableReference(table);
  toT.CreateIfNotExists();
  
  var fromContext = fromTC.GetTableServiceContext();
  var toContext = toTC.GetTableServiceContext();
    
  var fromData = fromContext.CreateQuery<T>(table).Where(expr);

  foreach (var item in fromData)
  {
    toContext.AttachTo(table,item);
    toContext.UpdateObject(item);
  }
  toContext.SaveChangesWithRetries(SaveChangesOptions.ReplaceOnUpdate);
}


This Transfer method takes in a source account, a destination account and the name of the table.
The last parameter is an expression for the where clause. This is for scenarios where the same table contains multiple types of object and you just want to query out the ones of a particular type for transfer, using whatever clause is appropriate.

T must derive from TableServiceEntity and be the type of the object from which the record originated, or one that is similarly shaped.
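For illustration, a hypothetical entity matching the MyProject.MyType1 used in the example below might look like this; the property names are made up, the point is simply that it derives from TableServiceEntity:

//Hypothetical entity type. TableServiceEntity supplies PartitionKey, RowKey and Timestamp;
//the other properties just need to line up with the columns stored in the table.
public class MyType1 : Microsoft.WindowsAzure.Storage.Table.DataServices.TableServiceEntity
{
  public string Name { get; set; }
  public string Description { get; set; }
}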

The method is quite straightforward: it fires up 2 table clients, gets a reference to the table specified by the table parameter, creates it on the receiving end if it doesn't exist (I think it's safe to assume that it already exists at the source end), queries out the data, then attaches each item to the destination context and saves changes.
This upserts all of the data into the destination table.

Usage is simple:
var fromAccount = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=accountname;AccountKey=accesskey");
var toAccount = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=accountname;AccountKey=accesskey");

Transfer<MyProject.MyType1>(fromAccount, toAccount, "sharedtable", p=>p.PartitionKey == "Type1");
Transfer<MyProject.MyType2>(fromAccount, toAccount, "sharedtable", p=>p.PartitionKey == "Type2");
Transfer<MyProject.SomeOtherType>(fromAccount, toAccount, "someothertype", p=>p.PartitionKey != "");
Put all this together in Linqpad and you've got a simple way to transfer records between accounts on an ad hoc basis. As expected, it works with the Storage Emulator so you can use it to clone the contents of a production account down to your local dev machine and vice versa.