
Sunday, July 06, 2025

Home Backup


I just returned home from a trip and had one of those dreaded moments: one of the disks on my home desktop had failed! Thankfully, I have a backup strategy in place that works for me, and this is the second time I have had to use it.

How I Choose What to Back Up

This experience reinforced my belief in keeping things simple. I categorize my data as:

  • Important: Family photos, essential documents, critical projects.

  • Not Important: Old files, random downloads, stuff that doesn’t need my attention.

Less clutter means lower backup cost, easier backups, and faster restores. I care about keeping multiple backups, including remote backups, of a few critical folders; everything else I can delete at will and never worry about.

Staying Cost-Effective

Finding affordable yet effective backup options has always been my goal. No one wants to spend money unnecessarily on backups that provide minimal additional benefits.

My Backup Philosophy: The 3-2-1 Rule

I follow the popular 3-2-1 backup rule:

  • 3 copies of all important data.

  • 2 different storage types (local NAS and cloud).

  • 1 off-site backup for worst-case scenarios (think fires or theft).

My Actual Backup Setup

Here's the practical side of my setup:

  1. Local NAS: Quick, reliable backups. All my devices continuously back up their entire contents here, which allowed me to quickly recover my desktop's data this time. I treat it as a local cache. I have been using a Synology DS2xx series for some time now.

  2. Cloud Storage (OneDrive): Essential files from laptops, desktop, and my phone (including the camera) sync automatically. This ensures easy access from anywhere. OneDrive is an easy choice for me. The family tier is 1TB per person, and this alone covers everything critical for me except the huge cache of RAW images (see below).

  3. Cloud Backup (Backblaze): If not for my huge collection of RAW images from my DSLR, I'd likely have just stuck to the above two. However, my desktop has a few terabytes of images that simply cannot fit in OneDrive, so my desktop backs up fully (all drives) to Backblaze against serious emergencies. I have been using Backblaze for years. I use their 30-day versioning and have in the past had to get a restore, where they sent me an SSD with the folders I wanted restored. Backblaze lets you supply an encryption key that is used in the client and never kept with Backblaze, so if you lose that key, all data is lost.

Backup is Easy, but Restore is Crucial

Backing up data is easy—restoring it reliably when something goes wrong is where the real test lies. Every six months, I run a mini disaster recovery drill:

  • Checking that backups are up-to-date.

  • Restoring a few random files to verify everything works.

This practice made my recent restore straightforward and stress-free.

This disk failure ended up being a positive reminder of how valuable a good backup strategy is. I’d love to hear your backup strategies or any tips you've learned through your experiences. Drop your suggestions and ideas in the comments below!

Resume Tips

 



As I filter through the applications for an open Principal Manager position in my organization, I thought I'd share some quick tips. Also, if you have been impacted by the recent layoffs and need a fresh pair of eyes on your résumé, ping me; I'm happy to help in any way I can.

1️⃣ Keep it lean
Two pages max. Hiring managers and recruiters are pressed for time, and your résumé is your first filter.

2️⃣ Tailor for the target
Start with a slightly longer master résumé, then tighten it for each application. Highlight the experiences that mirror the job requirements and trim the rest. If the role asks for deep technical chops, surface your biggest engineering wins. If it values people leadership, spell out the size, geography and diversity of teams you have guided.

3️⃣ Know every line
Interviewers love to probe the details. Even that “obscure” project from years ago can surface. Be ready to dive deep on anything you list.

4️⃣ Proof. Proof again. Then proof once more.
Ask a friend to review for clarity and errors. Yes, I just reviewed the resume of “Software Engineering Leafer”.

5️⃣ Expand acronyms on first use
Clarity beats cleverness. Write “Time-to-mitigate (TTM)” before you rely on TTM alone.

6️⃣ Show impact, not activity
Microsoft and other top firms care about results. Replace “led migrations” with specifics: “migrated 120M users to new platform, improving sign-in latency by 35 percent.” Include metrics on revenue, cost, uptime, or customer satisfaction whenever possible, and be prepared to defend them in interviews.

7️⃣ Cut the buzzwords
“Visionary, results-oriented ninja” is outdated. Run your résumé through a GenAI tool and ask it to flag fluff and repetition.

💡 Bottom line: Precision, relevance and quantified impact separate strong leaders from a sea of applicants. Invest the extra hour to refine your résumé and you will earn the next-step conversation.

Monday, November 28, 2022

Wordament Solver

 


Many, many years back, in an interview, I was asked to design a solver for the game Wordament. At that time I had no idea what the game was, and the interviewer patiently explained it to me. I later learnt that a couple of engineers at Microsoft came up with the game for the Windows Phone platform, and it was such a success that they bootstrapped a team and made the game their full-time job.

I was able to give a solution in the interview, but it always remained at the back of my mind. I wanted to go further than the theoretical solution and really build the solver. I began tinkering with the idea a couple of weeks back, and over the Thanksgiving long weekend I got enough time to sit down and complete it.

You can see it in action at bonggeek.com/wordament/



Basic Idea

We begin by loading the dictionary into a Trie data structure. Obviously there are fantastic Trie implementations out there, including ones that are highly memory-optimized by collapsing multiple nodes into one. However, the whole idea of this exercise was to write some code, so I rolled my own basic Trie.

If a particular Trie node is the end of a word, that node is marked as such. As an example, a Trie created with the words cat, car, men, man, and mad will look as below. The green checks denote valid end-of-word nodes.
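A minimal version of such a Trie in Go might look like the sketch below (purely illustrative, not the actual implementation in the repo):

type trieNode struct {
    children map[byte]*trieNode // next character -> child node
    isWord   bool               // true if the path to this node spells a valid word
}

func newTrieNode() *trieNode {
    return &trieNode{children: map[byte]*trieNode{}}
}

// insert adds one dictionary word to the Trie rooted at root.
func (root *trieNode) insert(word string) {
    n := root
    for i := 0; i < len(word); i++ {
        c := word[i]
        child, ok := n.children[c]
        if !ok {
            child = newTrieNode()
            n.children[c] = child
        }
        n = child
    }
    n.isWord = true
}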


Now, starting from each cell of the Wordament board, we start at the node for that cell's character in the Trie. We look at the 8 adjacent cells (neighbors), and if the current Trie node has a child with the same character as a neighbor, that neighbor is a candidate to explore, so we recursively move to that node. Any time we arrive at a valid word node, we check whether that word was previously found; if not, we add the word and the list of cells that formed it to the result.

Finally, since Wordament gives higher scores for longer words, we sort the list of words by their length.
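Putting this together, the recursive search over the 4x4 board could look roughly like the following sketch. It reuses the trieNode type sketched above, ignores Wordament's multi-letter tiles, and skips the cell-path bookkeeping for brevity; the real logic is in wordament.go.

import "sort"

// solve finds all dictionary words reachable on a 4x4 board.
// board is the 16 characters of the puzzle, row by row.
func solve(root *trieNode, board string) []string {
    found := map[string]bool{}
    var visited [16]bool

    var dfs func(pos int, node *trieNode, word string)
    dfs = func(pos int, node *trieNode, word string) {
        next, ok := node.children[board[pos]]
        if !ok {
            return // no dictionary word continues with this letter
        }
        visited[pos] = true
        word += string(board[pos])
        if next.isWord {
            found[word] = true
        }
        r, c := pos/4, pos%4
        for dr := -1; dr <= 1; dr++ {
            for dc := -1; dc <= 1; dc++ {
                nr, nc := r+dr, c+dc
                if nr < 0 || nr > 3 || nc < 0 || nc > 3 {
                    continue
                }
                if np := nr*4 + nc; np != pos && !visited[np] {
                    dfs(np, next, word)
                }
            }
        }
        visited[pos] = false
    }

    for pos := 0; pos < 16; pos++ {
        dfs(pos, root, "")
    }

    // Wordament scores longer words higher, so sort by length.
    words := make([]string, 0, len(found))
    for w := range found {
        words = append(words, w)
    }
    sort.Slice(words, func(i, j int) bool { return len(words[i]) > len(words[j]) })
    return words
}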

The logic of this solution is implemented in wordament.go.

I built the solver into a web service that runs in a Docker container inside an Azure VM. The service exposes an API. Then I built a single-page web application that calls this web service and renders the solution.

You can hit the API directly with something like
curl -s commonvm1.westus2.cloudapp.azure.com:8090/?input=SPAVURNYGERSMSBE | jq .

The input is all the 16 characters of the Wordament to be solved.
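For reference, a stripped-down sketch of what such a handler could look like in Go is below. The handler name, the dictionary variable, and the JSON shape are made up for illustration and differ from the actual service.

import (
    "encoding/json"
    "log"
    "net/http"
    "strings"
)

var dictionary *trieNode // built at startup from a word list (not shown)

func solveHandler(w http.ResponseWriter, r *http.Request) {
    input := r.URL.Query().Get("input")
    if len(input) != 16 {
        http.Error(w, "input must be exactly 16 characters", http.StatusBadRequest)
        return
    }
    // Assuming the dictionary was loaded lower-cased.
    words := solve(dictionary, strings.ToLower(input))
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string]interface{}{
        "input": input,
        "words": words,
    })
}

func main() {
    http.HandleFunc("/", solveHandler)
    log.Fatal(http.ListenAndServe(":8090", nil))
}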

Wednesday, March 02, 2022

CAYL - Code as you like day


Building an enterprise-grade distributed service is like trying to fix and improve a car while driving it at high speed down the freeway. Engineering debt accumulates fast, and engineers on the team yearn for the time to get to it. Another common complaint is that we need more time to tinker with cool features and tech, to learn and experiment.

An approach many companies take is big hackathon events. Even though they have their place, I think those are mostly for PR and eye candy. Which exec doesn't want to show the world that their company can create an AI-powered blockchain running on a quantum computer in just a 3-day hackathon?

This is where CAYL comes in. CAYL, or "Code As You Like", is loosely named after the "go as you like" events I experienced as a student in India. In a lot of uniform-based schools in Kolkata, it is common to have a go-as-you-like day, where kids dress up however they want.

Even though we call it code as you like, it has evolved beyond coding. One of our extended Program Management teams has also picked this up and calls it WAYL (Work As You Like day). This is what we have set aside in our group calendar for this event:


“code as you like day” is a reserved date every month (first Monday of the month) where we get to code/document or learn something on our own.
There will be no scheduled work items and no standups.
We simply do stuff we want to do. Examples include but are not limited to:
  1. Solve a pet peeve (e.g. fix a bug that is not scheduled but you really want to get done)
  2. A cool feature
  3. Learn something related to the project that you always wanted to figure out (how do we use fluentd to process events, what is helm)
  4. Learn something technical (how does go channels work, go assembly code)
  5. Shadow someone from a sibling team and learn what they are working on
We can stay late and get things done (you totally do not have to do that) and there will be pizza or ice-cream.
One requirement is that you *have* to present whatever you did the next day, 5 minutes each.

I would say we have had great success with it. We have had CAYL projects all over the spectrum:
  1. Speed up build system and just make building easier
  2. ML Vision device that can tell you which bin trash needs to go in (e.g. if it is compostable)
  3. Better BVT system and cross porting it to work on our Macs
  4. Pet peeves like making function naming more uniform, removing TODOs from code, fixing spelling/grammar, etc.
  5. Better logging and error handling
  6. Fix SQL resiliency issues
  7. Move some of our older custom management VMs to AKS
  8. Bring in gomock, go vet, static checking
  9. 3D game where mommy penguin gets fish for her babies and learns to be optimal using machine learning
  10. Experiment with Prometheus
  11. A dev spent a day shadowing a dev from another team to learn the cool tech they are using
We just finished our CAYL yesterday, and one of my CAYL items was to write a blog about it. So it's fitting that I am hitting publish on this post as I sit in the CAYL presentation while eating kale chips.

Monday, February 07, 2022

Go Generics


Every month our team does a Code As You Like Day, which is basically a day off from regular work to hack something up, learn something new, or even fix some pet peeves in the system. This month I chose to learn about Go generics.

I started Go many years back, coming from mainly coding in C++ and C#. Also, at Adobe almost 20 years back, I got a week-long class on generic programming from Alexander Stepanov himself. I missed generics terribly and hated all the code I had to hand-roll for custom container types. So I was looking forward to generics in Go.

This was also the first time I was trying to use a non-stable version of Go, as generics is currently available only in Go 1.18 Beta 2. Installing it was a bit confusing for me.

I first attempted go install, which seemed to work, but the beta toolchain was not actually usable until I ran an additional download step. That wasn't very intuitive.

For my quick test, I decided to port my quick-and-dirty stack implementation from relying on interface{} to using a generic type.

I created a Stack with generic type T which is implemented over a slice of T.

var Full = errors.New("Full")
var Empty = errors.New("Empty")

type Stack[T any] struct {
    arr  []T
    curr int
    max  int
}

Creating two functions, one for a fixed-size stack and one for a growable stack, was a breeze. Using the generic types was intuitive.

func NewSizedStack[T any] (size int) *Stack[T] {
    s := &Stack[T]{max: size}

    s.arr = make([]T, size)
    return s
}

func NewStack[T any]() *Stack[T] {
    return &Stack[T]{
        max: math.MaxInt32,
    }
}


However, I did fumble on creating methods on that type, because I somehow felt I needed to write them as func (s *Stack[T]) Length[T any]() int {}. The extra [T any] is actually not required; Go methods cannot declare their own type parameters, and the receiver's [T] is enough.

func (s *Stack[T]) Length() int {
    return s.curr
}

func (s *Stack[T]) IsEmpty() bool {
    return s.Length() == 0
}


Push and Pop worked out as well

func (s *Stack[T]) Push(v T) error {
    if s.curr == len(s.arr) {
        if s.curr == s.max {
            return Full
        } else {
            s.arr = append(s.arr, v)
        }
    } else {
        s.arr[s.curr] = v
    }

    s.curr++

    return nil
}

func (s *Stack[T]) Pop() (T, error) {
    var noop T // 0 value
    if s.Length() == 0 {
        return noop, Empty
    }

    v := s.arr[s.curr-1]
    s.arr[s.curr-1] = noop // release the reference
    s.curr--

    return v, nil
}
However, for Pop I needed to return a nil/zero value for the generic type. It did seem odd that Go does not provide something specific for this. I had to create a noop variable and then return that.
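Incidentally, another way to spell the zero value without the extra variable is *new(T). For example, a hypothetical Peek method (not part of the original stack) could use it like this:

func (s *Stack[T]) Peek() (T, error) {
    if s.Length() == 0 {
        return *new(T), Empty // *new(T) is the zero value of T
    }
    return s.arr[s.curr-1], nil
}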

Using the generic type is a breeze too, no more type casting!
s := NewStack[int]()

s.Push(5)
if v, e := s.Pop(); e != nil || v != 5 {
    t.Errorf("expected popped value 5, got %v (err: %v)", v, e)
}

Tuesday, June 16, 2020

Raspberry Pi Photo frame


This small project brings together a bunch of my hobbies. I got to play with carpentry, photography, and software/technology, including face detection.

I have run out of places in the house to hang photo frames, so as a workaround I was planning to get a digital photo frame. When I upgraded my home desktop to 2 x 4K monitors, I had my old Dell 28" 1080p monitor lying around. I used that and a Raspberry Pi to create a photo frame. It boasts the following features:
  1. A real handmade frame
  2. 1080p display
  3. Auto sync from OneDrive
  4. Remotely managed
  5. Face detection based image crop
  6. Low cost (uses raspberry pi)
This is how it looks.


Construction

In my previous smart-mirror project, I focused way too much on framing the monitor and ended up with the Raspberry Pi and the monitor so well contained inside the frame that I have a hard time accessing them and replacing parts. So this time my plan was to build a simple, lightweight frame that attaches to the monitor with velcro fasteners so that I can easily remove it. The monitor is actually on its own base, so the frame is just cosmetic and doesn't bear the load of the monitor; rather, the monitor and its base hold the frame in place.

I bought a 2" trim from Homedepot and cut out 4 pieces using a saw and then joined them using just wood glue. To let the glue cure, I held the corners using corner clamp for 12 hours. The glue is actually stronger than the trim itself, so once it dries there is no chance of things falling apart.



On the back of the frame I attached a small piece of wood, on which I added velcro. I also glued velcro to the top of the monitor. These two strips of velcro keep the frame on the monitor.



Now the frame can be attached loosely to the monitor just by placing on it.

After that I got a Raspberry Pi, connected it to the monitor using an HDMI cable, and attached the Pi to the frame with zip ties. All low-tech up to this point.
On powering up, it boots into Raspbian.

Software

Base Setup

I always start with my base setup:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install xrdp # install remote desktop
sudo apt-get install vim  # my editor of choice
sudo apt-get install git

git clone https://github.com/abhinababasu/share # get my shell
cp share/.vimrc .
cp share/.bash_aliases .
cp share/.bashrc .

sudo apt-get install unclutter # hide mouse pointer in slide show

To keep things fresh, reboot at midnight every day; add the following to /etc/crontab:
0  0    * * *   root    reboot

Enable ssh
sudo raspi-config

Portrait mode
1. sudo vim /boot/config.txt
2. Add the line: display_rotate=3

Push Pics

I use Adobe Lightroom for managing the photos I take. My workflow for this case is as follows:

  1. All images are tagged with the keyword "frame" in Lightroom.
  2. I use a smart folder to see all these images and then publish them to a folder named Frame in OneDrive.

Sync OneDrive to Raspberry Pi

I used the steps in https://jarrodstech.net/how-to-raspberry-pi-onedrive-sync/ 
  1. curl -L https://raw.github.com/pageauc/rclone4pi/master/rclone-install.sh | bash
  2. rclone config
    1. Enter n (for a new connection) and then press enter
    2. Enter a name for the connection (I'll enter onedrive) and press Enter
    3. Enter the number for One Drive
    4. Press Enter for client ID
    5. Press Enter for Client Secret
    6. Press n and enter for edit advanced config
    7. Enter y for auto config
    8. A browser window will now open, log in with your Microsoft Account and select yes to allow OneDrive
    9. Choose right option for OneDrive personal
    10. Now select the OneDrive you would like to use, you will probably only have one OneDrive linked to your account. This will be 0
    11. Y for subsequent questions
  3. To Sync once: rclone sync -v onedrive:Frame /home/pi/frame
  4. Setup automatic sync every one hour
    1. echo "rclone sync -v onedrive:Frame /home/pi/frame" > ~/sync.sh
    2. chmod +x ~/sync.sh
    3. crontab -e
    4. Add the line: 1 * * * * /home/pi/sync.sh

Setup Screensaver

There are many options I could find online to show the photos, but I chose to go with the easiest one: xscreensaver. However, there are some issues, and most likely this is something I will revisit.

  1. Disable screen blanking after some time of no use
    1. vi /etc/lightdm/lightdm.conf
    2. Add the lines:
      [SeatDefaults]
      xserver-command=X -s 0 -dpms

  2. Enable auto-login, so that on restart you directly get logged in and then into screensaver
    1. sudo raspi-config
    2. Select 'Boot Options' then 'Desktop / CLI' then 'Desktop Autologin'. Then right arrow twice and Finish and reboot.

  3. Set up the screen saver
    1. sudo apt-get -y install xscreensaver
    2. sudo apt-get -y install xscreensaver-gl-extra

These are my screensaver settings to show the photos in /home/pi/frame as a slideshow.





Problems and solving with Face Detection

My photos are rarely 9:16 portraits, which means ugly black boxes above and below the images.


The obvious approach is to crop using some batch tool, but that could arbitrarily cut out parts of the image. Consider the following image.
Cropping in a batch tool that picks an arbitrary area of the image generated something like the image below, which is obviously not acceptable.
To solve this I built a tool at https://github.com/abhinababasu/img. It uses my other project on detecting faces in images and ensures that the face is retained in the cropped image. E.g. the tool generates the following image.
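The core of the crop is simple geometry: given the detected face rectangle and a target aspect ratio, pick a crop window of that ratio that keeps the face inside it. A rough Go sketch of that calculation is below; it is illustrative only, not the actual code in the img repo, and assumes the face rectangle comes from the face-detection step.

import "image"

// faceAwareCrop returns a crop rectangle of the given aspect ratio
// (e.g. 9.0/16.0 for a portrait frame) centered on the detected face
// as much as the image bounds allow.
func faceAwareCrop(bounds, face image.Rectangle, ratio float64) image.Rectangle {
    cropH := bounds.Dy()
    cropW := int(float64(cropH) * ratio)
    if cropW > bounds.Dx() { // image narrower than the target ratio, crop height instead
        cropW = bounds.Dx()
        cropH = int(float64(cropW) / ratio)
    }

    // Center the crop window on the face, then clamp it to the image.
    faceCX := (face.Min.X + face.Max.X) / 2
    faceCY := (face.Min.Y + face.Max.Y) / 2
    x := clamp(faceCX-cropW/2, bounds.Min.X, bounds.Max.X-cropW)
    y := clamp(faceCY-cropH/2, bounds.Min.Y, bounds.Max.Y-cropH)

    return image.Rect(x, y, x+cropW, y+cropH)
}

func clamp(v, lo, hi int) int {
    if v < lo {
        return lo
    }
    if v > hi {
        return hi
    }
    return v
}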

Monday, June 15, 2020

Building Azure Monitor for SAP Solutions

The Product

Update: Here is a quick-start video
    

This post is about how we built Azure Monitor for SAP Solutions. It is about the distributed systems we use to provide database monitoring at scale for customers' data planes. The first section, however, provides a quick intro to the product itself.

"Azure Monitor for SAP Solutions" provides managed monitoring for the databases powering customer's SAP landscapes. Our monitoring supports multiple instances of databases of a particular type (e.g. HANA) and is also extendable for various kinds of databases. We have started with HANA and plan to include SQL-Server, etc. in the future. At the time of writing this post the monitoring is in  private-preview with public preview coming up "soon".

The customer uses a creation wizard on the Azure portal to create the monitor, as shown in the screenshot below. The customer enters their subscription, resource group, and vnet details, followed by the connection details of the database. Our resource provider deploys a VM payload into their vnet that connects to the databases to monitor them and pumps telemetry into their Azure analytics workspace. Customers can then create dashboards and configure alerts.

Some example visualizations, using Workbooks on the Log Analytics workspace to which we pump the data, are shown below. We are still tweaking these, and once we are in public preview I plan to come back and edit this post with links to public docs.

In the screenshot below the visualization shows all the database clusters of our test environment in one place. Selecting any cluster drills down further into the health of each DB node.

Similarly, in the following visualization we see that a cluster is unhealthy, and on drilling down, a node is yellow (warning state) because it is triggering our high-CPU-usage threshold (>50%).

Architecture 

Our product is built on Kubernetes (or rather Azure Kubernetes Service), Helm, linkerd, Go, fluentd, and similar open-source software. We use the engineering principles outlined here. Also, we stand on the shoulders of giants: we did not have to build much of the core functionality because it comes for free inside the Azure engineering umbrella. We simply onboard to internal services that provide RBAC, cross-region load balancing, billing, etc.

If the architecture seems familiar, it is because a large part of it is shared with how we manage BareMetal blades running in-memory databases (HANA) in Azure, which I have posted about here.

At the high level our architecture looks as follows.

The user/customer interacts with our system using either the Azure Portal (screenshot above), the command-line tools, or the SDK. We build extensions to the Azure portal for our product sub-area. All resources in Azure are exposed using standardized RESTful APIs; the swagger spec is published here, and the CLI and SDK are generated from it.

All customer interactions are handled first by the central Azure Resource Manager (ARM), which handles authentication and RBAC. Every resource type in Azure is handled by a corresponding resource provider. In this particular case the resource is Microsoft.HanaOnAzure/sapMonitors, and it is handled by the HANA-RP (also referred to as just RP in this post). ARM knows to forward the calls it gets from customers to a particular regional instance of HANA-RP after taking care of authentication and other gatekeeping activities.

The regional Resource Provider or RP

For every Azure region we support, we have a HANA-RP (resource provider, or RP) instance deployed in that region. The RP is a collection of services that runs on Azure Kubernetes Service (AKS). HANA-RP is built mostly in Go and engineered through Azure DevOps. We have an automated build pipeline for the RP and single-click (maybe a few clicks) deployment. We use Helm for management.

The service itself is stateless; the state is stored externally in Azure SQL Server. We use both structured data and document-DB-style data. All data is replicated remotely to one more region, and we configure automated backups for disaster-recovery scenarios.

We do not share any state across RP instances. This provides an important attribute we look for in Azure services: regional isolation. It ensures that a regional Azure outage does not affect any other region.

Each instance of the RP manages all monitors in its region. When the user uses the CLI/portal to create the monitor, all the details flow over an encrypted channel from ARM into the RP. The RP then deploys the monitoring payload into the customer's vnet.

All data flows across pods (intra-service) and across services are encrypted in transit, and the data that we store in SQL Server is also encrypted at rest. We do not store any customer secrets on our systems (more below).

Tech usage: AKS, Kubernetes, linkerd, nginx, Helm, Linux, Docker, Go, Python, SQL Server, Azure DevOps, Azure Container Registry, Azure Key Vault, etc.

Deployment

Once the RP gets a request to provision a monitor, it talks to other Azure resource providers (Compute, Storage, Security, Networking) to set up the monitoring payload inside the customer's vnet:
  1. The RP creates various networking components (NSG, NIC)
  2. Creates storage account, storage queues
  3. Uses Key Vault to deploy DB access secrets. These are not stored by us; they remain encrypted in transit and at rest inside the customer-owned Key Vault
  4. Creates log-analytics workspace
  5. Creates the collector VM in the resource group (a VM of type B2ms)
  6. The VM uses the custom script extension to bootstrap Docker and pull down the monitoring payload Docker image

The Payload

Since the payload runs inside the customer's vnet, we want to be absolutely transparent about what runs inside it. The entire payload is open source and can be accessed at https://github.com/Azure/AzureMonitorForSAPSolutions. Specifically at https://github.com/Azure/AzureMonitorForSAPSolutions/tree/master/sapmon/payload.

The commands used to install, launch, and manage individual sub-monitors are in sapmon.py. Specific payloads are in, say, saphana.py or other files in that folder.

Our payload VM fetches the Docker image built from these sources from our Azure container registry at the following location:
 mcr.microsoft.com/oss/azure/azure-monitor-for-sap-solutions
Once this payload starts running inside the payload VM, it fetches database connectivity information from the customer's Key Vault, where the RP has placed it. It then starts querying the database to fetch various monitoring information and pumps it into the Azure telemetry pipeline.
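The general shape of such a collector is a simple poll loop: run a monitoring query on a schedule and hand each row to the telemetry sink. The actual payload is the Python code linked above; the Go-flavored sketch below only illustrates the pattern, and the driver, query shape, and emit callback are assumptions rather than the real implementation.

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/SAP/go-hdb/driver" // assumes the open-source SAP HANA driver
)

// monitorLoop runs a monitoring query on a fixed interval and hands each
// result to emit, which would forward it to the telemetry pipeline.
func monitorLoop(dsn, query string, interval time.Duration, emit func(host string, value float64)) error {
    db, err := sql.Open("hdb", dsn)
    if err != nil {
        return err
    }
    defer db.Close()

    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    for range ticker.C {
        rows, err := db.Query(query) // e.g. a CPU or memory check; the real checks live in the payload repo
        if err != nil {
            log.Printf("query failed, will retry next tick: %v", err)
            continue
        }
        for rows.Next() {
            var host string
            var value float64
            if err := rows.Scan(&host, &value); err != nil {
                log.Printf("scan failed: %v", err)
                break
            }
            emit(host, value)
        }
        rows.Close()
    }
    return nil
}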

If the customer opted in during monitor creation, the monitor also sends non-identifiable telemetry back to Microsoft so that we can ensure the monitoring keeps functioning.

We intentionally chose a design where the monitor does not run on the database machine itself but is isolated in a separate VM. This makes it easy to observe the execution of the monitor and to isolate any impact it may have on the customer's production system.

The way our monitoring is designed (executing monitoring queries against the database to fetch monitoring information) allows it to monitor any database that is reachable from inside the customer's vnet. This obviously includes databases deployed on VMs inside the vnet. In addition, it can monitor the customer's HANA Large Instances that run on BareMetal blades in VLANs accessible over ExpressRoute. Essentially, as long as the database server name is resolvable and the database on it is reachable, the monitoring system works.
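Conceptually, that reachability requirement boils down to a check like the one below (illustrative only, not part of the product code): if the name resolves and a TCP connection to the database port succeeds, the collector can start querying.

import (
    "net"
    "time"
)

// reachable reports whether a database endpoint resolves and accepts
// TCP connections, which is all the collector needs to start querying it.
func reachable(host, port string) bool {
    conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, port), 5*time.Second)
    if err != nil {
        return false
    }
    conn.Close()
    return true
}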

Scalability

Our HANA-RP is automatically sharded by region, as it only handles the monitors in its own region. Our stateless micro-services in each of those regions ensure we can easily scale horizontally to handle more control-plane calls on the monitors in that region (create/delete monitors).

For the data plane we actually deploy the entire payload in separate payload VMs inside the customer's subscription/resource group. So each new monitor comes with its own payload VM that monitors a DB (or a few DB instances) for a given customer, resulting in automatic scaling. The data also gets pumped into customer-specific analytics workspaces and hence is not a bottleneck.

Tuesday, May 05, 2020

Using Visual Studio Codespaces



One of the pain points we face with remote development is having to go through a few extra hops to get to our virtual dev boxes. Many of us use Azure VMs for development (in addition to local machines), and our security policy is to lock down all VMs to the Microsoft corporate network.

So to ssh or RDP into an Azure VM for development, we first connect over VPN to the corporate network and then use a corpnet machine to log in to the VMs. That is painful, and more so now that we are working remotely.

This is where the newly announced Visual Studio Codespaces comes in. Basically it is hosted VS Code in the cloud. It runs beautifully inside a browser and, best of all, comes with full access to the Linux shell underneath. Since it is run as a service and secured by the Microsoft team building it, we can simply use it from a browser on any machine (obviously with two-factor authentication).

At the time of writing this post, the cost is around $0.17 per hour for 4 cores/8 GB, which comes to around $122 max for a whole month. Codespaces also has a snooze feature; I snooze after one hour of no usage. This does mean additional startup time when you next log in, but it saves even more money. Across snoozes the state of the box is retained.



While just being able to use the IDE on our code base is cool in itself, having access to the shell underneath is even cooler. Hit Ctrl+` in vscode to bring up the terminal window.


I then synced my Linux development environment from https://github.com/abhinababasu/share and installed the packages I need. Finally I have a full shell and IDE in the cloud, just the way I want it.

To try out Codespaces head to https://aka.ms/vso-login