Ryan Spletzer

Zero to Trusted: SPIFFE and SPIRE, Demystified

Tue, 25 Mar 2025 00:00:00 +0000

I vividly remember one of the first tasks my engineering manager handed to me: figure out how to automate rotating the client_secret for an App Registration in Entra ID (the product formerly known as Azure AD).

It sounded straightforward in theory – “just rotate the secret” – but I soon realized it was “turtles all the way down”: each automation script or tool you introduce to rotate a secret requires another secret to authenticate itself, and that secret, in turn, needs protection and rotating. It quickly becomes an intractable chain of credentials managing other credentials, and the security trade-offs pile up fast.

Turtles all the way down, courtesy of ChatGPT

Several years later, I read Solving the Bottom Turtle: A SPIFFE Way to Establish Trust in Your Infrastructure via Universal Identity, and I realized, “Aha, someone finally cracked this problem.” Its solution has real promise for achieving a zero-trust posture inside environments that host a collaborating set of services (typically for one or more products, but potentially for other scenarios as well, even CI/CD).

Zero-trust is admittedly an overloaded term. It means different things to different people and in different contexts (say, end users on a device versus a set of services talking to each other), and depending on whether the person you ask is more network-oriented or identity-oriented, they may view zero-trust controls through different lenses.

With that said, zero-trust at its core is a security model that assumes no implicit trust in network boundaries, and instead focuses on verifying every request – by users, systems, or services. With the rise of cloud computing and microservices, zero-trust principles (such as short-lived certificates and environment-based trust) have become increasingly critical for reducing the blast radius of breaches and eliminating the risk of long-lived secrets.

Before jumping into the magical solutions that SPIFFE and its implementation SPIRE provide on this front, let’s first talk about the actual problems and logistics involved with a secret string that a person is entrusted with handling and putting into service in an environment.

I need to apologize because the following descriptions are deliberately going to be painful and will describe the web browser and clicks and clipboards involved in getting a secret from some type of web page into a place where it can be used…

The (Painful) Lifecycle of a Secret

1. Acquiring a Secret

Let’s say you need a client_secret for your app registration in Microsoft Entra ID. You navigate to the Azure Portal, sign in (hopefully with MFA, device compliance, and all the best practices enforced through conditional access!), and then generate a random secret with a chosen validity period. Sure, it’s not technically difficult. But you’ve already introduced a significant risk: you’re personally handling the secret, and your clipboard or local files might become a point of leakage.

2. Keeping a Secret

Next, you have to store that secret somewhere. If you copy and paste it, you might momentarily stash it in a text file or keep it in your clipboard, hoping your machine is secure enough (EDR, compliance, device encryption, etc.). The bigger your team or the more frequent the rotations, the higher the odds that someone, somewhere, will accidentally commit it to a Git repo or leak it in logs. We rely on our best judgment, but we’re all human—and secrets all too easily slip out.

3. Putting a Secret into Service

Now you must inject that secret into an app or service. Do you store it in a password manager or a secrets vault? Do you put it in your CI/CD pipeline in an encrypted variable? Or do you manually configure it into each environment? Each approach is a trade-off – how many copies of the secret are you comfortable with existing at once, and how often do you rotate them to reduce risk? At large scale, all of these philosophically different approaches can become a real source of friction, especially when hundreds of services each need secret management.

4. Rotating an In-Service Secret Manually

If you need to rotate that secret after a certain period of time (or before it expires), you basically repeat the entire “acquire → store → deploy” process. Any misstep during manual rotation risks production outages or breakage in downstream services that depend on the secret. This is the primary reason people start asking: “How can we automate this?”

5. Rotating an In-Service Secret Automatically

Here’s where the “turtles all the way down” problem really hits. Automation usually requires another identity (and thus another credential) with the power to generate new secrets or push them into your environment. That means you end up managing a higher-level credential to manage your lower-level credentials. In many enterprises, you might have multiple credentials, each controlling a slice of the rotation story (for example, the secret itself, and perhaps another secret to authenticate to the environment you need to push it into) – with each secret potentially living in a different vault or secrets manager. It’s complexity layered on complexity.

Bottom line: The second you add automation for secrets rotation, you typically introduce a chain of new secrets that must themselves be protected and rotated. It’s easy to see how this grows out of control.

This is Madness

It’s turtles all the way down…

There really isn’t a great answer to this layering of secrets that own/manage other secrets, and while there are methods to do credential-less authentication on the cloud providers, the logistics involved in extending that across clouds or even to data centers can be quite intricate, and it forces you to “pick a horse” from one of the cloud providers – would you standardize on Azure Arc to extend managed identities to other places like GCP, AWS and the data center?

No matter what, there will always be some type of secret to handle in our environment, but the question is two-fold: how do you go about making access to those secrets seamless so that they can be easily rotated, but also, how can you go about eliminating certain types of secrets for scenarios like service-to-service authentication?

Introducing SPIFFE

If the endless layers of secrets-owning-other-secrets make your head spin, you’re not alone. These are extremely common challenges in modern, distributed systems at scale. Even with the best practices for encrypting, storing, and rotating secrets, mistakes happen. Credentials can leak, get misused, or simply become unwieldy in large-scale environments.

That’s where SPIFFE (Secure Production Identity Framework For Everyone) comes in.

Many organizations use secret managers (e.g., HashiCorp Vault, AWS Secrets Manager, Azure Key Vault) to store and rotate credentials. SPIFFE tackles a complementary problem: it removes certain secrets altogether by issuing ephemeral certificates. In some cases, you’ll still have secrets for legacy apps, but now you can centralize or reduce them over time as more services adopt SPIFFE-based identities.

SPIFFE is an open standard for secure service identity, hosted by the CNCF, and meant to free us from many of the pitfalls of distributing and rotating secrets manually. The SPIFFE standard defines:

How to identify workloads (anything running in your environment, across cloud providers or data centers, from a container to a VM, or – with some elbow grease – even potentially serverless functions).
How to issue and validate these identities in a cryptographically verifiable way without needing to inject long-lived secret strings.

The concept originated from folks at high-scale tech companies like Scytale (now owned by Hewlett Packard Enterprise), Netflix, Google, Pinterest, and Amazon, who learned firsthand that manually distributing secrets neither scales nor stays secure in a dynamic microservices world. They took inspiration from their various homegrown solutions to this problem and distilled them into a generalized, open standard that technologists at other companies could adopt.

Instead of trusting humans and portals to move secrets around, SPIFFE relies on:

Attestation: Checking that a workload is truly running on the environment it claims (e.g., verifying AWS instance metadata, Kubernetes service account tokens, or node properties).
Automated Credential Issuance: Generating a short-lived identity document (an SVID, which can be a certificate or a JWT) that the workload uses for mutual TLS (mTLS) or other secure communication.

Terminology

For the uninitiated, here’s a quick terminology guide:

PKI (Public Key Infrastructure): A system for creating, managing, distributing, and revoking digital certificates and keys.
mTLS (Mutual TLS): A variation of TLS (commonly known for HTTPS) that requires both client and server to verify each other’s identity.
Attestation: The process of verifying that a workload is what it claims to be (for example, by checking the cryptographic signature of a cloud provider’s instance metadata).
SVID (SPIFFE Verifiable Identity Document): A short-lived certificate (or token) containing a SPIFFE ID that proves the identity of a workload.

What the Heck is an SVID?

In its most common form, an SVID is just a short-lived X.509 certificate with a SPIFFE ID inside one of its fields (the URI Subject Alternative Name) – like spiffe://mycompany.com/prod/serviceA – that can be used to establish mutual TLS (mTLS) between workloads. Before its lifetime expires (often minutes or hours), the workload automatically fetches a fresh SVID from SPIRE.
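To make the shape of a SPIFFE ID concrete, here is a minimal Python sketch that splits one into its trust domain and workload path using only the standard library. The `SpiffeId` class and `parse_spiffe_id` function are names I made up for illustration, and the checks are deliberately simpler than the full validation rules in the SPIFFE specification.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SpiffeId:
    """Hypothetical helper type: the two parts of a SPIFFE ID."""
    trust_domain: str
    path: str


def parse_spiffe_id(uri: str) -> SpiffeId:
    """Split a SPIFFE ID into its trust domain and workload path."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID (scheme is {parts.scheme!r}): {uri}")
    if not parts.netloc:
        raise ValueError(f"SPIFFE ID has no trust domain: {uri}")
    if parts.query or parts.fragment:
        raise ValueError(f"SPIFFE ID must not carry a query or fragment: {uri}")
    return SpiffeId(trust_domain=parts.netloc, path=parts.path)


sid = parse_spiffe_id("spiffe://mycompany.com/prod/serviceA")
print(sid.trust_domain)  # mycompany.com
print(sid.path)          # /prod/serviceA
```

The trust domain is the part that tells a verifier which set of root certificates to check the SVID against; the path is what policies typically key off.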

While certificate-based SVIDs have more advantages, there are also mechanisms to issue JWT-based SVIDs for scenarios where certificates are not a good fit. One of the notable downsides to JWT SVIDs is that they cannot be used to establish the client side of mTLS, effectively making mTLS a non-starter. This is why JWT SVIDs are a bit of a bummer on the zero-trust front, since mTLS is a powerful mechanism to secure client-server communications (or the “channel” as my colleague likes to call it).

Whether it is a certificate or JWT SVID, you can think of an SVID as an automatically issued “temporary ID badge” that expires quickly – like wearing a visitor’s badge in an office that stops working after a few hours. This means if someone steals that badge (certificate), it’s only valid briefly, preventing long-term damage. And because SPIRE automatically renews SVIDs under the hood, engineers don’t have to schedule or script rotations.
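The “temporary ID badge” idea boils down to two time checks, sketched below in Python. This is only an illustration of the concept, not SPIRE’s code: the function names are hypothetical, and renewing once half of the lifetime has elapsed is an assumption I chose for the example rather than a statement about how any particular SPIRE version schedules rotation.

```python
from datetime import datetime, timedelta, timezone


def is_valid(expires_at: datetime, now: datetime) -> bool:
    """A badge (SVID) is only good until its expiry time."""
    return now < expires_at


def needs_renewal(issued_at: datetime, expires_at: datetime, now: datetime,
                  renew_fraction: float = 0.5) -> bool:
    """Renew well before expiry so the workload never holds a dead badge.

    renew_fraction=0.5 (renew at half-life) is an assumed policy for this sketch.
    """
    lifetime = expires_at - issued_at
    return now >= issued_at + lifetime * renew_fraction


issued = datetime(2025, 3, 25, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=1)  # a one-hour SVID

for minutes in (10, 40, 70):
    now = issued + timedelta(minutes=minutes)
    print(minutes, is_valid(expires, now), needs_renewal(issued, expires, now))
# 10 True False   (fresh badge, nothing to do)
# 40 True True    (still valid, but time to fetch a new one)
# 70 False True   (a stolen copy is already useless)
```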

So, What Does SPIRE Do?

SPIRE (the SPIFFE Runtime Environment) is an open-source implementation of the SPIFFE standards, also under the CNCF umbrella. It handles the bootstrapping and management of workload identities according to SPIFFE specifications. It is composed of:

SPIRE Server: The “brain” that signs and issues SVIDs.
SPIRE Agents: Lightweight daemons running alongside your workloads (on VMs, Kubernetes nodes, etc.) that attest the workloads and fetch SVIDs from the SPIRE server.

Under the hood, SPIRE can:

Attest that a workload is what it claims to be (via node or workload attestation plugins) based on platform-specific signals (e.g., AWS instance metadata, Kubernetes service account tokens, etc.).
Issue short-lived certificates (called SVIDs, or SPIFFE Verifiable Identity Documents) that the workload uses to authenticate itself to other workloads, without passing around a static key or password.
Rotate certificates automatically so that secrets are ephemeral, drastically reducing the risk of stolen credentials being used for too long.

Attestation essentially means, “Prove you’re really running where you say you’re running.” For example, on AWS, SPIRE can verify a workload by checking the digitally signed metadata document from AWS that confirms your EC2 virtual machine is indeed in a valid AWS environment. Similarly, it can key off other metadata in Kubernetes-based environments to attest identity, and as mentioned above, with some additional effort, it can even be done in serverless scenarios. This check is automated and short-lived, meaning we trust the environment itself to confirm identity rather than a human copying secrets around, and we re-attest often enough to ensure that the attestations are “fresh,” as are the issued SVIDs that follow.

The upshot of all this is that instead of injecting a user-managed secret, each workload receives a cryptographically verifiable identity tied directly to the environment it runs in. It’s effectively removing the manual secret from the equation and replacing it with an automatic, environment-based trust anchor.

Imagine you deploy a microservice in Kubernetes. When it starts, the SPIRE agent inside that node verifies the pod’s identity (using something like the Kubernetes Service Account token or pod labels). The agent says: “Yes, this is the real Service A pod in the production namespace,” and it issues a short-lived SVID certificate with the SPIFFE ID spiffe://mycompany.com/prod/serviceA. Now Service A can make an mTLS connection to Service B. Service B sees the spiffe://mycompany.com/prod/serviceA identity in the certificate and decides whether to trust it based on policies you’ve set. No manual secrets required.
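To make the Service B side of that handshake a little more concrete, here is a minimal Python sketch using only the standard library `ssl` module. It is not how SPIRE or any SPIFFE library does it, and the function names are mine. Loading Service B’s own SVID and the trust bundle (which a real workload would obtain from the SPIRE agent) is left out, so this shows only the two ideas that matter: the server insists on a client certificate, and it reads the caller’s SPIFFE ID from the certificate’s URI Subject Alternative Name.

```python
import ssl


def build_mtls_server_context() -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # this is what makes the TLS "mutual"
    # A real service would also call ctx.load_cert_chain(...) with its own SVID
    # and ctx.load_verify_locations(...) with the trust bundle; omitted here.
    return ctx


def spiffe_id_from_peer_cert(peer_cert: dict) -> str:
    """Extract the SPIFFE ID from a dict shaped like SSLSocket.getpeercert()."""
    uris = [value for kind, value in peer_cert.get("subjectAltName", ())
            if kind == "URI"]
    # An X.509 SVID carries exactly one URI SAN, and it is the SPIFFE ID.
    if len(uris) != 1 or not uris[0].startswith("spiffe://"):
        raise ValueError("peer certificate does not look like an X.509 SVID")
    return uris[0]


# Hand-built example of what getpeercert() might return for Service A.
example_cert = {"subjectAltName": (("URI", "spiffe://mycompany.com/prod/serviceA"),)}
print(spiffe_id_from_peer_cert(example_cert))  # spiffe://mycompany.com/prod/serviceA
```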

Let’s walk through that step-by-step with a concrete example:

Deploy: You push your container image to your Kubernetes cluster. (This could happen many different ways, but for sake of illustration, let’s say it’s a typical Kubernetes Deployment.)
Attestation: The SPIRE agent on that node checks a pod’s Kubernetes labels, service account token, or node identity to confirm your service really is “Service A.”*
SVID Issuance: The SPIRE agent requests a short-lived certificate from the SPIRE Server on your service’s behalf. This SVID in effect is your “workload identity.”
mTLS: “Service A” initiates a mutually authenticated TLS connection to “Service B,” presenting the SVID as proof of identity. “Service B” verifies the certificate is valid and from a trusted SPIFFE domain.
Automatic Renewal: If the certificate expires in an hour, SPIRE renews it seamlessly, so there’s no downtime or manual rotation.

*Note that attestation in Kubernetes is highly flexible and can work off of multiple types of metadata and signals around the pod to prove identity; in other words, there are options available to implementers to decide precisely how they wish to identify the workload in Kubernetes.
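One way to picture the attestation step is as a matching problem: the agent observes a set of facts (“selectors”) about the pod, and a workload is entitled to a SPIFFE ID when every selector on a registration entry is present. The Python sketch below models that idea with plain sets. The selector strings follow the style of SPIRE’s Kubernetes workload attestor (`k8s:ns:`, `k8s:sa:`), but the entries, values, and function name are hypothetical, and real SPIRE stores and evaluates registration entries on the server and agent rather than in a list like this.

```python
# Hypothetical registration entries: a SPIFFE ID plus the selectors it requires.
REGISTRATION_ENTRIES = [
    {
        "spiffe_id": "spiffe://mycompany.com/prod/serviceA",
        "selectors": {"k8s:ns:prod", "k8s:sa:service-a"},
    },
    {
        "spiffe_id": "spiffe://mycompany.com/prod/serviceB",
        "selectors": {"k8s:ns:prod", "k8s:sa:service-b"},
    },
]


def match_spiffe_ids(observed_selectors: set) -> list:
    """Return the SPIFFE IDs whose required selectors were all observed."""
    return [
        entry["spiffe_id"]
        for entry in REGISTRATION_ENTRIES
        if entry["selectors"] <= observed_selectors  # subset test
    ]


# What the agent might observe about a pod (illustrative values).
prod_pod = {"k8s:ns:prod", "k8s:sa:service-a", "k8s:pod-label:app:service-a"}
dev_pod = {"k8s:ns:dev", "k8s:sa:service-a"}

print(match_spiffe_ids(prod_pod))  # ['spiffe://mycompany.com/prod/serviceA']
print(match_spiffe_ids(dev_pod))   # []
```

Extra observed selectors do no harm; a missing one means no identity is issued, which is why a pod in the dev namespace cannot obtain the prod identity just by reusing a service account name.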

Understanding this architecture without a visual can admittedly be a challenge, and while I could insert a visual here, I think it is better to point to the official SPIFFE website where they have great visuals to illustrate this.

The steps above describe workload attestation, but at this point you may be asking: “How does the SPIRE server trust the SPIRE agent in the first place?” That trust is established before workload attestation, through a process called “node attestation,” and we’ll cover how it works in two common scenarios: on EC2 and on Kubernetes.

SPIRE Agent Node Attestation on EC2

When you run a SPIRE agent on an EC2 instance, it uses AWS-specific signals to verify that the instance is real and belongs to your AWS account. Typically, SPIRE reads a digitally signed instance identity document from the EC2 metadata service – along with temporary AWS credentials or a signature unique to that instance – to prove to the SPIRE Server that “Yes, this agent really is running on an authentic EC2 instance in our AWS account.” Once verified, the agent is allowed to issue valid SVIDs to workloads on that host.

SPIRE Agent Node Attestation on Kubernetes

In a Kubernetes environment, the SPIRE agent checks one or more Kubernetes tokens or node properties to ensure it’s running on a trusted node within a legitimate cluster. For example, the agent might validate the node’s API server certificate or use the Kubernetes service account token assigned to the SPIRE agent’s pod. This cryptographic token tells the SPIRE Server, “I’m a recognized Kubernetes node within this cluster,” and once that’s confirmed, the agent can issue SVIDs to the containers for the pods on that node.

Anchor Trust (and Authentication) in the Environment

With SPIFFE/SPIRE, the trust anchors become part of your runtime environment – like your Kubernetes cluster or cloud environment’s unique attestation data (for instance in the case of EC2 on AWS, leveraging the cryptographically signed EC2 metadata document from AWS) – and these anchors are used to generate ephemeral certificates for each workload. This model shifts responsibility from:

Humans carefully copying strings back and forth across portals,
Long-lived service principals whose secrets have to be manually rotated (and worse, that expire and potentially cause outages),
Complex toolchains or third-party password vaults for everything,

… to an automatic identity issuance that authenticates workloads based on where and how they run. Your environment, plus SPIRE, effectively says: “Yes, this is the real Service A on the real cluster us-east-1-prod,” and it hands out a short-lived credential that Service A can use to prove its identity to Service B.

Trust Domains and Federation

With SPIFFE/SPIRE, each identity is rooted in a trust domain – often represented as a DNS-like name. For instance, spiffe://prod.mycompany.com might represent your production workloads. A second domain, spiffe://dev.mycompany.com, might represent development environments.

By default, workloads in one trust domain do not trust workloads from another. This is an important security control, because it prevents, for example, a dev environment from presenting valid credentials to prod. If you need cross-domain service calls (say, your SaaS platform at spiffe://saas.com trusting spiffe://enterpriseA.local), you can set up SPIFFE Federation to explicitly share trust anchors between those two domains. That way, you can keep distinct environments separate, while still allowing well-defined interoperability where it’s needed.
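The default-deny behavior between trust domains can be sketched as a lookup table of trust bundles keyed by trust domain. The `TrustStore` class below is a hypothetical illustration in Python, with placeholder strings standing in for real CA certificates; it is not SPIRE’s federation API, only the shape of the decision a verifier makes.

```python
from urllib.parse import urlsplit


class TrustStore:
    """Hypothetical store of trust bundles, keyed by trust domain."""

    def __init__(self, own_domain: str, own_bundle: str) -> None:
        # Out of the box, only our own trust domain is trusted.
        self._bundles = {own_domain: own_bundle}

    def federate_with(self, trust_domain: str, bundle: str) -> None:
        """Explicitly accept another domain's trust anchors."""
        self._bundles[trust_domain] = bundle

    def bundle_for(self, spiffe_id: str) -> str:
        """Pick the bundle used to verify an SVID, or refuse outright."""
        domain = urlsplit(spiffe_id).netloc
        if domain not in self._bundles:
            raise PermissionError(f"trust domain {domain!r} is not trusted")
        return self._bundles[domain]


store = TrustStore("saas.com", "<saas.com CA certificates>")

try:
    store.bundle_for("spiffe://enterpriseA.local/billing")
except PermissionError as err:
    print(err)  # rejected until federation is configured

store.federate_with("enterpriseA.local", "<enterpriseA.local CA certificates>")
print(store.bundle_for("spiffe://enterpriseA.local/billing"))
```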

Authorization Still Comes from You

Once you have cryptographically provable identities for your workloads, you can layer on your own policies:

Which workloads can talk to each other?
Which cross-environment calls are allowed?
What exactly does “production” mean in terms of trust?

Humans still need to define the rules and specify exactly which services are authorized to each other, and that may involve your own logic in your services to determine authorization (which is slightly outside the scope of SPIFFE/SPIRE, and it will vary based on your scenario, but there are well-known projects like Open Policy Agent to take a look at for sophisticated approaches to authorization); however, the underlying credentials and authentication are handled automatically, so you’re not messing around with manual secrets or multi-step rotations.
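As a rough illustration of what “your own logic” might look like at its simplest, here is a Python sketch of a default-deny allow-list keyed by SPIFFE ID. The table contents and function name are hypothetical, and a real deployment would more likely express this in a policy engine such as Open Policy Agent than in a hard-coded dictionary.

```python
# Hypothetical policy: for each callee, the set of callers allowed to reach it.
ALLOWED_CALLERS = {
    "spiffe://mycompany.com/prod/serviceB": {
        "spiffe://mycompany.com/prod/serviceA",
    },
}


def is_authorized(caller_id: str, callee_id: str) -> bool:
    """Default deny: a call is allowed only if the policy lists the caller."""
    return caller_id in ALLOWED_CALLERS.get(callee_id, set())


print(is_authorized("spiffe://mycompany.com/prod/serviceA",
                    "spiffe://mycompany.com/prod/serviceB"))  # True
print(is_authorized("spiffe://mycompany.com/dev/serviceA",
                    "spiffe://mycompany.com/prod/serviceB"))  # False
```

The caller ID here is whatever was read from the verified SVID, so the policy never has to reason about passwords or API keys, only about names.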

SPIFFE is Spiffy

Puns aside, the real benefit of SPIFFE is that you now have a single, standardized way to express service identity across practically any environment – on-premises, multi-cloud, container-based, VM-based, or otherwise. It’s a great unifier that can drastically simplify (and de-risk) a lot of the “madness” of distributing secrets.

SPIRE is Fire 🔥

As the reference implementation, SPIRE does the heavy lifting of making SPIFFE a reality:

It boots up agents (node agents) that run on each machine/VM/node.
Those agents validate workloads (workload attestation).
They issue SVIDs with short lifetimes so that every service request or mTLS handshake uses a very fresh, automatically rotated credential.

In short, SPIRE weaves a dynamic web of trust across your environment.

But Pragmatism Must Also Prevail

While SPIFFE and SPIRE can solve for service-to-service authentication, we can’t escape the fact that we still need to handle some types of secrets, potentially for certain types of database logins, or external APIs (perhaps for tracing) that haven’t yet adopted modern identity frameworks, or older systems that simply can’t talk SPIFFE.

However, with SPIFFE in the mix, there is the potential to push those authentication concerns toward the edge of the operating environment, for example by wrapping certain legacy APIs and services with SPIFFE-aware proxies to reduce the footprint of secrets scattered throughout the environment.

SPIFFE/SPIRE doesn’t magically eradicate every type of shared secret from your environment. However, it makes a big dent in the ongoing operational burden and risk by taking a huge chunk of service credentials off your plate.

Push the Envelope

Despite these caveats, there is greater potential for true identity-based security, rather than purely secret-based security. Think of all the trouble we endure just ensuring a secret is kept safe and rotated frequently. By moving to ephemeral, environment-attested, cryptographically sound identities:

We drastically reduce human error and accidental exposures.
We limit the blast radius when something does get compromised.
We make secrets rotation automated behind the scenes and effectively invisible to engineers.

For many organizations, adopting SPIFFE/SPIRE is a step function leap forward towards zero-trust architecture in the operating environment of large scale sets of services. It may not be a universal fix for all credential types, but it can lessen the burden of how to manage the existing credentials that you do have, and is a powerful standard and toolset for modern infrastructure teams looking to level up their security posture for service-to-service authentication.

How to Build a Team from Scratch

Sat, 22 Feb 2025 00:00:00 +0000

I have observed several avoidable missteps when people try to build a brand-new engineering team from scratch, and I believe there are a few rules of thumb that can help. I work in the software engineering field, so I approach this mostly from that perspective; however, this advice likely applies to other professional domains as well.

Opportunities to start a brand-new team from scratch can be rare. More often, a manager inherits an existing team through various means like joining a new company where the team is already established, or perhaps even experiences a reorg where folks from multiple teams are split off and merged together, or is handed a set of team members through some other set of circumstances that are outside of their control. (As an aside: I believe as a management practice that breaking up well-functioning, “gelled” teams should be a last resort – unless the team’s success and growth require splitting it to accommodate another manager for ideal team sizes. More on that below.)

If you do find yourself with the golden opportunity to build a team from the ground up, this advice is for you.

This is not a post about the phases of “forming, storming, norming, and performing,” nor about team camaraderie, nor other management practices involved in forming new teams – rather, it focuses on how a newly minted manager should select and hire their first few team members for long-term success.

Caveats on Constraints

I realize as I write this that this is an idealized take on this problem, and that every manager out there is dealing with a set of constraints they have to work within, with the biggest one likely being budget. Understand that I am sympathetic to those constraints, as I have witnessed them first-hand in hiring processes throughout the years. This post serves as a sort of set of recommendations for engineering managers, who should consider adapting this advice for their own environment.

Make Your First Hire a Senior Engineer

The most important point is that your first hire should be a senior-level person. This is a strong statement, and frankly an opinion of mine, but it is the number one misstep I see managers and organizations make when starting a new team. You need someone who you can build the team around, and someone who will mentor early-career hires that come next.

Too often, managers begin by hiring entry-level engineers. This can work if the manager has a strong engineering background and can effectively act as the team’s Senior Engineer or Tech Lead, but even in that scenario, it can be more of a burden than expected. Your first hire should help set the team’s standards, establish essential processes (such as CI/CD), and have both knowledge and opinions about infrastructure as code, the cloud, and more. It is unreasonable to expect this from a single entry-level engineer on a brand-new team.

It is also important to note that your first hire will help set the tone and will be responsible for interviewing and hiring subsequent team members. That is why the first hire on your team is so critical.

Screen for “Learn-It-Alls” Instead of “Know-It-Alls”

As Satya Nadella points out, it is best to seek out hires who are “learn-it-alls,” instead of “know-it-alls.” Avoid limiting your searches to people who have hyper-specific skills with a single framework or tool. If someone is adept at learning, they can pick up the tools you actually use.

The reality is that none of us uses the exact same tools and frameworks for years on end (and if you do, then I have a Classic ASP Intranet to sell you) – our industry changes too rapidly. With the advent of AI coding assistance, the landscape will shift even more quickly. Look for people who are continuous learners, and ideally those who are eager to become subject-matter experts in a problem domain.

Manager Engineering Expertise is Ideal, but Not Required

Ideally, the manager building the team is a skilled engineer themselves (with bonus points if they once reached Senior Engineer before moving into management). Such a manager can guide the team technically and mentor them in engineering practices. That said, while being a former engineer is helpful, it is not a strict requirement. If you as a manager are not a former engineer, then it is doubly important to make your first or second hire a Senior Engineer.

I have seen highly effective teams led by non-technical managers who rely on strong technical leads and extend a high level of trust to them. The opposite scenario can also work: a team of mostly early-career engineers managed by a former engineer who acts as the Tech Lead. That often works best for smaller teams, where a single manager/tech lead can handle the guidance and mentorship needed.

Aim for at Least Three Members to Form a “Team”

A group of one or two people cannot realistically function as a “team.” Once you have three people working closely together, you start to see the dynamics of a real team.

As the team grows, pay attention to the ratio of Senior vs. early-career roles. For example, in a typical team of eight with levels like Software Engineer I, II, III, and Senior, you might aim for two people at each level (maybe with one Senior at Staff level). If you grow to ten, add one more Senior and one more at one of the other levels.

Be careful when you bring on multiple entry-level hires at once. They will need substantial onboarding and mentorship, and that takes time and focus from your senior team members. If you are not in a rush to scale quickly, pace your hiring to avoid overwhelming your more experienced engineers.

The role level ratio guidance above is based on a team that is working in a typical software engineering capacity – however, there are more “advanced problem domains” that may warrant skewing the team towards more senior roles versus entry level roles. There are many such advanced problem domains out there, and I would expect you to know if you are operating in one of them.

Consider the “Two Pizza Team” Rule

I am a big believer in the “two pizza team” rule: you should be able to feed your entire team with two pizzas, which generally translates to about eight people (plus or minus two). (If you have a team of four that can wolf down two pizzas, then I am jealous of their metabolisms and ability to not get heartburn.) Beyond this threshold, coordination costs rise significantly. If the team’s size exceeds this number, consider a team split with the addition of another manager, analogous to the way that cells grow and divide.

If you reach that point of having enough engineers to consider a second engineering manager, congratulations: upper leadership sees that your team’s work is valuable, and wants to see more of it, and you may get the chance to become a manager of managers. Also, around eight people is a good threshold for practical matters like on-call rotations. If the team has fewer members, the on-call burden can be disproportionately large unless you have first-level support groups to help; if you have gotten enough team members to require a team split, then that makes the on-call burden potentially even more spread out amongst more people who understand your systems, which is a good thing.

Notes on Augmenting Your Team with Contractors

Many teams with an average of eight full-time members can handle one or two contractors without a huge coordination cost. However, if you add too many, you will need to manage higher coordination costs and onboarding overhead.

Sometimes, temporarily adding many more contract staff makes sense for time-bound, focused efforts (e.g., migrating systems from on-premises to the cloud, or adopting a new observability platform for a backlog of existing services), or as an augmentation to tackle bugs and tech debt. Just be aware that if half of your team is made up of temporary staff, and they are responsible for mainline projects, you risk losing critical knowledge or overwhelming the team when those contractors transition to their next opportunity. Conversely, if you’re lucky and have the right set of circumstances, your contractors could serve as an additional pipeline of potential talent that could be transitioned to full-time roles.

I don’t necessarily recommend starting a team exclusively with contractors – but at the same time, if you are in this situation (and, again, with the right set of circumstances), your initial pool of contractors could be a source of candidates for that elusive first full-time Senior Engineer hire.

Wrap-Up

These rules of thumb are just that: guidelines I have postulated over the course of my career in different industries and team sizes. I have personally been a member of teams as small as a size of one, and larger than fifteen. There is no “one-size-fits-all” approach to building a team, but I believe that if you:

aim to make your first hire a senior engineer,
look for continuous learners rather than “know-it-alls” or people who are exclusively skilled in specific frameworks and tools, and
pay attention to team size and role level ratios,

you will set your new team up for long-term success.

Good luck building and growing your new team!

How to Get Your client_id and client_secret from Entra ID

Thu, 09 Jan 2025 00:00:00 +0000

Since I am not a scalable replacement for ChatGPT or Google, and because I felt this would be helpful beyond my own company, I felt compelled to write this post due to the sheer volume of questions I get on an almost daily basis about how to get a client_id and client_secret for an app registration from Entra ID (formerly known as Azure AD).

Microsoft has changed the Azure Portal UI over the years to include things like a “Secret ID” next to secret values, and there have always been other GUID identifiers in the mix in the form of tenant IDs and object IDs, which can be confusing for people when they go to the Azure Portal to retrieve the correct values for authentication for an app registration they are using for a specific use case.

While there are guides out there on Microsoft Learn that describe how to do this, people often do not find those, so I find that it is helpful to describe how to do this in my own words.

Caveats

These steps assume that there is an app registration waiting for you in the Azure Portal that you own.

(Many, if not most, enterprises likely restrict who can create app registrations, for good security reasons, so one would likely be created for you upon request to your admins.)

Steps

Here are the steps, illustrated with my test tenant:

Navigate to the Azure Portal at portal.azure.com and sign in with your company identity.
Search for “App registrations” in the search box at the top and click on the icon with the grid and the three-dimensional looking cube:
Go to the “Owned applications” tab:
Filter / select your desired app registration.
On the “Overview” tab, your client_id is the “Application (client) ID” – it is NOT the “Object ID” nor the “Directory (tenant) ID.” (I’m belaboring this point because this is where a lot of people get tripped up.)
On the “Certificates & secrets” tab, you can generate a client_secret by clicking the “New client secret” button:
Once you’ve generated the new client_secret, the client_secret value is in the “Value” column – the client_secret is NOT the “Secret ID.” Also the “Secret ID” is NOT your client_id. The “Secret ID” is not used at all in authentication flows. (I’m belaboring this point because this is where a lot of people get tripped up.) Also, importantly, this client_secret is only shown once, so if you navigate away or close this page, you cannot get it back – be sure to capture it in a safe, company-approved place like a password manager for safe-keeping and later reference. (If you do close or navigate away from this page, when you come back, only the first three letters of the client_secret will be shown – this is to allow you to correlate which client_secret value goes with which listed client_secret in the portal.) Don’t worry, I deleted this client_secret shortly after I generated it, but in general, you should not share these client_secrets with anyone besides the developers that need to use it (and I encourage developers to use their own client_secret for local development purposes versus sharing a client_secret):
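Because this mix-up is so common, a quick sanity check can catch it before you ever hit an authentication error. The following is only a minimal sketch: it assumes that the client_id and the “Secret ID” are both GUIDs while the secret “Value” is not, and the function names `is_guid` and `check_credentials` are hypothetical helpers, not part of any Microsoft tooling.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only (not part of any Microsoft tooling).

# Succeeds when $1 has the 8-4-4-4-12 hexadecimal GUID shape.
is_guid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Prints a warning for each value that looks like it came from the wrong field.
# Returns non-zero if anything looks off.
check_credentials() {
  client_id=$1
  client_secret=$2
  status=0
  if ! is_guid "$client_id"; then
    echo "client_id is not a GUID: copy the Application (client) ID from Overview"
    status=1
  fi
  # Assumption: a secret Value is never GUID-shaped, but a Secret ID always is.
  if is_guid "$client_secret"; then
    echo "client_secret looks like a GUID: you may have copied the Secret ID, not the Value"
    status=1
  fi
  return "$status"
}
```

You could run it as `check_credentials "$CLIENT_ID" "$CLIENT_SECRET"` before wiring the values into an application. Note that it cannot tell the client ID apart from the object ID or tenant ID, since all three are GUIDs, so it only catches the Secret ID mix-up.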

I may revisit this blog post to amend it to make it as clear as I can in the case that the instructions are not enough to get someone through this process, or in the event that Microsoft changes the UI for this.

Once you have a client_id and client_secret, you can use those to get short-lived JWT access_tokens, for use in calling API’s that utilize Entra ID authorization – calling the Entra ID token endpoint to issue those tokens will be the subject of a future blog post, because many people struggle with this, too!
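As a rough preview of what that token call looks like, here is a minimal sketch of an OAuth client credentials request. It assumes the Microsoft identity platform v2.0 token endpoint and uses a Microsoft Graph scope purely as an example; the variable names `TENANT_ID`, `CLIENT_ID` and `CLIENT_SECRET` and the function names are placeholders chosen for illustration, and the scope for your own API will differ.

```shell
#!/bin/sh
# Builds the v2.0 token endpoint URL for a tenant ID (or tenant domain name).
token_url() {
  printf 'https://login.microsoftonline.com/%s/oauth2/v2.0/token\n' "$1"
}

# Requests an access token using the client credentials grant.
# Assumes TENANT_ID, CLIENT_ID and CLIENT_SECRET are set (placeholder names).
# The scope below is an example for Microsoft Graph; use your API's scope.
request_token() {
  curl --silent --show-error --request POST "$(token_url "$TENANT_ID")" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=$CLIENT_ID" \
    --data-urlencode "client_secret=$CLIENT_SECRET" \
    --data-urlencode "scope=https://graph.microsoft.com/.default"
}
```

Calling `request_token` returns a JSON document; on success, the `access_token` field in it holds the short-lived JWT.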

Running SpinRite on Apple Silicon

Thu, 19 Dec 2024 00:00:00 +0000

I was trying to run SpinRite 6.1 on some drives from an old Drobo 5N NAS that bit the dust a few years back, but I didn’t have an x86 machine handy, so after a bit of research I found a very elegant way to do this with QEMU on macOS.

Allow me to provide a little more background for those who may not know what I’m talking about…

In my opinion, Steve Gibson’s SpinRite is the best hard drive maintenance and recovery utility out there, and the way you use it is by booting it from removable media (typically, a USB thumb drive, although in the distant past I remember running it off these things we call “CD’s”). The SpinRite program itself runs using FreeDOS.

As time has progressed, though, machines have moved to UEFI booting instead of BIOS, and on top of that many machines have moved away from x86 architectures to ARM architectures, like Apple Silicon on Macs, and now increasingly, ARM on Windows. Given that SpinRite needs legacy BIOS boot methods and an x86 chipset to run, it’s becoming more challenging to get it going these days.

We’re looking at getting a new NAS in place at our house in the near future, and I had old drives from a prior NAS (a Drobo 5N that ceased to function several years back) which I wanted to dust off and run through SpinRite in anticipation of using them in a new NAS.

I recently did a purge of old devices and I no longer had an Intel/AMD x86/x64 machine handy anymore, and short of getting a small Intel compute stick or something, I wanted to see if I could somehow emulate what was necessary to have SpinRite 6.1 run on my M2 MacBook Pro.

After a little while of research and some back and forth with ChatGPT, I found the solution in the form of QEMU, which I installed via Homebrew:

brew install qemu

(I also looked at UTM which is another great app that wraps QEMU with a nice UI, but due to some quirks on macOS – primarily the fact that you can’t run it elevated with sudo to get the proper drive access since the app doesn’t work properly that way – I had to pivot.)

Now, in order to get a SpinRite 6.1 ISO file in the proper format for use with QEMU, I did have to temporarily boot up Windows 11 in Parallels and run spinrite.exe to create that image (BUT, see next, you don’t need to do this, there’s a simpler way to do this). This ISO conversion was just a one-time exercise, as I have the ISO stored in my OneDrive for later use.

However, for purposes of this blog post, I didn’t want folks to have to go through setting up a full Windows VM one time just to do the .img to .iso conversion, and though I tried several mac-native commands to try to get the conversion just right, none of them worked specifically for SpinRite.

So, I resorted to installing Wine via Homebrew and using that to run spinrite.exe to prove out creating the ISO in a less heavy way that doesn’t involve running a full Windows 11 VM. (Note that Wine is one of those apps that macOS is going to initially block for “safety” reasons, but since you hopefully know what you’re doing, you can go to Privacy & Security in Settings and click “Open anyway.”)

brew install wine-stable
wine spinrite.exe # Opens the SpinRite GUI app and lets you make an ISO; close once done

Before you go fire up your QEMU emulator, first you should insert your storage media you’re going to run SpinRite on, then eject it through Finder, but leave the physical drive inserted – yes, it needs to be not mounted to the host macOS operating system so that QEMU can attach to it. (Alternatively you can do diskutil list to see what your disk number is – in my case it was /dev/disk4 – then do diskutil unmount /dev/disk4 to unmount it.) If you forget to do this, QEMU will say qemu-system-x86_64: Could not open '/dev/disk4': Resource busy. Also note that after quitting QEMU, macOS will immediately re-mount the drive…

Finally, run QEMU (with sudo, which is necessary so it can get at the drive – as always do this with caution and only if you’re comfortable), providing the correct path to SpinRite.iso. Note that the ISO path can be case-sensitive – macOS volumes are case-insensitive by default but can be formatted as case-sensitive, and I say this because spinrite.exe will output SpinRite.iso with uppercase S and R, and that might trip someone up.

sudo qemu-system-x86_64 -drive file=/dev/disk4,format=raw -cdrom /Users/username/Desktop/SpinRite.iso -boot d

Voilà:

Hopefully, this saves someone out there some time from having to go through the same steps I did to figure this out.
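If you expect to do this more than once, the unmount and boot steps above can be collected into a small helper. This is only a sketch under a few assumptions: the function names are hypothetical, the disk and ISO paths are whatever you pass in, and it uses `diskutil unmountDisk` (which unmounts every volume on a whole disk) as a variation on the per-volume `diskutil unmount` mentioned earlier.

```shell
#!/bin/sh
# Prints the QEMU command line for a given raw disk and SpinRite ISO,
# so you can review it before running anything with sudo.
qemu_command() {
  disk=$1
  iso=$2
  printf 'qemu-system-x86_64 -drive file=%s,format=raw -cdrom %s -boot d\n' "$disk" "$iso"
}

# Unmounts the disk and boots SpinRite. Nothing runs until you call this.
run_spinrite() {
  disk=$1
  iso=$2
  # Fail early if the ISO path (including its letter case) is wrong.
  [ -f "$iso" ] || { echo "ISO not found: $iso" >&2; return 1; }
  # Variation on the text: unmountDisk unmounts all volumes on the disk.
  diskutil unmountDisk "$disk" || return 1
  sudo qemu-system-x86_64 -drive "file=$disk,format=raw" -cdrom "$iso" -boot d
}
```

For example, `qemu_command /dev/disk4 ~/Desktop/SpinRite.iso` prints the command for review, and `run_spinrite /dev/disk4 ~/Desktop/SpinRite.iso` actually runs it. Double-check the disk number with `diskutil list` first, since pointing this at the wrong disk is not something you want to do.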

As a caveat, if it wasn’t obvious, this solution does not work against the internal drive of your Mac itself, but it at least is a good middle ground for servicing other drives that you may have.

As another caveat, SpinRite 6.1 may give a new message (as of the 6.1 version) about an inability to create an SRLOGS directory and write to it – I suspect that is due to the way it is being launched from a bootable read-only ISO image which is presented to the guest OS as a CD-ROM, and unlike a bootable USB drive, it cannot be written to. I’m used to running SpinRite 6.0 from the past sans logs, so I did not fret too much about not having those for this exercise. Steve Gibson is planning formal UEFI boot support for SpinRite 7 which should hopefully allow even booting from a mac, which would solve for this in the future.

Is this the most performant way to run SpinRite? Maybe not. I’m not certain of the performance hit that might occur going through the x86 translation layer of QEMU. But regardless, if raw performance is not a concern (meaning, you’re willing to wait a little longer for SpinRite to process a drive), it certainly is a great alternative to having to seek out x86 hardware in an increasingly ARM-based world.

Happy spinning, and Happy Holidays! 💿

San Diego

Fri, 08 Nov 2024 00:00:00 +0000

All has been quiet on the blogging front, since life has been busy: on top of my wife and I moving houses in August, as part of the COO Leadership Development Program (LDP) at Autodesk I had the opportunity to travel to Tokyo in May, meet with my business challenge team in London in July, and experience the culmination of our program with a presentation in beautiful San Diego before AU, the Design and Make conference.

At the beginning of our LDP program, one of our senior leaders Haresh Khoobchandani told us a parable of a zen master who instructed his student to “empty his cup” (metaphorically speaking) so that he could be open to learning new ideas — I found this to be a very apt way to put it, and was something I found to be very true of my personal leadership journey during this program.

I’d go as far as to say that I had to “unlearn” some things I had learned in the past, in order to learn anew.

Since the early days of my career, I’ve been largely a self-learner and have picked up many a book on software engineering and technology, and in recent years, many a book on technical leadership. I had little mentorship early on in my career, and like most people thrust into a “staff engineer” type of role, I had very little guidance to lean on — in recent years, I found solace in the writings of technical leadership books by the likes of Camille Fournier, Will Larson and others.

To suddenly find myself in a scenario where I wasn’t trying to teach myself leadership on my own, but had mentors in the form of our amazing leadership coach Geoff Scales, and our amazing internal leadership and coaches who supported the program, was not something I was used to, and as I went through the program, I realized it was an unprecedented opportunity.

To that end, one of the pieces of coaching that really resonated with me was: you’ll get out of the LDP program what you put into it. I was fortunate to have an organizational leadership chain that truly allowed me the space and time to put in a lot of energy into the LDP program, and the benefits will carry with me for a long time to come.

With that, I have a few personal takeaways I want to touch on (even though it’s hard to whittle it down to just a few), both for my own reflection, and in the hopes that someone else out there reading this has one or more of these resonate with them.

I’m a perfectionist by nature, and it was abundantly clear in my 360 assessment I took for the LDP program that perfection was the strongest “Reactive” behavior I had.

Close on the heels of being a perfectionist is a struggle with balance (a “Creative” behavior, which you want to maximize) – when you do have perfectionist tendencies, it is also a natural tendency to pour yourself into projects with undue amounts of time and energy that would otherwise be better spent.

Throughout the LDP program I learned that by leaning on the strengths of others, I could bring my own strengths to bear on the challenges that we were solving for, and that allowed me to let go of some of the perfectionist tendencies that I would normally have, which in turn led to better balance; this was especially helpful given the nature of our business challenge for the program, which had no “perfect” answers to seek.

To wrap up my thoughts around LDP, I would quote the title of Michael Lopp’s book: “The Art of Leadership: Small Things, Done Well.” I am used to software engineering where there is a discrete set of activities that one can do to successfully build a given piece of software and deliver it to the world. Leadership is not always as clear-cut as that, but though it is not as “binary” and straight-forward as the 1’s and 0’s of the software world, there are a lot of small things we can do well in our leadership journey to have a big impact, and help build a positive culture in our organizations that we serve, in order to better realize both our own potential and the potential of the individuals around us.

Tokyo

Wed, 05 Jun 2024 00:00:00 +0000

I would be remiss if I wrote my next blog post without reflecting on a recent amazing and profound experience I had in Tokyo, as part of Autodesk’s COO Leadership Development Program.

While wandering around the neon lights of Kabukichō one evening (in the pouring rain mind you, making it feel like a scene out of Blade Runner), I stumbled upon a small establishment in the Golden Gai area named BAR COO, and seeing as how it shared the name of my parent division, I found it prudent of me to at least check it out.

I walked in to a scene of a small bar with five seats in it and one bartender, a kind middle-aged Japanese woman, and three other patrons, one of them local and the others a couple visiting from India.

What was interesting about this place is that, due to its small, intimate setting, it was very conducive to conversation – the woman tending bar was very interested in hearing everyone’s stories about where they were from and their reasons for visiting Japan, and you could tell this was one of the main reasons she does what she does, as the stories from visitors were what she thoroughly enjoyed.

That experience at BAR COO primed me for another set of intimate conversations I was going to have over the following three days, as part of the COO Leadership Development Program.

For three days, we didn’t talk about technology.

For three days, we didn’t talk about business. (Well, maybe a little.)

For three days, we talked about leadership.

I know some hardened technologists at this point likely instinctively recoiled at that last sentence, instantly putting up mental guardrails around the type of value they would derive from such an experience. I would be lying if I said I didn’t have my own doubts, and while I was open to this experience, I also had no idea what to expect going into it.

And that is why I was all the more delighted with the unique opportunity for learning and growth that was to follow.

During the program we learned about what it means to be a leader in the year 2024. Organizations and the people that work for them are yearning for a new type of leadership, for leaders that are high on the emotional intelligence scale, who are curious, courageous, have conviction, and can connect with people, amongst many other aspects. We had opportunities throughout those three days to self-reflect and see what areas we have for our own personal growth and learning – and grow and learn we did.

On top of this, I was introduced to my wonderful LDP team, whom I’m going to be working alongside on a business case study challenge over the next few months, and with whom I’ll be able to share and enjoy a journey of learning and growth together.

Being a leader is not that much unlike being a bartender at BAR COO – as a leader, you need to understand where people have come from, and actively listen to them, in order to help them figure out where they want to go on their own journeys. To that end, you also need to understand yourself and where you have come from, in order to know where you are going, and what type of journey you want to have in your career, and in life.

A No-Nonsense Guide to Setting Up Python Environments

Mon, 29 Apr 2024 00:00:00 +0000

Python has become the lingua franca for developing various AI/ML solutions (and more), but getting started with setting up a proper Python environment can be tricky and involve several hours of research or asking colleagues how they go about it. Many “getting started with Python” guides tend to gloss over these aspects, which is why I hope this post can serve as a no-nonsense guide to setting up Python environments.

Overview / TL;DR

The prescriptive setup outlined below involves installing the pyenv and pyenv-virtualenv equivalents for the respective operating system platform to manage global Python version installations and subsequent local Python virtual environments, respectively.

Why do you want to do this? The answer is: isolation – you ideally want a standalone Python environment, with its own installed packages, and avoid using a global Python environment and set of globally installed packages.

Credits and Caveats

My own internet research aside, I would like to credit my teammates who are experts in this area who have given me a lot of pointers and helpful opinions on what works well. My first exposure to Python was well over a decade ago in college with 2.7, before pyenv even existed, and thus I’ve been up-revving my mental models of Python environment management to match the current state of the world and how many developers prefer to set up their local Python dev environments.

As a result, this will hopefully be a distillation that should save you a few hours of searching around and reading and forming your own opinions, to give you an opinionated way to quickly get started with Python environments.

I will break things down between macOS, Linux, and Windows.

I will also say up-front that your preferred methodology may not match this, as you may be using different tools or techniques like containerization with Docker or tools like Conda or venv or pipenv or the standalone virtualenv tools (even if some of these don’t handle multiple Python versions); if you already have your own preferred way to set up Python environments, then this guide isn’t really for you. 🙂 (And I even reserve the right to change my mind later as my own practices and opinions evolve.)

Sidebar: Containerization is quite handy regardless, because quite often the target for where you’re going to run these Python workloads is a container-based one; but even then, having native Python environments on your native OS platform is still handy for flexibility and convenience, for instance for doing quick proofs-of-concept or for learning or simple scripts, or, maybe you’re developing a local Python application for distribution to a certain OS via something like pipx and you want to mimic that environment explicitly.

Before diving into the setup steps, note that it is often helpful to “practice” and try these things out first in a clean and temporary environment like a VM in Parallels on macOS or Hyper-V in Windows or some other virtualized environment other than your main host machine, so you can easily start over if things get a little weird.

macOS

On macOS, install Homebrew if you haven’t already – note that this also installs Xcode Command Line Tools, which is relevant in that installing Xcode Command Line Tools also installs a version of Python 3 at the time of this writing. (Apple notably stopped bundling Python with the base operating system in favor of this approach.)

You more than likely want to be able to try newer versions than that which comes with the Xcode tools – for example, the version on my machine that came with Xcode tools is 3.9.6, but the latest version available is 3.12.3.

One way to get a new version of Python is to install it via Homebrew directly, but you do not have to do it this way – instead you can install it via pyenv explicitly.

However, do note that the Python versions available via pyenv may lag slightly behind the latest available version that can be installed directly via Homebrew; for example, at the time of this writing, the latest version of Python available directly via Homebrew is 3.12.3, whereas via pyenv it’s 3.12.2. (According to the pyenv GitHub repo, 3.12.3 will be released via pyenv soon, so it will wind up being about a six month delay.)

Also note that other Homebrew packages may depend on a specific version of Python, and as a result may install versions of Python directly via Homebrew anyways, including packages like awscli and pipx. Despite this, you can still override your global Python version with pyenv later to be a desired one, in the case that the latest Python installed via Homebrew is a little too far ahead for your tastes, or if dependent tools are installing older versions and you want a newer version.
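To see for yourself which patch release pyenv currently offers for a given minor series, you can filter the output of `pyenv install --list`. Here is a minimal sketch; `latest_patch` is a hypothetical helper name, and it assumes the list contains plain `X.Y.Z` entries alongside pre-release and `-dev` entries, which it skips.

```shell
#!/bin/sh
# Reads a version list on stdin and prints the highest patch release
# for the minor series given as $1 (for example "3.12").
latest_patch() {
  # Escape the dot so "3.12" does not also match strings like "3x12".
  minor_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  sed 's/^[[:space:]]*//' \
    | grep -E "^${minor_re}\.[0-9]+$" \
    | sort -t . -k 3,3n \
    | tail -n 1
}
```

Usage would be `pyenv install --list | latest_patch 3.12`, which you can then compare against the version shown by `brew info python@3.12`.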

With that all said, here’s how you directly install a version of Python via Homebrew, noting that this will change the global python3 that is on your path away from the built-in system Python that ships with macOS Xcode tools – again, YOU DON’T HAVE TO DO THIS, this is just for illustration:

# ***YOU DON'T HAVE TO DO THIS*** -- you can install the latest version of Python via pyenv later, but there may be a
# reason you want to get a version of Python installed directly via Homebrew, which is why it is touched on here.
brew install python@3.12

With that explanation and preamble out of the way, the remaining steps for macOS will describe the preferred path of using pyenv and pyenv-virtualenv.

First install pyenv:

# Install pyenv
brew install pyenv

And follow the instructions for Set up your shell environment for Pyenv.

Next, install pyenv-virtualenv:

# Install pyenv-virtualenv
brew install pyenv-virtualenv

And follow the instructions for adding the line to your .rc file for your shell to automatically activate Python virtual environments when you change into a given directory that contains a virtual environment.

Now you can install global versions of Python with pyenv and create local virtual environments in a given directory with pyenv-virtualenv.

With that in place, let’s say you have a repo at ~/git/my-repo and want to create a virtual Python environment there with the latest 3.11 patch version of Python – this is how you would do it:

# Install latest patch version of Python 3.11 globally, which at the time of this writing is 3.11.9
pyenv install 3.11

# Change to repo directory
cd ~/git/my-repo

# Create a named Python virtual environment based on 3.11.9
pyenv virtualenv 3.11.9 my-repo-virtualenv-3.11.9

# Set that virtual environment to be used for the local Python version for your repo directory
pyenv local my-repo-virtualenv-3.11.9

If you’ve followed the steps for adding lines to your shell’s .rc file for automatic activation of Python virtual environments linked above, when entering this directory the virtual environment should automatically activate – else, you need to manually activate it with the following:

# Change to repo directory
cd ~/git/my-repo

# Activate the virtual environment
pyenv activate

# Deactivate when you are done
pyenv deactivate
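If auto-activation doesn’t seem to kick in, it helps to know that `pyenv local` records its choice in a `.python-version` file in the directory, and that `pyenv version` reports the active version and where it was set from. As a minimal sketch (the `pinned_env` helper name is hypothetical, and it only inspects the one directory you give it, whereas pyenv itself also searches parent directories), you could check what a repo is pinned to like this:

```shell
#!/bin/sh
# Prints the pyenv version or virtualenv name pinned for a directory, if any.
pinned_env() {
  dir=${1:-.}
  if [ -f "$dir/.python-version" ]; then
    # The file can list more than one version; show the first entry.
    head -n 1 "$dir/.python-version"
  else
    echo "no .python-version in $dir" >&2
    return 1
  fi
}
```

Running `pinned_env ~/git/my-repo` after the steps above should print the virtual environment name you passed to `pyenv local`.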

Linux

The process for Linux distributions is going to feel awfully similar to macOS, with the exception that different package managers are involved (apt or yum instead of brew), and will vary slightly based on the Linux distribution you’re working with.

With that said, here is an example of getting started in an Ubuntu environment:

# Ensure you have installed the requisite Python build dependencies for pyenv to function properly
# See here and adjust accordingly for your Linux distribution:
# https://github.com/pyenv/pyenv/wiki#suggested-build-environment
sudo apt update; sudo apt install build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev

# Ensure you have Git installed or this next step won't work
sudo apt install git

# Run the pyenv installer
# Note: on Linux this also installs pyenv-virtualenv
curl https://pyenv.run | bash

# Add the following to your desired .rc file for your shell to auto-activate pyenv and pyenv-virtualenv.
# Bash RC example below -- for further steps on adding this to your .bash_profile for login shells, or for steps
# for other shells like zsh or fish, see:
# https://github.com/pyenv/pyenv?tab=readme-ov-file#set-up-your-shell-environment-for-pyenv
echo -e 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo -e '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo -e 'eval "$(pyenv init -)"' >> ~/.bashrc
echo -e 'eval "$(pyenv virtualenv-init -)"' >> ~/.bashrc

# Install a recent version of Python, latest of 3.12 as of this writing is 3.12.3
pyenv install 3.12

# Create a pretend directory for a repo and set your location to it
mkdir -p ~/git/my-repo
cd ~/git/my-repo

# Create a virtualenv and set it as the virtualenv for the local directory
pyenv virtualenv 3.12.3 my-repo-virtualenv-3.12.3
pyenv local my-repo-virtualenv-3.12.3
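One thing to watch out for with the `echo ... >> ~/.bashrc` lines above is that re-running them appends duplicate entries. A minimal sketch of an idempotent alternative follows; `append_once` and `configure_pyenv_rc` are hypothetical helper names, and the lines written are the same four from the Ubuntu example.

```shell
#!/bin/sh
# Appends a line to a file only if that exact line is not already present.
append_once() {
  line=$1
  file=$2
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string, not a regex.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Writes the pyenv shell setup into the given rc file (default: ~/.bashrc).
configure_pyenv_rc() {
  rc=${1:-$HOME/.bashrc}
  append_once 'export PYENV_ROOT="$HOME/.pyenv"' "$rc"
  append_once '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' "$rc"
  append_once 'eval "$(pyenv init -)"' "$rc"
  append_once 'eval "$(pyenv virtualenv-init -)"' "$rc"
}
```

Calling `configure_pyenv_rc` any number of times leaves exactly one copy of each line in the file.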

Windows

To be totally honest, getting Python going on Windows is the weirdest of the bunch, and because of that you may want to consider using Windows Subsystem for Linux (in which case, follow the steps above for your desired Linux distribution in WSL); but in any case, the steps are below if you choose to pursue creating Python virtual environments on Windows.

First of all, install Chocolatey if you haven’t already. (Sorry WinGet fans, I couldn’t find the desired packages with that tool…)

Then install pyenv-win (with the caveat that you need to run this step as Administrator):

# Install pyenv-win, need to run this as administrator
choco install pyenv-win -y

Next install pyenv-win-venv – note that you may need to set your execution policy in PowerShell on a new machine to allow scripts to run, in order to install this. (For those who think that the PowerShell execution policy is a security boundary, see this to correct your thinking, and also understand that on macOS and Linux there is no direct equivalent execution policy for bash shell scripts so… yeah.)

Also note that you may need to create a PowerShell profile file if it doesn’t exist already for auto activation when opening a terminal in a given directory (which, there are huge caveats with auto-activation on Windows, see the code steps further below). Also note that if you’re using the built-in Windows PowerShell, if you want this to work in the newer PowerShell versions you’ll need to update the profile for that shell as well (or vice versa, if you do this in newer PowerShell you may need to update your profile in old Windows PowerShell, too).

# Set the execution policy to Bypass if needed -- run this as administrator
Set-ExecutionPolicy -ExecutionPolicy Bypass

# Run the next steps as a normal user

# Install pyenv-win-venv
# Note at the time of this writing there is a quirk where the install script might actually report failure when it
# actually succeeded -- I've submitted a PR and related issue to resolve this:
# https://github.com/pyenv-win/pyenv-win-venv/pull/28
Invoke-WebRequest -UseBasicParsing -Uri "https://raw.githubusercontent.com/pyenv-win/pyenv-win-venv/main/bin/install-pyenv-win-venv.ps1" -OutFile "$HOME\install-pyenv-win-venv.ps1"
& "$HOME\install-pyenv-win-venv.ps1"

# To auto-activate environments, add this line to your PowerShell profile.
# First, ensure your PowerShell profile file exists
if (-not (Test-Path -Path $PROFILE)) {
    New-Item -Path $PROFILE -Force
}

# Then, add the init line.
Add-Content -Path $PROFILE -Value 'pyenv-venv init'

# This is an undocumented quirk -- need to set this environment variable for pyenv-win-venv to work, then
# restart the terminal.
[System.Environment]::SetEnvironmentVariable('PYENV_HOME', $env:USERPROFILE + '\.pyenv\pyenv-win', 'User')

# pyenv-win is pretty out-of-date as of this writing, and unlike the macOS and Linux versions, comes with an update
# command, so update pyenv to get latest available versions.
pyenv update

# Then install a recent global Python version
pyenv install 3.12.3

# Create a pretend directory for a repo and set your location to it
mkdir -p ~/git/my-repo
cd ~/git/my-repo

# NOTE: Before doing this next step, you may need to disable python3.exe *and* python.exe from "App Execution Aliases"
# on Windows, see:
# https://stackoverflow.com/questions/65348890/python-was-not-found-run-without-arguments-to-install-from-the-microsoft-store

# Create a virtualenv and set it as the virtualenv for the local directory
pyenv-venv install 3.12.3 my-repo-virtualenv-3.12.3
pyenv-venv local my-repo-virtualenv-3.12.3

# NOTE that pyenv-win-venv is currently limited and does *not* support auto-activation via `cd` or changing into the
# directory, it only supports auto-activation when you open a shell directly in the given directory via Explorer. See:
# https://github.com/pyenv-win/pyenv-win-venv/issues/12
# If you follow the steps above, you can activate a virtualenv at any time in a given directory by running
pyenv-venv init

# Or
pyenv-venv activate my-repo-virtualenv-3.12.3

# And to deactivate:
pyenv-venv deactivate

Bonus: Python Dev Container with Visual Studio Code

I mentioned earlier in the blog post about the containerization approach, and I wanted to give honorable mention to the Dev Containers feature of Visual Studio Code, which allows you to do local development from within a local container (or, even a remote container with the GitHub Codespaces feature).

The container image you use can be configured in your devcontainer.json file for your repo, and come from either a public container registry or even an internal container registry for your organization.

There is a good starting point devcontainer.json file and sibling Dockerfile and supporting scripts in a sample repo of devcontainer setups from GitHub, here.

In the future I may update this blog post with more content on how to get this approach going, but the README.md from the aforementioned repo does have really good documentation and steps to get this going.
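To give a feel for how small a starting point can be, here is a minimal sketch that writes a devcontainer.json into a repo. It assumes the `mcr.microsoft.com/devcontainers/python:3.12` image from Microsoft’s public registry and uses `my-repo` as a placeholder name; `write_devcontainer` is a hypothetical helper, and your organization may require an internal image instead.

```shell
#!/bin/sh
# Writes a minimal devcontainer.json into <repo>/.devcontainer/.
# The image and name below are example values, not requirements.
write_devcontainer() {
  repo=${1:-.}
  mkdir -p "$repo/.devcontainer"
  cat > "$repo/.devcontainer/devcontainer.json" <<'EOF'
{
  "name": "my-repo",
  "image": "mcr.microsoft.com/devcontainers/python:3.12"
}
EOF
}
```

After running `write_devcontainer ~/git/my-repo` and opening the folder in Visual Studio Code with the Dev Containers extension installed, you should be offered the option to reopen the folder in a container.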

On top of WSL as an option on Windows, this Visual Studio Code Dev Container approach is also a very good option to use, in lieu of the quirks of using pyenv-win and pyenv-win-venv on Windows; and not only that, this approach also works on every OS platform out there, assuming you use Visual Studio Code as your IDE.

Threading the Needle

Wed, 20 Mar 2024 00:00:00 +0000

Back in the early part of my career, I found myself thrust into an environment that in retrospect was entirely predicated upon a waterfall design methodology. We would do big design up front, in excruciating detail, often without understanding the implications of what was even feasible, with the usual overruns that would happen in terms of time and money to go along with that.

Of course, this way of working seems silly today, but back then in that environment, when we were still dealing with the very real constraints of data centers, it was somewhat the reality, as the activation energy it took to get even a development virtual machine going on limited hardware to prove out some theories was very high.

I distinctly remember going out of my way with my own approaches to setting up local VMs to prove things out. I didn’t understand it at the time, but looking back this was my gut instinct and natural reaction to de-risk situations so that I knew that what we had on paper would line up with reality. It also sparked my interest in infrastructure automation, so I could do this easily on a repeatable basis.

Fast forward to today, and there are so many more options for doing experiments and proofs of concept. Between the cloud providers and SaaS services, many of which offer various free trial options or perpetually free sandboxes, there are numerous ways to spin up even short lived ephemeral environments to test theories and reinforce approaches.

Coming up with solid system designs is a little bit of luck and heuristics and leaning on past experiences, but to truly “thread the needle” and wind up with resilient designs that are not only achievable but also desirable, and that meet both your technological and experience goals (amongst others), I believe that experimentation and proofs of concept are essential.

Too often I still see, even in this day and age, a desire to design on paper and “hope for the best” and leave things to fate, to some degree. In my experience, the devil is still truly in the details, and while that first 90% of your design on paper may go just fine, it is that last often unexpected 10% that you wind up toiling over at the worst time in a project cycle, near the end… Interestingly, that last 10% can often be not a technical constraint, but a human one: sure, your design meets all the requirements, but it turns out it creates a terrible UX for the end user in some unacceptable way to stakeholders, be that an engineer or business user at your company, or even an external customer.

In order to deliver high risk systems, you need to spend time in low risk environments.

You often don’t even need to implement the whole system in those low risk environments; building select parts of a system can often answer key open questions, which keeps everyone comfortable with the direction of the project.

Spending more time in experimentation and proofs of concept not only helps to de-risk your technology decisions, it can also help to ensure that you wind up in a more strategic destination that not only meets your technology goals, but your stakeholder experience expectations as well.

I Have No PATs

Fri, 01 Mar 2024 00:00:00 +0000

I have pushed the content of this blog post to GitHub Pages with zero use of a personal access token (fine-grained or otherwise) or even an SSH key.

How is this possible?

The answer is Git Credential Manager.

I’ve been using GCM for several years now, and its convenience and security deserve more hype.

What I love about GCM is how it seamlessly and securely authenticates me via OAuth in a browser using a trusted GitHub app, and subsequently retrieves an OAuth access token, scoped with least privilege permissions, which is securely stored in my macOS Keychain (or in Credential Manager on Windows), with access control set exclusively to the git-credential-manager application, all without me having to ever handle copying around a credential manually.

For those running GitHub Enterprise Server, it even works there with the latest versions, as well as with many other source control providers like Bitbucket, GitLab, and more.

Security professionals should take note here, as the use of OAuth and the authentication brokering and credential storage approaches, coupled with a broader Zero Trust posture and controls for conditional access for your organization, do really help to advance the cause of protecting these credentials, and by extension the source control systems to which they provide access – this blog post from GitHub goes into more depth about this.

If you’re still setting up your Git credentials via existing methods, I highly recommend taking a look at Git Credential Manager to make the setup process for Git authentication easier, more seamless, and more secure.

AI is Frosting on the Data Cake

Thu, 01 Feb 2024 00:00:00 +0000

Chefs have a saying: mise en place, or “putting in place,” which means to organize your ingredients and tools before cooking begins, and not just in any generic way, but in a specific way that maximizes the individual’s efficiency and movement so they can produce dishes at a desired cadence and speed.

This is a great analogy and philosophy for data in an enterprise as well, as data’s proper placement and movement is key to “baking” the right data architecture “cake” to facilitate the ever-so-craved AI/ML “frosting” that can be piped atop.

The GPT models alone and products like ChatGPT have created a lot of hype (rightfully so) for the abilities of these very capable technologies – but as the hype of the models themselves wanes a bit, there lies a very real challenge for all of us: the hard data engineering work that is involved to take your company’s data and make meaningful organization and use of it, in conjunction with these models and other forms of AI.

I would venture to guess that many organizations have put off some form of rigorous data engineering, data architecture, and data platform efforts in their ecosystems for quite some time. In the past, AI/ML was unapproachable to many, in large part due to the mountainous effort of wrangling your organization’s data into a form that can be consumed by ML (not to mention the scarcity of the highly sought after skills of AI/ML practitioners that logically followed from that). The advent of various generative AI technologies has re-ignited interest in AI at many organizations, and as a result there has never been a better time to dust off those data initiatives and get them going to enrich your AI efforts.

You will only be successful with AI up to a point unless you get your data house in order. (Or, to keep the analogy going: unless you get your data kitchen in order.)

In that vein, one key thing does need to be pointed out: if data is the ingredients for this bake, realize that not all ingredients are “fresh.” Data can not only be non-authoritative, but it also has a “best by” date that you need to be aware of, and therefore tossing all your possible data ingredients into a given bake is going to lead to some weird-tasting results.

To that end, it’s really important to think about which data is ripe for the plucking, and come up with systems to ensure freshness and evict stale items, or at a minimum provide some heuristics or approaches to signal to downstream data pipelines when certain items shouldn’t be incorporated into the mix.
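As a rough illustration of the “best by” idea, here is a minimal Python sketch that splits records into fresh and stale sets so that a downstream pipeline could skip or flag the stale ones. The `Record` type, its `updated_at` field, and the 90-day window are all hypothetical choices made for this example; a real system would pick freshness rules per data source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    """A hypothetical data item with the time it was last refreshed."""
    key: str
    updated_at: datetime


def split_by_freshness(records, max_age, now=None):
    """Return (fresh, stale) lists based on a "best by" window of max_age."""
    now = now or datetime.now(timezone.utc)
    fresh, stale = [], []
    for record in records:
        # A record is fresh if it was updated within the allowed window.
        if now - record.updated_at <= max_age:
            fresh.append(record)
        else:
            stale.append(record)
    return fresh, stale


# Fixed reference time so the example is deterministic.
now = datetime(2024, 2, 1, tzinfo=timezone.utc)
records = [
    Record("pricing", now - timedelta(days=2)),
    Record("org_chart", now - timedelta(days=400)),
]
fresh, stale = split_by_freshness(records, max_age=timedelta(days=90), now=now)
print([r.key for r in fresh])
print([r.key for r in stale])
```

The stale list could be evicted outright or simply passed along as a signal to downstream pipelines, depending on how authoritative the source is.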

This freshness problem has interesting ramifications when it comes to training models: if you train a model on data that becomes stale, what are the consequences of that? Will it entail retraining? Can you train over what’s been previously trained? These are important considerations to take into account.

If data is the ingredients, then data pipelines, lakes, and platforms (and more) are the tools, and following that you should start to think about the overall data architecture at your company: that is, the makeup of data sources and data lake(s) and/or platform(s), as well as downstream analytics, AI/ML training, and reverse ETL to other potential data products and use cases, such as vector search indices for knowledge retrieval applications using the retrieval-augmented generation (RAG) pattern.

It’s not possible to get into the depths and nuance of the entire data engineering space in one blog post, so for those looking to get a grasp of the data engineering landscape, a very interesting read is Fundamentals of Data Engineering.

There is no one-size-fits-all data architecture. Below is a non-exhaustive set of questions you could ask about your own organization to get a sense of which types of data architecture make sense for it:

How small or large is your company?
How many customers do you have?
How much budget do you have to spend on data engineering solutions?
Do you sell physical goods or services, or purely online services?
How long has your company been around?
How many products do you sell?
Is the set of technologies that make up your products largely homogeneous, or more of a heterogeneous mix?
What do your company’s existing skill sets look like?

All of these factors have an impact on your scale, complexity, and overall technology ecosystem, and by extension on your data ecosystem and any data architectures that result.