Yashish Dua


5 tips for failing in engineering Infrastructure

Each tip is capable of bringing about a cultural change in your infrastructure team.

And no, none of them is the usual “keep backups of your databases” advice.

You don’t wanna do this!

1. Infrastructure team’s job is to provide access

Really? That’s the old way. Infrastructure teams should no longer be responsible for providing access; instead, they should federate it.

Every organization goes through a phase where its infrastructure team receives hundreds of support tickets a day asking for access to one thing or another. Restricting access is indeed necessary to avoid chaos in production systems, whether that chaos comes from someone making a change without enough knowledge or from an intern (?). However, the responsibility for access control should be distributed and federated.

For example, every team’s tech lead (TL) or manager should be able to grant access to their own team’s engineers, with no involvement from a central body like the Infrastructure team. The Infrastructure team’s objectives here are to make sure each onboarded TL/manager has the right set of permissions, and that compliance systems are in place to automatically audit the activities performed with the granted access.
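To make the federated model a little more concrete, here is a minimal Python sketch under a toy, in-memory assumption. `FederatedAccess`, `denied_attempts`, and the team, user, and resource names are all hypothetical; a real setup would sit on top of your identity provider or cloud IAM. The only rule encoded is the one described above: a TL/manager can grant access only to their own team’s engineers on their own team’s resources, and every attempt, allowed or denied, is recorded in an audit log that a compliance check can scan.

```python
from dataclasses import dataclass, field


@dataclass
class FederatedAccess:
    # Hypothetical model: grants are delegated to team leads, not to a central team.
    team_leads: dict      # team -> set of TL/manager usernames who may grant access
    team_members: dict    # team -> set of engineer usernames in that team
    resource_owner: dict  # resource name -> owning team
    grants: set = field(default_factory=set)       # {(user, resource)} currently granted
    audit_log: list = field(default_factory=list)  # every grant attempt, allowed or not

    def grant(self, granter: str, user: str, resource: str) -> bool:
        team = self.resource_owner.get(resource)
        allowed = (
            team is not None
            and granter in self.team_leads.get(team, set())
            and user in self.team_members.get(team, set())
        )
        if allowed:
            self.grants.add((user, resource))
        # Log both outcomes so compliance tooling can audit them later.
        self.audit_log.append(
            {"granter": granter, "user": user, "resource": resource, "allowed": allowed}
        )
        return allowed


def denied_attempts(log: list) -> list:
    """A toy compliance check: surface every rejected grant for review."""
    return [entry for entry in log if not entry["allowed"]]


acl = FederatedAccess(
    team_leads={"payments": {"alice"}},
    team_members={"payments": {"bob"}},
    resource_owner={"payments-db": "payments"},
)
ok = acl.grant("alice", "bob", "payments-db")      # True: alice leads the payments team
rejected = acl.grant("bob", "bob", "payments-db")  # False: bob is not a lead
```

Keeping the rule this narrow is the point: the Infrastructure team owns the rule and the audit trail, not each individual access ticket.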

2. Always modify servers by SSHing into them

Say your production server has a bug. You SSH into the machine, modify the configuration or code, and boom, you’ve fixed it. But you’ve also created what is known as a snowflake server: unique, fragile, and brittle! No one knows what was changed or when. Now imagine the time and effort it would take to apply the same change across hundreds of instances.
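The alternative to hand-editing over SSH is to declare the desired configuration once and treat any difference on a host as drift. Below is a minimal Python sketch of such a drift check, assuming each host’s configuration can be read as a flat key/value dictionary; `DESIRED`, `drift`, the host names, and the config keys are hypothetical.

```python
# Hypothetical desired configuration: the single source of truth for every host.
DESIRED = {"max_connections": "200", "log_level": "info"}


def drift(observed: dict, desired: dict = DESIRED) -> dict:
    """Return {key: (observed_value, desired_value)} for every mismatched key."""
    keys = desired.keys() | observed.keys()
    return {
        key: (observed.get(key), desired.get(key))
        for key in sorted(keys)
        if observed.get(key) != desired.get(key)
    }


# Observed configs per host (hypothetical); web-2 was hand-edited over SSH.
fleet = {
    "web-1": {"max_connections": "200", "log_level": "info"},
    "web-2": {"max_connections": "500", "log_level": "info"},
}

# Keep only hosts whose drift report is non-empty.
report = {host: d for host, cfg in fleet.items() if (d := drift(cfg))}
print(report)  # {'web-2': {'max_connections': ('500', '200')}}
```

Fixing `web-2` then means changing `DESIRED` once and reapplying it to the whole fleet, rather than SSHing into each instance and hoping every copy of the fix matches.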

3. Never use Version Control Systems to store configuration/manifests

Using VCS to collaborate on and version application code is now widely accepted. Unfortunately, most infrastructure teams don’t use VCS to store the configurations, manifests, or other infrastructure entities that live outside the application layer. Usually, all of these pieces exist only in the cloud provider’s interface. With no review process and no version history, it becomes easy to break production (unintentionally, of course. Or not?).

Also, at any given point in time, painting a full picture of the infrastructure’s state becomes really tough, which makes root cause analysis of bugs/incidents, security forensics, etc. much harder.
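Once manifests live in version control, their history can answer “what was declared at time T?” and “when did this value last change?”, which is exactly what an incident review needs. Here is a minimal Python sketch, assuming the history of a single manifest is available as a time-ordered list of (timestamp, manifest) pairs; `HISTORY`, `state_at`, `last_change`, and the manifest fields are hypothetical stand-ins for what you would extract from your VCS.

```python
from bisect import bisect_right

# Hypothetical commit history of one manifest, oldest first:
# (commit unix timestamp, desired state declared in that commit).
HISTORY = [
    (1_600_000_000, {"replicas": 2, "image": "api:1.0"}),
    (1_600_050_000, {"replicas": 4, "image": "api:1.0"}),
    (1_600_090_000, {"replicas": 4, "image": "api:1.1"}),
]


def state_at(ts: int, history=HISTORY) -> dict:
    """Return the manifest in effect at time ts (the latest commit at or before ts)."""
    times = [t for t, _ in history]
    i = bisect_right(times, ts)
    if i == 0:
        raise ValueError("no manifest was committed before this time")
    return history[i - 1][1]


def last_change(key: str, ts: int, history=HISTORY):
    """Return the timestamp of the latest commit at or before ts that changed `key`.

    The first commit counts as a change because it introduces the key.
    Returns None if no commit exists at or before ts.
    """
    found, prev = None, None
    for t, manifest in history:
        if t > ts:
            break
        if prev is None or prev.get(key) != manifest.get(key):
            found = t
        prev = manifest
    return found


print(state_at(1_600_060_000))                 # {'replicas': 4, 'image': 'api:1.0'}
print(last_change("replicas", 1_600_095_000))  # 1600050000
```

During an incident, `state_at` gives you the declared state at the time of the outage, and `last_change` points at the change worth reviewing first. None of this is possible when the only record is whatever the cloud console shows today.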

4. Infrastructure projects don’t need a good DX

DX (Developer Experience) is the most neglected part of any organization. Since the primary motive of an organization is to earn money, the user experience (UX) of end consumers is the top priority. Developers, designers, managers, etc. all spend their days and nights making that end-user experience seamless. But what about the experience and productivity of the people working hard to ship it? They are humans too.

A major portion of the entire engineering team’s productivity depends on the Infrastructure team. The tools and projects built by Infrastructure enable everyone to build, ship, and deploy quickly and effortlessly. It’s the Infrastructure team’s responsibility to provide a top-notch developer experience to all of its consumers (the engineering team) to effectively increase developer velocity.

5. Infrastructure tests should not be a part of CI

As of 2020, every developer (and organization) knows the importance of automated testing: unit, integration, smoke, etc. Usually, these tests are part of the project’s source code and run in a CI pipeline. Interestingly, all infrastructure and security-related tests are done manually once the code is deployed to staging (or whatever environment you use). This increases both the probability of bugs and their blast radius, and it adds unnecessary, redundant human effort.

CI/CD pipelines should include Infrastructure tests. These can include generating vulnerability reports for the source code’s dependencies, computing a service’s reliability score by actually running it in a simulated chaos environment, and so on.
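As an illustration of the first kind of test, here is a minimal Python sketch of a CI gate that fails the build when a pinned dependency matches a known advisory. The `ADVISORIES` data and the package names are hypothetical placeholders; a real pipeline would pull advisories from a vulnerability database or scanner rather than a hard-coded dictionary, and `parse_pins` only understands simple `name==version` lines.

```python
import sys

# Hypothetical advisory data: package -> set of known-vulnerable versions.
# A real pipeline would fetch this from a vulnerability database instead.
ADVISORIES = {
    "examplelib": {"1.0.0", "1.0.1"},
    "otherlib": {"2.3.0"},
}


def parse_pins(lines) -> dict:
    """Parse 'name==version' lines, ignoring blanks, comments, and non-pinned specs."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if sep:
            pins[name.strip().lower()] = version.strip()
    return pins


def vulnerable(pins: dict, advisories: dict = ADVISORIES) -> list:
    """Return sorted (package, version) pairs that match a known advisory."""
    return sorted((n, v) for n, v in pins.items() if v in advisories.get(n, set()))


def ci_gate(lines) -> int:
    """Return the CI step's exit code: non-zero fails the pipeline."""
    findings = vulnerable(parse_pins(lines))
    for name, version in findings:
        print(f"VULNERABLE: {name}=={version}", file=sys.stderr)
    return 1 if findings else 0
```

A CI step could run `ci_gate` over the project’s pinned requirements and exit with its return code, so a vulnerable dependency fails the pipeline before anything reaches staging instead of being caught by a manual check afterwards.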

All these tips are highly condensed, and each could easily be a blog post of its own. In brief, I have tried to highlight the capabilities and responsibilities of an Infrastructure team. If you are intrigued by these concepts, I have done my part.

I actively share my developer life on Instagram and explain concepts on YouTube.
