Safer Software with Cautious Code

Sep 22, 2016

Working in a software organization, engineers write libraries that other people use. Unlike the physical world, software abstracts risk to the point where dangerous operations look safe. $10,000,000 in physical cash will not burst into flames if an algorithmic trader mistypes a line; a few numbers will just move around. A lot of work has gone into best practices for safe software, but we often forget about the danger in our day-to-day work.

Aircraft manufacturing implements special procedures. Strict protocols like making sure every tool is accounted for promote proactive mindfulness. A missing bolt can cause hours of searching. Multiple inspections along the way also enforce a high standard. From the beginning, the industry trains and tunes systems to prevent accidents. Both bottom up and top down methodologies shape how things get built.

Learning from Industry

Layers of mindfulness surround a plane’s build process.

At the design stage, engineers work to make it easier to avoid errors. Interviews with pilots and instrument analysis inform the design.
Mechanics have procedures when putting together the plane. The protocols remind everyone of the serious consequences.
During flight, the plane has backup systems. The pilots have checklists which force acknowledgement of danger.

Software is pretty good at the last two. We review code as we build it, and static analysis and linting act as safeguards and alarms. Those checks are fallible, though: after a long day, a code review might not get the attention it deserves. This makes the first point, designing to avoid errors, even more important.

Code controlling financial and personal data needs the same attention to detail as physical systems. One end of the spectrum is NASA with lengthy code reviews and specifications. The opposite end ships code to production without review. Most teams lie somewhere in the middle. It’s often unclear where you need to be on the spectrum. As a team grows or the mission changes, reevaluate the level of risk of the code. In an early-stage two-person team, everyone knows the code and how to use it. A 100-person team needs more coordination and communication.

Boeing uses cognitive engineering to reduce risk in jet cockpits by designing systems that minimize errors. Designers work with pilots to figure out what controls or systems cause errors. Software engineers must do the same with their code. Figure out from the people using your code where the problems are. Iterate on the user experience of the interfaces until the implementation matches the intention.

Identify the problem

Data leaks are detrimental to organizations. Fragile code often hides in plain sight. A misplaced parameter can wreak havoc. Changing the interface into the code removes some of that risk. Understand the problems with the current interface before you build a better interface.

Databases return all possible information given the constraints you place on them. By default, a SELECT statement with no constraints returns every row in the table.

SELECT * FROM secret_documents;

Only after adding constraints does the dataset get reduced.

SELECT * FROM secret_documents WHERE id = 1;

Traditionally, we access data in this way. Start with everything and reduce it. The above has its equivalent in Django (as with most ORMs based around SQL). You start with all the objects and then filter them down.

SecretDocument.objects.filter(id=1)

This makes sense for some things, but doesn’t make sense for sensitive data.

Spilling Secrets

In those simple examples, it is hard to see how easily information can leak. Imagine an organization with an app that keeps secrets. Each row of the main model, SecretDocument, belongs to a user. Documents from one user can never mix with another's.

An expected piece of code would involve getting a document based on a primary key.

def get_doc(doc_pk):
    return SecretDocument.objects.get(pk=doc_pk)

The SecretDocument is queried based on the document PK (primary key). This seems like a harmless function, but that query is a potential disaster.

The function relies on doc_pk being the primary key of a SecretDocument. Most tables in a database share the same primary key type, and the most common is an integer. An integer from the User id column looks exactly like an integer from the SecretDocument id column.

Unclear variable names can cause a User PK to sneak into a function expecting a SecretDocument PK. An innocent mistake with big consequences.
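The mix-up is easy to reproduce without a database. The sketch below uses plain dictionaries as stand-ins for the two tables; the rows and the current_user_pk variable are made up for illustration. It shows the unscoped lookup returning another user's document without complaint.

```python
# In-memory stand-ins for two tables that both use integer primary keys.
USERS = {1: "alice", 2: "bob"}
SECRET_DOCUMENTS = {
    1: {"user_id": 2, "body": "bob's secret"},
    2: {"user_id": 1, "body": "alice's secret"},
}


def get_doc(doc_pk):
    # Mirrors SecretDocument.objects.get(pk=doc_pk): any integer is accepted.
    return SECRET_DOCUMENTS[doc_pk]


# Alice is logged in, and a caller passes her user PK by mistake.
current_user_pk = 1
leaked = get_doc(current_user_pk)

# The lookup succeeds, yet the document belongs to someone else.
print(leaked["user_id"] == current_user_pk)
```

Nothing in the function signature or the data types gives the caller a chance to notice the mistake.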

Say what you mean and mean what you say.

Let's make one tiny tweak to our function.

def get_doc(doc_pk, user_pk): 
    return SecretDocument.objects.get(pk=doc_pk, user_id=user_pk)

If documents always have an associated user, the function needs to reflect that. It still takes integers, but now avoids a single point of failure. The original intention was for a SecretDocument to only be accessed by its user. The function now expresses the structural intent we decided on as an organization. All documents must be accessed in the context of a user.
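To see the effect of the second constraint, here is a minimal sketch that swaps the database for a dictionary. The DoesNotExist exception is a local stand-in for the one an ORM would raise, and the sample rows are invented.

```python
class DoesNotExist(Exception):
    """Raised when no row matches every constraint (stand-in for an ORM error)."""


SECRET_DOCUMENTS = {
    1: {"user_id": 2, "body": "bob's secret"},
    2: {"user_id": 1, "body": "alice's secret"},
}


def get_doc(doc_pk, user_pk):
    # Both constraints must hold, like .get(pk=doc_pk, user_id=user_pk).
    doc = SECRET_DOCUMENTS.get(doc_pk)
    if doc is None or doc["user_id"] != user_pk:
        raise DoesNotExist(f"no document {doc_pk} for user {user_pk}")
    return doc


# The right pair of keys works.
owned = get_doc(doc_pk=2, user_pk=1)

# A user PK passed as the document PK now fails instead of leaking.
try:
    get_doc(doc_pk=1, user_pk=1)
    leaked = True
except DoesNotExist:
    leaked = False
```

A wrong integer now has to be wrong in two places at once before data crosses between users.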

Understand the data model you want and craft your code accordingly.

Enforcement

How does one enforce correct data access? A top-down approach would be to have analysis tools catch misuse of queries. Alarms can make sure only authorized users can access the data. The ORM's behavior can be changed to throw exceptions if certain fields are not constrained.
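As a rough illustration of the last idea, the sketch below wraps a list of rows in a hypothetical GuardedTable that refuses any filter missing a required field. It is not how any particular ORM implements this; the class, the exception, and the rows are assumptions made for the example.

```python
class UnconstrainedQueryError(Exception):
    """Raised when a query omits a field that must always be constrained."""


class GuardedTable:
    def __init__(self, rows, required_fields):
        self._rows = rows
        self._required = set(required_fields)

    def filter(self, **constraints):
        # Refuse to run before touching any data.
        missing = self._required - set(constraints)
        if missing:
            raise UnconstrainedQueryError(
                f"query must constrain: {sorted(missing)}"
            )
        return [
            row for row in self._rows
            if all(row[field] == value for field, value in constraints.items())
        ]


documents = GuardedTable(
    rows=[
        {"id": 1, "user_id": 2, "body": "bob's secret"},
        {"id": 2, "user_id": 1, "body": "alice's secret"},
    ],
    required_fields=["user_id"],
)

scoped = documents.filter(id=2, user_id=1)

try:
    documents.filter(id=2)  # forgot the user
    rejected = False
except UnconstrainedQueryError:
    rejected = True
```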

Those precautions are nice and often necessary, but do nothing to prevent the mistakes. The interface into the data embodies the intention. If you are doing something dangerous, it must be obvious.

Removing the potential for errors

In our example, only certain users can view certain documents. Django by default gives you the loaded gun, objects. Luckily, we have control over our codebase. Instead of being OK with the risk, we can change the semantics to not use objects at all.

user.secret_documents.get(pk=doc_pk)
SecretDocument.byUser(user_pk).get(pk=doc_pk)
SecretDocument.uNsAfE_SeLeCt_AlL_ThE_ObJeCtS.get(pk=note_pk)

# Add user context to related lookups
document.notes(user_pk).all()

A request for all the rows is crystal clear. Adding friction by renaming objects to the intentionally ugly uNsAfE_SeLeCt_AlL_ThE_ObJeCtS prevents misuse and helps people identify the danger later. It will catch the eye of a reviewer. The example demonstrates the extreme side and is a bit silly. A simpler unsafe_select_all can accomplish the same goal. A production example can be found in the ReactJS project using the variable name __SECRET_INTERNALS_DO_NOT_USE_OR_YOU_WILL_BE_FIRED.
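The shape of such an interface can be sketched in plain Python. DocumentSet and the in-memory rows below are stand-ins rather than Django code; in Django itself, declaring a manager under another name is enough to keep the default objects manager from being added to the model.

```python
class DocumentSet:
    """A tiny stand-in for a queryset over in-memory rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def all(self):
        return list(self._rows)

    def get(self, pk):
        matches = [row for row in self._rows if row["id"] == pk]
        if len(matches) != 1:
            raise LookupError(f"expected one row with id={pk}, found {len(matches)}")
        return matches[0]


class SecretDocument:
    _rows = [
        {"id": 1, "user_id": 2, "body": "bob's secret"},
        {"id": 2, "user_id": 1, "body": "alice's secret"},
    ]

    # Deliberately no `objects` attribute: the unscoped entry point
    # carries a name that stands out in review.
    unsafe_select_all = DocumentSet(_rows)

    @classmethod
    def byUser(cls, user_pk):
        # The default path: every lookup starts from one user's rows.
        return DocumentSet(row for row in cls._rows if row["user_id"] == user_pk)


alice_doc = SecretDocument.byUser(1).get(pk=2)
everything = SecretDocument.unsafe_select_all.all()
has_objects = hasattr(SecretDocument, "objects")
```

The safe call is the short one, and the dangerous call is the one a reviewer cannot miss.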

Engineers want compact and clear code. When deep in a project, it is tough to see the line where compactness becomes unintelligible. Don’t let efficiency trump clarity. Semantics of the code inform how to use and understand it.

This principle applies to things other than security. In a new feature, SecretDocuments become versioned by date. The entire application runs on the principle of rewinding the state of all the documents to a point in time. The interface to our data extends to accommodate this.

SecretDocument.byUser(user_pk, date=yesterday).all()

The code naturally expresses the meaning of the data. Get the documents from this user at this date. Our primary and default methods for accessing those documents need to reflect the model.
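One way such a lookup could work is sketched below, assuming a hypothetical version table with a valid_from date on every row. The schema, the by_user name, and the as_of parameter are illustrative guesses, not the original application's design.

```python
from datetime import date

# Hypothetical version table: one row per (document, version).
VERSIONS = [
    {"doc_id": 1, "user_id": 1, "valid_from": date(2016, 9, 1), "body": "draft"},
    {"doc_id": 1, "user_id": 1, "valid_from": date(2016, 9, 20), "body": "final"},
    {"doc_id": 2, "user_id": 2, "valid_from": date(2016, 9, 5), "body": "other user"},
]


def by_user(user_pk, as_of):
    """Return each of the user's documents as it stood on `as_of`."""
    latest = {}
    for row in sorted(VERSIONS, key=lambda r: r["valid_from"]):
        if row["user_id"] == user_pk and row["valid_from"] <= as_of:
            # Later versions overwrite earlier ones for the same document.
            latest[row["doc_id"]] = row
    return list(latest.values())


rewound = by_user(1, as_of=date(2016, 9, 10))
current = by_user(1, as_of=date(2016, 9, 22))
```

Because the user and the date are both arguments to the one entry point, a caller cannot forget either of them.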

More hardened query level permissions can be enforced by passing in the web request.

SecretDocument.byUser(user_pk, request).get(pk=doc_pk)
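A minimal sketch of that idea follows, with small dataclasses standing in for a web framework's request and user objects. The owner-only policy and every name here are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    pk: int


@dataclass
class Request:
    user: User


DOCUMENTS = [
    {"id": 1, "user_id": 2, "body": "bob's secret"},
    {"id": 2, "user_id": 1, "body": "alice's secret"},
]


def by_user(user_pk, request):
    # Check the caller before reading any rows.
    if request.user.pk != user_pk:
        raise PermissionError(
            f"user {request.user.pk} may not read documents of user {user_pk}"
        )
    return [row for row in DOCUMENTS if row["user_id"] == user_pk]


alice_request = Request(user=User(pk=1))
own_docs = by_user(1, alice_request)

try:
    by_user(2, alice_request)
    denied = False
except PermissionError:
    denied = True
```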

Conclusion

Every language has great packages that use the strengths of the language to produce great interfaces. Porting a library from Java to Python without taking this into consideration will result in code that promotes mistakes.

Software engineers can take cues from industry to reduce the potential for failure. It’s up to your team to decide what end of the spectrum you should be on. Identifying what is causing the problem is an important first step. Once aware of the danger, teams can adjust their practices to handle this risk. Safeguards and fallbacks are critical, but hopefully those things never get used. Code should write itself and the proper safe way to interface with a library needs to be obvious. Don’t let theoretical efficiency come before practical safety. Clear interfaces naturally promote goals of privacy and safety. Prevention is the best cure.