Why and how GitHub uses ActiveRecord::Encryption to encrypt sensitive database columns

You may know that GitHub encrypts your source code at rest. However, you may not know that our Ruby on Rails monolith also encrypts sensitive database columns. We do this to provide an additional layer of defense in depth and to mitigate concerns such as:

  • Reading or tampering with sensitive fields if the database is improperly accessed
  • Accidentally revealing sensitive data in logs


Until recently, we used an internal library called Encrypted Attributes. GitHub developers declared that a column should be encrypted using an API that resembles that of ActiveRecord::Encryption:

class PersonalAccessToken
  encrypted_attribute :encrypted_token, :plaintext_token
end

Given that we already had an implementation, why did we move our columns to ActiveRecord::Encryption? Our main motivation was to spare developers from having to learn GitHub-specific patterns to encrypt sensitive data.

We believe that using familiar and intuitive patterns will lead to better adoption of our security tools and, in turn, better security for our users.

Besides exposing some of the underlying encryption implementation details, the internal API did not provide an easy way for developers to encrypt existing columns. The internal library required a separate encryption key to be generated for each new database column and stored in our secure environment variable configuration. This created a bottleneck, as most developers weren’t working with encryption every day and needed support from the security team to make any changes.

When we assessed ActiveRecord::Encryption, we were particularly interested in its usability for developers. We wanted a developer to be able to write one line of code and, regardless of whether the column was previously plaintext or used our previous solution, have that column magically start using ActiveRecord::Encryption. The final API looks like this:

class PersonalAccessToken
  encrypts :token
end

This API is exactly the same as the one used by standard ActiveRecord::Encryption, while hiding all the complexity of making it work at GitHub scale.

How we implemented this

As part of implementing ActiveRecord::Encryption in our monolith, we worked with our architecture and infrastructure teams to ensure that the solution met GitHub’s scalability and security requirements. Below is a brief list of some of the customizations we made to fit the implementation to our infrastructure.

As always, there are certain nuances to consider when modifying an existing encryption implementation, and it is always recommended to review new encryption code with your security team.

Figure 1: Key access and derivation flow for ActiveRecord::Encryption implementation on GitHub

Secure primary key storage

By default, Rails uses its built-in credentials.yml.enc file to securely store the primary key and static salt that ActiveRecord::Encryption uses to derive the column encryption key.

GitHub’s key management strategy for ActiveRecord::Encryption differs from the Rails default in two main ways: we derive a separate key for each column, and we store the primary key in our centralized secret management system.

Deriving per-column keys from a single primary key

As explained above, one of the goals of this migration was to eliminate the bottleneck caused by teams managing keys manually. However, we wanted to preserve the security properties of separate keys. Thankfully, cryptography experts have created a primitive known as a Key Derivation Function (KDF) for this purpose. These functions take (roughly) three important parameters: a primary key, a unique salt, and a string that the specification calls “info.”

Our salt is simply the table name, an underscore, and the attribute name. So for PersonalAccessTokens#token, the salt is “personal_access_tokens_token”. This ensures that the key is different for each column.

Per the specification, the encryption algorithm used by ActiveRecord::Encryption (AES256-GCM) requires care not to encrypt too many values with the same key (to avoid nonce reuse). We use the “info” string parameter to ensure that the key for each column changes automatically at least once a year. To do so, we populate the info input with the current year, which acts as a nonce during key derivation.
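The derivation described above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: the post does not name the KDF, so HKDF with SHA-256 (via Ruby's `OpenSSL::KDF.hkdf`) is assumed here, and the helper name `derive_column_key` and the randomly generated primary key are hypothetical.

```ruby
require "openssl"

# Hypothetical helper, not GitHub's code. Assumes HKDF-SHA256 as the KDF.
# salt = "<table>_<attribute>", info = the current year as a string.
def derive_column_key(primary_key, table:, attribute:, year: Time.now.utc.year)
  salt = "#{table}_#{attribute}" # e.g. "personal_access_tokens_token"
  OpenSSL::KDF.hkdf(primary_key, salt: salt, info: year.to_s, length: 32, hash: "SHA256")
end

# Stand-in primary key; in practice this comes from secure storage.
primary_key = OpenSSL::Random.random_bytes(32)

token_2022 = derive_column_key(primary_key, table: "personal_access_tokens", attribute: "token", year: 2022)
token_2023 = derive_column_key(primary_key, table: "personal_access_tokens", attribute: "token", year: 2023)
other_2022 = derive_column_key(primary_key, table: "oauth_applications", attribute: "secret", year: 2022)

puts token_2022 == token_2023 # same column, different year: keys differ
puts token_2022 == other_2022 # same year, different column: keys differ
```

Because the salt and info are derived from the table, attribute, and year, no per-column secret needs to be stored; only the primary key does.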

The applications that make up GitHub store their secrets in Hashicorp Vault. To conform to this existing pattern, we wanted to pull our primary key from Vault instead of the credentials.yml.enc file. To accommodate this, we wrote a custom key provider that behaves similarly to the default DerivedSecretKeyProvider: it retrieves the primary key from Vault and derives the column key with our KDF (see Figure 1).
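A rough sketch of what such a key provider might look like follows. It is not GitHub's implementation. ActiveRecord::Encryption key providers respond to `encryption_key` and `decryption_keys(encrypted_message)`; everything else here is an assumption made for illustration: the `FakeVaultClient` class, the `VaultDerivedKeyProvider` name, the Vault path, the use of HKDF-SHA256, and the choice to try the previous year's key when decrypting. A small `Key` struct stands in for `ActiveRecord::Encryption::Key` so the example runs without Rails.

```ruby
require "openssl"

# Stand-in for ActiveRecord::Encryption::Key so the sketch runs without Rails.
Key = Struct.new(:secret)

# Hypothetical in-memory Vault client; a real one would make an authenticated request.
class FakeVaultClient
  def initialize(secrets)
    @secrets = secrets
  end

  def read(path)
    @secrets.fetch(path)
  end
end

# Hypothetical provider: reads the primary key from Vault, then derives a per-column key.
class VaultDerivedKeyProvider
  def initialize(vault:, path:, table:, attribute:)
    @vault = vault
    @path = path
    @salt = "#{table}_#{attribute}"
  end

  # Key for new writes, derived with the given (default: current) year as "info".
  def encryption_key(year: Time.now.utc.year)
    Key.new(derive(year))
  end

  # Candidate keys for reads. Also trying the previous year is an assumption,
  # made so that values written before a year rollover stay readable.
  def decryption_keys(_encrypted_message = nil, year: Time.now.utc.year)
    [year, year - 1].map { |y| Key.new(derive(y)) }
  end

  private

  def derive(year)
    primary_key = @vault.read(@path)
    OpenSSL::KDF.hkdf(primary_key, salt: @salt, info: year.to_s, length: 32, hash: "SHA256")
  end
end

vault = FakeVaultClient.new("secret/encryption/primary_key" => OpenSSL::Random.random_bytes(32))
provider = VaultDerivedKeyProvider.new(
  vault: vault, path: "secret/encryption/primary_key",
  table: :personal_access_tokens, attribute: :token
)
puts provider.encryption_key.secret.bytesize
```

In a Rails application, an object like this could be passed through the `key_provider:` option of `encrypts`, returning real `ActiveRecord::Encryption::Key` instances instead of the stand-in struct.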

Make new behavior the default

One of our team’s key principles is that the solutions we develop should be intuitive and should not require product developers to know how they are implemented. ActiveRecord::Encryption includes the ability to customize the Encryptor used to encrypt data in a given column. This feature would let developers opt in to the strategy described above, but to make it the default for our monolith, we updated the encrypts model helper to automatically select an appropriate GitHub-specific key provider for the user.

def self.encrypts(*attributes, key_provider: nil, previous: nil, **options)
  # snip: ensure only one attribute is passed
  # ...

  # pull out the sole attribute
  attribute = attributes.sole

  # snip: ensure if a key provider is passed, that it is a GitHubKeyProvider
  # ...

  # If no key provider is set, instantiate one
  kp = key_provider || GitHub::Encryption::GitHubKeyProvider.new(table: table_name.to_sym, attribute: attribute)

  # snip: logic to ensure previous encryption formats and plaintext are supported for smooth transition (see part 2)
  # github_previous = ...

  # call to rails encryption
  super(attribute, key_provider: kp, previous: github_previous, **options)
end

We currently only offer this API to developers working on our internal github.com codebase. As we work with the library, we are experimenting with upstreaming this strategy to ActiveRecord::Encryption by replacing the per-class encryption scheme with a per-column encryption scheme.

Disable compression by default

Compressing a value before encryption can reveal information about its contents. For example, a value with more repeated bytes, such as ‘abcabcabc’, compresses better than a string of the same length without repetition, such as ‘abcdefghi’. Beyond the general cryptographic property that ciphertexts expose the length of the plaintext, this exposes additional information about the entropy (randomness) of the underlying plaintext.
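The effect is easy to observe with the two strings from the example above. The snippet below is a small illustration that uses Ruby's standard Zlib library as a stand-in compressor; it compresses both nine-byte strings and prints the resulting sizes, which an observer could also infer from the ciphertext lengths if the values were compressed before encryption.

```ruby
require "zlib"

repetitive = "abcabcabc" # repeated bytes, lower entropy
varied     = "abcdefghi" # same length, no repetition

[repetitive, varied].each do |plaintext|
  compressed = Zlib::Deflate.deflate(plaintext)
  puts "#{plaintext}: #{plaintext.bytesize} bytes in, #{compressed.bytesize} bytes out"
end
```

Both inputs have the same length, so any difference in output size comes only from how repetitive the content is.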

ActiveRecord::Encryption compresses data by default for storage efficiency, but because the values we encrypt are relatively small, we didn’t feel this trade-off was worth it for our use case. This is why we replaced the default of compressing values before encryption with a flag that makes compression optional.

Moving to a new cryptographic standard: the hard part

This post has presented some of the design decisions and trade-offs we encountered when choosing ActiveRecord::Encryption, but that is not quite enough information to guide developers of existing applications who want to start encrypting columns. In the next post in this series, we’ll show how we handled the hard part: upgrading existing columns in an application from plaintext or another encryption standard.

https://github.blog/2022-10-26-why-and-how-github-encrypts-sensitive-database-columns-using-activerecordencryption/