Techdecline's Blog

Working with JSON Data in Terraform

TL;DR

Complex data should always be kept separate from your Terraform code to allow for better management. JSON is a well-suited Data Interchange format for that purpose, given its widespread use and native support in Terraform.

Problem Statement

As a best practice in Software Engineering, configuration logic and data should be kept separate wherever possible. In Terraform, this usually involves Input Variables.

While this works fine for scalar values like strings and numbers, more complex data requires the use of objects and is better implemented using common Data Interchange formats, for the reasons laid out in this article.
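
To illustrate the difference, here is a minimal sketch with hypothetical variable names: a scalar input variable is trivial to declare and supply through any CI/CD system, while structured data already requires a full type definition:

# A scalar input variable is easy to declare and supply:
variable "environment" {
  type    = string
  default = "production"
}

# Structured data requires a type definition and is harder to pass around:
variable "maintenance_windows" {
  type = list(object({
    scope = string
    start = number
    end   = number
  }))
}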

Scenario

In the given scenario, multiple scheduled downtimes needed to be configured in Datadog. The Terraform provider supports the feature natively, but iterating over a list of objects opens up a set of possible solutions in Terraform.

Setting up the provider is not part of this post, but documented reasonably well within the Terraform Registry Provider Documentation.

Level of Abstraction

One-by-One

Using the lowest level of abstraction, it is possible to create a resource per downtime as shown below:

resource "datadog_downtime" "downtime_1" {
  scope      = ["host:vm1"]
  start      = 1681787940
  end        = 1681790400
  timezone   = "Europe/Berlin"
  recurrence {
    type   = "days"
    period = 1
  }
}

resource "datadog_downtime" "downtime_2" {
  scope      = ["host:vm1"]
  start      = 1681787940
  end        = 1681790400
  timezone   = "Europe/Berlin"
  recurrence {
    type   = "days"
    period = 1
  }
}

resource "datadog_downtime" "downtime_3" {
  scope      = ["host:vm1"]
  start      = 1681787940
  end        = 1681790400
  timezone   = "Europe/Berlin"
  recurrence {
    type   = "days"
    period = 1
  }
}

This violates both the principle of separating data and logic and the principle of "Don't Repeat Yourself", and should be avoided.

Looping through internal Data

In this instance, we imagine a variable construct as shown below:

variable "downtimes" {
  type = list(object(
    {
      scope     = string
      start     = number
      end       = number
      timezone  = string
      type      = string
      period    = number
    }
  ))
}

locals {
  # pass-through mapping, kept to mirror the external-data example below
  downtimes = flatten([
    for downtime in var.downtimes : {
      scope             = downtime.scope
      start             = downtime.start
      end               = downtime.end
      timezone          = downtime.timezone
      recurrence_type   = downtime.recurrence_type
      recurrence_period = downtime.recurrence_period
    }
  ])
}

resource "datadog_downtime" "downtime_from_json" {
  for_each = {
    for downtime in local.downtime : "${downtime.scope}" => downtime
  }
  scope      = [each.value.scope]
  start = each.value.start
  end = each.value.end
  timezone = each.value.timezone
  recurrence {
    type = each.value.reccurance_type
    period = each.value.reccurance_period
  }
}

Values are delivered in a variable file called downtimes.auto.tfvars, placed right next to the Terraform code, with the following content:

downtimes = [
    {
        scope             = "host:vm1"
        start             = 1681736400
        end               = 1681740000
        timezone          = "Europe/Berlin"
        recurrence_type   = "days"
        recurrence_period = 1
    },
    {
        scope             = "host:vm2"
        start             = 1681736400
        end               = 1681740000
        timezone          = "Europe/Berlin"
        recurrence_type   = "days"
        recurrence_period = 1
    }
]

The data is still internal: it remains in HCL syntax, is stored next to the configuration code, and is therefore typically managed within the same repository.

Looping through external Data

The data is externalized into JSON files, created one per downtime, in a directory that can potentially be stored anywhere.

The JSON file schema has been defined intuitively for this example as shown below:

{
    "name": "vm1_5am_db_restore",
    "recurrence_type": "days",
    "recurrence_period": 1,
    "scope": "host:vm1",
    "start": 1681787940,
    "end": 1681790400,
    "timezone": "Europe/Berlin"
}

NOTE: In practice, it is best to use an existing schema, like the one returned by the application's API on GET requests. This makes it easy to generate IaC code from the as-is configuration.
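
As a rough sketch of that idea: assuming the hashicorp/http and hashicorp/local providers, hypothetical variables datadog_api_key and datadog_app_key, and a hypothetical target directory scheduled_downtimes, the existing downtimes could be exported into one JSON file each, matching the API schema exactly:

data "http" "existing_downtimes" {
  url = "https://api.datadoghq.com/api/v1/downtime"
  request_headers = {
    "DD-API-KEY"         = var.datadog_api_key
    "DD-APPLICATION-KEY" = var.datadog_app_key
  }
}

resource "local_file" "downtime_export" {
  # one JSON file per existing downtime, keyed by its API id
  for_each = { for d in jsondecode(data.http.existing_downtimes.response_body) : tostring(d.id) => d }
  filename = "scheduled_downtimes/${each.key}.json"
  content  = jsonencode(each.value)
}

A small script outside of Terraform would work just as well; the point is that the exported files already conform to a schema the API understands.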

In this instance, there is just a single variable indicating the filesystem path in which to look for JSON files, which are then parsed into Terraform objects:

variable "downtime_file_path" {
  type = string
}

locals {
  scheduled_downtimes_fileset = fileset(var.downtime_file_path, "*.json")
  scheduled_downtimes = flatten([
    for scheduled_downtime_file in local.scheduled_downtimes_fileset : {
      # fileset returns paths relative to the search directory,
      # so the directory has to be prepended before reading the file
      downtime = jsondecode(file("${var.downtime_file_path}/${scheduled_downtime_file}"))
    }
  ])
}

resource "datadog_downtime" "downtime_from_json" {
  for_each = {for downtime_config in local.scheduled_downtimes.*.downtime : downtime_config.name => downtime_config }
  scope      = [each.value.scope]
  start = each.value.start
  end = each.value.end
  timezone = each.value.timezone
  recurrence {
    type = each.value.reccurance_type
    period = each.value.reccurance_period
  }
}

By using a single variable for the filesystem location, variable management becomes easier than handling complex variable types in CI/CD integrations, and the data can be managed by any system or stakeholder without ever touching Terraform.
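
Supplying that one scalar is trivial; a minimal sketch, assuming a hypothetical directory name:

downtime_file_path = "./scheduled_downtimes"

This line can live in a *.auto.tfvars file, or the value can be supplied through the TF_VAR_downtime_file_path environment variable in a pipeline.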

Advantages of external Data

Validation

Validating common Data Interchange formats is a solved problem in Software Engineering: it can be done both syntactically, using online validation tools like JSON Online Validator and Formatter - JSON Lint, and semantically, using tools like Open Policy Agent.
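
A lightweight semantic check can also live directly in the Terraform code itself; a minimal sketch using a resource precondition (available since Terraform 1.2), added to the downtime_from_json resource shown above:

resource "datadog_downtime" "downtime_from_json" {
  # ... arguments as shown above ...

  lifecycle {
    precondition {
      condition     = each.value.end > each.value.start
      error_message = "A downtime's end timestamp must be later than its start timestamp."
    }
  }
}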

Therefore, it is trivial to add validation to CI/CD pipelines to harden deployments, but that is a topic for a later post.

Cross-Platform

In this context, cross-platform does not relate to the underlying Operating System, but to the technologies used to create, manage, and delete the data.

By using a Data Interchange format like JSON, all operations can be performed on the same data using different paradigms and tools. Just imagine a PowerShell script with a hypothetical module used to schedule Datadog downtimes, and consider the following snippet:

# Browse into the directory
Set-Location ./scheduled_downtimes

# Parse all JSON files (-Raw reads each file as a single string)
$downtimeArr = Get-ChildItem -Filter *.json | ForEach-Object { Get-Content $_.FullName -Raw | ConvertFrom-Json }

# Create a downtime per JSON object
$downtimeArr | ForEach-Object { New-DDDowntime -Scope $_.scope -Start $_.start ... }

Now there are two distinct ways of managing downtimes for Datadog without violating the single-source-of-truth principle.

Separation of Data and Logic

Platform Engineering is always about abstraction to implement separation of duties. Using external data, management of the source code can be completely decoupled from the input data, allowing for scenarios where application owners bring their own downtime rules in a common syntax.

The data can also be audited without granting access to the source code.

#guide #terraform