Parsing a JSON body where the value of one key is a stringified JSON object

Some background real quick!

I have an app that sends emails through AWS’s SES. I want to receive bounce notifications so I can properly mitigate bounced email issues. I’ve set everything up so that SES tells SNS about bounces and SNS has my application’s endpoint for sending notifications. All of this is working after a small hiccup: SNS subscription confirmation requests (and maybe others too) set the Content-Type header to text/plain even though the body is clearly JSON. They’ve confirmed this is a bug (https://forums.aws.amazon.com/thread.jspa?start=0&threadID=69413&tstart=0) and it means that some extra work is required to properly parse these requests using Plug/Phoenix. I have these requests coming through a special pipeline that includes a plug checking for this specific content-type and then overwriting the header to the more appropriate application/json. With that, the subscription confirmation is working. Thanks ex_aws_sns for making that so easy!
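For anyone who lands here with the same Content-Type issue, a plug along these lines is roughly what is described above. This is a minimal sketch, not the exact code from my app: the module name `SNSContentTypeFix` is made up for illustration, and it assumes the `plug` package is available.

```elixir
# Hypothetical plug (the name is illustrative): rewrites a text/plain
# Content-Type request header to application/json so that a JSON parser
# placed later in the pipeline will accept the body.
defmodule SNSContentTypeFix do
  @behaviour Plug
  import Plug.Conn

  @impl true
  def init(opts), do: opts

  @impl true
  def call(conn, _opts) do
    case get_req_header(conn, "content-type") do
      # Also matches values with parameters, e.g. "text/plain; charset=UTF-8".
      ["text/plain" <> _rest] ->
        put_req_header(conn, "content-type", "application/json")

      # Any other (or missing) Content-Type is left untouched.
      _other ->
        conn
    end
  end
end
```

In a Phoenix router this would sit in the SNS pipeline just before `plug Plug.Parsers, parsers: [:json], json_decoder: Jason`.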

The problem

Now, I need to process the incoming messages for bounced emails. However, the incoming payload is JSON where one of the keys ("Message") maps to a string that is itself a stringified JSON payload. I believe AWS does this so that you can verify the rest of the message (using signature verification) without having to “trust” the value in "Message" first. Unfortunately, I think this breaks how Jason decodes JSON. Here’s an example incoming payload, parsed by a different system so that only the first “level” of JSON is decoded, with anything remotely sensitive changed:

{
  "Message": "{"notificationType":"Bounce","bounce":{"bounceType":"Permanent","bounceSubType":"General","bouncedRecipients":[{"emailAddress":"bounce@simulator.amazonses.com","action":"failed","status":"5.1.1","diagnosticCode":"smtp; 550 5.1.1 user unknown"}],"timestamp":"[utc-timestamp-with-date]","feedbackId":"[some-extended-uuid]","remoteMtaIp":"[some.remote.i.p]","reportingMTA":"dsn; a1-2.smtp-out.amazonses.com"},"mail":{"timestamp":"[utc-timestamp-with-date]","source":"example@example.com","sourceArn":"arn:aws:ses:us-east-1:[some-12-digit-account-number]:identity/example@example.com","sourceIp":"[some.remote.i.p]","sendingAccountId":"[some-12-digit-account-number]","messageId":"[some-extended-uuid]","destination":["bounce@simulator.amazonses.com"]}}", 
  "MessageId": "[some-uuid]", 
  "Signature": "[base64encodedgarbage]", 
  "SignatureVersion": "1", 
  "SigningCertURL": "https://sns.us-east-1.amazonaws.com/SimpleNotificationService-[some-hash].pem", 
  "Timestamp": "[utc-timestamp-with-date]", 
  "TopicArn": "arn:aws:sns:us-east-1:[some-12-digit-account-number]:ses-email-bounce", 
  "Type": "Notification", 
  "UnsubscribeURL": "https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east-1:[some-12-digit-account-number]:ses-email-bounce:[some-uuid]"
}

Perhaps because I have to manually change the Content-Type header, the body isn’t parsed by the time it hits the router’s pipelines. Right after the plug for changing the Content-Type header, I have:

plug Plug.Parsers, parsers: [:json], json_decoder: Jason

When my application receives the payload from SNS, this results in a Jason.DecodeError. The unexpected byte it refers to is always the first ‘n’ in the value of "Message" in the example above.

I just need to access the list of "bouncedRecipients" within the JSON payload that’s actually a string which is the value of a key inside another JSON payload that used to be a string.

Is this technically a bug in Jason? I haven’t worked with embedded payloads like this in Elixir/Phoenix/Plug/Jason before. I’m not seeing any options or other ways to decode only the “first level” of stringification in Jason. Is this something that ex_aws is adept at handling and I’m just missing something? Do I just need to write my own parser here?

Bonus points: What do you call this situation?

Interesting problem. Unless I’m missing something, couldn’t you get rid of those troublesome quotes around the object via Regex before passing to Jason?

Are you saying that you receive the SNS event with Message key unescaped? That hasn’t been my experience with SNS events. But I’m not calling you a liar. :wink:

In that case, you’re not gonna get any self-respecting JSON library to decode that payload.

Personally I would just parse out the message with Regex and use Jason to decode that.

Edit:
This should work

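    # Assumes `sns_event` is the raw body as a string, with the value of
    # "Message" unescaped as in the example above and one key per line
    # (the greedy `.+` stops at the last `"` on the line).
    case Regex.named_captures(~R/"Message": "(?<message>.+)"/, sns_event) do
      %{"message" => message} ->
        message
        |> Jason.decode!()
        |> Map.get("bounce", %{})
        |> Map.get("bouncedRecipients", [])

      # No "Message" value found: treat it as no bounced recipients.
      _ ->
        []
    end
    |> IO.inspect()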

You’re right: the decoder that produced the example above strips the escaping. In the actual payload, the value of "Message" is properly escaped and can be decoded on its own.

I’ll try the Regex method and see what I can get. Hopefully it won’t have to be too messy :laughing:

Couldn’t you register a text/plain Plug parser for that one route? I’m pretty sure that won’t escape the payload. But I could be wrong.
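As far as I know Plug does not ship a text/plain parser, so this would mean writing a small custom one. A rough sketch of what that could look like, assuming `plug` and `jason` are dependencies; the module name `PlainTextJSONParser` is hypothetical, and I have not run this against real SNS traffic:

```elixir
# Hypothetical custom parser (illustrative name): treats a text/plain
# body as JSON, for endpoints where the sender mislabels the payload.
defmodule PlainTextJSONParser do
  @behaviour Plug.Parsers

  @impl true
  # Keep only the options that Plug.Conn.read_body/2 understands.
  def init(opts), do: Keyword.take(opts, [:length, :read_length, :read_timeout])

  @impl true
  def parse(conn, "text", "plain", _params, opts) do
    case Plug.Conn.read_body(conn, opts) do
      {:ok, body, conn} -> {:ok, decode(body), conn}
      # The body is larger than the configured :length.
      {:more, _partial, conn} -> {:error, :too_large, conn}
    end
  end

  # Let other parsers (or Plug.Parsers itself) deal with everything else.
  def parse(conn, _type, _subtype, _params, _opts), do: {:next, conn}

  defp decode(body) do
    case Jason.decode(body) do
      {:ok, %{} = params} -> params
      # Non-object JSON is wrapped the same way Plug's JSON parser wraps it.
      {:ok, other} -> %{"_json" => other}
      {:error, error} -> raise Plug.Parsers.ParseError, exception: error
    end
  end
end
```

It would be registered on that one route's pipeline with something like `plug Plug.Parsers, parsers: [PlainTextJSONParser]`. The value of "Message" still arrives as a string and needs its own decode afterwards.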

If it is a bug in Jason, it’s something more complicated than just the embedded JSON - a simple example works:

iex(4)> s = "{\"foo\": \"bar\", \"baz\": \"{\\\"blam\\\": \\\"wat\\\"}\"}"
"{\"foo\": \"bar\", \"baz\": \"{\\\"blam\\\": \\\"wat\\\"}\"}"
iex(5)> IO.puts(s)
{"foo": "bar", "baz": "{\"blam\": \"wat\"}"}
:ok
iex(6)> Jason.decode(s)
{:ok, %{"baz" => "{\"blam\": \"wat\"}", "foo" => "bar"}}

(this is on Elixir 1.9.0 with Jason 1.1.2, because that’s what was handy)
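Following on from that, getting at "bouncedRecipients" should just be a second decode of the inner string once the outer body has been parsed. A minimal sketch, assuming the outer params are already a map (as Plug.Parsers would hand them to a controller); the module and function names are made up for illustration:

```elixir
# Hypothetical helper (illustrative names): takes the already-decoded outer
# SNS payload and decodes the JSON string stored under "Message".
defmodule BounceExample do
  def bounced_recipients(%{"Message" => message}) when is_binary(message) do
    case Jason.decode(message) do
      {:ok, %{"bounce" => %{"bouncedRecipients" => recipients}}} when is_list(recipients) ->
        {:ok, recipients}

      # Valid JSON, but not a bounce notification.
      {:ok, _other} ->
        {:ok, []}

      # The inner string was not valid JSON.
      {:error, %Jason.DecodeError{} = error} ->
        {:error, error}
    end
  end

  # No "Message" key at all.
  def bounced_recipients(_params), do: {:ok, []}
end

# Usage: build a properly escaped payload, decode the outer layer,
# then let the helper decode the inner one.
inner = %{
  "notificationType" => "Bounce",
  "bounce" => %{
    "bouncedRecipients" => [%{"emailAddress" => "bounce@simulator.amazonses.com"}]
  }
}

body = Jason.encode!(%{"Type" => "Notification", "Message" => Jason.encode!(inner)})
params = Jason.decode!(body)
IO.inspect(BounceExample.bounced_recipients(params))
```

Matching on the shape in the `case` keeps non-bounce notifications from raising, which matters if the same endpoint also receives other SNS message types.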


This is not a bug in Jason, as the snippet you posted is not valid JSON. Jason is working as it should, as it refuses to parse that.
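To make the difference concrete, here is a small comparison of the unescaped form from the question against a properly escaped one. It is only a sketch with a shortened inner payload, assuming Jason is available:

```elixir
# Unescaped, as displayed in the question: the quote before
# notificationType ends the string early, so the next byte is unexpected.
unescaped = ~s({"Message": "{"notificationType":"Bounce"}"})

# Properly escaped: built by encoding the inner object to a string first.
escaped = Jason.encode!(%{"Message" => Jason.encode!(%{"notificationType" => "Bounce"})})

IO.inspect(Jason.decode(unescaped))
IO.inspect(Jason.decode(escaped))
```

The first call returns an `{:error, %Jason.DecodeError{}}` tuple, and the second returns the outer map with "Message" still a string, ready for its own decode.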


Just curious, wouldn’t SQS instead of SNS, and Broadway alone instead of Phoenix, be more reliable for your use case?