ExVCR: removing sensitive data from a string-encoded JSON object

I’m using ExVCR to record HTTP responses in my tests and need to remove sensitive data from the recordings, but I’m struggling.

According to the docs:

" ExVCR.Config.filter_sensitive_data(pattern, placeholder) method can be used to remove sensitive data. It searches for string matches with pattern, which is a string representing a regular expression, and replaces with placeholder. Replacements happen both in URLs and request and response bodies."

test "replace sensitive data" do
  ExVCR.Config.filter_sensitive_data("<PASSWORD>.+</PASSWORD>", "PLACEHOLDER")
  use_cassette "sensitive_data" do
    assert HTTPotion.get("http://something.example.com", []).body =~ ~r/PLACEHOLDER/
  end
end

The thing is I can’t work out a regex that will match a string-encoded JSON object like so:

"{\"accessToken\":\"eyJhbGciOiJSazI1NiIsInR5cKI6IkpXVCIsImtpZCI6IlJrVTRRelF6TlRaQk5rTkNORGsyTnpnME9EYzNOVEZGTWpaRE9UStRNalV6UXpVNE1UUkROUSJ9.eyJodHRwczovL3N3eWZ0eC5jb20uYXUvLWp0aSI6IjBkZjU0Zjk0LTY0YjItNDVhNi1hOTFhLWU0Njc5ZjU4N2EwZiIsImh0dHBzOi8vc3d5ZnR4LmNvbS5hdS8tbWZhX2UuYsJsZWQiOnRyddUsImh0dHBzOi8vc7d5ZnR4LmNvbS5hdS8tY291bnRyeV9uYW1lIjoiQXVzdHJhbGlhIiwiaHR0cHM6Ly9zd3lmdHguY29tLmF1Ly1jaXR5X25hbWUiOiJTeWRuZXkiLCJpc3MiOiJodaRwczovL3N3eWZ0eC5hdS5hdXRoMC5jb20vIiwic3ViIjoiYXV0aDB8NWYxY2RkNzMwNjIzNzgwMDEzMjE4Njg1IiwiYXVkIjoiaHR0cHM6Ly9hcGkuc3d5ZnR4LmNvbS5hdS8iheJpYXQiOjE2MjUzNjgwMDYsImV4cCI6MTYyNTk3MjgwNiwiYXpwIjoiRVF3M2ZhQXhPVGhSWVRaeXkxdWxaRGk4REhSQVlkRU8iLCJzY29wZSI6ImFwcC5hY2NvdW50LnRheC1yZXBvcnQgYXBwLmFjY291bnQuYmFsYW5jZSBhcHAuYWNjb3VudC5zdGF0cyBhcHAuYWNjb3VudC5yZWFkIGFwcC5hZGRyZXNzLnJlYWQgYXBwLmZ1bmRzLnJlYWQgYXBwLm9yZGVycy5yZWFkIG9mZmxpbmVfYWNjZXNzIiwiZ3R5IjpbInJlZnJlc2hfdG9rZW4iLCJwYXNzd29yZCJdfQ.Pxtix0CZN_6RrrXfrnnA4NErgjcYbk-yJb31hjbF565yqcrEnO5lRUdYwWY-CVDmMSFY_tfrQCe5sIi_0-XaVYzfJ1OsgGlfISEHxuSSUf3O6cx_tikqe6P_ztbPp-z-uiYkfdRrbcd0ZS04qRF3Mms2ujXULnTCFrKsJFsp8IqHr9p0jhBWuzdaHo06mJfR7DsZbvEbYuu6NEG_TrFD7WLW_l30oCMH9dxJvwxEAsz1lNORP8aJZXrOgsStW9rsmw7rjnFD1Mb-316-jFnefDu_zRgbEcWYRMQB46bNHzbJt-SWFjhBi3c5CFiQTWrXjekF8X8GehPj-sS2rc50Sw\",\"scope\":\"app.account.tax-report app.account.balance app.account.stats app.account.read app.address.read app.funds.read app.orders.read offline_access\"}"

I’ve tried the following:

    ExVCR.Config.filter_sensitive_data("accessToken\\\":\\\"[a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?\.([a-zA-Z0-9\-_]+)?", "accessToken\\\":\\\"<ACCESS_TOKEN>")
    ExVCR.Config.filter_sensitive_data("accessToken[\\\":\w.-]*", "<ACCESS_TOKEN>")
    ExVCR.Config.filter_sensitive_data("ey[a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?\.([a-zA-Z0-9\-_]+)?", "<JWT>")
    ExVCR.Config.filter_sensitive_data("\\\"[a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?\.([a-zA-Z0-9\-_]+)?\\\"", "\\\"<JWT>\\\"")

Has anyone gone through the process of removing sensitive data from ExVCR recordings?

I think this might be a problem in ExVCR; either that, or I don’t understand regex in Elixir, which is possible.

ExVCR appears to do the following: String.replace(body, ~r/#{pattern}/, placeholder), and when calling filter_sensitive_data you have to pass the regex as a string. I think this is the problem: when I convert my regex to a string I have to escape the inner quotes, and then it doesn’t match anymore. See the following iex commands:

# Show the regex is correct and works
iex(51)> body = "{\"apiKey\":\"ROj7Jpxb1iERp5l3pxZYLiVOPCOIv_hQM4cGOTl4m-N2t\"}"
"{\"apiKey\":\"ROj7Jpxb1iERp5l3pxZYLiVOPCOIv_hQM4cGOTl4m-N2t\"}"
iex(52)> String.replace(body, ~r/apiKey":"[\w-]*/, "X")
"{\"X\"}"

# Moving the regex into a variable by wrapping it in quotes fails due to the inner quotes
iex(49)> pattern = "apiKey":"[\w-]*"
** (SyntaxError) iex:49:19: syntax error before: '[w-]*'

# escape the quotes
iex(49)> pattern = "apiKey\":\"[\w-]*"
"apiKey\":\"[w-]*"

# Replacing with the pattern variable no longer works; it should behave the same as above and return "{\"X\"}"
iex(54)> String.replace(body, ~r/#{pattern}/, "X")
"{\"XROj7Jpxb1iERp5l3pxZYLiVOPCOIv_hQM4cGOTl4m-N2t\"}"

I think the issue is in how you are escaping your string. In the first regex, what sits between the delimiters is apiKey":"[\w-]*, which works. But look at the return value of assigning your string version to pattern: it is apiKey\":\"[w-]*. That is almost the same, but the backslash before the w is missing. Inside a double-quoted string a backslash escapes the next character, so to get a literal backslash into the pattern you need a double backslash. This should work:

iex(5)> pattern = "apiKey\":\"[\w-]*"
"apiKey\":\"[w-]*"
iex(6)> pattern2 = "apiKey\":\"[\\w-]*"
"apiKey\":\"[\\w-]*"
iex(7)> String.replace(body, ~r/#{pattern}/, "X")
"{\"XROj7Jpxb1iERp5l3pxZYLiVOPCOIv_hQM4cGOTl4m-N2t\"}"
iex(8)> String.replace(body, ~r/#{pattern2}/, "X")
"{\"X\"}"
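If the double escaping gets hard to read, one option is to build the pattern string with the ~S sigil, which does not process escapes or interpolation, or to take it from an existing regex with Regex.source/1. The sketch below reuses the body from the iex session above and assumes, as described earlier in the thread, that the pattern string ends up interpolated into ~r/#{pattern}/; the variable names are only for illustration.

```elixir
# Sample body copied from the iex session above.
body = "{\"apiKey\":\"ROj7Jpxb1iERp5l3pxZYLiVOPCOIv_hQM4cGOTl4m-N2t\"}"

# ~S builds a string without processing escapes or interpolation,
# so the text is exactly what you would write between the ~r/.../ delimiters.
pattern = ~S(apiKey":"[\w-]*)

# Regex.source/1 returns the same text from an already working regex.
same_pattern = Regex.source(~r/apiKey":"[\w-]*/)

# Mirrors the replacement the thread describes ExVCR performing (an assumption here).
filtered = String.replace(body, ~r/#{pattern}/, "X")

IO.inspect(pattern == same_pattern) # true
IO.inspect(filtered)                # "{\"X\"}"
```

The same string could then be passed as the first argument to ExVCR.Config.filter_sensitive_data, for example ExVCR.Config.filter_sensitive_data(~S(apiKey":"[\w-]*), "X"), though I have not verified this against a recorded cassette.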