How to use AWS.Textract?

Hi community,

I am going to implement Textract in my project, but I’m having trouble using AWS.Textract (aws-elixir v0.13.1). I have tried to look into the function and I am still quite lost…

I haven’t found a solution, and I hope someone out there can give me a better example of how to use AWS.Textract (aws-elixir v0.13.1).

Thank you so much in advance :sob:

Best wishes,
Jing Hui P.

Hi @enkr1, can you explain which part confuses you? Can you show the code you have tried?

Hi @benwilson512,

This is what I tried:

file = File.read!("sample-1.png") # the same sample file I put at the project root

md5 =
  :crypto.hash(:md5, file)
  |> Base.encode64()
  |> IO.inspect(label: "md5")

AWS.Client.create("MY-AK","MY-SK","ap-southeast-1")
|> AWS.Textract.analyze_document(%{"Body" => file, "ContentMD5" => md5})

OK, and when you did that, what happened?

# output
md5: "DbRrxJFe+3N9d8tVLxJkYQ=="
** (Jason.EncodeError) invalid byte 0x89 in <<137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82, 0, 0, 6, 164, 0, 0, 8, 152, 8, 3, 0, 0, 0, 134, 137, 54, 129, 0, 0, 0, 9, 112, 72, 89, 115, 0, 0, 30, 194, 0, 0, 30, 194, 1, ...>>
    (jason 1.4.0) lib/jason.ex:164: Jason.encode!/2
    (aws 0.13.1) lib/aws/request.ex:41: AWS.Request.request_post/5
    iex:40: (file)

Sorry about that, I forgot to attach the output.
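A likely explanation for the Jason.EncodeError above: the stack trace shows aws-elixir calling Jason.encode!/2 from AWS.Request.request_post/5, so the whole input map is serialised to JSON, and raw PNG bytes are not valid UTF-8, which a JSON string has to be. The snippet below is a standard-library-only illustration (it makes no aws-elixir calls, and the variable names are illustrative) of why those bytes fail and why Base64-encoding them works.

```elixir
# The eight-byte PNG signature: the same bytes that open the binary in the
# Jason.EncodeError above (137 = 0x89, then "PNG", CR, LF, 0x1A, LF).
png_signature = <<137, 80, 78, 71, 13, 10, 26, 10>>

# 0x89 is a UTF-8 continuation byte, so it cannot start a character.
# The binary is therefore not a valid UTF-8 string.
false = String.valid?(png_signature)

# Base64 maps arbitrary bytes onto a 64-character ASCII alphabet, which is
# always valid UTF-8 and can be embedded in a JSON string.
encoded = Base.encode64(png_signature)
true = String.valid?(encoded)
"iVBORw0KGgo=" = encoded

# Decoding recovers the original bytes exactly.
{:ok, ^png_signature} = Base.decode64(encoded)
```

Note that "iVBORw0KGgo" is also how the Base64 string in the later FunctionClauseError output begins.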

I am also curious about something: I was trying to debug inside analyze_document like this:

but execution never even reaches that line… I can’t hit line 69 at all.

Are you editing this file in your deps folder? If so, make sure to run mix deps.compile aws afterward; changes to those files are not picked up automatically.

The AnalyzeDocument page of the Amazon Textract documentation describes the input shape, and I don’t see either ContentMD5 or Body among its parameters.
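As a sketch of that input shape: per the AnalyzeDocument API reference, Document is an object holding either Base64-encoded Bytes or an S3Object location (Bucket and Name), and FeatureTypes is a list such as ["TABLES"] or ["FORMS"]. The module below is hypothetical (TextractInput, from_bytes/2 and from_s3/3 are illustrative names, not part of aws-elixir) and uses only the standard library; the map it returns is what would go in the input argument of AWS.Textract.analyze_document.

```elixir
defmodule TextractInput do
  @moduledoc """
  Hypothetical helpers that build the input map for Textract's
  AnalyzeDocument action. Field names follow the AWS API reference;
  nothing here is part of aws-elixir itself.
  """

  # Inline document: the raw bytes are Base64-encoded because the
  # request body is JSON and cannot carry arbitrary binary data.
  def from_bytes(file_binary, feature_types \\ ["TABLES"]) when is_binary(file_binary) do
    %{
      "Document" => %{"Bytes" => Base.encode64(file_binary)},
      "FeatureTypes" => feature_types
    }
  end

  # Document already stored in S3: Textract reads it from the bucket,
  # so no file bytes are sent in the request.
  def from_s3(bucket, key, feature_types \\ ["TABLES"]) do
    %{
      "Document" => %{"S3Object" => %{"Bucket" => bucket, "Name" => key}},
      "FeatureTypes" => feature_types
    }
  end
end
```

With the sample file from this thread, the input would be TextractInput.from_bytes(File.read!("sample-1.png")), passed to AWS.Textract.analyze_document together with the client.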


I tried passing the file in as a Base64-encoded string:

file = File.read!("sample-1.png")

encoded_file_str =
  file
  |> Base.encode64()

AWS.Client.create(
  "MY-AK",
  "MY-SK",
  "ap-southeast-1"
)
|> AWS.Textract.analyze_document(encoded_file_str)

It throws this error:


** (FunctionClauseError) no function clause matching in AWS.Request.encode!/3    
    
    The following arguments were given to AWS.Request.encode!/3:
    
        # 1
        #AWS.Client<
          region: "ap-southeast-1",
          service: "textract",
          endpoint: nil,
          proto: "https",
          port: 443,
          http_client: {AWS.HTTPClient, []},
          json_module: {AWS.JSON, []},
          xml_module: {AWS.XML, []},
          ...
        >
    
        # 2
        "json"
    
        # 3
        "iVBORw0KGgoAAAANSUhEUgAABqQAAAiYCAMA...

    Attempted function clauses (showing 1 out of 1):
    
        defp encode!(%AWS.Client{} = client, protocol, payload) when protocol === "query" or protocol === "json" or protocol === "rest-json" or protocol === "rest-xml" and is_map(payload)
    
    (aws 0.13.1) lib/aws/request.ex:249: AWS.Request.encode!/3
    (aws 0.13.1) lib/aws/request.ex:41: AWS.Request.request_post/5
    iex:62: (file)

I ran mix deps.compile aws (I’m using {:aws, "~> 0.13.0"}), but the log still does not show up… :confused:

Edit: OMG, it only shows up after I restart IEx.

Update!

I managed to get some kind of readable response by using %{"Document" => encoded_file_str} as my input!

{:error,
 {:unexpected_response,
  %{
    body: "{\"__type\":\"SerializationException\",\"Message\":\"Expected null\"}",
    headers: [
      {"x-amzn-RequestId",
       "01daf971-d182-4478-b1ed-9ba6294f2e8d"},
      {"Content-Type", "application/x-amz-json-1.1"},
      {"Content-Length", "61"},
      {"Date", "Wed, 16 Nov 2022 06:42:39 GMT"},
      {"Connection", "close"}
    ],
    status_code: 400
  }}}
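The "Expected null" SerializationException most likely means Textract's deserializer received a plain string where it expected the Document object. A cheap local check can catch that mistake before any request is sent. The sketch below is hypothetical (TextractInputCheck and valid_document?/1 are illustrative names, not aws-elixir functions) and only mirrors the documented shape: a Document map holding exactly one of Bytes or S3Object.

```elixir
defmodule TextractInputCheck do
  # Hypothetical pre-flight check, not part of aws-elixir. It accepts a
  # "Document" map containing exactly one of "Bytes" or "S3Object".

  # Inline bytes (expected to be a Base64 string).
  def valid_document?(%{"Document" => %{"Bytes" => bytes} = doc})
      when is_binary(bytes) and map_size(doc) == 1,
      do: true

  # S3 location with at least a bucket and an object key.
  def valid_document?(%{"Document" => %{"S3Object" => %{"Bucket" => _, "Name" => _}} = doc})
      when map_size(doc) == 1,
      do: true

  # Anything else, including %{"Document" => "iVBORw0KGgo..."}, is rejected.
  def valid_document?(_input), do: false
end
```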

Managed to find the solution!!! :tada:

encoded_file_str = File.read!("sample-1.png") |> Base.encode64()

AWS.Client.create("XXX", "XXX", "ap-southeast-1")
|> AWS.Textract.analyze_document(%{
  "Document" => %{"Bytes" => encoded_file_str},
  "FeatureTypes" => ["TABLES"]
})
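Once the request succeeds, the text still has to be pulled out of the response. The sketch below assumes two things: that aws-elixir returns {:ok, body, http_response} on success with body already decoded from JSON, and the {:error, {:unexpected_response, response}} shape shown earlier in this thread. Textract's response carries a Blocks list whose entries have a BlockType (PAGE, LINE, WORD, TABLE, CELL and so on) and, for text blocks, a Text field. TextractResult and lines/1 are hypothetical names.

```elixir
defmodule TextractResult do
  # Hypothetical helper, not part of aws-elixir. Assumes the success shape
  # {:ok, body, http_response}, where body is the decoded JSON map.

  # Returns the text of every LINE block, in the order Textract returned them.
  # Blocks that are not LINEs (or have no "Text") are skipped by the pattern.
  def lines({:ok, %{"Blocks" => blocks}, _http_response}) when is_list(blocks) do
    texts = for %{"BlockType" => "LINE", "Text" => text} <- blocks, do: text
    {:ok, texts}
  end

  # A successful response without a Blocks list yields no lines.
  def lines({:ok, _body, _http_response}), do: {:ok, []}

  # Surfaces the HTTP status and raw error body (e.g. a SerializationException).
  def lines({:error, {:unexpected_response, %{status_code: status, body: body}}}) do
    {:error, "HTTP #{status}: #{body}"}
  end

  def lines({:error, reason}), do: {:error, inspect(reason)}
end
```

Piping the analyze_document call above into TextractResult.lines() would then give either {:ok, texts} with one string per detected line, or {:error, message} containing the HTTP status and raw error body.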

Thanks @benwilson512 for the hints!!!
