LiveBook Athena query is generating data in the AWS S3 bucket but the data is not returned to LiveBook

I am following the blog on setting up LiveBook smart cell to query Athena. The connection is coming back as :ok. I also see in the AWS output location a .csv file and a .metadata file are generated at the time the query runs. When I download the .csv file from the S3 bucket it contains the data that I expect to see as a result of the query. However, in LiveBook “result” is set to “no data”. This is version LiveBook 0.8.0.

What are the next troubleshooting steps?

Are you using the CLI or the Desktop app? The cli displays logging information in the terminal. The desktop app stores log files in ~/Library/Logs on macOS. I believe it’s called Livebook.log. Livebook is good about displaying most really problematic errors so I’m surprised nothing is being displayed. Usually everything I see in the terminal is shown on screen but there could be something hiding behind some nasty faults.

I am running LiveBook from the command line “livebook server” on PopOs! 20.04. There is no output in my terminal window to indicate something went wrong. I did have errors getting connected initially but it did display the errors so that I could react appropriately and correct the issues.

Yeah that’s strange. I would expect warnings or errors to show there. You should be able to click below the cell + then Code to have a new code cell below it. In that cell you should be able to type result or whatever the assign to sections are. If you type conn and evaluate the cell, you should see the data structure of the connection. For result it may try to display the same table or may output in a different format.

I admit I don’t know a lot about Athena to help too much but you should be able to use code cells to debug what’s happening. You may also be missing a Mix.install though I thought smart cells handled all that for you. The announcement post at How to query and visualize data from Amazon Athena using Livebook - Livebook.dev - The Livebook Blog doesn’t mention anything that remotely jumps out like that.

Did you run the following step from the notebook?

  Req.post!(conn,
    athena:
      {"""
       CREATE EXTERNAL TABLE IF NOT EXISTS stations  (
         station_id string, 
         latitude double, 
         longitude double, 
         elevation double,
         name string)
       ROW FORMAT SERDE 
         'org.apache.hadoop.hive.serde2.RegexSerDe' 
       WITH SERDEPROPERTIES ( 
         'input.regex'='([^ ]*) *([^ ]*) *([^ ]*) *([^ ]*) *(.+)$') 
       STORED AS INPUTFORMAT 
         'org.apache.hadoop.mapred.TextInputFormat' 
       OUTPUTFORMAT 
         'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
       LOCATION
         's3://livebook-blog/amazon-athena-integration'
       TBLPROPERTIES (
         'typeOfData'='file')
       """, []},
    cache_query: true
  ).body

I had originally skipped that step and went straight to querying my tables, which did work mostly as expected. Results were created in the output bucket they were just not returned back to LiveBook.

But when I do run the CREATE TABLE step that is listed in the blog article, that opens a whole new can of worms. The table is created but the data is bad (verified by running select statement from Athena). The data was not in columns.

I also have tried DROP TABLE from LiveBook and it says “table does not exits” then when I try to run the CREATE TABLE again from LiveBook it says “table already exists”.

Did this blog article work for you with no additional steps or modifications?

I see. Following the blog article worked for me. I’m actually the one who wrote the article, I work at Livebook. :slight_smile:

I also have tried DROP TABLE from LiveBook and it says “table does not exits” then when I try to run the CREATE TABLE again from LiveBook it says “table already exists”.

Just fixed a smart table creation query to check if the table exists before creating.

Please try to run the accompanying notebook. I just ran it and it worked for me.

1 Like

also when i try running the create table from the blog article exacts as written … i get this error when I try to query from Athena.

I assume the S3 bucket listed under “Location” is the data source and it isn’t public, maybe?

Okay running your livebook. I get this … I did fill in the WorkGroup in the connection settings since that was how my glue tasks created my tables.

That’s right, my bad. I just made the s3 bucket public for READ access. Please try the notebook again and let me know if it works.

I started with a fresh copy of the LiveBook that you linked to earlier. I also dropped the ‘stations’ table displayed in my Athena.

I think I found the root causes of the problem. Thanks for bringing this up. :blush:

One of the problems was with the permissions configuration of our S3 bucket, the s3://livebook-blog/amazon-athena-integration/.

The other potential bug is related to the cache query config in the SQL Query Smart Cell. We’ll investigate further that possible bug.

In the meantime, I modified the blog post and the accompanying notebook. It should work now. Please start from scratch, import the notebook again, and let me know if it works for you.

If it doesn’t work, one possible way to fix it is to disable the query cache config for now, like this:

So sorry, still a no go. On the bright side, the ‘stations’ table is being created mostly as expected. When I query the ‘stations’ table though it still does not return any results to LiveBook. Again on the bright side the S3 output destination is receiving good query results as expected the same as when I query my own tables.

The catch for me this time in generating the ‘stations’ table was the database was hard coded into in the “CREATE TABLE” statement. So I wasn’t seeing the table appear in my database as expected. But the data is appearing in columnar format as expected when queried from Athena.

CREATE EXTERNAL TABLE IF NOT EXISTS default.stations (

I still get the “No data” in LiveBook though when I run the ‘select’ statement.

I see, that’s weird.

I’m not being able to reproduce the problem here. I even asked a colleague to test on his machine, and it worked for him as well.

Can you please send me a message in Elixir Slack so we can try to investigate the problem together? My username there is “hugobarauna”.

Hi @michaelwa , as we talked, the problem was fixed. :+1:

We also released a new version of ReqAthena, v0.1.2, with the fix.

Thanks for reporting and for helping with the debugging process. :blush: