Single-table DynamoDB instead of relational databases?

In our org we mostly use DDB, and any new implementation that requires persistence should also go on DDB. After 5 years of using DDB I can't say (subjectively) that I'm missing anything or hitting any issues in DDB that an RDBMS would handle better. Or maybe it's already a matter of habit.
But I should admit our domain is pretty trivial - not much business logic, so at most I had 3 tables that I had to treat together. Data volume, however: we already have 23 billion records in one table and we need to query ~7-10 million records each hour. I'm not sure any RDBMS could sustain such a data pattern.
The real issue with DDB was the lack of distributed transactions, but since this feature was released a year ago our life has become much brighter.
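For anyone who hasn't touched them yet, a transactional write in DDB looks roughly like this with boto3 (the table, key and attribute names here are made up for illustration):

```python
import boto3

# Low-level client; transact_write_items either applies all actions or none of them.
client = boto3.client("dynamodb", region_name="us-east-1")

client.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "events",  # made-up table name
                "Item": {
                    "pk": {"S": "resource#123"},
                    "sk": {"S": "event#2019-06-01T10:00:00Z"},
                    "state": {"S": "active"},
                },
            }
        },
        {
            "Update": {
                "TableName": "snapshots",  # made-up table name, simple partition key assumed
                "Key": {"pk": {"S": "resource#123"}},
                "UpdateExpression": "SET #s = :s",
                "ExpressionAttributeNames": {"#s": "state"},  # 'state' is a DynamoDB reserved word
                "ExpressionAttributeValues": {":s": {"S": "active"}},
            }
        },
    ]
)
```

A `ConditionCheck` element can also be added to the same call so the whole batch fails if some other item isn't in the expected state.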
I also personally found that schema-less, de-normalized and unstructured data is more flexible and dynamic. I'd rather point different processing functions at the same data storage (table) than encode any semantics into the data itself (imo).
Query-wise we have a couple of strategies:

  • we introduce an index (GSI) over a dimension (field) when we need to query by it
  • or table(s) with sort keys that mimic foreign keys to a ‘parent’ table; the query logic is then encoded in the application itself
  • also you have DDB Streams, which can be used for data gathering (all three are sketched below)
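
To make those three strategies concrete, here is roughly what they look like with boto3 (the table name, index name and key/attribute names are all made up for illustration):

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("events")  # made-up table name

# 1. GSI over another dimension: query by a non-key attribute through the index.
by_state = table.query(
    IndexName="status-index",  # made-up index name
    KeyConditionExpression=Key("status").eq("active"),
)

# 2. Sort key mimicking a foreign key: fetch all children of a 'parent' item,
#    with the relationship logic living in the application, not the schema.
children = table.query(
    KeyConditionExpression=Key("pk").eq("parent#42") & Key("sk").begins_with("child#")
)

# 3. DDB Streams: a Lambda handler receiving change records for data gathering
#    (assumes the stream is configured to include NEW_IMAGE).
def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            print(record["dynamodb"]["NewImage"])
```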

I'd also like to say that, for me, it was beneficial to wean my brain off the relational model as the only way to represent data and to pick up at least some alternatives.

Jeremy, there’s now adaptive capacity to handle hotspots: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html#bp-partition-key-partitions-adaptive

Interesting! Can you elaborate on your use case a little more? Also, approximately how much does all of that cost?

The use case is a time-series event processing system that builds materialized views over a timeline, plus snapshots of any abstract resource that can have a binary state like yes/no, running/not-running, active/inactive and so on. The end result is snapshot(s) (vertical time cuts) of all “active” resources on some discrete basis like every 10 minutes/1 hour/1 day, or a historical materialized view of resource changes over some period, let’s say 1 month.
The number of resources may be a few tens of billions, and any of these resources can go from active to inactive and back.
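
Roughly, a single-table layout for such snapshots can look like this (a minimal sketch with made-up table and key names, not necessarily our exact schema):

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical layout: partition key = time bucket, sort key = resource id,
# one item per resource that was active in that bucket.
table = boto3.resource("dynamodb").Table("snapshots")  # made-up table name

table.put_item(Item={
    "pk": "snapshot#2019-06-01T10:00",   # 10-minute bucket
    "sk": "resource#000000000123",
    "state": "active",
})

# Reading one "vertical time cut" is then a single-partition query
# (paginated, since a bucket can hold many resources):
cut = table.query(KeyConditionExpression=Key("pk").eq("snapshot#2019-06-01T10:00"))
```

At tens of billions of resources a single time-bucket partition would run hot, so in practice the bucket key would likely carry a shard suffix on top of the adaptive capacity mentioned above.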

The cost of DDB is big, however.

And yet Amazon can’t get off Oracle: with all their “amazing”, “scalable” cloud solutions such as DynamoDB, they are unable to match the performance of Oracle.

What do you mean by this? :slight_smile:

For the most loaded amazon.com services, Amazon is running on Oracle. They have tried to switch to in-house products multiple times but failed due to performance issues. Larry Ellison keeps making fun of them for this reason.

It was Redshift (based on PostgreSQL 8.0.2), not DynamoDB.

Oracle’s Ellison: No way a ‘normal’ person would move to AWS (2018-Dec-18).

Meanwhile Oracle’s new Java SE subs: Code and support for $25/processor/month (OpenJDK is still available free) - so a ‘normal’ person would still have to ask themselves whether or not all of this is worth it in the end - evidently AWS came to the conclusion that it wasn’t.

Missed that :slight_smile: I guess they were finally able to :slight_smile: