You are going to hit very wide rows with this. I would advise against your table format, and I'd go with something that allows you to control how wide the rows get.
Depending on your query requirements, here is a schema I consider more suitable (IMHO):
CREATE TABLE keyspace.measurement (
    parameterId int,
    granularity timestamp,
    value bigint,
    measureTime timestamp,
    PRIMARY KEY ((parameterId, granularity), measureTime)
) WITH CLUSTERING ORDER BY (measureTime DESC);
This is very similar to yours, however it has a major advantage: you can configure the wideness of your rows, and you don't have any hotspots. The idea is dead simple: the parameterId and granularity fields together make the partition key, so they determine where your data will go, while measureTime keeps your data ordered. Supposing you want to query on a day-by-day basis, you'd store into granularity the yyyy-mm-dd value of your measureTime, grouping together all the measures of the same day.
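To make the bucketing concrete, here is a minimal sketch of an insert under this scheme (the parameter id, value and timestamps are made-up sample data): the application truncates measureTime to the day and writes that into granularity.

-- Day-by-day bucketing: granularity is measureTime truncated to midnight,
-- so every measure of the same parameter and the same day lands in one partition.
INSERT INTO keyspace.measurement (parameterId, granularity, value, measureTime)
VALUES (42, '2016-01-19', 1337, '2016-01-19 14:30:05');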
This allows you to retrieve all the values lying on the same partition (so for a given parameterId and granularity pair) with an efficient range query. In a day-by-day configuration, assuming one measure per second, you'd end up with 86400 records per partition. This number could still be high (the suggested limit is 10k IIRC), and you can lower that value by going to hour-by-hour grouping, storing a yyyy-mm-dd HH:00 value instead.
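As a sketch of such a range query (again with made-up ids and dates), you can grab the whole day, or just a time window inside it, while always hitting a single partition:

-- All measures of one parameter for one day (single partition):
SELECT measureTime, value
FROM keyspace.measurement
WHERE parameterId = 42
  AND granularity = '2016-01-19';

-- Or only a window inside that day, still a single-partition range scan:
SELECT measureTime, value
FROM keyspace.measurement
WHERE parameterId = 42
  AND granularity = '2016-01-19'
  AND measureTime >= '2016-01-19 08:00:00'
  AND measureTime < '2016-01-19 12:00:00';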
The drawback of this approach is that if you need data spanning multiple partitions (e.g. you are grouping on a day-by-day basis but need data for two consecutive days, say the last 6 hours of Jan 19th and the first 6 hours of Jan 20th), then you'll need to perform multiple queries.
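For that cross-day case, a minimal sketch (same made-up ids and dates as above) is simply two single-partition queries, one per day bucket, merged client-side:

-- Last 6 hours of Jan 19th (first partition)
SELECT measureTime, value
FROM keyspace.measurement
WHERE parameterId = 42
  AND granularity = '2016-01-19'
  AND measureTime >= '2016-01-19 18:00:00';

-- First 6 hours of Jan 20th (second partition)
SELECT measureTime, value
FROM keyspace.measurement
WHERE parameterId = 42
  AND granularity = '2016-01-20'
  AND measureTime < '2016-01-20 06:00:00';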