AWS Timestream time series database for IoT

  • October 7, 2020
  • Steve Rogerson

Amazon Web Services has introduced Timestream, a time series database for IoT and operational applications. AWS says it can scale to process trillions of time series events per day up to 1000 times faster than relational databases, and at as little as a tenth of the cost.

Timestream saves effort and expense by keeping recent data in memory and moving historical data to a cost-optimised storage tier based on user-defined policies. Its query processing lets users access and combine recent and historical data across tiers with a single query, without specifying whether the data reside in the in-memory or cost-optimised tier.

Its analytics features provide time series-specific functions to help identify trends and patterns in data in near real time. Because Timestream is serverless, it scales up or down automatically to match load, without users needing to manage the underlying infrastructure. There are no upfront costs or commitments, and users pay only for the data they write, store or query.

Today’s users want to build IoT, edge and operational applications that collect, synthesise and derive insights from enormous amounts of data that change over time, known as time series data. For example, manufacturers might want to track IoT sensor data that measure changes in equipment across a facility, online marketers might want to analyse clickstream data that capture how a user navigates a web site over time, and data centre operators might want to view data that measure changes in infrastructure performance metrics.

These time series data can be generated from multiple sources in extremely high volumes, need to be cost-effectively collected in near real time, and require efficient storage that helps organise and analyse the data.

To do this today, users can either use existing relational databases or self-managed time series databases. Neither of these options is attractive. Relational databases have rigid schemas that need to be predefined and are inflexible if new attributes of an application need to be tracked. For example, when new devices come online and start emitting time series data, rigid schemas mean that users either have to discard the new data or redesign their tables to support the new devices, which can be costly and time-consuming.

Beyond rigid schemas, relational databases require multiple tables and indexes that must be updated as new data arrive, which leads to complex and inefficient queries as the data grow over time. They also lack time series analytical functions such as smoothing, approximation and interpolation that help identify trends and patterns in near real time.

Alternatively, time series databases that users build and manage themselves have limited data processing and storage capacity, making them difficult to scale. Many existing time series databases fail to support data retention policies, creating storage complexity as data grow over time. To access the data, users must build custom query engines and tools, which are difficult to configure and maintain, and can require complicated, multi-year engineering initiatives. Furthermore, these self-managed databases do not integrate with the data collection, visualisation and machine-learning tools already in use. The result is that many users simply do not save or analyse time series data, missing out on the valuable insights they can provide.

Timestream addresses this by providing a purpose-built, serverless time series database for collecting, storing and processing time series data. It automatically detects the attributes of the data, so there is no longer a need to predefine a schema. Timestream simplifies the complex process of data lifecycle management with automated storage tiering that stores recent data in memory and automatically moves historical data to a cost-optimised storage tier based on predefined user policies.
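
As a rough illustration of what schema-free writes and tiering policies could look like from an application, the Python sketch below builds two records and a retention setting without contacting AWS. The field names follow the Timestream write API as we understand it. The database, table, device and dimension names are hypothetical, and the retention periods are arbitrary example values.

```python
def build_record(dimensions, measure_name, value, timestamp_ms):
    """Build one record in the shape we understand WriteRecords to accept."""
    return {
        # Dimensions are free-form name/value pairs; no table schema is declared.
        "Dimensions": [{"Name": k, "Value": str(v)} for k, v in dimensions.items()],
        "MeasureName": measure_name,
        "MeasureValue": str(value),
        "MeasureValueType": "DOUBLE",
        "Time": str(timestamp_ms),
        "TimeUnit": "MILLISECONDS",
    }


def build_retention(memory_hours, magnetic_days):
    """User-defined policy: how long data stay in each storage tier."""
    return {
        "MemoryStoreRetentionPeriodInHours": memory_hours,
        "MagneticStoreRetentionPeriodInDays": magnetic_days,
    }


# An existing device and a newer one that reports an extra attribute
# (firmware) and a different measure. Both fit the same table.
pump = build_record({"site": "plant-1", "device_id": "pump-7"},
                    "temperature", 71.5, 1602028800000)
fan = build_record({"site": "plant-1", "device_id": "fan-2", "firmware": "2.1"},
                   "rpm", 1200, 1602028800000)
retention = build_retention(memory_hours=24, magnetic_days=365)

# With credentials configured, the payloads could be sent roughly like this:
# import boto3
# client = boto3.client("timestream-write")
# client.create_table(DatabaseName="iot", TableName="sensors",
#                     RetentionProperties=retention)
# client.write_records(DatabaseName="iot", TableName="sensors",
#                      Records=[pump, fan])
```

The second record carries a dimension the first does not, which is the case that would force a table redesign in a relational schema.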

It also uses a purpose-built adaptive query engine to access and combine recent and historical data across tiers transparently with a single SQL statement, without having to specify which storage tier houses the data. This lets users query all of their data using a single query without requiring them to write complicated application logic that looks up where their data are stored, queries each tier independently, and then combines the results into a complete view.
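
To make the single-statement idea concrete, here is a minimal Python sketch that only assembles a query string. The SQL uses functions we believe Timestream's query language offers (`bin`, `ago`, `now`). The database, table, column and measure names are hypothetical, and the 30-day window is an assumption chosen so that the range would span both recent and historical data.

```python
def build_hourly_average_query(database, table, measure, lookback):
    """Return one SQL statement covering the whole lookback window.

    Nothing in the statement names a storage tier; only a time range is given.
    Inputs are interpolated directly, so they must come from trusted code.
    """
    return (
        f'SELECT device_id, bin(time, 1h) AS hour, '
        f'avg(measure_value::double) AS avg_value '
        f'FROM "{database}"."{table}" '
        f"WHERE measure_name = '{measure}' "
        f'AND time BETWEEN ago({lookback}) AND now() '
        f'GROUP BY device_id, bin(time, 1h) '
        f'ORDER BY hour'
    )


query = build_hourly_average_query("iot", "sensors", "temperature", "30d")
print(query)

# With credentials configured, the statement could be run roughly like this:
# import boto3
# result = boto3.client("timestream-query").query(QueryString=query)
```

The application supplies only the time window. Deciding which tier holds each part of that window is left to the service.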

The built-in time series analytics has functions for smoothing, approximation and interpolation, so users don’t have to extract raw data from their databases and then perform their time series analytics with external tools and libraries or write complex stored procedures that not all databases support.
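
For readers unfamiliar with these operations, the short Python sketch below shows what linear interpolation and a simple moving-average smoother compute. It is a client-side illustration with made-up readings, not Timestream's implementation, and the function names are our own.

```python
def linear_interpolate(points, t):
    """Estimate the value at time t from (time, value) pairs.

    Assumes points are sorted by strictly increasing time.
    """
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the observed range")


def moving_average(values, window):
    """Smooth a series by averaging each value with up to window-1 predecessors."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# A sensor that missed its reading at t=20.
readings = [(0, 10.0), (10, 20.0), (30, 40.0)]
filled = linear_interpolate(readings, 20)

smoothed = moving_average([10.0, 20.0, 30.0, 40.0], window=2)
```

Running such functions inside the database avoids the step this kind of code implies: pulling raw data out to the client first.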

The serverless architecture is built with fully decoupled data ingestion and query processing systems, giving virtually infinite scale and the ability to grow storage and query processing independently and automatically, without requiring users to manage the underlying infrastructure.

In addition, Timestream integrates with popular data collection, visualisation and machine-learning tools. These include AWS IoT Core for IoT data collection, Amazon Kinesis and Amazon MSK for streaming data, Amazon QuickSight for serverless business intelligence, and Amazon SageMaker for building, training and deploying machine-learning models quickly. It also works with open source, third-party tools such as Grafana for observability dashboards and Telegraf for metrics collection.

“What we hear from customers is that they have a lot of insightful data buried in their industrial equipment, web site clickstream logs, data centre infrastructure and many other places, but managing time series data at scale is too complex, expensive and slow,” said Shawn Bice, vice president at AWS. “Solving this problem required us to build something entirely new. Amazon Timestream provides a serverless database service that is purpose-built to manage the scale and complexity of time series data in the cloud, so customers can store more data more easily and cost effectively, giving them the ability to derive additional insights and drive better business decisions from their IoT and operational monitoring applications.”

One early user is Autodesk, which specialises in software for the architecture, engineering, construction, media and entertainment, and manufacturing industries.

“At Autodesk, we make software for people who make things,” said Scott Reese, senior vice president at Autodesk. “This includes everything from buildings, bridges, roads, cars, medical devices and consumer electronics, to the movies and video games that we all know and love. We see that Amazon Timestream has the potential to help deliver new workflows by providing a cloud-hosted, scalable time series database. We anticipate that this will improve product performance and reduce waste in manufacturing. The key differentiator that excites us is the promise that this value will come without adding a data management burden for the customers nor Autodesk.”

Trimble provides productivity technology for the construction, resources, geospatial and transportation industries.

“Whenever possible, we leverage AWS’s managed service offerings,” said David Kohler, engineering director at Trimble. “We are excited to now use Amazon Timestream as a serverless time series database supporting our IoT monitoring. Timestream is purpose-built for our IoT-generated time series data, and will allow us to reduce management overhead, improve performance and reduce costs of our existing monitoring system.”

With more than 60 years of fashion retailing experience, River Island is one of the best-known and best-loved brands, with more than 350 stores across Europe, Asia and the Middle East, and six dedicated online sites operating in four currencies.

“The cloud engineering team have been excited about the release of Amazon Timestream for some time,” said Tonino Greco, head of cloud and infrastructure at River Island. “We’ve struggled to find a time series data store that is simple, easy and affordable. With Amazon Timestream we get that and more. Timestream will enable us to build a central monitoring capability across all of our heritage systems, as well as our AWS hosted microservices. Interesting times!”