Post Stream Widget

1 - How to use it and build a request

Recipe

You can find here a recipe with a hello-world example which will:

  • Show you how to connect to the API
  • Show you how to use the streams API:
    • Upload an example dataset as a stream
    • Get the stream definition schema
  • Show you how to (basically) use the Query API on this example dataset

Summary

It's a request builder for retrieving data from a stream.

Supports multiple "modes":

  • listing all records matching given criteria
  • performing ad-hoc aggregations over records
  • retrieving records at a specific time that are distinct on a set of fields (distinct_on)

This reuses all the principles of a SQL query and can be seen as a (very) simplified layer on top of the data. A basic understanding of the main SQL concepts will help you understand its capabilities, especially since we'll use SQL as a guideline to explain the different principles.

Dataset & queries examples

This documentation also comes with a set of examples, each of which contains:

  • A raw data example of the stream we are querying
  • A sentence explaining what we're trying to achieve with this query
  • The full payload of the sentence
  • An extract of the response sent by the API :
    • schema of the columns response
    • the first row result

You can find it HERE

API usage

Use stream definition

The API already exposes the stream definition of the columns on several endpoints ( GET zone/{zone_id}/streams/, GET zone/{zone_id}/streams/{stream_id} ). This stream definition gives us information about the data present in the stream:

  • the column name
  • the min/max of the column
  • the number of rows
  • the data type

Here is a stream definition example

"stream_fields": {
    "additionalProp1": {
      "index": "primary",
      "extents": {
        "min": 2345,
        "max": 6789
      },
      "count": 0,
      "data_type": "number"
    },
    ...
  },

With this stream definition, you have the information necessary to build a request that answers the questions you have about the data.
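Since only number columns accept sum/average-style aggregations, a small helper can pick aggregation candidates out of the definition. A minimal sketch; the "start_name" entry is made up for illustration, mirroring the placeholder structure above:

```python
# Minimal sketch: inspect a stream definition (as returned by
# GET zone/{zone_id}/streams/{stream_id}) to find numeric columns,
# which are the only ones eligible for sum/average aggregations.
stream_fields = {
    "additionalProp1": {
        "index": "primary",
        "extents": {"min": 2345, "max": 6789},
        "count": 0,
        "data_type": "number",
    },
    # hypothetical string column, for contrast
    "start_name": {"data_type": "string"},
}

def numeric_fields(fields: dict) -> list:
    """Return the names of columns whose data_type allows sum/average."""
    return [name for name, meta in fields.items()
            if meta.get("data_type") == "number"]

print(numeric_fields(stream_fields))  # ['additionalProp1']
```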

The query request payload explanation

A payload for the query endpoint ( POST zone/{zone_id}/streams/query ) is structured like this:

{
  "fields": [],
  "aggregation": {
    "time": {
      "resolution": "minute",
      "field": "string",
      "label": "string"
    },
    "operations": []
  },
  "computed_fields": [],
  "distinct_on": {
    "fields": [],
    "order_by": [],
    "where": []
  },
  "joins": [],
  "where": [],
  "order_by": [],
  "limit": 50000,
  "sample": null
}
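To send such a payload, any HTTP client works. Here is a minimal sketch using Python's standard library; the base URL, zone id, and token are placeholders to replace with your own deployment values:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: your API host
ZONE_ID = "my-zone"                   # assumption: your zone id
TOKEN = "YOUR_OAUTH2_TOKEN"           # your OAuth2 access token

# A small query payload: two fields, one where condition, a row limit.
payload = {
    "fields": ["start_name", "end_name"],
    "where": [
        {
            "field": "stream_primary_key",
            "condition": "equals",
            "value": "1111111-b0ff-4610-986f-bf3a25154971",
        }
    ],
    "limit": 10,
}

def post_query(body: dict) -> dict:
    """POST the payload to zone/{zone_id}/streams/query and decode JSON."""
    req = urllib.request.Request(
        f"{BASE_URL}/zone/{ZONE_ID}/streams/query",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# post_query(payload)  # uncomment once BASE_URL/ZONE_ID/TOKEN are set
```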

Let's go through the use case for each field.

  • fields

    • It is the equivalent of a simple select in SQL: it chooses the fields to return.

    • If none of the attributes fields, aggregation or computed_fields is provided in the request payload, all fields in the stream will be selected by default

          {
              "fields": [
                  "start_name",
                  "end_name",
                  "geography"
              ],
              ...
          }
    • SQL translation :

      select start_name,end_name,geography from stream_records
  • aggregation

    • It's also a select in SQL, but it is used to compute an aggregation on one specific field: sum, average, count ...

    • Specificities:

      • count does not need a field key associated with it; it will count all rows
      • All the other aggregations can only be performed on fields that have a number type (we obviously can't compute an average or a sum on a string)
    • There's no need to supply the same value again in the fields key; it will be merged into a single select

    • If you want to order by an aggregated field, you'll have to provide the label property, which is then referenced in the order_by field.

    • It can contain :

      • A list of operations that will be applied on fields :
          {
              "aggregation": {
                  "operations": [
                      {
                          "field": "od_count",
                          "type": "sum"
                      }
                  ]
              }
              ...
          },
        - SQL translation :
        ```sql 
        select sum(od_count) from stream_records
        ```
      • the time/date field to group the values by, with the possibility to change the resolution of the selected date field
          {
              "aggregation": {
                  "time": {
                      "resolution": "month",
                      "field": "OCCURRED_ON_DATE"
                  },
                  "operations": [
                      {
                          "type": "count"
                      }
                  ]
              },
              ...
          }
        - SQL translation :
        ```sql 
        select DATE_TRUNC(MONTH, OCCURRED_ON_DATE)::TIMESTAMP_NTZ as "time", count(*) from stream_records group by "time"
        ```
  • computed_fields

    • It's also a select in SQL, but it's used to perform operations between fields or values. Here we don't aggregate fields with an aggregation operation; we simply compute between two fields (whereas an aggregation operates on only one field).
    • There's no need to supply the same value again in the fields key; it will be merged into a single select
    • It can be quite powerful because it can be used in a nested way: you can build any nested computation you want, and the parser uses recursion to interpret the final operation correctly
    • If you want to order by a computed field, you'll have to provide the label property, which is then referenced in the order_by field.
          {
              "computed_fields": [
                  {
                      "fields": [
                          {
                              "fields" : [
                                  {
                                      "fields": [
                                          "end_time",
                                          "start_time"
                                      ],
                                      "type": "minus"
                                  },
                                  {
                                      "fields": [
                                          10,
                                          100
                                      ],
                                      "type": "multiply"
                                  }
                              ],
                              "type": "divide"
                          },
                          99
                      ],
                      "type": "plus",
                      "label": "trip_duration_recomputed"
                  },
                  ...
          }
      • SQL translation :
      select ((end_time - start_time) / (10 * 100)) + 99 as "trip_duration_recomputed" from stream_records
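The recursion mentioned above can be illustrated by evaluating the nested structure against a single row in Python. This is only a sketch of the semantics, not the server's parser; the row values are made up:

```python
# Map the payload's operation types to Python arithmetic.
OPS = {
    "plus": lambda a, b: a + b,
    "minus": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def evaluate(node, row):
    """Resolve a computed_fields node: a nested operation dict,
    a column name (string), or a numeric literal."""
    if isinstance(node, dict):
        left, right = (evaluate(f, row) for f in node["fields"])
        return OPS[node["type"]](left, right)
    if isinstance(node, str):
        return row[node]  # column reference
    return node           # numeric literal

# The trip_duration_recomputed expression from the example above.
expr = {
    "fields": [
        {
            "fields": [
                {"fields": ["end_time", "start_time"], "type": "minus"},
                {"fields": [10, 100], "type": "multiply"},
            ],
            "type": "divide",
        },
        99,
    ],
    "type": "plus",
}

row = {"start_time": 1000, "end_time": 3000}
print(evaluate(expr, row))  # ((3000 - 1000) / (10 * 100)) + 99 = 101.0
```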
  • where

    • Used to filter the rows based on field values. Each entry compares one field (multiple operators available) with a value or with another field
    • **WARNING: it's mandatory to provide at least one where condition containing the stream_primary_key, to filter on it**
    {
        "where": [
            {
                "field": "stream_primary_key",
                "condition": "equals",
                "value": "1111111-b0ff-4610-986f-bf3a25154971"
            },
            {
                "field": "time",
                "condition": "greater_than",
                "value": "2022-03-20"
            },
            {
                "field": "start_date",
                "condition": "equals",
                "value": {
                    "source": "end_date"
                }
            }
        ],
        ...
    }
      - SQL Translation
      ```sql 
      from stream_records
      where 
          stream_primary_key = '1111111-b0ff-4610-986f-bf3a25154971' 
          and 
          time > '2022-03-20'
          and
          start_date = end_date -- e.g. keep rows where the event started and ended on the same day
      ```
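The where semantics, including the {"source": ...} form that compares two columns, can be sketched in Python. Only the two operators used above are implemented here, purely for illustration; the sample row is made up:

```python
def matches(row: dict, conditions: list) -> bool:
    """Return True if the row satisfies every where condition.
    Only 'equals' and 'greater_than' are sketched here."""
    for cond in conditions:
        left = row[cond["field"]]
        value = cond["value"]
        # {"source": "end_date"} means "compare against another column"
        right = row[value["source"]] if isinstance(value, dict) else value
        if cond["condition"] == "equals" and not left == right:
            return False
        if cond["condition"] == "greater_than" and not left > right:
            return False
    return True

row = {"time": "2022-04-01",
       "start_date": "2022-04-01", "end_date": "2022-04-01"}
where = [
    {"field": "time", "condition": "greater_than", "value": "2022-03-20"},
    {"field": "start_date", "condition": "equals",
     "value": {"source": "end_date"}},
]
print(matches(row, where))  # True
```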
  • order_by

    • The fields to order the results by. It can be cumulative (order first by field x, then y, then ...)
    • You can also choose the direction: from highest value to lowest (ascending: false) or from lowest value to highest (ascending: true)
    • You can order by a simple field
      {
          "order_by": [
              {
                  "field": "index",
                  "ascending": false
              }
          ],
          ...
      }
      • SQL translation :
      order by index desc 
    • And you can order by a computed/aggregated field, but the field will need to have a label associated with it.
      {
          "order_by": [
              {
                  "label": "computed_field",
                  "ascending": false
              }
          ],
          ...
      }
      • SQL translation :
      order by computed_field desc
  • limit

    • Maximum number of rows that will be returned in the output
        {
            "limit": 10,
            ...
        }

    SQL translation :
    ```sql
    LIMIT 10
    ```

  • sample

    • The query will run on n rows chosen randomly from the stream
    • Use with caution: only a sample of the raw data is used for the computation, so the query results represent only a subset of the truth. Use it when you don't need absolute precision.
        {
            "sample": 500
            ...
        }

    SQL translation :
    ```sql
    FROM stream_records SAMPLE (500 ROWS)
    ```

  • distinct_on

    • You can select a set of fields that must be unique in the output, and decide how the kept row is chosen by reusing 2 concepts already seen:
      • order_by
      • where
    • In the example below, you will:
      • select the rows in the dataset with a unique vehicle id (the same vehicle id won't appear twice)
      • choose which row should appear per vehicle id: the most recent one in the dataset, via the order_by clause
      • filter out rows whose event_type is removed, via the where clause
    • Concept explained here
        {
            "distinct_on": {
                "fields": [
                    "vehicle_id"
                ],
                "order_by": [
                    {
                        "ascending": false,
                        "field": "time"
                    }
                ],
                "where": [
                    {
                        "condition": "not_equals",
                        "field": "event_type",
                        "value": "removed"
                    }
                ]
            },
            ...
        }
      - SQL translation :
        ```sql
        WHERE event_type != 'removed'
        QUALIFY
            ROW_NUMBER() OVER (
                PARTITION BY
                    vehicle_id
                ORDER BY
                    time desc
            ) = 1
        ```
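The distinct_on semantics mirror that translation: filter the rows, order them, then keep the first row per distinct value. A small Python sketch of this behaviour, with made-up sample rows:

```python
def distinct_on(rows, field, order_field, ascending=False):
    """Keep one row per distinct value of `field`, choosing the first
    row in the requested order (like ROW_NUMBER() ... = 1)."""
    kept = {}
    ordered = sorted(rows, key=lambda r: r[order_field],
                     reverse=not ascending)
    for row in ordered:
        kept.setdefault(row[field], row)  # first seen wins per key
    return list(kept.values())

rows = [
    {"vehicle_id": "v1", "time": 1, "event_type": "created"},
    {"vehicle_id": "v1", "time": 3, "event_type": "moved"},
    {"vehicle_id": "v2", "time": 2, "event_type": "removed"},
]
# where: event_type != 'removed'
filtered = [r for r in rows if r["event_type"] != "removed"]
# order_by: time descending -> keep the most recent row per vehicle
latest = distinct_on(filtered, "vehicle_id", "time", ascending=False)
print(latest)  # [{'vehicle_id': 'v1', 'time': 3, 'event_type': 'moved'}]
```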
  • joins

    • Enables you to join your main source of data with other streams by joining 2 columns together. You can also filter on values in the dataset being joined.
    • As in the where param, it's mandatory to set the stream_primary_key of the stream you want to join on.
    • You can join as many streams as you want, but be careful: the query may become slower
    • join_on is a list of where conditions. You use these conditions to choose which fields are joined together, and you refer to the main stream with the source param.
    • The label field names the second stream; we'll use it to refer to fields owned by the second stream, as we'll see in the next example
        {
            "joins": [
            {
                "join_on": [
                    {
                        "field": "stream_primary_key",
                        "condition": "equals",
                        "value": "5eee21fc-aac8-43f3-8ac0-7d2d19b9ccdf"
                    },
                    {
                        "field": "name",
                        "condition": "equals",
                        "value": {
                            "source": "start_name"
                        }
                    },
                    {
                        "field": "tag",
                        "condition": "in_",
                        "value": [
                            "district"
                        ]
                    }
                ],
                "label": "districts"
            },
            ...
        }
      - SQL translation :
      ```sql
      -- joined_stream stands for the table backing the second stream
      FROM stream_records
      INNER JOIN joined_stream AS districts
          ON districts.stream_primary_key = '5eee21fc-aac8-43f3-8ac0-7d2d19b9ccdf'
          AND stream_records.start_name = districts.name
          AND districts.tag IN ('district')
      ```
    • To select a field owned by the second stream as a column of the output results, you'll have to use a specific syntax reusing the label you set up (you can also reuse this syntax in order_by/where/aggregation/computed_fields) :
    {   
        "fields": [
            "start_name",
            "end_name",
            [
                "join",
                "districts",
                "geography"
            ]
        ],
        ... ,
        "order_by" : [
            [
                "join",
                "districts",
                "name"
            ]
        ]
    }
      - SQL translation
      ```sql
      SELECT start_name, end_name, districts.geography
      ORDER BY districts.name
      ```