Step 3.2: Framework time: BUCKETIZE

Five frameworks are available and they all have their specific usage and important place in Time-Series analytics:

Let’s continue with the BUCKETIZE framework. Thanks to it, you will be able to apply some downsampling on the NASA lightcurve data.

The BUCKETIZE framework

In this tutorial we will discover the BUCKETIZE framework. It provides the tooling to put a time series data into regularly spaced buckets. A bucket corresponds to a time interval. We will use this bucket to downsample our data.

Downsampling some time series consists to reduce locally the number of points and to synchronize different series. There is several ways to process, one of the common one is to create small regular bucket into each series and to compute a value that resume this bucket. The following graphs shows this process:

Alt Text

What happen in each bucket? All the points inside the selected bucket are loaded and given to a function. This function will then be executed and one value will be selected (first, last, min, max) or computed (mean, median, count, join).

The graphs below and upper show a working example of the bucketize framework. In the graph below we takes only the first bucket: there is two ticks inside with value 20 and 10. Using, for example, the function mean, this will generate only one tick at the end of the bucket with value: mean(20 and 10) which equals 15:

Alt Text

This how the bucketize concept works, let see how it can be implemented with WarpScript!

Input parameters

The BUCKETIZE framework takes a list of elements as parameter. This list contains one or several time series list, a bucketizer function, a lastbucket that specify when start the last bucket (0 mean this will be computed automatically), a bucketspan which is the duration of the bucket (if 0 WarpScript will compute it), and finally a bucketcount which is the number of buckets to compute (if 0 WarpScript will compute it).

Pro tips: In order to get a correct number of buckets either Bucketspan or Bucketcount have to be specified!

Pro tips 2: Bucketcount indicate the number of bucket to keep starting from the last bucket computed when a Bucketspan is also set!

// BUCKETIZE Framework
[
    $gts                                // Series list or Singleton
    bucketizer.function                 // Bucketize function operator
    0                                   // Lastbucket
    1 d                                 // Bucketspan
    0                                   // Bucketcount
]
BUCKETIZE

To get a working bucketizer, replace the function keyword by an exisiting function as first, last, mean, min, max, median, join

Bucketize in pictures

Let’s resume step by step each bucketize element. First BUCKETIZE requires a set of time series (Singleton or list):

Alt Text

The second step consists to choose the function to apply on each bucket: first, last, mean, min, max, median, join, and others:

Alt Text

A bucketizer can be tuned according to three parameters: lastbucket which specifies the last tick (for all the series) to start computing each bucket. This parameter is very usefull to synchronize different time series.

Alt Text

The second parameter to tun a bucketizer consists of the bucketspan: the width of each buckets:

Alt Text

And finally the last parameter used to tun a bucketizer is the bucketcount to specify the maximal number of bucket to compute!

Alt Text

Now we would like to apply this specific framework to compute a downsampling operation on the NASA lightcurve data. Let’s see how to process!

Hello Exo World step

Only a single amount of series were kept in our previous step, it’s already possible to observe some drops in the data. Now we would like to write a script to detect those drops, which will means that an exoplanet could exists for this star. But first, to gain some data readibilty, a downsampling step is included. In our case we are interested in a one conserving the minimal point of each generated bucket for each series. Let’s do it! Try the bucketizer.min with a bucketize window of 2 h.

Resume

The main goal of this step is to downsample some raw data. To do so, you used the BUCKETIZE framework, it expects on top of the stack a list of parameters containing: the time series as list or as singleton to bucketize, a bucketizer function, and three long parameter: lastbucket, bucketspan and bucketcount.

The result of this step corresponds to a downsampled list. In our case the 4 selected time series are now on top of stack containing regular bucket and one value per bucket.

To be continued

It look’s simple as first look, but truts us it isn’t. When you will be working on complex time series analytics, keep in mind this framework and try to be familiar with it! You will quickly see how it will improves your exoplanet discovery quest! In the next step, we are guiding you in the usage of an other usefull framework: MAP or how to apply the same function on all datapoints of a set of time series.