vijaye12 committed
Commit dc31cd1
1 Parent(s): 573e5af
Files changed (1)
  1. README.md +61 -63
README.md CHANGED
@@ -10,18 +10,22 @@ tags:
  - time-series
  ---

- # TinyTimeMixer (TTM) Model Card

  <p align="center" width="100%">
  <img src="ttm_image.webp" width="600">
  </p>

- TinyTimeMixers (TTMs) are compact pre-trained models for Multivariate Time-Series Forecasting, open-sourced by IBM Research.
  **With less than 1 Million parameters, TTM introduces the notion of the first-ever “tiny” pre-trained models for Time-Series Forecasting.**

  TTM outperforms several popular benchmarks demanding billions of parameters in zero-shot and few-shot forecasting. TTMs are lightweight
  forecasters, pre-trained on publicly available time series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and can easily be
- fine-tuned for multi-variate forecasts with just 5% of the training data to be competitive. Refer to our [paper](https://arxiv.org/pdf/2401.03955v5.pdf) for more details.


  **The current open-source version supports point forecasting use-cases specifically ranging from minutely to hourly resolutions
@@ -30,18 +34,50 @@ fine-tuned for multi-variate forecasts with just 5% of the training data to be c
  **Note that zero-shot, fine-tuning and inference tasks using TTM can easily be executed on a single GPU machine or even on a laptop!**


- **Recent updates:** We have developed more sophisticated variants of TTMs (TTM-B, TTM-E and TTM-A), featuring extended benchmarks that compare them with some of the latest models
- such as TimesFM, Moirai, Chronos, Lag-llama, and Moment. For full details, please refer to the latest version of our [paper](https://arxiv.org/pdf/2401.03955.pdf).
- Stay tuned for the release of the model weights for these newer variants.

- ## How to Get Started with the Model

- Please refer to the below scrips for **zero-shot** and **finetuning** support:
- - [colab](https://colab.research.google.com/github/IBM/tsfm/blob/main/notebooks/tutorial/ttm_tutorial.ipynb)
- - [512-96 Benchmarks](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_benchmarking_512_96.ipynb)
- - [1024-96 Benchmarks](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_benchmarking_1024_96.ipynb)
- - Script for Exogenous support - to be added soon

  ## Recommended Use
  1. Users have to externally standard-scale their data independently for every channel before feeding it to the model (refer to [TSP](https://github.com/IBM/tsfm/blob/main/tsfm_public/toolkit/time_series_preprocessor.py), our data processing utility for data scaling).
@@ -49,24 +85,6 @@ Please refer to the below scrips for **zero-shot** and **finetuning** support:
  3. Enabling any upsampling or prepending zeros to virtually increase the context length for shorter-length datasets is not recommended and will
  degrade the model performance.

- ## Benchmark Highlights:
-
- - TTM (with less than 1 Million parameters) outperforms the following popular Pre-trained SOTAs demanding several hundred Million to Billions of parameters [paper](https://arxiv.org/pdf/2401.03955v5.pdf):
-   - *GPT4TS (NeurIPS 23) by 7-12% in few-shot forecasting*
-   - *LLMTime (NeurIPS 23) by 24% in zero-shot forecasting*
-   - *SimMTM (NeurIPS 23) by 17% in few-shot forecasting*
-   - *Time-LLM (ICLR 24) by 2-8% in few-shot forecasting*
-   - *UniTime (WWW 24) by 27% in zero-shot forecasting*
- - Zero-shot results of TTM surpass the *few-shot results of many popular SOTA approaches* including
-   PatchTST (ICLR 23), PatchTSMixer (KDD 23), TimesNet (ICLR 23), DLinear (AAAI 23) and FEDFormer (ICML 22).
- - TTM (1024-96, released in this model card with 1M parameters) outperforms pre-trained MOIRAI-Small (14M parameters) by 10%, MOIRAI-Base (91M parameters) by 2% and
-   MOIRAI-Large (311M parameters) by 3% on zero-shot forecasting (horizon = 96). [[notebook]](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_benchmarking_1024_96.ipynb)
- - TTM quick fine-tuning also outperforms the competitive statistical baselines (Statistical ensemble and S-Naive) in
-   M4-hourly dataset which existing pretrained TS models are finding difficult to outperform. [[notebook]](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_m4_hourly.ipynb)
- - TTM takes only a *few seconds for zeroshot/inference* and a *few minutes for finetuning* in 1 GPU machine, as
-   opposed to long timing-requirements and heavy computing infra needs of other existing pre-trained models.
-
-

  ## Model Description

@@ -84,21 +102,10 @@ only 3-6 hours using 6 A100 GPUs, as opposed to several days or weeks in traditi
  Each pre-trained model is released in a different branch of this model card. Kindly access the required model using our
  getting-started [notebook](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb), specifying the branch name.

- ## Model Releases (along with the branch name where the models are stored):
-
- - **512-96:** Given the last 512 time-points (i.e. context length), this model can forecast up to next 96 time-points (i.e. forecast length)
-   in future. This model is targeted towards a forecasting setting of context length 512 and forecast length 96 and
-   recommended for hourly and minutely resolutions (Ex. 10 min, 15 min, 1 hour, etc). (branch name: main)
-
- - **1024-96:** Given the last 1024 time-points (i.e. context length), this model can forecast up to next 96 time-points (i.e. forecast length)
-   in future. This model is targeted towards a long forecasting setting of context length 1024 and forecast length 96 and
-   recommended for hourly and minutely resolutions (Ex. 10 min, 15 min, 1 hour, etc). (branch name: 1024-96-v1)
-
- - Stay tuned for more models !

  ## Model Details

- For more details on TTM architecture and benchmarks, refer to our [paper](https://arxiv.org/pdf/2401.03955v5.pdf).

  TTM-1 currently supports 2 modes:

@@ -113,22 +120,19 @@ The current release supports multivariate forecasting via both channel independe
  Decoder Channel-Mixing can be enabled during fine-tuning for capturing strong channel-correlation patterns across
  time-series variates, a critical capability lacking in existing counterparts.

- In addition, TTM also supports exogenous infusion and categorical data which is not released as part of this version.
- Stay tuned for these extended features.

-
-
  ### Model Sources

- - **Repository:** https://github.com/IBM/tsfm/tree/main/tsfm_public/models/tinytimemixer
- - **Paper:** https://arxiv.org/pdf/2401.03955v5.pdf
- - **Paper (Newer variants, extended benchmarks):** https://arxiv.org/pdf/2401.03955.pdf

- ### External Blogs on TTM
- - https://aihorizonforecast.substack.com/p/tiny-time-mixersttms-powerful-zerofew
- - https://medium.com/@david.proietti_17/predicting-venetian-lagoon-tide-levels-with-multivariate-time-series-modeling-8bafdf229588

  ## Uses

  ```
@@ -183,6 +187,7 @@ The TTM models were trained on a collection of datasets from the Monash Time Ser
  - US Births: https://zenodo.org/records/4656049
  - Wind Farms Production data: https://zenodo.org/records/4654858
  - Wind Power: https://zenodo.org/records/4656032


  ## Citation [optional]
@@ -192,24 +197,17 @@ work
  **BibTeX:**

  ```
- @misc{ekambaram2024tiny,
- title={Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series},
  author={Vijay Ekambaram and Arindam Jati and Pankaj Dayama and Sumanta Mukherjee and Nam H. Nguyen and Wesley M. Gifford and Chandra Reddy and Jayant Kalagnanam},
  year={2024},
- eprint={2401.03955},
- archivePrefix={arXiv},
- primaryClass={cs.LG}
  }
  ```

- **APA:**
-
- Ekambaram, V., Jati, A., Dayama, P., Mukherjee, S., Nguyen, N. H., Gifford, W. M., … Kalagnanam, J. (2024). Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series. arXiv [Cs.LG]. Retrieved from http://arxiv.org/abs/2401.03955
-
-
  ## Model Card Authors

- Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Nam H. Nguyen, Wesley Gifford and Jayant Kalagnanam


  ## IBM Public Repository Disclosure:

  - time-series
  ---

+ # TinyTimeMixer (TTM) 1M Model Card

  <p align="center" width="100%">
  <img src="ttm_image.webp" width="600">
  </p>

+ TinyTimeMixers (TTMs) are compact pre-trained models for Multivariate Time-Series Forecasting, open-sourced by IBM Research.
  **With less than 1 Million parameters, TTM introduces the notion of the first-ever “tiny” pre-trained models for Time-Series Forecasting.**

+ TTM is accepted at NeurIPS 2024.
+
  TTM outperforms several popular benchmarks demanding billions of parameters in zero-shot and few-shot forecasting. TTMs are lightweight
  forecasters, pre-trained on publicly available time series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and can easily be
+ fine-tuned for multivariate forecasts with just 5% of the training data to be competitive. Refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf) for more details.


  **The current open-source version supports point forecasting use-cases specifically ranging from minutely to hourly resolutions
 
  **Note that zero-shot, fine-tuning and inference tasks using TTM can easily be executed on a single GPU machine or even on a laptop!**


+ ## Model Releases (along with the branch name where the models are stored):
+
+ - **512-96:** Given the last 512 time-points (i.e. context length), this model can forecast up to the next 96 time-points (i.e. forecast length)
+   in the future. This model is targeted towards a forecasting setting of context length 512 and forecast length 96 and is
+   recommended for hourly and minutely resolutions (e.g. 10 min, 15 min, 1 hour). This model refers to the TTM-Q variant used in the paper. (branch name: main) [[Benchmark Scripts]](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_benchmarking_512_96.ipynb)
+
+ - **1024-96:** Given the last 1024 time-points (i.e. context length), this model can forecast up to the next 96 time-points (i.e. forecast length)
+   in the future. This model is targeted towards a long-context forecasting setting of context length 1024 and forecast length 96 and is
+   recommended for hourly and minutely resolutions (e.g. 10 min, 15 min, 1 hour). (branch name: 1024-96-v1) [[Benchmark Scripts]](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_benchmarking_1024_96.ipynb)
+
+ - **New Releases (trained on larger pretraining datasets, released in October 2024):**
+
+   - **512-96-r2:** Given the last 512 time-points (i.e. context length), this model can forecast up to the next 96 time-points (i.e. forecast length)
+     in the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely
+     resolutions (e.g. 10 min, 15 min, 1 hour). This model refers to the TTM-B variant used in the paper. (branch name: 512-96-r2) [[Benchmark Scripts]](https://github.com/ibm-granite/granite-tsfm/blob/ttm_v2_release/notebooks/hfdemo/tinytimemixer/ttm_v2_benchmarking_512_96.ipynb)
+
+ ## Model Capabilities with example scripts
+
+ The below example scripts can be used with any of the above TTM models. Please update the HF model URL and branch name in the `from_pretrained` call appropriately to pick the model of your choice (see the minimal loading sketch after this list).
+
+ - Getting Started [[colab]](https://colab.research.google.com/github/IBM/tsfm/blob/main/notebooks/tutorial/ttm_tutorial.ipynb)
+ - Zero-shot Multivariate Forecasting [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/ttm_v2_release/notebooks/hfdemo/ttm_getting_started.ipynb)
+ - Finetuned Multivariate Forecasting:
+   - Channel-Independent Finetuning [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/ttm_v2_release/notebooks/hfdemo/ttm_getting_started.ipynb) [[M4-Hourly finetuning]](https://github.com/ibm-granite/granite-tsfm/blob/ttm_v2_release/notebooks/hfdemo/tinytimemixer/ttm_m4_hourly.ipynb)
+   - Channel-Mix Finetuning [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/ttm_v2_release/notebooks/tutorial/ttm_channel_mix_finetuning.ipynb)
+ - **New Releases (extended features released in October 2024):**
+   - Finetuning and Forecasting with Exogenous/Control Variables [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/ttm_v2_release/notebooks/tutorial/ttm_with_exog_tutorial.ipynb)
+   - Finetuning and Forecasting with static categorical features [Example: To be added soon]
+   - Rolling Forecasts - Extend forecast lengths beyond 96 via rolling capability [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/ttm_v2_release/notebooks/hfdemo/ttm_rolling_prediction_getting_started.ipynb)
+   - Helper scripts for optimal Learning Rate suggestions for Finetuning [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/ttm_v2_release/notebooks/tutorial/ttm_with_exog_tutorial.ipynb)
+
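Below is a minimal loading-and-inference sketch for the workflow described above. It is an illustrative assumption rather than an official snippet: it assumes the `tsfm_public` package from the granite-tsfm repository is installed, `<hf-model-id>` is a placeholder for this model card's Hugging Face ID, and the output field follows the `prediction_outputs` naming used by the TSFM prediction models.

```python
import torch
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Pick the release by branch name: "main" (512-96), "1024-96-v1", or "512-96-r2".
# "<hf-model-id>" is a placeholder for this model card's Hugging Face ID.
model = TinyTimeMixerForPrediction.from_pretrained("<hf-model-id>", revision="main")
model.eval()

# Zero-shot point forecast: a batch of 4 standard-scaled multivariate series with
# 3 channels and a 512-step context; the model predicts the next 96 steps per channel.
past_values = torch.randn(4, 512, 3)
with torch.no_grad():
    output = model(past_values=past_values)
print(output.prediction_outputs.shape)  # output field name assumed; expected torch.Size([4, 96, 3])
```

For fine-tuning, the same loaded model can be wrapped in the Hugging Face Trainer, as the notebooks linked above do.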
+ ## Benchmarks
+
+ TTM outperforms popular benchmarks such as TimesFM, Moirai, Chronos, Lag-Llama, Moment, GPT4TS, TimeLLM and LLMTime in zero-shot/few-shot forecasting while significantly reducing computational requirements.
+ Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider
+ adoption in resource-constrained environments. For more details, refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf). TTM-Q referred to in the paper maps to the `512-96` model
+ uploaded in the main branch, and TTM-B referred to in the paper maps to the `512-96-r2` model. Please note that the Granite TTM models are pre-trained exclusively on datasets
+ with clear commercial-use licenses that are approved by our legal team. As a result, the pre-training dataset used in this release differs slightly from the one used in the research
+ paper, which may lead to minor variations in model performance as compared to the published results. Please refer to our paper for more details.

  ## Recommended Use
  1. Users have to externally standard-scale their data independently for every channel before feeding it to the model (refer to [TSP](https://github.com/IBM/tsfm/blob/main/tsfm_public/toolkit/time_series_preprocessor.py), our data processing utility for data scaling; see the minimal scaling sketch after this list).

  3. Enabling any upsampling or prepending zeros to virtually increase the context length for shorter-length datasets is not recommended and will
  degrade the model performance.

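A minimal sketch of the per-channel standard scaling recommended in item 1, using plain pandas/scikit-learn in place of the TSP utility; the file name and channel columns below are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset with three channels; replace with your own data.
df = pd.read_csv("my_timeseries.csv", parse_dates=["timestamp"])
channels = ["channel_a", "channel_b", "channel_c"]

train_df = df.iloc[: int(0.7 * len(df))]           # fit scaling statistics on the training split only
scaler = StandardScaler().fit(train_df[channels])  # one mean/std pair per channel
df[channels] = scaler.transform(df[channels])      # apply the same statistics to all splits
```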
  ## Model Description

  Each pre-trained model is released in a different branch of this model card. Kindly access the required model using our
  getting-started [notebook](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb), specifying the branch name.

  ## Model Details

+ For more details on TTM architecture and benchmarks, refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf).

  TTM-1 currently supports 2 modes:

  Decoder Channel-Mixing can be enabled during fine-tuning for capturing strong channel-correlation patterns across
  time-series variates, a critical capability lacking in existing counterparts.

+ In addition, TTM supports exogenous infusion and categorical data infusion.

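A hedged sketch of how decoder channel-mixing might be enabled when loading the model for fine-tuning. The keyword arguments (`num_input_channels`, `decoder_mode`) are assumptions based on the TinyTimeMixerConfig in the granite-tsfm repository; verify them against the channel-mix finetuning notebook linked above.

```python
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Assumed config overrides (check TinyTimeMixerConfig before relying on them):
model = TinyTimeMixerForPrediction.from_pretrained(
    "<hf-model-id>",             # placeholder for this model card's Hugging Face ID
    revision="main",             # branch of the release you want to fine-tune
    num_input_channels=7,        # number of channels in your dataset
    decoder_mode="mix_channel",  # assumed flag; default "common_channel" keeps channels independent
)
# The returned model can then be fine-tuned with the Hugging Face Trainer as in the notebooks.
```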
  ### Model Sources

+ - **Repository:** https://github.com/ibm-granite/granite-tsfm/tree/main/tsfm_public/models/tinytimemixer
+ - **Paper:** https://arxiv.org/pdf/2401.03955.pdf

+ ### Blogs and articles on TTM:
+ - Refer to our [wiki](https://github.com/ibm-granite/granite-tsfm/wiki)

  ## Uses

  ```
 
  - US Births: https://zenodo.org/records/4656049
  - Wind Farms Production data: https://zenodo.org/records/4654858
  - Wind Power: https://zenodo.org/records/4656032
+ - [to be updated]

  ## Citation [optional]
 
  **BibTeX:**

  ```
+ @inproceedings{ekambaram2024tinytimemixersttms,
+ title={Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series},
  author={Vijay Ekambaram and Arindam Jati and Pankaj Dayama and Sumanta Mukherjee and Nam H. Nguyen and Wesley M. Gifford and Chandra Reddy and Jayant Kalagnanam},
+ booktitle={Advances in Neural Information Processing Systems (NeurIPS 2024)},
  year={2024},
  }
  ```

  ## Model Card Authors

+ Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Wesley M. Gifford, Sumanta Mukherjee, Chandra Reddy and Jayant Kalagnanam

  ## IBM Public Repository Disclosure: