Commit e0e0131

update readme
1 parent 77e024d commit e0e0131

File tree

6 files changed (+631, -529 lines changed)


Pipfile

Lines changed: 1 addition & 3 deletions
@@ -13,11 +13,8 @@ numpy = "==1.26"
 pandas = "*"
 nbdev = "*"
 fastcore = "*"
-kaggle = ">=1.5"
 matplotlib = "*"
 seaborn = "*"
-scipy = "*"
-scikit-learn = "*"
 torch = "==2.0.0"
 python-dotenv = "*"
 pre-commit = "*"
@@ -26,6 +23,7 @@ hydra-core = "*"
 [dev-packages]
 ipykernel = "*"
 ts-vae-lstm = {editable = true, path = "."}
+nbconvert = "*"
 
 [requires]
 python_version = "3.9"

Pipfile.lock

Lines changed: 443 additions & 390 deletions
Some generated files are not rendered by default.

README.md

Lines changed: 23 additions & 55 deletions
@@ -1,69 +1,37 @@
-# TS VAE-LSTM
+## TS VAE-LSTM
 
+Implementation of the paper [Anomaly Detection for Time Series Using VAE-LSTM Hybrid Model](https://ieeexplore.ieee.org/document/9053558)
 
-> Implementation of the paper [Anomaly Detection for Time Series Using
-> VAE-LSTM Hybrid Model](https://ieeexplore.ieee.org/document/9053558)
 
-This is a work in progress.
+#### Usage
+> Hydra configurations to reproduce the results are provided in `config`.
 
-#### TODO
-
-- [ ] Separate training from notebooks
-- [ ] Fix github-actions
-- [ ] Page deployment
-- [ ] CI tests
-- [x] precommit
-- [x] Build complete AD pipeline
-- [x] include fine-grained threshold with quantile for within window
-  detection.
-- [x] use a squared term if the absolute element-wise error falls below
-  delta and a delta-scaled L1 term otherwise (Huber)
-- [x] Use dotenv `.env` to manage paths
-- [ ] Plot has a shift of 21 due to remainder -\> todo
-
-<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
-
-#### Installation
-
-``` sh
-pip install ts_vae_lstm
-```
+Clone the repository. After setting up the environment with `pipenv` (or another tool), update the hydra paths to the datasets and the hyperparameters for experimentation.
+Run the scripts in order: `train_vae.py` -> `train_lstm.py` to generate the prerequisite models.
+Then run the inference script `run_ad.py` to generate the plots and logs. All outputs are written to `models/`.
 
 #### Results from NYC Traffic dataset
 
-At time $t$, past $k$ window(s) of length $p=48$ are taken. The VAE-LSTM
-reconstructs the past windows and if the true time series deviates from
-the reconstructed time series, the $k^{th}$ window is marked as an
-“anomalous window”.
+At time $t$, the past $k$ window(s) of length $p=48$ are taken. The VAE-LSTM reconstructs the past windows, and if the true time series deviates from the reconstructed time series, the $k^{th}$ window is marked as an "anomalous window".
 
-VAE-LSTM is trained on a time series without anomalies so any deviation
-beyond the 90th quantile of reconstruction error (L2 norm) is considered
-an anomaly.
+VAE-LSTM is trained on a time series without anomalies, so any deviation beyond the 90th quantile of reconstruction error (L2 norm) is considered an anomaly.
 
-In the figure (`sample_data/result_granular.png`), blue lines represent
-the unseen data. Orange lines correspond to the reconstructed data. Red
-dashed lines are the true labels in the unseen set. Green window is the
-region where anomaly was predicted. Green line is the first time anomaly
-was flagged in the window.
+In the figure, blue lines represent the unseen data. Orange lines correspond to the reconstructed data. Red dashed lines are the true labels in the unseen set. The green window is the region where an anomaly was predicted. The green line is the first time an anomaly was flagged in the window.
 
-![](sample_data/result_granular.png)
+![](./models/ad_result_z24_lstm_1733682851.4265444.png)
 
 ## Misc
 
-### Env variables
 
-``` bash
-BASEDIR='<your-base-path>/ts_vae-lstm'
-MODELDIR=${BASEDIR}/models
-VAE_MODEL=${MODELDIR}/<best-vae-model>.pth
-LSTM_MODEL=${MODELDIR}/<best-lstm-model>.pth
-```
-
-### CUDA setup
-
-Download the driver and cuda version compiled for the driver.
-
-``` bash
-sudo mhwd -i pci video-nvidia-470xx
-sudo pacman -U https://archive.archlinux.org/packages/c/cuda/cuda-11.4.2-1-x86_64.pkg.tar.zst
-```
+#### TODO
+- [x] Training and inference scripts (#1)
+- [x] Separate training from notebooks
+- [ ] Fix github-actions
+- [ ] Page deployment
+- [ ] CI tests
+- [x] precommit
+- [x] Build complete AD pipeline
+- [x] include fine-grained threshold with quantile for within-window detection
+- [x] use a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise (Huber)
+- [x] Use dotenv `.env` to manage paths
+- [x] Plot has a shift of 21 due to remainder -> todo
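
For reference, the 90th-quantile rule described in the updated README amounts to only a few lines of NumPy. The sketch below is an illustration under assumptions, not the repository's implementation: the function name `flag_anomalous_windows`, the array shapes, and the toy data are all made up for the example.

```python
import numpy as np

def flag_anomalous_windows(train_errors, test_true, test_recon, q=0.90):
    """Flag windows whose L2 reconstruction error exceeds the q-quantile
    of errors observed on the anomaly-free training series.

    train_errors: shape (n_train_windows,), L2 errors on anomaly-free data
    test_true, test_recon: shape (n_test_windows, p), true vs. reconstructed windows
    Returns a boolean mask; True marks an "anomalous window".
    """
    threshold = np.quantile(train_errors, q)                      # e.g. 90th quantile
    test_errors = np.linalg.norm(test_true - test_recon, axis=1)  # per-window L2 norm
    return test_errors > threshold

# Toy usage with p=48-length windows (synthetic data, illustration only)
rng = np.random.default_rng(0)
train_errors = rng.random(200)
true_windows = rng.normal(size=(10, 48))
recon_windows = true_windows + rng.normal(scale=0.1, size=(10, 48))
print(flag_anomalous_windows(train_errors, true_windows, recon_windows))
```

Deriving the threshold from the anomaly-free training errors is what lets any later deviation beyond it be flagged, as the README describes.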

nbs/03_ad_complete.ipynb

Lines changed: 142 additions & 44 deletions
@@ -67,34 +67,7 @@
 "cell_type": "code",
 "execution_count": 5,
 "metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"t\n",
-"t_unit\n",
-"readings\n",
-"idx_anomaly\n",
-"idx_split\n",
-"training\n",
-"test\n",
-"train_m\n",
-"train_std\n",
-"t_train\n",
-"t_test\n",
-"idx_anomaly_test\n"
-]
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"/home/gg/.local/share/virtualenvs/ts_vae-lstm-hz-Oy2CQ/lib/python3.9/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)\n",
-" return torch._C._cuda_getDeviceCount() > 0\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "# | export\n",
 "from ts_vae_lstm.vae import VAE, Encoder, Decoder, StochasticSampler\n",
@@ -108,47 +81,172 @@
 "cell_type": "code",
 "execution_count": 6,
 "metadata": {},
+"outputs": [],
+"source": [
+"# for configs\n",
+"from hydra import compose, initialize\n",
+"from omegaconf import OmegaConf\n",
+"from fastcore.xtras import Path\n",
+"import os\n",
+"import glob"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 7,
+"metadata": {},
+"outputs": [],
+"source": [
+"# run only once\n",
+"try:\n",
+"    initialize(config_path=\"../config\", version_base=\"1.2\")\n",
+"    cfg = compose(config_name=\"config.yaml\")\n",
+"    cfg = OmegaConf.to_object(cfg) # perform interpolation of the variables also\n",
+"    cfg = OmegaConf.create(cfg) # so that dot-notation works?\n",
+"    cfg.base_dir = \"..\" # to make it work in the notebook\n",
+"except Exception as e:\n",
+"    print(f\"Got Exception while reading config:\\n{e}\")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 8,
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"LSTM model: /run/media/data2/ts_vae-lstm/models/lstm_100_val0.81.pth\n",
-"VAE model: /run/media/data2/ts_vae-lstm/models/vae_100_z24.pth\n"
+"Number of workers: 4\n"
 ]
 }
 ],
 "source": [
-"load_dotenv()\n",
-"\n",
-"BASEDIR = os.getenv(\"BASEDIR\")\n",
-"MODELDIR = os.getenv(\"MODELDIR\")\n",
-"VAE_MODEL = os.getenv(\"VAE_MODEL\")\n",
-"LSTM_MODEL = os.getenv(\"LSTM_MODEL\")\n",
-"\n",
-"print(f\"LSTM model: {LSTM_MODEL}\\nVAE model: {VAE_MODEL}\")"
+"num_workers = cfg.num_workers if cfg.get(\"num_workers\", None) else os.cpu_count()\n",
+"print(f\"Number of workers: {num_workers}\")"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 9,
 "metadata": {},
 "outputs": [
 {
 "data": {
 "text/plain": [
-"(4, 'cpu')"
+"'cuda'"
 ]
 },
-"execution_count": 7,
+"execution_count": 9,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"num_workers = os.cpu_count()\n",
-"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
-"num_workers, device"
+"device = cfg.device if cfg.device else (\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+"device"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 10,
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"'..'"
+]
+},
+"execution_count": 10,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"cfg.base_dir"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 11,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Base directory: /run/media/data2/ts_vae-lstm\n",
+"Model directory: /run/media/data2/ts_vae-lstm/models\n",
+"Dataset is /run/media/data2/ts_vae-lstm/sample_data/nyc_taxi.npz\n",
+"VAE model: /run/media/data2/ts_vae-lstm/models/best_vae_*_z24_*.pth\n",
+"LSTM model: /run/media/data2/ts_vae-lstm/models/best_lstm_*_z24_*.pth\n"
+]
+}
+],
+"source": [
+"BASEDIR = Path(cfg.base_dir).resolve()\n",
+"MODELDIR = Path(\".\" + cfg.model_dir).resolve() # to move to project root\n",
+"DATAPATH = Path(\".\" + cfg.dataset.path).resolve() # to move to project root\n",
+"VAE_MODEL = Path(\".\" + cfg.vae_path).resolve() # to move to project root\n",
+"LSTM_MODEL = Path(\".\" + cfg.lstm_path).resolve()\n",
+"print(f\"Base directory: {BASEDIR}\")\n",
+"print(f\"Model directory: {MODELDIR}\")\n",
+"print(f\"Dataset is {DATAPATH}\")\n",
+"print(f\"VAE model: {VAE_MODEL}\")\n",
+"print(f\"LSTM model: {LSTM_MODEL}\")\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 12,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"/run/media/data2/ts_vae-lstm/models/best_vae_100_z24_1733051559.pth\n"
+]
+}
+],
+"source": [
+"if cfg.pattern:\n",
+"    paths = glob.glob(f\"{VAE_MODEL}\")\n",
+"    latest_path = paths[0]\n",
+"    latest_time = 0\n",
+"    for path in paths:\n",
+"        if os.path.getmtime(path) > latest_time:\n",
+"            latest_path = path\n",
+"            latest_time = os.path.getmtime(path)\n",
+"    VAE_MODEL = latest_path\n",
+"    print(VAE_MODEL)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 13,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"/run/media/data2/ts_vae-lstm/models/best_lstm_100_z24_1733058653.pth\n"
+]
+}
+],
+"source": [
+"if cfg.pattern:\n",
+"    paths = glob.glob(f\"{LSTM_MODEL}\")\n",
+"    latest_path = paths[0]\n",
+"    latest_time = 0\n",
+"    for path in paths:\n",
+"        if os.path.getmtime(path) > latest_time:\n",
+"            latest_path = path\n",
+"            latest_time = os.path.getmtime(path)\n",
+"    LSTM_MODEL = latest_path\n",
+"    print(LSTM_MODEL)"
 ]
 },
 {
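
The new cells in `nbs/03_ad_complete.ipynb` resolve the `cfg.vae_path` / `cfg.lstm_path` glob patterns to the most recently modified checkpoint by looping over `os.path.getmtime`. Below is a minimal standalone sketch of the same selection; the helper name `latest_checkpoint` is hypothetical and not part of the repo.

```python
import glob
import os

def latest_checkpoint(pattern: str) -> str:
    """Return the most recently modified file matching a glob pattern.

    max(paths, key=os.path.getmtime) does the same job as the notebook's
    explicit loop over latest_path / latest_time.
    """
    paths = glob.glob(pattern)
    if not paths:
        raise FileNotFoundError(f"no checkpoint matches {pattern!r}")
    return max(paths, key=os.path.getmtime)

# Illustrative patterns, mirroring the ones printed in the notebook output:
# VAE_MODEL = latest_checkpoint(".../models/best_vae_*_z24_*.pth")
# LSTM_MODEL = latest_checkpoint(".../models/best_lstm_*_z24_*.pth")
```

Using `max(paths, key=os.path.getmtime)` keeps the behaviour of the notebook loop while avoiding the manual `latest_path` / `latest_time` bookkeeping.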
