amazon-science · Sep 6, 2024 · Sep 6, 2024 · Sep 6, 2024 · Sep 6, 2024 · Sep 6, 2024
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -0,0 +1,45 @@
+name: Build Documentation
+
+on:
+  push:
+    branches:
+      - main
+    tags:
+      - "**"
+  workflow_dispatch:
+
+permissions:
+  contents: write
+
+jobs:
+  build-docs:
+    concurrency: ci-${{ github.ref }}
+    name: Build docs
+    runs-on: "ubuntu-latest"
+    defaults:
+      run:
+        shell: bash -l {0}
+
+    steps:
+      # Grap the latest commit from the branch
+      - name: Checkout the branch
+        uses: actions/checkout@v3.5.2
+        with:
+          persist-credentials: false
+
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.11
+
+      - name: Install Hatch
+        uses: pypa/hatch@install
+
+      - name: Build and deploy the documentation
+        run: hatch run docs:deploy
+
+      - name: Deploy Page 🚀
+        uses: JamesIves/github-pages-deploy-action@v4.4.1
+        with:
+          branch: gh-pages
+          folder: site
diff --git a/.github/workflows/test_docs.yml b/.github/workflows/test_docs.yml
@@ -0,0 +1,35 @@
+name: Test Documentation
+
+on:
+  pull_request:
+  workflow_dispatch:
+
+permissions:
+  contents: write
+
+jobs:
+  test-docs:
+    concurrency: ci-${{ github.ref }}
+    name: Test docs
+    runs-on: "ubuntu-latest"
+    defaults:
+      run:
+        shell: bash -l {0}
+
+    steps:
+      # Grap the latest commit from the branch
+      - name: Checkout the branch
+        uses: actions/checkout@v3.5.2
+        with:
+          persist-credentials: false
+
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.11
+
+      - name: Install Hatch
+        uses: pypa/hatch@install
+
+      - name: Build the documentation
+        run: hatch run docs:build
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -13,13 +13,14 @@ jobs:
       matrix:
         # Select the Python versions to test against
         os: ["ubuntu-latest", "macos-latest"]
-        python-version: ["3.10", "3.11"]
+        python-version: ["3.10", "3.11", "3.12"]
       fail-fast: true
     steps:
       - name: Check out the code
         uses: actions/checkout@v3.5.2
         with:
           fetch-depth: 1
+
       - name: Set up Python ${{ matrix.python-version }}
         uses: actions/setup-python@v4
         with:

diff --git a/.gitignore b/.gitignore
@@ -147,4 +147,5 @@ scratch_nbs/
 .DS_store
 package.json
 package-lock.json
-node_modules/
+node_modules/
+docs/_examples
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,21 @@
+repos:
+  # python code formatting
+  - repo: https://github.com/psf/black
+    rev: 23.12.1
+    hooks:
+      - id: black
+        args: ["--config", "pyproject.toml"]
+
+  # python import sorting
+  - repo: https://github.com/PyCQA/isort
+    rev: 5.12.0
+    hooks:
+      - id: isort
+        args: ["--settings-path", "pyproject.toml"]
+
+  # remove notebook cell output
+  - repo: https://github.com/kynan/nbstripout
+    rev: 0.7.1
+    hooks:
+      - id: nbstripout
+        files: ".ipynb"
diff --git a/README.md b/README.md
@@ -1,6 +1,9 @@
 # Causal Validation
 
-This package provides functionality to define your own causal data generation process and then simulate data from the process. Within the package, there is functionality to include complex components to your process, such as periodic and temporal trends, and all of these operations are fully composable with one another. 
+This package provides functionality to define your own causal data generation process
+and then simulate data from the process. Within the package, there is functionality to
+include complex components to your process, such as periodic and temporal trends, and
+all of these operations are fully composable with one another. 
 
 A short example is given below
 ```python
@@ -38,9 +41,12 @@ plot(inflated_data)
 
 ## Examples
 
-To supplement the above example, we have two more detailed notebooks which exhaustively present and explain the functionalty in this package, along with how the generated data may be integrated with [AZCausal](https://github.com/amazon-science/azcausal).
-1. [Basic notebook](): We here show the full range of available functions for data generation
-2. [AZCausal notebook](): We here show how the generated data may be used within an AZCausal model.
+To supplement the above example, we have two more detailed notebooks which exhaustively
+present and explain the functionalty in this package, along with how the generated data
+may be integrated with [AZCausal](https://github.com/amazon-science/azcausal).
+1. [Data Synthesis](https://amazon-science.github.io/causal-validation/examples/basic/): We here show the full range of available functions for data generation.
+2. [Placebo testing](https://amazon-science.github.io/causal-validation/examples/placebo_test/): Validate your model(s) using placebo tests.
+3. [AZCausal notebook](https://amazon-science.github.io/causal-validation/examples/azcausal/): We here show how the generated data may be used within an AZCausal model.
 
 ## Installation
 
@@ -70,4 +76,4 @@ in your terminal.
 1. Follow steps 1-3 from `For Users`
 2. Create a hatch environment `hatch env create`
 3. Open a hatch shell `hatch shell`
-4. Validate your installation by running `hatch run tests:test`
+4. Validate your installation by running `hatch run dev:test`
diff --git a/docs/examples/azcausal.ipynb b/docs/examples/azcausal.ipynb
@@ -0,0 +1,205 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0",
+   "metadata": {},
+   "source": [
+    "# AZCausal Integration\n",
+    "\n",
+    "Amazon's [AZCausal](https://github.com/amazon-science/azcausal) library provides the\n",
+    "functionality to fit synthetic control and difference-in-difference models to your\n",
+    "data. Integrating the synthetic data generating process of `causal_validation` with\n",
+    "AZCausal is trivial, as we show in this notebook. To start, we'll simulate a toy\n",
+    "dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azcausal.estimators.panel.sdid import SDID\n",
+    "import scipy.stats as st\n",
+    "\n",
+    "from causal_validation import (\n",
+    "    Config,\n",
+    "    simulate,\n",
+    ")\n",
+    "from causal_validation.effects import StaticEffect\n",
+    "from causal_validation.plotters import plot\n",
+    "from causal_validation.transforms import (\n",
+    "    Periodic,\n",
+    "    Trend,\n",
+    ")\n",
+    "from causal_validation.transforms.parameter import UnitVaryingParameter"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cfg = Config(\n",
+    "    n_control_units=10,\n",
+    "    n_pre_intervention_timepoints=60,\n",
+    "    n_post_intervention_timepoints=30,\n",
+    "    seed=123,\n",
+    ")\n",
+    "\n",
+    "linear_trend = Trend(degree=1, coefficient=0.05)\n",
+    "data = linear_trend(simulate(cfg))\n",
+    "ax = plot(data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3",
+   "metadata": {
+    "title": "We'll now simulate a 5% lift in the treatment group's observations. This"
+   },
+   "source": [
+    "will inflate the treated group's observations in the post-intervention window."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "TRUE_EFFECT = 0.05\n",
+    "effect = StaticEffect(effect=TRUE_EFFECT)\n",
+    "inflated_data = effect(data)\n",
+    "ax = plot(inflated_data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5",
+   "metadata": {},
+   "source": [
+    "## Fitting a model\n",
+    "\n",
+    "We now have some very toy data on which we may apply a model. For this demonstration\n",
+    "we shall use the Synthetic Difference-in-Differences model implemented in AZCausal;\n",
+    "however, the approach shown here will work for any model implemented in AZCausal. To\n",
+    "achieve this, we must first coerce the data into a format that is digestible for\n",
+    "AZCausal. Through the `.to_azcausal()` method implemented here, this is\n",
+    "straightforward to achieve. Once we have a AZCausal compatible dataset, the modelling\n",
+    "is very simple by virtue of the clean design of AZCausal."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "panel = inflated_data.to_azcausal()\n",
+    "model = SDID()\n",
+    "result = model.fit(panel)\n",
+    "print(f\"Delta: {TRUE_EFFECT - result.effect.percentage().value / 100}\")\n",
+    "print(result.summary(title=\"Synthetic Data Experiment\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7",
+   "metadata": {
+    "title": "We see that SDID has done an excellent job of estimating the treatment"
+   },
+   "source": [
+    "We see that SDID has done an excellent job of estimating the treatment effect.  However, given the simplicity of the data, this is not surprising. With the\n",
+    "functionality within this package though we can easily construct more complex datasets\n",
+    "in effort to fully stress-test any new model and identify its limitations.\n",
+    "\n",
+    "To achieve this, we'll simulate 10 control units, 60 pre-intervention time points, and\n",
+    "30 post-intervention time points according to the following process: \n",
+    "\n",
+    "$$ \\begin{align}\n",
+    "\\mu_{n, t} & \\sim\\mathcal{N}(20, 0.5^2)\\\\\n",
+    "\\alpha_{n} & \\sim \\mathcal{N}(0, 1^2)\\\\\n",
+    "\\beta_{n} & \\sim \\mathcal{N}(0.05, 0.01^2)\\\\\n",
+    "\\nu_n & \\sim \\mathcal{N}(1, 1^2)\\\\\n",
+    "\\gamma_n & \\sim \\operatorname{Student-t}_{10}(1, 1^2)\\\\\n",
+    "\\mathbf{Y}_{n, t} & = \\mu_{n, t} + \\alpha_{n} + \\beta_{n}t + \\nu_n\\sin\\left(3\\times\n",
+    "2\\pi t + \\gamma\\right) + \\delta_{t, n} \\end{align} $$ \n",
+    "\n",
+    "where the true treatment effect\n",
+    "$\\delta_{t, n}$ is 5% when $n=1$ and $t\\geq 60$ and 0 otherwise. Meanwhile,\n",
+    "$\\mathbf{Y}$ is the matrix of observations, long in the number of time points and wide\n",
+    "in the number of units."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cfg = Config(\n",
+    "    n_control_units=10,\n",
+    "    n_pre_intervention_timepoints=60,\n",
+    "    n_post_intervention_timepoints=30,\n",
+    "    global_mean=20,\n",
+    "    global_scale=1,\n",
+    "    seed=123,\n",
+    ")\n",
+    "\n",
+    "intercept = UnitVaryingParameter(sampling_dist=st.norm(loc=0.0, scale=1))\n",
+    "coefficient = UnitVaryingParameter(sampling_dist=st.norm(loc=0.05, scale=0.01))\n",
+    "linear_trend = Trend(degree=1, coefficient=coefficient, intercept=intercept)\n",
+    "\n",
+    "amplitude = UnitVaryingParameter(sampling_dist=st.norm(loc=1.0, scale=2))\n",
+    "shift = UnitVaryingParameter(sampling_dist=st.t(df=10))\n",
+    "periodic = Periodic(amplitude=amplitude, shift=shift, frequency=3)\n",
+    "\n",
+    "data = effect(periodic(linear_trend(simulate(cfg))))\n",
+    "ax = plot(data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9",
+   "metadata": {},
+   "source": [
+    "As before, we may now go about estimating the treatment. However, this time we see that the delta between the estimated and true effect is much larger than\n",
+    "before."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "panel = data.to_azcausal()\n",
+    "model = SDID()\n",
+    "result = model.fit(panel)\n",
+    "print(f\"Delta: {100*(TRUE_EFFECT - result.effect.percentage().value / 100): .2f}%\")\n",
+    "print(result.summary(title=\"Synthetic Data Experiment\"))"
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "cell_metadata_filter": "title,-all",
+   "main_language": "python",
+   "notebook_metadata_filter": "-all"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}