artifacts/table6/README.md: 42 additions & 0 deletions
@@ -37,3 +37,45 @@ The profiling results shown in Table 6 are based on [NVIDIA Nsight Compute (ncu)
In the output file of the profile results, you will find the memory traffic behavior of the kernel of interest. You can then further process and analyze these results.

We cannot filter kernels by pre-assigned names, because libraries such as Triton have internal implementations that launch extra kernels. To address this, we run the profiling multiple times (e.g., three) to observe the log outputs, then run the tested program several times (e.g., five) to identify recurring patterns. This lets us pinpoint the actual kernel calls and post-process the ncu profiling logs to compute the traffic across the memory hierarchy.
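The idea of comparing several profiling runs to find the kernels that recur can be sketched as follows. This is a minimal illustration, not the post-processing script used for Table 6: the helper name `common_kernels` is hypothetical, and it assumes each log file has already been reduced to one kernel name per line, which is not the raw ncu output format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the kernel names that occur in every given log.
# Assumption: each log file holds one kernel name per line (an illustrative
# format, not the raw ncu log).
common_kernels() {
    local n=$#   # number of runs being compared
    local f
    for f in "$@"; do
        # De-duplicate within a run so each run counts a name at most once.
        sort -u "$f"
    done | sort | uniq -c |
        # Keep names whose count equals the number of runs, then strip the count.
        awk -v n="$n" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```

For example, `common_kernels run1.log run2.log run3.log` would list only the kernels seen in all three runs; names that show up in a single run are likely incidental launches and can be inspected separately.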
### Run the test

We have prepared a testing environment on the provided server to run the tests.

> The following commands should be executed in the `artifacts` directory of the project, not in the `table6` directory.

1. [run_all_ncu_cutlass.sh](../run_all_ncu_cutlass.sh) runs the test for FlashAttention-2 implemented in CUTLASS.
```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
./run_all_ncu_cutlass.sh
```
2. [run_all_ncu_flash2.sh](../run_all_ncu_flash2.sh) runs the test for FlashAttention-2 implemented in PyTorch.
```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
# Choose the environment you want to test
source /home/sosp/env/torch_env.sh
./run_all_ncu_flash2.sh
```
3. [run_all_ncu_ft.sh](../run_all_ncu_ft.sh) runs the tests for BigBird and FlashAttention implemented in FractalTensor.
```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
./run_all_ncu_ft.sh
```
4. [run_all_ncu_pt.sh](../run_all_ncu_pt.sh) runs the tests for BigBird and FlashAttention implemented in PyTorch.