Commit 9159d70

update README.
1 parent 95a4e06 commit 9159d70

File tree

2 files changed: +42, -0 lines

artifacts/run_all_ncu_cutlass.sh

File mode changed: 100644 → 100755.

artifacts/table6/README.md

Lines changed: 42 additions & 0 deletions
@@ -37,3 +37,45 @@ The profiling results shown in Table 6 are based on [NVIDIA Nsight Compute (ncu)
In the output file of the profiling results, you will find the memory-traffic behavior of the kernel of interest. You can then further process and analyze these results.

We cannot pre-assign kernel names because libraries such as Triton have internal implementations that launch extra kernels, so filtering by name alone is not feasible. To address this, we run the profiling multiple times (e.g., three) to observe the log output, and run the tested program several times (e.g., five) to identify patterns. This lets us pinpoint the actual kernel calls and post-process the ncu profiling logs to compute the memory traffic across the memory hierarchy.
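For illustration, here is a minimal sketch of this workflow for a single run. It is not one of the provided scripts: the test program `attention_test.py`, the output names, and the `flash` kernel-name filter are placeholders, and the DRAM metric names should be checked against `ncu --query-metrics` on your GPU.

```bash
# Hypothetical sketch of the profile-then-post-process workflow described above.

# 1. Profile one run, collecting the DRAM byte counters; repeat a few times and
#    compare the reported kernel lists to identify the kernels of interest.
ncu --metrics dram__bytes_read.sum,dram__bytes_write.sum \
    -o table6_run python attention_test.py   # attention_test.py is a placeholder

# 2. Export the report to CSV and keep only the rows of the identified kernels
#    for further post-processing (e.g., summing the bytes per kernel).
ncu --import table6_run.ncu-rep --csv --page details > table6_run.csv
grep -i "flash" table6_run.csv | grep -E "dram__bytes_(read|write)" > flash_dram_traffic.csv
```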
### Run the test

We have prepared a testing environment on the provided server to run the tests.

> The following commands should be executed in the `artifacts` directory of the project, not in the `table6` directory.

1. [run_all_ncu_cutlass.sh](../run_all_ncu_cutlass.sh) runs the test for FlashAttention-2 implemented in CUTLASS.
```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
./run_all_ncu_cutlass.sh
```
2. [run_all_ncu_flash2.sh](../run_all_ncu_flash2.sh) runs the test for FlashAttention-2 implemented in PyTorch.

```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
# Choose the environment you want to test
source /home/sosp/env/torch_env.sh
./run_all_ncu_flash2.sh
```
3. [run_all_ncu_ft.sh](../run_all_ncu_ft.sh) runs the tests for BigBird and FlashAttention implemented in FractalTensor.

```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
./run_all_ncu_ft.sh
```
4. [run_all_ncu_pt.sh](../run_all_ncu_pt.sh) runs the tests for BigBird and FlashAttention implemented in PyTorch.

```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
# Choose the environment you want to test
source /home/sosp/env/torch_env.sh
./run_all_ncu_pt.sh
```
