@@ -25,13 +25,13 @@ accomplish some of the listed tasks above, but it has a number of limitations:
* Selenium has most of the code that would be needed already built in.

**Depending on Selenium DOES NOT mean that your crawling servers will need to
- also run a GUI. Selenium can run in a headless environment. Look this up for
- more information.**
+ also run a GUI. Selenium can run in a headless environment. See below for more
+ information.**

Quickstart
==========

- ```
+ ``` bash
pip install -e git+https://github.com/cmwslw/selenium-crawler.git#egg=selenium-crawler
```

@@ -51,7 +51,7 @@ article. It will print the following:
}
```

- Where {{HTMLSOURCE}} is the actual HTML of the article.
+ Where `{{HTMLSOURCE}}` is the actual HTML of the article.

Creating test cases
===================
@@ -139,10 +139,31 @@ Parsed ./sites/hnews/hnews_raw.py.
Parsed ./sites/reddit/reddit_raw.py.
```

- What's next?
+ Don't worry if the paths are different for your installation. Keep in mind that
+ `makeparsed.py` only has to be run when site scripts have either been changed
+ or added.
+
+ Headless configuration
+ ======================
+
+ Running headless means that no actual GUI will be running on a monitor during
+ use. Put simply, it means that no browser window will pop up when handling a
+ URL. One way to run headless is through the use of xvfb, a tool used to set up
+ virtual framebuffers. Run this before using selenium-crawler:
+
+ ``` bash
+ sh -e /etc/init.d/xvfb start
+ export DISPLAY=:99.0
+ ```
+
+ This is the method that CI systems like Travis-CI and CircleCI recommend. There
+ are other methods of running Selenium in a headless environment. Do a quick
+ Google search for more information.
+
+ Contributing
============

- Selenium-crawler is still in a very early testing stage. You might not even call
- it that. I still need to test with a variety of different Selenium test cases to
- make sure my parsing is robust enough.
+ Contributing is easy. If you write any new site handling scripts, just be sure
+ to follow the guide above and write a quick test for it in `test_all.py`. Just
+ send in a pull request and I'll get a `CONTRIBUTORS.txt` file going.
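
The headless setup added under "Headless configuration" can also be prepared from Python before the crawler starts. The following is only a minimal sketch, not part of selenium-crawler: the function names `headless_env`, `xvfb_command`, and `start_xvfb` are hypothetical, the screen geometry is an assumed default, and it launches the `Xvfb` binary directly instead of going through the `/etc/init.d/xvfb` script shown in the diff.

```python
import os
import shutil
import subprocess


def headless_env(display=":99.0", base_env=None):
    """Return a copy of the environment with DISPLAY set to the virtual display."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env


def xvfb_command(display=":99.0", screen="1280x1024x24"):
    """Build an Xvfb command line (the screen geometry is an assumed default)."""
    # Xvfb expects the display without the screen suffix, e.g. ":99" for ":99.0".
    server = display.split(".")[0]
    return ["Xvfb", server, "-screen", "0", screen]


def start_xvfb(display=":99.0"):
    """Start Xvfb in the background if it is installed; return the process or None."""
    if shutil.which("Xvfb") is None:
        return None
    return subprocess.Popen(xvfb_command(display))
```

A caller would typically run `start_xvfb()` once, then apply `os.environ.update(headless_env())` before any browser is launched, so that Selenium picks up the same `DISPLAY` value that the shell snippet exports.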