Commit 13a3381

Organized tests.
1 parent 06b7f72 commit 13a3381

5 files changed, with 46 additions and 10 deletions.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 .DS_Store
 *.pyc
 chromedriver.log
+selenium_crawler.egg-info

README.md

Lines changed: 29 additions & 8 deletions
@@ -25,13 +25,13 @@ accomplish some of the listed tasks above, but it has a number of limitations:
 * Selenium has most of the code that would be needed already built in.
 
 **Depending on Selenium DOES NOT mean that your crawling servers will need to
-also run a GUI. Selenium can run in a headless environment. Look this up for
-more information.**
+also run a GUI. Selenium can run in a headless environment. See below for more
+information.**
 
 Quickstart
 ==========
 
-```
+```bash
 pip install -e git+https://github.com/cmwslw/selenium-crawler.git#egg=selenium-crawler
 ```
 
@@ -51,7 +51,7 @@ article. It will print the following:
 }
 ```
 
-Where {{HTMLSOURCE}} is the actual HTML of the article.
+Where `{{HTMLSOURCE}}` is the actual HTML of the article.
 
 Creating test cases
 ===================
@@ -139,10 +139,31 @@ Parsed ./sites/hnews/hnews_raw.py.
 Parsed ./sites/reddit/reddit_raw.py.
 ```
 
-What's next?
+Don't worry if the paths are different for your installation. Keep in mind that
+`makeparsed.py` only has to be run when site scripts have either been changed
+or added.
+
+Headless configuration
+======================
+
+Running headless means that no actual GUI will be running on a monitor during
+use. Put simply, it means that no browser window will pop up when handling a
+URL. One way to run headless is through the use of xvfb, a tool used to set up
+virtual framebuffers. Run this before using selenium-crawler:
+
+```bash
+sh -e /etc/init.d/xvfb start
+export DISPLAY=:99.0
+```
+
+This is the method that CI systems like Travis-CI and CircleCI recommend. There
+are other methods of running Selenium in a headless environment. Do a quick
+Google search for more information.
+
+Contributing
 ============
 
-Selenium-crawler is still in a very early testing stage. You might not even call
-it that. I still need to test with a variety of different Selenium test cases to
-make sure my parsing is robust enough.
+Contributing is easy. If you write any new site handling scripts, just be sure
+to follow the guide above and write a quick test for it in `test_all.py`. Just
+send in a pull request and I'll get a `CONTRIBUTORS.txt` file going.
 

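The "Headless configuration" section added in this hunk starts xvfb from an init script and exports `DISPLAY=:99.0`. For machines without that init script, the same idea can be sketched from Python. This is only an illustration, not part of the commit: the helper names (`xvfb_command`, `headless_env`, `start_xvfb`) and the `1280x1024x24` screen geometry are assumptions, and only the `:99.0` display value comes from the README text.

```python
import os
import shutil
import subprocess

DISPLAY = ":99.0"  # same value the README hunk exports


def xvfb_command(display=DISPLAY):
    """Build the argv for a virtual framebuffer on the given display."""
    # Xvfb takes the display number without the screen suffix, e.g. ":99".
    # The screen geometry is an arbitrary choice for this sketch.
    return ["Xvfb", display.split(".")[0], "-screen", "0", "1280x1024x24"]


def headless_env(display=DISPLAY, base=None):
    """Return a copy of the environment with DISPLAY pointing at Xvfb."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    return env


def start_xvfb(display=DISPLAY):
    """Start Xvfb if it is installed; return the process, or None if not."""
    if shutil.which("Xvfb") is None:
        return None
    return subprocess.Popen(xvfb_command(display))
```

A crawler process launched with `env=headless_env()` would then see the virtual display, provided `start_xvfb()` returned a running process first.
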
seleniumcrawler/tests/__init__.py

Whitespace-only changes.
File renamed without changes.

setup.py

Lines changed: 16 additions & 2 deletions
@@ -14,16 +14,30 @@ def read(fname):
 setup(
     name = "selenium-crawler",
     version = "0.1.0",
+    packages = find_packages(),
+
+    # Project uses reStructuredText, so ensure that the docutils get
+    # installed or upgraded on the target machine
+    install_requires=required,
+
+    package_data = {
+        # If any package contains *.txt or *.rst files, include them:
+        '': ['*.txt', '*.rst'],
+        # And include any *.msg files found in the 'hello' package, too:
+        'hello': ['*.msg'],
+    },
+
+    # metadata for upload to PyPI
     author = "Cory Walker",
     author_email = "[email protected]",
     description = ("Sometimes sites make crawling hard. Selenium-crawler uses "
                    "Selenium automation to fix that."),
     license = "LICENSE.txt",
     keywords = "selenium crawling crawl automate ads landing",
     url = "https://github.com/cmwslw/selenium-crawler",
-    packages=find_packages(),
+
     long_description=read('README.md'),
-    install_requires=required,
+    test_suite = "seleniumcrawler.tests.test_all",
     classifiers=[
         "Development Status :: 3 - Alpha",
         "Topic :: Utilities",

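The new `test_suite = "seleniumcrawler.tests.test_all"` line gives setuptools a dotted module name. As far as I understand the (now deprecated) `setup.py test` command, that name is handed to `unittest`'s loader, which is why the commit also needs `seleniumcrawler/tests/__init__.py` to make the directory importable as a package. The sketch below shows that loading step in isolation. It builds a throwaway package named `demo_pkg` (a hypothetical stand-in, not the real project) so that it runs without selenium-crawler installed.

```python
import importlib
import os
import sys
import tempfile
import textwrap
import unittest


def build_demo_package(root):
    """Create demo_pkg/tests/test_all.py under root, with __init__.py files."""
    tests_dir = os.path.join(root, "demo_pkg", "tests")
    os.makedirs(tests_dir)
    for pkg_dir in (os.path.join(root, "demo_pkg"), tests_dir):
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(tests_dir, "test_all.py"), "w") as fh:
        fh.write(textwrap.dedent('''
            import unittest

            class SmokeTest(unittest.TestCase):
                def test_truth(self):
                    self.assertTrue(True)
        '''))


def run_suite(dotted_name):
    """Load a suite from a dotted module name and run it."""
    suite = unittest.defaultTestLoader.loadTestsFromName(dotted_name)
    return unittest.TextTestRunner(verbosity=0).run(suite)


def demo():
    """Build the stand-in package in a temp dir and run its test module."""
    with tempfile.TemporaryDirectory() as root:
        build_demo_package(root)
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            return run_suite("demo_pkg.tests.test_all")
        finally:
            sys.path.remove(root)
```

With the real package installed, `run_suite("seleniumcrawler.tests.test_all")` would be the equivalent call, assuming the renamed test module lives at that path.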