549c861908
Bug fixed
2018-02-26 14:38:26 +01:00
517be1d822
Merge rdf branch
2018-02-26 14:11:06 +01:00
c4f63a92b2
Error in the merge, mea culpa
2018-02-26 14:01:29 +01:00
db067e56fc
Typo
2018-02-26 13:59:34 +01:00
33bdae96e4
Merge commit from histories_tobast into histories_models
2018-02-26 12:59:38 +01:00
526aad1364
Add interests
2018-02-26 12:33:23 +01:00
02e91bb2b7
Fix function calls
2018-02-26 11:56:02 +01:00
3e5fc2f9b3
Fix search engine URL generation
2018-02-26 11:49:24 +01:00
45ddbff91a
Crawling and histories: fix a lot of stuff
2018-02-26 11:49:24 +01:00
e6d587bffd
Actually save a created history to the DB
2018-02-26 11:49:24 +01:00
8baf408e02
Use dict from data/nicknames_dict for nicknames
2018-02-26 11:49:24 +01:00
6463e348ac
Fix populate.sh exec path
2018-02-26 11:48:51 +01:00
22064ebee3
Histories: XML import/export — untested
...
To be tested when history generation is available
2018-02-26 11:48:51 +01:00
a4de51b84a
Crawl: do not use global SEARCH_ENGINES
2018-02-26 11:48:51 +01:00
4f0148cb63
Crawler: use a random fingerprint
2018-02-26 11:48:51 +01:00
4a8bd32516
Fix tor_runner import
2018-02-26 11:48:51 +01:00
44cf26df8f
It can be useful to save a new object
2018-02-26 11:42:45 +01:00
adb892ab7d
Check if crawling a search engine
2018-02-26 11:12:36 +01:00
15db8b4697
Change option name due to downgrade of aiohttp
2018-02-26 10:23:32 +01:00
d6b26c0a46
Better use of history
2018-02-26 10:05:33 +01:00
8f5c4f3f0f
Use datetimes
2018-02-26 09:49:24 +01:00
71d9e18eec
Add headers support
2018-02-25 23:56:51 +01:00
8ad46c0481
Bug fix, syntax error
2018-02-25 21:59:29 +01:00
f66c978466
Tor runner has a run function to replay the history
2018-02-25 21:53:28 +01:00
0a676a2f65
PEP8
2018-02-25 21:34:20 +01:00
e074d96f02
tor_runner can make requests
2018-02-25 21:27:15 +01:00
93b235cb6c
Fix interests import
2018-02-25 21:20:52 +01:00
ae5699c089
Basic tor runner
2018-02-25 19:42:58 +01:00
f7313ff659
Add populate.sh script
2018-02-25 16:16:04 +01:00
0661fe0f01
Fix path
2018-02-25 16:10:38 +01:00
4b19febdf6
Add interests
2018-02-25 16:10:22 +01:00
15323c3465
[REBASE ME] Crawl: enhance efficiency and output a tree
2018-02-25 15:08:06 +01:00
c3bcdea1eb
Add tentative export to RDF
2018-02-25 14:37:30 +01:00
05a2e2ca3f
Partial generation of profiles
2018-02-25 13:18:12 +01:00
d4aefb6bb7
Load the data
2018-02-25 13:17:44 +01:00
3eb82a4a0b
Data for names and emails
2018-02-25 13:17:27 +01:00
7c0fb7dda1
Better naming
2018-02-25 11:49:44 +01:00
ee32e5385b
Finished data import
2018-02-25 11:49:11 +01:00
bc7348f677
Integration of crawl module in histories
2018-02-24 23:17:24 +01:00
60bfc8cb77
Merge branch 'crawl' into histories_models
2018-02-24 18:44:27 +01:00
12c8c652d7
Serialisation function
2018-02-24 18:40:27 +01:00
c58f42476f
Missing script for 854481d
2018-02-24 17:22:52 +01:00
854481dbd3
Import utilities
2018-02-24 17:21:41 +01:00
d19c2e8216
Add mailto addresses to forbidden list
2018-02-24 15:41:46 +01:00
e56c088632
Better filter
2018-02-24 11:39:04 +01:00
2732e4115f
Add RDF models export classes — untested
...
Also add a dependency to https://github.com/tobast/RDFSerializer/
2018-02-23 13:32:32 +01:00
f0b8672c89
Silly me. (again)
2018-02-23 10:44:51 +01:00
f6da179820
If the robots.txt file is invalid, abort mission.
2018-02-23 10:36:14 +01:00
0e02f22d08
Exception handling
...
Big problem with the URL https:/plus.google.com/+Python concerning
robots parsing.
Didn't find the bug. @tobast, if you have some time to look at it :)
2018-02-23 00:37:36 +01:00
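(Editor's aside on the robots-parsing problem reported in the commit above: the sketch below is not the project's crawler code, only a minimal standalone check using Python's standard urllib.robotparser, assuming the intended URL is https://plus.google.com/+Python and that the crawler consults robots.txt before fetching a page.)

    # Minimal reproduction sketch for the robots.txt parsing issue (assumptions noted above).
    from urllib.robotparser import RobotFileParser

    ROBOTS_URL = "https://plus.google.com/robots.txt"  # assumed robots.txt location
    PAGE_URL = "https://plus.google.com/+Python"       # page named in the commit message

    parser = RobotFileParser()
    parser.set_url(ROBOTS_URL)
    parser.read()  # fetches and parses robots.txt; may raise on network errors

    # A malformed or unreachable robots.txt can make can_fetch() return an
    # unexpected default, which could explain the reported behaviour.
    print(PAGE_URL, "allowed:", parser.can_fetch("*", PAGE_URL))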
77ca7ebcb9
Silly me.
2018-02-22 15:35:46 +01:00