|
22fa039f1b
|
Remove debug print
|
2018-02-26 16:23:14 +01:00 |
|
|
e4ad8c7ce6
|
Towards a working XML export
|
2018-02-26 15:58:30 +01:00 |
|
|
67ad232533
|
Add a timeout to a single page retrieval
|
2018-02-26 15:42:36 +01:00 |
|
|
e140d4a8a7
|
Fix merge leftovers
|
2018-02-26 15:37:05 +01:00 |
|
|
98fe69ba62
|
Real async crawling
|
2018-02-26 15:30:38 +01:00 |
|
|
968ff6d24c
|
More robust crawling
|
2018-02-26 15:29:36 +01:00 |
|
|
5d4bd30e20
|
Exception handling
|
2018-02-26 15:15:03 +01:00 |
|
|
bdfa285e6b
|
We do not want to use settings
|
2018-02-26 15:14:53 +01:00 |
|
|
65f777f00f
|
Should get the objects and not the Manager
|
2018-02-26 15:04:26 +01:00 |
|
|
236e40d359
|
Sanity check
|
2018-02-26 14:57:46 +01:00 |
|
|
22017cea91
|
Typo in data u_u
|
2018-02-26 14:56:22 +01:00 |
|
|
549c861908
|
Bug fixed
|
2018-02-26 14:38:26 +01:00 |
|
|
517be1d822
|
Merge rdf branch
|
2018-02-26 14:11:06 +01:00 |
|
|
c4f63a92b2
|
Error in the merge, mea culpa
|
2018-02-26 14:01:29 +01:00 |
|
|
db067e56fc
|
Typo
|
2018-02-26 13:59:34 +01:00 |
|
|
33bdae96e4
|
merge commit from histories_tobast into histories_models
|
2018-02-26 12:59:38 +01:00 |
|
|
526aad1364
|
Add interests
|
2018-02-26 12:33:23 +01:00 |
|
|
02e91bb2b7
|
Fix function calls
|
2018-02-26 11:56:02 +01:00 |
|
|
3e5fc2f9b3
|
Fix search engine URL generation
|
2018-02-26 11:49:24 +01:00 |
|
|
45ddbff91a
|
Crawling and histories: fix a lot of stuff
|
2018-02-26 11:49:24 +01:00 |
|
|
e6d587bffd
|
Actually save to DB a created history
|
2018-02-26 11:49:24 +01:00 |
|
|
8baf408e02
|
Use dict from data/nicknames_dict for nicknames
|
2018-02-26 11:49:24 +01:00 |
|
|
6463e348ac
|
Fix populate.sh exec path
|
2018-02-26 11:48:51 +01:00 |
|
|
22064ebee3
|
Histories: XML import/export (untested)
To be tested once history generation is available
|
2018-02-26 11:48:51 +01:00 |
|
|
a4de51b84a
|
Crawl: do not use global SEARCH_ENGINES
|
2018-02-26 11:48:51 +01:00 |
|
|
4f0148cb63
|
Crawler: use a random fingerprint
|
2018-02-26 11:48:51 +01:00 |
|
|
4a8bd32516
|
Fix tor_runner import
|
2018-02-26 11:48:51 +01:00 |
|
|
44cf26df8f
|
It can be useful to save a new object
|
2018-02-26 11:42:45 +01:00 |
|
|
adb892ab7d
|
Check if crawling a search engine
|
2018-02-26 11:12:36 +01:00 |
|
|
15db8b4697
|
Change option name due to downgrade of aiohttp
|
2018-02-26 10:23:32 +01:00 |
|
|
d6b26c0a46
|
Better use of history
|
2018-02-26 10:05:33 +01:00 |
|
|
8f5c4f3f0f
|
Use datetimes
|
2018-02-26 09:49:24 +01:00 |
|
|
71d9e18eec
|
Add headers support
|
2018-02-25 23:56:51 +01:00 |
|
|
8ad46c0481
|
Bug fix, syntax error
|
2018-02-25 21:59:29 +01:00 |
|
|
f66c978466
|
Tor runner has a run function to replay the history
|
2018-02-25 21:53:28 +01:00 |
|
|
0a676a2f65
|
PEP8
|
2018-02-25 21:34:20 +01:00 |
|
|
e074d96f02
|
tor_runner can make requests
|
2018-02-25 21:27:15 +01:00 |
|
|
93b235cb6c
|
Fix interests import
|
2018-02-25 21:20:52 +01:00 |
|
|
ae5699c089
|
Basic tor runner
|
2018-02-25 19:42:58 +01:00 |
|
|
f7313ff659
|
Add populate.sh script
|
2018-02-25 16:16:04 +01:00 |
|
|
0661fe0f01
|
Fix path
|
2018-02-25 16:10:38 +01:00 |
|
|
4b19febdf6
|
Add interests
|
2018-02-25 16:10:22 +01:00 |
|
|
15323c3465
|
[REBASE ME] Crawl: enhance efficiency and output a tree
|
2018-02-25 15:08:06 +01:00 |
|
|
c3bcdea1eb
|
Add tentative export to RDF
|
2018-02-25 14:37:30 +01:00 |
|
|
05a2e2ca3f
|
Partial generation of profiles
|
2018-02-25 13:18:12 +01:00 |
|
|
d4aefb6bb7
|
Load the data
|
2018-02-25 13:17:44 +01:00 |
|
|
3eb82a4a0b
|
Data for names and emails
|
2018-02-25 13:17:27 +01:00 |
|
|
7c0fb7dda1
|
Better naming
|
2018-02-25 11:49:44 +01:00 |
|
|
ee32e5385b
|
Finished data import
|
2018-02-25 11:49:11 +01:00 |
|
|
bc7348f677
|
Integration of crawl module in histories
|
2018-02-24 23:17:24 +01:00 |
|