Commit Graph

98 Commits

Author SHA1 Message Date
Théophile Bastian b88aeffd5a Helpful README 2018-02-26 17:09:05 +01:00
Théophile Bastian 2005c0f24f Add xml string gen 2018-02-26 17:03:27 +01:00
Théophile Bastian 185c1cf8a4 Fix XML generation 2018-02-26 17:00:53 +01:00
Rémi Oudin 9dd1954067 Partial runner fix 2018-02-26 17:00:53 +01:00
Rémi Oudin 04270e88c0 Bug fix 2018-02-26 17:00:12 +01:00
Théophile Bastian 6bc64ceb7a Add requirement for aiohttp 2018-02-26 16:38:16 +01:00
Théophile Bastian 8cdc50c04e Fix stupid typo 2018-02-26 16:34:43 +01:00
Rémi Oudin 22fa039f1b Remove debug print 2018-02-26 16:23:14 +01:00
Théophile Bastian e4ad8c7ce6 Towards a working XML export 2018-02-26 15:58:30 +01:00
Théophile Bastian 67ad232533 Add a timeout to a single page retrieval 2018-02-26 15:42:36 +01:00
Théophile Bastian e140d4a8a7 Fix merge remanences 2018-02-26 15:37:05 +01:00
Théophile Bastian 98fe69ba62 Real async crawling 2018-02-26 15:30:38 +01:00
Théophile Bastian 968ff6d24c More robust crawling 2018-02-26 15:29:36 +01:00
Rémi Oudin 5d4bd30e20 Exception handling 2018-02-26 15:15:03 +01:00
Rémi Oudin bdfa285e6b We do not want to use settings 2018-02-26 15:14:53 +01:00
Rémi Oudin 65f777f00f Should get the objects and not the Manager 2018-02-26 15:04:26 +01:00
Rémi Oudin 236e40d359 Sanity check 2018-02-26 14:57:46 +01:00
Rémi Oudin 22017cea91 Typo in data u_u 2018-02-26 14:56:22 +01:00
Rémi Oudin 549c861908 Bug fixed 2018-02-26 14:38:26 +01:00
Rémi Oudin 517be1d822 Merge rdf branch 2018-02-26 14:11:06 +01:00
Rémi Oudin c4f63a92b2 Error in the merge, mea culpa 2018-02-26 14:01:29 +01:00
Rémi Oudin db067e56fc Typo 2018-02-26 13:59:34 +01:00
Rémi Oudin 33bdae96e4 merge commit from histories_tobast into histories_models 2018-02-26 12:59:38 +01:00
Rémi Oudin 526aad1364 Add interests 2018-02-26 12:33:23 +01:00
Théophile Bastian 02e91bb2b7 Fix function calls 2018-02-26 11:56:02 +01:00
Théophile Bastian 3e5fc2f9b3 Fix search engine URL generation 2018-02-26 11:49:24 +01:00
Théophile Bastian 45ddbff91a Crawling and histories: fix a lot of stuff 2018-02-26 11:49:24 +01:00
Théophile Bastian e6d587bffd Actually save to DB a created history 2018-02-26 11:49:24 +01:00
Théophile Bastian 8baf408e02 Use dict from data/nicknames_dict for nicknames 2018-02-26 11:49:24 +01:00
Théophile Bastian 6463e348ac Fix populate.sh exec path 2018-02-26 11:48:51 +01:00
Théophile Bastian 22064ebee3 Histories: xml import/export (untested). To be tested when history generation is available 2018-02-26 11:48:51 +01:00
Théophile Bastian a4de51b84a Crawl: do not use global SEARCH_ENGINES 2018-02-26 11:48:51 +01:00
Théophile Bastian 4f0148cb63 Crawler: use a random fingerprint 2018-02-26 11:48:51 +01:00
Théophile Bastian 4a8bd32516 Fix tor_runner import 2018-02-26 11:48:51 +01:00
Rémi Oudin 44cf26df8f It can be useful to save a new object 2018-02-26 11:42:45 +01:00
Rémi Oudin adb892ab7d Check if crawling a search engine 2018-02-26 11:12:36 +01:00
Rémi Oudin 15db8b4697 Change option name due to downgrade of aiohttp 2018-02-26 10:23:32 +01:00
Rémi Oudin d6b26c0a46 Better use of history 2018-02-26 10:05:33 +01:00
Rémi Oudin 8f5c4f3f0f Use datetimes 2018-02-26 09:49:24 +01:00
Rémi Oudin 71d9e18eec Add headers support 2018-02-25 23:56:51 +01:00
Rémi Oudin 8ad46c0481 Bug fix, syntax error 2018-02-25 21:59:29 +01:00
Rémi Oudin f66c978466 Tor runner has a run function to replay the history 2018-02-25 21:53:28 +01:00
Rémi Oudin 0a676a2f65 PEP8 2018-02-25 21:34:20 +01:00
Rémi Oudin e074d96f02 tor_runner can make requests 2018-02-25 21:27:15 +01:00
Rémi Oudin 93b235cb6c Fix interests import 2018-02-25 21:20:52 +01:00
Rémi Oudin ae5699c089 Basic tor runner 2018-02-25 19:42:58 +01:00
Rémi Oudin f7313ff659 Add populate.sh script 2018-02-25 16:16:04 +01:00
Rémi Oudin 0661fe0f01 Fix path 2018-02-25 16:10:38 +01:00
Rémi Oudin 4b19febdf6 Add interests 2018-02-25 16:10:22 +01:00
Théophile Bastian 15323c3465 [REBASE ME] Crawl: enhance efficiency and output a tree 2018-02-25 15:08:06 +01:00