15323c3465
[REBASE ME] Crawl: enhance efficiency and output a tree
2018-02-25 15:08:06 +01:00
c3bcdea1eb
Add tentative export to RDF
2018-02-25 14:37:30 +01:00
05a2e2ca3f
Partial generation of profiles
2018-02-25 13:18:12 +01:00
d4aefb6bb7
Load the data
2018-02-25 13:17:44 +01:00
3eb82a4a0b
Data for names and emails
2018-02-25 13:17:27 +01:00
7c0fb7dda1
Better naming
2018-02-25 11:49:44 +01:00
ee32e5385b
Finished data import
2018-02-25 11:49:11 +01:00
bc7348f677
Integration of crawl module in histories
2018-02-24 23:17:24 +01:00
60bfc8cb77
Merge branch 'crawl' into histories_models
2018-02-24 18:44:27 +01:00
12c8c652d7
Serialisation function
2018-02-24 18:40:27 +01:00
c58f42476f
Missing script for 854481d
2018-02-24 17:22:52 +01:00
854481dbd3
Import utilities
2018-02-24 17:21:41 +01:00
d19c2e8216
Add mailto addresses to forbidden list
2018-02-24 15:41:46 +01:00
e56c088632
Better filter
2018-02-24 11:39:04 +01:00
2732e4115f
Add RDF models export classes — untested
Also add a dependency on https://github.com/tobast/RDFSerializer/
2018-02-23 13:32:32 +01:00
f0b8672c89
Silly me. (again)
2018-02-23 10:44:51 +01:00
f6da179820
If the robots.txt file is invalid, abort mission.
2018-02-23 10:36:14 +01:00
0e02f22d08
Exception handling
Big problem with the URL https:/plus.google.com/+Python concerning
robots.txt parsing.
Didn't find the bug. @tobast, if you have some time to look at it :)
2018-02-23 00:37:36 +01:00
77ca7ebcb9
Silly me.
2018-02-22 15:35:46 +01:00
9b78e268c9
Nearly working crawler
2018-02-22 14:33:07 +01:00
e19e623df1
Multiple bug fixes. TODO: remove <div id=footer>-like patterns
2018-02-22 14:07:53 +01:00
5decd205fb
Typos + improvements
2018-02-22 11:06:45 +01:00
236e15296c
It can be useful to return the links list
2018-02-21 23:11:57 +01:00
4e6ac5ac7b
URL getter function: retrieves the list of so-called relevant links
2018-02-21 22:51:05 +01:00
a907cad33d
Start of URL getter function
2018-02-21 19:06:46 +01:00
ad0ad0a783
Command to add browser fingerprint data
2018-02-21 16:50:27 +01:00
b05e642c79
Make the code somewhat readable
2018-02-21 11:54:41 +01:00
cd4d8a4c3f
More generic code using @8f4458b
2018-02-21 11:50:28 +01:00
8f4458b009
URL generation method, for more genericity
2018-02-21 11:37:44 +01:00
5539f57139
Add missing docstrings
2018-02-21 11:35:53 +01:00
4920de5838
Continuing the generation of history
2018-02-20 23:42:21 +01:00
c97acb22b5
Add tentative crawl file
Nothing functional, just tests
2018-02-20 12:48:53 +01:00
c05c2561d2
Add crawler settings and requirements
2018-02-20 12:48:16 +01:00
bef1fca5b9
Init app 'crawl'
2018-02-20 08:51:16 +01:00
7c13ee17d4
Skeleton of history generation
2018-02-19 22:56:16 +01:00
7f343d8ad8
Better formatting
2018-02-19 13:59:29 +01:00
3b0fa27951
Add histories application to settings file
2018-02-19 13:59:29 +01:00
60f09bd4d3
Add basic models for histories
2018-02-19 13:58:55 +01:00
924657abdb
Generate profiles' migration
2018-01-24 22:49:34 +01:00
e9b3127226
Use profiles as an installed application in pinocchio
2018-01-24 22:49:08 +01:00
cbf1911fe7
Add models for Interest and Profile
2018-01-24 22:48:53 +01:00
37581fb96a
Add models for Place and Event
2018-01-24 22:39:20 +01:00
6531415d63
Add model for a webpage and website
2018-01-24 14:09:33 +01:00
114c8a3d3e
Add model for search engines
2018-01-24 13:52:43 +01:00
225742798b
Add BrowserFingerprint model
2018-01-24 13:36:55 +01:00
a3e6308837
Init apps histories and profiles
2018-01-23 18:12:47 +01:00
397784a673
Add first version of requirements.txt
Mainly Django, for now
2018-01-23 18:11:07 +01:00
132b7250c8
Initialize Django
2018-01-23 18:11:00 +01:00
c1e3be346f
Initial commit
2018-01-23 17:53:08 +01:00