# Trump vocabulary

**NOTE:** this was written in a few minutes, with no attempt at clean or robust
code.

This code goes through Donald Trump's tweets and produces a ranked list of the
words he uses.

The result (which is rarely updated) can be found
[here](https://tobast.fr/files/trumprank.txt).
## Methodology

A word is considered to be a contiguous sequence of letters and quotes (`'`)
only. Words that have fewer than four occurrences are removed (they are
considered irrelevant, probably some random name).
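The rule above can be sketched roughly as follows. This is not the code from `count_words.py`, only a minimal illustration: the names `WORD_RE`, `MIN_OCCURRENCES`, `count_words`, `rank_words` and `sample` are made up for the example, and restricting "letters" to ASCII, lowercasing words before counting, and breaking ties alphabetically are all assumptions.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters and quotes (').
WORD_RE = re.compile(r"[A-Za-z']+")
# Words seen fewer than four times are dropped.
MIN_OCCURRENCES = 4


def count_words(tweets):
    """Count word occurrences over an iterable of tweet texts.

    Assumption: words are lowercased so that "Great" and "great" are merged.
    """
    occur = Counter()
    for text in tweets:
        occur.update(word.lower() for word in WORD_RE.findall(text))
    return occur


def rank_words(occur, min_occurrences=MIN_OCCURRENCES):
    """Return (word, count) pairs, most frequent first, without rare words."""
    kept = [(word, count) for word, count in occur.items()
            if count >= min_occurrences]
    # Sort by decreasing count, then alphabetically to make ties deterministic.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Made-up sample texts, only here to exercise the functions.
sample = [
    "Great great GREAT day, great people!",
    "Don't stop. Don't. don't, DON'T!",
    "Sad day",
]
print(rank_words(count_words(sample)))
# [("don't", 4), ('great', 4)]
```

In this sample, `day` occurs twice and `stop`, `people` and `sad` once each, so all of them fall under the threshold and are left out.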
## Install

Clone this repository with its submodules: `git clone --recurse-submodules`

Alternatively, if you have already cloned the repo, you can run

```bash
git submodule update --init --depth 1
```
## Get a shell

You can explore the data interactively by using `count_words.py` as an init script for
your favorite Python shell, e.g.

```bash
ipython -i count_words.py
```

The following variables will then be available:

* `tweets`: the list of all tweets,
* `occur`: Python dictionary of occurrences of words in Trump's tweets,
* `ranked`: ranked list of occurrences of words in Trump's tweets.
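As an illustration of the kind of exploration this allows, here is a small helper you could paste into the session. It is a sketch only: `top_words` and `demo_occur` are hypothetical names, and it assumes that `occur` maps each word to its number of occurrences. The counts in `demo_occur` are invented stand-in values, not real results.

```python
def top_words(occur, n=10):
    """Return the n most frequent (word, count) pairs from a word -> count dict."""
    # Sort by decreasing count, then alphabetically to make ties deterministic.
    return sorted(occur.items(), key=lambda pair: (-pair[1], pair[0]))[:n]


# Invented stand-in data so the snippet runs on its own;
# in the ipython session, pass the real `occur` instead.
demo_occur = {"great": 120, "fake": 45, "news": 45, "wall": 30}
print(top_words(demo_occur, n=3))
# [('great', 120), ('fake', 45), ('news', 45)]
```

In the session itself, `top_words(occur)` would give the ten most frequent words under the same assumption.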
## Generating the list

Simply run

```bash
python ./generate_list.py [OUTPUT_FILE]
```

If you omit `OUTPUT_FILE`, the list is written to `trumprank.txt`.