# Trump vocabulary

**NOTE:** this was written in a few minutes, without any effort to make the code clean or robust.

This code goes through the tweets of Donald Trump and produces a ranked list of the words used.

The result (which is not updated often) can be found [here](https://tobast.fr/files/trumprank.txt).

## Methodology

A word is considered to be a contiguous sequence of letters and quotes (`'`) only.

Words that have fewer than four occurrences are removed (they are considered irrelevant, probably some random name).

## Install

Clone this repository with submodules: `git clone --recurse-submodules`

Alternatively, if you have already cloned the repo, you can run

```bash
git submodule update --init --depth 1
```

## Get a shell

You can explore the data interactively by using `count_words.py` as an init script for your favorite Python shell, e.g.

```bash
ipython -i count_words.py
```

The following will be available to you as variables:

* `tweets`: the list of all tweets,
* `occur`: Python dictionary of occurrences of words in Trump's tweets,
* `ranked`: ranked list of occurrences of words in Trump's tweets.

## Generating the list

Simply run

```bash
python ./generate_list.py [OUTPUT_FILE]
```

If you omit `OUTPUT_FILE`, the list will be written to `trumprank.txt`.
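
For illustration, the rules from the Methodology section can be sketched in a few lines of Python. This is not the code from `count_words.py`: the names `WORD_RE`, `MIN_OCCURRENCES`, `count_words` and `rank_words` are hypothetical, and both the lowercasing of words and the restriction to ASCII letters are assumptions that the actual script may not share.

```python
import re
from collections import Counter

# Hypothetical sketch, not the repository's actual implementation.
# A word is a contiguous run of letters and quotes (ASCII letters assumed).
WORD_RE = re.compile(r"[A-Za-z']+")
# Words seen fewer than this many times are dropped.
MIN_OCCURRENCES = 4


def count_words(tweets):
    """Count words across an iterable of tweet strings."""
    counts = Counter()
    for text in tweets:
        # Assumption: words are lowercased so "Great" and "great" merge.
        counts.update(word.lower() for word in WORD_RE.findall(text))
    return counts


def rank_words(counts, min_occurrences=MIN_OCCURRENCES):
    """Return (word, count) pairs, most frequent first, without rare words."""
    kept = [(w, c) for w, c in counts.items() if c >= min_occurrences]
    # Sort by descending count, then alphabetically to break ties.
    return sorted(kept, key=lambda wc: (-wc[1], wc[0]))
```

Calling `rank_words(count_words(tweets))` on a list of tweet strings would then give a ranked list comparable in spirit to `ranked` above, although the real data structures may differ.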