# Trump vocabulary

**NOTE:** this was written in a few minutes, with no effort to make the code clean or
robust.

This code goes through the tweets of Donald Trump and produces a ranked list of words
used.

The result (rarely updated) can be found
[here](https://tobast.fr/files/trumprank.txt).

## Methodology

A word is considered to be a contiguous sequence of letters and single quotes
(`'`) only. Words that have fewer than four occurrences are removed (they are
considered irrelevant, probably some random name).
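
The rule above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the function names, the ASCII-only letter class and the lowercasing step are assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters and single quotes. The real
# script may handle case or accented letters differently.
WORD_RE = re.compile(r"[A-Za-z']+")
MIN_OCCURRENCES = 4  # words seen fewer than four times are dropped


def count_words(tweets):
    """Count lowercased words across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(word.lower() for word in WORD_RE.findall(text))
    return counts


def rank_words(counts, min_occurrences=MIN_OCCURRENCES):
    """Return (word, count) pairs, most frequent first, rare words removed."""
    kept = [(w, c) for w, c in counts.items() if c >= min_occurrences]
    # Sort by descending count, then alphabetically to break ties.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))
```

Calling `rank_words(count_words(texts))` on a list of tweet texts yields the ranking with rare words already filtered out.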

## Install

Clone this repository with submodules: `git clone --recurse-submodules`

Alternatively, if you have already cloned the repo, you can run:

```bash
git submodule update --init --depth 1
```

## Get a shell

You can explore the data interactively by using `count_words.py` as an init script
for your favorite Python shell, e.g.

```bash
ipython -i count_words.py
```

The following will be available to you as variables:

* `tweets`: the list of all tweets,
* `occur`: Python dictionary of occurrences of words in Trump's tweets,
* `ranked`: ranked list of occurrences of words in Trump's tweets.
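
As a rough idea of what exploring these variables might look like, here is a small sketch. It uses stand-in data so it runs on its own, and it assumes that `occur` maps each word to its count and that `ranked` holds `(word, count)` pairs; the actual structures produced by `count_words.py` may differ.

```python
# Stand-in data so this snippet is self-contained. In the real shell,
# `occur` and `ranked` are provided by count_words.py.
occur = {"great": 5, "people": 3, "country": 2}

# Assumption: `ranked` is a list of (word, count) pairs, most frequent first.
ranked = sorted(occur.items(), key=lambda pair: pair[1], reverse=True)

# Look up how often a given word occurs (0 if it never appears).
print(occur.get("great", 0))

# Show the two most frequent words.
for word, count in ranked[:2]:
    print(word, count)
```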

## Generating the list

Simply run:

```bash
python ./generate_list.py [OUTPUT_FILE]
```

If you omit `OUTPUT_FILE`, the list will be generated to `trumprank.txt`.