|Théophile Bastian 4c9540afc5 Detail a bit methodology||1 year ago|
NOTE: this was written in a few minutes without bothering with clean and robust code.
This code goes through the tweets of Donald Trump and produces a ranked list of words used.
The result (which is not updated often) can be found here.
A word is considered to be a contiguous sequence of letters and quotes
only. Words that occur fewer than four times are removed (they are considered
irrelevant, probably some random name).
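As a rough illustration of the rule above, here is a minimal sketch. It is not the code from count_words.py: the names `WORD_RE`, `count_words` and `rank_words` are hypothetical, and it assumes that "letters" means ASCII letters, that "quotes" means the apostrophe, and that words are compared case-insensitively.

```python
import re
from collections import Counter

# Assumption: a word is a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def count_words(tweets, min_occurrences=4):
    """Count words over an iterable of tweet texts, dropping rare words."""
    counts = Counter()
    for text in tweets:
        # Assumption: lowercase so that "Great" and "great" are the same word.
        counts.update(word.lower() for word in WORD_RE.findall(text))
    # Keep only the words seen at least `min_occurrences` times.
    return {word: n for word, n in counts.items() if n >= min_occurrences}


def rank_words(occurrences):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    return sorted(occurrences.items(), key=lambda item: (-item[1], item[0]))
```

Calling `rank_words(count_words(texts))` on a list of strings gives a ranked list in the spirit of the one described here; the default threshold of four matches the rule stated above.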
Clone this repository with submodules:
git clone --recurse-submodules
Alternatively, if you already cloned the repo, you can run
git submodule update --init --depth 1
You can explore the data interactively by using
count_words.py as an init script for
your favorite shell, e.g.
ipython -i count_words.py
The following will be available to you as variables:
tweets: the list of all tweets ever,
occur: a Python dictionary of occurrences of words in Trump's tweets,
ranked: a ranked list of occurrences of words in Trump's tweets.
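Once in the shell, these variables can be queried like any other Python object. The sketch below assumes that `occur` maps each word to an integer count. The helper `top_words` is hypothetical, and the numbers are made-up stand-in data so that the snippet runs on its own.

```python
import heapq

# Stand-in data (made up). In the interactive session, `occur` is already
# defined by count_words.py and this line is not needed.
occur = {"great": 120, "people": 95, "wall": 40}


def top_words(occurrences, n=10):
    """Return the n most frequent (word, count) pairs, most frequent first."""
    return heapq.nlargest(n, occurrences.items(), key=lambda item: item[1])


# Look up a single word; words that were filtered out are simply absent.
print(occur.get("wall", 0))

# Show the two most frequent words.
print(top_words(occur, 2))
```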
python ./generate_list.py [OUTPUT_FILE]
If you omit
OUTPUT_FILE, the list will be generated to