24e69a8ad9
new: [option] --analysis to limit the output to a specific analysis
2020-10-09 23:23:36 +02:00
32a899a4a0
chg: [logo] added
2020-10-09 21:53:08 +02:00
02ea4cc717
chg: [doc] README improved + funky logo
2020-10-09 21:52:00 +02:00
fb289cec1b
chg: [analysis] drop single-character tokens from the analysis
...
TODO: What about Chinese and similar languages? Needs to be tested.
2020-10-09 21:17:03 +02:00
02938bd464
new: [requirement] requirement file added
2020-10-09 20:50:40 +02:00
793e7ae9c5
chg: [langdetect] detection of language before further processing
...
Before processing the text, we use cld3 to detect the language
and compare it with the spaCy model that is about to be used.
2020-10-09 20:47:43 +02:00
98a8d8275e
chg: [output] make the output readable and table-like, with bold headers
...
Official request from @C00kie-
2020-10-09 18:36:33 +02:00
2c295e79cf
chg: [export] fix CSV export
...
TODO: Review escaping in CSV
2020-10-09 17:19:11 +02:00
49be2bf809
new: [output] readable output to help analysts read the results
...
First version based on @C00kie- feedback.
Potential improvement could be a more tabular representation.
2020-10-09 07:48:06 +02:00
9364c75477
chg: [score] scores are now integer
2020-10-09 07:27:08 +02:00
193ad08144
fix: [bug] punctuation was not part of OOV and was not accounted for
2020-10-09 07:25:26 +02:00
ef5011a64f
chg: [cleanup] key names used in Redis have been simplified
2020-10-09 07:18:16 +02:00
10049a69b6
new: [option] --binary to dump in binary format instead of UTF-8
2020-10-08 23:30:57 +02:00
26244739dd
new: [option] don't flush the Redis database, useful when you want to process multiple files and aggregate the results
2020-10-08 23:22:00 +02:00
949e41d19f
new: [lemmatized/verbatim] displaying verbatim or lemmatized version is now an option
2020-10-08 23:13:51 +02:00
3d71d9288e
chg: [args] add an option to force the language
2020-10-01 23:06:39 +02:00
3a09abc80c
new: [output] JSON export added
2020-09-21 07:50:57 +02:00
3c3760019e
chg: [feature] add punct statistics for the OOV (but the punct in spacy.io seems super buggy or incorrect)
2020-08-20 14:33:15 +02:00
526f88071c
new: [feature] -s option to display the overall statistics of different tokens seen
2020-08-20 13:28:49 +02:00
dd7c796460
new: [napkin] first release
...
Napkin is a Python tool that produces a statistical analysis of a text.
Analysis features are:
- Verb frequency
- Noun frequency
- Digit frequency
- Label frequency (such as person, organisation, product, location) as defined in spacy.io [named entities](https://spacy.io/api/annotation#named-entities)
- URL frequency
- Email frequency
- Mention frequency (everything prefixed with an @ symbol)
- Out-Of-Vocabulary (OOV) word frequency, meaning any word outside the English dictionary
2020-08-19 17:33:04 +02:00
e3e27c7ce9
Initial commit
2020-08-18 16:49:24 +02:00