Commit graph

30 commits

Author SHA1 Message Date
570829652d
Set theme jekyll-theme-minimal 2020-10-11 16:41:52 +02:00
a313b14410
chg: [doc] missing ! 2020-10-11 14:32:45 +02:00
18d39aa591
chg: [doc] overview of processing added 2020-10-11 14:32:00 +02:00
3ba0a643c1
chg: [doc] show the scope of spacy.io library 2020-10-11 14:29:46 +02:00
a159473f8d
new: [doc] mindmap overview of napkin 2020-10-11 14:23:32 +02:00
5fe2e2ae1f
chg: [readable] add span description (token/word queried) 2020-10-11 11:38:06 +02:00
42e3094489
new: [option] --token-span to find a specific token in a sentence
This outputs the sentences where a specific token has been seen.

Requires the spaCy parser module.
2020-10-11 11:24:17 +02:00
85044335f4
new: [option] to disable the parser and/or tagger in the standard spaCy processing pipeline
If you don't need any of the syntactic information while using napkin,
you can disable the parser and tagger to save memory and processing time.
By default, both remain active, as napkin might use the syntactic
information in the future.
2020-10-11 11:04:30 +02:00
ab728e60c6
add: [sample] french text - Alice in Wonderland 2020-10-11 10:45:36 +02:00
24e69a8ad9
new: [option] --analysis to limit the output to a specific analysis 2020-10-09 23:23:36 +02:00
32a899a4a0
chg: [logo] added 2020-10-09 21:53:08 +02:00
02ea4cc717
chg: [doc] README improved + funky logo 2020-10-09 21:52:00 +02:00
fb289cec1b
chg: [analysis] get rid of single-character tokens in the analysis
TODO: What about Chinese and similar languages? Needs to be tested.
2020-10-09 21:17:03 +02:00
02938bd464
new: [requirement] requirement file added 2020-10-09 20:50:40 +02:00
793e7ae9c5
chg: [langdetect] detection of language before further processing
Before processing the text, we use cld3 to detect the language
and check whether it matches the spaCy model that is going to be used.
2020-10-09 20:47:43 +02:00
98a8d8275e
chg: [output] make the readable output table-like with bold headers
Official request from @C00kie-
2020-10-09 18:36:33 +02:00
2c295e79cf
chg: [export] fix CSV export
TODO: Review escaping in CSV
2020-10-09 17:19:11 +02:00
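Regarding the escaping TODO above: a minimal sketch, assuming the export rows are simple (token, count) pairs. The helper name `export_csv` is hypothetical and this is not napkin's actual export code. Python's standard `csv` module already quotes fields that contain the delimiter, the quote character, or a line break, and doubles embedded quotes:

```python
import csv
import io


def export_csv(rows):
    """Hypothetical helper: serialise (token, count) pairs to CSV text.

    csv.writer with QUOTE_MINIMAL wraps a field in quotes only when it
    contains the delimiter, the quote character, or a line break, and
    doubles any quote characters inside the field.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["token", "count"])
    for token, count in rows:
        writer.writerow([token, count])
    return buffer.getvalue()


# A token containing both a comma and quotes is escaped safely.
print(export_csv([("hello", 3), ('say "hi", then', 1)]))
```

Relying on the standard writer instead of joining strings by hand avoids having to maintain escaping rules manually.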
49be2bf809
new: [output] readable output to help analysts read the results
First version based on @C00kie-'s feedback.

Potential improvement could be a more tabular representation.
2020-10-09 07:48:06 +02:00
9364c75477
chg: [score] scores are now integer 2020-10-09 07:27:08 +02:00
193ad08144
fix: [bug] punctuation was not part of OOV and was not accounted for 2020-10-09 07:25:26 +02:00
ef5011a64f
chg: [cleanup] key names used in Redis have been simplified 2020-10-09 07:18:16 +02:00
10049a69b6
new: [option] --binary to dump in binary format instead of UTF-8 2020-10-08 23:30:57 +02:00
26244739dd
new: [option] don't flush the Redis database, useful when you want to process multiple files and aggregate the results. 2020-10-08 23:22:00 +02:00
949e41d19f
new: [lemmatized/verbatim] displaying verbatim or lemmatized version is now an option 2020-10-08 23:13:51 +02:00
3d71d9288e
chg: [args] add an option to force the language 2020-10-01 23:06:39 +02:00
3a09abc80c
new: [output] JSON export added 2020-09-21 07:50:57 +02:00
3c3760019e
chg: [feature] add punctuation statistics for OOV tokens (but punctuation
handling in spacy.io seems super buggy or incorrect)
2020-08-20 14:33:15 +02:00
526f88071c
new: [feature] -s option to display the overall statistics of different tokens seen 2020-08-20 13:28:49 +02:00
dd7c796460
new: [napkin] first release
Napkin is a Python tool to produce a statistical analysis of a text.

Analysis features are:

- Verb frequency
- Noun frequency
- Digit frequency
- Label frequency (such as person, organisation, product, location) as defined in spaCy [named entities](https://spacy.io/api/annotation#named-entities)
- URL frequency
- Email frequency
- Mention frequency (everything prefixed with an @ symbol)
- Out-Of-Vocabulary (OOV) word frequency, i.e. any word outside the English dictionary
2020-08-19 17:33:04 +02:00
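Most of the features listed in the first release boil down to frequency counting per category. Below is a minimal sketch of that idea for three categories (mentions, emails, URLs). It uses plain regular expressions and `collections.Counter` instead of spaCy. The patterns and the `count_features` name are illustrative assumptions, not napkin's implementation:

```python
import re
from collections import Counter

# Illustrative patterns only; napkin itself relies on spaCy for tokenisation.
# A mention is an @ not preceded by a word character (so emails are skipped).
MENTION_RE = re.compile(r"(?<!\w)@\w+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
URL_RE = re.compile(r"https?://\S+")


def count_features(text):
    """Hypothetical helper: one Counter per feature category."""
    return {
        "mention": Counter(MENTION_RE.findall(text)),
        "email": Counter(EMAIL_RE.findall(text)),
        "url": Counter(URL_RE.findall(text)),
    }


sample = "Ping @alice and @bob, then @alice again. Mail alice@example.org"
for category, counts in count_features(sample).items():
    print(category, counts.most_common())
```

A tokeniser-based approach like spaCy's is more robust than regular expressions, for example with trailing punctuation on URLs. The sketch only shows the counting structure.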
e3e27c7ce9
Initial commit 2020-08-18 16:49:24 +02:00