Alexandre Dulaunoy
8541ae3192
chg: [cli] black the napkin binary
2024-02-25 15:53:15 +01:00
Alexandre Dulaunoy
a2a074436e
chg: [doc] updated
2020-11-22 14:22:18 +01:00
Alexandre Dulaunoy
a6d5a5bbe4
new: [input] you can now read from stdin directly with -i
2020-11-22 14:19:21 +01:00
Alexandre Dulaunoy
b1ddcfa53c
fix: [punct] analysis is now using the proper token value
2020-10-19 08:56:28 +02:00
Alexandre Dulaunoy
afc06a3850
chg: [redis] use default Redis port
2020-10-19 08:52:23 +02:00
Alexandre Dulaunoy
793ce3228f
Create python-app.yml
...
First action for testing napkin-text-analysis
2020-10-18 22:56:02 +02:00
Alexandre Dulaunoy
1192d17e86
chg: [download] if a spacy.io model is missing, napkin automatically downloads it
...
Fix #1
TODO: map cld3 language to potential model to be downloaded
2020-10-18 22:37:00 +02:00
Alexandre Dulaunoy
5b6136cfaf
new: [feature] option to save all labels in redis ranked set
2020-10-15 07:12:15 +02:00
Alexandre Dulaunoy
7bb9a78096
Merge branch 'master' of github.com:adulau/napkin-text-analysis
2020-10-13 08:09:37 +02:00
Alexandre Dulaunoy
13817d2f20
chg: [doc] example of a GitHub markdown format output
2020-10-13 07:46:21 +02:00
Alexandre Dulaunoy
2c2aaf3917
add: [readable] --table-format added to use any of the available tabulate formats
2020-10-13 07:41:01 +02:00
Alexandre Dulaunoy
04c3fc3cf7
fix: [readable] fix bug where the first value was missed
2020-10-13 07:32:04 +02:00
Alexandre Dulaunoy
9fb0cc8488
fix: [bug] email was missing + first value of the ranked set was missed
2020-10-13 07:24:35 +02:00
Alexandre Dulaunoy
570829652d
Set theme jekyll-theme-minimal
2020-10-11 16:41:52 +02:00
Alexandre Dulaunoy
a313b14410
chg: [doc] missing !
2020-10-11 14:32:45 +02:00
Alexandre Dulaunoy
18d39aa591
chg: [doc] overview of processing added
2020-10-11 14:32:00 +02:00
Alexandre Dulaunoy
3ba0a643c1
chg: [doc] show the scope of spacy.io library
2020-10-11 14:29:46 +02:00
Alexandre Dulaunoy
a159473f8d
new: [doc] mindmap overview of napkin
2020-10-11 14:23:32 +02:00
Alexandre Dulaunoy
5fe2e2ae1f
chg: [readable] add span description (token/word queried)
2020-10-11 11:38:06 +02:00
Alexandre Dulaunoy
42e3094489
new: [option] --token-span to find a specific token in a sentence
...
This outputs the sentence in which a specific token has been seen.
Requires the parser module of spaCy.
2020-10-11 11:24:17 +02:00
Alexandre Dulaunoy
85044335f4
new: [option] to disable parser and/or tagger from the standard processing pipeline of Spacy
...
If you don't need any of the syntactic information while using napkin,
you can disable the parser and tagger to save some memory and processing
time. By default, both remain active, as napkin might use
the syntactic information in the future.
2020-10-11 11:04:30 +02:00
Alexandre Dulaunoy
ab728e60c6
add: [sample] french text - Alice in Wonderland
2020-10-11 10:45:36 +02:00
Alexandre Dulaunoy
24e69a8ad9
new: [option] --analysis to limit the output to a specific analysis
2020-10-09 23:23:36 +02:00
Alexandre Dulaunoy
32a899a4a0
chg: [logo] added
2020-10-09 21:53:08 +02:00
Alexandre Dulaunoy
02ea4cc717
chg: [doc] README improved + funky logo
2020-10-09 21:52:00 +02:00
Alexandre Dulaunoy
fb289cec1b
chg: [analysis] get rid of single char token in the analysis
...
TODO: What about Chinese and similar languages? Needs to be tested
2020-10-09 21:17:03 +02:00
Alexandre Dulaunoy
02938bd464
new: [requirement] requirement file added
2020-10-09 20:50:40 +02:00
Alexandre Dulaunoy
793e7ae9c5
chg: [langdetect] detection of language before further processing
...
Before processing the text, we use cld3 to detect the language
and compare it with the spaCy model that is about to be used.
2020-10-09 20:47:43 +02:00
Alexandre Dulaunoy
98a8d8275e
chg: [output] make readable table-like with bold headers
...
Official request from @C00kie-
2020-10-09 18:36:33 +02:00
Alexandre Dulaunoy
2c295e79cf
chg: [export] fix CSV export
...
TODO: Review escaping in CSV
2020-10-09 17:19:11 +02:00
Alexandre Dulaunoy
49be2bf809
new: [output] readable output to help analysts read the results
...
First version based on @C00kie- feedback.
Potential improvement could be a more tabular representation.
2020-10-09 07:48:06 +02:00
Alexandre Dulaunoy
9364c75477
chg: [score] scores are now integer
2020-10-09 07:27:08 +02:00
Alexandre Dulaunoy
193ad08144
fix: [bug] punctuation was not part of OOV and was not accounted for
2020-10-09 07:25:26 +02:00
Alexandre Dulaunoy
ef5011a64f
chg: [cleanup] key names used in redis have been simplified
2020-10-09 07:18:16 +02:00
Alexandre Dulaunoy
10049a69b6
new: [option] --binary to dump in binary format instead of UTF-8
2020-10-08 23:30:57 +02:00
Alexandre Dulaunoy
26244739dd
new: [option] Don't flush the redisdb, useful when you want to process multiple files and aggregate the results.
2020-10-08 23:22:00 +02:00
Alexandre Dulaunoy
949e41d19f
new: [lemmatized/verbatim] displaying verbatim or lemmatized version is now an option
2020-10-08 23:13:51 +02:00
Alexandre Dulaunoy
3d71d9288e
chg: [args] add an option to force the language
2020-10-01 23:06:39 +02:00
Alexandre Dulaunoy
3a09abc80c
new: [output] JSON export added
2020-09-21 07:50:57 +02:00
Alexandre Dulaunoy
3c3760019e
chg: [feature] add punct statistics for the oov (but the punct in spacy.io seems super buggy or incorrect)
2020-08-20 14:33:15 +02:00
Alexandre Dulaunoy
526f88071c
new: [feature] -s option to display the overall statistics of different tokens seen
2020-08-20 13:28:49 +02:00
Alexandre Dulaunoy
dd7c796460
new: [napkin] first release
...
Napkin is a Python tool to produce a statistical analysis of a text.
Analysis features are:
- Verb frequency
- Noun frequency
- Digit frequency
- Label frequency (such as person, organisation, product, location) as defined in spacy.io [named entities](https://spacy.io/api/annotation#named-entities)
- URL frequency
- Email frequency
- Mention frequency (everything prefixed with an @ symbol)
- Out-Of-Vocabulary (OOV) word frequency, meaning any word outside the English dictionary
2020-08-19 17:33:04 +02:00
Alexandre Dulaunoy
e3e27c7ce9
Initial commit
2020-08-18 16:49:24 +02:00