napkin-text-analysis

mirror of https://github.com/adulau/napkin-text-analysis.git synced 2024-11-22 09:57:07 +00:00

Author	SHA1	Message	Date
Alexandre Dulaunoy	2c2aaf3917	add: [readable] --table-format added to use any of the tabulate format available	2020-10-13 07:41:01 +02:00
Alexandre Dulaunoy	04c3fc3cf7	fix: [readable] fix bug where the first value was missed	2020-10-13 07:32:04 +02:00
Alexandre Dulaunoy	9fb0cc8488	fix: [bug] email was missing + first value of the ranked set was missed	2020-10-13 07:24:35 +02:00
Alexandre Dulaunoy	5fe2e2ae1f	chg: [readable] add span description (token/word queried)	2020-10-11 11:38:06 +02:00
Alexandre Dulaunoy	42e3094489	new: [option] --token-span to find a specific token in a sentence. This outputs the sentence where a specific token has been seen. Requires the parser module of spacy.	2020-10-11 11:24:17 +02:00
Alexandre Dulaunoy	85044335f4	new: [option] to disable parser and/or tagger from the standard processing pipeline of Spacy. If you don't need any of the syntactic information while using napkin, you can disable parser and tagger. You can gain some memory space and time for processing. By default, they are still active as napkin might use the syntactic information in the future.	2020-10-11 11:04:30 +02:00
Alexandre Dulaunoy	24e69a8ad9	new: [option] --analysis to limit the output to a specific analysis	2020-10-09 23:23:36 +02:00
Alexandre Dulaunoy	fb289cec1b	chg: [analysis] get rid of single-char tokens in the analysis. TODO: What about Chinese and similar languages? Needs to be tested	2020-10-09 21:17:03 +02:00
Alexandre Dulaunoy	793e7ae9c5	chg: [langdetect] detection of language before further processing. Before processing the text, we use cld3 to detect the language and check whether it matches the spacy model foreseen to be used.	2020-10-09 20:47:43 +02:00
Alexandre Dulaunoy	98a8d8275e	chg: [output] make readable table-like with bold headers Official request from @C00kie-	2020-10-09 18:36:33 +02:00
Alexandre Dulaunoy	2c295e79cf	chg: [export] fix CSV export TODO: Review escaping in CSV	2020-10-09 17:19:11 +02:00
Alexandre Dulaunoy	49be2bf809	new: [output] readable output to help analyst reading the output First version based on @C00kie- feedback. Potential improvement could be a more tabular representation.	2020-10-09 07:48:06 +02:00
Alexandre Dulaunoy	9364c75477	chg: [score] scores are now integer	2020-10-09 07:27:08 +02:00
Alexandre Dulaunoy	193ad08144	fix: [bug] punctuation was not part of OOV and was not accounted for	2020-10-09 07:25:26 +02:00
Alexandre Dulaunoy	ef5011a64f	chg: [cleanup] key names used in redis have been simplified	2020-10-09 07:18:16 +02:00
Alexandre Dulaunoy	10049a69b6	new: [option] --binary to dump in binary format instead of UTF-8	2020-10-08 23:30:57 +02:00
Alexandre Dulaunoy	26244739dd	new: [option] Don't flush the redisdb, useful when you want to process multiple files and aggregate the results.	2020-10-08 23:22:00 +02:00
Alexandre Dulaunoy	949e41d19f	new: [lemmatized/verbatim] displaying verbatim or lemmatized version is now an option	2020-10-08 23:13:51 +02:00
Alexandre Dulaunoy	3d71d9288e	chg: [args] add an option to force the language	2020-10-01 23:06:39 +02:00
Alexandre Dulaunoy	3a09abc80c	new: [output] JSON export added	2020-09-21 07:50:57 +02:00
Alexandre Dulaunoy	3c3760019e	chg: [feature] add punct statistics for the oov (but the punct in spacy.io seems super buggy or incorrect)	2020-08-20 14:33:15 +02:00
Alexandre Dulaunoy	526f88071c	new: [feature] -s option to display the overall statistics of different tokens seen	2020-08-20 13:28:49 +02:00
Alexandre Dulaunoy	dd7c796460	new: [napkin] first release. Napkin is a Python tool to produce statistical analysis of a text. Analysis features are: - Verbs frequency - Nouns frequency - Digit frequency - Labels frequency such as (Person, organisation, product, location) as defined in spacy.io [named entities](https://spacy.io/api/annotation#named-entities) - URL frequency - Email frequency - Mention frequency (everything prefixed with an @ symbol) - Out-Of-Vocabulary (OOV) word frequency, meaning any word outside the English dictionary	2020-08-19 17:33:04 +02:00

23 commits
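
The first-release commit lists URL, email and mention frequency among the analysis features, and a later fix mentions a "ranked set". The following is a minimal sketch of that kind of frequency ranking, not napkin's actual code: per the log, napkin builds on spaCy and Redis, whereas this sketch assumes only the Python standard library. The regular expressions and the `rank` function name are hypothetical and exist for illustration only.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; napkin itself relies on spaCy.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A mention is an @-prefixed word that is not the domain part of an email.
MENTION_RE = re.compile(r"(?<![\w.+-])@\w+")


def rank(text, top=3):
    """Return the `top` most frequent URLs, emails and mentions in `text`."""
    counts = {
        "url": Counter(URL_RE.findall(text)),
        "email": Counter(EMAIL_RE.findall(text)),
        "mention": Counter(MENTION_RE.findall(text)),
    }
    # most_common() sorts by descending count, highest-ranked value first.
    return {name: counter.most_common(top) for name, counter in counts.items()}


sample = (
    "Ask @alice or @bob. Mail alice@example.org and see "
    "https://example.org/a and https://example.org/a again, cc @alice"
)
result = rank(sample)
for name, ranked in result.items():
    print(name, ranked)
```

A `Counter` keeps the sketch self-contained. A tool that aggregates results across several files, as the "don't flush the redisdb" option suggests, would need a store that outlives a single run.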