mirror of
https://github.com/adulau/napkin-text-analysis.git
synced 2024-11-22 01:47:06 +00:00
chg: [doc] README improved + funky logo
parent
fb289cec1b
commit
02ea4cc717
1 changed files with 92 additions and 113 deletions
205
README.md
@@ -1,5 +1,7 @@
 # napkin-text-analysis
 
+![napkin text analysis - logo](./logo/logo.png)
+
 Napkin is a Python tool to produce statistical analysis of a text.
 
 Analysis features are :
@@ -21,7 +23,9 @@ Intermediate results are stored in a Redis database to allow the analysis of mul
 
 - Python >= 3.6
 - spacy.io
-- redis (a redis server running on port 6380)
+- redis (a redis server running on port 6380 is required)
+- pycld3
+- tabulate
 
 # how to use napkin
 
@@ -51,124 +55,99 @@ optional arguments:
 
 A sample file "The Prince, by Nicoló Machiavelli" is included to test napkin.
 
-`python3 napkin.py -f ../samples/the-prince.txt`
+`python3 ./bin/napkin.py -o readable -f samples/the-prince.txt -t 4`
 
 Example output:
 
 ~~~~
-# Top 100 of verb:napkin
-b'can',137.0
-b'make',116.0
-b'may',106.0
-b'would',102.0
-b'must',97.0
-b'take',86.0
-b'have',73.0
-b'see',72.0
-b'become',62.0
-b'find',61.0
-b'know',59.0
-b'should',54.0
-b'keep',53.0
-b'give',53.0
-b'hold',51.0
-b'say',50.0
-b'wish',48.0
-b'could',48.0
-b'fear',46.0
-b'maintain',45.0
-b'think',42.0
-b'use',40.0
-b'consider',40.0
-b'come',40.0
-b'lose',37.0
-b'live',35.0
-b'follow',33.0
-b'do',33.0
-b'remain',32.0
-b'gain',31.0
-b'avoid',31.0
-b'arise',31.0
-b'speak',29.0
-...
-# Top 100 of noun:napkin
-b'man',120.0
-b'state',108.0
-b'people',90.0
-b'one',90.0
-b'time',85.0
-b'work',83.0
-b'other',82.0
-b'thing',71.0
-b'way',60.0
-b'order',57.0
-b'fortune',49.0
-b'army',45.0
-b'force',44.0
-b'arm',44.0
-b'soldier',43.0
-b'subject',42.0
-b'power',41.0
-b'difficulty',39.0
-b'law',34.0
-b'reputation',33.0
-b'position',33.0
-b'enemy',33.0
-b'war',32.0
-b'kingdom',32.0
-b'cause',31.0
-b'possession',29.0
-b'action',29.0
-b'ruler',28.0
-b'rule',28.0
-b'example',28.0
-b'hand',27.0
-b'friend',27.0
-b'country',27.0
-b'king',26.0
-b'case',26.0
-...
-# Top 100 of digit:napkin
-b'84116',1.0
-b'750175',1.0
-b'6221541',1.0
-b'57037',1.0
-b'55901',1.0
-#
-# Top 100 of url:napking
-#
-# Top 100 of oov:napkin
-b'Fermo',7.0
-b'Vitelli',6.0
-b'Pertinax',6.0
-b'Orsinis',6.0
-b'Colonnas',6.0
-b'Bentivogli',6.0
-b'Agathocles',6.0
-b'Oliverotto',5.0
-b'C\xc3\xa6sar',5.0
-...
-# Top 100 of labels:napkin
-b'GPE',305.0
-b'CARDINAL',197.0
-b'ORG',189.0
-b'NORP',131.0
-b'ORDINAL',72.0
-b'DATE',44.0
-b'LAW',30.0
-b'LOC',18.0
-b'PRODUCT',9.0
-b'LANGUAGE',5.0
-b'WORK_OF_ART',4.0
-b'QUANTITY',4.0
-b'TIME',3.0
-b'FAC',3.0
-b'MONEY',2.0
-b'PERCENT',1.0
-b'EVENT',1.0
-
+╒═════════════════╕
+│ Top 4 of verb │
+╞═════════════════╡
+│ 116 occurences │
+├─────────────────┤
+│ make │
+├─────────────────┤
+│ 106 occurences │
+├─────────────────┤
+│ may │
+├─────────────────┤
+│ 102 occurences │
+├─────────────────┤
+│ would │
+╘═════════════════╛
+╒═════════════════╕
+│ Top 4 of noun │
+╞═════════════════╡
+│ 108 occurences │
+├─────────────────┤
+│ state │
+├─────────────────┤
+│ 90 occurences │
+├─────────────────┤
+│ people │
+├─────────────────┤
+│ one │
+╘═════════════════╛
+╒════════════════════╕
+│ Top 4 of hashtag │
+╞════════════════════╡
+╘════════════════════╛
+╒════════════════════╕
+│ Top 4 of mention │
+╞════════════════════╡
+╘════════════════════╛
+╒══════════════════╕
+│ Top 4 of digit │
+╞══════════════════╡
+│ 750175 │
+├──────────────────┤
+│ 6221541 │
+├──────────────────┤
+│ 57037 │
+╘══════════════════╛
+╒═════════════════════════════════════════╕
+│ Top 4 of url │
+╞═════════════════════════════════════════╡
+│ 1 occurences │
+├─────────────────────────────────────────┤
+│ www.gutenberg.org/license │
+├─────────────────────────────────────────┤
+│ www.gutenberg.org/contact │
+├─────────────────────────────────────────┤
+│ http://www.gutenberg.org/5/7/0/3/57037/ │
+╘═════════════════════════════════════════╛
+╒════════════════╕
+│ Top 4 of oov │
+╞════════════════╡
+│ 6 occurences │
+├────────────────┤
+│ Vitelli │
+├────────────────┤
+│ Pertinax │
+├────────────────┤
+│ Orsinis │
+╘════════════════╛
+╒═══════════════════╕
+│ Top 4 of labels │
+╞═══════════════════╡
+│ 197 occurences │
+├───────────────────┤
+│ CARDINAL │
+├───────────────────┤
+│ 189 occurences │
+├───────────────────┤
+│ ORG │
+├───────────────────┤
+│ 131 occurences │
+├───────────────────┤
+│ NORP │
+╘═══════════════════╛
 ~~~~
 
+# what about the name?
+
+The name 'napkin' came after a first sketch of the idea on a napkin. The goal was also to provide a simple text analysis tool which can be run on the corner of table in a kitchen.
+
 # LICENSE
 
 napkin is free software under the AGPLv3 license.
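The "Top 4 of ..." tables in the example output rank tokens by how often they occur. The following is only a minimal sketch of that ranking idea, not napkin's code: it assumes plain whitespace tokens and an in-memory `Counter` in place of the spaCy and Redis pipeline the README describes, and the function name `top_n` and the sample sentence are hypothetical.

```python
from collections import Counter


def top_n(tokens, n=4):
    """Return the n most frequent tokens as (token, count) pairs.

    Hypothetical helper: napkin itself keeps its counts in Redis.
    """
    counts = Counter(token.lower() for token in tokens)
    return counts.most_common(n)


# Made-up sample text, split on whitespace for illustration only.
sample = "a prince must make war and must keep faith and must learn".split()

for token, count in top_n(sample, n=2):
    print(f"{count} occurrences: {token}")
```

A sorted set in Redis would hold the same ranking across several runs, which is presumably why the README lists a Redis server as a requirement.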