DomainClassifier/params.json


{"name":"Domainclassifier","tagline":"DomainClassifier is a simple Python library to extract and classify Internet domains/hostnames/IP addresses from raw text files according to their existence, localization or attributes.","body":"DomainClassifier\r\n================\r\n\r\nDomainClassifier is a simple Python library to extract and classify Internet\r\ndomains/hostnames/IP addresses from raw text files according to their existence,\r\nlocalization or attributes.\r\n\r\nDomainClassifier can be used to extract Internet hosts from any free text.\r\n\r\n![An overview of the DomainClassifier methods](https://raw.github.com/adulau/DomainClassifier/master/doc/domainclassifier-flow.png)\r\n\r\nHow To Use It\r\n-------------\r\n\r\n```python\r\nimport domainclassifier\r\n\r\nc = domainclassifier.Extract( rawtext = \"www.xxx.com this is a text with a domain called test@foo.lu another test abc.lu something a.b.c.d.e end of 1.2.3.4 foo.be www.belnet.be http://www.cert.be/ www.public.lu www.allo.lu quuxtest www.eurodns.com something-broken-www.google.com www.google.lu trailing test www.facebook.com www.nic.ru www.youporn.com 8.8.8.8 201.1.1.1\")\r\n\r\n# extracting potentially valid domains from rawtext\r\nprint c.domain()\r\n\r\n# reduce set of potentially valid domains to existing domains\r\n# (based on SOA,A,AAAA,CNAME,MX records)\r\nprint c.validdomain(extended=True)\r\n\r\n# reduce set of valid domains with DNS records associated to a\r\n# specified country\r\nprint \"US:\"\r\nprint c.localizedomain(cc='US')\r\nprint \"LU:\"\r\nprint c.localizedomain(cc='LU')\r\nprint \"BE:\"\r\nprint c.localizedomain(cc='BE')\r\nprint \"Ranking:\"\r\nprint c.rankdomain()\r\n\r\n# extract valid IPv4 addresses (using the potential list of valid domains)\r\nprint \"List of ip addresses:\"\r\nprint c.ipaddress(extended=True)\r\n\r\n# some more filtering\r\nprint \"Include dot.lu:\"\r\nprint c.include(expression=r'\\.lu$')\r\nprint \"Exclude dot.lu:\"\r\nprint c.exclude(expression=r'\\.lu$')\r\n```\r\n\r\n### Sample output\r\n\r\n```python\r\n['www.xxx.com', 'foo.lu', 'abc.lu', 'a.b.c.d.e', '1.2.3.4', 'foo.be', 'www.belnet.be', 'www.cert.be', 'www.public.lu', 'www.allo.lu', 'www.eurodns.com', 'something-broken-www.google.com', 'www.google.lu', 'www.facebook.com', 'www.nic.ru', 'www.youporn.com', '8.8.8.8', '201.1.1.1']\r\n[('www.xxx.com', 'A', <DNS IN A rdata: 67.23.112.226>), ('abc.lu', 'SOA', <DNS IN SOA rdata: neptun.vo.lu. Administrator.vo.lu. 2006063001 86400 7200 2419200 3600>), ('abc.lu', 'MX', <DNS IN MX rdata: 10 proteus.vo.lu.>), ('foo.be', 'A', <DNS IN A rdata: 188.65.217.78>), ('foo.be', 'AAAA', <DNS IN AAAA rdata: 2001:6f8:202:2df::2>), ('foo.be', 'SOA', <DNS IN SOA rdata: ka.quuxlabs.com. adulau.foo.be. 2010121901 21600 3600 604800 86400>), ('foo.be', 'MX', <DNS IN MX rdata: 10 mail.foo.be.>), ('www.belnet.be', 'A', <DNS IN A rdata: 193.190.130.15>), ('www.belnet.be', 'AAAA', <DNS IN AAAA rdata: 2001:6a8:3c80:8300::15>), ('www.belnet.be', 'CNAME', <DNS IN CNAME rdata: fiorano.belnet.be.>), ('www.cert.be', 'A', <DNS IN A rdata: 193.190.198.61>), ('www.cert.be', 'AAAA', <DNS IN AAAA rdata: 2001:6a8:3c80::61>), ('www.cert.be', 'SOA', <DNS IN SOA rdata: ns.belnet.be. hostmaster.belnet.be. 2013053039 360 180 1209600 3600>), ('www.cert.be', 'MX', <DNS IN MX rdata: 10 asp-mxa.belnet.be.>), ('www.cert.be', 'CNAME', <DNS IN CNAME rdata: cert.be.>), ('www.public.lu', 'A', <DNS IN A rdata: 194.154.200.74>), ('www.allo.lu', 'A', <DNS IN A rdata: 80.90.47.69>), ('www.eurodns.com', 'A', <DNS IN A rdata: 80.92.65.165>), ('www.google.lu', 'A', <DNS IN A rdata: 173.194.66.94>), ('www.google.lu', 'AAAA', <DNS IN AAAA rdata: 2a00:1450:400c:c03::5e>), ('www.facebook.com', 'A', <DNS IN A rdata: 31.13.64.1>), ('www.facebook.com', 'AAAA', <DNS IN AAAA rdata: 2a03:2880:10:8f07:face:b00c::1>), ('www.facebook.com', 'MX', <DNS IN MX rdata: 10 msgin.t.facebook.com.>), ('www.facebook.com', 'CNAME', <DNS IN CNAME rdata: star.c10r.facebook.com.>), ('www.nic.ru', 'A', <DNS IN A rdata: 194.85.61.42>), ('www.nic.ru', 'MX', <DNS IN MX rdata: 0 nomail.nic.ru.>), ('www.youporn.com', 'A', <DNS IN A rdata: 31.192.116.24>), ('www.youporn.com', 'SOA', <DNS IN SOA rdata: pdns1.ultradns.net. dns.manwin.com. 2012041840 86400 86400 86400 86400>), ('www.youporn.com', 'MX', <DNS IN MX rdata: 20 smtp-scan01.mx.reflected.net.>), ('www.youporn.com', 'CNAME', <DNS IN CNAME rdata: youporn.com.>)]\r\nUS:\r\n[('www.xxx.com', 'A', <DNS IN A rdata: 67.23.112.226>), ('www.google.lu', 'A', <DNS IN A rdata: 173.194.66.94>)]\r\nLU:\r\n[('www.public.lu', 'A', <DNS IN A rdata: 194.154.200.74>), ('www.allo.lu', 'A', <DNS IN A rdata: 80.90.47.69>), ('www.eurodns.com', 'A', <DNS IN A rdata: 80.92.65.165>)]\r\nBE:\r\n[('foo.be', 'A', <DNS IN A rdata: 188.65.217.78>), ('www.belnet.be', 'A', <DNS IN A rdata: 193.190.130.15>), ('www.belnet.be', 'CNAME', <DNS IN CNAME rdata: fiorano.belnet.be.>), ('www.cert.be', 'A', <DNS IN A rdata: 193.190.198.61>), ('www.cert.be', 'CNAME', <DNS IN CNAME rdata: cert.be.>)]\r\nRanking:\r\n[(1.0, 'www.youporn.com'), (1.0, 'www.youporn.com'), (1.0000120563271599, 'www.belnet.be'), (1.0000120563271599, 'www.belnet.be'), (1.0000120563271599, 'www.cert.be'), (1.0000120563271599, 'www.cert.be'), (1.0000372023809501, 'foo.be'), (1.0001395089285701, 'www.public.lu'), (1.00015419407895, 'www.allo.lu'), (1.0003662109375, 'www.eurodns.com'), (1.0004111842105301, 'www.xxx.com'), (1.0005944293478299, 'www.nic.ru'), (1.0024646577381, 'www.facebook.com'), (1.0024646577381, 'www.facebook.com'), (1.002635288165, 'www.google.lu')]\r\nList of ip addresses:\r\n('15169', 'AU', <DNS IN TXT rdata: \"15169 | 1.2.3.0/24 | AU | apnic | 2011-08-11\">)\r\n('15169', 'US', <DNS IN TXT rdata: \"15169 | 8.8.8.0/24 | US | arin | 1992-12-01\">)\r\n('27699', 'BR', <DNS IN TXT rdata: \"27699 | 201.1.0.0/17 | BR | lacnic | 2003-12-08\">)\r\nset([('201.1.1.1', '(\\'27699\\', \\'BR\\', <DNS IN TXT rdata: \"27699 | 201.1.0.0/17 | BR | lacnic | 2003-12-08\">)'), ('8.8.8.8', '(\\'15169\\', \\'US\\', <DNS IN TXT rdata: \"15169 | 8.8.8.0/24 | US | arin | 1992-12-01\">)'), ('1.2.3.4', '(\\'15169\\', \\'AU\\', <DNS IN TXT rdata: \"15169 | 1.2.3.0/24 | AU | apnic | 2011-08-11\">)')])\r\nInclude dot.lu:\r\n['abc.lu', 'abc.lu', 'www.public.lu', 'www.allo.lu', 'www.google.lu', 'www.google.lu']\r\nExclude dot.lu:\r\n['www.xxx.com', 'foo.be', 'foo.be', 'foo.be', 'foo.be', 'www.belnet.be', 'www.belnet.be', 'www.belnet.be', 'www.cert.be', 'www.cert.be', 'www.cert.be', 'www.cert.be', 'www.cert.be', 'www.eurodns.com', 'www.facebook.com', 'www.facebook.com', 'www.facebook.com', 'www.facebook.com', 'www.nic.ru', 'www.nic.ru', 'www.youporn.com', 'www.youporn.com', 'www.youporn.com', 'www.youporn.com']\r\n```\r\n\r\n### Software Required\r\n\r\n* Python (tested successfully on version 2.6)\r\n* dnspython library - http://www.dnspython.org/\r\n* IPy library\r\n\r\n### License\r\n\r\nCopyright (C) 2012-2013 Alexandre Dulaunoy - a(at)foo.be\r\n\r\nThis program is free software: you can redistribute it and/or modify\r\nit under the terms of the GNU Affero General Public License as\r\npublished by the Free Software Foundation, either version 3 of the\r\nLicense, or (at your option) any later version.\r\n\r\nThis program is distributed in the hope that it will be useful,\r\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\r\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\r\nGNU Affero General Public License for more details.\r\n\r\nYou should have received a copy of the GNU Affero General Public License\r\nalong with this program. If not, see <http://www.gnu.org/licenses/>.\r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."}