DomainClassifier now has a validTLD option, which is enabled by
default. When it is set, the extraction of potential domains is
restricted to names whose TLD appears in the IANA list of assigned
TLDs.
If you are extracting domains under non-assigned or internal TLDs, you can
disable this option by passing validTLD=False to the potentialdomain function.
The list of assigned TLDs is downloaded from IANA.
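
The filtering idea can be sketched with the standard library alone. This is not the library's implementation: the function name, the regular expression, and the four-entry TLD set below are assumptions made for illustration, and the real IANA list is far longer.

```python
import re

# Hypothetical stand-in for the IANA list of assigned TLDs.
ASSIGNED_TLDS = {"com", "org", "net", "lu"}

# Loose pattern for dot-separated labels (an assumption, not the library's regex).
CANDIDATE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z0-9-]+\b", re.IGNORECASE)


def potential_domains(text, valid_tld=True):
    """Return candidate domains, optionally keeping only assigned TLDs."""
    found = []
    for match in CANDIDATE.findall(text):
        tld = match.rsplit(".", 1)[-1].lower()
        if valid_tld and tld not in ASSIGNED_TLDS:
            continue  # drop names such as host.corp.internal
        found.append(match.lower())
    return found
```

With valid_tld=False the same call also keeps names under internal TLDs, which mirrors the effect described for validTLD=False.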
This method extracts valid IPv4 addresses from raw text. The validation
is done using the standard socket call. The extended parameter adds the
origin of the IP address via the Team Cymru IP/ASN service.
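
A minimal sketch of this kind of extraction, assuming a simple regular expression to find candidates and socket.inet_pton for validation. The text above does not say which socket call the library uses, so inet_pton is a choice made here, and extract_ipv4 is a hypothetical name.

```python
import re
import socket

# Four dot-separated groups of 1 to 3 digits; range checking is left to socket.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ipv4(text):
    """Return the dotted-quad strings in text that parse as IPv4 addresses."""
    valid = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            socket.inet_pton(socket.AF_INET, candidate)
        except OSError:
            continue  # e.g. an octet above 255
        valid.append(candidate)
    return valid
```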
The class has been extended with the localizedomain method, which
geolocates the DNS records associated with an existing domain.
The localization relies on the Team Cymru ip2asn lookup via DNS.
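
For context, the Team Cymru DNS interface answers TXT queries for the reversed IPv4 octets under origin.asn.cymru.com, with pipe-separated fields in the reply. The sketch below only builds the query name and parses a reply string. Both helper names are hypothetical, and the DNS query itself (for instance with a resolver library) is left out.

```python
def cymru_origin_qname(ipv4):
    """Build the TXT query name for an IPv4 origin lookup."""
    octets = ipv4.split(".")
    return ".".join(reversed(octets)) + ".origin.asn.cymru.com"


def parse_cymru_txt(txt):
    """Split an 'ASN | prefix | CC | registry | date' answer into a dict."""
    fields = [part.strip() for part in txt.strip('"').split("|")]
    keys = ("asn", "prefix", "country", "registry", "allocated")
    return dict(zip(keys, fields))


# The reply below is a made-up sample using documentation ranges, not real data.
sample = "64496 | 192.0.2.0/24 | ZZ | example | 2000-01-01"
info = parse_cymru_txt(sample)
```

The country field of the parsed reply is what a geolocation step would typically read.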
The class domainclassifier has two methods:
- domain() to extract all potential domains from a raw text
The method returns a list.
- validdomain() to return all the existing domains, based on their
known DNS record sets such as A, AAAA, or CNAME.
The method returns a set. If the extended option is requested, it returns a list
of tuples, each containing a domain, one of its existing DNS record types, and
the data returned for that record.
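
The two return shapes of validdomain() can be illustrated with a small stand-alone sketch. It is not the library's code: valid_domains and the resolve callable are hypothetical, and resolve(domain, rtype) is assumed to return a list of record values, empty when the record does not exist.

```python
def valid_domains(candidates, resolve, extended=False):
    """Keep candidates that have at least one A, AAAA or CNAME record."""
    rtypes = ("A", "AAAA", "CNAME")
    plain = set()   # default result: domain names only
    detailed = []   # extended result: (domain, record type, data) tuples
    for domain in candidates:
        for rtype in rtypes:
            for value in resolve(domain, rtype):
                plain.add(domain)
                detailed.append((domain, rtype, value))
    return detailed if extended else plain
```

Passing the resolver as an argument keeps the sketch testable without network access. A real resolver would issue DNS queries for each record type.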