IP Intelligence - Proxy / VPN / Hosting / Bad IP Detection



IP Intelligence is a service that determines how likely an IP address is to be a proxy / VPN / hosting / bad IP, using advanced mathematical and modern computing techniques.





The system is serving millions of API requests a week and growing as more people find it useful in protecting their online infrastructure. Our service is used by gaming communities, e-commerce websites, research universities & institutions, law enforcement, and large financial institutions. Feel free to compare the results from this service to any other, including paid options from various vendors.
It is recommended that you fully read the information below before implementation.



How It Works

Given an IP address, the system returns a probability (a value between 0 and 1) of how likely the IP is to be a VPN / proxy / hosting / bad IP. A value of 1 means the IP is explicitly banned (a web host, VPN, or Tor node) by our static lists. Otherwise, the output is a real number between 0 and 1 indicating how likely the IP is to be bad / VPN / proxy, inferred through machine learning & probability theory techniques using dynamic checks with large datasets. Billions of new records are parsed each month to ensure the datasets have the latest information, and old records automatically expire. The system is designed to be efficient, fast, simple, and accurate.

Assumptions



Usage & Implementation

Web Interface

The web interface allows you to quickly look up IPs without touching any code. It is assumed that the IP you're looking up has made requests to your services on an application level. The web interface uses flags=f, which requests full bad IP detection, including compromised systems. If you wish to skip full bad IP detection, please use the API instead. A full lookup might take up to 5 seconds to complete because results are generated in real time.




API


Expected Input

The proxy check system takes in an input via HTTP GET request. The URL is http://check.getipintel.net/check.php and the parameter is ip. The system fully supports IPv4 with partial support for IPv6.

Include Your Contact Information

Include your contact information so I can notify you if a problem arises or if there are core changes. In some situations, people query the system incorrectly and assume everything is working when it is not, because error codes are not handled or are handled improperly. Since I only have the connecting IP address, I cannot help the person correct the error.
To include your contact information, add another parameter called contact to your request and provide your email.
A typical query looks like:

If you are contacted, please respond in 2 days or the contact information could be considered as inaccurate. Your information will only be used for the purpose of communication with GetIPIntel.
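As a rough illustration of how the ip and contact parameters fit together, here is a minimal Python sketch that builds a request URL. The endpoint and the parameter names (ip, contact, flags, oflags, format) come from this page; the function name build_query_url, the sample IP, and the sample email address are placeholders chosen for the example.

```python
from urllib.parse import urlencode

# Endpoint as given in the "Expected Input" section.
BASE_URL = "http://check.getipintel.net/check.php"


def build_query_url(ip, contact, flags=None, oflags=None, fmt=None):
    """Return a GET URL for the check endpoint (hypothetical helper).

    Only parameters that are set are included, so a default lookup
    (no flags) produces the shortest URL.
    """
    params = {"ip": ip, "contact": contact}
    if flags:
        params["flags"] = flags
    if oflags:
        params["oflags"] = oflags
    if fmt:
        params["format"] = fmt
    # urlencode percent-encodes reserved characters such as "@".
    return BASE_URL + "?" + urlencode(params)


# Placeholder IP (documentation range) and placeholder email address.
print(build_query_url("203.0.113.7", "admin@example.com", flags="m"))
```

Building the query string with urlencode rather than string concatenation avoids malformed URLs when the contact address contains characters that need escaping.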

Optional settings for Input:




Expected Output

On a valid request, the system will return a value between 0 - 1 (inclusive) of how likely the given IP is a proxy. On error, a negative value will be returned. If format=json is used, a valid JSON format will be returned with extra information, see below for details.
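Because errors arrive as negative numbers in the same place a score would appear, it helps to separate the two cases before using the value. The following Python sketch shows one way to do that for the plain-text output; the names IPIntelError and parse_score are made up for this example, and the meaning of each individual error code is not modeled here.

```python
class IPIntelError(Exception):
    """Raised when the response body holds a negative (error) value."""

    def __init__(self, code):
        super().__init__(f"service returned error code {code}")
        self.code = code


def parse_score(body):
    """Parse a plain-text response body into a score between 0 and 1.

    Raises IPIntelError for negative values and ValueError for anything
    that is not a number or is greater than 1.
    """
    value = float(body.strip())
    if value < 0:
        raise IPIntelError(int(value))
    if value > 1:
        raise ValueError(f"unexpected score outside 0-1: {value}")
    return value
```

Raising an exception on negative values makes it hard to mistake an error code for a very low score, which is the kind of mishandling the contact parameter is meant to help catch.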

Interpretation of the Results

If a value of 0.50 is returned, it is as good as flipping a fair two-sided coin, which means the result is not informative either way. From my personal experience, values > 0.95 should be looked at and values > 0.99 are most likely proxies. Anything below 0.90 is considered "low risk". Since a real value is returned, different levels of protection can be implemented. It is best for a system admin to test some sample datasets with this system and adjust the implementation accordingly. I only recommend automated action on high values ( > 0.99 or even > 0.995 ), but it's always best to manually review IPs that return high values. For example, mark an order as "under manual review" and don't automatically provision the product for high proxy values. Be sure to experiment with the results of this system before you use it live on your projects. If you believe a result is wrong, don't hesitate to contact me; I can tell you why. If it's an error on my end, I'll correct it. If you email me, expect a reply within 12 hours.
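The thresholds above can be turned into a small tiering function. This is only a sketch: the cut-offs 0.90, 0.95, and 0.99 come from the paragraph above, while the tier names and the "elevated" band between 0.90 and 0.95 (which the text does not describe) are assumptions made for the example. Tune the cut-offs against your own sample data.

```python
def classify(score, low_risk_below=0.90, review_above=0.95, likely_proxy_above=0.99):
    """Map a score between 0 and 1 to an illustrative action tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score > likely_proxy_above:
        return "likely_proxy"   # candidate for automated action, e.g. hold the order
    if score > review_above:
        return "review"         # worth a manual look
    if score < low_risk_below:
        return "low_risk"
    return "elevated"           # assumed label for the 0.90 to 0.95 band


for sample in (0.42, 0.93, 0.97, 0.999):
    print(sample, classify(sample))
```

Keeping the thresholds as parameters makes it easy to tighten the top tier to 0.995 for automated actions without touching the rest of the logic.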

Variations of Implementation

Use Static Ban List Only (Skip Dynamic Check and Bad IP Checks)
If you get a value between 0 and 1, exclusive (like 0.99, 0.99999, 0.97), the value was generated by dynamic checks, which look for characteristics of the given IP. IPs that are either manually banned or seen on a public proxy site return a value of 1. If you only want manually banned or public proxy IPs, then in your code just look for the value "1". However, many IPs haven't gone through manual review, and IPs can change behavior very frequently (which is why dynamic checks exist in the first place). If you only look for the value "1", expect more proxy / VPN / bad IPs to get through your system; however, false positives are less likely with the static ban list option.

If you wish to use only manually banned & public proxy IPs, append the parameter &flags=m; the system will then return only 0 or 1. This is the best option to start with, and it will have a noticeable impact on bot / proxy / VPN traffic, especially if you don't have any data sets to test with the system. The query should look something like
This option is the fastest.
Use Static Ban List and Dynamic Checks Only (Skip Some of the Bad IP Checks)
In this scenario, you want to use dynamic checks as well, but you want to skip the additional checks that determine whether the IP is a bad IP (see What do you mean by "Bad IP"?). In this mode, some bad IPs are still detected, but the system does not go through the full bad IP check because the time for the extra checks varies wildly (between an extra 200 ms and 2 seconds). In this mode, false positives are more likely than with static ban lists only. Scores are lower compared to the full IP check (without any flag options) because fewer attributes are considered.

If you wish to use the static ban list and dynamic checks only, append the parameter &flags=b. This option is the best if static ban lists aren't catching enough IPs but you don't want to run the full check because it takes too long and/or you want a predictable execution time. The query should look something like
This option is slower than static ban lists only, but much faster than the full check (no flags in query). This option is good if you only want proxy / VPN detection and you do not care about bad IPs, but &flags=m is not catching enough proxy / VPN IPs.
Default Lookup
This is the default lookup with no flags. Since the system is designed to work with real-time systems (return a result as fast as possible), some time consuming checks are put into a background process. This allows the system to return a result much faster. If those time consuming checks reveal that the returned result was not accurate (which is rare), the system will adjust the values. However, you must query the service again with the same IP to obtain the new result. Typically, the background jobs take no longer than 5 seconds to complete. If you want to force the system to do a full lookup (no background processes), use &flags=f option.
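The "query again" behaviour of the default lookup can be sketched as follows. The lookup argument stands for whatever function you use to query the service and return a score (it is not defined here), and the 5-second delay is taken from the typical background-job time mentioned above; both the function name and the decision to always re-query are assumptions for illustration.

```python
import time


def lookup_with_recheck(ip, lookup, delay_seconds=5.0, sleep=time.sleep):
    """Run a default lookup, wait, then query the same IP again.

    Returns (first_score, second_score). The second score reflects any
    adjustment made by the background checks, if there was one.
    `lookup` is a caller-supplied function: ip -> score.
    """
    first = lookup(ip)
    sleep(delay_seconds)
    second = lookup(ip)
    return first, second


# Usage with a stand-in lookup function that simulates an adjusted score.
answers = iter([0.62, 0.97])
scores = lookup_with_recheck("203.0.113.7", lambda ip: next(answers),
                             delay_seconds=0)
print(scores)
```

Note that this pattern spends two queries per IP, which counts against the rate limits described in the FAQ.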
Force Full Lookup
If you don't mind waiting up to 5 seconds for a result and you want the system to do a full lookup with one query, then use &flags=f option. The query should look something like
This option is the slowest and should only be used on non-real-time applications.

Comparing the Different Flags

| Flags | Data Sets Used | Pros | Cons | Response Time (No Network Latency) | Suggested Use Based on Requirements |
|---|---|---|---|---|---|
| flags=m | static block lists | fastest, smallest chance of false positives | IPs that are not on blocklists will get through | < 60 ms | least amount of false positives; fastest speeds; ok with letting some IPs through; only care about proxies & VPNs |
| flags=b | static block lists, dynamic checks, some bad IP checks | fast, catches more proxy / VPN IPs than flags=m, skips some compromised system detection, so complaints from residential users are reduced, because most likely the user does not know they're compromised or they received a dirty IP from their ISP | higher chance of false positives than flags=m | < 130 ms | fast speeds, want to let fewer proxy / VPN IPs through than flags=m; do not want to fully utilize bad IP detection; only care about proxies & VPNs |
| no flags (default query) | static block lists, dynamic checks, full bad IP checks | fast, full IP check, a balance between speed and full IP check | higher chance of false positives than flags=m; might require 1 more query after 5 seconds to be sure | < 130 ms | fast speeds, ok with making multiple queries with the same IP |
| flags=f | static block lists, dynamic checks, full bad IP checks | forces a full IP check, which does not take additional queries to be sure | higher chance of false positives than flags=m, slowest | < 5000 ms | ok with waiting for a full lookup that can take up to 5 secs |
The oflags option can take multiple character arguments. For example, &oflags=bc will show whether the IP is a bad IP and which country the IP belongs to.
Show if the IP is Considered as a Bad IP
Append &oflags=b to your query if you wish to know whether the IP is considered a bad IP. Standard output will append another integer value separated by a comma. If JSON output is chosen, an additional element called "BadIP" is added to the results. A value of 1 means it is a bad IP, 0 otherwise. Note that if you are using the flags option, the results may vary. For example, &flags=b only does partial lookups for bad IPs compared to &flags=f. IPv6 is not yet supported for this feature. If the Proxy / VPN / Bad IP score for flags=m returns "1", then oflags=b always returns 0, because all bad IP checks are skipped when an IP is explicitly banned.
Show which Country the IP belongs to
Append &oflags=c to your query if you wish to know which country the IP belongs to. Standard output will append the country separated by a comma. If JSON output is chosen, an additional element called "Country" is added to the results.
All countries are represented by a 2 character country code in ISO-3166 format.
Note that country information is directly pulled from RIRs. Accuracy will vary so consider this as an experimental feature.
IPv6 is not supported yet.
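With oflags set, the standard output becomes a comma-separated line. The sketch below parses such a line in Python. The order of the extra fields is not stated on this page, so the code tells them apart by shape (a lone 0 or 1 for the bad IP value, two letters for the country) instead of by position; that heuristic, the function name, and the sample line are all assumptions for illustration.

```python
def parse_oflags_line(body):
    """Parse 'score[,extra fields]' from the standard (non-JSON) output.

    Returns a dict with 'score', 'bad_ip' (True/False, or None if absent)
    and 'country' (two-letter code, or None if absent).
    """
    parts = [p.strip() for p in body.strip().split(",")]
    result = {"score": float(parts[0]), "bad_ip": None, "country": None}
    for field in parts[1:]:
        if field in ("0", "1"):
            result["bad_ip"] = field == "1"
        elif len(field) == 2 and field.isalpha():
            result["country"] = field.upper()
    return result


# Made-up sample line, not captured from the live service.
print(parse_oflags_line("0.25,1,US"))
```

A negative score in the first field still signals an error, so check it before trusting the remaining fields.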
Output Results in JSON Format
Normally, the system outputs a negative value on error, or a value >=0 and <=1 on success. If you're more comfortable with JSON format, append the option &format=json to your queries.

An example of a query that's successful:



An example of a query that results in an error:

Error Codes


The proxy check system will return negative values on error, along with HTTP 400 error:


FAQs

Why is this service free?

I created this project because I couldn't find any good alternatives for a reasonable price. Since I have a master's degree in Computer Science specializing in Networking, with interests in Machine Learning and NetSec, it's a fitting project for me to embark on. Compared to a popular paid service, the number of free queries being served by GetIPIntel translates to $60,000+/month, and I've been told by a few people that GetIPIntel catches more proxies / VPN / bad IPs than said paid service. I'm offering it for free in the spirit of openness. Just because it's free does not mean it's bad, inaccurate, easy to develop, or easy to maintain. To keep things simple, please do not abuse this service as a free user, and if you need more queries, contact me for a custom plan. If you're feeling generous, BTC / Paypal is at the bottom of the page.

Why is this different from similar services (even paid ones)?

There are many other services like this one that use simple block lists, meaning a particular IP / IP block is specifically added or removed, either manually or by code, from various known/trusted sources. During a lookup, if the IP is on the list, the service simply returns the result accordingly. However, that is a very limited view, because an IP not being on a list doesn't mean it's not a proxy / VPN / bad IP. It means that the simple block list system does not know about or has not come across that IP address. To claim an IP address is not a proxy / VPN / bad IP just because the system has never come across it is a logical fallacy (see Argument from Ignorance). GetIPIntel uses Machine Learning & Probability Theory techniques to make inferences about IPs it doesn't have knowledge of (see What are dynamic checks?) and computes the output when you request it, using up-to-date and large data sets. Thus, combining block lists with dynamic checks produces a more accurate result because the overall system is more intelligent.

What are dynamic checks?

Dynamic checks are used if the IP address is not explicitly listed in the static and dynamic files. The system attempts to retrieve characteristics (or attributes in Machine Learning terms) of the given IP. Based on that data, it uses concepts from Probability Theory and ML boosting techniques to generate an overall result. All results from dynamic checks are computed in real time using large & frequently updated datasets.
In short, dynamic checks allow the system to use mathematics to infer whether an IP is a proxy / VPN when it doesn't explicitly know.

What do you mean by "Bad IP"?

It refers to any combination of crawlers / comment & email spammers / brute force attacks: IPs that are behaving "badly" in an automated manner. Networks that are infected with malware / trojans / botnets / etc. are also considered "bad". It may be possible that the user is not aware that their systems are infected, or that they have received an IP from their ISP that was recently infected with malicious code. If you wish to skip this, see Variations of Implementation.

How many queries can I make?

There's a rate limit of 15 requests / minute to prevent abuse, as well as a burst parameter set to smooth traffic. If you hit either of these limits, the web server will return a 429 error. Please do not exceed 500 queries per day. The limits may change based on abuse and/or server load; changes will be posted on Twitter at least one week in advance. If you need guaranteed resources and/or more queries, please contact me. In most cases, the cost is significantly less than other paid services.
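One way to stay under these limits is to track your own query timestamps on the client side. The Python sketch below does this with the 15 per minute and 500 per day figures from above as defaults. The class name QueryBudget is made up for the example, and the server's burst parameter is not modeled because its value is not given here, so a 429 response is still possible and should be handled.

```python
import time
from collections import deque


class QueryBudget:
    """Client-side guard for per-minute and per-day query limits."""

    def __init__(self, per_minute=15, per_day=500, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self._stamps = deque()  # timestamps of queries made in the last 24 hours

    def try_acquire(self):
        """Return True and record a query if both limits allow it."""
        now = self.clock()
        # Drop timestamps older than 24 hours.
        while self._stamps and now - self._stamps[0] >= 86400:
            self._stamps.popleft()
        if len(self._stamps) >= self.per_day:
            return False
        # Count queries made within the last 60 seconds.
        recent = sum(1 for t in self._stamps if now - t < 60)
        if recent >= self.per_minute:
            return False
        self._stamps.append(now)
        return True


budget = QueryBudget()
if budget.try_acquire():
    print("ok to send a query")
```

The clock is injectable so the limiter can be tested without waiting; in production the default monotonic clock is unaffected by wall-clock adjustments.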

What do you offer with your custom plans?

With custom plans, I can provide any number of queries, either as a query pack that does not expire or as a monthly plan. All custom plans come with a default of 300 queries per minute instead of 15 (could go higher if you want), automatic fail-over (2N redundancy), and dedicated resources. Please contact me via email (which is listed below) with your requirements.

Can I cache my results?

Of course but I do not recommend caching a particular value for more than 6 hours. The Internet drastically changes over a short period of time. Hijacked networks pop up and go away relatively quickly. A low scoring IP's behavior can change in a matter of seconds, as well as a high scoring IP. When the system detects an IP with high variance in previous scores, the probability will be recomputed live with the most up to date dataset for accuracy.
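A cache that honours the 6-hour recommendation can be as simple as a dictionary with timestamps. The following is a minimal in-memory sketch in Python; the class name ScoreCache and its interface are assumptions for illustration, and a shared store would be needed if several processes query the service.

```python
import time

SIX_HOURS = 6 * 60 * 60  # maximum recommended cache lifetime, in seconds


class ScoreCache:
    """In-memory cache that forgets scores after a fixed lifetime."""

    def __init__(self, ttl_seconds=SIX_HOURS, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # ip -> (score, time stored)

    def get(self, ip):
        """Return the cached score, or None if missing or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        score, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[ip]
            return None
        return score

    def put(self, ip, score):
        self._entries[ip] = (score, self.clock())


cache = ScoreCache()
cache.put("203.0.113.7", 0.12)
print(cache.get("203.0.113.7"))
```

A shorter lifetime may suit high-risk actions such as checkout, since a score can change within seconds.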

Sometimes the API will take longer than average to respond, why?

The free API is on a shared resource pool which means other people's actions can have an effect on your requests. All custom plans are on dedicated resources. If you're interested in one, please contact me.

Will corporate / business IPs generate a high score?

If the business ISP provides hosting or hosting related services on their network, then yes. See Variations of Implementation for a solution or just whitelist the IPs on your own system.

The IP looks like it's on a residential network but the score is high, why?

The IP is most likely a compromised system involved in spamming / brute-forcing / etc. It falls under the category of Bad IPs or it could perhaps be a public proxy that someone runs on their home computer.

What's whitelisted?

Known crawlers such as Google Bot, Yahoo bot, Bing bot, CloudFlare, etc., as well as some known DNS resolvers are whitelisted and will return a result of 0. If you believe there's an error, please contact me.

Is SSL supported?

Yes, but be aware that setting up an SSL connection takes longer than setting up a normal HTTP connection.

Why not look for HTTP headers, User agents, etc?

These values can easily be spoofed and are therefore unreliable.

I wrote some code but your server isn't responding. Why?

You might need to change the user-agent, as Cloudflare blocks certain connections with an unusual user-agent. It shows up as "Bad Browser." If that doesn't work, please contact me and I'll look into the issue.
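Setting the user-agent explicitly is straightforward with Python's standard library, whose default user-agent is of the form Python-urllib/3.x. The sketch below only builds the request object; the helper name make_request and the user-agent string are placeholders, and which strings are accepted or rejected is not documented here.

```python
from urllib.request import Request


def make_request(url, user_agent="my-app/1.0 (admin@example.com)"):
    """Build a GET request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


req = make_request("http://check.getipintel.net/check.php?ip=203.0.113.7")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

The request can then be sent with urllib.request.urlopen(req, timeout=10); a timeout is worth setting because a full lookup can take several seconds.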

Code samples to get me started?

There are some code samples on my GitHub.

Disclaimer

No guarantees, warranties, blah blah blah, are provided. Use at your own risk. GetIPIntel is not liable for damages of any kind.


Terms of Service

By using this service, you agree to:


Contact

You can find me on Twitter, GitHub, or email. If I do not respond to your email within 24 hours, then something is wrong. Please send an email to my Gmail address, or contact me via Twitter. Ultimately, I want the system to be as accurate as possible, so please let me know if there are any inaccuracies; I'd like to fix them. Let me know if you have any custom requirements, such as more queries per minute or skipping the cache so the system always uses the latest data and recomputes the result.



BTC: 1Mk7jWhP3EZ5c9bXe5rRbbD14YdX8oMg5E | Paypal: Available if you e-mail me
Last update: Sep 9, 2016 | System Version 3.7.0.5