Ricebridge
Search This Site
May 14 2008 23:53 UTC


$15 Gift Certificate for every bug you find.

MaxMind GeoIP
This IP address to country database is provided as a CSV file that can easily be read by CSV Manager.

Got a question for us?
Just Ask!

Bookmark MaxMind IP Locator and CSV Manager Java Component at del.icio.us Digg MaxMind IP Locator and CSV Manager Java Component at Digg.com Bookmark MaxMind IP Locator and CSV Manager Java Component at reddit.com Bookmark MaxMind IP Locator and CSV Manager Java Component at YahooMyWeb Bookmark MaxMind IP Locator and CSV Manager Java Component at Spurl.net Bookmark MaxMind IP Locator and CSV Manager Java Component at Simpy.com Bookmark Polyphasic Mutants at NewsVine Blink this MaxMind IP Locator and CSV Manager Java Component at blinklist.com Bookmark MaxMind IP Locator and CSV Manager Java Component at Furl.net Fark MaxMind IP Locator and CSV Manager Java Component at Fark.com

Using the MaxMind IP Address Location Database with CSV Manager

CSV Manager can be used to easily and quickly load the IP address lookup database from MaxMind, allowing you to determine which country an IP address belongs to. MaxMind supplies this database in a number forms, including as a CSV file.

PLEASE NOTE: We are an affiliate of MaxMind.

Which Country Does That IP Address Come From?

So you want to know where your website visitors are coming from? You can solve this problem by using an IP Address (example: 216.239.57.99) to two-letter Country Code (example: US) lookup table. This table lists all known IP addresses beside the country they come from.

It's a pretty big table. MaxMind supplies this table as a CSV file (among other formats), and Ricebridge supplies the CSV file reader. Now all you have to do is:

  • Load the data into a database from the CSV file
  • Query the database to find the Country Code
  • Update the database each week with changes

It's a simple case of programming!

What Does the Data Look Like?

MaxMind CSV Data

We are going to use the free MaxMind GeoLite Country Database. Commercial versions with higher accuracy are also available in the same data format, so this article still applies to them.

The data consists of rows with six fields. Each row maps a range of IP addresses to a single country, by giving a start and end IP address. The data fields we are interested in are the 3rd, 4th and 5th; the starting IP number, the ending IP number, and the Country Code.

What are IP numbers? Well, instead of 255.255.255.255, which is just 4 bytes stuck together, we can say 0xFFFFFFFF in hex, or 4294967295 in decimal. So the IP number is just the IP address interpreted as a number.

Database Table Columns

CreateDatabase.sql

CREATE TABLE IPLookup (
  BeginIPNum  BIGINT NOT NULL,
  EndIPNum    BIGINT NOT NULL,
  CountryCode SMALLINT NOT NULL
);

See the MySQL Numeric Types reference page for more details about these column data types. We have used signed types to keep things simple for coding in Java, but here's an description of how to squeeze the database size down.

OK, let's design a database table for these three fields. We're going to use MySQL for our examples, but the SQL should be quite portable.

We want to store the two IP numbers, and the country they map to. First, we'll assign numbers to the Country Codes and store them in a 2-byte integer field - hey, there could easily be more than 256 countries in a few years time. For the purposes of this article, we will hard-code the list of two letter Country Codes that these numbers are assigned to, but you can easily change this in your own code.

The IP numbers need to be stored in data fields that will keep them positive. If you try to jam them into a 4-byte integer field, then IP numbers with the leftmost bit set will turn into negative numbers. Of course, if your database has unsigned types, go ahead and use those.

Using CSV Manager to Load the Data

CSV Manager Save Process

OK, we've got a CSV file on one side, and a database table on the other. Sounds like a job for CSV Manager!

We want to create an INSERT statement for each row, and execute these as fast as possible without wasting memory. We could read all the rows into memory at once, with the CsvManager.load(File) method, but if we define our own LineListener, we can load and save the data as a continuous stream.

CSV Manager LineListeners are simple objects that wait around listening for new CSV data records, one for each line of the CSV file. This data is handed over as a String[], which is guaranteed to contain the minimum number of data fields.

A LineListener to Insert the Data

MaxMindInserter.java

public class MaxMindInserter extends LineListenerSupport {
  protected Connection        iConnection;
  protected PreparedStatement iInsertStatement;

  public MaxMindInserter( Connection pConnection  ) {
    iConnection = pConnection;
  }

  public void startLoadImpl() {
    iInsertStatement = iConnection.prepareStatement( 
      "INSERT INTO IPLookup VALUES( ?,?,? )" );
  }

  public BadLine addLineImpl( String[] pLine, 
    int pNumFields, long pLineNumber, String pOriginalLine ) 
  {
    MaxMindLineData mmld = new MaxMindLineData( pLine );
    if( mmld.valid() ) {
      short ci 
        = ((Short) MaxMindDef.sCountryMap.get( 
          mmld.getCountryCode() )).shortValue();
      iInsertStatement.setLong(   1, mmld.getBeginIPNum() );
      iInsertStatement.setLong(   2, mmld.getEndIPNum() );
      iInsertStatement.setShort(  3, ci );
      iInsertStatement.executeUpdate();
    }
    return null;
  }

  public void endLoadImpl() {
    iInsertStatement.close();
  }
}

PLEASE NOTE: the above code has been condensed for this example

This is our custom code to actually insert the data into the database. We have used very simple JDBC database access code for this example. In your own code you will probably use something more modern like a DataSource or a connection pool.

The INSERT statement is loaded with data and executed in the addLineImpl method. We use a utility class, MaxMinLineData, to convert the text data into the correct data types for our database. We will cover this class in the next section.

The MaxMindLineData class is a good example of a CSV Manager custom LineListener. As each line of the CSV file is read, it is inserted directly into the database. Thus we do not have to load the entire data file into memory.

Converting Strings into Numbers

MaxMindLineData.java

String iCountryCode = pLine[2];
long iBeginIPNum  
  = Long.parseLong( pLine[3] );
long iEndIPNum    
  = Long.parseLong( pLine[4] );

The variable pLine refers to the String[] parameter passed in by MaxMindInserter, as shown above. The 3rd, 4th and 5th data fields (BeginIPNum, EndIPNum, CountryCode) are at index 2,3 and 4 in the array, respectively.

The CSV data that we get from MaxMind is provided as plain text data. That means we have to convert the String value of each data field into the correct Java data type so that we save it into the database.

This is where we have to be careful. A 4-byte IP address can fit inside a Java int variable, but all IP Addresses above 127.255.255.255 will be converted into negative numbers. Thus we have to store the IP address in a long variable to prevent this.

For the Country Code, we do a quick lookup from our hard coded HashMap in the MaxMindInserter class. This gives us a short variable.

Lock and Load! Let's Get the Data in the Database!

MaxMindManager.java

DriverManager.registerDriver( 
  new com.mysql.jdbc.Driver() );
Connection conn = DriverManager.getConnection( 
  "jdbc:mysql://localhost/testdb", 
  "testuser", "testpass" );

CsvManager      csvman = new CsvManager();
MaxMindInserter mmi    = new MaxMindInserter( conn );
csvman.load( pDataFile, mmi );

Again, condensed. For full code, see MaxMindManager.java. You will have to replace testdb, testuser and testpass with correct values for your own database. And you will need to use a different database driver if you are not using MySQL.

CSV Data Saved Into the Database

This code shows how to actually insert the data. It's only a few lines, as we have done all the hard work already. CSV Manager of course takes care of the CSV file for us.

How long does the data take to load? There are about 75-80K lines in the MaxMind CSV file. On a desktop machine, they usually take about 2 minutes to load fully into the IPLookup database table. On a high-spec database server, you'll be an order of magnitude (or more) faster.

If you drop and insert the entire table each time the CSV file changes, this delay can still be annoying. Later in this article we will present a much faster solution.

So Where Are You From?

Earth
MaxMindManager.java

ArrayList sProxyHeaders = new ArrayList();
sProxyHeaders.add("HTTP_X_FORWARDED_FOR");
sProxyHeaders.add("HTTP_FORWARDED");
sProxyHeaders.add("HTTP_CLIENT_IP");    

String ipaddr 
  = pHttpServletRequest.getRemoteAddr();
for( Iterator hI 
       = sProxyHeaders.iterator(); 
     hI.hasNext(); ) {
  String ph = (String) hI.next();
  String v  = pRequest.getHeader( ph );
  if( null != ph 
      && null != v && !"".equals( v ) ) {
    ipaddr = v;
  }
}

The ipaddr variable will contain the most accurate IP address for this visitor.

Alright. Now we have a visitor on our website. We can get their IP address from the TCP/IP packets they send in order to request HMTL pages. If they are behind a proxy or firewall, the packet IP address might be incorrect, so we also have to check the following HTTP headers:

  • HTTP_X_FORWARDED_FOR
  • HTTP_FORWARDED
  • HTTP_CLIENT_IP

Once we have IP Address of our visitor, we convert it to an IP number (we'll look at that in a minute). Once we have the IP number, we can use this to query the database. We want to identify the database record where the visitor IP number is in the IP number range assigned to a particular country. We do this by searching for the row where the visitor IP number is greater-than-or-equal-to the BeginIPNum column, and less-than-or-equal-to the EndIpNum column.

IP Address to Number conversion

MaxMindManager.java

String[] quads = pIPAddress.split( "\\." );
long ipnum 
  = (long)16777216*Long.parseLong(quads[0]) 
  + (long)65536*Long.parseLong(quads[1])  
  + (long)256*Long.parseLong(quads[2]) 
  + (long)1*Long.parseLong(quads[3]);

We are using the regular expression \. to split the IP address text. We have escaped the dot as it normally means match any character.

Let's return to the question of how to convert the text of an IP address, a string such as 255.255.255.255, into an IP number, such as 4294967295.

All we have to is take each byte and multiply it by two to the power of the number of bits we have to shift it left to get it in the right position. Take 128 in the IP address 255.255.128.255. We need to move it 8 bits left. Or look at it another way, we need 128 multiplied by 100000000 in binary. So we end up with 128 * 256.

Getting the Country Code from the Database

MaxMindManager.java

String    cc  = "US"; // default
String    ipq = "SELECT CountryCode FROM IPLookup WHERE "
              + "BeginIPNum <= "+ipnum
              + " AND "+ipnum+" <= EndIPNum";
short     ci = -1;
Statement st = pConnection.createStatement();
ResultSet rs = st.executeQuery( ipq );
if( rs.next() ) {
  ci = rs.getShort(1);
}
if( -1 != ci ) {
  cc = MaxMindDef.sCountry[ci];
}

The variable ipnum holds the IP number. The class MaxMindDef.java defines the hard-coded Country Codes. The variable cc holds the two letter Country Code.

Well, this is the biggie. Finally we get to find out which country our visitor is coming from.

All we have to do is query the database, and request the Country Code for the IP number range that includes our visitor's IP Number.

Then we dereference the country index in our hard-coded list of Country Code's to get the two-letter Country Code.

What About the Monthly Updates?

Aztec Calendar

So we're pretty much done. Or are we? The IP address to country table is not static. In fact, it changes all the time. MaxMind publishes an update to the free version every month, and an update to the higher-accuracy commercial version every week.

So every week, or every month, you have to delete the old table from your database, and import the new table. That's nice. Or rather, it's not very nice at all, since your IP lookup code will fail during this update. If you have high visitor volume this is not really what you want. Even if you have low visitor volume, it's still bound to happen just when an important buyer is on your site (trust me, I've been there).

So what to do? Well, it would be much better to take each update, figure out which records are new, and which records are old, add the new records, delete the old ones, and keep the database table working. So that's what we're going to do.

An Nice Algorithm for Updates

IP Concatenation

To find out which records are new and which are old, we have to compare the new IP address table with the old table we have in the database. The easy way to do this is just to go through each table, and for the new table, add anything not in the old, and for the old, delete anything not in the new. Unfortunately, that approach is very wasteful of resources and will take longer than simply deleting and loading the entire new table.

Instead, let's look at the data again. It's just numbers, even the country code. So we can append them all together to get a really big number. We have 4 bytes each for the start and end IP numbers, and 2 bytes for the Country Code. So that gives us a 10 byte number. This really big number defines the IP address range to country mapping entirely and it is all we need.

Sorted!

Add Drop Algorithm

So how do we use these numbers? Well, for starters, we sort them. Luckily, MaxMind has already sorted the CSV file, so long as we build our big number in the following order: BeginIPNum, EndIPNum, CountryCode. We can use the database to sort the old table. In fact we only need to sort by BeginIPNum, since we know this is unique (Otherwise the MaxMind CSV file would be wrong).

Now we have two sorted sets of numbers, and we can apply a very simple batch processing algorithm to them: starting from the first position in both lists, keep moving to the next position so long as the numbers at the current position are equal (that is, we don't need to add or delete them, they have stayed the same). If the number in the new list is less than the number in the old, then we know we have to add it - it must be missing as the old list has already passed over it. Similarly, if the new number is greater than the old number, then delete the number in the old list - it can't be in the new list because we've already passed over it.

Yeah, But That's A Lot of Data

MaxMindLineListener.java

public class MaxMindLineListener 
  extends LineListenerSupport {

  protected long[]  iRanges;
  protected short[] iCountries;
  protected int     iNumLines;
  protected int     iIndex;

  public MaxMindLineListener( int pNumLines ) {
    iNumLines = pNumLines;
  }

  public void startLoadImpl() {
    iRanges    = new long[iNumLines];
    iCountries = new short[iNumLines];
    iIndex     = 0;
  }

  public BadLine addLineImpl( String[] pLine, 
    int pNumFields, long pLineNumber, 
    String pOriginalLine ) {

    long  start = Long.parseLong( pLine[2] );
    long  end   = Long.parseLong( pLine[3] );
    short ci    
      = ((Short) MaxMindDef.sCountryMap.get(
                   pLine[4])).shortValue();
    
    iRanges[iIndex] = (start << 32) | end;
    iCountries[iIndex] = ci;
    iIndex++;

    return null;
  }
}

The variable iNumLines is the estimated maximum number of lines in the new CSV file. Luckily this is quite easy to work out, as the average minimum length is probably a bit less than 64 characters (it's easy to see this by inspecting the CSV file). To estimate the maximum number of lines, we just divide the CSV file size by 64.

It is. We need to get the new data in the latest CSV file, and the old data in the database table into a common format in main memory so that we can run the comparison algorithm.

Now, we can't store a 10 byte integer natively in java, but we certainly want to use native operations, as they are the fastest. So we'll split our number into two parts: a long and a short. We can then have two arrays, one long[] and one short[] array, for each data table. If we assume that each table has 100K rows (in reality we have about 75-80k), then we require about 1.6Mb of main memory: 2 * (8+2)*100000 / (1024*1024). By default, the Java virtual machine starts with 64Mb, so we should be OK.

We will store two IP numbers (4 bytes) in one long (8 bytes). First we shift the start IP number 32 bits (4 bytes) left. Then we OR it with the end number. We'll have to perform similar trickery to get the IP numbers out again. We use shorts for the countries to allow for additional values in future.

We use the MaxMindLineListener to load the data in the latest CSV file and store it in the long and short arrays.

Compare and Contrast

MaxMindManager.java

public void compare( 
  long[] pNewRanges, short[] pNewCountries, int pNewMax,
  long[] pOldRanges, short[] pOldCountries, int pOldMax,
  ArrayList pAdd, ArrayList pDelete ) {

  long[]  nr = pNewRanges; short[] nc = pNewCountries;
  long[]  or = pOldRanges; short[] oc = pOldCountries;
  int nmax = pNewMax; int omax = pOldMax;

  // ns: new start ip, ne: new end ip, nk: new country
  // os: old start ip, oe: old end ip, ok: old country
  long ns, ne, os, oe; short nk, ok; int nI, oI; 

  while( nI < nmax && oI < omax ) {
    ns = nr[nI] >>> 32; ne = nr[nI] & 0xFFFFFFFFL; 
    nk = nc[nI];
    os = or[oI] >>> 32; oe = or[oI] & 0xFFFFFFFFL; 
    ok = oc[oI];

    if( ns == os && ne == oe && nk == ok ) {
      nI++; oI++;
    }
    else if( ns < os || ( ns == os && ne < oe )
             || ( ns == os && ne == oe && nk < ok ) ) {
      pAdd.add( new Record( ns, ne, nk ) );
      nI++;
    }
    else if( ns > os || ( ns == os && ne > oe )
             || ( ns == os && ne == oe && nk > ok ) ) {
      pDelete.add( new Record( os, oe, ok ) );
      oI++;
    }
  }

  if( nI < nmax ) {
    for( ; nI < nmax; nI++ ) {
      ns = nr[nI] >>> 32; ne = nr[nI] & 0xFFFFFFFFL; 
      nk = nc[nI];
      pAdd.add( new Record( ns, ne, nk ) );
    }
  }
  if( oI < omax ) {
    for( ; oI < omax; oI++ ) {
      os = or[oI] >>> 32; oe = or[oI] & 0xFFFFFFFFL; 
      ok = oc[oI];
      pDelete.add( new Record( os, oe, ok ) );
    }
  }
}

This code has been heavily condensed, see the MaxMindManager source code for the full version. The Record class refers to a simple internal class for holding the start, end and country values;

The old data from the database table can be pulled out with a simple query and placed into similar long and short arrays. Once we have the new and old data in the same data structures, we can run the comparison algorithm. This algorithm will return a list of new records that we have to add to the database, and a list of old records that we have to delete.

The key question when implementing the algorithm is, how do we compare the 10 byte numbers? Well, it's actually quite easy - we just compare the parts. Since we have decided on the order BeginIPNum, EndIPNum, CountryCode, we just need to compare the BeginIPNums with each other, and then the EndIPNums and finally the CountryCodes.

You can see how this works in the comparison method code opposite. But there's one more catch. We still need to get valid numbers to compare with. We have to carefully extract the two ints that we have stored in a long, making sure that we don't end up with negative values. That's why we use the >>> operator (right shift with zeros) for the start IP number, and mask off the end IP number with 0xFFFFFFFFL (the L suffix denoting a long value).

Note that we also use two loops at the end to handle values at the end of each list that are greater than the last value of the other list.

The Additions and Deletions

Curta Calculator

After all that bit fiddling, doing the actual inserts and deletes is nice and straight forward. We construct INSERT statements for the Records in the add list, and DELETE statements for the Records in the delete list.

Our example code in MaxMindManager shows how to do this in the insertRecords and deleteRecords methods. The approach taken is very simplistic and you will probably want to convert the code to the preferred data access mechanism of your application. Certainly, it would be nice to make the full update into a single transaction.

We have also included a test class MaxMindTest, so that you can see how to use the MaxMindManager class from client code. This test class requires JUnit in order to compile and run.

The Source Code

The source code to fully implement all the functionality described in this article is made freely available under a BSD-style license. This means that you can freely use it for commercial or private use, without any restrictions.

All the required classes are listed opposite, linked to a downloadable text file of the source code.

Comments and Suggestions

Contact Us

Thanks for taking the time to read this article. We hope you've found something of use here. We'd love to hear what you think, and we'd definitely like to hear about anything that we got wrong, so please send us your comments and suggestions.

Finally, if you'd like to get an email whenever we publish a new article, sign up here:

Name:

Email:



Best Regards,
Richard Rodger
Ricebridge

PS. If you liked this article, please bookmark it. Thanks!
Bookmark MaxMind IP Locator and CSV Manager Java Component at del.icio.us Digg MaxMind IP Locator and CSV Manager Java Component at Digg.com Bookmark MaxMind IP Locator and CSV Manager Java Component at reddit.com Bookmark MaxMind IP Locator and CSV Manager Java Component at YahooMyWeb Bookmark MaxMind IP Locator and CSV Manager Java Component at Spurl.net Bookmark MaxMind IP Locator and CSV Manager Java Component at Simpy.com Bookmark Polyphasic Mutants at NewsVine Blink this MaxMind IP Locator and CSV Manager Java Component at blinklist.com Bookmark MaxMind IP Locator and CSV Manager Java Component at Furl.net Fark MaxMind IP Locator and CSV Manager Java Component at Fark.com




comment on this page Home | Search | About Us | Contact Us | Our Products | Documentation | Resources | Login
Copyright © 2004-2008 Ricebridge. All Rights Reserved.