Wednesday, October 17, 2012

I'm a C++ dinosaur, but I'm OK

So here's a nice challenge. Let's say you have a list of member email addresses which you get from your account list. But you also have a list of email addresses that you have of your customers, addresses that your customers have used to communicate with you. And finally you have a list of email addresses of potential customers, but who are not yet signed up.

Now let's say someone decides to do a survey of future and existing customers, and sends out the survey to a list that purports to be that. And when the numbers of the survey come in.. mass confusion arises because the numbers don't add up.

So we now have a bunch of files, 'email addresses we mailed the survey to', 'main account email addresses', 'potential customer email addresses' and 'other customer email addresses'. And nobody knows why the last three don't add up to the first list.

Did I mention typos and polluted data? The pain.

So what would normal people do? Well, no idea, but we got on the case using 'diff'. This was tremendously helpful, but only up to a point. It is hard to answer questions like 'give me everybody not on list A but who is on B or C'. This is of course all very easy to do from SQL.

But I wasn't feeling like that. So, because C++ is what I do, I wrote a C++ program. I might be a dinosaur like that, but I feel compelled to point out that the program I present below will scale to literally billions and billions of customers. You know, in case you have more users than there are people on the planet.

So what does this small program do? Using slightly modern C++, it reads lines of text from several files. And once it is done, it will print for each unique line of text in which files it was found. This allows you to quickly see for each email address if it was part of 'main account email addresses', but perhaps not of 'potential customer addresses' etc.

Once it has printed that, it prints out a list of those same lines of text per 'configuration'. So you get groups of lines, and one group might represent 'on the list of addresses we emailed to, but not in any of the other files'.  In other words, no clue why we emailed this address.

Another group might represent 'addresses we emailed and only present on the potential customer list'. And finally, and importantly, you might get the group 'we didn't email, but IS actually a customer'.

The output can be read by most spreadsheets, and they'll do the right thing.

All in all, this helped solve a tricky problem, and was actually implemented in less time than it would take to do it by hand. I hope ;-)

// read all input files, output for each line of text in which of the input files it was found
// public domain code by bert.hubert@netherlabs.nl
#include <string>
#include <vector>
#include <map>
#include <iostream>
#include <fstream>
#include <boost/dynamic_bitset.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/foreach.hpp>

using namespace std;

// this allows us to make a Case Insensitive container
struct CIStringCompare: public std::binary_function<string, string, bool>  
{
  bool operator()(const string& a, const string& b) const
  {
    return strcasecmp(a.c_str(), b.c_str()) < 0;
  }
};

int main(int argc, char**argv)
{
  typedef map<string, boost::dynamic_bitset<>, CIStringCompare> presence_t;
  presence_t presence;
  
  string line;
  cout << '\t';
  for(int n = 1; n < argc; ++n) {
    cout << argv[n] << '\t';
    ifstream ifs(argv[n]);
    if(!ifs) {
      cerr<<"Unable to open '"<<argv[n]<<"' for reading\n"<<endl;
      exit(EXIT_FAILURE);
    }
    
    while(getline(ifs, line)) {
      boost::trim(line);
      if(line.empty())
        continue;
      presence_t::iterator iter = presence.find(line);
      if(iter == presence.end()) { // not present, do a very efficient 'insert & get location'
        iter = presence.insert(make_pair(line, boost::dynamic_bitset<>(argc-1))).first;  
      }
      iter->second[n-1]=1;
    }
  }
  cout << '\n';
  
  // this is where we store the reverse map, 'presence groups', so which lines where present in file1, but not file2 etc
  typedef map<boost::dynamic_bitset<>, vector<string> > revpresence_t;
  revpresence_t revpresence;
  
  BOOST_FOREACH(presence_t::value_type& val, presence) {
    revpresence[val.second].push_back(val.first);
    cout << val.first << '\t';
    for (boost::dynamic_bitset<>::size_type i = 0; i < val.second.size(); ++i) {
      cout << val.second[i] << '\t';
    }
    cout << endl;
  }
  
  cout << "\nPer group output\t\n";
  BOOST_FOREACH(revpresence_t::value_type& val, revpresence) {
    cout<<"\nGroup: \t";
    
    for (boost::dynamic_bitset<>::size_type i = 0; i < val.first.size(); ++i) {
      cout << val.first[i] << '\t';
    }
    
    cout << endl;
    
    BOOST_FOREACH(string& entry, val.second) {
      cout << entry << "\t\n";
    }
  }
}

Monday, October 8, 2012

On binding datagram (UDP) sockets to the ANY addresses

This story goes back a long time. For around 10 years now, people have been requesting that PowerDNS learn how to automatically listen on all available IP addresses. And for slightly less than that time, we've been telling people we would not be adding that feature.

For one, if you run a nameserver, you should *know* what IP addresses you listen on! How else could people delegate to you, or rely on you to resolve their queries? Secondly, running services by default on 'all' IP addresses is a security risk. The PowerDNS Recursor for this reason binds to 127.0.0.1 by default.

But still, people wanted this feature, and we didn't do it. Because we knew it'd be hard work. There, the truth is out. But we finally bit the bullet and had to figure out how to do it. This page shares that knowledge, including the fact that the Linux manpages tell you to do the wrong thing.

There are two ways to listen on all addresses, one of which is to enumerate all interfaces, grab all their IP addresses, and bind to all of them. Lots of work, and non-portable work too.  We really did not want to do that. You also need to monitor new addresses arriving.

Secondly, just bind to 0.0.0.0 and ::! This works very well for TCP and other connection-oriented protocols, but can fail silently for UDP and other connectionless protocols. How come? When a packet comes in on 0.0.0.0, we don't know which IP address it was sent to. And this is a problem when replying to such a packet - what would the correct source address be? Because we are connectionless (and therefore stateless), the kernel doesn't know what to do.

So it picks the most appropriate address, and that may be the wrong one. There are some heuristics that make some kernels do the right thing more reliably, but there are no guarantees.

When receiving packets on datagram sockets, we usually use recvfrom(2), but this does not provide the missing bit of data: which IP address the packet was actually sent to. There is no recvfromto(). Enter the very powerful recvmsg(2). Recvmsg() allows for the getting of a boatload of parameters per datagram, as requested via setsockopt().

One of the parameters we can request is the original destination IP address of the packet.

IPv6

For IPv6, this is actually standardized in RFC 3542, which tells us to request parameter IPV6_RECVPKTINFO via setsockopt(), which will lead to the delivery of the IPV6_PKTINFO parameter when we use recvmsg(2).

This parameter is sent to us as a struct in6_pktinfo, and its ipi6_addr member contains the original destination IPv6 address of the query.

When replying to a packet from a socket bound to ::, we have the reverse problem: how to specify which *source* address to use. To do so, use sendmsg(2) and specify an IPV6_PKTINFO parameter, which again contains a struct in6_pktinfo.

And we are done!

To get this to work on OSX, please #define __APPLE_USE_RFC_3542, but otherwise this feature is portable across FreeBSD, OSX and Linux. (Please let me know about Windows, I want to make this page as valuable as possible).

IPv4
For IPv4 the situation is more complicated. Linux and the BSDs picked a slightly different way to do things, since they did not have an RFC to guide them. Confusingly, the Linux manpages document this incorrectly (I'll submit a patch to the manpages as soon as everybody agrees that this page describes things correctly).

For BSD, use a setsockopt() called IP_RECVDSTADDR to request the original destination address. This then arrives as an IP_RECVDSTADDR option over recvmsg(), which carries a struct in_addr, which does NOT necessarily have all fields filled out (like for example the destination port number).

For Linux, use the setsockopt() called IP_PKTINFO, which will get you a parameter over recvmsg() called IP_PKTINFO, which carries a struct in_pktinfo, which has a 4 byte IP address hiding in its ipi_addr field.

Conversely, for sending on Linux pass a IP_PKTINFO parameter using sendmsg()  and make it contain a struct in_pktinfo. 

On FreeBSD, pass the IP_SENDSRCADDR option, and make it contain a struct in_addr, but again note that it probably does not make sense to set the source port in there, as your socket is bound to exactly one port number (even if it covers many IP addresses).

Binding to :: for IPv6 *and* IPv4 purposes

On Linux, one can bind to :: and get packets destined for both IPv6 and IPv4. The good news is that this combines well with the above, and Linux delivers an IPv4 IP_PKTINFO for IPv4 packets, and will also honour the IP_PKTINFO for outgoing IPv4 packets on such a combined IPv4/IPv6 socket.

On FreeBSD, and probably other BSD-derived systems, one should bind explicitly to :: and 0.0.0.0 to cover IPv4 and IPv6. This is probably better. To get this behaviour on Linux, use the setsockopt() IPV6_V6ONLY, or set /proc/sys/net/ipv6/bindv6only to 1.

Actual source code

To see all this in action, head over to http://wiki.powerdns.com/trac/browser/trunk/pdns/pdns/nameserver.cc - it contains the relevant setsockopt(), sendmsg() and recvmsg() calls.