Opened 14 years ago

Last modified 11 years ago

#175 assigned defect

Connections from scripts to scripts bypass LVS and go to the same host

Reported by: andersk
Owned by: mitchb
Priority: normal
Milestone:
Component: internals
Keywords:
Cc:

Description

which is a problem for connections that are supposed to go to the primary.

Change History (16)

comment:1 Changed 13 years ago by adehnert

I suppose we could supplement scripts-director with something that's bound by the directors and always goes to the primary instead...

What's the reason for the machines binding the scripts IP?

comment:2 Changed 13 years ago by mitchb

The machines must bind all load-balanced addresses as loopback addresses in order for LVS to work at all. If they don't have the address bound on an interface, they'll ignore all traffic destined for the scripts balanced IP addresses that the directors send towards their MACs.
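For readers unfamiliar with LVS direct routing, the loopback binding described above might look roughly like the sketch below. It only prints the commands (they need root to apply). The address list is borrowed from the iptables rule in comment 11, and treating it as the complete set of balanced addresses is an assumption, as is the function name.

```shell
#!/bin/sh
# Dry run: prints the commands rather than running them (they need root).
# The address list is copied from the iptables rule in comment 11; treating
# it as the full set of balanced addresses is an assumption.
BALANCED_ADDRS="18.181.0.46 18.181.0.43 18.181.0.50"

print_loopback_binds() {
    for addr in $BALANCED_ADDRS; do
        # A /32 on lo lets the kernel treat the address as local, so it
        # accepts the packets the directors forward to this machine's MAC.
        printf 'ip addr add %s/32 dev lo\n' "$addr"
    done
}

print_loopback_binds
```

Because the address is local on every realserver, a connection a script opens to it never leaves the machine, which is the behavior this ticket is about.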

comment:3 Changed 13 years ago by mitchb

scripts / trac-#175 / adehnert (Alex Dehnert)

"Can we do weird things with routing rules?"

scripts / trac-#175 / mitchb (BREAKFAST.COM halted... cereal port not res

I don't think there's an easy right answer here, because in a good number of cases, it's *correct* behavior for an app on one machine that needs to fetch pages to just get them from the same machine - why should a single instance of an app tie up multiple realservers?

scripts / trac-#175 / mitchb (And on the seventh day, He exited from appe

I suppose we could put in iptables rules to know about what we classify on the directors as FWM[23] and route *other* stuff initially destined for the balanced addresses to scripts-primary.mit.edu or something.

scripts / trac-#175 / andersk (Anders Kaseorg)

That would be a fine optimization, though I don’t think it would be terrible for everything to go to scripts-primary, because I doubt we have much loopback traffic to FWM[23].

scripts / trac-#175 / mitchb (And on the seventh day, He exited from appe

While that may be true (I'm hesitant to assume it), I already worry a lot about the primary being overtaxed because it's treated on equal footing by the LVS directors with respect to web requests, and yet it has to run *all* the cron jobs and persistent daemons. This won't change with hacron, either - the role will just be able to move automatically.

scripts / trac-#175 / andersk (Anders Kaseorg)

If that’s your worry, shouldn’t you be much more worried about all the FWM1 traffic that originates from _outside_ scripts?

scripts / trac-#175 / mitchb (AUI!!! I'm bleeding!)

Not really; that's all directed at those persistent daemons I already mentioned, no? What else would go there and be accepted? (Except perhaps LDAP queries, and I've already commented that I think we should consider adding port 389 to FWM2, but got no reply.)

Last edited 11 years ago by andersk

comment:4 Changed 13 years ago by quentin

The plan to solve this with hacron is to have the primary bind an additional IP address on the backend network and use /etc/hosts on the servers to point the name "scripts.mit.edu" at that IP.

comment:5 Changed 13 years ago by mitchb

Yeah... let's not do that, and do what I said instead, for the reasons I already gave.

comment:6 Changed 13 years ago by andersk

We cannot solve this with /etc/hosts because scripts.mit.edu is not the only affected hostname.

comment:7 Changed 13 years ago by adehnert

#195 is sorta another way of viewing this.

comment:8 Changed 13 years ago by mitchb

  • Owner set to mitchb
  • Status changed from new to assigned

It occurred to me tonight that this isn't as simple as managing scripts-primary.mit.edu directly through LVS as I planned and having it do the same thing as FWM 1 does now, because for that to work, we'd have to bind the scripts-primary IP as a loopback on all the realservers, and then we're back to square 1. But I think I have a solution that's actually better.

We'll have the directors actually tag packets destined for scripts-primary's IP with FWM 1 and then DNAT them to 18.181.0.43 and let them run through the actual existing scripts.mit.edu pool for the primary. (The iptables munging that I previously proposed for the realservers to take stuff destined for the balanced IPs they have on loopback and send ports we balance to the directors remains unchanged.) As a side benefit, this means that LVS won't have to add another series of probes to all the realservers to get the same information it already has.

comment:9 Changed 13 years ago by quentin

If we're going to use a different hostname, we might as well use the original hacron plan and use pacemaker to manage the scripts-primary IP address.

comment:10 Changed 13 years ago by mitchb

You're still missing what Anders and I have said. The original hacron plan *does not work* for the reason Anders gave (you can't enumerate every name we might possibly have in /etc/hosts to point there), and isn't great even if it would work for the reason I gave (traffic that normally would be balanced *should* stay on the local machine).

The IP address was always going to be managed with Pacemaker. The only question you've really raised here is whether it should be the directors' Pacemaker cluster or the realservers'. I'm not sure that I feel terribly strongly about it, as I think it should work either way, but I lean towards the directors managing it for a couple reasons:

(a) Organization - the directors' function is to manage our floating IP addresses and general routing. I'm not sure I see a great reason to divide this responsibility.

(b) I worry about hosed machines causing a realserver Pacemaker cluster split-brain where the outgoing primary still has the primary address bound and is too locked up to take the cluster's directions to unbind it, and then the incoming primary also binds it on an external interface, and they fight and the world gets packet loss to the primary. If the directors' cluster manages that IP, then the only situation in which it could be multiply bound is the one in which all of our addresses are and Scripts is completely down as a result already.

Admittedly, having the realserver cluster manage it would avoid having to add iptables rules to the directors, but since they already have a nontrivial set of rules, that doesn't seem like a major deal.

comment:11 Changed 13 years ago by andersk

  • sensitive changed from 0 to 1

Based on a long zephyr discussion, I think this configuration on the real servers will do what we want:

iptables -t mangle -A OUTPUT \
  -d 18.181.0.46,18.181.0.43,18.181.0.50 \
  -m tcp -p tcp \
  -m multiport ! --dports 25,80,443,444 \
  -j MARK --set-mark 1
ip route add default dev eth0 via 18.181.0.132 table 1
ip rule add fwmark 1 table 1

(We could give the table a symbolic name in /etc/iproute2/rt_tables.)
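A minimal sketch of the symbolic-name idea, assuming a made-up table name "scripts_out" (the ticket does not pick one). It prints the commands instead of running them, since they need root; the route and rule are the ones above with the name substituted for the number.

```shell
#!/bin/sh
# Dry run: prints the commands instead of running them (they need root).
# "scripts_out" is a hypothetical table name, not one chosen in the ticket.
TABLE_ID=1
TABLE_NAME=scripts_out

print_named_table_setup() {
    # /etc/iproute2/rt_tables maps "<id> <name>", one pair per line.
    printf 'echo "%s %s" >> /etc/iproute2/rt_tables\n' "$TABLE_ID" "$TABLE_NAME"
    # Once the name is registered, ip accepts it wherever a table id is expected.
    printf 'ip route add default dev eth0 via 18.181.0.132 table %s\n' "$TABLE_NAME"
    printf 'ip rule add fwmark 1 table %s\n' "$TABLE_NAME"
}

print_named_table_setup
```

The name only changes how the table is referred to; lookups behave the same as with the numeric id.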

comment:12 Changed 13 years ago by andersk

  • sensitive changed from 1 to 0

comment:13 Changed 13 years ago by andersk

The rule needs to go before the default local rule, whose priority overrides all other rules by default:

-ip rule add fwmark 1 table 1
+ip rule add priority 100 from all lookup local
+ip rule del priority 0 from all lookup local
+ip rule add priority 50 from all fwmark 0x1 lookup 1

But it still doesn’t work because the source address on the outgoing packets is still 18.181.0.43. We haven’t figured out how to fix that.

comment:14 Changed 13 years ago by mitchb

As you likely know if you've been following along, the experiment described above didn't actually work - despite all our best efforts (including putting an explicit src= into the 'ip route'), the packets continued to leave from the balanced IP address, so the replies obviously aren't going to come back to us.

Furthermore, Anders has pointed out that my original proposed solution of DNATing from the realserver-client to scripts-director, and then DNATing again from the director to the balanced address has the unsolved issue of us losing track of *which* of the three (or four, depending on how you count) balanced addresses you were trying to reach.

I think at great length and with much reading and confusion, I've come up with an answer that actually works. If you read the following page, you'll see that this question has been asked before, and for the most part, all the answers people including the developers have come up with appear not to work for the case of having realservers also be LVS clients in an LVS/DR setup: http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.lvs_clients_on_realservers.html

However, it does appear that with some fighting, we can succeed if we stop configuring the balanced addresses on the loopback interface and instead follow option 2 to handle the ARP problem described at: http://www.linuxvirtualserver.org/docs/arp.html

The following config was tested successfully on w-e this morning:

iptables -t nat -A PREROUTING -d 18.181.0.43/32 -j REDIRECT
iptables -t nat -A OUTPUT -d 18.181.0.43/32 -p tcp -m multiport --dports 80,25,443,444 -j REDIRECT

(These, of course, would be repeated for each balanced address.)

The first rule takes inbound traffic destined for the scripts.mit.edu IP and redirects it at the primary IP of the interface it came in on (eth0), meaning we accept and handle that traffic even though we don't have its destination address bound on any of our interfaces at all (and as a result of not having that address bound, we won't respond to ARP queries for it).

The second rule takes locally-generated traffic for the balanced ports of scripts.mit.edu's IP and also redirects those back at us... probably on the loopback, though I'm not 100% sure, so we handle our own balanced traffic and maintain sessions that are in use by the app performing a fetch, etc.

Any other traffic we locally generate for the scripts.mit.edu IP will go to the directors and be handled normally, because as far as w-e is concerned right now, 18.181.0.43 is a normal address on the internet that happens to also be on its subnet - we're not binding it, so it's not ours, and it's locally generated, so the PREROUTING chain doesn't handle it.
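The "repeated for each balanced address" step could be generated with a loop like the sketch below. It assumes the three addresses from the rule in comment 11 are the balanced ones, and it names the nat table explicitly because REDIRECT is only valid there. The script prints the rules for review rather than installing them.

```shell
#!/bin/sh
# Dry run: prints the iptables commands; nothing is installed.
# Assumption: the balanced addresses are the three listed in comment 11.
BALANCED_ADDRS="18.181.0.46 18.181.0.43 18.181.0.50"
BALANCED_PORTS="80,25,443,444"

print_redirect_rules() {
    for addr in $BALANCED_ADDRS; do
        # Inbound traffic for the balanced address: accept it locally even
        # though the address is not bound on any interface.
        printf 'iptables -t nat -A PREROUTING -d %s/32 -j REDIRECT\n' "$addr"
        # Locally generated traffic to the balanced ports: keep it on this host.
        printf 'iptables -t nat -A OUTPUT -d %s/32 -p tcp -m multiport --dports %s -j REDIRECT\n' \
            "$addr" "$BALANCED_PORTS"
    done
}

print_redirect_rules
```

Traffic to any other port on those addresses matches neither OUTPUT rule, so it leaves the machine and reaches the directors as described.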

There is a snag. Without some additional problem solving, what really happens at some point along the way is that you appear not to be able to connect to scripts.mit.edu from w-e. In reality, you're sending traffic out eth0 with your nonbalanced IP address, and you're reaching the director. The director is assigning your request to a realserver and sending it along. The realserver attempts to respond to you at your nonbalanced IP, but has a static route for you over the backend network. You ignore it because you sent something out eth0 and the response came back on eth1 - this is almost certainly the nonsense that causes n-b to lose ability to reach LVS-handled ports on scripts.mit.edu, yet be able to get finger responses from that IP (and in fact, w-e could get 'finger @scripts' output).

For the purposes of this test, I removed the static routes between w-e and s-a. For actual deployment, I see two possible ways forward:

1) We add a routing table that has the "normal" config sending 18.181.0.0/16 over eth0, and add rules to look up routes in that table at a higher priority than the main table for traffic from the unbalanced addresses of the realservers (traffic that comes in over the backend would come from the 172.21.0.0/16 addresses and would not match that rule, so should still work). This way, when you send a request from one realserver to another over the public network via the directors, it will return to you over the public network. Since SQL traffic is not balanced through this LVS cluster, all of that will still go over the backend, and most web fetches are from outside and all of those go over the public net, so this shouldn't pose a significant security issue.

2) We put the directors on the backend network. From time to time we've waffled on the question of whether that changes the security properties of Scripts.
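Option 1 might be sketched as follows. The table id, the rule priority, and the example source address are all illustrative values that do not come from this ticket; the real source would be each realserver's own unbalanced public address. The script prints the commands rather than applying them.

```shell
#!/bin/sh
# Dry run: prints commands for review; nothing is changed on the host.
# TABLE_ID and RULE_PRIORITY are illustrative values, not from the ticket.
TABLE_ID=100
RULE_PRIORITY=1000   # numerically below the main table's 32766, so consulted first

print_public_return_path() {
    # $1: this realserver's own unbalanced public address (caller supplies it)
    # Separate table that sends the public subnet out eth0, ignoring the
    # static backend routes that live in the main table.
    printf 'ip route add 18.181.0.0/16 dev eth0 table %s\n' "$TABLE_ID"
    # Only traffic sourced from the unbalanced public address uses that table;
    # backend traffic from 172.21.0.0/16 does not match and is unaffected.
    printf 'ip rule add from %s lookup %s priority %s\n' "$1" "$TABLE_ID" "$RULE_PRIORITY"
}

# 192.0.2.10 is a documentation placeholder, not a real Scripts address.
print_public_return_path 192.0.2.10
```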

comment:15 Changed 13 years ago by quentin

-j REDIRECT is a very clever solution. It basically means we're accepting traffic for scripts.mit.edu without having any of the other features associated with a local interface. The only downside I can think of is that it would mean that nothing can explicitly bind to one of the IPs; they'll need to bind to all interfaces to get traffic destined to the balanced ones. I don't know if we have anything that does that now (webzephyr?) or if we'd want to do that in the future.

Linux allows you to configure whether it drops packets that come in on the wrong interface (it calls this "rp_filter"). The reason it's on by default is that there are attacks involving untrusted networks. But we already trust our backend network, so we could just turn it off. This solves your routing problem at the end there, without needing either #1 or #2.
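A rough sketch of checking and relaxing rp_filter. The kernel uses the larger of the "all" value and the per-interface value, so both have to be lowered for the filter to be off. Treating eth1 as the backend interface follows comment 14; the helper name is made up for illustration. The sysctl lines are printed, not applied.

```shell
#!/bin/sh
# The kernel applies max(conf/all/rp_filter, conf/<iface>/rp_filter) to each
# interface, so both values matter. 0 = off, 1 = strict, 2 = loose.
effective_rp_filter() {
    # $1: value of conf/all/rp_filter, $2: value of conf/<iface>/rp_filter
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Show the current effective setting per interface, if /proc is available.
all_file=/proc/sys/net/ipv4/conf/all/rp_filter
if [ -r "$all_file" ]; then
    all_value=$(cat "$all_file")
    for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        echo "$iface: effective rp_filter=$(effective_rp_filter "$all_value" "$(cat "$f")")"
    done
fi

# Dry run of the change (needs root to apply). eth1 as the backend interface
# is taken from comment 14's description.
echo 'sysctl -w net.ipv4.conf.all.rp_filter=0'
echo 'sysctl -w net.ipv4.conf.eth1.rp_filter=0'
```

Setting the values to 2 (loose mode) instead of 0 may be a gentler alternative, since loose mode only requires the source to be reachable through some interface; whether that is sufficient here has not been tested.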

I still think we should use these powers to route directly to a scripts server, and not through LVS...

comment:16 Changed 11 years ago by andersk

Since the following conversation didn’t end up on this ticket, see zlogs from 2011-03-25 and the next few days.
