Planet Apache


Chiradeep Vittal: How to manage a million firewalls – part 2

Fri, 2015-04-10 11:05

Continuing from my last post, where I hinted at the big distributed systems problem involved in managing a CloudStack Basic Zone.

It helps to understand how CloudStack is architected at a high level. CloudStack is typically operated as a cluster of identical Java applications (called the “Management Server” or “MS”). There is a MySQL database that holds the desired state of the cloud. API calls arrive at a management server (through a load balancer). The management server uses the current state as stored in the MySQL database, computes/stores a new state and communicates any changes to the cloud infrastructure.

In response to an API call, the management server(s) usually have to communicate with one or more hypervisors. For example, adding a rule to a security group (a single API call) could involve communicating changes to dozens or hundreds of hypervisors. The job of communicating with the hypervisors is split (“sharded”) among the cluster members. For example, if there are 3,000 hypervisors and 3 management servers, then each MS handles communications with 1,000 hypervisors. If the API call arrives at MS ‘A’ and needs to update a hypervisor managed by MS ‘B’, then the communication is brokered through B.
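
As a rough illustration of that sharding, here is a toy sketch (invented names and a simple stable hash; this is not CloudStack's actual assignment logic):

import zlib

# Illustrative only: shard hypervisors across management servers (MS) with a
# stable hash; CloudStack's real assignment algorithm differs.
def owning_ms(hypervisor_id, ms_nodes):
    return ms_nodes[zlib.crc32(hypervisor_id.encode()) % len(ms_nodes)]

ms_nodes = ["ms-a", "ms-b", "ms-c"]
hypervisors = ["hv-%04d" % i for i in range(3000)]

shard = {}
for hv in hypervisors:
    shard.setdefault(owning_ms(hv, ms_nodes), []).append(hv)

# Each MS ends up owning roughly 1000 hypervisors; an API call landing on
# ms-a that must touch a hypervisor owned by ms-b is brokered through ms-b.
for ms, hvs in sorted(shard.items()):
    print(ms, len(hvs))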

Now updating a thousand firewalls  (remember, the firewalls are local to the hypervisor) in response to a single API call requires us to think about the API call semantics. Waiting for all 1000 firewalls to respond could take a very long time. The better approach is to return success to the API and work in the background to update the 1000 firewalls. It is also likely that the update is going to fail on a small percentage of the firewalls. The update could fail due to any number of problems: (transient) network problems between the MS and the hypervisor, a problem with the hypervisor hardware, etc.
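
A minimal sketch of that asynchronous pattern (purely illustrative; CloudStack has its own async job framework and these names are invented):

from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=50)

def push_rules(hypervisor, rules):
    # In reality: send the new firewall state to one hypervisor, with
    # retries and error handling for the ones that are unreachable.
    return "%s updated" % hypervisor

def authorize_ingress(rules, hypervisors):
    # 1. Persist the new desired state (the MySQL write) -- omitted here.
    # 2. Fan the update out to the hypervisors in the background.
    for hv in hypervisors:
        executor.submit(push_rules, hv, rules)
    # 3. Return success to the API caller without waiting for all firewalls.
    return {"success": True}

print(authorize_ingress(["allow tcp/22 from 192.168.1.0/24"],
                        ["hv-%03d" % i for i in range(1000)]))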

This problem can be described in terms of the CAP theorem as well. A piece of state (the state of the security group) is being stored on a number of distributed machines (the hypervisors in this case). When there is a network partition (P), do we want the update to the state to be Consistent (every copy of the state is the same), or do we want the API to be Available (the call always succeeds, even if some copies cannot be updated)? Choosing Availability ensures that the API call never fails, regardless of the state of the infrastructure. But it also means that the state is potentially inconsistent across the infrastructure when there is a partition.

A lot of the problems with an inconsistent state can be hand-waved away [1] since the default behavior of the firewall is to drop traffic. So if the firewall doesn’t get the new rule or the new IP address, the inconsistency is safe: we are not letting in traffic that we didn’t want to.

A common strategy in AP systems is to be eventually consistent. That is, at some undefined point in the future, every node in the distributed system will agree on the state. So, for example, the API call needs to update a hundred hypervisors, but only 95 of them are available. At some point in the future, the remaining 5 do become available and are updated to the correct state.

When a previously disconnected hypervisor reconnects to the MS cluster, it is easy to bring it up to date, since the authoritative state is stored in the MySQL database associated with the CloudStack MS cluster.

A different distributed systems problem is dealing with concurrent writes. Let’s say you send a hundred API calls in quick succession to the MS cluster to start a hundred VMs. Each VM creation leads to changes in many different VM firewalls. Not every API call lands on the same MS: the load balancer in front of the cluster distributes them across the machines in the cluster. Visualizing the timeline:

A design goal is to push the updates to the VM firewalls as soon as possible (this is to minimize the window of inconsistency). So, as the API calls arrive, the MySQL database is updated and the new firewall states are computed and pushed to the hypervisors.

While MySQL concurrency primitives allow us to safely modify the database (effectively serializing the updates to the security groups), the order of updates to the database may not be the order of updates that flow to the hypervisor. For example, in the table above, the firewall state computed as a result of the API call at T=0 might arrive at the firewall for VM A after the firewall state computed at T=2. We cannot accept the “older” update.

The obvious [2] solution is to insert the order of computation in the message (update) sent to the firewall. Every time an API call results in a change to the state of a VM firewall, we update a persistent sequence number associated with that VM. That sequence number is transmitted to the firewall along with the new state. If the firewall notices that the latest update received is “older” than the one it has already processed, it just ignores it. In the figure above, the “red” update gets ignored.

A crucial point is that every update to the firewall has to contain the complete state: it cannot just be the delta from the previous state [3].

The sequence number has to be stored on the hypervisor so that it can compare incoming sequence numbers against the last one it applied. The sequence number also optimizes the updates to hypervisors that reconnect after a network partition has healed: if the sequence number matches, then no updates are necessary.
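
A toy sketch of that guard on the hypervisor side (invented names, not CloudStack's agent code): each update carries the complete rule set plus the per-VM sequence number, stale updates are dropped, and on reconnect a matching sequence number means there is nothing to re-push.

applied = {}   # vm id -> {"seq": ..., "rules": [...]}, persisted on the host

def apply_update(vm_id, seq, full_rule_set):
    current = applied.get(vm_id, {"seq": -1})
    if seq <= current["seq"]:
        return "ignored (stale or duplicate update)"
    # Program iptables/ipset from the complete rule set, then record the seq.
    applied[vm_id] = {"seq": seq, "rules": list(full_rule_set)}
    return "applied"

print(apply_update("vm-a", 1, ["allow tcp/22 from 192.168.1.0/24"]))
print(apply_update("vm-a", 3, ["allow tcp/22 from 192.168.1.0/24",
                               "allow tcp/80 from 0.0.0.0/0"]))
print(apply_update("vm-a", 2, ["allow tcp/22 from 192.168.1.0/24"]))  # stale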

Well, I’ve tried to keep this part under a thousand words. The architecture discussed here did not converge easily — there were a lot of mistakes and learning along the way. There is no way for other cloud / orchestration systems to re-use this code; however, I hope the reader will learn from my experience!

1. The only case to worry about is when rules are deleted: an inconsistent state potentially means we are allowing traffic when we didn’t intend to. In practice, rule deletes are a very small portion of the changes to security groups. Besides, if the rule exists because it was intentionally created, it is probably OK to take a little time to delete it.
2. Other (not-so-good) solutions involve locks per VM, and queues per VM
3. This is a common pattern in orchestrating distributed infrastructure


Categories: FLOSS Project Planets

Justin Mason: Links for 2015-04-09

Thu, 2015-04-09 18:58
Categories: FLOSS Project Planets

Sebastien Goasguen: 1 command to Kubernetes with Docker compose

Thu, 2015-04-09 14:59

After 1 command to Mesos, here is 1 command to Kubernetes.

I had not looked at Kubernetes in over a month. It is a fast-paced project so it is hard to keep up. If you have not looked at Kubernetes, it is roughly a cluster manager for containers. It takes a set of Docker hosts under management and schedules groups of containers on them. Kubernetes was open sourced by Google around June last year to bring all the Google knowledge of working with containers to us, a.k.a. the people :) There are a lot of container schedulers (or orchestrators, if you wish) out there: Citadel, Docker Swarm, Mesos with the Marathon framework, Cloud Foundry Lattice, etc. The Docker ecosystem is booming and our heads are spinning.

What I find very interesting with Kubernetes is the concept of replication controllers. Not only can you schedule groups of colocated containers together in a cluster, but you can also define replica sets. Say you have a container you want to scale up or down: you can define a replication controller and use it to resize the number of containers running. It is great for scaling when the load dictates it, but it is also great when you want to replace a container with a new image. Kubernetes also exposes a concept of services, basically a way to expose a container application to all the hosts in your cluster as if it were running locally. Think the ambassador pattern of the early Docker days, but on steroids.

All that said, you want to try Kubernetes. I know you do. So here is 1 command to try it out. We are going to use docker-compose like we did with Mesos and thanks to this how-to which seems to have landed 3 days ago, we are going to run Kubernetes on a single host with containers. That means that all the Kubernetes components (the "agent", the "master" and various controllers) will run in containers.

Install compose on your Docker host, if you do not have it yet:

curl -L https://github.com/docker/compose/releases/download/1.1.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

Then create this YAML file, call it say k8s.yml:

etcd:
  image: kubernetes/etcd:2.0.5.1
  net: "host"
  command: /usr/local/bin/etcd --addr=127.0.0.1:4001 --bind-addr=0.0.0.0:4001 --data-dir=/var/etcd/data
master:
  image: gcr.io/google_containers/hyperkube:v0.14.1
  net: "host"
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  command: /hyperkube kubelet --api_servers=http://localhost:8080 --v=2 --address=0.0.0.0 --enable_server --hostname_override=127.0.0.1 --config=/etc/kubernetes/manifests
proxy:
  image: gcr.io/google_containers/hyperkube:v0.14.1
  net: "host"
  privileged: true
  command: /hyperkube proxy --master=http://127.0.0.1:8080 --v=2

And now, 1 command:

$ docker-compose -f k8s.yml up -d

Quickly thereafter, you will see a bunch of containers pop up:

$ docker ps
CONTAINER ID IMAGE
56c36dc7bf7e nginx:latest
a17cac87965b kubernetes/pause:go
659917e61d3e gcr.io/google_containers/hyperkube:v0.14.1
caf22057dbad gcr.io/google_containers/hyperkube:v0.14.1
288fcb4408c7 gcr.io/google_containers/hyperkube:v0.14.1
820cc546b352 kubernetes/pause:go
0bfac38bdd10 kubernetes/etcd:2.0.5.1
81f58059ca8d gcr.io/google_containers/hyperkube:v0.14.1
ca1590c1d5c4 gcr.io/google_containers/hyperkube:v0.14.1

In the YAML file above, you can see in the commands that it uses a single binary, hyperkube, which allows you to start all the Kubernetes components: the API server, the replication controller, etc. One of the components it starts is the kubelet, which is normally used to monitor containers on one of the hosts in your cluster and make sure they stay up. Here, by passing the /etc/kubernetes/manifests directory, it helped us start the other components of Kubernetes defined in that manifest. Clever! Note also that the containers were started with host networking. These containers have the network stack of the host, so you will not see an interface for them on the docker bridge.

With all those up, grab the kubectl binary; that is the Kubernetes client that you will use to interact with the system. The first thing you can do is list the nodes:

$ ./kubectl get nodes
NAME LABELS STATUS
127.0.0.1 <none> Ready

Now start your first container:

./kubectl run-container nginx --image=nginx --port=80

That's a simple example where you actually start a single container. Normally you would group the containers that need to be colocated into a pod description written in YAML or JSON and pass that to kubectl, but it looks like they extended kubectl to handle single-container startup. That's handy for testing.

Now list your pods:

$ ./kubectl get pods
POD IP CONTAINER(S) IMAGE(S)
nginx-127 controller-manager gcr.io/google_containers/hyperkube:v0.14.1
apiserver gcr.io/google_containers/hyperkube:v0.14.1
scheduler gcr.io/google_containers/hyperkube:v0.14.1
nginx-p2sq7 172.17.0.4 nginx nginx

You see that there are actually two pods running: the nginx one that you just started, and one pod made of three containers. That's the pod that was started by your kubelet to get Kubernetes up. Kubernetes managed by Kubernetes...

It automatically created a replication controller (rc):

$ ./kubectl get rc
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
nginx nginx nginx run-container=nginx 1

You can have some fun with the resize capability right away and see a new container pop up.

$ ./kubectl resize --replicas=2 rc nginx
resized

Now that is fine and dandy, but there is no port exposed on the host, so you cannot access your application from the outside. That's where you want to define a service. Technically it is used to expose a service to all nodes in a cluster, but of course you can bind that service proxy to a publicly routed interface:

$ ./kubectl expose rc nginx --port=80 --public-ip=192.168.33.10

Now take your browser and open it at http://192.168.33.10 (if that's the IP of your host of course) and enjoy a replicated nginx managed by Kubernetes deployed in 1 command.

You will get more of that good stuff in my book, if I manage to finish it. Wish me luck.

Categories: FLOSS Project Planets

Lars Heinemann: Updated install guide for Eclipse Luna / JBDS 8

Thu, 2015-04-09 14:22
Installation of JBoss Fuse Tooling for Eclipse Luna / JBDS 8:

I received some comments that the existing installation guide is no longer working because of missing dependencies. We set up a new installation guide at our GitHub wiki and will keep it updated in the future. Thank you for reporting it!

Before continuing please keep in mind that you are now installing a development snapshot of the tooling for Eclipse Luna. This is a work in progress and not released yet.

You can find the new guide >>> HERE <<<.

Have fun and please report any issues back to me :)
Categories: FLOSS Project Planets

Lars Heinemann: How to install JBoss Fuse Tooling into Eclipse Luna

Thu, 2015-04-09 14:21
Installation of JBoss Fuse Tooling for Eclipse Luna:

Before continuing please keep in mind that you are now installing a development snapshot of the tooling for Eclipse Luna. This is a work in progress and not released yet.

Let's choose the download for Eclipse Standard 4.4. Once the download finishes, unpack the archive onto your hard drive and start Eclipse by executing the Eclipse launcher. Choose your workspace and you should find yourself on the Eclipse Welcome screen (only if you are starting it for the first time).
Welcome to Eclipse Luna!
Now let's open the Help menu from the top menu bar and select the entry "Install new Software".




Let's define a new update site location for JBoss Fuse Tooling. Click the "Add" button next to the combo box for the update sites.



Now enter the following...

Name: JBoss Fuse Tooling (Luna)
Location: http://download.jboss.org/jbosstools/updates/integration/luna/integration-stack/fuse-tooling/7.3.0/all/repo/
Click the OK button to add the new update site to your Eclipse installation.
A new dialog will open up and ask you what features to install from the new update site:




There are three features available from that update site:

JBoss Fuse Camel Editor Feature:
This feature gives you the route editor for Apache Camel Routes, a new project wizard to setup new integration modules and the option to launch your new routes locally for testing purposes.

JBoss Fuse Runtimes Feature:
This allows you to monitor your routes, trace messages, edit remote routes and there are also Fabric bits available to work with Fuse Fabric.

JBoss Fuse Server Extension Feature:
When installing this feature you will get server adapters for Apache ServiceMix, Apache Karaf, Fabric8 and JBoss Fuse runtimes. It allows you to start / stop those servers and to connect to their shell. Deployment options are also available.

Once you are done with selecting your features click on the Next button. The following screen will show you what will be installed into your Luna installation.




You can review your selection and make changes to it by clicking the Back button if needed. If all is fine you can click the Next button instead. This will lead you to the license agreement screen. You need to agree to the licenses in order to install the software.




Once that is done you can click the Next button again and the installation will start by downloading all needed plugins and features. Once that is done, the new features will be installed into your Luna folder. Before that happens Eclipse will warn you that you are going to install unsigned content. This happens because our plugins are not signed, but that's nothing to worry about. Just click OK to do the installation.


After everything is installed Eclipse will ask you for a restart.



Click the Yes button to restart Eclipse. When the restart is done you will be able to select the Fuse Integration perspective from the perspectives list.




Well done! You've installed JBoss Fuse Tooling for Eclipse Luna! 


Categories: FLOSS Project Planets

Matt Raible: Getting Hip with JHipster at Denver's Java User Group

Thu, 2015-04-09 14:20

Last night, I had the pleasure of speaking at Denver's Java User Group Meetup about JHipster. I've been a big fan of JHipster ever since I started using it last fall. I developed a quick prototype for a client and wrote about solving some issues I had with it on OS X. I like the project because it encapsulates the primary open source tools I've been using for the last couple of years: Spring Boot, AngularJS and Bootstrap. I also wrote about its 2.0 release on InfoQ in January.

To add some humor to my talk, I showed up as a well-dressed Java Developer. Like a mature gentleman might do, I started the evening with a glass of scotch (Glenlivet 12). Throughout the talk I became more hip and adjusted my attire, and beverage, accordingly. As you might expect, my demos had failures. The initial project creation stalled while Bower downloaded all the JavaScript dependencies. Luckily, I had a backup and was able to proceed. Towards the end, when I tried to deploy to Heroku, I was presented with a lovely message that "Heroku toolbelt updating, please try again later". I guess auto-updating has its downsides.

After finishing the demo, I cracked open a cold PBR to ease my frustration.

I did two live coding sessions during this presentation, standing on the shoulders of giants to do so. I modeled my quick introduction to Spring Boot on Josh Long's Getting Started with Spring Boot. IntelliJ IDEA 14.1 has a nice way to create Spring Boot projects, so that came in handy. For the JHipster portion, I created a blogging app and used relationships and business logic similar to what Julien Dubois did in his JHipster for Spring Boot Webinar. Watching Josh and Julien's demos will give you a similar experience to what DJUG attendees experienced last night, without the download/deployment failures.

You can click through my presentation below, download it from my presentations page, or view it on SlideShare.

You might notice my announcement on slide #32 that I've signed up to write a book on JHipster.

I haven't started writing the book yet, but I have been talking with InfoQ and other folks about it for several months. I plan to use Asciidoctor and Gradle as my authoring tools. If you have experience writing a book with these tools, I'd love to hear about it. If you've developed an application with JHipster and have some experience in the trenches, I'd love to hear your stories too.

As I told DJUG last night, I plan to be done with the book in a few months. However, if you've been a reader of this blog, you'll know I've been planning to be done with my '66 VW Bus in just a few more months for quite some time, so that phrase has an interesting meaning for me.

Categories: FLOSS Project Planets

Bryan Pendleton: Popular music as cultural analysis

Thu, 2015-04-09 14:15

We're going to be seeing The Decemberists in a few weeks, and I'm starting to get really excited about the show.

I had listened to them, on and off, but hadn't paid enough attention, and with the upcoming tour as inspiration I've been really paying a lot more attention to them.

And they're fascinating.

Musically, I started listening to them because I picked up Long Live The King due to its inclusion of a cover of a wonderful Grateful Dead song, Row Jimmy.

Then I moved on to The King Is Dead, their blockbuster, which of course I adored because of Peter Buck.

So at first What a Terrible World, What a Beautiful World freaked me out a little bit, because it's considerably different from The King Is Dead. But over the last few months I've grown to love WATWWABW at least as much as TKID, if not more.

Besides just their music, one of the interesting things about The Decemberists is how much people like to talk about them as a way of talking about the world at large. This is true of many popular artists, but it is particularly true about The Decemberists, perhaps because their songs get people thinking about larger topics.

So, for example, we have Colin Meloy being interviewed: YA Books, RPG and the New Decemberists LP: Colin Meloy Rolls the Dice:

That relationship between bands or singers and their audience, it's kind of a funny relationship and abusive in its own right, going both ways. I shouldn't say abusive, but it can be antagonistic. I think that it's an odd relationship, and it's just that particular singer trying to come to terms with that aspect of it. Having an audience, you may want to continue doing things on your own terms, but that becomes more challenging when there are expectations. And audiences have more of a voice than ever with the advent of the Internet.

Over at Slate, Carl Wilson takes an even broader view: Against indie: New albums from Modest Mouse, Sufjan Stevens, and more show it's time to eliminate the racist term for good.

Other music listeners might ask if bands of the Decemberists’ vintage can change enough to feel pertinent in 2015. A decade ago, music blogs, film and TV music supervisors, Pitchfork, and other new media outlets boosted “indie” to a rare visibility. Now, many of those acts are returning from long absences to quite an altered atmosphere.

Wilson goes on to explain why he uses the powerful term "racist" in this situation:

Few of them claim to be fighting any kind of battle against pop anymore—fans are almost always worse than artists on that count. But this decade has also seen a more widespread suspicion and critique of the workings of social privilege, and “indie” has a problem there—because its creators and listeners seem so disproportionately white, male, and upper-middle-class.

Later, Wilson more directly skewers The Decemberists for what he sees as their failings:

Likewise I am a bit skeptical that without “indie,” the Decemberists could even exist. If there were then still a call for a post-modern folk-rock Gilbert and Sullivan, it would have to have more of the courage of its strangeness. The band’s hiatus has done it some good, and the songwriting is more grounded on this year’s What a Terrible World, What a Beautiful World. But I still find Meloy’s unrelenting streams of conceits wearying, like a prog concept album from 1975 without even the gonzo musicianship to liven up the occasion.

More than any other band, they bring me back to the self-regarding turn that America made in the 2000s—the post-9/11 world-wariness and self-soothing. It would be too much to say that’s what made it an ideal period for “indie.” But when I listen to the Decemberists, I’m tempted.

I am, indeed, white, male, and upper-middle-class. So, guilty as charged. But does that mean I'm somehow committing a social offense by being a Decemberists fan?

I'll have to spend more time listening to their music before I can come to a more considered opinion about whether they are letting us down.

But it also seems like Wilson is asking Meloy and company to fight Wilson's battles, which is unfair. As Meloy says,

I just like stories. I like people telling stories.

Seems fair enough, to me. I'll keep listening, and hopefully I'll enjoy going to their show and meeting the people I meet there.

Sometimes art can just be entertainment, after all; it doesn't always have to change the world. That's a lot to ask, of anybody.

Categories: FLOSS Project Planets

Mads Toftum: New blog software and layout

Thu, 2015-04-09 13:33

After quite some time ignoring my blog, I wanted to take a look at getting a slightly more modern theme. A quick look at the site found an official EOL notice for NanoBlogger. To replace NanoBlogger, I've chosen Blogofile which admittedly hasn't had any commits to its repo for about 2 years and could appear as dead as NanoBlogger. In addition to Blogofile, I've grabbed a basic blog template from Start Bootstrap to get something a bit cleaner than the basic Blogofile template.

Reasons for picking Blogofile

I'm a big fan of pre-generated websites letting static content be static rather than generated on the fly. Apart from the fact that it scales a whole lot better, it also avoids the usual security headaches that seem to follow things like Wordpress and friends. There's even an option to turn on comments without having anything dynamic on the site. It's written in Python, which has been the language I've been playing with lately to add something beyond shell and Perl to my usual tool stack. There's good, varied template support, giving me easy choices to write posts with a little more styling than just the plain html from NanoBlogger. The layout of the working directories is clean and simple and easily let me integrate old leftovers. Migration from NanoBlogger was pretty simple.

Installation and Migration notes

Installing Blogofile with pip doesn't quite work out of the box, but cloning the git repos and running python setup.py install did the trick. NanoBlogger keeps its post categories in two places - in cat__N_.db which contains the category name on the first line and in the master.db file which has an entry for each post with something like "2011-08-30T14_19_45.txt>6,8". That's the filename for the post and the N corresponding to the category files. In order to migrate, I manually changed the numbers in master.db to category names which didn't take long with a bit of vim. I also changed the > and the , into | for simplicity. Next I ran my conversion code (see below) as the simplest hack.

with open("master.db", "r") as master: for line in master: ent=line.split('|') file=ent.pop(0) with open(file, "r") as infile, open(file+".html", "w") as outfile: outfile.write("---\n") outfile.writelines("categories: " + ", ".join(ent)) body=False for inline in infile: if body==True: if not inline.startswith('END-----'): outfile.write(inline) elif inline.startswith("BODY:"): outfile.write("---\n") body=True else: head=inline.split(': ') if head[0] in ['AUTHOR', 'DATE', 'TITLE']: outfile.write(head[0].lower() + ": " + head[1])

Pretty simple stuff: it puts in the YAML headers and the new categories, and lowercases the remaining headers that are reused. Once it gets to the body it just copies line by line. The only problems I ran into were titles with : and # in them, which were easy to fix by hand.

Changing the templates only took a little bit more time than it should have because the last time I cared about writing HTML, CSS had only just been invented.

The future

I've got a bit more tweaking to my site to get done, making it less ugly and to add some of my pictures. There's also a couple of blog posts about OpenData half written in my mind, which should appear as time permits. After that, your guess is as good as mine. Maybe some of what currently goes out on Twitter will end up here along with some of the pictures that go on Flickr.

Categories: FLOSS Project Planets

Chiradeep Vittal: How to manage a million firewalls – part 1

Thu, 2015-04-09 11:38

In my last post I argued that security groups eliminate the need for network security devices in certain parts of the datacenter. The trick that enables this is the network firewall in the hypervisor. Each hypervisor hosts dozens or hundreds of VMs — and provides a firewall per VM. The figure below shows a typical setup, with Xen as the hypervisor. Ingress network traffic flows through the hardware into the control domain (“dom0”) where it is switched in software (so called virtual switch or vswitch) to the appropriate VM.

The vswitch provides filtering functions that can block or allow certain types of traffic into the VM. Traffic between the VMs on the same hypervisor goes through the vswitch as well. The vswitch used in this design is the Linux Bridge; the firewall function is provided by netfilter ( “iptables”).

Security groups drop all traffic by default and only allow what the rules permit. Suppose the red VMs in the figure (“Guest 1” and “Guest 4”) are in a security group “management”. We want to allow access to them from the subnet 192.168.1.0/24 on port 22 (ssh). The iptables rules might look like this:

iptables -A FORWARD -p tcp --dport 22 --src 192.168.1.0/24 -j ACCEPT
iptables -A FORWARD -j DROP

Line 1 reads: for packets forwarded across the bridge (vswitch) that are destined for port 22, and are from source 192.168.1.0/24, allow (ACCEPT) them. Line 2 reads: DROP everything. The rules form a chain: packets traverse the chain until they match. (this is highly simplified: we want to match on the particular bridge ports that are connected to the VMs in question as well).

Now, let’s say we want to allow members of the ‘management’ group to access each other over ssh as well. Let’s say there are 2 VMs in the group, with IPs of ‘A’ and ‘B’. We calculate the membership and, for each VM’s firewall, we write additional rules:

#for VM A
iptables -I FORWARD -p tcp --dport 22 --source B -j ACCEPT
#for VM B
iptables -I FORWARD -p tcp --dport 22 --source A -j ACCEPT

As we add more VMs to this security group, we have to add more such rules to each VM’s firewall. (A VM’s firewall is the chain of iptables rules that are specific to the VM.) If there are ‘N’ VMs in the security group, then each VM has N-1 iptables rules for just this one security group rule. Remember that a packet has to traverse the iptables rules until it matches or gets dropped at the end. Naturally each rule adds latency to a packet (at least to the connection-initiating ones). After a certain number (a few hundred) of rules, the latency tends to go up in hockey-stick fashion. In a large cloud, each VM could be in several security groups and each security group could have rules that interact with other security groups — easily leading to several hundred rules.
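
To make that growth concrete, here is a toy sketch (illustrative only, not CloudStack code) that expands a single “allow ssh within the group” rule into the per-VM iptables rules; with N members it emits N-1 rules per VM, N*(N-1) in total:

def rules_for_group(members):   # members: {vm_name: ip}
    rules = {}
    for vm in members:
        rules[vm] = [
            "iptables -I FORWARD -p tcp --dport 22 --source %s -j ACCEPT" % ip
            for other, ip in members.items() if other != vm
        ]
    return rules

group = {"vm%d" % i: "10.1.1.%d" % i for i in range(1, 6)}   # 5 members
per_vm = rules_for_group(group)
print(len(per_vm["vm1"]))                    # 4 rules on each VM's chain
print(sum(len(r) for r in per_vm.values()))  # 20 rules for the whole group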

Aha, you might say, why not just summarize the N-1 source IPs and write a single rule like:

iptables -I FORWARD -p tcp --dport 22 --source <summary cidr> -j ACCEPT

Unfortunately, this isn’t possible since it is never guaranteed that the N-1 IPs will be in a single CIDR block. Fortunately this is a solved problem: we can use ipsets. We can add the N-1 IPs to a single named set (“ipset”). Then:

ipset -A mgmt <IP1>
ipset -A mgmt <IP2>
...
iptables -I FORWARD -p tcp --dport 22 -m set --match-set mgmt src -j ACCEPT

IPSets matching is usually very fast and fixes the ‘scale up’ problem. In practice, I’ve seen it handle tens of thousands of IPs without significantly affecting latency or CPU load.

The second (perhaps more challenging) problem is that when the membership of a group changes, or a rule is added / deleted, a large number of VM firewalls have to be updated. Since we want to build a very large cloud, this usually means thousands or tens of thousands of hypervisors have to be updated with these changes. Let’s say in the single group/single rule example above, there are 500 VMs in the security group. Adding a VM to the group means that 501 VM firewalls have to be updated. Adding a rule to the security group means that 500 VM firewalls have to be updated. In the worst case, the VMs are on 500 different hosts — making this a very big distributed systems problem.
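
A toy way to see the fan-out (again illustrative, with made-up numbers): adding one VM to the group touches every member's firewall, and in the worst case every member lives on a different host.

import random

members = ["vm-%03d" % i for i in range(500)]
host_of = {vm: "host-%03d" % random.randrange(500) for vm in members}

# Add one more VM to the security group.
new_vm = "vm-500"
host_of[new_vm] = "host-007"
members.append(new_vm)

firewalls_to_update = set(members)                       # all 501 firewalls
hosts_to_contact = {host_of[vm] for vm in firewalls_to_update}
print(len(firewalls_to_update), "VM firewalls across", len(hosts_to_contact), "hosts")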

If we consider a typical datacenter of 40,000 hypervisor hosts, with each hypervisor hosting an average of 25 VMs, this becomes the million firewall problem.

Part 2 will examine how this is solved in CloudStack’s Basic Zone.


Categories: FLOSS Project Planets

Sebastien Goasguen: Running the CloudStack Simulator in Docker

Thu, 2015-04-09 09:44

CloudStack comes with a simulator. It is very handy for testing purposes; we use it to run our smoke tests on TravisCI for each commit to the code base. However, if you want to run the simulator, you need to compile from source using some special maven profiles. That requires you to check out the code and set up your working environment with the dependencies for a successful CloudStack build.

With Docker you can skip all of that and simply download the cloudstack/simulator image from the Docker Hub. Start a container from that image and expose port 8080 where the dashboard is being served. Once the container is running, you can use docker exec to configure a simulated data center. This will allow you to start fake virtual machines, create security groups and so on. You can do all of this through the dashboard or using the CloudStack API.

So you want to give CloudStack a try ? Use Docker :)

$ docker pull cloudstack/simulator

The image is a bit big and we need to work on slimming it down, but once the image is pulled, starting the container will be almost instant. If you feel like sending a little PR against the Dockerfile, there might be a few obvious things that would slim down the image.

$ docker run -d -p 8080:8080 --name cloudstack cloudstack/simulator

The application needs a few minutes to start, however; that is something I have not had time to investigate. Probably we need to give more memory to the container. Once you can access the dashboard at http://localhost:8080/client you can configure the simulated data center. You can choose between a basic zone, which gives you L3 network isolation, and an advanced zone, which gives you VLAN-based isolation:

$ docker exec -ti cloudstack python /root/tools/marvin/marvin/deployDataCenter.py -i /root/setup/dev/basic.cfg

Once the configuration completes, head over to the dashboard at http://localhost:8080/client and check your simulated infrastructure.

Enjoy the CloudStack simulator brought to you by Docker.

Categories: FLOSS Project Planets

Claus Ibsen: Getting started with JBoss Fuse - Where should I start?

Thu, 2015-04-09 03:47
Christina Lin, a Fuse evangelist, has created a great blog series on how to get started with JBoss Fuse.
She has done a great job of creating real-life use-cases and detailing how to work through them step by step, in both text and slides.
JBoss Fuse Blog Series by Christina Lin 
People who are new to JBoss Fuse are encouraged to take a look at this, as IMHO it's easier to follow a use-case than to read through the 100s of pages of reference documentation that are part of the product.
As Apache Camel is a cornerstone of JBoss Fuse, it's often even easier to get started by just learning some basic Camel first, without thinking about Fuse / App Servers / OSGi / Karaf / Blueprint / Fabric / and the many other concepts that JBoss Fuse brings to the table. Learn some Camel skills, which you can easily run on your computer locally, from within your Java editor. And only thereafter take the leap and dive into the world of JBoss Fuse.
To get started with Apache Camel, I point people to this article written by Jonathan Anstey several years ago (don't worry, the information is 100% up to date today, as Camel is stable and doesn't throw you under the bus with frequent changes), which really captures well what Camel is and what it can do. For example, the foundation of Camel which James Strachan laid out 8 years ago is still the road we travel today. If you are looking for other explanations of what Camel is, there is a good Q&A on Stack Overflow.
Another great starting point to learn Apache Camel is chapter 1 of the Camel in Action book. And this is not the only Camel book; in fact there are 4 known books published.

Categories: FLOSS Project Planets

Adrian Sutton: Emoji One – Open source emoji designed for the web.

Wed, 2015-04-08 19:55
Mostly so I can find this again later when I inevitably need it, Emoji One is a creative commons licensed collection of Emoji and related tools for the web.
Categories: FLOSS Project Planets

Justin Mason: Links for 2015-04-08

Wed, 2015-04-08 18:58
Categories: FLOSS Project Planets

Chiradeep Vittal: CloudStack Basic Networking : frictionless infrastructure

Wed, 2015-04-08 11:39

Continuing on my series exploring CloudStack’s Basic Zone:

Back to Basics

Basic Networking deep dive

The origin of the term ‘Basic’ lies in the elimination of switch and router configuration (primarily VLANs) that trips up many private cloud implementations. When the cloud operator creates a Basic Zone, she is asked to add Pods to the availability zone. Pods are containers for hypervisor hosts. 

The figure above shows a section of a largish Basic Zone. The cloud operator has chosen to map each Rack to one Pod in CloudStack. Two Pods (Rack 1 and Rack 24) are shown with a sample of hypervisor hosts. VMs in three security groups are shown. As described in the previous post, the Pod subnets are defined by the cloud operator when she configures the Pods in CloudStack. The cloud user cannot choose the Pod (or subnet) when deploying a VM.

The firewalls shown in each host reflect the fact that the security group rules are enforced in the hypervisor firewall and not on any centralized or in-line appliance. CloudStack orchestrates the configuration of these firewalls (essentially iptables rules) every time a VM state changes or a security group is reconfigured using the user API.

Each Rack can have multiple uplinks to the L3 core. In fact this is the way data centers are architected for cloud and big data workloads. In a modern datacenter, the racks form the leaves and the L3 core consists of multiple spine routers. Each host has multiple network paths to every other host — at equal cost. CloudStack’s Basic Zone takes advantage of this any-to-any east-west bandwidth availability by not constraining the placement of VMs by networking location (although such a facility [placement groups] is available in CloudStack).

The cloud operator can still use VLANs for the rack-local links. For example, access VLAN 100 can be used in each  rack to connect to the hypervisors (the “guest network”), while the untagged interface (the “management network”) can be used to connect to the management interface of each hypervisor.

CloudStack automatically instantiates a virtual DHCP appliance (“virtual router”) in every Pod that serves DHCP and DNS to the VMs in the pod. The same appliance also serves as the userdata server and password change service. No guest traffic flows through the appliance. All traffic between VMs goes entirely over the physical infrastructure (leaf and spine routers). No network virtualization overhead is incurred. Broadcast storms, STP configurations, VLANs — all the traditional bugbears of a datacenter network are virtually eliminated.

When the physical layer of the datacenter network is architected right, Basic Zone provides tremendous scale and ease-of-use:

  1. Location-independent high bandwidth between any pair of VMs
  2. Elimination of expensive bandwidth sucking, latency-inducing security appliances
  3. Easy security configuration by end-users
  4. Elimination of VLAN-configuration friction
  5. Proven scale : tens of thousands of hypervisors
  6. Egress firewalls provide security for the legacy / non-cloud portions of the datacenter.
  7. The ideal architecture for your micro-services based applications, without the network virtualization overhead

Categories: FLOSS Project Planets

OpenSource.com: NASA's Chris Mattmann on Apache technology

Wed, 2015-04-08 04:00

Chris Mattmann is a frequent speaker at ApacheCon North America and has a wealth of experience in software design and the construction of large-scale data-intensive systems. His work has infected a broad set of communities, ranging from helping NASA unlock data from its next generation of earth science system satellites, to assisting graduate students at the University of Southern California (his alma mater) in the study of software architecture, all the way to helping industry and open source as a member of the Apache Software Foundation.

Categories: FLOSS Project Planets

Justin Mason: Links for 2015-04-07

Tue, 2015-04-07 18:58
Categories: FLOSS Project Planets

Chiradeep Vittal: CloudStack Basic Networking : deeper dive

Tue, 2015-04-07 11:33

In my last post I sang the praise of the simplicity of Basic Networking. There’s a few more details which even seasoned users of CloudStack may not be aware of:

  1. Security group rules are stateful. This means active connections enabled by the rules are tracked so that traffic can flow bidirectionally. Although UDP and ICMP are connectionless protocols, their “connection” is defined by the tuple of source and destination addresses (and ports, where applicable). Stateful tracking also has the somewhat surprising property that if you remove a rule, the existing connections enabled by the rule continue to exist, until closed by either end of the connection. This is identical to AWS security groups behavior.
  2. Security group rules can allow access to VMs from other accounts: Suppose you have a shared monitoring service across accounts. The VMs in the monitoring service can belong to the cloud operator. Other tenants can allow access to them:
    • > authorize securitygroupingress securitygroupname=web account=operator usersecuritygrouplist=nagios,cacti protocol=tcp startport=12489 ...
  3. There is always a default security group: Just like EC2-classic, if you don’t place a VM in a security group, it gets placed in the default security group. Each account has its own default security group.
  4. Security group rules work between availability zones:  Security groups in an account are common across a region (multiple availability zones). Therefore, if the availability zones are routable (without NAT) to each other then the security groups work just as well between zones. This is similar to AWS EC2-classic security groups.
  5. Subnets are shared between accounts; VMs in a security group may not share a subnet. Although tenants cannot create or choose subnets in Basic networking, their VMs are placed in subnets (“Pods”) predefined by the cloud operator. The table below shows a sample of VMs belonging to two accounts spread between two subnets.
  6. BUM traffic is silently dropped. Broadcast and multicast traffic is dropped at the VM egress to avoid attacks on other tenants in the same subnet. VMs cannot spoof their mac address either: unicast traffic with the wrong source mac is dropped as well.
  7. Anti-spoofing protection. VMs cannot spoof their mac address. VMs cannot send ARP responses for IP addresses they do not own. VMs cannot spoof DHCP server responses either. ARP is allowed only when the source MAC matches the VM’s assigned MAC. DHCP and DNS queries to the pod-local DHCP server are always allowed. If you run Wireshark/tcpdump within the VM you cannot see your neighbors’ traffic even though your NIC is set to promiscuous mode.
  8. Multiple IP addresses per VM: Once the VM is started you can request an additional IP for the VM (use the addIptoNic API).
  9. Live migration of the VM works as expected: When the operator migrates a VM, the security group rules move with the VM. Existing connections may get dropped during the migration.
  10. High Availability: As with any CloudStack installation, High Availability (aka Fast Restart) works as expected. When the VM moves to a different host, the rules move along with the VM.
  11. Effortless scaling: The largest CloudStack clouds (tens of thousands of nodes) use Basic networking. Just add more management servers.
  12. Available LBaaS: You can use a Citrix Netscaler to provide load balancing as well as Global Server Load Balancing (GSLB)
  13. Available Static NAT: You can use a Citrix Netscaler to provide Static NAT from a “public” IP to the VM IP.

There are limitations however when you use Basic Zone:

  1. The security groups function is only available on Citrix XenServer and KVM
  2. You can’t mix Advanced Networks and Basic Networks in the same availability zone, unlike AWS EC2
  3. You can’t add/remove security groups to a VM after it has been created. This is the same as EC2-classic
  4. No VPN functions are available.

Categories: FLOSS Project Planets

Justin Mason: Links for 2015-04-06

Mon, 2015-04-06 18:58
Categories: FLOSS Project Planets

Bryan Pendleton: Ten years of git

Mon, 2015-04-06 18:27

Linux.com is running an interesting short interview with Linus Torvalds about git.

Q: Does Git last forever, or do you foresee another revision control system in another 10 years? Will you be the one to write it?

Torvalds: I'm not going to be the one writing it, no. And maybe we'll see something new in ten years, but I guarantee that it will be pretty "git-like." It's not like git got everything right, but it got all the really basic issues right in a way that no other SCM had ever done before.

Over at the Atlassian web site, there's a pretty little animated page: "only a mere few days later, the world was given the gift of Git".

The "10 year anniversary" is based on this, I believe.

If you have any knowledge of git, and git internals, gotta love this:

"write-tree etc. by hand" (!)
Categories: FLOSS Project Planets

Bryan Pendleton: Can Linux network service names contain periods?

Mon, 2015-04-06 16:40

So I was doing some testing of inetd and xinetd.

inetd is just like xinetd except that xinetd has an 'x' in it. (Just joking. Here's a slightly better answer).

So I wanted to test two different versions of my service, and the machine I had root access to was using xinetd, so I followed some RedHat documentation I found on the net.

The files in the /etc/xinetd.d/ directory contains the configuration files for each service managed by xinetd and the names of the files correlate to the service. As with xinetd.conf, this file is read only when the xinetd service is started. For any changes to take effect, the administrator must restart the xinetd service.

I was a little bit confused about "the names of the files correlate to the service", so I read further in the RedHat docs:

service — Defines the service name, usually one listed in the /etc/services file.

I looked in my /etc/xinetd.d directory, and sure enough I had some existing files:


$ ls /etc/xinetd.d
chargen daytime discard echo time

Sounds pretty good, so I created two new files:


$ sudo vim /etc/xinetd.d/server.1 /etc/xinetd.d/server.2

I put in some simple configuration, restarted xinetd, and ...

...

... nothing happened.

After a bunch of flailing around, I found /var/log/syslog, where xinetd reported that it loaded the configuration files /etc/xinetd.d/chargen, /etc/xinetd.d/daytime, /etc/xinetd.d/discard, /etc/xinetd.d/echo, and /etc/xinetd.d/time.

There weren't any other error messages or complaints.

It just quietly omitted my service configuration files.

A colleague, stopping by (mostly to stop the full-volume stream of profanities I was directing at my computer screen), looked over my shoulder and said:

Hmm.. I wonder if period is an illegal character in a service name?

WHAT?

Well, sure enough, I renamed those two files from "server.1" and "server.2" to "server1" and "server2", and restarted xinetd, ...

... and everything worked fine.

So I poked around in places like Linux Network Administrators Guide and Linux Networking HOWTO, and read:

service specifies the service name and name

A single word name that represents the service being described.

I guess Real Linux System Administrators don't need things like this documented, and don't need their tools to print syntax error messages when they make simple syntax errors in naming their network services.

I'll go back to getting useful work done. Bummer about those 90 minutes of my life; I could have maybe used them for something useful.

Categories: FLOSS Project Planets