Planet Apache

Community Over Code: Shane’s Apache Director Position Statement 2013

Fri, 2013-05-17 20:14

The ASF is holding its Annual Members’ Meeting this coming week, where the Membership elects a new board of directors and handles other matters, like voting in new Member candidates. While I was nominated last year, I was not elected. I would have been sad about not getting a seat, except for the fact that such fabulously good people got elected instead (including two new directors who got to serve their first terms, Rich and Ross, yay!).

Director candidates at the ASF write position statements, in preparation for the voting process, about what their objectives as a director would be. Since I write what I believe in, I am also posting my statement here, publicly. One of the biggest issues for the smooth functioning of the ASF as a home for healthy projects is doing a better job of explaining how we work – I hope this helps people understand us Apache types just a little bit better. You can also see what I wrote last year.

If you’re wondering how governance at Apache really works, I’ve written an Apache governance overview too.

Shane Curcuru (curcuru) Director Position Statement v2.0

As the ASF scales in people, projects, and impact on the world, we
need directors that can ensure our organization stays true to its
ideals; that can delegate appropriately and efficiently to officers
and PMCs; and especially that can communicate calmly, clearly,
and consistently in all of their communications.

As we surpass one million dollars in assets, with thousands of committers, nearly a gross of projects, and a huge impact both on the software world with our technology and on the larger world of computer users with our products, I believe it’s important to do an even better job of explaining what the ASF is about and how the Apache Way works.

While we don’t need more rules, we do need to do a much better job of explaining what our few hard requirements are, as well as showcasing the wealth of best practices that our projects have created. This is important both to let the world know who we are, and also to ensure that the many different communities of contributors can more easily understand how to work with our projects.

With the fast-growing scale of our organization, it is critical that directors and corporate officers can communicate clearly, calmly, and professionally in all of their Apache-related activities – whether or not they’re explicitly showing which hat they’re wearing at the moment. As our impact grows, so does the impact of our words, both inside our communities, where they attract (or fail to attract) new members, and on the larger world of corporations, universities, and other computer-using people. Even if we as long-time denizens of members@ understand which hat a director or officer is wearing when they speak, most other human beings and most other contributors don’t necessarily see the distinction.

It has been a long time since we held in-person members’ meetings
where everyone knew each other’s personal style. As we grow,
we need to be sure that we’re making it easy for new members and
contributors to feel welcomed and understand how Apache works. We also need to ensure that we both can keep the sense of family and enjoyable, collaborative community that the membership and our projects have, and that we manage the affairs of the ASF and of our projects in a consistent, documented, and professional manner.

About Shane

I’ve been a committer since November 1999, a Member since 2002,
and VP, Brand Management since 2009.

I am employed by IBM in the HR division as an Applications Architect. My employment and income have been unrelated to my work at the ASF for many years, and I will always clearly separate volunteer work from employer-funded work.

My involvement in the ASF is driven by a belief in, and a love of,
the ASF, and is not influenced by politics or finances. I live in
Massachusetts with my wife, young daughter, and 2 cats. I view
directorships and officer positions at the ASF as serious commitments.

I will attend every board meeting if elected.

Categories: FLOSS Project Planets

Justin Mason: Links for 2013-05-17

Fri, 2013-05-17 18:58
  • Deep In The Game: Not The RTE Guide

    Good interview with Alan Maguire, the satirist behind the very funny @NotTheRTEGuide on Twitter:

    I’ve always been a huge fan of TV Go Home and Charlie Brooker in general and it seemed like Irish TV and culture was a good target for the kind of barbed surrealism that he does. (I’m not claiming I’m in his league or anything but he’s the main influence). I was really surprised that there hadn’t been a parody RTÉ Guide already. TV listings are 140-ish characters already and the RTÉ Guide has a kind of weird place in Irish culture where everybody knows it but nobody our age really has any idea of what’s in it anymore. We associate it with a small-c conservatism, or I did at least and I play that up occasionally with the account.

    (tags: nottherteguide rte rte-guide ireland funny satire interviews)

  • My Philosophy on Alerting

    ‘based on my observations while I was a Site Reliability Engineer at Google.’ – by Rob Ewaschuk; very good, and matching the similar recommendations and best practices at Amazon for that matter

    (tags: monitoring ops devops alerting alerts pager-duty via:jk)

Isabel Drost: BigDataCon

Fri, 2013-05-17 15:29


Together with Uwe Schindler I published a series of articles on Apache
Lucene in Software and Support Media’s Java Mag several years ago. Earlier this
year S&S kindly invited me to their BigDataCon, co-located with JAX, to give a
talk of my choosing that at least touches upon Lucene.


Thinking back and forth about what topic to cover, what came to my mind was to
give a talk on how easy it is to do text classification with Mahout when
relying on Apache Lucene for text analysis, tokenisation and token filtering.
All classes essentially are in place to integrate Lucene Analyzers with Mahout
vector generation - needed e.g. as a pre-processing step for classification or
text clustering.
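
To make the pipeline above more concrete, here is a minimal plain-Java sketch of what analysis followed by vector generation amounts to conceptually. It deliberately does not use the Lucene or Mahout APIs: the tokeniser, the toy stop-word list, and the names `TextVectorSketch`, `analyze` and `toTermFrequencies` are hypothetical stand-ins for a Lucene Analyzer and for Mahout's vector generation, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for the analysis and vectorisation steps.
// It does NOT use the real Lucene or Mahout classes.
final class TextVectorSketch {

    // Assumed toy stop-word list; a real Analyzer ships its own.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "is", "of", "the", "to");

    // Tokenisation and token filtering:
    // split on non-letters, lowercase, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Vector generation: map each remaining term to its frequency.
    static Map<String, Integer> toTermFrequencies(List<String> tokens) {
        Map<String, Integer> vector = new TreeMap<>();
        for (String token : tokens) {
            vector.merge(token, 1, Integer::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> tokens = analyze("The quick fox and the lazy dog: the fox wins.");
        // prints {dog=1, fox=2, lazy=1, quick=1, wins=1}
        System.out.println(toTermFrequencies(tokens));
    }
}
```

A classifier or clustering algorithm would then consume such term-frequency vectors (typically after weighting, e.g. TF-IDF) rather than raw text.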


Feel free to check out some of my sandbox code over at github:
http://github.org/MaineC/sofia


After attending the conference I can only recommend everyone interested in Java
programming and able to understand German to buy a ticket for the conference.
It’s really well executed, great selection of talks (though the sponsored
keynotes usually aren’t particularly interesting), tasty meals, interesting
people to chat with.

Sebastien Goasguen: CloudStack University

Fri, 2013-05-17 04:24

At Apache CloudStack we recently started an initiative to organize our content into learning modules. We call this initiative CloudStack University. Everyone is invited to participate by contributing content (slides and screencasts), suggesting new learning modules that are needed, and even creating exercises and assignments. School fun! As we were discussing the initiative on the mailing list, we started by looking at our existing content (slideshares and youtube videos) and thought about organizing it into a CloudStack 101 course. This is still a work in progress that requires everyone’s participation to make it a great resource.

In the meantime I have been putting all my CloudStack content on slideshare, and I wanted to provide a narrated version of these slides together with hands-on demos to show folks how to do a few things with CloudStack specifically, along with related Cloud and OSS tips and tricks. Here come the CloudStack University screencasts. I will add more of them as I go along and receive requests from the community (reach out on twitter @sebgoa and tell me what you want to see). I wanted to give you a preview of what this looks like. To create a self-paced learning module, I decided to create slide decks that people can download from slideshare and cross-post the corresponding screencasts (for most of them at least) on youtube. People can choose a particular topic, or take the entire series. The idea is that at the end of watching all the screencasts and reading the material people graduate from CloudStack University.

Certainly one can imagine how this could evolve into a full-fledged training and certification program. I do plan to create a final exam once I am done with a consistent set of modules :) In this post I wanted to introduce you to some of the first modules I created. I welcome all feedback and suggestions to improve them. Reach out to me on twitter (@sebgoa) or contribute your own modules via the wiki and the mailing lists.

To get started, I show you below the screencast of the Apache CloudStack (ACS) 4.0.2 testing procedure. We used this basic procedure as a smoke test for the release and as a way to vote on a release. There is far more QA going on for a release; this is just basic testing to vote on the release. This is definitely geared towards developers; I plan to create a more end-user-oriented introduction to CloudStack.

Once you have been introduced to ACS with this testing procedure you can learn the API. CloudStack has a native API as well as an EC2 compatible interface. The following screencast and slides dive into the ACS API, showing how to do unauthenticated and authenticated calls and how to create a signature. They finish with a discussion on REST and a nice exercise.

Intro to CloudStack API from Sebastien Goasguen
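
As a rough illustration of the signature step mentioned above, here is a minimal Java sketch of how a signed CloudStack API call is put together, as I understand the documented procedure: URL-encode the parameter values, sort the parameters by name, lowercase the resulting string, compute an HMAC-SHA1 over it with the secret key, and Base64-encode the digest. The class name `CloudStackSignatureSketch`, its method names, and the placeholder API and secret keys are assumptions made for this example, not part of any CloudStack client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; names and keys are placeholders for illustration.
final class CloudStackSignatureSketch {

    // URL-encode a value, using %20 rather than '+' for spaces.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Build the string to sign: sort by parameter name, encode values,
    // join with '&', then lowercase the whole string.
    static String canonicalQuery(Map<String, String> params) {
        Map<String, String> sorted = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        sorted.putAll(params);
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> entry : sorted.entrySet()) {
            joined.add(entry.getKey() + "=" + encode(entry.getValue()));
        }
        return joined.toString().toLowerCase(Locale.ROOT);
    }

    // HMAC-SHA1 over the canonical string, Base64-encoded.
    static String sign(Map<String, String> params, String secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(canonicalQuery(params).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable or key rejected", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "listUsers");
        params.put("response", "json");
        params.put("apiKey", "PLACEHOLDER_API_KEY");

        String signature = sign(params, "PLACEHOLDER_SECRET_KEY");
        System.out.println("string to sign: " + canonicalQuery(params));
        // The signature is URL-encoded before being appended as &signature=...
        System.out.println("signature parameter: " + encode(signature));
    }
}
```

The request itself is sent with the parameters in their original case; only the string that gets signed is lowercased.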

Learning the API and the details of how to create a call by hand is a very nice skill to have. CloudStack has 16 clients on github in various languages; these clients make it extremely easy to use the CloudStack API. However, ACS comes with a terrific interactive shell: CloudMonkey. The following module shows you how to install CloudMonkey and how to use it to manage your ACS backed cloud. If you followed the testing procedure, you can use CloudMonkey with your local CloudStack and explore the API.

CloudMonkey from Sebastien Goasguen

If you want to participate in the development of CloudStack you can contribute in many different ways, but modifying the source to include new features, fixing bugs and adding documentation are definitely some of the basic contributions. The following module gives you an introduction to Git, the version control system used by ACS. It is aimed at beginners and starts with a demo of gist on github; it then shows a walkthrough of the CloudStack git repo, looks at remote feature branches and finishes with the workflow to create a patch and submit it for review.

Git 101 for CloudStack from Sebastien Goasguen

At ACS we take great care to make sure that everyone in the world can use our software, and that means using the UI in their own language and reading the documentation in their own language. Translation is tedious work but very much appreciated by non-English-speaking users and developers. In this module we show you how to contribute to the translation of CloudStack UI and docs using the Transifex service. We are proud to have over 30 translators who allow us to support the CloudStack UI in 10 languages and have documentation almost complete in 5 languages.

How to Translate Apache CloudStack Docs from Sebastien Goasguen

As a final note, the Apache Software Foundation is a mentoring organization in the 2013 Google Summer of Code. As such CloudStack is participating in GSoC; we are currently reviewing proposals from students and are eager to see the program start. I embed a deck that introduces the various projects we proposed. Stay tuned to learn which ones get awarded; notification is on May 27th. And remember to keep an eye out for CloudStack University, a great resource for Cloud training.

Apache CloudStack Google Summer of Code from Sebastien Goasguen

Steve Loughran: Tilehurst? Where is Tilehurst and why does google maps care about it?

Fri, 2013-05-17 00:09
Google are being asked hard questions in Parliament about their UK tax setup.

I think the politicians are missing an opportunity to ask them the question that I'm always wondering: where is Tilehurst and why does google maps think it is so special?

Here is a google maps view of the UK


It has Bristol on it, but not Portsmouth or Cardiff. It's always a mystery in Bristol why Pompey gets a dot on the BBC weather map, as does BRS's nearby rival, Cardiff. In the google map, Edinburgh and Manchester are the ones being left out.

But that is nothing compared to the Tilehurst question. Specifically : why?

Look what happens when you click to zoom in one notch.

Edinburgh exists, along with pretty much everything north of there excluding Mallaig, which is something all visitors to Scotland should do when laying out an itinerary.

And what is there between Bristol and London? One town merits a mention. Tilehurst.

Apart from this mention of Tilehurst, I have no data on whether or not this town actually exists. It's not on any motorway exits on the M4, no train stations, no buses from Bristol. I have never heard it mentioned in any conversation whatsoever.

Why then does Google Maps think that it is more important than, say, Reading, which meets all of the above criteria (admittedly, never in conversations that speak positively of it), or Oxford, which people outside the UK have heard of?

No, Tilehurst it is.

It could be some bizarre quirk of the layout algorithm that picks a random place, ignoring things like nearby population numbers, M-way exit signs, mentions in pagerank or knowledge of public transport.

I think it could just be some spoof town made up to catch out people who have been copying map data from google maps without accreditation. If some map or tourist guide mentions Tilehurst, the google maps team will know that they are using Google map data and immediately demand some financial recompense, routed through the Ireland subsidiary.

There's only one way to be sure: using this resolution map as the cue, drive there and see what it is.

Justin Mason: Links for 2013-05-16

Thu, 2013-05-16 18:58
  • Monitoring the Status of Your EBS Volumes

    Page in the AWS docs which describes their derived metrics and how they are computed — these are visible in the AWS Management Console, and alarmable, but not viewable in the Cloudwatch UI. grr. (page-joshea!)

    (tags: ebs aws monitoring metrics ops documentation cloudwatch)

  • Interpol filter scope creep: ASIC ordering unilateral website blocks

    Bloody hell. This is stupidity of the highest order, and a canonical example of “filter creep” by a government — secret state censorship of 1200 websites due to a single investment scam site.

    The Federal Government has confirmed its financial regulator has started requiring Australian Internet service providers to block websites suspected of providing fraudulent financial opportunities, in a move which appears to also open the door for other government agencies to unilaterally block sites they deem questionable in their own portfolios. The instrument through which the ISPs are blocking the Interpol list of sites is Section 313 of the Telecommunications Act. Under the Act, the Australian Federal Police is allowed to issue notices to telcos asking for reasonable assistance in upholding the law. [...] Tonight Senator Conroy’s office revealed that the incident that resulted in Melbourne Free University and more than a thousand other sites being blocked originated from a different source — financial regulator the Australian Securities and Investment Commission. On 22 March this year, ASIC issued a media release warning consumers about the activities of a cold-calling investment scam using the name ‘Global Capital Wealth’, which ASIC said was operating several fraudulent websites — www.globalcapitalwealth.com and www.globalcapitalaustralia.com. In its release on that date, ASIC stated: “ASIC has already blocked access to these websites.”

    (tags: scams australia filtering filter-creep false-positives isps asic fraud secrecy)

  • Obfuscatory pie-chart from Garda penalty-points corruption report

    “Twitter / gavinsblog: For sake of clarity here is helpful pie chart of the 95.4% of fixed charge notices not terminated #missingthepoint” Paging Edward Tufte: classic example of an obfuscatory pie-chart, diagramming the wrong thing misleadingly. By presenting it like this, it appears that the 95.4% of cases where fixed charge notices were issued by the guards are relevant to the discussion of the other classes; in reality, that means that 4.6% of cases, 37,000 cases, were terminated, some for good reasons, others not, and it’s the difference between those two classes that is relevant. In my opinion, 2 separate pie charts would be better; one to show the dismissed-versus-undismissed count (which IMO could have been omitted entirely), and one to show the good-vs-not-so-good termination reason counts (which is the meat of the issue).

    (tags: dataviz visualisation data obfuscation gardai police corruption penalty-points)

  • Berkeley DB Java Edition Architecture [PDF]

    background white paper on the BDB-JE innards and design, from 2006. Still pretty accurate and good info

    (tags: bdb-je java berkeley-db bdb design databases pdf white-papers trees)

  • one Canadian judge’s 192-page judgement eviscerating the Freeman-on-the-Land and related “Organised Pseudolegal Commercial Argument” litigants

    This Court has developed a new awareness and understanding of a category of vexatious litigant. As we shall see, while there is often a lack of homogeneity, and some individuals or groups have no name or special identity, they (by their own admission or by descriptions given by others) often fall into the following descriptions: Detaxers; Freemen or Freemen-on-the-Land; Sovereign Men or Sovereign Citizens; Church of the Ecumenical Redemption International (CERI); Moorish Law; and other labels – there is no closed list. In the absence of a better moniker, I have collectively labelled them as Organized Pseudolegal Commercial Argument litigants [“OPCA litigants”], to functionally define them collectively for what they literally are. These persons employ a collection of techniques and arguments promoted and sold by ‘gurus’ (as hereafter defined) to disrupt court operations and to attempt to frustrate the legal rights of governments, corporations, and individuals.   Over a decade of reported cases have proven that the individual concepts advanced by OPCA litigants are invalid. What remains is to categorize these schemes and concepts, identify global defects to simplify future response to variations of identified and invalid OPCA themes, and develop court procedures and sanctions for persons who adopt and advance these vexatious litigation strategies.   One participant in this matter [...] appears to be a sophisticated and educated person, but is also an OPCA litigant. One of the purposes of these Reasons is, through this litigant, to uncover, expose, collate, and publish the tactics employed by the OPCA community, as a part of a process to eradicate the growing abuse that these litigants direct towards the justice and legal system we otherwise enjoy in Alberta and across Canada. I will respond on a point-by-point basis to the broad spectrum of OPCA schemes, concepts, and arguments advanced in this action by [him]. Via Ronan Lupton

    (tags: via:ronanlupton law canada legal freeman opca court tax judgements)

Isabel Drost: Hadoop Summit Amsterdam

Thu, 2013-05-16 15:27


About a month ago I attended the first European Hadoop Summit, organised by
Hortonworks in Amsterdam. The two day conference brought together both vendors
and users of Apache Hadoop for talks, exhibition and after conference beer
drinking.


Russel Jurney kindly asked me to chair the Hadoop applied track during
Apache Con EU. As a result I had a good excuse to attend the event. Overall
there were at least three times as many submissions as could reasonably be
accepted. Accordingly, selecting proposals was pretty hard.


Though some of the Apache community aspect was missing at Hadoop Summit, it was
interesting nevertheless to see who is active in this space, both as users and
as vendors.


If you check out the talks on Youtube make sure to not miss the two sessions by
Ted Dunning as well as the talk on handling logging data by Twitter.

Sanjiva Weerawarana: Launching WorkInSriLanka.lk Initiative

Thu, 2013-05-16 13:28
Over the last many months, I've been privileged to be part of a fantastic team of volunteers working on a new effort: WorkInSriLanka.lk. This is an effort to help people who are considering moving to Sri Lanka to work and live.
Me? Move to Sri Lanka?? What?!
Yes, Sri Lanka. No more war. No more bombs. No one trying to (systematically .. yeah we have our share of crazies) kill anyone. Great weather. Majorly improving infrastructure. A second airport (with no flights yet .. but that's ok everyone's gotta start at the bottom!). A real, honest-to-goodness highway (dinner in Galle tonite?) and many more coming. Apartments everywhere. Parks all over Colombo.
Compare that to where you're living? Do you go thru a metal detector to your workplace? Not in Sri Lanka any more. We had a long period of that .. but no more .. war finished in 2009, nearly to the day today (May 18th is the anniversary).
Anyway :-). Our objective is to first be a one-stop-site for anyone who's considering moving to Sri Lanka. Everything you need to know, from what kind of jobs are available, how much housing costs and how much cars cost to kids' schooling to visa stuff. All there, all in one place. All done in an objective, volunteer, independent kind of way. The site is still in its infancy of course .. more to come but it's got a lot of stuff already!
With regard to jobs: if you're a senior person returning we will even help you get into the "network" to get into the loop of things. We have a pretty connected set of friends who are helping to get that done. We're also partnering with pretty much every industry body so that we can reach into all of those networks.
Going beyond the information portal we want to become an advocacy group to promote what's good about moving to Sri Lanka and also to work hard on breaking down more barriers. Even ex-Sri Lankans returning have some major barriers in the system now and we want to work towards removing them. 
This was a totally volunteer group of people from all over the place. Check us out at the site!
We had a fantastic launch event on Tuesday (May 14th) evening. We had the Governor of the Central Bank of Sri Lanka come and give the keynote talk and then had a superb panel. More on that coming soon at the site itself.
Check it out and give us your feedback - plenty of places in the site to do that. Enjoy surfing!
http://workinsrilanka.lk/.

Justin Mason: Links for 2013-05-15

Wed, 2013-05-15 18:58
  • Rusty’s API Design Manifesto

    This classic came up in discussions yesterday…

    In the Linux Kernel community Rusty Russell came up with an API rating scheme to help us determine if our API is sensible, or not.  It’s a rating from -10 to 10, where 10 is perfect and -10 is hell. Unfortunately there are too many examples at the wrong end of the scale.

    (tags: rusty-russell quality coding kernel linux apis design code-reviews code)

  • Sup relaunched

    hooray! Command-line gmailish goodness returns. And with a signed gem, to boot

    (tags: gems ruby sup mail gmail mua)

  • Martin Thompson, Luke “Snabb Switch” Gorrie etc. review the C10M presentation from Shmoocon

    on the mechanical-sympathy mailing list. Some really interesting discussion on handling insane quantities of TCP connections using low volumes of hardware:

    This talk has some good points and I think the subject is really interesting.  I would take the suggested approach with serious caution.  For starters the Linux kernel is nowhere near as bad as it made out.  Last year I worked with a client and we scaled a single server to 1 million concurrent connections with async programming in Java and some sensible kernel tuning.  I’ve heard they have since taken this to over 5 million concurrent connections. BTW Open Onload is an open source implementation.  Writing a network stack is a serious undertaking.  In a previous life I wrote a network probe and had to reassemble TCP streams and kept getting tripped up by edge cases.  It is a great exercise in data structures and lock-free programming.  If you need very high-end performance I’d talk to the Solarflare or Mellanox guys before writing my own. There are some errors and omissions in this talk.  For example, his range of ephemeral ports is not quite right, and atomic operations are only 15 cycles on Sandy Bridge when hitting local cache.  A big issue for me is when he defined C10M he did not mention the TIME_WAIT issue with closing connections.  Creating and destroying 1 million connections per second is a major issue.  A protocol like HTTP is very broken in that the server closes the socket and therefore has to retain the TCB until the specified timeout occurs to ensure no older packet is delivered to a new socket connection.

    (tags: mechanical-sympathy hardware scaling c10m tcp http scalability snabb-switch martin-thompson)

  • ec2-consistent-snapshot

    This program creates an EBS snapshot for an Amazon EC2 EBS volume. To help ensure consistent data in the snapshot, it tries to flush and freeze the filesystem(s) first as well as flushing and locking the database, if applicable. Filesystems can be frozen during the snapshot. Prior to Linux kernel 2.6.29, XFS must be used for freezing support. While frozen, a filesystem will be consistent on disk and all writes will block. There are a number of timeouts to reduce the risk of interfering with the normal database operation while improving the chances of getting a consistent snapshot. If you have multiple EBS volumes in a RAID configuration, you can specify all of the volume ids on the command line and it will create snapshots for each while the filesystem and database are locked. Note that it is your responsibility to keep track of the resulting snapshot ids and to figure out how to put these back together when you need to restore the RAID setup. Handy!

    (tags: ubuntu ec2 aws linux ebs snapshots ops tools alestic)

  • Measuring & Optimizing I/O Performance

    Another good writeup on iostat and EBS, from Ilya Grigorik

    (tags: io optimization sysadmin performance iostat ebs aws ops)

  • AWS forum post on interpreting iostat output for EBS

    Great post from AndrewC@EBS on interpreting iostat output on EBS volumes — from 2009, but still looks reasonable enough

    (tags: iostat ebs disks hardware aws ops)

  • Operations is Dead, but Please Don’t Replace it with DevOps

    This is so damn spot on.

    Functional silos (and a standalone DevOps team is a great example of one) decouple actions from responsibility. Functional silos allow people to ignore, or at least feel disconnected from, the consequences of their actions. DevOps is a cultural change that encourages, rewards and exposes people taking responsibility for what they do, and what is expected from them. As Werner Vogels from Amazon Web Services says, “you build it, you run it”. So a “DevOps team” is a risky and ultimately doomed strategy. Sure there are some technical roles, specifically related to the enablement of DevOps as an approach and these roles and tools need to be filled and built. Self service platforms, collaboration and communication systems, tool chains for testing, deployment and operations are all necessary. Sure someone needs to deliver on that stuff. But those are specific technical deliverables and not DevOps. DevOps is about people, communication and collaboration. Organizations ignore that at their peril.

    (tags: devops teams work ops silos collaboration organisations)

Isabel Drost: ApacheConNA: Misc

Wed, 2013-05-15 15:26


In his talk on SPDY Mathew Steele explained how he implemented the SPDY protocol
as an Apache httpd module - working around most of the safety measures and
design decisions in the current httpd version. Essentially to get httpd to
support the protocol all you need now is mod_spdy plus a modified version of
mod_ssl.


The keynote on the last day was given by the Puppet founder. Some interesting
points to take away from that:


  • Though hard in the beginning (and half way through, and after years) it
    is important to learn to give up control: It usually is much more productive and
    leads to better results to encourage people to do something than to be
    restrictive about it. A single developer only has so much bandwidth - by
    farming tasks out to others - and giving them full control - you substantially
    increase your throughput without having to put in more energy.

  • Be transparent - it’s ok to have commercial goals with your project. Just
    make sure that the community knows about it and is not surprised to learn about
    it.

  • Be nice - not many succeed at this, not many are truly able to ignore
    religion (vi vs. emacs). This also means to be welcoming to newbies, to hustle
    at conferences, to engage the community as opposed to announcing changes.


Overall good advice for those starting or working on an OSS project and seeking
to increase visibility and reach.

Daniel Kulp: Apache CXF and WS-Discovery

Wed, 2013-05-15 12:01

One of the new features in Apache CXF 2.7.x that I worked hard on was the introduction of support for WS-Discovery. WS-Discovery is basically a standard way for a service to announce when it’s available, as well as a standard way to probe the network for services that meet certain criteria and have the matching services provide a response. Most ESBs now have some sort of registry component or locator component or similar that serves a similar need. However, they are generally more proprietary in nature and, in many cases, will only work with services deployed in or managed by that ESB. WS-Discovery is completely standards based (OASIS) and is completely independent of any ESB, application server, etc…

So, how does it work? If the CXF WS-Discovery jars/bundles are available when a service starts, CXF will automatically register a ServerLifecycleListener onto the Bus. When the service starts, that listener will send a WS-Discovery “HELLO” message out on the network using the SOAP over UDP spec. When the service stops, it will send a “BYE” message out. Most users don’t need those messages, but if you do have an application that needs to keep track of services that are available, you could listen for them. The CXF WS-Discovery listener will also start an internal WS-Discovery service that will listen for SOAP/UDP “PROBE” requests on the network, process those requests to see if the service matches it, and respond with information (such as the address URL) if it does. This is all automatic. All that is needed is to add the WS-Discovery jars.
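
To give a feel for what a “PROBE” looks like on the wire, here is a minimal Java sketch that builds a WS-Discovery 1.1 Probe envelope and can send it over SOAP/UDP multicast. This is only my reading of the OASIS spec (namespaces, the multicast address 239.255.255.250 and port 3702), not how CXF builds its messages internally; the class name `WsDiscoveryProbeSketch` and its methods are hypothetical, and with CXF you would not normally write any of this yourself.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical illustration of a WS-Discovery 1.1 Probe; not CXF code.
final class WsDiscoveryProbeSketch {

    static final String SOAP_NS = "http://www.w3.org/2003/05/soap-envelope";
    static final String WSA_NS = "http://www.w3.org/2005/08/addressing";
    static final String WSD_NS = "http://docs.oasis-open.org/ws-dd/ns/discovery/2009/01";
    static final String DISCOVERY_TO = "urn:docs-oasis-open-org:ws-dd:ns:discovery:2009:01";
    static final String MULTICAST_ADDRESS = "239.255.255.250";
    static final int PORT = 3702;

    // Build a Probe with an empty body, which asks every listening
    // service to answer regardless of its type.
    static String buildProbe(String messageId) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<s:Envelope xmlns:s=\"" + SOAP_NS + "\""
                + " xmlns:a=\"" + WSA_NS + "\""
                + " xmlns:d=\"" + WSD_NS + "\">"
                + "<s:Header>"
                + "<a:Action>" + WSD_NS + "/Probe</a:Action>"
                + "<a:MessageID>" + messageId + "</a:MessageID>"
                + "<a:To>" + DISCOVERY_TO + "</a:To>"
                + "</s:Header>"
                + "<s:Body><d:Probe/></s:Body>"
                + "</s:Envelope>";
    }

    // Send the probe as a single UDP datagram to the discovery multicast group.
    static void send(String probe) throws IOException {
        byte[] payload = probe.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(MULTICAST_ADDRESS), PORT));
        }
    }

    public static void main(String[] args) throws IOException {
        String probe = buildProbe("urn:uuid:" + UUID.randomUUID());
        System.out.println(probe);
        // Only touch the network when explicitly asked to.
        if (args.length > 0 && args[0].equals("send")) {
            send(probe);
        }
    }
}
```

Services that match would reply with a ProbeMatches message sent back to the sender's UDP port; reading those replies is left out to keep the sketch short.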

CXF also provides an API for probing the network for services. It’s only slightly documented right now, but you can easily look at the source for WSDiscoveryClient. Basically, some simple code like:

    WSDiscoveryClient client = new WSDiscoveryClient();
    List<EndpointReference> references = client.probe(
        new QName("http://cxf.apache.org/hello_world/discovery", "Greeter"));
    client.close();

    // Loop through all of them and have them greet me.
    GreeterService service = new GreeterService();
    for (EndpointReference ref : references) {
        Greeter g = service.getPort(ref, Greeter.class);
        System.out.println(g.greetMe("World"));
    }

would use the WSDiscoveryClient to probe the network for all the services that can provide the “Greeter” service and then call each one. It’s very simple.

The main problem with the WS-Discovery implementation in CXF 2.7.0 through 2.7.4 was that it only implemented WS-Discovery 1.1 as that is the actual OASIS standard that I looked at. However, there are many devices out there that only will respond to WS-Discovery 1.0 probes. In particular, any of the IP cameras that implement the ONVIF specification will only respond to 1.0. Thus, in 2.7.5, I updated the code to also handle WS-Discovery 1.0. The WSDiscoveryClient object has a setVersion10() method on it to change the probes over to WS-Discovery 1.0. With support for WS-Discovery 1.0, you can now use CXF to probe for any devices on the network that meet the ONVIF standard. No proprietary registry or anything required.

That’s pretty cool.
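To make the 1.0 versus 1.1 difference concrete, here is a rough sketch of what changes in the Probe message itself: mostly the discovery and WS-Addressing namespaces and the "To" URN. This is not CXF code (WSDiscoveryClient builds the message for you); the class and method names are made up for illustration, and XML escaping is left out.

```java
import java.util.UUID;

/** Illustrative only: hand-builds a Probe envelope to show what differs per version. */
final class ProbeEnvelopeSketch {

    /** The namespaces and "To" URN that change between WS-Discovery 1.0 and 1.1. */
    enum Version {
        V1_0("http://schemas.xmlsoap.org/ws/2005/04/discovery",
             "http://schemas.xmlsoap.org/ws/2004/08/addressing",
             "urn:schemas-xmlsoap-org:ws:2005:04:discovery"),
        V1_1("http://docs.oasis-open.org/ws-dd/ns/discovery/2009/01",
             "http://www.w3.org/2005/08/addressing",
             "urn:docs-oasis-open-org:ws-dd:ns:discovery:2009:01");

        final String discoveryNs;
        final String addressingNs;
        final String to;

        Version(String discoveryNs, String addressingNs, String to) {
            this.discoveryNs = discoveryNs;
            this.addressingNs = addressingNs;
            this.to = to;
        }
    }

    /** Builds a SOAP 1.2 Probe for the given service type (no XML escaping, sketch only). */
    static String buildProbe(Version v, String typeNs, String typeLocalName) {
        return "<s:Envelope xmlns:s=\"http://www.w3.org/2003/05/soap-envelope\""
            + " xmlns:a=\"" + v.addressingNs + "\""
            + " xmlns:d=\"" + v.discoveryNs + "\""
            + " xmlns:t=\"" + typeNs + "\">"
            + "<s:Header>"
            + "<a:Action>" + v.discoveryNs + "/Probe</a:Action>"
            + "<a:MessageID>urn:uuid:" + UUID.randomUUID() + "</a:MessageID>"
            + "<a:To>" + v.to + "</a:To>"
            + "</s:Header>"
            + "<s:Body><d:Probe><d:Types>t:" + typeLocalName + "</d:Types></d:Probe></s:Body>"
            + "</s:Envelope>";
    }

    public static void main(String[] args) {
        // Same probe as in the Greeter example, once per version.
        String ns = "http://cxf.apache.org/hello_world/discovery";
        System.out.println(buildProbe(Version.V1_0, ns, "Greeter"));
        System.out.println(buildProbe(Version.V1_1, ns, "Greeter"));
    }
}
```

A device that only understands 1.0 will simply not recognise the 1.1 namespaces, which is why the version has to be switched on the client before probing.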

Now that the WS-Discovery stuff in CXF is fairly well tested and is known to work, I expect more of the downstream consumers of CXF to start integrating it into product offerings. I’m hoping to work on getting the Talend locator updated to use it. However, with the next (5.3.1) version of Talend ESB (due next month), you’ll be able to just “feature:install” the cxf-ws-discovery feature into the ESB and have the above all work. I also see that JBoss has already started integrating it into their application server.

Categories: FLOSS Project Planets

Gary Gregory: Oops, I committed without a comment

Wed, 2013-05-15 11:34

It does not happen often, but when it does, I’m left scrambling and googling for how to retroactively set a comment on a specific Subversion revision.

Here’s a note to myself on how to do it:

svn propset -r <revision> --revprop svn:log "Commit message" <URL>

where you need to plug in a <revision> and <URL>.



Bryan Pendleton: Gluttony

Wed, 2013-05-15 09:32

Definition: Fringe Binge. A Fringe Binge is what happens when Netflix releases Season Four of Fringe on streaming, and so you watch nine episodes in three nights.

It means your daily conversations are punctuated with fragments like:

Wait, hon, I missed something: which alternate universe is this?

Edward J. Yoon: [Android SDK] Can't open input server /Library/InputManagers/Safari140

Wed, 2013-05-15 04:02
What the ....
^Cmacbook:~ edwardyoon$ android avd
2013-05-15 17:55:48.414 java[1432:1707] Can't open input server /Library/InputManagers/Safari140
macbook:~ edwardyoon$ ls /Library/InputManagers/Safari140/
ls: : Permission denied
macbook:~ edwardyoon$ sudo android avd
Password:

Justin Mason: Links for 2013-05-14

Tue, 2013-05-14 18:58

Oliver Wulff: LDAP support enhanced for CXF STS 2.7.5

Tue, 2013-05-14 15:48
I described in a previous blog how to configure the CXF STS for an LDAP directory for authentication and to retrieve user claims (attributes). The new release 2.7.5 of CXF provides extended support for roles managed in an LDAP directory. In previous versions, the LdapClaimsHandler added groups as roles if the groups were assigned to a multi-value attribute of the user. The new release provides an LdapGroupClaimsHandler which supports the case where an attribute of the group lists the users who belong to it. Further, it introduces the semantics of an application role. A user might have the role "User" for application X and the roles "Manager" and "User" for application Y.

The STS provides the semantic of an application with the AppliesTo parameter which is a URI. If you request a SAML token which includes the roles for a specific application (ex. MyApp), you get User and Manager back. A mapping is required in the STS to map the AppliesTo URI (URL or URN) to a String value like MyApp.

The sub-project Fediz provides in 1.1 (not released yet) a Maven profile to build the STS with an LDAP backend (instead of managing users/claims in a file). You can have a look at the ldap.xml here. The following configuration configures the LdapClaimsHandler and LdapGroupClaimsHandler. There is nothing special for the LdapClaimsHandler. The LdapGroupClaimsHandler also uses the Spring LdapContextSource and LdapTemplate.


<util:list id="claimHandlerList">
  <ref bean="userClaimsHandler" />
  <ref bean="groupClaimsHandler" />
</util:list>

<bean id="contextSource" class="org.springframework.ldap.core.support.LdapContextSource">
  <property name="url" value="ldap://localhost:389/" />
  <property name="userDn" value="uid=admin,ou=system" />
  <property name="password" value="secret" />
</bean>

<bean id="ldapTemplate" class="org.springframework.ldap.core.LdapTemplate">
  <constructor-arg ref="contextSource" />
</bean>

<util:map id="claimsToLdapAttributeMapping">
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname"
         value="givenName" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname"
         value="sn" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"
         value="mail" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/country"
         value="c" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/postalcode"
         value="postalCode" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/streetaddress"
         value="postalAddress" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/locality"
         value="town" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/stateorprovince"
         value="st" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/gender"
         value="gender" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/dateofbirth"
         value="dateofbirth" />
  <entry key="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/role"
         value="member" />
</util:map>

<bean id="userClaimsHandler" class="org.apache.cxf.sts.claims.LdapClaimsHandler">
  <property name="ldapTemplate" ref="ldapTemplate" />
  <property name="claimsLdapAttributeMapping" ref="claimsToLdapAttributeMapping" />
  <property name="userBaseDN" value="ou=users,dc=fediz,dc=org" />
  <property name="userNameAttribute" value="uid" />
</bean>

<util:map id="appliesToScopeMapping">
  <entry key="urn:org:apache:cxf:fediz:fedizhelloworld"
         value="Example" />
</util:map>

<bean id="groupClaimsHandler" class="org.apache.cxf.sts.claims.LdapGroupClaimsHandler">
  <property name="ldapTemplate" ref="ldapTemplate" />
  <property name="userBaseDN" value="ou=users,dc=fediz,dc=org" />
  <property name="userNameAttribute" value="uid" />
  <property name="groupBaseDN" value="ou=groups,dc=fediz,dc=org" />
  <property name="appliesToScopeMapping" ref="appliesToScopeMapping" />
</bean>

<jaxws:endpoint id="transportSTS1" implementor="#transportSTSProviderBean"
    address="/STSService" wsdlLocation="/WEB-INF/wsdl/ws-trust-1.4-service.wsdl"
    xmlns:ns1="http://docs.oasis-open.org/ws-sx/ws-trust/200512/"
    serviceName="ns1:SecurityTokenService" endpointName="ns1:TransportUT_Port">
  <jaxws:properties>
    <entry key="ws-security.ut.validator">
      <bean class="org.apache.ws.security.validate.JAASUsernameTokenValidator">
        <property name="contextName" value="LDAP" />
      </bean>
    </entry>
  </jaxws:properties>
</jaxws:endpoint>
I've highlighted the important beans to support the mapping of groups to (application) roles. The bean LdapGroupClaimsHandler has got the following attributes:

Name | Mandatory | Default | Description
ldapTemplate | Yes | N.A. | The Spring LDAP template
groupBaseDN | Yes | N.A. | The base group context where the search starts
groupObjectClass | No | groupOfNames | Object class for groups. Used for search filter.
groupMemeberAttribute | No | member | The group attribute where the list of users are stored
groupURI | No | http://schemas.xmlsoap.org/ws/2005/05/identity/claims/role | The SAML attribute name where the roles should be stored
groupNameGlobalFilter | No | ROLE | Default uses the CN of the group as role name
groupNameScopedFilter | No | SCOPE_ROLE | Default cuts the SCOPE and the underscore of the CN of the group
appliesToScopeMapping | No | N.A. | The mapping is required if application specific roles must be supported
userNameAttribute | No | cn | User id attribute. Only required if LDAP is not used for authentication and thus the DN of the user must be resolved first. Used for search filter.
userObjectClass | No | person | Object class for users. Only required if LDAP is not used for authentication and thus the DN of the user must be resolved first. Used for search filter.

The bean appliesToScopeMapping defines the mapping of the URI in the AppliesTo variable to a name, as URIs are not valid within the CN of an LDAP group.

Here is one more example, this time for the usage of groupNameScopedFilter. Let's assume you use the same LDAP directory for the development and pre-production application environments and define the following naming convention for application roles:
DEV_<Application>_<ROLE>_Group and UAT_<Application>_<ROLE>_Group
The groupNameScopedFilter will then look like this: DEV_SCOPE_ROLE_Group (assumption: different STS instances are deployed for development and pre-production).

The following table lists a few group examples and what the role value will look like in the SAML attribute. The assumption is that the AppliesTo element is urn:org:apache:cxf:fediz:fedizhelloworld, which maps to the scope Example (see configuration example above), and that the groupNameScopedFilter is configured as DEV_SCOPE_ROLE_Group:

Group CN | Role name
DEV_Example_User_Group | User
DEV_Example_Admin_Group | Admin
DEV_Example2_User_Group | ignored
UAT_Example_User_Group | ignored
INFR_Citrix_Access | ignored
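The matching behind that table can be sketched in a few lines of plain Java. To be clear, this is not the LdapGroupClaimsHandler source, just my reading of the rule it applies: substitute the mapped scope for SCOPE, then take whatever sits in the ROLE position. The class and method names below are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: mimics how a scoped filter could turn a group CN into a role name. */
final class ScopedRoleFilterSketch {

    /**
     * Returns the role encoded in groupCn, or empty if the CN does not match
     * the filter for the given scope (such groups are ignored).
     */
    static Optional<String> roleFor(String groupCn, String scopedFilter, String scope) {
        int rolePos = scopedFilter.indexOf("ROLE");
        if (rolePos < 0) {
            throw new IllegalArgumentException("filter needs a ROLE placeholder");
        }
        // Literal text before and after ROLE, with SCOPE replaced by the mapped scope.
        String prefix = scopedFilter.substring(0, rolePos).replace("SCOPE", scope);
        String suffix = scopedFilter.substring(rolePos + "ROLE".length()).replace("SCOPE", scope);
        Pattern p = Pattern.compile(Pattern.quote(prefix) + "(.+)" + Pattern.quote(suffix));
        Matcher m = p.matcher(groupCn);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // AppliesTo URI -> scope, as in the appliesToScopeMapping bean above.
        Map<String, String> appliesToScope =
            Map.of("urn:org:apache:cxf:fediz:fedizhelloworld", "Example");
        String scope = appliesToScope.get("urn:org:apache:cxf:fediz:fedizhelloworld");

        String[] cns = {"DEV_Example_User_Group", "DEV_Example_Admin_Group",
            "DEV_Example2_User_Group", "UAT_Example_User_Group", "INFR_Citrix_Access"};
        for (String cn : cns) {
            System.out.println(cn + " -> "
                + roleFor(cn, "DEV_SCOPE_ROLE_Group", scope).orElse("ignored"));
        }
    }
}
```

Note that DEV_Example2_User_Group is ignored because the underscore after the scope is part of the literal prefix, so Example2 does not match the scope Example.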

Last but not least I'd like to comment on the default value of userNameAttribute, which is cn. As per recommendation (5.4) the CN is typically the person's full name and therefore isn't a good fit for the user id (login name). Because the LdapClaimsHandler already had cn as its default value, I wanted to keep the two in sync and change it in the next non-patch release of CXF.

If you face issues or would like more functionality, send a message to the CXF mailing list or open a JIRA issue.


Isabel Drost: ApacheConNA: Hadoop metrics

Tue, 2013-05-14 15:25


Have you ever measured the general behaviour of your Hadoop jobs? Have you
sized your cluster accordingly? Do you know whether your work load really is IO
bound or CPU bound? Legend has it no one except Allen Wittenauer over at
LinkedIn, formerly Y!, ever did this analysis for his clusters.


Steve Watt gave a pitch for actually going out into your datacenter, measuring
what is going on there and adjusting the deployment accordingly: In small
clusters it may make sense to rely on RAIDed disks instead of additional
storage nodes to guarantee “replication levels”. When going out to vendors to
buy hardware don’t rely on paper calculations only: Standard servers in Hadoop
clusters are 1U or 2U. This is quite unlike the beefy boxes being sold otherwise.


Figure out what reference architecture is being used by partners, run your
standard workloads, adjust the configuration. If you want to, run the 10TB
TeraSort to benchmark your hardware and system configuration. Make sure to
capture data during all your runs - have Ganglia or SAR running, watch out for
interesting behaviour in IO rates, CPU utilisation, network traffic. The goal is
to get the CPU busy, not waiting for network or disk.


After the instrumentation and trial run look for over- and underprovisioning,
adjust, lather, rinse, repeat.


Also make sure to talk to the datacenter people: There are floor space, power
and cooling constraints to keep in mind. Don’t let the whole datacenter go down
because your CPU-intensive job is drawing more power than the DC was designed
for. There are also power constraints per floor tile due to cooling issues -
those should dictate the design.


Take a close look at the disks you deploy: SATA vs. SAS can make a 40%
performance difference at a 20% cost difference. Also the number of cores per
machine dictates the number of disks to spread the likelihood of random read
access. As a rule of thumb - in a 2U machine today there should be at least
twelve large form factor disks.


When it comes to controllers the goal should be to get a dedicated lane to disk;
save one controller if price is an issue. Trade off compute power against power
consumption.


When designing your network, keep in mind that one switch going down means that
one rack will be gone. This may be a non-issue in a Y! size cluster; in your
smaller scale world it might be worth the money investing in a second switch
though: Having 20 nodes go black isn’t a lot of fun if you cannot farm out the
work and re-replication to other nodes and racks. Also make sure to have enough
ports in rack switches for the machines you are planning to provision.


Avoid playing the ops whack-a-mole game by having one large cluster in the
organisation rather than many different ones where possible. Multi-tenancy in
Hadoop is still premature though.


If you want to play with future deployments - watch out for HP, currently
packing 270 servers where today there are just two via system-on-a-chip designs.


Shai Erera: The Replicator

Tue, 2013-05-14 15:17
No, as much as I want to, I don't (yet) have a true Replicator at my disposal to blog about. Nor does it look like I will have one in the near future, even though scientists are making great progress towards it. I have recently made my contribution to the global replication effort though, by adding an index replication module to Lucene. It does not convert energy to mass, nor mass to mass, so I'm not sure if scientifically it qualifies as a Replicator at all, but if you are into search index replication with Lucene, read on!

In computer science, replication is defined as "sharing information so as to ensure consistency between redundant resources ... to improve reliability, fault-tolerance, or accessibility" (Wikipedia). When you design a search engine, especially at large scales, replication is one approach to achieve these. Index replicas can replace primary nodes that become unavailable (e.g. due to planned maintenance or severe hardware failures), as well as support higher query loads by load-balancing search requests across them. Even if you are not building a large-scale search engine, you can use replication to take hot backups of your primary index, while searches are running on it.

Lucene's replication framework implements a primary/backup approach, where the primary process performs all indexing operations (updating the index, merging segments etc.), and the backup/replica processes copy over the changes in an asynchronous manner. The framework consists of a few key components that control the replication actions:
  • Replicator mediates between the clients and server. The primary process publishes Revisions while the replica processes update to the most recent revision following their own. When a new Revision is published, the previous one is released (unless it is currently being replicated), so that the resources it consumes can be reclaimed (e.g. remove unneeded files).

  • Revision describes a list of files and their metadata. The Revision is responsible for ensuring that the files remain available for as long as clients are copying them. For example, IndexRevision takes a snapshot on the index using SnapshotDeletionPolicy, to guarantee the files are not deleted until the snapshot is released.

  • ReplicationClient performs the replication operation on the replica side. It first copies the needed files from the primary server (e.g. the files that the replica is missing) and then invokes the ReplicationHandler to act on the copied files. If any errors occur during the copy process, it restarts the replication session. That way, when the handler is invoked, it is guaranteed that all files were copied safely to the local storage.

  • ReplicationHandler acts on the copied files. IndexReplicationHandler copies the files over to the index directory and then syncs them to ensure the files are on stable storage. If any errors occur, it aborts the replication session and cleans up the index directory. Only after successfully copying and syncing the files does it notify a callback that the index has been updated, so that e.g. the application can refresh its SearcherManager or perform whatever tasks it needs on the updated index.
The replication framework supports replicating any type of files. It offers built-in support for replicating a single index (as described above), as well as an index and taxonomy pair (for faceted search) via IndexAndTaxonomyRevision and IndexAndTaxonomyReplicationHandler. To replicate other types of files, you need to implement Revision and ReplicationHandler. But be advised, implementing a handler properly is ... tricky!

The following example code shows how to use the replicator on the server side:
// publish a new revision on the server
IndexWriter indexWriter; // the writer used for indexing
Replicator replicator = new LocalReplicator();
replicator.publish(new IndexRevision(indexWriter));
Using the replicator on the client side is a bit more involved but hey, that's where the important stuff happens!
// client can replicate either via LocalReplicator (e.g. for backups)
// or HttpReplicator (e.g. when the server is located on a different
// node).
Replicator replicator;

// refresh SearcherManager after the index is updated
Callable<Boolean> callback = new Callable<Boolean>() {
  @Override
  public Boolean call() throws Exception {
    // index was updated, refresh manager
    searcherManager.maybeRefresh();
    return true;
  }
};

// initialize a matching handler for the published revisions
ReplicationHandler handler = new IndexReplicationHandler(indexDir, callback);

// where should the client store the files before invoking the handler
SourceDirectoryFactory factory = new PerSessionDirectoryFactory(workDir);

ReplicationClient client = new ReplicationClient(replicator, handler, factory);

client.updateNow(); // invoke client manually
// -- OR --
client.startUpdateThread(30000); // check for updates every 30 seconds

So What's Next?

The replication framework currently restarts a replication session upon failures. It would be great if it supported resuming a session by e.g. copying only the files that weren't already successfully copied. This can be done by having both server and client e.g. compute a checksum on the files, so the client knows which ones were successfully copied and which weren't. Patches welcome!
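As a rough illustration of the checksum idea (not part of the replication module, and all names here are made up), the client could compare the checksums published by the server against the files it already has in its session directory, and only request the ones that are missing or differ. I use CRC32 purely as an example checksum:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Illustrative only: decides which files of a revision still need to be copied. */
final class ResumeByChecksumSketch {

    /** CRC32 over the whole file content. */
    static long checksum(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /**
     * serverChecksums maps file name to the checksum computed on the server.
     * Returns the names that are absent locally or whose checksum differs.
     */
    static List<String> filesToCopy(Map<String, Long> serverChecksums, Path localDir)
            throws IOException {
        List<String> toCopy = new ArrayList<>();
        for (Map.Entry<String, Long> e : serverChecksums.entrySet()) {
            Path local = localDir.resolve(e.getKey());
            if (!Files.isRegularFile(local) || checksum(local) != e.getValue()) {
                toCopy.add(e.getKey());
            }
        }
        return toCopy;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("replica");
        Path done = dir.resolve("_0.cfs");
        Files.write(done, "already copied".getBytes("UTF-8"));

        Map<String, Long> server = new TreeMap<>();
        server.put("_0.cfs", checksum(done)); // copied intact in the failed session
        server.put("_1.cfs", 42L);            // never arrived
        System.out.println(filesToCopy(server, dir));
    }
}
```

A partially written file fails the checksum comparison and is simply copied again, which is the conservative choice for a first version.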

Another improvement would be to support resume at the file level, i.e. don't copy parts of the file that were already copied safely. This can be implemented by modifying ReplicationClient to discard the first N bytes of the file it already copied, but would be better if it's supported by the server as well, so that it only sends the required parts. Here too, patches are welcome!
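The client-only variant of file-level resume could look roughly like the sketch below: look at how many bytes of the file already exist locally, discard that many bytes from the incoming stream, and append the rest. This is a hypothetical helper, not a change to ReplicationClient, and it assumes the bytes already on disk are a correct prefix of the file (which a checksum would have to confirm).

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: resumes a file copy by discarding the part that is already local. */
final class ResumeFileCopySketch {

    /** Appends the not-yet-copied tail of source to target; returns the bytes appended. */
    static long resumeCopy(InputStream source, Path target) throws IOException {
        long remainingToSkip = Files.exists(target) ? Files.size(target) : 0L;

        // Discard the first N bytes, which are assumed to be on disk already.
        while (remainingToSkip > 0) {
            long skipped = source.skip(remainingToSkip);
            if (skipped > 0) {
                remainingToSkip -= skipped;
            } else if (source.read() == -1) {
                throw new EOFException("source is shorter than the local copy");
            } else {
                remainingToSkip--; // read() consumed one byte
            }
        }

        long appended = 0;
        byte[] buf = new byte[8192];
        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            int n;
            while ((n = source.read(buf)) != -1) {
                out.write(buf, 0, n);
                appended += n;
            }
        }
        return appended;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("segment", ".part");
        Files.write(target, "hello ".getBytes("UTF-8")); // left over from a failed session
        long appended = resumeCopy(
            new ByteArrayInputStream("hello world".getBytes("UTF-8")), target);
        System.out.println(appended + " bytes appended: "
            + new String(Files.readAllBytes(target), "UTF-8"));
    }
}
```

The obvious downside is that the server still sends the whole file over the wire, which is why support on the server side would be the better fix.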

Yet another improvement is to support peer-to-peer replication. Today, the primary server is the bottleneck of the replication process since all clients access it for pulling files. This can be somewhat mitigated by e.g. having a tree network topology, where the primary node is accessed by only a few nodes, which are then accessed by other nodes and so forth. Such topology is however harder to maintain as well as very sensitive to the root node going out of action. In a peer-to-peer replication however, there is no single node from which all clients replicate, but rather each server can broadcast its current revision as well as choose any server that has a newer revision available to replicate from. An example of such implementation over Apache Solr is described here. Hmmm ... did I say patches are welcome?

Lucene has a new replication module. It's only 1 day old and can already do so much. You are welcome to use it and help us teach it new things!

Dejan Bosanac: Lightweight Messaging For Web And Mobile With Apache ActiveMQ

Tue, 2013-05-14 09:11

Messaging once was a thing of “enterprises” but times are changing fast and devs now want to use it from virtually any environment. I thought it’s important to talk about messaging technologies available for web and mobile, so I’ll give a talk about it at CamelOne and OSCON. If you’re attending one of those give me a ping, so we can have a chat over some beers.


Jeremy Quinn: Intimate [Flickr]

Tue, 2013-05-14 07:07

sharkbait posted a photo:

I love the red flush some plants have on new growth.
An intimate shot of one of my Echeveria.
