FLOSS Project Planets

gnuastro @ Savannah: Gnuastro webpage is activated

GNU Planet! - Mon, 2015-04-20 00:00

The Gnuastro webpage ( http://www.gnu.org/software/gnuastro/ ) was activated and the documentation is now available. There is still a lot of work to do until it is ready for release though.

Announcements will be made on the info-gnuastro@gnu.org mailing list.

Categories: FLOSS Project Planets

Russ Allbery: Review: The Girls from Alcyone

Planet Debian - Sun, 2015-04-19 23:28

Review: The Girls from Alcyone, by Cary Caffrey

Series: Girls from Alcyone #1
Publisher: Tealy
Copyright: 2011
ISBN: 1-105-33727-8
Format: Kindle
Pages: 315

Sigrid is a very special genetic match born to not particularly special parents, deeply in debt in the slums of Earth. That's how she finds herself being purchased by a mercenary corporation at the age of nine, destined for a secret training program involving everything from physical conditioning to computer implants, designed to make her a weapon. Sigrid, her friend Suko, and the rest of their class are a special project of the leader of the Kimura corporation, one that's controversial even among the corporate board, and when the other mercenary companies unite against Kimura's plans, they become wanted contraband.

This sounds like it could be a tense SF thriller, but I'll make my confession at the start of the review: I had great difficulty taking this book seriously. Initially, it had me wondering what horrible alterations and mind control Kimura was going to impose on the girls, but it very quickly turned into, well, boarding school drama, with little of the menace I was expecting. Not that bullying, or the adults who ignore it to see how the girls will handle it themselves, are light-hearted material, but it was very predictable. As was the teenage crush that grows into something deeper, the revenge on the nastiest bully that the protagonist manages to not be responsible for, and the conflict between unexpectedly competent girls and an invasion of hostile mercenaries.

I'm not particularly well-read or informed about the genre, so I'm not the best person to make this comparison, but the main thing The Girls from Alcyone reminded me of was anime or manga. The mix of boarding-school interpersonal relationships, crushes and passionate love, and hypercompetent female action heroes who wear high heels and have constant narrative attention on their beauty had that feel to it. Add in the lesbian romance and the mechs (of sorts) that show up near the end of the story, and it's hard to shake the feeling that one is reading SF yuri as imagined by a North American author.

The other reason why I had a hard time taking this seriously is that it's over-the-top action sequences (it's the Empire Strikes Back rescue scene!) mixed with rather superficial characterization, with one amusing twist: female characters almost always end up being on the side of the angels. Lady Kimura, when she appears, turns into exactly the sort of mentor figure that one would expect given the rest of the story (and the immediate deference she got felt like it was lifted from anime). The villains, meanwhile, are hissable and motivated by greed or control. While there's a board showdown, there's no subtle political maneuvering, just a variety of more or less effective temper tantrums.

I found The Girls from Alcyone amusing, and even fun to read in places, but that was mostly from analyzing how closely it matched anime and laughing at how reliably it delivered characteristic tropes. It thoroughly embraces its action-hero story full of beautiful, deadly women, but it felt more like a novelization of a B-grade sci-fi TV show than serious drama. It's just not well-written or deep enough for me to enjoy it as a novel. None of the characters were particularly engaging, partly because they were so predictable. And the deeper we got into the politics behind the plot, the less believable I found any of it.

I picked this up, along with several other SFF lesbian romances, because sometimes it's nice to read a story with SFF trappings, a positive ending, and a lack of traditional gender roles. The Girls from Alcyone does have most of those things (the gender roles are tweaked but still involve a lot of men looking at beautiful women). But unless you really love anime-style high-tech mercenary boarding-school yuri, want to read it in book form, and don't mind a lot of cliches, I can't recommend it.

Followed by The Machines of Bellatrix.

Rating: 3 out of 10

Categories: FLOSS Project Planets

Richard Hartmann: Release Critical Bug report for Week 16

Planet Debian - Sun, 2015-04-19 16:35

The UDD bugs interface currently knows about the following release critical bugs:

  • In Total: 1031 (Including 146 bugs affecting key packages)
    • Affecting Jessie: 53 (key packages: 42) That's the number we need to get down to zero before the release. They can be split in two big categories:
      • Affecting Jessie and unstable: 49 (key packages: 42) Those need someone to find a fix, or to finish the work to upload a fix to unstable:
        • 12 bugs are tagged 'patch'. (key packages: 9) Please help by reviewing the patches, and (if you are a DD) by uploading them.
        • 3 bugs are marked as done, but still affect unstable. (key packages: 2) This can happen due to missing builds on some architectures, for example. Help investigate!
        • 34 bugs are neither tagged patch, nor marked done. (key packages: 31) Help make a first step towards resolution!
      • Affecting Jessie only: 4 (key packages: 0) Those are already fixed in unstable, but the fix still needs to migrate to Jessie. You can help by submitting unblock requests for fixed packages, by investigating why packages do not migrate, or by reviewing submitted unblock requests.
        • 1 bug is in a package that is unblocked by the release team. (key packages: 0)
        • 3 bugs are in packages that are not unblocked. (key packages: 0)

How do we compare to the Squeeze and Wheezy release cycles?

Week  Squeeze        Wheezy          Jessie
43    284 (213+71)   468 (332+136)   319 (240+79)
44    261 (201+60)   408 (265+143)   274 (224+50)
45    261 (205+56)   425 (291+134)   295 (229+66)
46    271 (200+71)   401 (258+143)   427 (313+114)
47    283 (209+74)   366 (221+145)   342 (260+82)
48    256 (177+79)   378 (230+148)   274 (189+85)
49    256 (180+76)   360 (216+155)   226 (147+79)
50    204 (148+56)   339 (195+144)   ???
51    178 (124+54)   323 (190+133)   189 (134+55)
52    115 (78+37)    289 (190+99)    147 (112+35)
1     93 (60+33)     287 (171+116)   140 (104+36)
2     82 (46+36)     271 (162+109)   157 (124+33)
3     25 (15+10)     249 (165+84)    172 (128+44)
4     14 (8+6)       244 (176+68)    187 (132+55)
5     2 (0+2)        224 (132+92)    175 (124+51)
6     release!       212 (129+83)    161 (109+52)
7     release+1      194 (128+66)    147 (106+41)
8     release+2      206 (144+62)    147 (96+51)
9     release+3      174 (105+69)    152 (101+51)
10    release+4      120 (72+48)     112 (82+30)
11    release+5      115 (74+41)     97 (68+29)
12    release+6      93 (47+46)
13    release+7      50 (24+26)
14    release+8      51 (32+19)
15    release+9      39 (32+7)
16    release+10     20 (12+8)
17    release+11     24 (19+5)
18    release+12     2 (2+0)

Graphical overview of bug stats thanks to azhag:

Categories: FLOSS Project Planets

Extensive Source Comments or Extensive Commit Messages?

Planet KDE - Sun, 2015-04-19 16:29

If you consider yourself a serious developer, you know writing good commit messages is important. You don't want to be that guy:

XKCD #1296

This applies to source comments as well: good comments save time, bad comments can be worse than no comments.

For a long time, I usually favored source comments over commit messages: whenever I was about to commit a change which needed some explanations, I would often start to write a long commit message, then pause, go back to the code, write my long explanation as a comment and then commit the changes with a short message. After all, we are told we should not repeat ourselves.

Recently I was listening to Thom Parkin talking about rebasing on Git Minutes #33 (Git Minutes is a great podcast BTW, highly recommended) and he said this: "Commits tell a story". That made me realize one thing: we developers read code a lot, but we also read a lot of commit histories, either when tracking a bug or when reviewing a patchset. Reading code and reading history can be perceived as two different views of a project, and we should strive to make sure both views are readable. Our readers (which often are our future selves...) will thank us. It may require duplicating information from time to time, but that is a reasonable trade-off in my opinion.

So, "Write extensive source comments or extensive commit messages?" I'd say: "Do both".

Categories: FLOSS Project Planets

Turnkey Linux: Getting started with Python and Lisp

Planet Python - Sun, 2015-04-19 16:00

A few weeks ago I talked with a friend studying computer science who I discovered had never experienced the joy of programming with a high level language. Not only that but he didn't have the first clue what he was missing. I feared without my immediate intervention another perfectly good mind would be wasted in programming hell. At his university they were using Java for nearly everything so he had somehow gotten the terribly mistaken idea that it didn't really matter what programming language one used. I carefully explained that:

  1. Programming languages were not equivalent. Not by a longshot.
  2. Java is rarely the language of choice for really good programmers.
  3. The two languages I recommend taking a close look at are Python and Lisp.

He had never heard of Python or Lisp before so I helpfully provided a list of links to resources that would hopefully help get him started. Then I figured I'd share those resources with the community for the benefit of other misguided programmers in similar circumstances.

Also, next time this issue comes up I won't have to waste any breath proselytizing. I'll just point to this blog post.

Python resources

Lisp resources

Paul Graham

Paul Graham has written extensively about this stuff. Listen to this guy, he really gets it.

He's also written a book ("On Lisp") you can buy or download for free:

http://www.paulgraham.com/onlisp.html

Categories: FLOSS Project Planets

Tom White: The Hay Dark Skies Festival, Reverend Thomas William Webb, and Jupiter

Planet Apache - Sun, 2015-04-19 12:09
In 2013, the Brecon Beacons was designated a Dark Sky Reserve, and a year later the first Dark Skies Festival was held in Hay-on-Wye. The second festival took place this weekend, and my family went along to some of the activities.

Young stargazers, Lottie and Millie

In the morning, we found ourselves in a planetarium tent, then we looked at sunspots, and held pieces of meteorite.

The evening event was stargazing at Holy Trinity Church in Hardwicke, just outside Hay. Quite apart from the lack of light pollution, the location was a special one, since the vicar of the parish from 1856 until 1885 was Reverend Thomas William Webb, who in his spare time observed the night sky with telescopes and an observatory he had built himself.
Holy Trinity Church, Hardwicke
In 1859, while at Hardwicke he wrote the classic book, Celestial Objects for the Common Telescope, the object of which was "to furnish the possessors of ordinary telescopes with plain directions for their use, and a list of objects for their advantageous employment".

The book remained in print well into the following century (and was recently republished by Cambridge University Press), and it's probably difficult to overemphasise the importance of this book in encouraging generation after generation of amateur stargazers.

In the words of Janet and Mark Robinson, who used to live in the vicarage and have edited a book about Webb,
Like Patrick Moore, he was an enthusiast who wanted to inspire as many people as possible to look through a telescope. Even at the choir party he "arranged the telescope and acted as showman and all in turn had a look at Saturn".

Webb would no doubt have been pleased to see yesterday's gathering of enthusiastic amateurs (including the Robinsons) with an impressive range of telescopes, on a cold but very clear night. The highlight for us was seeing Jupiter and its four brightest moons (Io, Europa, Ganymede and Callisto) through a large reflecting telescope. We could even see the north and south belts, and the Great Red Spot (or Pink Splodge as Lottie named it).

Sunset. Venus is visible top centre

Thank you to the organisers of the Hay Dark Skies Festival, and the volunteers from the Usk Astronomical Society (the oldest astronomical society in the UK), the Abergavenny Astronomy Society and the Heads of the Valleys Astronomical Society.
Categories: FLOSS Project Planets

Nick Clifton: April 2015 GNU Toolchain Update

GNU Planet! - Sun, 2015-04-19 11:22
Hi Guys,

  There are several things to report this month:

  * The GCC version 5 branch has been created.  No releases have been made from this branch yet, but when one happens it will be GCC 5.1.  Meanwhile the mainline development sources have been switched to calling themselves GCC version 6.

  * Support has been added for configuring targets that use the Nuxi CloudABI.  More details of this ABI can be found here:  https://github.com/NuxiNL/cloudlibc

  * The linker and assembler now support an option to control how DWARF debug sections are compressed:  --compress-debug-sections=[none|zlib|zlib-gnu|zlib-gabi]

    Selecting none disables compression.  This is the default behaviour if this option is not used.  Selecting zlib or zlib-gnu compresses the sections and then renames them to start with a .z.  This is the old method of indicating that a debug section has been compressed.  Selecting zlib-gabi compresses the sections, but rather than renaming them they instead have the new SHF_COMPRESSED bit set in their ELF section header.

    The other binutils tools have been updated to recognise and handle this SHF_COMPRESSED bit.  More information on the new bit can be found here: https://groups.google.com/forum/#!msg/generic-abi/dBOS1H47Q64/PaJvELtaJrsJ

    In another, related change, the binutils will no longer compress a debug section if doing so would actually make it bigger.

    Also the zlib compression/decompression library sources have now been brought in to the binutils git repository and are now a standard part of a binutils release.

    * The linker has a new command line option:  --warn-orphan
      This option tells the linker to generate a warning message whenever it has to guess at the placement of a section in the output file.  This happens when the linker script in use does not specify where the section should go.

    * The compiler has a new option: -fsanitize-sections=sec1,sec2,...
      This tells the address sanitizer to add protection to global variables defined in the named section(s).  By default any globals in sections with user defined names are not sanitized as the compiler does not know what is going to happen to them.  In particular variables in such sections sometimes end up being merged into an array of values, where the presence of address sanitization markers would break the array.

   * The AVR port of the compiler has a new command line option: -nodevicelib
     This tells the compiler not to link against AVR-LibC's device specific library libdev.a.

  * The RX port of GCC has a new command line option to disable the use of RX string instructions (SMOVF, SUNTIL, etc).  This matters because it is unsafe to use these instructions if the program might access the I/O portion of the address space.

  * The RL78 port of GCC now has support for the multiply and divide instructions provided by the G14 cpu, and for the divide hardware peripheral provided by the G13 core.

  * GDB now honours the content of the file /proc/PID/coredump_filter on GNU/Linux systems.  This file can be used to specify the types of memory mappings that will be included in a corefile.  For more information, please refer to the manual page of "core(5)".  GDB also has a new command: "set use-coredump-filter on|off".  It allows you to set whether GDB will read the content of the /proc/PID/coredump_filter file when generating a corefile.

  * GDB's "info os cpus" command on GNU/Linux can now display information on the cpus/cores on the system.

  * GDB has two new commands: "set serial parity odd|even|none" and "show serial parity".  These allow you to set or show the parity for the remote serial I/O.

Cheers
  Nick
Categories: FLOSS Project Planets

Ardour 4 on Debian Jessie

Planet KDE - Sun, 2015-04-19 10:14

The Ardour project just announced version four of the digital audio workstation. Debian carries version 3, so I decided to build version 4 myself. Here is a summary of what I learned.

First of all, the Ardour people have written a building page and a list of dependencies. They do carry a set of patches for some of the packages. These seem to be more or less small fixes, apart from the one for libsndfile, which has a bug fix for handling BWF files.

In addition to the patched libs, the requirements list a whole range of gtk and corresponding -mm packages, as well as boost and various codecs and such. I decided not to care too much about versions for these packages. Instead, I just took whatever I could find in Debian. The packages installed are:

  • libsndfile1-dev
  • libgnomecanvas2-dev
  • libsigc++-2.0-dev
  • libcairo2-dev
  • liblrdf0-dev
  • libfreetype6-dev
  • libboost1.55-all-dev
  • libfftw3-dev
  • libglibmm-2.4-dev
  • libcairomm-1.0-dev
  • libpangomm-1.4-dev
  • libatkmm-1.6-dev
  • libart2.0-cil-dev
  • libgnomecanvasmm-2.6-dev
  • liblo-dev
  • libraptor2-dev
  • librasqal3-dev
  • libogg-dev
  • libflac-dev
  • libvorbis-dev
  • libsamplerate0-dev
  • libaudio-dev
  • liblv2dynparam1-dev
  • libserd-dev
  • libsord-dev
  • libsratom-dev
  • liblilv-dev
  • libsuil-dev
  • librubberband-dev
  • vamp-plugin-sdk
  • libaubio-dev
  • libjack-dev
  • liblilv-dev

Then it is just a matter of configuring and building using waf.

./waf configure --with-backend=alsa --prefix=/wherever/you/want/it
./waf
./waf install

My plan is to use ALSA (i.e. not JACK), and installing libjack-dev meant that Skype got kicked out, so the system needed some love to restore order.

apt-get autoremove
apt-get remove libjack-dev
apt-get remove libjack0
dpkg --install skype-debian_4.3.0.37-1_i386.deb
apt-get install -f

Despite this little hack, Ardour seems to work nicely and can record and play back. I still need to test out some more features to see if everything is in place, but it looks hopeful.

Update! As pointed out in the comments, Debian not only carries a really old version but also version 3.

Categories: FLOSS Project Planets

Julien Tayon: So I wrote a Proof of Concept language to address the problem of safe eval

Planet Python - Sun, 2015-04-19 10:07
I told fellow coders: «hey! I know a solution to the safe eval problem: it is right under my eyes». I think I can code it in less than 24 hours from scratch. It will support safe templating... because that's the primary purpose for it.


TL; DR:


I was told my solution was overengineering, because writing a language is so much effort. Actually, it took me less time to write a language without any theoretical knowledge than the time I have lost, in my various jobs, every single time I had to deal with unsafe eval.

Here is the result in Python: a Forth-based templating language that actually covers 90% of the real use cases I have experienced; a fair balance between time to code and the features people really use.


You don't actually need that many features.

https://github.com/jul/confined (+pypi package)

NB Work in progress
 How I was tortured as a student
When I was a student, I was nicely helped through the hell of my chaotic studies by people in a university called ENS.

In exchange of their help I had to code for data measurement/labs with various language OS, and environment.

I was tortured because I liked programming and I did not have the right to do OOP, malloc, or use new languages... Perl, Python, new versions of the C standard...

Even for handling numbers, scientists despised Perl/Python because of their inability to handle maths safely. I had to use the «Numerical Recipes» and/or Fortran. (I checked: in 2005 they tried and were disappointed by Python; I guess since then they might use numpy, which is basically a binding to safe Fortran ports of the Numerical Recipes.) I was working on chaotic systems that are really sensitive to initial conditions... a small error in the input propagates fast.

The people were saying: we need this code to work and we need to be able to reuse it, and we need our output to be reproducible and verifiable : KISS. Keep It Simple Stupid. And even more stupid.

So I was barred from any unbounded resource behaviour and any unsafe behaviour with base types.

Actually, out of curiosity, I recompiled code I made at that time (C piping its output to Tcl/Tk to produce graphical representations of multi-agent simulations) and it still works... It was written in 1996.

That's how I learnt programming: by doing the worst possible unfunky programming ever. I thought they were just stupid grumpy old men.

And I also had to use scientific equipment/software. Oddly enough, it all used Forth-style RPN notation to give users some basic manipulation capabilities.

Like:
  1. ASYST
  2. RRD Tools
  3. pytables NUMEXPR extension http://code.google.com/p/numexpr
And I realized I understood:

Forth is easy to implement (see the minimal sketch after this list):
  • it is a simple left-to-right parsing technique: no backtracking/no states;
  • the grammar is easy to write;
  • the memory model makes it easy to confine within boundaries;
  • it is immutable in its serialization (you can dump the exec and data stacks and safely resume/start/transport them);
  • it is thus efficient for parallelization;
  • it can thus be used in embedded stuff (like measurement instruments that need to be autonomous AND programmable).
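To make the "easy to implement" point concrete, here is a minimal sketch of a stack-based RPN evaluator in Python. It is my own illustration, not the actual confined code, and the operator set and limits are made up. It shows the properties listed above: one left-to-right pass, an explicit data stack with a hard size limit, and a result that is just a stack you could serialize and resume later.

from decimal import Decimal

class LimitError(Exception):
    """Raised when the input or the stack exceeds its configured bounds."""

def rpn_eval(source, max_tokens=100, max_stack=32):
    # One left-to-right pass: numbers are pushed, operators pop their
    # operands and push the result. No backtracking, no nested structures.
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    tokens = source.split()
    if len(tokens) > max_tokens:
        raise LimitError("input too long")
    stack = []
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(Decimal(tok))  # Decimal, not float (see the next section)
        if len(stack) > max_stack:
            raise LimitError("stack overflow")
    return stack  # the whole stack can be dumped, transported and resumed

print(rpn_eval("1.10 2.20 + 3 *"))  # [Decimal('9.90')]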
So I decided to give myself one day to code a safe, confined interpreter in Python.

I was told it is complex to write a language, especially when, like me, you have never had any lessons in (or interest in) parsing and language theory, and you suck at mathematics.


Design choices
Having the minimum dependency requirements: stdlib.
One number to rule them all

I have been beaten so many times in web development by floating point numbers, especially for monetary values, that I wanted a number type that could do fixed-point arithmetic. I have also been beaten so many times by problems where the input was sensitive to initial conditions that I wanted a number type better than IEEE 754 at controlling errors.

So I went for the stdlib number type based on the officious IEEE 854 standard: https://docs.python.org/2/library/decimal.html

Other advantages: the string representation is canonical and the regexp for it is well known, thus easy to parse.
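A short, stdlib-only illustration (plain Python, not code from confined) of why decimal fits both concerns, money and error control:

from decimal import Decimal, getcontext

# Binary floats accumulate representation errors on "simple" money sums:
print(0.1 + 0.1 + 0.1 - 0.3)                # something tiny but nonzero (about 5.55e-17)
print(Decimal("0.1") * 3 - Decimal("0.3"))  # 0.0

# The context makes precision and rounding explicit and controllable:
getcontext().prec = 6
print(Decimal(1) / Decimal(7))              # 0.142857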

In the face of ambiguity, refuse to guess

I will try to see input as (char *) and have the decoding be explicit.

Rationale: if you work with SIP (I do), headers are latin1, and if you work in an international environment you may have to face incorrectly encoded data that can also represent UTF-8, and people here in Québec love to use accents éverywhere. So I want to use it myself.

It is also the reason I used my check_arg library to enforce type checking of my operators and document stuff by using a KISS approach: function names should be explicit and their args should tell you everything.

Having a modular grammar so that operators/base types can be added/removed easily. 
I mentioned in a previous post how we cannot do safe eval in Python because the keywords cannot be controlled. So I decided to have a dynamic grammar built at tokenization time (the code can already do it; it is just not yet available through the API).

Avoid nested data structures and recursive calls
I wanted to make a language my fellow mentors could use safely. I may implement recursive eval in the future, but I will enforce a very limited level of recursion. Besides, I see a solution for replacing nested calls by using the stack.

Stateless and immutables only
I have seen people pickling functions so many times that I decided to have something more usable for remote execution. I also wanted my code to be idempotent. If parsing is seen as a function, I wanted to guarantee that

parsing(Input, Environment) => output 

would always produce the same result.
We can also serialize the exec stack and the data stack at any given moment to change them later. I want no side effects. As a result there will be no time-related functions.

As a result you can safely execute remote code.

Resource use should be controlled
Stack size, size of the input, recursion level, the initial state of the interpreter (default encoding, precision, number behaviours): I want to control everything (that is what the context will be for, and all its parameters WILL be mandatory), so that I can guarantee as much as I can (I was thinking of writing C extensions to ensure we DON'T use atof/atoi but strtol/strtof...).

This way I can avoid using an awful lot of virtual machines/docker/jails/whatever.

Grammar should be easy to read
Since I don't know how to parse, but I love Damian Conway, I looked at Regexp::Grammars and I said: Oh! I want something like this.

There are numerous resources on Stack Overflow on how to parse the various base types (floats, strings) exactly, how to alternate patterns, and so on. It took me 3 hours to imagine a way to do it. So I still know nothing of parsing and stuff, but I knew I would have a result.

I chose a grammar that can be written in a way that avoids backtracking (left-to-right parsing helped a lot), so that the regexps stay under control.

I am not sure of everything it does, but I am pretty sure it can be ported to C or whatever else guarantees NO nested/recursive use of resources. (The regexps are not supposed to survive into a hardened version; this is just a good-enough parser written in 3 hours with my insufficient knowledge.)

I still think Perl is right
We should run our unit tests before our install. So my module refuses to install if the single actual test I put in (as a POC) does not pass.


Conclusion
So it was really worth the time spent. And now I may be in the «cour des grands» of coders who have implemented their own language, from scratch and without any prior theoretical knowledge of how to write one. I have been geeking alone in front of my computer, and my wife is pissed at me for not enjoying the day and behaving like an autist, but I made something good enough for my own use case.

And handling requirements with Python, and making tests run before install, is hellish.

(Arg ... And why my doc does not show up on pypi? )

Categories: FLOSS Project Planets

Ian Ozsvald: PyDataParis 2015 and “Cleaning Confused Collections of Characters”

Planet Python - Sun, 2015-04-19 07:36

I’m at PyDataParis, this is the first PyData in France and we have a 300-strong turn-out. In my talk I asked about the split of academic and industrial folk, we have 70% industrialists here (at least – in my talk of 70 folk). The bulk of the attendees are in the Intro track and maybe the split is different in there. All slides are up, videos are following, see them here.

Here’s a photo of Gael giving a really nice opening keynote on Scikit-Learn:

I spoke on data cleaning with text data, I packed quite a bit into my 40 minutes and got a nice set of questions. The slides are below, it covers:

  • Data extraction from text files, PDF, HTML/XML and images
  • Merging on columns of data
  • Correctly processing datetimes from files and the dangers of relying on the pandas defaults
  • Normalising text columns so we could join on otherwise messy data
  • Automated data transformation using my annotate.io (Python demo)
  • Ideas on automated feature extraction
  • Ideas on automating visualisation for new, messy datasets to get a “bird’s eye view”
  • Tips on getting started – make a Gold Standard!

One question concerned the parsing of datetime strings from unusual sources. I’d mentioned dateutil's parser in the talk, and a second parser is delorean. In addition I’ve also seen arrow (an extension of the standard datetime) which has a set of parsers including one for ISO8601. The parsedatetime module has an NLP module to convert statements like “tomorrow” into a datetime.

I don’t know of other, better parsers – do you? In particular I want one that’ll take a list of datetimes and return one consistent converter that isn’t confused by individual instances (e.g. “1/1” is ambiguous between MM/DD and DD/MM).
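As a small illustration of that ambiguity, here is a sketch using dateutil's parser and its dayfirst flag (the date string is made up): the parser happily accepts the ambiguous string, it just has to be told, per call, which convention to assume.

from dateutil import parser

# "2/3/2015": February 3rd (month first) or March 2nd (day first)?
print(parser.parse("2/3/2015"))                 # 2015-02-03 00:00:00 (month first by default)
print(parser.parse("2/3/2015", dayfirst=True))  # 2015-03-02 00:00:00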

I’m also asking for feedback on the subject of automated feature extraction and automated column-join tools for messy data. If you’ve got ideas on these subjects I’d love to hear from you.

In addition I was reminded of DiffBot, it uses computer vision and NLP to extract meaning from web pages. I’ve never tried it, can any of you comment on its effectiveness? Olivier Grisel mentioned pyquery to me, it is an lxml parser which lets you make jquery-like queries on HTML.

update I should have mentioned chardet, which detects encodings (UTF8, CP1252 etc) from raw text; very useful if you’re trying to figure out the encoding for a collection of bytes off of a random data source! libextract looks like a young but nice tool for extracting text blocks from HTML/XML sources. boltons is a nice collection of bolt-on tools for the standard library (e.g. timeutils, strutils, tableutils). Possibly mETL is a useful tool for thinking about the extract, transform and load process.
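In case chardet is new to anyone, a minimal sketch of the usual pattern (the filename here is made up):

import chardet

raw = open("mystery.csv", "rb").read()  # bytes of unknown encoding (hypothetical file)
guess = chardet.detect(raw)             # e.g. {'encoding': 'windows-1252', 'confidence': 0.7, ...}
text = raw.decode(guess["encoding"])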

update It might also be worth noting some useful data sources from which you can extract semi-structured data, e.g. ‘tech tags’ from stackexchange's forums (and I also see a new hackernews dump). Here’s a big list of “awesome public datasets”.

update Peadar Coyle (@springcoil) gave a nice talk at PyConItaly 2015 on “Data Products – how to get models into production” which is related.

Camilla Montonen has just spoken on Rush Hour Dynamics, visualising London Underground behaviour. She noted graph-tool, a nice graphing/viz library I’d not seen before. Fabian has just shown me his new project, it collects NLP IPython Notebooks and lists them, it tries to extract titles or summaries (which is a gnarly sub-problem!). The AXA Data Innovation Lab have a nice talk on explaining machine learned models.

Gilles Louppe's slides for his ML/sklearn talk on trees and boosting are online, as are Alexandre Gramfort's on sklearn linear models.

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.
Categories: FLOSS Project Planets

Patrick Schoenfeld: Resources about writing puppet types and providers

Planet Debian - Sun, 2015-04-19 06:52

When doing a lot of devops stuff with Puppet, you might get to a point where the existing types are not enough. That point is usually reached when a task at hand becomes extraordinarily complex when trying to achieve it with the Puppet DSL. One example of such a case could be if you need to interact with a system binary a lot. In this case, writing your own puppet type might be handy.

Now where to start, if you want to write your own type?

Overview: modeling and providing types

First thing that you should know about puppet types (if you do not already): a puppet resource type consists of a type and one or more providers.

The type is a model of the resource and describes which properties (e.g. the uid of a user resource) and parameters (like the managehome parameter) a resource has. It's a good idea to start with a rough idea of what properties you'll manage with your resource and what values they will accept, since the type also does the job of validation.

What actually needs to be done on the target system is what the provider is up to. There can be different providers for different implementations (e.g. a native ruby implementation or an implementation using a certain utility), different operating systems and other conditions.

A combination of a type and a matching provider is what forms a (custom) resource type.

Resources

Next I'll show you some resources about puppet provider development, that I found useful:

Official documentation:

Actually, types and providers are quite well documented in the official documentation, although it does not go too deep into the details:


Blog posts:
A hands-on tutorial in multiple parts, with good explanations, can be found in the blog posts by Gary Larizza:

Books:
Probably the most complete information, including explanations of the puppet resource model and its resource abstraction layer (RAL), can be found in the book Puppet Types and Providers by Dan Bode and Nan Liu.

The puppet source:
Last but not least, it's always worth a peek at how others did it. The puppet source contains all providers of the official puppet release, as well as the base libraries for puppet types and providers with their api documentation: https://github.com/puppetlabs/puppet/

Categories: FLOSS Project Planets

Yasoob Khalid: Nifty Python tricks

Planet Python - Sun, 2015-04-19 06:31

Hi there folks. It’s been a long time since I last published a post. I have been busy. However, in this post I am going to share some really informative tips and tricks which you might not have known about. So without wasting any time, let’s get straight to them:

Enumerate

Instead of doing:

i = 0
for item in iterable:
    print i, item
    i += 1

We can do:

for i, item in enumerate(iterable):
    print i, item

Enumerate can also take a second argument. Here is an example:

>>> list(enumerate('abc'))
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> list(enumerate('abc', 1))
[(1, 'a'), (2, 'b'), (3, 'c')]

Dict/Set comprehensions

You might know about list comprehensions but you might not be aware of dict/set comprehensions. They are simple to use and just as effective. Here is an example:

my_dict = {i: i * i for i in xrange(100)}
my_set = {i * 15 for i in xrange(100)}
# There is only a difference of ':' in both

Forcing float division:

If we divide whole numbers, Python 2 gives us a truncated whole-number result even when the true result is fractional. In order to circumvent this issue we have to do something like this:

result = 1.0/2

But there is another way to solve this problem which even I wasn’t aware of. You can do:

from __future__ import division
result = 1/2
# print(result)
# 0.5

Voila! Now you don’t need to append .0 in order to get an accurate answer. Do note that this trick is for Python 2 only. In Python 3 there is no need to do the import as it handles this case by default.

Simple Server

Do you want to quickly and easily share files from a directory? You can simply do:

# Python 2
python -m SimpleHTTPServer
# Python 3
python3 -m http.server

This would start up a server.

Evaluating Python expressions

We all know about eval but do we all know about literal_eval? Perhaps not. You can do:

import ast
my_list = ast.literal_eval(expr)

Instead of:

expr = "[1, 2, 3]"
my_list = eval(expr)

I am sure that it’s something new for most of us but it has been a part of Python for a long time.

Profiling a script

You can easily profile a script by running it like this:

python -m cProfile my_script.py

Object introspection

You can inspect objects in Python by using dir(). Here is a simple example:

>>> foo = [1, 2, 3, 4]
>>> dir(foo)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__', ... ,
 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

Debugging scripts

You can easily set breakpoints in your script using the pdb module. Here is an example:

import pdb
pdb.set_trace()

You can write pdb.set_trace() anywhere in your script and it will set a breakpoint there. Super convenient. You should also read more about pdb as it has a couple of other hidden gems as well.

Simplify if constructs 

If you have to check for several values you can easily do:

if n in [1,4,5,6]:

instead of:

if n==1 or n==4 or n==5 or n==6:

Reversing a list/string

You can quickly reverse a list by using:

>>> a = [1,2,3,4]
>>> a[::-1]
[4, 3, 2, 1]
# This creates a new reversed list.
# If you want to reverse a list in place you can do:
a.reverse()

and the same can be applied to a string as well:

>>> foo = "yasoob"
>>> foo[::-1]
'boosay'

Pretty print

You can print dicts and lists in a beautiful way by doing:

from pprint import pprint
pprint(my_dict)

This is more effective on dicts.

That’s all for today! I hope you enjoyed this article and picked up a trick or two along the way. See you in the next article. Make sure that you follow us on Facebook and Twitter!

Do you have any comments or suggestions? You can write a comment or email me on yasoob.khld (at) gmail.com


Categories: FLOSS Project Planets

Wouter Verhelst: Youn Sun Nah 5tet: Light For The People

Planet Debian - Sun, 2015-04-19 04:25

About a decade ago, I played in the (now defunct) "Jozef Pauly ensemble", a flute choir connected to the musical academy where I was taught to play the flute. At the time, this ensemble had the habit of going on summer trips every year; sometimes these trips were large international concert tours (like our 2001 trip to Australia), but that wasn't always the case; there have also been smaller trips, like the 2002 one to the French Ardennes.

While there, we went on a day trip to the city of Reims. As a city close to the front in the first world war, it has a museum dedicated to that subject that I remembered going to. But the fondest memory of that day was going to a park where a podium was set up, with a few stacks of fold-up chairs standing nearby. I took one and listened to the music.

That was the day when I realized that I kind of like jazz. I had come into contact with Jazz before, but it had always been something to be used as a kind of musical wallpaper; something you put on, but don't consciously listen to. Watching this woman sing, however, was a different kind of experience altogether. I'm still very fond of her rendition of "Besame Mucho".

After having listened to the concert for about two hours, they called it quits, but did tell us that there was a record which you could buy. Of course, after having enjoyed the afternoon so much, I couldn't imagine not buying it, so that happened.

Fast forward several years, in the move from my apartment above my then-office to my current apartment (just around the corner), the record got put into the wrong box, and when I unpacked things again it got lost; permanently, I thought. Since I also hadn't digitized it yet at the time, I haven't listened to it anymore in quite a while.

But that time came to an end today. The record which I thought I'd lost wasn't, it was just in a weird place, and while cleaning yesterday, I found it sitting among a bunch of old stuff that I was going to throw out. Putting on the record today made me realize again how good it really is, and I thought that I might want to see if she was still active, and if she might perhaps have made another album.

It was great to find out that not only had she made six more albums since the one I bought, she'd also become a lot more known in the Jazz world (which I must admit I don't really follow all that well), and won a number of awards.

At the time, Youn Sun Nah was just a (fairly) recent graduate from a particular Jazz school in Paris. Today, she appears to be so much more...

Categories: FLOSS Project Planets

Evolving KDE: Lehman’s Laws of Software Evolution In The Community

Planet KDE - Sat, 2015-04-18 23:51

The board of KDE eV has launched a new initiative to ensure that KDE remains awesome and relevant for the foreseeable future. Unlike previous approaches it is not a point-in-time solution, it is a continuous process of improvement. And it is a good thing. Previously, I have written/spoken a lot about the role of Brooks’ Law in the context of… Read more →

Categories: FLOSS Project Planets

LibreOffice 4.4.2 “Fresh” is available for download

LinuxPlanet - Sat, 2015-04-18 23:20
The Document Foundation announces LibreOffice 4.4.2, the second minor release of the LibreOffice 4.4 “fresh” family, with over 50 fixes over LibreOffice 4.4.0 and 4.4.1. New features introduced by the LibreOffice 4.4 family are listed on this web page: https://wiki.documentfoundation.org/ReleaseNotes/4.4. The Document Foundation suggests deploying LibreOffice in enterprises and large organizations when backed by […]
Categories: FLOSS Project Planets

Vasudev Ram: asciiflow.com: Draw flowcharts online, in ASCII

Planet Python - Sat, 2015-04-18 22:04
By Vasudev Ram



Saw this today: asciiflow.com

asciiflow.com is a site that allows you to draw flowcharts online, on their site, using the metaphor of a drag-and-drop paint program like MS Paint, but the flowcharts are drawn entirely using ASCII characters.

I tried it out a bit. Innovative.

One point is that to save the flowchart, it requires access to your Google Drive account.

The image at the top of this page, is of a flowchart that I created with asciiflow.com. I did not use the Save feature, but instead took a screenshot and saved it as a PNG file (using MS Paint, ha ha). The flowchart shows a diagram that illustrates the concept of a UNIX command pipeline, where the standard output of a preceding program becomes the standard input of a succeeding one (in the pipeline). (How's that for using web-based and Windows software to illustrate something about UNIX? :)

For another example of the innovative use of ASCII characters, check out this post I wrote somewhat recently, about the Python library called PrettyTable, which lets you generate visually appealing tables of data, bordered and boxed by ASCII characters:

PrettyTable to PDF is pretty easy with xtopdf

Also, since we're talking about standard input and output and UNIX pipelines, these two posts may be of interest:

1) [xtopdf] PDFWriter can create PDF from standard input

(The post at the above link also has an example of eating your own dog food.)

2) Print selected text pages to PDF with Python, selpg and xtopdf on Linux

Generalizing from a fragment of code in post 1) above, I'll also note that making a Python program usable as a component of a UNIX pipeline can, in some cases, be as simple as having something like this in your code:
import sys
# ...
for lin in sys.stdin:
    lin = process(lin)
    sys.stdout.write(lin)
which could be shortened to:
for lin in sys.stdin:
    sys.stdout.write(process(lin))
Due to this (being able to easily make a Python program into a component of a UNIX pipeline), you can do things like this (and more):

$ foo | bar | baz

where foo may be a built-in UNIX command (a filter) or a shell script, bar may be (for example) a Perl program that leverages some powerful Perl features, and baz may be a Python program that leverages some powerful Python features, thereby leveraging the UNIX philosophy concept of writing small programs, each of which do one thing well, or in this case, leveraging the features of different languages (each of which may do some things better than others), to write individual components in those respective languages. The possibilities are limitless ...

- Enjoy.

- Vasudev Ram - Online Python and Linux training;
freelance Python programming

Dancing Bison Enterprises

Signup to hear about new software products that I create.

Posts about Python  Posts about xtopdf

Contact Page


Vasudev Ram
Categories: FLOSS Project Planets

Laura Arjona: Six months selfhosting: my userop experiences

Planet Debian - Sat, 2015-04-18 19:06

Note: In this post I mention some problems and ask questions (to myself, like “thinking aloud”). The goal is not to get answers to those questions (I suppose that I will find them soon or later in the internet, manuals and so), but to show the kind of problems and questions that arise in my selfhosting adventures, which I suppose are common to other people trying to administer a home server with some web services.

Am I a userop? Well, I’m something in the middle between a (GNU/Linux) user and a sysadmin: I have studied computer technical engineering, but most of my experience has been in helpdesk, providing support for Windows users. I’m running Debian in some LAMP boxes at work (without GUI) since 2008 or so, and in my desktops (with GUI) since 2010. I don’t code or package, but I don’t mind trying to read code and understand it (or not). I know a bit of C, a bit of Python, of PHP, and enough Perl to open a Perl file and close it after two minutes, understanding that it’s great, but too much for me :) I translate software, so I’m not scared to clone a repository, edit files, commit or submit a patch. I’m not scared of compiling a program (except if it’s an Android app: I try to avoid setting up the development environment just to try some translation that I made… but I built my Puma before the binary was available for download or in F-Droid).

In conclusion, I feel more like a “GNU/Linux power user” than a “sysadmin”. Sometimes just a “user” or even a “newbie” (for example, I don’t know very well the Unix/Linux folder tree… where are the wallpapers stored? Does it depend on the desktop that I use?).

Anyway. I won’t stop my free software + free networks digital life because I don’t know many things. I bought a small server for home last September, and I wanted to try to selfhost some services, for me and for my family. I want to be a “home sysadmin” or something like that, so I joined the “userops” mailing list :)

Here you have my experiences on selfhosting/being an userop until now.

Mail

I didn’t even try to set up my own mail server, because many people say it’s a pain (although nice articles have been published about how to do it, for example this series in ArsTechnica), and I would need a static IP, which is 14€/month more to my ISP. Gandi, the place where I rented my domain name, provides mail, and they use Debian and Roundcube, and sponsor Debian too, so I decided to trust them.

So this is my strategy now, to try to keep mail under my control:

  • Trust my domain provider.
  • Backup my mail and keep local copies, removing sensitive stuff from the server.
  • Use and spread the word about GPG encryption.
  • Try not to send photos or videos by mail, just send the link to my MediaGoblin instance (see below).
MediaGoblin

I’ve set up two MediaGoblin instances (yes, two!). I managed to do it in Debian 7 stable (I think NodeJS’ npm was not needed then), but soon after I upgraded to Jessie, so now it’s even better.

I installed Nginx and PostgreSQL via apt, to use them for both instances (and probably some more software later).

One instance is public; I use a Debian user and a PostgreSQL database for it, and it’s running at http://media.larjona.net
I have requested an SSL cert from Gandi but I still haven’t deployed it (lazy LArjona!!).

The other instance is private, for family photos. I didn’t know very well how much of my existing setup I could reuse, and how to keep both instances safe in case of downtime or attack… I know more or less the concept of “chroot” but I don’t know how to deploy it on my machine. So I decided to use another Debian user, another PostgreSQL database, deploy MediaGoblin in a different folder, and create another virtual server in my Nginx to serve it. I managed to set up that virtual server to http-authenticate and to serve content via a different port, and use a self-signed SSL certificate (it’s only for family, so it does not matter). I created another (unprivileged) Debian user with a password for the nginx authentication, and gave my family the URL in the form https://mediaprivate.larjona.net:PortNumber and the user and password (mediaprivate is a string, and PortNumber is a number). I think they don’t use the instance too much, but at least I upload photos there from time to time and email the link instead of emailing the photos themselves (they don’t use GPG either…).

Upgrades

I upgraded MediaGoblin from 0.7.1 to 0.8.0 successfully, and I sent a report about how I did it to the mailing list. First I upgraded the public instance; once I had figured out the process, I upgraded the second instance to test my instructions, and then I sent the report with the instructions to the mailing list.

Static site and LimeSurvey: the power of free software (with instructions)

I wanted to act as a mirror of floss2013.libresoft.es and surveys.libresoft.es since they suffered downtime and I participated in that project (not in the sysadmin part, but in the research and content creation).

The static site floss2013.libresoft.es offered a zip with the whole website tree (since the website was licensed as AGPL), and I had access to the git repo holding the development copy of the website. So I just cloned the repository, set up another nginx virtual server on my machine, and tuned my DNS zone in the Gandi website to serve floss2013.larjona.net from home. 10 minutes setup YAY! #inGitWeTrust #FreeSoftwareFTW :)

For surveys.larjona.net I had to install a LimeSurvey instance. I knew how to do it because we use LimeSurvey at work, but at home I had Nginx instead of Apache, and PostgreSQL instead of MySQL. And no PHP… I searched for how to install PHP with Nginx (I can use apt-get, nice!) and how to install LimeSurvey with Nginx and PostgreSQL (I had documentation about that, so I followed it, and it worked).

To make the data available (one survey and its results, so people can log in as visitors to query and get statistics), I downloaded the LimeSurvey export dataset that we were providing in the static website, followed the replication instructions (hey, I wrote them!), and they worked #oleole! (And here, dear researchers, it is demonstrated that free software and free culture really empower your research and help spread your results.)

Etherpad: not so easy, it seems!

I’m trying to install Etherpad-Lite, but I’m suffering a bit. I think I did everything OK according to some guides, but I get "Bad Gateway" and this kind of error when trying to browse with Lynx on the host:

[error] 3615#0: *24 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 127.0.0.1, server: pad.larjona.net, request: "GET / HTTP/1.0", upstream: "http://127.0.0.1:9001/", host: "pad.larjona.net"
2015/04/17 20:52:56 [error] 3615#0: *24 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: pad.larjona.net, request: "GET / HTTP/1.0", upstream: "http://[::1]:9001/", host: "pad.larjona.net"

I’m not sure if I need to open some port in iptables or my router, or change my nginx configuration because the guides assume you’re only serving one website on port 80 (and I have several of them now…), or what… I’ve spent three chunks of time (maybe ~2h each?) on this, on different days, and couldn’t figure it out, so I decided to round-robin it in my TODO list.

Userops thoughts

Debian brings peace of mind (for me)

On one hand, maintaining a Debian box is quite easy, and the more software that gets packaged, the less time I spend installing or upgrading. I like being in stable; I’m in Jessie now (I migrated when it was frozen), but I’ll stay in stable as much as I can.

I like that I can use the software that I installed via apt-get for several services (nginx, PostgreSQL…). About the software that is not packaged (MediaGoblin, LimeSurvey, Etherpad, maybe others later), I wonder how dependencies and updates are handled. And maybe (probably) I have installed some components several times, one for each service (this sounds like a Windows box #grr).

For example MediaGoblin uses PyPump. PyPump 0.5 is packaged in Debian Jessie. MediaGoblin uses PyPump 0.7+. What if PyPump 0.7+ gets, let’s say, into Jessie-backports? Can I benefit from that?

I know that the MediaGoblin upgrade instructions include upgrading the dependencies, but what about a security patch in one dependency? Should I upgrade the pip modules periodically? How do I know if some upgrade is recommended because it patches a vulnerability, or whether it just brings new features (and maybe breaks my setup)?

These kinds of things are the “peace of mind” that Debian packaging brings me: when some piece of software is packaged, I know I may need to care about proper setup and configuration, but afterwards it’s kind of easy to maintain (since the Debian maintainers care about the rest). I don’t mind cloning a repo and compiling; I mind about what comes later, like coexistence with other programs/services. I trust the MediaGoblin community and I’m an active member (I’m not a developer, but I hang out on IRC, follow the mailing list, etc.), but for example I don’t know anything about the EtherPad project. And I don’t feel like joining the community (I’m already an active member in Debian, MediaGoblin, F-Droid, Pump.io, translator of LimeSurvey and many other small apps that I use, and in the future will use more services, like OwnCloud, XMPP…); joining the community of each piece of software that I use is becoming unsustainable :s

Free software is more than software

I follow the userops mailing list, and it’s becoming very technical. I mostly understand the problems (which are similar to the problems that I face: how to isolate different services, how to easily configure them, how to make them installable by an average user…), but I don’t understand most of the solutions proposed. I think we probably need technical solutions, but in the meanwhile some issues can be addressed not with software, but with other means: good documentation, community support, translations, beta-testers…

This is my conclusion until now. When a project is well documented, I think I can find my way to selfhost no matter if the software is packaged (or “contained”) or not. MediaGoblin, and LimeSurvey, are well documented, and the user support channels are very responsive.

I find lots of instructions that assume you will use a whole machine for their service (and not for other things), and lots of documentation for the LAMP stack, but not for Nginx + PostgreSQL, with Node instead of PHP… So, for each “particularity” of my setup, I search the internet and try to pick good sources to help me do what I want to do.

I’m kind of privileged

Some elements, not software related, to take into account as “prerequisites for success” when selfhosting services:

  • I knew what to search.
  • I knew which sites to visit from the results (arch wiki, debian wiki, stack overflow, etc: some of them were not the Top1 in the results).
  • I had time to read several sources and make my mind about what to do and how.
  • I can read, understand, and write in English.
  • I have no fear about my broken English.
  • I have no impostor syndrome.
  • I felt welcome in the FLOSS communities where I hanged out.

These aspects are not present in a lot of people. If I look around at the “computer users” that I know (mostly Windows+Android, some GNU/Linux users, some Mac OSX users, some iOS users), I find that they search for things like “X does not work” or they cannot write a proper search query in English. Or they trust some random person writing a recipe in their blog, without first trying to understand what the recipe does. Other people just say “I’m not a professional sysadmin, I’ll just do what «everybody» does (aka use Google services or whatever). What if I try and I don’t succeed?”. Things like that.

We may need some technical solutions (and hackers are thinking about that, and working on that). But I feel that we need (even more) a huge group of beta-testers, dogfooding people, adventurers that try the half-cooked solutions and provide successful and unsuccessful experiences, to guide the research and make software technologies advance. I’m not sure if I am a userop, but I feel part of that “vanguard force”; I want to be part of the future of free software and free networks.

Comments?

You can comment about this post on this pump.io thread.


Filed under: My experiences and opinion Tagged: Communities, Contributing to libre software, Debian, Developer motivations, English, free networks, Free Software, Freedom, innovation, MediaGoblin, Moving into free software, Project Management, selfhosting, sysadmin
Categories: FLOSS Project Planets

Catalin George Festila: Upgrading all packages with pip using python script.

Planet Python - Sat, 2015-04-18 19:00
This can be done under Windows with the following Python script:
C:\Python27>python.exe
Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pip
>>> from subprocess import call
>>>
>>> for dist in pip.get_installed_distributions():
...     call("pip install --upgrade " + dist.project_name, shell=True)
...
You are using pip version 6.0.8, however version 6.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already up-to-date: appdirs in c:\python34\lib\site-packages
0
You are using pip version 6.0.8, however version 6.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting beautifulsoup4
Using cached beautifulsoup4-4.3.2.tar.gz
...

If you use a Linux shell, you can use this one-liner instead:
pip freeze --local | grep -v '^\-e' | cut -d = -f 1 | xargs -n1 pip install -U
Categories: FLOSS Project Planets

ABlog for Sphinx: Watch Yourself Blogging

Planet Python - Sat, 2015-04-18 19:00

Wouldn’t you like your blog to be rebuilt and served to you automatically as you blog on a sunny Sunday afternoon? It’s now possible with the improved ablog serve command.

First, you need to install the Watchdog Python package, e.g. pip install watchdog. Then, you need to run ablog serve -r. Regardless of the weather being sunny or the day of the week, your project will be rebuilt when you change a page or add a new one. This won’t refresh your browser page though; unless you want to hit refresh once in a while, you can easily find an auto-refresher extension for your browser.
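Under the hood this relies on Watchdog's observer API. The following is only a rough sketch of the general idea (watch a directory tree and react to changes), not ABlog's actual implementation; the handler simply prints the changed path where ABlog would trigger a Sphinx rebuild.

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class RebuildHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        # ABlog would rebuild the Sphinx project here; we just report the change.
        print("changed:", event.src_path)

observer = Observer()
observer.schedule(RebuildHandler(), path=".", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()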

Categories: FLOSS Project Planets