MediaWiki now accepts RDFa and Microdata semantic markup out of the box

Semantic web

Since MediaWiki 1.16, the software has supported — as an option — RDFa and Microdata HTML semantic attributes.

This commit, which will ship with the next MediaWiki release, 1.27, embraces the semantic Web further by making these attributes always available.

If you wish to use it today, the change is already available in our Git repository.

It also slightly reduces the cyclomatic complexity of our parser sanitizer code.

Microdata support will thus be available on Wikipedia on Thursday, 24 March 2016 and on other projects on Wednesday, 23 March 2016.

If you already use RDFa today on MediaWiki

First, we would be happy to get feedback, as we’re currently considering an update to RDFa 1.1 and we would like to know who is still in favour of keeping RDFa 1.0.

Secondly, there is a small configuration step: view the HTML source of a page on your wiki and look at the <html> tag.

Copy the content of the version attribute: you should see something like <html version="HTML+RDFa 1.0">.

Now, edit InitialiseSettings.php (or your wiki farm configuration) and set the $wgHtml5Version setting. For the example above, this would be:
$wgHtml5Version = "HTML+RDFa 1.0";
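
If you prefer not to read the page source by hand, here is a minimal sketch of a helper that pulls the version attribute out of the <html> tag. The function name html_version is mine, and it assumes the attribute is written with double quotes, as in the example above.

```shell
#!/usr/bin/env bash

# html_version: reads HTML on standard input and prints the content
# of the version attribute of the <html> tag, if there is one.
# Assumption: the attribute value is enclosed in double quotes.
html_version() {
    grep -o '<html[^>]*>' \
        | sed -n 's/.*[[:space:]]version="\([^"]*\)".*/\1/p' \
        | head -n1
}
```

You could then feed it a page of your wiki, for example with curl -s https://wiki.example.org/ | html_version (the URL is a placeholder). An empty output means the tag has no version attribute.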

For Microdata, there is nothing special to do.

 

Let’s Encrypt lifts the per-domain quota for renewals

There is currently a limit on how many certificates you can register: a quota of 5 per domain per week.

The same limit applied to renewals, which would have forced you to maintain a renewal schedule.

This is not the case anymore: if a certificate has already been generated for a specific FQDN, you can renew it regardless of your quota use. Thank you to Roland Bracewell Shoemaker for this change, which solves this issue.

Tasacora-announce

Follow-up: a BASH script to split a MySQL dump by database

In this post, we’ve seen how to split a large MySQL dump by database.

I’ve been asked for a script to automate the process. Here it is.

Note: On FreeBSD, replace AWK=awk with AWK=gawk and install the lang/gawk port, so we can use GNU awk.

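#!/usr/bin/env bash

# Lets the loop below run zero times when no chunk has been created
shopt -s nullglob

AWK=awk
REGEX_NAME="Current Database: \`(.*)\`"

# Checks argument and prints usage if needed
if [ "$#" -lt 1 ]
then
    echo "Usage: $0 <dump.sql>" >&2
    exit 1
fi

# Splits dump into temporary files, one per "Current Database" marker.
# Lines before the first marker (the dump header) go to the hidden file .tmpsql
"$AWK" '/Current Database: .*/{g++} { print $0 > (g ".tmpsql") }' "$1"

# Renames files or appends to existing one (to handle views)
for f in *.tmpsql
do
    DATABASE_LINE=$(head -n1 "$f")
    # Skips any chunk that doesn't start with a database marker
    [[ $DATABASE_LINE =~ $REGEX_NAME ]] || continue
    TARGET_FILE="${BASH_REMATCH[1]}.sql"
    if [ -f "$TARGET_FILE" ]; then
        cat "$f" >> "$TARGET_FILE"
        rm "$f"
    else
        mv "$f" "$TARGET_FILE"
    fi
done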
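
Before splitting, it can be handy to preview which files the script will produce. Here is a small sketch for that; the function name list_databases is mine, and it relies on the same "Current Database" marker as the script above.

```shell
#!/usr/bin/env bash

# list_databases <dump.sql>
# Prints the name of each database found in the dump, one per line.
# A database can appear several times in a dump (tables first, views
# later), so duplicates are removed with sort -u.
list_databases() {
    sed -n 's/.*Current Database: `\(.*\)`.*/\1/p' "$1" | sort -u
}
```

Each name printed by list_databases yourdump.sql corresponds to one <name>.sql file after the split.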

Split a large SQL dump by database

You created a MySQL backup of a large server installation with dozens of databases and wish to get the schema and data for one of them. You now have to deal with a file of hundreds of MB in a text editor. How convenient.

Split a dump into several files

You can quickly split this dump into several files (one per database) with awk or csplit. With GNU awk (gawk on FreeBSD), this is a one-liner:

awk '/Current Database: .*/{g++} { print $0 > (g ".sql") }' yourdump.sql

Get database.sql files

To rename these files with actual database names, the following bash script could be useful. It assumes you don’t have the main dump in the same directory.

#!/usr/bin/env bash

regex="Current Database: \`(.*)\`"

for f in *.sql
do
    DATABASE_LINE=$(head -n1 "$f")
    # Skips files that don't start with a database marker
    [[ $DATABASE_LINE =~ $regex ]] || continue
    mv "$f" "${BASH_REMATCH[1]}.sql"
done
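
If you only need one database, you can also skip the intermediate files. The following is a sketch under the same assumption about the "Current Database" marker; the function name extract_database is mine.

```shell
#!/usr/bin/env bash

# extract_database <dump.sql> <database>
# Prints only the sections of the dump that belong to <database>.
extract_database() {
    awk -v db="$2" '
        # Each marker line switches printing on or off for what follows.
        # The backquotes around the name avoid matching wiki2 for wiki.
        /Current Database: / { keep = (index($0, "`" db "`") > 0) }
        keep { print }
    ' "$1"
}
```

For example, extract_database yourdump.sql wiki > wiki.sql, where wiki is a placeholder name. The dump header (everything before the first marker) is not included in the output.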

Chromebook: run an SSH server on Chrome OS

In this post, we’ll cover how to run an SSH server directly on Chrome OS (i.e. not in a Crouton chroot).

One of the first things I do on any machine (FreeBSD, Linux, Mac OS X or Windows) is to install, run and configure the SSH server. It’s always convenient to be able to scp from and to a computer, or to log in remotely. Even for workstations.

Chrome OS is a reasonable, if minimal, standard Linux installation offering access to iptables and sshd (and openvpn, by the way), so all it takes is to run sshd and allow incoming traffic on port 22.

Setup

1. If it’s not already done, switch your chromebook into developer mode, so you can execute commands as root.

Do a backup of your data, as you’ll wipe your current Chrome OS partitions.

On most recent machines, restart in recovery mode (ESC + REFRESH + POWER), then when it boots, press CTRL + D to enter developer mode.

Hit enter to turn off OS verification. It will then restart. Now and every time after, you’ll need to press CTRL + D to boot.

It will then wipe your chromebook and reinstall a fresh Chrome OS version. The process takes 6 to 7 minutes.

Older machines require the use of a hardware switch, generally located below the battery. Be gentle with this switch, it breaks easily.

2. Launch a console with the shortcut ctrl + alt + t, then type shell to open a full bash shell (if the shell command isn’t available, you aren’t in developer mode).

Become root with sudo su.

3. Set up SSH keys:

mkdir -m 0711 /mnt/stateful_partition/etc/ssh
cd /mnt/stateful_partition/etc/ssh
ssh-keygen -t rsa -f ssh_host_rsa_key
ssh-keygen -t dsa -f ssh_host_dsa_key

4. Run SSH:

/usr/sbin/sshd

5. Allow the world to connect to port 22:

iptables -I INPUT -p tcp --dport 22 -j ACCEPT

6. Add your public keys to the ~chronos/.ssh/authorized_keys file. Authentication by password isn’t available.

7. You’re now able to log in from the world to your chromebook.

ssh chronos@yourmachine
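
For step 6, here is a sketch of a small helper to install a public key. The function name add_authorized_key is mine. It creates the .ssh directory if needed, sets the strict permissions sshd usually expects, and avoids adding the same key twice.

```shell
#!/usr/bin/env bash

# add_authorized_key <public_key_file> <home_directory>
# Appends a public key to <home_directory>/.ssh/authorized_keys,
# unless this exact key line is already present.
add_authorized_key() {
    local key_file=$1
    local ssh_dir="$2/.ssh"
    local auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$(cat "$key_file")" "$auth_file"; then
        cat "$key_file" >> "$auth_file"
    fi
}
```

On the chromebook, this would be called as add_authorized_key id_rsa.pub ~chronos, where id_rsa.pub is a placeholder for your own public key file. If you run it as root, you may also want to give the files back to the chronos user with chown.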

Sources

Andrew Sutherland, cr-48 chromium os ssh server, 14 January 2011.

CentOS wiki contributors, IPTables, CentOS wiki.

December 2014 links

Some links to stuff I appreciated this month. Links to French content are in a separate post. You can also take the time machine to November 2014.

AI

What if, instead of trying to understand how the brain works, we copied the neural connections as is? This is what the OpenWorm project tries to do with C. elegans. And, big surprise, it works and allows a bot to move.

Wikipedia

An infographic of the location of Wikipedia participants shows, unsurprisingly, that they are mainly from Europe and North America.

If you’re into dumps, the Wikipedia / MediaWiki XML dump grepper will help you to find a particular piece of data, like the text of one article.

Tools

Dev / search. The silver searcher, ag, offers a faster approach than ack to search your code.

Fun / autogenerator. Some years ago, cgMusic offered an implementation of how a computer program could create music. Add some image generation techniques and a word generator, and you get a fake music generator offering full albums. Ælfgar has stumbled upon Liquified Death by Income Yield.

GIS. Turf is a new open source JavaScript GIS library. This post explains the capabilities and features, including its great offline support.

Electronics

What if an Arduino embedded a web server and could be programmed from the web browser? This is exactly what the Photon by Spark does.

Quartz

An infographic showing satellites orbiting Earth and a point of view on the Uber economy.

Literature

The GoT series offers some extensive scenes of torture. Have you asked yourself what they bring to the plot, or whether it needs them? Marie Brennan offers a great opinion in « Welcome to the Desert of the Real ».

November 2014 links

Some links to stuff I appreciated this month. Links to French content are in a separate post. You can also take the time machine to October 2014.

November is the month of the Philae landing on comet Churyumov-Gerasimenko and of the ESA photo release under CC-BY-SA (one of them here). Mainly DevOps links in this post, plus a Wikidata tool and an algorithm visualisation.

Churyumov-Gerasimenko 67P, 20 November 2014
ESA/Rosetta/NAVCAM, CC-BY-SA 3.0 IGO

Dev

Craft. Jeroen de Dauw has prepared interesting slides about clean functions. Your function should do one task, not be a class disguised as procedural code.

Raft. In a distributed environment, how do you get every node to agree on the same state? Raft is an answer to this question, as a distributed consensus algorithm. To understand how it works, The Secret Lives of Data offers a visual guide.

Wikidata

Wikidata no labels. Harmonia Amanda and Hsarrazin wanted to find items without labels in French, about Tolkien’s Legendarium and Russian people respectively, in order to translate them. This tool lets you get some Wikidata items through a WDQ query, or enter them directly, and prints a table of the items that have no label in the specified language.

DevOps

Once upon a time there was a Linux theme park. As a Cobbler / SpaceWalk alternative, new software is starting to appear: katello/foreman. It’s a part of Katello, the upstream of Satellite 6, and a replacement for SpaceWalk. Do you want to dive into the Linux theme park? Build images, deploy, manage resources? You’ll be served. Thank you to jnix for this software recommendation.

And now, near the sea. ShipYard allows you to manage Docker instances and containers.

But what is more interesting is the alpha release of OpenShift Origin, the third generation of OpenShift, with a new system design. It relies on Docker and the following technologies:

  • Kubernetes, an active controller to orchestrate and ensure the desired state of the containers;
  • An etcd server (which uses the Raft algorithm described above).

With these concepts, you’re ready for the introductory hands-on tutorial.

The puppetmaster is getting old. Ryan Lane, formerly of the Wikimedia ops team, blogged this summer about a Puppet alternative at his new job: Moving away from Puppet: SaltStack or Ansible? For Ryan, 10K+ lines of Puppet code are now only 1K lines of SaltStack or Ansible code. The winner of their test, porting the Puppet infrastructure to both, is SaltStack. It’s a pity, I would have loved to merge yet another fictional universe into the Nasqueron project and add Ursula K. Le Guin’s ansible to the mix.

Sysadmin

FreeBSD 10.1. The first new version of FreeBSD after the SSL bugs is out, and will immediately be deployed on the Ysul and Sirius machines as a test. Bhyve can use a pure ZFS filesystem and the UDP-Lite protocol is finally here.

TCL and the SSL security issues: sslv3 alert handshake failure

Update 2016-01-15: With tcl-tls 1.6.7, it works out of the box without any need to configure cyphers.

If you have reconfigured your OpenSSL to take care of the current security issues, you’ve disabled SSLv3 since the POODLE disclosure.

You could then see unexpected behavior from TCL code. The http package isn’t the best at intercepting and reporting errors, so the message could be as nondescript as software caused connection abort. If you’re lucky, you’ll get the actual cause of the error: sslv3 alert handshake failure.

So, unsurprisingly: we disabled SSLv3, the code still wants to use SSLv3, and… that fails:

/home/dereckson ] tclsh8.6
% package require http
2.8.8
% package require tls
1.6
% http::register https 443 ::tls::socket
443 ::tls::socket
% http::geturl https://fr.wikipedia.org/
SSL channel "sock801eacd10": error: sslv3 alert handshake failure
error reading "sock801eacd10": software caused connection abort

The solution is to explicitly request to use TLS.

/home/dereckson ] tclsh8.6
% package require http
2.8.8
% package require tls
1.6
% tls::init -tls1 true -ssl2 false -ssl3 false
-tls1 true -ssl2 false -ssl3 false
% http::register https 443 ::tls::socket
443 ::tls::socket
% http::geturl https://fr.wikipedia.org/
::http::1
% http::cleanup ::http::1
%

In your TCL application, registering https once and for all as a preconfigured TLS socket sounds like a good idea:

#
# HTTP support
#

package require http
package require tls
::tls::init -ssl2 false -ssl3 false -tls1 true
::http::register https 443 ::tls::socket

Thank you to rkeene from Freenode #tcl for his help in tracking down this issue.