Older blog entries for mako (starting at number 364)

30 Dec 2014 (updated 10 Feb 2015 at 19:04 UTC) »

Consider the Redirect

In wikis, redirects are special pages that silently take readers from the page they are visiting to another page. Although their presence is noted in tiny gray text (see the image below), most people use them all the time and never know they exist. Redirects make linking between pages easier, populate Wikipedia’s search autocomplete list, and are generally helpful in organizing information. In the English Wikipedia, redirects make up more than half of all article pages.

Over the years, I’ve spent some time contributing to Redirects for Discussion (RfD). I think of RfD as an ultra-low stakes version of Articles for Deletion, where Wikipedians decide whether to delete or keep articles. If a redirect is deleted, viewers are taken to a search results page and almost nobody notices. That said, because redirects are almost never viewed directly, almost nobody notices if a redirect is kept either!

I’ve told people that if they want to understand the soul of a Wikipedian, they should spend time participating in RfD. When you understand why arguing about and working hard to come to consensus solutions for how Wikipedia should handle individual redirects is an enjoyable way to spend your spare time — where any outcome is invisible — you understand what it means to be a Wikipedian.

That said, wiki researchers rarely take redirects into account. For years, I’ve suspected that accounting for redirects was important for Wikipedia research and that several classes of findings were noisy or misleading because most people haven’t done so. As a result, I worked with my colleague Aaron Shaw at Northwestern earlier this year to build a longitudinal dataset of redirects that can capture the dynamic nature of redirects. Our work was published as a short paper at OpenSym several months ago.

It turns out that taking redirects into account correctly (especially if you are looking at activity over time) is tricky because redirects are stored as normal pages by MediaWiki except that they happen to start with special redirect text. Like other pages, redirects can be updated and changed over time and frequently are. As a result, taking redirects into account for any study that looks at activity over time requires looking at the text of every revision of every page.
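To make the "look at every revision" step concrete, here is a minimal sketch in Python. It is not the code from our dataset. It assumes revisions arrive as hypothetical (timestamp, text) pairs in chronological order and only recognizes the English `#REDIRECT [[Target]]` form, not the localized keywords used on other wikis.

```python
import re

# MediaWiki marks a redirect by starting the page text with "#REDIRECT [[Target]]".
# The keyword is matched case-insensitively; localized aliases are NOT handled
# in this sketch (an assumption made to keep the example short).
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(revision_text):
    """Return the redirect target of one revision's text, or None if it is not a redirect."""
    match = REDIRECT_RE.match(revision_text)
    if match is None:
        return None
    return match.group(1).strip()


def redirect_spans(revisions):
    """Given (timestamp, text) pairs in chronological order, yield
    (timestamp, target) each time the page's redirect status changes."""
    previous = object()  # sentinel that never equals a real target
    for timestamp, text in revisions:
        target = redirect_target(text)
        if target != previous:
            yield timestamp, target
            previous = target
```

Each yielded pair marks the start of a period during which the page was either a normal article (target is None) or a redirect to the named page.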

Using our dataset, Aaron and I showed that the distribution of edits across pages in English Wikipedia (a relationship that is used in many research projects) looks pretty close to log normal when we remove redirects and very different when we don’t. After all, half of articles are really just redirects and, because they are just redirects, these “articles” are almost never edited.

Another puzzling finding that’s been reported in a few places — and that I repeated myself several times — is that edits and views are surprisingly uncorrelated. I’ll write more about this later but the short version is that we found that a big chunk of this can, in fact, be explained by considering redirects.

We’ve published our code and data and the article itself is online because we paid the ACM’s open access fee to ransom the article.

Syndicated 2014-12-30 03:05:38 (Updated 2015-02-10 18:32:35) from copyrighteous

27 Dec 2014 »

My Government Portrait

A friend recently commented on my rather unusual portrait on my (out of date) page on the Berkman website. Here’s the story.

I joined Berkman as a fellow with a fantastic class of fellows that included, among many other incredibly accomplished people, Vivek Kundra: first Chief Information Officer of the United States. At Berkman, all the fellows are asked for photos and Vivek apparently sent in his official government portrait.

You are probably familiar with the genre. In the US at least, official government portraits are mostly pictures of men in dark suits, light shirts, and red or blue ties with flags draped blurrily in the background.

Not unaware of the fact that Vivek sat right below me on the alphabetically sorted Berkman fellows page, a small group that included Paul Tagliamonte — very familiar with the genre from his work with government photos in Open States — decided to create a government portrait of me using the only flag we had on hand late one night.

The result — shown in the screenshot above and in the WayBack Machine — was almost entirely unnoticed (at least to my knowledge) but was hopefully appreciated by those who did see it.

Syndicated 2014-12-27 23:01:56 (Updated 2014-12-27 23:08:43) from copyrighteous

24 Dec 2014 »

Images of Japan

Going through some photos, I was able to revisit some of the more memorable moments of my trip to Japan earlier this year.

For example, the time I visited Genkai Quasi National Park, a beautiful spot in Fukuoka that had a strong resemblance to, but may not actually have been, a national park.

There was the time that I saw a “Saw a curry fault bread.”

And a shrine one could pray at in a barcalounger.

There was also the fact that we had record snowfall while in Tokyo, which left the city’s drainage system in a rather unhappy state.

Syndicated 2014-12-24 01:05:35 (Updated 2014-12-24 01:11:44) from copyrighteous

19 Oct 2014 (updated 10 Feb 2015 at 19:04 UTC) »

Another Round of Community Data Science Workshops in Seattle

Pictures from the CDSW sessions in Spring 2014

I am helping coordinate three and a half day-long workshops in November for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, free and open source software, Twitter, civic media, etc. This will be a new and improved version of the workshops run successfully earlier this year.

The workshops are for people with no previous programming experience and will be free of charge and open to anyone.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

Are new contributors to an article in Wikipedia sticking around longer or contributing more than people who joined last year?
Who are the most active or influential users of a particular Twitter hashtag?
Are people who participated in a Wikipedia outreach event staying involved? How do they compare to people that joined the project outside of the event?

If you are interested in participating, fill out our registration form here before October 30th. We were heavily oversubscribed last time so registering may help.

If you already know how to program in Python, it would be really awesome if you would volunteer as a mentor! Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required. If you’re interested, send me an email.

Syndicated 2014-10-19 01:19:52 (Updated 2015-02-10 18:33:09) from copyrighteous

28 Sep 2014 (updated 10 Feb 2015 at 19:04 UTC) »

Community Data Science Workshops Post-Mortem

Earlier this year, I helped plan and run the Community Data Science Workshops: a series of three (and a half) day-long workshops designed to help people learn basic programming and data science tools in order to ask and answer questions about online communities like Wikipedia and Twitter. You can read our initial announcement for more about the vision.

The workshops were organized by myself, Jonathan Morgan from the Wikimedia Foundation, long-time Software Carpentry teacher Tommy Guy, and a group of 15 volunteer “mentors” who taught project-based afternoon sessions and worked one-on-one with more than 50 participants. With overwhelming interest, we were ultimately constrained by the number of mentors who volunteered. Unfortunately, this meant that we had to turn away most of the people who applied. Although it was not emphasized in recruiting or used as a selection criterion, a majority of the participants were women.

The workshops were all free of charge and sponsored by the UW Department of Communication, who provided space, and the eScience Institute, who provided food.

The curriculum for all four sessions is online:

Friday April 4th: Setup and Programming Practice
Saturday April 5th: Introduction to Python
Saturday May 3rd: Building data sets using web APIs
Saturday May 31st: Data analysis and visualization

The workshops were designed for people with no previous programming experience. Although most of our participants were from the University of Washington, we had non-UW participants from as far away as Vancouver, BC.

Feedback we collected suggests that the sessions were a huge success, that participants learned enormously, and that the workshops filled a real need in the Seattle community. Between workshops, participants organized meet-ups to practice their programming skills.

Most excitingly, just as we based our curriculum for the first session on the Boston Python Workshop’s, others have been building off our curriculum. Elana Hashman, who was a mentor at the CDSW, is coordinating a set of Python Workshops for Beginners with a group at the University of Waterloo and with sponsorship from the Python Software Foundation using curriculum based on ours. I also know of two university classes that are tentatively being planned around the curriculum.

Because a growing number of groups have been contacting us about running their own events based on the CDSW — and because we are currently making plans to run another round of workshops in Seattle late this fall — I coordinated with a number of other mentors to go over participant feedback and to put together a long write-up of our reflections in the form of a post-mortem. Although our emphasis is on things we might do differently, we provide a broad range of information that might be useful to people running a CDSW (e.g., our budget). Please let me know if you are planning to run an event so we can coordinate going forward.

Syndicated 2014-09-28 05:02:19 (Updated 2015-02-10 18:31:50) from copyrighteous

19 May 2014 (updated 23 Aug 2014 at 22:05 UTC) »

Installing GNU/Linux on a 2014 Lenovo Thinkpad X1 Carbon

I recently bought a new Lenovo X1 Carbon. It is the new second-generation, type “20A7” laptop, based on Intel’s Haswell microarchitecture with the adaptive keyboard. It is the version released in 2014. I also ordered the Thinkpad OneLink Dock which I have returned for the OneLink Pro Dock which I have not yet received.

The system is still very new, challenging, and different, but seems to support GNU/Linux reasonably well if you are willing to run a bleeding edge version and/or patch your kernel and if you are not afraid to spend an afternoon or two tweaking things. What follows are my installation notes for Debian testing (jessie) when I installed it in early May 2014. My general impressions about the laptop as a GNU/Linux system — and overall — are at the end of this write-up.

System Description

The X1 Carbon I ordered included the 512GB SSD, the 14.0 inch WQHD (2560×1440) 260 nit touchscreen, and the maximum 8GB of memory. I believe the rest is not particularly negotiable but includes a 720p HD Camera, a 45.2Wh battery, and an Intel Dual Band Wireless 7260AC with Bluetooth 4.0.

For those who are curious, here is the output of lspci on the system:

00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 0b)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b)
00:14.0 USB controller: Intel Corporation Lynx Point-LP USB xHCI HC (rev 04)
00:16.0 Communication controller: Intel Corporation Lynx Point-LP HECI #0 (rev 04)
00:16.3 Serial controller: Intel Corporation Lynx Point-LP HECI KT (rev 04)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-LM (rev 04)
00:1b.0 Audio device: Intel Corporation Lynx Point-LP HD Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 6 (rev e4)
00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 3 (rev e4)
00:1d.0 USB controller: Intel Corporation Lynx Point-LP USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Lynx Point-LP LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation Lynx Point-LP SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation Lynx Point-LP SMBus Controller (rev 04)

BIOS/Firmware

The BIOS firmware is non-free and proprietary as is the case with all ThinkPads and nearly all laptops. According to this thread there is a bug in the default BIOS that means that suspend to RAM is broken in GNU/Linux.

You can get an updated BIOS at Lenovo’s ThinkPad X1 Carbon (Type 20A7, 20A8) Drivers and software page by looking in the “BIOS” section. Honestly, the easiest approach is probably to download the Windows BIOS Update utility (documentation is here) which you can use to run the BIOS update from within Windows before you install GNU/Linux.

If that’s not an option (e.g., if you’ve already installed GNU/Linux) the best method is to download the bootable CD ISO from the same page. Of course, since the X1 Carbon has no optical drive, you have to find another way to boot the CD image. I struggled to get the ISO to boot from USB using the usually reliable dd method. This message suggests that the issue had to do with the El Torito wrapper:

“I had to dump the eltorito image from the ISO they provide, after that I was able to dd the resulting image to a flash drive and the bios update went well, no cdrom needed.”

I updated to version 1.13 of the BIOS which fixes the suspend/resume bug. By the time you read this, there may be newer versions that fix other things so check the Lenovo website.

Installing Debian

I installed Debian testing using the March 19, 2014 “Alpha 1” release of the Debian Installer for Jessie (currently testing). I installed in graphical mode. With the WQHD screen, everything was extremely tiny but it worked flawlessly.

I downloaded the amd64 net install image from the normal place and installed the rest of the system using the built-in Ethernet port which required no firmware or extra drivers. I did the normal dd if=FILENAME.iso of=/dev/sdX method of getting the installer onto a USB stick to boot. I turned off restricted boot in BIOS first. In general, the latest version of the Debian installation guide is always a good source of guidance on installing Debian.

I used the Debian installer wizard to partition and selected “Use entire disk and partition it for LVM and encrypted data” which kept the UEFI partitions around. The system installed with no errors or issues and booted up normally afterward. The grub menu is hilariously narrow on the WQHD screen.

If you want to use the built-in wireless and/or Bluetooth, you will need to install the non-free iwlwifi firmware package. It is very lame that we still have to do this to use hardware we have purchased.

What Works and Doesn’t

The following stuff worked the first time I booted into the GNOME 3 desktop and logged in:

The WQHD 2560×1440 screen
The touchscreen
Both the TrackPoint and the touchpad
Built-in e1000e Ethernet using the dongle
The keyboard plus the “adaptive” row of F1-F12 keys.
External monitor using the full HDMI or mini-DisplayPort connectors
Audio (both speakers and microphone)
The camera/webcam

The following stuff works if you install non-free firmware:

Internal Wireless
Bluetooth 4.0

The following stuff works with qualifications:

Suspend to RAM — Works once you have updated the firmware.
The adaptive keyboard — The F1-F12 keys work but the “button” that theoretically lets you switch to different sets of function buttons (e.g., volume, brightness) does nothing.
Disabling the touchpad — There is a BIOS option to disable the touchpad. It works in Windows and does nothing at all in GNU/Linux.

I have not tried:

The fingerprint reader

OneLink Dock

I also ordered the OneLink Dock. It’s pretty cool and plugs into the side of the laptop with a wide plug that includes the standard power plug inside a big connector for everything else. Everything on the dock worked out of the box including the:

HDMI port
USB hub
USB audio device in the dock
USB Ethernet device in the dock

One thing to note. The HDMI output on the standard OneLink has a maximum resolution of 1920×1080 which is not stated on its material on Lenovo’s website. Moreover, if you plug in the OneLink cable, the laptop HDMI port is disabled. This means if you plug in the dock, you simply cannot drive an external monitor over HDMI at anything higher than 1920×1080. Instead, you will need to find a mini-DisplayPort to DisplayPort cord or adapter which will work. Of course, this will also mean that it’s two connectors instead of one.

Alternatively, you can buy the OneLink Pro dock which apparently works with higher resolutions over its DisplayPort connector. I have exchanged docks but have not received the Pro version so I cannot verify this.

Disabling the touchpad

As a long-term ThinkPad user, I love the TrackPoint pointing stick. If you plan on using this, the built-in touchpad is incredibly aggravating because it is very easy to brush against it while using the TrackPoint.

In BIOS, there is an option to disable the touchpad. Although this works in Windows, it does absolutely nothing in GNU/Linux. Part of the issue is that, unlike the older X1 Carbon and other ThinkPads, there are no TrackPoint buttons. Instead of buttons, there are regions at the top of the touchpad which are configured, in software, to act like buttons. If you want to be able to click, the touchpad can never be truly turned off.

This is not a problem unique to the Haswell X1 Carbon and a number of people have been struggling with this issue on other Lenovo laptops. Essentially, what you need to do is configure your touchpad so that the buttons are where you want them and so that it ignores any input for the purposes of cursor movement.

There are a few ways of doing this but this answer from an askubuntu.com question has the solution I ended up using:

Open file /etc/X11/xorg.conf.

Add a section “InputClass” with identifier “Default clickpad buttons”.

Set the SoftButtonAreas option to 70% 0 1 42% 36% 70% 1 42%; this defines the size of the right and middle buttons.

Enable the AreaBottomEdge option and set its value to 1; this will disable touchpad movement.

If everything is done right, your class should look like:

Section "InputClass"
     Identifier "Default clickpad buttons"
     MatchDriver "synaptics"
     Option "SoftButtonAreas" "70% 0 1 42% 36% 70% 1 42%"
     Option "AreaBottomEdge" "1"
EndSection

Essentially, the first Option line creates a right button that covers the rightmost 30% of the width (from 70% to the right edge) and a middle button that covers 34% of the width (from 36% to 70%), each spanning the top 42% of the height. The synaptics manpage (man synaptics) will give you more detail on the general way this works.
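If you want to experiment with other values, a small Python sketch can work out how wide each soft button ends up. It relies on the field order documented in the synaptics manpage (left, right, top, and bottom edge of the right button, then the same four for the middle button, with 0 meaning "extends to the edge of the pad"). The function name is made up for illustration and the sketch assumes the horizontal edges are given as percentages or 0.

```python
def soft_button_widths(option_value):
    """Return the width of the right and middle soft buttons, as a
    percentage of touchpad width, from a SoftButtonAreas option string.

    Assumes all horizontal edges are given as percentages or as 0.
    """
    fields = option_value.split()
    if len(fields) != 8:
        raise ValueError("SoftButtonAreas takes eight values")

    def horizontal_extent(left, right):
        # An edge of 0 means the button extends to the edge of the pad.
        left_pct = 0.0 if left == "0" else float(left.rstrip("%"))
        right_pct = 100.0 if right == "0" else float(right.rstrip("%"))
        return right_pct - left_pct

    # Field order: right button left/right/top/bottom, then middle button.
    return {
        "right": horizontal_extent(fields[0], fields[1]),
        "middle": horizontal_extent(fields[4], fields[5]),
    }


print(soft_button_widths("70% 0 1 42% 36% 70% 1 42%"))
# prints {'right': 30.0, 'middle': 34.0}
```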

Fixing the Adaptive Keyboard

The wildest feature of the laptop is the adaptive keyboard strip. The strip is a back-lit LCD that looks almost like an E Ink screen and acts as a touchscreen keyboard. The default mode gives you the F1-F12 keys. If you “press” the keys (since they aren’t buttons, you just put your finger on top of them) they act like normal F-keys. You can Ctrl-Alt-F1, etc., to switch to virtual terminals out of the box. There are four modes: “Function” (i.e., normal F-keys), Home, Web, and Chat. The last three overlap quite a bit (e.g., they all have brightness and volume). You can play with an example on the Lenovo homepage.

In Windows, switching programs will apparently change these “keys” so that an appropriate set of buttons is shown for the application you are using. You can also change these keys manually with a big “Fn” button at the far left of the adaptive keyboard strip.

As I write this, released kernels do not support the adaptive keyboard Fn button which means you cannot use anything other than the F-keys out of the box. I believe it also means that resuming from suspend to RAM breaks these keys.

That said, Shuduo Sang from Canonical has released several versions of a patch to the thinkpad_acpi kernel module which adds support for the Home mode. The other modes (Web and Chat) do not seem to be supported. The latest version of the patch is on the Linux Kernel Mailing List and the relevant commits are:

330947b save and restore adaptive keyboard mode for suspend and resume
3a9d20b support Thinkpad X1 Carbon 2nd generation's adaptive keyboard

Although this is not supported in Debian testing at the time of writing, a bug was filed in Debian and quickly fixed by Ben Hutchings in Debian kernel version 3.14.2-1 which is currently in sid/unstable. As a result, if you install the latest kernel version from Debian unstable (3.14.2-1 or later), the adaptive keyboard just works.

If you aren’t using Debian and the kernel you are using does not have support, you may need to patch your kernel.

Ethernet in the OneLink Dock

This was not an issue using the latest kernel in Debian but apparently people have struggled with getting the USB Ethernet device in the OneLink dock to work. For example, this bug suggests:

Many new Thinkpad laptops have a dock (Thinkpad OneLink Dock) containing a usb ethernet chip that is supported by the ax88179 driver. However its USB ID is not included in the driver shipped with the 3.13 kernel used in Trusty. A patch to add this ID has been sent to the LKML (see https://lkml.org/lkml/2014/2/24/649 ) and it would be very convenient for all users of the dock if it could be applied to the Trusty kernel.

If your kernel does not support the USB Ethernet device in the dock, and a newer kernel doesn’t fix it, the patch is straightforward.

General Impressions

As I have described in my interview with The Setup, I have been a user of ThinkPad X-series laptops for many years. This is my sixth X-series ThinkPad.

Overall, I quite like the hardware! Once things mature a little bit, I think that this will be a great laptop for running GNU/Linux. That said, I ordered the laptop without realizing that the X1 Carbon had gone through a major revision! The keyboard was quite a surprise. I think that changing a system so radically without changing the model name/number is a very bad move on Lenovo’s part.

There are two remaining issues with the system I’m still struggling with: (1) the keyboard layout is freaky and weird, and (2) the super high resolution screen breaks many things.

The quality of the keyboard itself is great and worthy of the ThinkPad name. That said, there are two ways in which it is strange. The first is the adaptive keyboard strip. Overall, it works surprisingly well and I think it is a clever idea. My sense is that the strip is more annoying in Windows because it changes out from under you all the time. In GNU/Linux, only manual changing of modes is supported. This, in my opinion, is a feature. I do miss the real feedback you get from pressing keys but for F-keys and volume-keys that I don’t use often this isn’t too important. On the downside, I have realized several times that I had been holding down a “button” for several seconds and not noticed.

The more annoying issue with the keyboard is the way that the other keys have moved around. Getting rid of the CapsLock is wonderful! How has this taken so long? Replacing it with split Home and End keys is nuts. I’ve remapped the Home and End keys to put Control back where it should be. My right Control is now Home but I still don’t have an End key. The split Backspace and Delete is not a problem for me. The tilde/apostrophe is in a very bad place. There is no Insert, Print Screen/SysRq, Scroll Lock, Pause/Break or NumLock. They are all just gone. Surprisingly, I haven’t missed any of them.

The second issue is the 2560×1440 resolution on the 14 inch screen. I use a 27 inch external monitor with the same native resolution as the laptop but, by my arithmetic, the pixel density on the laptop is 210 DPI instead of 109 DPI on the external monitor. The result is “the scaling problem” and it’s a huge pain that seems mostly unsolved on any operating system.
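For anyone who wants to check the arithmetic, pixel density is the length of the screen diagonal in pixels divided by the diagonal in inches. Here is a quick sketch in Python; the function name is just for illustration and the diagonals are the nominal 14 and 27 inch figures.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density from the resolution and the diagonal size in inches."""
    diagonal_px = math.hypot(width_px, height_px)  # sqrt(width^2 + height^2)
    return diagonal_px / diagonal_in


laptop = pixels_per_inch(2560, 1440, 14.0)
external = pixels_per_inch(2560, 1440, 27.0)
print(round(laptop), round(external))  # prints 210 109
```

The ratio between the two is a bit under 2, which is why anything sized comfortably for one screen looks roughly twice or half the intended size on the other.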

Fonts and widgets that look good on the laptop look huge on my external monitor. Stuff that looks good on my external monitor looks minuscule on the laptop. I routinely move windows between my laptop screen and my large monitor. Until I find a display system that can handle this kind of scaling effectively, this requires changing font size and zooming all the time. At the moment, I’m shrinking and expanding my font size using the built in hot keys in Emacs, Gnome Terminal, and Firefox/Iceweasel. I love the high resolution screen but the current situation is crazy-making.

Finally, this setup will not get you into the Church of Emacs and it’s not about to find its way onto the FSF’s list of endorsed hardware. For one, I paid the Windows tax. Beyond that, there is the non-free BIOS and the need for non-free firmware to use the wireless and Bluetooth. This is standard for ThinkPads but it isn’t getting any easier to swallow. There are alternatives in the form of Gluglug’s X60 laptops running CoreBoot, Lemote Yeelong laptops, Bunnie Huang’s Novena and others that are better in these regards. I am very excited for these projects but, for a number of reasons, these just weren’t an option for the laptop I use for my research computing.

Update: I’ve changed the configuration option for the synaptics touchpad to match what I’m now actually doing.

Syndicated 2014-05-18 22:58:16 (Updated 2014-08-23 21:39:22) from copyrighteous

12 May 2014 (updated 24 Dec 2014 at 01:03 UTC) »

Google Has Most of My Email Because It Has All of Yours

Republished by Slate. Translations available in French (Français), Spanish (Español), Chinese (中文)

For almost 15 years, I have run my own email server which I use for all of my non-work correspondence. I do so to keep autonomy, control, and privacy over my email and so that no big company has copies of all of my personal email.

A few years ago, I was surprised to find out that my friend Peter Eckersley — a very privacy conscious person who is Technology Projects Director at the EFF — used Gmail. I asked him why he would willingly give Google copies of all his email. Peter pointed out that if all of your friends use Gmail, Google has your email anyway. Any time I email somebody who uses Gmail — and anytime they email me — Google has that email.

Since our conversation, I have often wondered just how much of my email Google really has. This weekend, I wrote a small program to go through all the email I have kept in my personal inbox since April 2004 (when Gmail was started) to find out.

One challenge with answering the question is that many people, like Peter, use Gmail to read, compose, and send email but they configure Gmail to send email from a non-gmail.com “From” address. To catch these, my program looks through the headers of each message that record which computers handled the message on its way to my server and picks out messages that have traveled through google.com, gmail.com, or googlemail.com. Although I usually filter them, my personal mailbox contains emails sent through a number of mailing lists. Since these mailing lists often “hide” the true provenance of a message, I exclude all messages that are marked as coming from lists using the (usually invisible) “Precedence” header.
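The gist of that check can be sketched in a few lines of Python using the standard library’s email module. This is an illustration rather than the program I actually ran: the function names are invented, it assumes the relay information lives in “Received” headers, and it treats only a Precedence value of “list” as marking list mail.

```python
import email
import re

# Domains whose appearance in a Received header counts as "went through Google".
GOOGLE_RE = re.compile(r"\b(?:google|gmail|googlemail)\.com\b", re.IGNORECASE)


def is_list_mail(msg):
    """True if the message is marked as list mail (assumed: Precedence: list)."""
    return msg.get("Precedence", "").strip().lower() == "list"


def passed_through_google(msg):
    """True if any Received header mentions one of the Google domains."""
    received = msg.get_all("Received", [])
    return any(GOOGLE_RE.search(hop) for hop in received)


def classify(raw_message):
    """Label one raw message as 'list', 'google', or 'other'."""
    msg = email.message_from_string(raw_message)
    if is_list_mail(msg):
        return "list"
    return "google" if passed_through_google(msg) else "other"
```

Running classify over every message in a mailbox and counting the “google” and “other” labels per week gives the two series needed for a graph like the one below.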

The following graph shows the numbers of emails in my personal inbox each week in red and the subset from Google in blue. Because the number of emails I receive week-to-week tends to vary quite a bit, I’ve included a LOESS “smoother” which shows a moving average over several weeks.

From eyeballing the graph, the answer seems to be that, although it varies, about a third of the email in my inbox comes from Google!

Keep in mind that this is all of my personal email and includes automatic and computer generated mail from banks and retailers, etc. Although it is true that Google doesn’t have these messages, it suggests that the proportion of my truly “personal” email that comes via Google is probably much higher.

I would also like to know how much of the email I send goes to Google. I can do this by looking at emails in my inbox that I have replied to. This works if I am willing to assume that if I reply to an email sent from Google, it ends up back at Google. In some ways, doing this addresses the problem with the emails from retailers and banks since I am very unlikely to reply to those emails. In this sense, it also reflects a measure of more truly personal email.

I’ve broken down the proportions of emails I received that come from Google in the graph below for all email (top) and for emails I have replied to (bottom). In the graphs, the size of the dots represents the total number of emails counted to make that proportion. Once again, I’ve included the LOESS moving average.

The answer is surprisingly large. Despite the fact that I spend hundreds of dollars a year and hours of work to host my own email server, Google has about half of my personal email! Last year, Google delivered 57% of the emails in my inbox that I replied to. They have delivered more than a third of all the email I’ve replied to every year since 2006 and more than half since 2010. On the upside, there is some indication that the proportion is going down. So far this year, only 51% of the emails I’ve replied to arrived from Google.

The numbers are higher than I imagined and reflect somewhat depressing news. They show how it’s complicated to think about privacy and autonomy for communication between parties. I’m not sure what to do except encourage others to consider, in the wake of the Snowden revelations and everything else, whether you really want Google to have all your email. And half of mine.

If you want to run the analysis on your own, you’re welcome to the Python and R code I used to produce the numbers and graphs.

Syndicated 2014-05-12 02:11:02 (Updated 2014-12-24 00:47:41) from copyrighteous

16 Mar 2014 (updated 10 Feb 2015 at 19:04 UTC) »

Community Data Science Workshops in Seattle

Photo from the Boston Python Workshop – a similar workshop run in Boston that has inspired and provided a template for the CDSW.

On three Saturdays in April and May, I will be helping run three day-long project-based workshops at the University of Washington in Seattle. The workshops are for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free and open source software, and civic media.

The workshops are for people with no previous programming experience and the goal is to bring together researchers as well as participants and leaders in online communities. The workshops will all be free of charge and open to the public given availability of space.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

Are new contributors to an article in Wikipedia sticking around longer or contributing more than people who joined last year?
Who are the most active or influential users of a particular Twitter hashtag?
Are people who participated in a Wikipedia outreach event staying involved? How do they compare to people that joined the project outside of the event?

If you are interested in participating, fill out our registration form here. The deadline to register is Wednesday March 26th. We will let participants know if we have room for them by Saturday March 29th. Space is limited and will depend on how many mentors we can recruit for the sessions.

If you already have experience with Python, please consider helping out at the sessions as a mentor. Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required. If you’re interested, send me an email.

Syndicated 2014-03-16 18:41:20 (Updated 2015-02-10 18:32:05) from copyrighteous

8 Mar 2014 »

V-Day

My friend Noah mentioned the game VVVVVV. I was confused because I thought he was talking about the visual programming language vvvv. I went to Wikipedia to clear up my confusion but ended up on the article on VVVVV which is about the Latin phrase “vi veri universum vivus vici” meaning, “by the power of truth, I, while living, have conquered the universe”.

There is no Wikipedia article on VVVVVVV. That would be ridiculous.

Syndicated 2014-03-08 00:50:02 (Updated 2014-03-08 00:50:20) from copyrighteous

4 Feb 2014 »

Admiral Ackbar on Persian Governors

Q: The title for a governor in ancient Persia?

A: It’s satrap!

Syndicated 2014-02-03 23:45:43 (Updated 2014-02-03 22:17:16) from copyrighteous
