gentoo mirror stats: master distfiles distribution.
Now for the second set of statistics. These aren't directly useful to mirrors for estimating their own traffic, but instead give a good overview of how our mirroring setup works internally, and how much traffic is involved in the fan-out stage. Distfiles are the main content moved around by this system, but it also carries the releases, experimental and snapshots directories.
A very quick overview of the existing setup:
- A developer uploads a new distfile directly to dev.gentoo.org.
- The master-distfiles box pulls from dev.gentoo.org hourly.
- The master-distfiles box checks every ebuild and downloads any missing distfiles from their primary URI. The daily distfile report is also created at this point.
- Every hour, the cluster master of ftp.osuosl.org pulls the latest content from master-distfiles. This averages 240MB/day of traffic.
- The OSL FTP cluster master (in Oregon) pushes to its slave locations in Atlanta and Chicago.
- All distfiles mirrors pick up their content from one of the FTP nodes; Internet2-connected hosts are directed via DNS to an Internet2-connected slave for better performance.
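The ebuild check done by master-distfiles can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual master-distfiles code: the `Ebuild` class, the `plan_fetches` function and the example URIs are hypothetical, and it assumes each distfile's URIs are already listed with the primary one first (real SRC_URI parsing is left out).

```python
from dataclasses import dataclass, field


@dataclass
class Ebuild:
    # Hypothetical simplified model: distfile name -> list of URIs,
    # with the primary URI first.
    name: str
    distfiles: dict = field(default_factory=dict)


def plan_fetches(ebuilds, present):
    """Return (to_fetch, no_uri).

    to_fetch maps each missing distfile to its primary URI; no_uri lists
    missing distfiles that have no URI at all (candidates for a report).
    """
    to_fetch = {}
    no_uri = []
    for ebuild in ebuilds:
        for filename, uris in ebuild.distfiles.items():
            if filename in present or filename in to_fetch or filename in no_uri:
                continue  # already mirrored, or already handled via another ebuild
            if uris:
                to_fetch[filename] = uris[0]  # primary URI only
            else:
                no_uri.append(filename)
    return to_fetch, no_uri


ebuilds = [
    Ebuild("app-misc/foo-1.0", {"foo-1.0.tar.gz": ["http://example.org/foo-1.0.tar.gz"]}),
    Ebuild("app-misc/bar-2.0", {"bar-2.0.tar.bz2": ["http://example.net/bar-2.0.tar.bz2",
                                                    "http://example.com/bar-2.0.tar.bz2"]}),
    Ebuild("app-misc/baz-0.1", {"baz-0.1.zip": []}),
]
to_fetch, no_uri = plan_fetches(ebuilds, present={"foo-1.0.tar.gz"})
print(to_fetch)  # {'bar-2.0.tar.bz2': 'http://example.net/bar-2.0.tar.bz2'}
print(no_uri)    # ['baz-0.1.zip']
```

Deduplicating by filename matters because several ebuilds (e.g. different revisions) often share the same distfile, and it should only be fetched once.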
Each of the distfiles mirrors has about 140-160MB of upstream traffic every day (including both the new files and the rsync overhead for scanning). If no files have changed, the rsync traffic for a directory scan is 1-2MB. While this isn't a lot of traffic, it is very spiky, as mirrors tend to be on fast links.
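To put the scan overhead and the spikiness in perspective, here is a small back-of-the-envelope calculation. The sync interval (every 4 hours) and the link speed (100 Mbit/s) are hypothetical values chosen for illustration, not measured figures, and 1 MB is treated as 8 Mbit.

```python
def daily_scan_overhead_mb(syncs_per_day: int, scan_mb: float) -> float:
    """Rsync directory-scan traffic per day when no files have changed."""
    return syncs_per_day * scan_mb


def burst_seconds(transfer_mb: float, link_mbit_s: float) -> float:
    """How long a transfer occupies a link, treating 1 MB as 8 Mbit."""
    return transfer_mb * 8 / link_mbit_s


# Hypothetical mirror: syncs every 4 hours (6 times a day) over a 100 Mbit/s link.
print(daily_scan_overhead_mb(6, 1), daily_scan_overhead_mb(6, 2))  # 6 12
print(burst_seconds(2, 100))  # 0.16
```

Under these assumptions the idle scans are only a small slice of the 140-160MB/day, but each one arrives as a sub-second burst, which is where the spikiness comes from.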
The new weekly builds from the Release Engineering team will probably add another 1.3GB per week, staggered as one arch per day.
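A rough estimate of what the weekly builds mean per mirror per day, assuming 1GB = 1000MB and averaging the weekly volume evenly over seven days (individual days will differ, since each arch's build is a different size):

```python
def daily_average_mb(per_week_mb: float) -> float:
    """Spread a weekly volume evenly across seven days."""
    return per_week_mb / 7


base_low, base_high = 140, 160     # MB/day per mirror, from the figures above
releases = daily_average_mb(1300)  # 1.3GB/week, assuming 1GB = 1000MB
print(round(releases))                                          # 186
print(round(base_low + releases), round(base_high + releases))  # 326 346
```

In other words, on average the weekly builds would roughly double the daily per-mirror volume.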
I got a small subset of the logs from the OSU FTP cluster to compute some of these statistics. They cover the 24-hour period of 2008/08/07 UTC. The logs do not record which traffic went via Internet2, and I've grouped the sources by country code (using IP::Country::Fast from CPAN).
One Greek mirror was excluded from the traffic totals and counts, as this was its catch-up sync (7GB of traffic) after some hardware-related downtime.
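The per-country grouping, including dropping a one-off catch-up sync, might look something like this. It is only a sketch: Python has no IP::Country::Fast, so a tiny hypothetical prefix table built from RFC 5737 documentation networks stands in for the real lookup, and the log format ("client IP, bytes sent" per line) is an assumed simplification rather than the actual FTP cluster log format.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix table standing in for IP::Country::Fast; the ranges are
# RFC 5737 documentation networks, not real allocations.
PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"): "DE",
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("203.0.113.0/24"): "GR",
}


def country_of(ip: str) -> str:
    addr = ipaddress.ip_address(ip)
    for net, cc in PREFIXES.items():
        if addr in net:
            return cc
    return "??"  # source not covered by the table


def traffic_by_country(lines, exclude=()):
    """Sum bytes and count distinct client IPs per country code.

    Each line is assumed to be "<client-ip> <bytes-sent>" (a hypothetical,
    simplified log format). Clients in `exclude` are skipped entirely.
    """
    total_bytes = Counter()
    clients = {}
    for line in lines:
        ip, sent = line.split()
        if ip in exclude:
            continue
        cc = country_of(ip)
        total_bytes[cc] += int(sent)
        clients.setdefault(cc, set()).add(ip)
    return {cc: (total_bytes[cc], len(clients[cc])) for cc in total_bytes}


log = [
    "192.0.2.10 150000000",
    "192.0.2.11 140000000",
    "198.51.100.7 160000000",
    "203.0.113.5 7000000000",  # a one-off catch-up sync
]
print(traffic_by_country(log, exclude={"203.0.113.5"}))
# {'DE': (290000000, 2), 'US': (160000000, 1)}
```

Counting distinct client IPs alongside bytes keeps the "traffic" and "counts" columns separate, so a single outlier can be excluded from both at once.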
From this data, I think that more than half of our mirrors (Europe, the Middle East and RU) would benefit from having a box to sync against in Europe.