Evil hack of the week: Speeding up locale-gen

The locale-gen script can take some time, especially when trying to generate all supported locales.

Part of the problem is that it runs the actual localedef tool for only one locale at a time, and so uses only one CPU core.

So what can be done to improve the situation, preferably without touching the distribution's script itself?

This is what I ended up doing when provisioning virtual machines:

# create a unique temporary directory based on current PID
mkdir /tmp/locales.$$

# now do everything else in a subshell to avoid changing
# the current shell's environment
(
  cd /tmp/locales.$$

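  # create a local "localedef" wrapper script that just
  # prints the actual command to be run; the leading empty
  # "echo" ends any unfinished progress line so that the
  # command starts at the beginning of a line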
  echo 'echo; echo /usr/bin/localedef "$@"' > localedef
  chmod a+x localedef

  # make sure our wrapper script is found first
  export PATH=.:$PATH

  # make sure that we do not get "locale not found"
  # warnings while running "locale-gen"; the "C" locale
  # should always be there
  export LANG=C
  export LC_ALL=C

  # now run "locale-gen", keep only the printed "localedef"
  # command lines (there is no other way to silence the
  # locale-gen progress messages) and feed them into
  # GNU parallel to utilize all CPU cores
  locale-gen | grep -E "^/usr" | parallel
)

# cleanup
rm -rf /tmp/locales.$$
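
To see what the wrapper trick produces without touching any real locale data, here is a minimal dry-run sketch. It assumes nothing about the real locale-gen beyond what is described above: fake_locale_gen is a hypothetical stand-in that prints a progress message and then calls "localedef" by name, and the use of mktemp and an explicit shebang line are my additions, not part of the script above.

```shell
#!/bin/sh
# Dry run of the wrapper trick; nothing here calls the real localedef.
tmpdir=$(mktemp -d) || exit 1

# Same wrapper as in the script above, plus an explicit shebang line.
printf '%s\n' '#!/bin/sh' 'echo; echo /usr/bin/localedef "$@"' > "$tmpdir/localedef"
chmod a+x "$tmpdir/localedef"

# fake_locale_gen is a hypothetical stand-in for locale-gen: it prints a
# progress message without a trailing newline, then calls "localedef" by name.
fake_locale_gen() {
  printf '  en_US.UTF-8...'
  localedef -i en_US -f UTF-8 en_US.UTF-8
  echo ' done'
}

# Shadow "localedef" via PATH inside a subshell and keep only the lines
# that start with the printed command.
captured=$(
  PATH="$tmpdir:$PATH"
  export PATH
  fake_locale_gen | grep '^/usr'
)

printf '%s\n' "$captured"
rm -rf "$tmpdir"
```

The single line that survives the filter is the full command the stand-in would have run, which is exactly the kind of line that gets piped into parallel in the real script.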

This obviously requires the GNU parallel tool to be installed.
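
If GNU parallel may be missing on some machines, one option is to check for it and fall back to xargs. The following is only a sketch under a few assumptions: run_lines_in_parallel is a hypothetical helper name, the xargs in use supports the -0 and -P extensions (GNU and BSD xargs do), and the input contains at least one command line.

```shell
#!/bin/sh
# run_lines_in_parallel is a hypothetical helper: it reads one shell command
# per line on stdin and runs the commands concurrently.
run_lines_in_parallel() {
  if command -v parallel >/dev/null 2>&1 &&
     parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # GNU parallel treats each input line as a command when it is
    # started without a command of its own.
    parallel
  else
    # Fallback: xargs -P runs up to $jobs processes at once; each input
    # line is handed to "sh -c" as the command string.
    jobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
    tr '\n' '\0' | xargs -0 -n 1 -P "$jobs" sh -c
  fi
}

# Usage example; the order of the output lines is not guaranteed.
printf '%s\n' 'echo one' 'echo two' | run_lines_in_parallel
```

The version check is there because other tools also install a binary called "parallel" that does not read commands from standard input in the same way.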

With that in place, the wall-clock time needed to (re-)generate all supported locales is drastically reduced, and all CPU cores can be seen working at close to 100% while this runs, instead of just one.
