Installing Gentoo on a Thinkpad T42p
This is a log file of my attempt to install Gentoo on a new
laptop. The information and reasoning below is not necessarily
correct, and almost certainly does not represent the most efficient
strategy for solving the problems i encountered; it's just what i did,
and is perhaps useful to someone else setting up a similar machine or
configuration. Corrections and/or clarifications are
welcomed.
Preparing the machine
Details: gentoo-2.6.7-r11 on a new Thinkpad T42p, 2373-KTU.
2GHz Pentium-M, 14.1" 1400x1050 screen, 128M ATI FireGL T2.
The Thinkpad comes with WindowsXP taking up most of the 60Gig disk, and
the IBM service partition takes the remaining 5 or 6Gig. To make
a dual boot i needed to shrink the XP partition to make room for
linux. Following advice on the web, i booted XP, let it do its
install stuff and get setup (ick, XP feels like a fisher price toy, all
big buttons and bright colours). Next step, defrag to move files closer
to the beginning of the partition. That's where it gets
difficult.
XP's defrag shows unmoveable files located in the middle and end of the
partition. Hmm, wonder if that'll be a problem [yes].
Next step is to try and shrink the partition. I'd bought
PartitionMagic several years ago for setting up my first laptop, but it
turns out my ancient version 6.0 won't touch NTFS partitions (IBM's
installer turns the XP FAT partition into NTFS as the very first thing
that happens when you first boot the machine, no options).
Upgrading PM doesn't seem worthwhile if i only get one usage of it each
time, so i look for free alternatives. The nicely bootable
Knoppix cd apparently has one, so i download and burn knoppix 3.4, boot
and run qtparted.
Alas, after all that it will only
let me shrink the 55Gig partition by 500M (see unmoveable files
earlier). Fine, XP is too much trouble then, i'll just delete the
whole partition in the install and have a linux-only machine (but keep
the IBM service partition in case for some reason i ever want to
reinstall XP). Reboot with the Gentoo universal livecd.
Installing Gentoo
The Gentoo cd boots nicely, and starts to work. First attempt at
installing ended quickly, after the livecd boot gave up on my network
as i fumbled to connect the network cable, failing to do so in
time. Although i have the universal cd & don't need network,
lack of network seems sub-optimal. But it's not clear from the
manual what to do to get it to try again. So, reboot and
start from scratch.
Using fdisk is old hat. Of course it can't be that simple. The XP
partition i deleted was hda1. Since i didn't want to delete the IBM
service partition, hda2 remains. Thus my boot, swap and root partitions
ended up being hda1, hda3 and hda4 respectively, non-consecutive in the
partition list. fdisk dutifully complains that the partition list isn't
ordered, but the internet thinks that won't be a problem for most
programs.
I opted to install Gentoo's stage3, hoping to minimize the pain, and
besides i can rebuild things later if i'm happy with how it turns
out. Since the packages on the cd were 2 weeks old, i decided to
download everything anyway, and encountered an early problem with
mirrors. The Gentoo disk comes with mirrorselect, so i ran that
according to the handbook instructions, which piped the output to the
config file. This took an extremely long time, and mysteriously
produced a list of servers in Germany (i'm in Canada). Worse, it spat
out reams of html, which had to be manually edited out of the config
file. The mirrors it selected did work, for a while, but the next
morning i kept getting connection failures. Deleting the mirrors line
made it all better, and a later use of mirrorselect in the install
process went perfectly, with more sane results.
At some point during the long sequence of emerges a configuration file
gets changed, and emerge tells me that i need to do something about
it. The help command it recommends says that to find out what they are
i should run a particular find command to locate the new config
files. I try that command, and in fact a variety of permutations as
well, but can find no files with the given name pattern. Still, emerge
complains at the end of every new package i install that a config file
needs updating. After a while i managed to figure out that running
etc-update fixes that problem. The new config file was indeed named as
described, and in /etc, in fact in /etc/fonts; still not sure why find
didn't find it.
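A minimal sketch of the pending-config hunt described above, run against a scratch directory rather than the real /etc. Portage names a pending update ._cfgNNNN_ followed by the original file name; the scratch layout and file names here are made up, and quoting the pattern is only a guess at one common pitfall, not a diagnosis of why find came up empty in this case.

```shell
# Build a throwaway tree that mimics /etc/fonts with one pending update.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc/fonts"
: > "$scratch/etc/fonts/fonts.conf"
: > "$scratch/etc/fonts/._cfg0000_fonts.conf"

# Quote the pattern so find, not the shell, expands the wildcards.
pending=$(find "$scratch/etc" -name '._cfg????_*')
echo "pending update: ${pending#"$scratch"}"

rm -rf "$scratch"

# On the real system the equivalent would be (as root):
#   find /etc -name '._cfg????_*'
#   etc-update
```

etc-update then walks through each pending file and offers to merge, replace or discard it.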
Note that wireless networking isn't supported by any of the kernel
drivers i could see. Apparently it can be made to work with madwifi. To
be investigated later, for now no wireless.
The pros and cons for choices of syslog aren't real clear, so i install
syslog-ng. Weirdly, rc-update complains that sysklogd is already
supplying logging facilities. Guess that came as a dependency from
something installed earlier...? Found some pages describing how to fix
that too, and now syslog-ng is happy.
Finally, a few unclear, somewhat arbitrary choices later, and indeed it
boots and seems to be functioning as a bare system.
System Setup
First thing i notice is that my host name is the one i gave it, but the
domain name is "(none)"--- neither the domain name i specified in the
install procedure, nor anything returned from the dhcp server. I
edited /etc/conf.d/net and changed it so the host and domain names are
not taken from the dhcp info, rebooted, and observed no difference at
all. Liveable for now (see below).
But the main step is to get X working. The FireGL isn't
supported by default yet, but there are separately available ATI
drivers. Following the handbook, i installed Xorg, and ran the
configure option to generate the config file. It switches briefly
to a graphic screen, and crashes with the informative error "Caught
signal 11", and no useful text after. The log files are no more
help---lots of unsatisfied links for drivers i wouldn't use anyway,
followed by some advice that the unsatisfied links may not be
responsible for it failing (may not but may be?), followed by the
signal 11 stuff. Now, the handbook seems to say that one installs
Xorg and gets that working, and afterward installs the ati-drivers, but
since that's obviously not working i
went to see what the ati-drivers involved. Confusingly, much of
the other documentation i found on getting ATI cards to work refers to
XFree86 rather than Xorg. I know there's a split based on
licensing, but i wasn't following it much---are they binary compatible,
interchangeable? What exactly is different in terms of use or
installation? I finally
decide that i'm going to pretend it's all equivalent, leave Xorg
installed and not worry
about perhaps needing to install XFree86 as well or instead.
[Seems to have been the correct choice.]
Installing the ATI drivers is trivial. But it needs the right
kernel setup which of course i did not have. Ok, rebuilding
the kernel is easy, but one option now requires knowing my AGP
chipset. I couldn't find any actual info on what the
T42p uses, so i enabled everything with ATI or Intel in the title.
The phonetically awkward fglrxconfig function generates a new config
file. Questions are easy at first, and at one point it even lets me
select something that looks like my ideal setup: laptop: switch between
internal and external screens. But then it asks me
to choose or enter the horizontal and vertical frequency of the
monitor, with the caution that an incorrect choice could be
disastrous. I guess it means the internal monitor, not the
external..? None of the offered settings go up to
1400x1050. Hmm. Google turns up little, but it seems the
maximum resolutions listed may not be accurate for LCDs since they run
at lower refresh rates anyway, so i try options 5 and 3 for the
horizontal and vertical choices (almost arbitrary---seemed to work for
someone else on a related system according to a web page i failed to
record). Happily, 1400x1050 indeed shows up in the list of
available resolutions next (and higher too). I chose resolutions
of 1400x1050 1024x768 1600x1200 to be available. I thought maybe
the last one might allow easier use of an external monitor; so far,
when booting to the laptop screen, the X log files show that it
disables 1600x1200, so no harm done for the internal LCD i guess. I
never did find IBM specs, but i did eventually find this
document. Although it looks similar it isn't necessarily the one in
the T42p and what i have works, so i've no intention of investigating
that further. [From what i can tell it pretty much ignores most
of these settings and gets the values from the hardware or system
anyway]
The generated config file still needed manual tuning (and to be
renamed, the only difference between Xorg and XFree86 i've
encountered so far). The FontPath listings, for instance, were still
pointing at /usr/lib/X11/fonts instead of /usr/share/fonts (is that
just an Xorg thing? I noticed that change from listings shown when i
manually merged the one config file that needed updating during the
earlier emerges). X now starts but just gives me that ugly root
backdrop, a functioning mouse, and absolutely nothing else, not even
the stripped down twm environment the handbook talks about. Further
down in the handbook it talks about using startx instead of calling
Xorg directly, which seems obvious in retrospect. Now i get an utterly
bare, but working twm session. Nicely, i notice that not only does the
track-nipple version of the mouse work, but the touchpad too [and as i
discover later plugging in an optical usb mouse also works
seamlessly].
More Applications
A lot of things need to be installed. Most work just fine, but
java gives me a few problems. I initially installed IBM's sdk
(~x86), and that goes well. But then when i go to install open
office i find open office insists on (well, strongly suggests) having
blackdown's vm. So i install that---apparently each jdk installed
will switch itself to be the default vm during install, so i don't need
to do anything else to switch vm's, and the open office install then
goes smoothly. Blackdown as a default is ok, but it turns out it
didn't switch everything. The java plugin links in thunderbird's
directories get messed up---the javaplugin_jni.so file is linked to the
same file in blackdown's binaries, but the libjavaplugin_jni.so is
linked to IBM's binaries, and unsurprisingly thunderbird won't start
(complains about missing symbols). I correct that by switching the
libjavaplugin_jni.so link to point to blackdown's libjavaplugin_jni.so
-- now thunderbird starts, though with (different) complaints about
missing symbols. Somewhere i read that thunderbird wants jdk
1.4.2 and blackdown is 1.4.1, so maybe that's why. IBM is 1.4.2
-- perhaps switching to IBM's plugin instead would make it work better,
but for now at least the web browser itself works, and i don't need
java for most of my web surfing so i'll investigate more when/if i
encounter a real problem.
I also tried to install eclipse-sdk, risking the confusion of using
blackdown instead of IBM's own jdk for that. The ebuild works fine, but
oddly after it's done i don't have an eclipse function that i can find,
and i can't figure out how to launch it. I have done a couple of
user-level installs of eclipse, never a system install, so perhaps
there's a difference in how it's accessed, or maybe i need to download
more pieces of it separately? Anyway, i also notice that it installed
version 2.*. I've been using version 3.0 on other machines and while it
spontaneously crashes it's generally quite useable, so i unmerged the
current version and tried to install the 3.0.0-rc3 (masked)
version. emerge rejects my command line with some cryptic message
implying a missing '=' somewhere. Eventually i figure out that for
emerge to install a specific version number i need to prefix the
package name with an '=' character.
Still have to finish trying that...
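To illustrate the '=' syntax: a versioned atom is the package's category/name-version with '=' in front. The sketch below only assembles and prints such a command. The dev-util category and the 3.0.0_rc3 spelling of the version are assumptions about how the Portage tree names this package, so check them with emerge --search before relying on them.

```shell
# All three values are assumptions; adjust to what the Portage tree has.
category="dev-util"
package="eclipse-sdk"
version="3.0.0_rc3"

# A leading '=' pins the exact version; quoting the atom keeps any
# shell from treating the '=' specially.
atom="=${category}/${package}-${version}"
echo "emerge --pretend '${atom}'"
```

Running the printed command with --pretend shows what would be installed without changing anything.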
By initially just installing IBM's jdk, i was hoping to postpone having
to add java-config to the learning curve, however trivial it may
be. Now i have to use it if i want to go back to IBM's vm. It's not
clear if open office still needs blackdown after the install, so to be
safe i decide to just switch to IBM's jdk for my user account, and
leave the system default to blackdown. I tried
java-config --set-user-vm=..., but oddly that has no effect --
blackdown is still my default. I looked in my .bashrc, and it does not
source the .gentoo/java-env file as the handbook suggests it should, so
i add a command to do that and try again. But it still doesn't
work. There's a file named java in that directory too; turns out i need
to source both of them in my .bashrc, and then IBM's jdk
works as my user default.
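The .bashrc addition might look something like the following sketch. The two file names come from the text above; the source_java_env helper name is made up for illustration, and the existence checks are an added precaution rather than something the handbook is known to require.

```shell
source_java_env() {
    # $1: directory holding the per-user files (normally ~/.gentoo)
    for f in "$1/java-env" "$1/java"; do
        if [ -f "$f" ]; then
            . "$f"
        fi
    done
}

# Source both files on every login shell start.
source_java_env "$HOME/.gentoo"
```

Opening a new shell and checking which java is picked up is a quick way to confirm the user-level setting took effect.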
Customizing gnome is easy, if rather long. One bit of weirdness
shows up when i customize the panel by adding a number of things to
it. At the same time i also add the ms corefonts, which require i
restart X (or reinitialize lots of things separately), so there were
multiple potential causes. I'm not entirely sure how to restart X
without rebooting when i use gdm, but surely i don't need to actually
reboot. I logout & login hoping the screen flickering as gdm gives me a
new login indicates X restarting, and am surprised to find my panel has
now completely disappeared, both the top & the bottom ones. Kill X
with ctrl-alt-bkspace, relogin, same thing. Doesn't seem like a font
problem really, so i didn't think the corefonts installation was the
issue, and did some internet searching to see if it's a known problem,
perhaps one of the panel utilities i added has a bug or is incompatible
somehow. The panel seems to die for a variety of people in
various situations on dissimilar and older systems, but in each case it
was related to adding things to the panel. The solution looks
like wiping the ~/.gnome (and i guess ~/.gnome2) directories, but that
also wipes out all customization, and having spent a long time already
on that i really didn't want to do that. So, taking some inspiration
from windows-xp on how to fix things i decide to reboot the system just
to be sure. Surprisingly it works, the panel is back, and no
customizations lost.
The font install also worked. Sadly, the fonts in open office still
look real crappy, and the new fonts don't list as options there. Turns
out that there are extra steps to getting the fonts into open office,
including in my case the need to first install spadmin. But now it
looks great.
Am i getting good graphic acceleration? On the laptop screen
(1400x1050), tuxracer runs very smoothly. fgl_glxgears tells me my
framerate with the default window size is about 400fps. glxgears gives
me almost 1900fps, again at its default window size. Near as i can tell
that's about what i should expect.
I have some further notes about trying to
fix X to work in the replicator and use an external monitor below.
Odds and Ends
Privoxy. The install failed, complaining that privoxy was not a known
user. I did useradd privoxy and the install then failed complaining
that privoxy was not a known group. groupadd privoxy fixed that.
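A sketch of checking for the account before installing, so the ebuild finds both the group and the user. It only prints the commands that would be needed; account_plan is a hypothetical helper, and putting the user in a group of the same name is an assumption about what the ebuild expects.

```shell
account_plan() {
    # $1: name used for both the group and the user
    # Print a command only for the pieces that are missing.
    getent group "$1" >/dev/null 2>&1 || echo "groupadd $1"
    getent passwd "$1" >/dev/null 2>&1 || echo "useradd -g $1 $1"
}

account_plan privoxy
```

Creating the group first avoids the second failure described above, since the user can then be placed in it directly.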
Wireless. This was one of the more complex parts so far. I needed to
ensure wireless-tools was installed, as well as emerge the madwifi
drivers (masked), and wireless-config. I also needed a
/etc/init.d/net.ath0 file. The Gentoo docs say you can create a link to
net.eth0 in order to create say net.eth1, but i wasn't certain if ath0
required any customization being wireless, so i copied the file
instead. Turns out i could've used a link. I edited the
/etc/conf.d/wireless file as best i could figure out. I also created a
new runlevel and boot option as per the Gentoo handbook to use
/etc/init.d/net.ath0 instead of net.eth0, and rebooted.
The network of course failed to load, and calling
/etc/init.d/net.ath0 start directly confirmed that iwconfig was
complaining about missing wireless extensions. The madwifi FAQ
(Sec 3.4) says that means that wireless support was not compiled into
the kernel. I checked my kernel config, and i had already compiled it
into the kernel, so that obviously wasn't it. Finally decided that
maybe i needed to install some modules for the wireless drivers i had
emerged---i'd sort of assumed the install took care of that, but
apparently not. I first tried modprobe ath_hal then
modprobe ath_pci. The dmesg command shows a successful installation
(including wlan, i guess that was there already or is automatically
brought in), so i added ath_hal and ath_pci to
/etc/modules.autoload.d/kernel-2.6, and rebooted.
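Adding the modules to the autoload file can simply be done in an editor; a small sketch of doing it idempotently follows. It works on a scratch file so it is safe to try. add_module is a made-up helper name, and on the real system the target would be /etc/modules.autoload.d/kernel-2.6.

```shell
add_module() {
    # $1: module name, $2: autoload file; append only if not already listed
    grep -qx "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

autoload=$(mktemp)               # stand-in for the real autoload file
add_module ath_hal "$autoload"
add_module ath_pci "$autoload"
add_module ath_pci "$autoload"   # repeated call leaves the file unchanged
listed=$(cat "$autoload")
echo "$listed"
rm -f "$autoload"
```

Listing ath_hal before ath_pci mirrors the order the modules were loaded by hand.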
Wireless still did not work, but seemed to fail further into the
process. The messages during boot showed it scanning for a
signal, finding it, extracting the correct essid, and then failing to
associate with the access point. There were a couple more lines
of complaint, something about forcing things "incase" they were hidden,
but that looked like just the error cascade. I searched for more
info, but /var/log/messages had nothing more, none of the other log
files in /var/log looked appropriate, and basically i could find no
specific details. The format of the WEP key was not so clear in the
/etc/conf.d/wireless file, so i decided to turn off WEP at least
temporarily to see if i could get it to connect. Nope, same thing. I
did a number more iterations of editing the wireless file, changing
options and restarting /etc/init.d/net.ath0, but nothing helped. I did
figure out that i did not need the "host roaming 2" or "host roaming 0"
options, nor did i need to set it to Ad-hoc for scanning. I also added
"mode 2" to the iwpriv_ath0 setting at some point to ensure it uses
802.11b mode instead of 802.11a/g (modes 1 and 3), though i don't think
that's actually necessary (and perhaps not even desirable, but i don't
use any 802.11a/g networks yet).
Examining the scripts to see if i could figure out either where some
form of log file was going or what substeps were involved was not
working out; there are apparently some implicit inclusions that define
functions i just couldn't find. After much searching i found this
description of the steps involved to do manually more or less what
net.ath0 was doing, except for dhcp. I tried those steps, and
everything
worked flawlessly. I could ping other machines by IP
address, and even slogin to other machines. So, the network was
working, it must be that dhcp was not loading. Figuring out how
to start dhcp manually to continue looking for the problem didn't seem
like a trivial task, so i gave up temporarily and shut everything down.
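The linked description is not reproduced here, so the following is only a guess at the kind of sequence involved: load the driver module, bring the interface up, set the ESSID with iwconfig, and assign a static address. The function prints the commands rather than running them. The ESSID, address and netmask are placeholders, and wireless_plan is a made-up name.

```shell
wireless_plan() {
    # $1: interface, $2: ESSID, $3: static IP address (all placeholders)
    echo "modprobe ath_pci"
    echo "ifconfig $1 up"
    echo "iwconfig $1 essid \"$2\""
    echo "ifconfig $1 $3 netmask 255.255.255.0"
}

plan=$(wireless_plan ath0 "example-net" 192.168.0.50)
echo "$plan"
# Review the output, then run the lines by hand as root.
# DHCP is deliberately left out, matching the manual test in the text.
```

Pinging another machine by IP address afterwards checks association and addressing without involving dhcp or name resolution.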
When i later restarted the wireless node and then the laptop, to my
great surprise everything worked. Wireless loaded, found my network,
associated, and even dhcp worked! Healing reboots are not a real
satisfying solution, but it's hard to argue with success. WEP is still
disabled, but i expect that will work too, once i summon up the
interest to try that again.
Interestingly, when i tried to use wireless in another domain (at
work), i experienced exactly the same behaviour---it initially didn't
work, i tried the substeps by hand, it worked, and now it works
automatically. Weird.
DVD. I emerged ogle-gui to get a dvd player. The home page for ogle
suggests that /dev/dvd should exist. I didn't have such a thing, but i
did have a /dev/cdrom --- i think i may have specified that
somewhere. Anyway, i created /dev/dvd as a symbolic link to /dev/cdrom
and started ogle. Ogle gave an X complaint about a BadAlloc; the FAQ
says this is due to lack of memory, and indeed lowering the resolution
to 1024x768 allows it to work. Many complaints about lost sync, but i
tried with 2 different dvds and it plays them adequately---there are
single-frame streaks across the bottom of the screen every so often,
but it wasn't distracting.
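The symlink step can be tried out in a scratch directory first; a minimal sketch follows. The relative link target is a choice made here for illustration (the text does not say whether the link was relative or absolute), and the scratch directory stands in for /dev.

```shell
scratch=$(mktemp -d)            # stand-in for /dev
: > "$scratch/cdrom"            # stand-in for the existing /dev/cdrom node
ln -s cdrom "$scratch/dvd"      # real system, as root: ln -s cdrom /dev/dvd
target=$(readlink "$scratch/dvd")
echo "dvd -> $target"
rm -rf "$scratch"
```

If /dev is managed dynamically on a given setup, a hand-made link may not survive a reboot, so it is worth checking again after restarting.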
X, Internal & external monitors, Port Replicator. At work i use a port
replicator connected to a CRT, and this also gives me an external
keyboard and mouse. In general i also wanted the external monitor to
work.
X is of course working fine on the internal monitor now. Just to see
how it would fail, i booted the machine within the replicator.
Everything boots, external mouse & keyboard work nicely but when
gdm kicks in the screen goes completely blank, and the monitor turns
off. Text login screens work fine of course. I disable gdm by removing
xdm from the default runlevel so i can play with X more easily.
Ok. The replicator may be complicated, so i thought i'd start by just
trying to get output to the external monitor. I plugged in the monitor
and booted. The external shows all the text stuff, but as
soon as X starts it switches to the laptop screen. Pressing Fn-F7
to toggle internal/external/both output modes used to be a necessary
step, but no longer works---well, it crashes the system reliably, but
does not improve anything.
After some investigation and lots of redone fglrxconfig sessions, i
finally conclude that although i cannot figure out how to achieve a
switching internal/external display mode (despite the fglrxconfig menu
option, that choice is never described further in any documentation, so
i was unable to figure out how to use it, or even how it differs from
single monitor mode), perhaps "clone mode" would work nicely. I tried
that, and indeed something happens on the external monitor, but it
isn't pretty. Clone mode requires the monitors to be at the same
resolution, so the 1400x1050 internal monitor represents a maximum
resolution. Of course no sane CRT supports 1400x1050, so it backs
down to 1024x768. That would be acceptable, for some situations anyway,
except for the fact that the monitor output is squeezed into one corner
of the screen, and has colourful diagonal streaks across it. Like i
said, not pretty. Perhaps the 50Hz refresh rate of the internal LCD
is also an upper bound, and the CRT is trying its best.
Dual head mode works nicely though. I get two X servers going, both
seem to be happy and coexisting nicely at 2 totally different
resolutions. Mysteriously, my gnome toolbar panel is missing all its
customization on the external monitor, as well as the default folders
on the desktop (internal monitor view has all of these). But
everything works, and the mouse moves nicely between the 2 screens.
Unfortunately, even in the replicator with the internal monitor
inactive i have to move the mouse across the nonexistent internal
screen first before it will show up on the external monitor. Worse, i
can't transfer windows or clipboard data from one monitor to the other
this way, and i'm pretty sure that's going to get annoying.
I somehow stumbled upon someone's
XF86Config file for my monitor that had modelines for a 1400x1050
mode, reproduced here:
Section "Monitor"
    Identifier  "Monitor0"
    VendorName  "Monitor Vendor"
    ModelName   "ViewSonic PF790"
    DisplaySize 360 270
    HorizSync   30.0 - 97.0
    VertRefresh 50.0 - 160.0
    ModeLine    "1400x1050" 129.0 1400 1464 1656 1960 1050 1051 1054 1100 +hsync +vsync
    ModeLine    "1400x1050" 151.0 1400 1464 1656 1960 1050 1051 1054 1100 +hsync +vsync
    ModeLine    "1400x1050" 162.0 1400 1464 1656 1960 1050 1051 1054 1100 +hsync +vsync
    ModeLine    "1400x1050" 184.0 1400 1464 1656 1960 1050 1051 1054 1100 +hsync +vsync
    Option      "dpms"
EndSection
I thought it might help things, and tried it while in the replicator.
To my surprise it kind of works. It looks like 1400x1050, though it
flickers very badly. Perhaps still being constrained by that 50Hz
refresh rate. I'd switched back to experimenting in the replicator in
hopes that having the lid closed and internal monitor off may help
avoid any constraints being imposed by the internal monitor
capabilities, but obviously that didn't help.
I hadn't done much playing with the existing driver options in the
xorg.conf (XF86Config) file. I had tried a number of other things found
by perusing the web, all of which were ignored (PanelOff, disp_internal
and more). I finally tried uncommenting the NoDDC option, and that
turns out to be the magic
fix. I now have 1600x1200 at a reasonable refresh rate on the
external monitor while in the replicator. I still haven't sorted out
external monitor usage outside of the replicator, but the replicator
is much more important to my usage and dual head mode would work fine
with a regular external monitor situation anyway, so it's low priority
for the moment.
Note that at least so far the external monitor imposes a hefty
performance penalty. Now for both fgl_glxgears and glxgears i get about
50fps. Perhaps because i turned off so many things when trying to get
it to work at all.
I also discovered that the screen is not repainted properly when i
switch to a virtual terminal, and then back to X. The screen saver also
has issues---it comes on, but occupies only a portion of the screen,
maybe a 1024x768 block of the 1600x1200 screen. So many things still to
fix...
Domain name. I still have the problem with the "(none)" text for my
domain name in the pre-login message, and empty being returned by
/bin/domainname. Reading the /etc/init.d/domainname script, it looks
for /etc/dnsdomainname (yes i have that), and then just inserts a
domain line into the /etc/resolv.conf file based on the content of that
file. Quite simple really, so why isn't it working? Running it manually
after booting adds a domain somedomain.com line to the resolv.conf
file, but neither /bin/domainname nor the pre-login message is any
better.
I then tried running /bin/domainname to set the domain name manually,
and it works nicely in that now domainname returns the name i
specified. The pre-login message is still unchanged though. I noticed
the dnsdomainname (as opposed to nisdomainname) portion of the
/etc/init.d/domainname script does not actually call /bin/domainname to
set the domain name. So i added that. Now, changing those scripts more
than just to set defined options or change a directory name has never
been necessary in other installations, so i have strong doubts about
this approach as a general fix, but i wanted to see if that would
help.
Alas, it provides only an incomplete solution. I tried rebooting and
/bin/domainname is happy but i still get the same missing domain name
in the pre-login message: This is somemachine.(none) (Linux i686
2.6.7-gentoo-r11). The /etc/resolv.conf file does not have a domain
somedomain.com line. After i run /etc/init.d/domainname manually it is
added properly, but as before that doesn't help the pre-login message.
There's a puzzle as to why running /etc/init.d/domainname during boot
fails to insert the domain line, but does so fine when run manually
after booting. But does that really have anything to do with the
pre-login message? The pre-login message comes from the /etc/issue
file; sadly there is no help with the syntax found in there: the man
page for issue refers me to the man page for getty, but there is no man
page or info for getty. More searching; turns out it's agetty syntax i
need anyway.
Aha, yes indeed, that's the problem: the /etc/issue file has the string
"\n.\O" to refer to host (node) name (period) domain name. The agetty
syntax, however, specifies a lower case "\o" not an upper case "\O".
Change that and now the login message is all better.
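A sketch of that one-character change, applied to a scratch copy rather than the real /etc/issue. The sample line is an assumption reconstructed from the pre-login message quoted earlier; only the "\n.\O" part is taken directly from the text.

```shell
scratch=$(mktemp -d)
# Assumed sample line; the real /etc/issue may differ in the other escapes.
printf '%s\n' 'This is \n.\O (\s \m \r) \t' > "$scratch/issue"

# Replace the upper-case \O escape with the lower-case \o.
sed 's/\\O/\\o/g' "$scratch/issue" > "$scratch/issue.new"
fixed=$(cat "$scratch/issue.new")
printf '%s\n' "$fixed"
rm -rf "$scratch"
```

Writing to a new file and comparing before replacing the original keeps a typo from garbling the login prompt.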
I rebooted, and irritatingly while the pre-login message is still fine,
the domain line in the /etc/resolv.conf file remains absent. I added
some debug code to /etc/init.d/domainname and rebooted. It is in fact
creating the correct file; something must indeed be overwriting
it. dhcp seems like a likely culprit, but continuing this investigation
is turning into a lot of work for a very small and increasingly unclear
reward. The network works fine, domainname returns the right name, and
the aesthetic problem with the pre-login message is fixed, so until the
lack of an entry in resolv.conf turns out to be a problem the current
state will do.
Update: Thanks to Brant Gurganus for pointing out a much better and
simpler solution. The problem actually stems from the ordering of
entries in the /etc/hosts file, where i had an entry like
"127.0.0.1 localhost host.domain host". The first entry on this line
should be the canonical one, so changing that to
"127.0.0.1 host.domain host localhost" neatly
fixes the problem without need for any of my hacks above!
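The reordering is easy to do in an editor; as a sketch, the same change with sed on a scratch copy looks like this. host.domain and host are the placeholder names used in the text, and the scratch file stands in for /etc/hosts.

```shell
scratch=$(mktemp -d)
printf '%s\n' '127.0.0.1 localhost host.domain host' > "$scratch/hosts"

# Move "localhost" from the first name to the last on the loopback line.
sed 's/^\(127\.0\.0\.1[[:space:]]\{1,\}\)localhost[[:space:]]\{1,\}\(.*\)$/\1\2 localhost/' \
    "$scratch/hosts" > "$scratch/hosts.new"
reordered=$(cat "$scratch/hosts.new")
printf '%s\n' "$reordered"
rm -rf "$scratch"
```

The expression leaves lines alone unless localhost is the first name after 127.0.0.1, so running it twice does no harm.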
Another update: Thanks to
Aad-Jan Couwenhoven for pointing out another solution that
leaves localhost as the canonical name for 127.0.0.1, found
here
Updating the System
Gentoo has great install documentation. But there were a few
post-install maintenance issues that were just not clear to me, and
which i couldn't find anything else about. Here are a few notes i
took.
Updating the kernel. I've been doing relatively frequent emerge sync
commands and emerge -u world to keep everything up to date, and that's
all the
gentoo documentation suggests i need to do. However, at some
point i noticed that a couple new kernel sources were installed, and in
fact it was up to kernel 2.6.7-r14 from my initial r11
release. I'm reasonably sure it didn't install any new kernels
automatically, so i presume i need to install those manually.
Of course instructions on rebuilding the kernel are everywhere, and
i've already done a number of rebuilds anyway. Instructions on
how to transfer a configuration from an old kernel to a new kernel
source tree and then rebuild it, however, are less ubiquitous.
The steps i finally found are: 1) copy the .config file from the root
of the old source tree to the root of the new source tree, 2) run
make oldconfig to get it sync-ed with any new kernel options, and then
just build & install as usual. BTW, the actual (2.6) kernel build steps
are even simpler than the Gentoo documentation suggests; simply
1) make && make modules_install, 2) mount /boot, and then
3) make install to copy the new kernel & related files to the /boot
directory. Note that that last step eliminates a lot of manual substeps
--- it copies the kernel & renames it according to the kernel version,
does the same with the System.map etc, creates symbolic links without
version numbers to point to the new kernel, and renames any old
conflicting kernels. I still had to edit /boot/grub/grub.conf to fix up
the boot menu, but that was it. umount /boot, reboot, and it works.
Almost. The ati drivers don't load now; there's a message during
boot, and while X still works the acceleration is gone.
Fortunately a simple reinstall of the drivers fixes that and it's back
to normal. The madwifi-drivers also don't load, and a simple
emerge of them also fixes that.
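The kernel steps above can be collected into a small script; the following is a sketch under stated assumptions, not a tested upgrade tool. It prints each command unless DRY_RUN=0 is set. The run and upgrade_kernel names are made up, the two source tree paths are guesses at how the trees are named under /usr/src, and make -C is used in place of changing into the new tree.

```shell
run() {
    # Print the command unless DRY_RUN=0, in which case execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_kernel() {
    # $1: old source tree, $2: new source tree
    run cp "$1/.config" "$2/.config" &&
    run make -C "$2" oldconfig &&
    run make -C "$2" &&
    run make -C "$2" modules_install &&
    run mount /boot &&
    run make -C "$2" install
    # Then edit /boot/grub/grub.conf by hand, umount /boot and reboot.
    # Afterwards re-emerge the out-of-tree drivers (ati and madwifi here).
}

upgrade_kernel /usr/src/linux-2.6.7-gentoo-r11 /usr/src/linux-2.6.7-gentoo-r14
```

Chaining with && stops at the first failing step, so a broken build never reaches make install.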
Kernel 2.6.8 and wireless (madwifi). When kernel 2.6.8 showed up i
followed the same steps, only to discover that now the madwifi-driver
ebuild fails (won't compile). Looking at the compile errors and
examining the different linux source trees, the problem seems to stem
from a change in the definition of the proc_dointvec kernel function,
which now takes an extra parameter. I didn't find much useful info on
exactly what that parameter should be, but other patches i found for
different projects that had encountered a similar problem just seemed
to add an extra parameter loff_t *ppos to the calling function to pass
along to proc_dointvec.
Seems worth a try, but how do i edit source for an ebuild and recompile
it to test it? The closest/easiest instructions i found were these
instructions on how to add custom ebuilds. They are missing a few
important things for my purposes, like how to actually create a patch
(what diff options?), where to put and how to name the patch, and how
to get it to use the old ebuild source files as a base. The actual
steps i took are below (nb i already had portage overlay stuff setup
from installing wireless-utils so i didn't need to do that). I've
included the files i made below, but i hope it's obvious i'm doing a
lot of guessing and supposition: my patch & ebuild are not official or
vetted in any way; if you want to use them it's at your own risk!
- Patches are just diff outputs, so creating a custom patch to add the
new parameter shouldn't be hard. There's even already a patch in the
/usr/portage/net-wireless/madwifi-driver/files directory to look at as
an example of what diff options to use and what the end product should
look like. I created a patch by editing the file that had the compile
error,
/var/tmp/portage/madwifi-driver-0.1_pre20040726/work/driver/if_ath.c
which had remained after the earlier failed ebuild, and used diff to
create a file i imaginatively called madwifi-driver-0.1-fix.patch.
- I then created a corresponding directory in /usr/local/portage and
copied the existing madwifi-0.1_pre20040726.ebuild ebuild to there,
renaming it to madwifi-0.1_pre20040820.ebuild. I also created a
corresponding files subdirectory, copied over the existing patch file
(don't need the digests), and added my new patch file there too.
- I then edited the madwifi-0.1_pre20040820.ebuild and changed "$P" in
the SRC_URI definition to be the explicit old madwifi build,
"madwifi-driver-0.1_pre20040726", and added a line
"epatch ${FILESDIR}/${PN}-0.1-fix.patch" after the existing patch at
the end of the src_unpack function. Here's the resulting file.
- Finally, i created the digest, and emerged the driver. It
works!
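For the patch-creation step, a sketch of producing a unified diff between an untouched copy and an edited copy of a file. The use of diff -u is an assumption (the text does not record which options were used), and the one-line file contents are placeholders, not the real if_ath.c.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/orig" "$scratch/work"
# Placeholder contents; the real if_ath.c is far larger.
printf '%s\n' 'static int handler(int write, void *buf)' \
    > "$scratch/orig/if_ath.c"
printf '%s\n' 'static int handler(int write, void *buf, loff_t *ppos)' \
    > "$scratch/work/if_ath.c"

# diff exits with status 1 when the files differ, which is the expected case.
(cd "$scratch" && diff -u orig/if_ath.c work/if_ath.c > fix.patch) || [ $? -eq 1 ]

added=$(grep -c '^+static' "$scratch/fix.patch")
echo "lines added by the patch: $added"
rm -rf "$scratch"
```

Running diff from a common parent directory keeps the paths in the patch header relative, which matters when the patch is later applied from inside the unpacked source tree.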
Reboot, and the modules now load, and yes indeed wireless works
again. Hope that sort of thing doesn't happen every minor
kernel version upgrade though.