A look inside
There’s one major difference between trying to repair a computer and trying to repair an alarm clock, a car, or nearly anything else.
- Normal devices have physical working parts which you can see interacting: camshafts, gears, bits of metal and moving components. They make noises. They vibrate, hum, emit smoke, and generally give off all sorts of signals which tell you what state they’re in.
- Computers only interact with their environment over the prescribed input/output channels. On one level, this means the only way they can talk to you is via the screen and the speakers. On another level, it means the only things you can see are the things the operating system wants you to see. It’s quite happy to let you know the processor load or the amount of free memory, but only when it wants to, and the information is often very dumbed-down.
This can be annoying when you’re trying to fix things.
- If something goes wrong with a piece of machinery, you can usually open it and have a look. There’s usually some kind of cover which you can easily remove, and then you get an overview of the workings of the machine. If you’re lucky you will be able to scope out vaguely how it works, and if you’re luckier still the problem will be obvious. There may be smoke, or something obviously not moving which ought to be.
- A computer is basically a black box. If something goes wrong, you have no way of peering inside and seeing what’s going on. You can take apart the machine, fine. But you can’t take apart the operating system, the contents of memory, the webcam drivers or the bootloader. They’re just patterns of bits on a disc.
So it’s usually completely impossible to see what’s wrong.
Yes, the OS is meant to keep you informed about things like this. But since when did it ever do a good job of this?
The only information OSes generally give you is:
- A list of processes and the resources they’re using
- How much memory is free
- How much of the processor is being used
Wouldn’t it be brilliant if, when you had a wifi problem, you could just peer into the TCP/IP stack, watch the packets flowing backwards and forwards, and spot that they weren’t getting past the router?
Wouldn’t it be awesome if, when your printer wasn’t responding, you could just take the cover off the USB controller and see that one of the driver instances had crashed?
Well, you could.
Imagine a separate monitor program that runs alongside the OS, as weakly connected to it as possible (so that the OS can go down without affecting the monitor). Its job is to give you a visual, schematic-type picture of everything going on inside your system.
List of running processes? Bah. Now you can see each process as a box on the screen, with lines connecting it to the resources it’s using: network sockets, hard disc, memory, files it has open, DLLs or libraries it has loaded.
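To make the "boxes and lines" idea a bit more concrete, here is a minimal sketch of how the lines for one process could be collected. It assumes a Linux system, where each open descriptor shows up as a symlink under `/proc/<pid>/fd`. The names `classify` and `resource_edges` are made up for illustration, and the categories are rough guesses based on the link text rather than anything authoritative.

```python
import os

def classify(target):
    """Guess the kind of resource from a /proc/<pid>/fd link target."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "kernel object"
    if target.startswith("/dev/"):
        return "device"
    return "file"

def resource_edges(pid):
    """Return (pid, kind, target) tuples, one per open descriptor.

    Assumes Linux. Returns an empty list if /proc is missing, the
    process has gone away, or we are not allowed to look.
    """
    fd_dir = f"/proc/{pid}/fd"
    edges = []
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return edges
    for name in names:
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were looking
        edges.append((pid, classify(target), target))
    return edges

if __name__ == "__main__":
    # Draw the "lines" for this very process as text.
    for pid, kind, target in resource_edges(os.getpid()):
        print(f"[{pid}] --{kind}--> {target}")
```

A real monitor would draw these edges rather than print them, and would also need loaded libraries and memory usage, which this sketch leaves out.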
Along each line you can see the flow of data: how much, and in what direction. Look at your Firefox instance and you can see it pulling data from different servers over wifi or ethernet, loading it into memory and, in the case of Firefox, forgetting about it there.
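The "how much, and in what direction" part boils down to sampling a counter twice and dividing by the time in between. The sketch below assumes Linux, where `/proc/<pid>/io` exposes cumulative counters such as `read_bytes` and `write_bytes` for storage traffic. Per-process network traffic is not available this way, so that half of the picture is not covered here. Both function names are hypothetical.

```python
import os
import time

def read_io_counters(pid):
    """Read the cumulative counters from /proc/<pid>/io (Linux only).

    Returns an empty dict if the file is missing or unreadable.
    """
    counters = {}
    try:
        with open(f"/proc/{pid}/io") as f:
            for line in f:
                key, _, value = line.partition(":")
                counters[key.strip()] = int(value)
    except (OSError, ValueError):
        pass
    return counters

def flow_rates(before, after, seconds):
    """Per-second change for every counter present in both samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {k: (after[k] - before[k]) / seconds
            for k in before if k in after}

if __name__ == "__main__":
    pid = os.getpid()
    start = time.monotonic()
    first = read_io_counters(pid)
    time.sleep(0.5)
    second = read_io_counters(pid)
    elapsed = time.monotonic() - start
    print(flow_rates(first, second, elapsed))
```

The direction of flow falls out of which counter is moving: a rising `read_bytes` is data coming in from the disc, a rising `write_bytes` is data going out to it.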
Processes are not the only thing you can monitor. When you open a bonnet, you can see your engine working: there are moving parts, obvious interconnections and obvious problems. Our monitor program would watch the OS and display a schematic of its various parts (kernel, graphics, I/O, network, device drivers…), how they were connected to each other, whether each one was responsive, and how data and dependencies existed between them.
One of the most annoying things about Windows is when it won’t let you eject a drive because it is “in use.” Even if nothing appears to be using it. No problem any more. Just check the monitor; see the lines (or curves, or exciting flashing trails) leading from the drive to all the programs which are using it. Check the data flow down the lines, and if nothing important is being written, you can kill the programs.
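Working out which programs have lines leading to a drive is mostly a filtering job once you know what each process has open. Here is a rough sketch that takes a table of open paths per process (gathered however your OS allows; that part is assumed and not shown) and picks out the ones sitting on a given mount point. It uses POSIX-style paths, and the names `is_under` and `users_of` are invented for this example.

```python
import posixpath

def is_under(path, mount_point):
    """True if path is the mount point itself or lies below it.

    This compares names only; it does not resolve symlinks.
    """
    path = posixpath.normpath(path)
    mount_point = posixpath.normpath(mount_point)
    prefix = mount_point.rstrip("/") + "/"
    return path == mount_point or path.startswith(prefix)

def users_of(mount_point, open_files):
    """Map each pid to the paths it holds open under mount_point.

    open_files is a dict of pid -> iterable of open paths.
    Processes with nothing open on the drive are left out.
    """
    result = {}
    for pid, paths in open_files.items():
        hits = [p for p in paths if is_under(p, mount_point)]
        if hits:
            result[pid] = hits
    return result

if __name__ == "__main__":
    # Made-up sample data standing in for a real process scan.
    sample = {
        101: ["/media/usb/photo.jpg", "/home/me/a.txt"],
        202: ["/media/usb2/other.bin"],
    }
    print(users_of("/media/usb", sample))
```

Open files are not the only thing that can keep a drive busy (a program sitting in a directory on it will do the same), so a real monitor would have to check more than this.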
Hard drive on the blink? Not sure whether it’s the drive or the motherboard? Check the monitor. Data is stalling on the line from the motherboard before it ever reaches the drive. Crap, I need a new motherboard.
Of course, there are downsides…
- Someone needs to write all this shit and make it able to keep an eye on parts of the OS. Don’t expect Microsoft to do it. PCs should be fixed by PC World, not the USER!
- Watching the whole system all the time might cause a performance hit. But probably not much of one. You can spare some of those 3 billion cycles per second to log data rates, I think.
- Even if you know exactly where the problem is, the only solution might be to restart the computer anyway…