Gkrellm replacement
Latest revision as of 22:08, 8 June 2008
gkrellm/gkrellmd is the suxors.
Would be good to have a replacement in place. A couple of components would be nice.
- Logging daemons. These would sit on each host, collect data every interval, and save it to log files (both local and network available) so I could keep track of how things have gone over time. They could also serve the collected data over a socket interface.
- Server side aggregator. This would sit on my webserver, get data from the logging daemons and the old log files, and make it available for a web client (or whatever) to display.
- Client side viewers. This might be an ajaxy program thingy that displays all the realtime data and allows navigation of the historical data as well.
Things to keep track of:
- Temperatures
- Fan speeds
- Voltages
- Processor speeds
- Processor load
- Number of processes
- Memory utilization
- Network traffic
- Disk traffic
- Battery state, rates, and remaining time estimates
- Local date/time
- Local hostname
- Local uptime
- RAID data
Where to find these things ((I) marks values that go into the init events, (o) marks values sampled every interval):
- /sys/class/hwmon/*/device/ (for temps, fan speeds, voltages)
  - in#_label(I) in#_input(o)
  - fan#_label(I) fan#_input(o) pwm#(o)
  - temp#_label(I) temp#_input(o)
  - assume X# for label, and assume nothing for pwm
    - hwmon-init: <dev><type><num> <label>
    - hwmon: <dev><type><num> <input> <pwm>
- /sys/class/net/*/statistics/
  - rx_bytes(o) rx_packets(o)
  - tx_bytes(o) tx_packets(o)
    - net: <name> <rxb> <rxp> <txb> <txp>
- /sys/devices/system/cpu/*/cpufreq/ (for cpu frequencies)
  - cpuinfo_max_freq(I) scaling_max_freq(I)
  - cpuinfo_cur_freq(o) scaling_cur_freq(o)
  - cpuinfo_min_freq(I) scaling_min_freq(I)
    - freq-init: <cpu> <max> <min>
    - freq: <cpu> <cur>
- /proc/acpi/thermal_zone/*/temperature
- /proc/acpi/battery/*/info
  - design capacity(I)
  - last full capacity(o)
  - design voltage(I)
  - design capacity warning(I)
  - design capacity low(I)
- /proc/acpi/battery/*/state
  - capacity state(o)
  - charging state(o)
  - present rate(o)
  - remaining capacity(o)
  - present voltage(o)
    - battery-init: <name> <d cap> <d vlt> <d cap wrn> <d cap low>
    - battery: <name> <cap state> <last full> <cur cap> <cur vlt> <chg state> <cur rate>
- `hostname`(I)
- `date`(o)
  - date: system <date>
- /proc/version(I)
  - system-init: system <host> <version>
- /proc/loadavg(o)
  - <1min> <5min> <15min> <cur>/<total> *blah*
    - load: system <1m> <5m> <15m>
    - proc: system <cur> <total>
- /proc/uptime(o)
  - <up> <idle>
    - uptime: system <total uptime>
- /proc/mdstat(I)
  - md# : active raidN <disks>
  - <blocks> <chunk size> [4/4] UUUU
    - md-init: <disk> <raidtype> <status> <disks active> <disks total>
- /proc/meminfo
  - MemTotal(I)
  - MemFree(o)
  - Buffers(o)
  - Cached(o)
  - SwapCached(o)
  - SwapTotal(I)
  - SwapFree(o)
    - mem-init: mem <total> <swap total>
    - mem: mem <free> <buf> <cache> <swap free> <swap cache>
- /proc/diskstats(o)
  - <maj> <min> <dev> x x <blks read> x x x <blks written> x x x x
    - disk: <name> <blocks read> <blocks written>
- /proc/stat(o)
  - cpu[#] <usr> <nice> <sys> <idle> *blah*
    - cpu: <name> <usr> <nice> <sys> <idle>
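As a rough illustration of how a few of the /proc sources listed above could be boiled down to the one-line event formats, here is a minimal Python sketch. The function names are made up for this example, only five of the sources are covered, and each function takes the text of the file as a string rather than reading it itself. Treat it as one possible reading of the list, not as the actual daemon.

```python
def parse_loadavg(text):
    """/proc/loadavg -> 'load' and 'proc' event lines."""
    one, five, fifteen, procs = text.split()[:4]
    cur, total = procs.split("/")
    return [f"load: system {one} {five} {fifteen}",
            f"proc: system {cur} {total}"]


def parse_uptime(text):
    """/proc/uptime -> 'uptime' event line (the idle field is dropped)."""
    up = text.split()[0]
    return [f"uptime: system {up}"]


def parse_meminfo(text):
    """/proc/meminfo -> (mem-init line, mem line)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key.strip()] = rest.split()[0]  # keep the number, drop 'kB'
    init = f"mem-init: mem {fields['MemTotal']} {fields['SwapTotal']}"
    sample = (f"mem: mem {fields['MemFree']} {fields['Buffers']} "
              f"{fields['Cached']} {fields['SwapFree']} {fields['SwapCached']}")
    return init, sample


def parse_stat(text):
    """/proc/stat -> one 'cpu' event line per cpu/cpuN row."""
    lines = []
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            name, usr, nice, sys_, idle = parts[:5]
            lines.append(f"cpu: {name} {usr} {nice} {sys_} {idle}")
    return lines


def parse_diskstats(text):
    """/proc/diskstats -> one 'disk' event line per device row."""
    lines = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 10:
            # Columns follow the layout in the list: dev is the 3rd field,
            # <blks read> the 6th, <blks written> the 10th.
            lines.append(f"disk: {parts[2]} {parts[5]} {parts[9]}")
    return lines
```

On a Linux host, something like `parse_loadavg(open("/proc/loadavg").read())` would feed these functions real data. The /sys sources (hwmon, net, cpufreq) are one value per file, so they would need a directory walk instead of a line parser.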
Each class of thing should be distilled down to a small set of numbers that fits on one line per sample time. Each class gets its own log file and its own data header in the served stream. Each time the daemon starts, it should write to each log file the current date and time, the hostname, the kernel version, what it is keeping track of, and what each column of numbers means. It should also record the intended time interval between samples (though some error in the actual spacing must be assumed).
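A minimal sketch of the startup header described above, in Python. The `# key: value` layout, the function name `header_lines`, and its parameters are all assumptions made for illustration; the note only says which pieces of information the header should carry.

```python
from datetime import datetime


def header_lines(event_class, columns, hostname, kernel, interval_s, now=None):
    """Build the header written to one log file when the daemon starts.

    The '# key: value' layout is hypothetical; only the set of fields
    (date/time, host, kernel, what is tracked, columns, interval) comes
    from the design note.
    """
    now = now or datetime.now()
    return [
        f"# started: {now:%Y-%m-%d %H:%M:%S}",
        f"# host: {hostname}",
        f"# kernel: {kernel}",
        f"# tracking: {event_class}",
        f"# columns: {' '.join(columns)}",
        # The interval is nominal; real samples will drift a little.
        f"# interval: {interval_s}s (nominal)",
    ]
```

For the load log, for example, `header_lines("load", ["1m", "5m", "15m"], host, version, 10)` would give six comment lines to write before the first sample.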
On each interval, the date and hostname should be read first, then all data from /proc/*, then /proc/acpi/*, then /sys/devices/* and /sys/class/*.
The date should be sent first. The init data should then be compared against the previous version of each item, and any differences should be sent at once. Finally, all normal per-interval data should be sent.
'Sent' means that the data is put in the local log file and also sent to the SQL server of the network for later dissemination. The SQL server keeps the data in mostly the same format. There should be two tables: one for init class events and one for all other events. Each event is logged with a timestamp, hostname, event class, resource name (some resource names might be total, some might be system), and then the rest of the data. Note that some init events might change the system hostname. If this occurs, there will appear to be a discontinuity in the stored data, but this will have to be allowed.
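The per-interval "send" step could be sketched roughly as follows, in Python. Everything concrete here is an assumption for illustration: SQLite stands in for the network SQL server, the table and column names are invented, and events are modelled as dicts keyed by (event class, resource name). Only the ordering (date, then changed init data, then normal samples) and the two-table split come from the note.

```python
import io
import sqlite3

# Hypothetical schema: one table for init class events, one for the rest.
SCHEMA = """
CREATE TABLE IF NOT EXISTS init_events
    (ts TEXT, host TEXT, event_class TEXT, resource TEXT, data TEXT);
CREATE TABLE IF NOT EXISTS events
    (ts TEXT, host TEXT, event_class TEXT, resource TEXT, data TEXT);
"""


def changed_init(previous, current):
    """Return the init events that are new or differ from the last interval."""
    return {key: data for key, data in current.items()
            if previous.get(key) != data}


def send(db, log, ts, host, init_changes, samples):
    """Write one interval to the local log and to the database."""
    batches = [
        ("events", {("date", "system"): ts}),  # the date goes out first
        ("init_events", init_changes),         # then any changed init data
        ("events", samples),                   # then the normal samples
    ]
    for table, batch in batches:
        for (event_class, resource), data in batch.items():
            log.write(f"{event_class}: {resource} {data}\n")
            db.execute(
                f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)",
                (ts, host, event_class, resource, data),
            )
    db.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    log = io.StringIO()
    previous = {("mem-init", "mem"): "2048000 4096000"}
    current = dict(previous)
    current[("freq-init", "cpu0")] = "2400000 800000"
    send(db, log, "2008-06-08 22:08:00", "hostA",
         changed_init(previous, current),
         {("load", "system"): "0.10 0.20 0.30"})
    print(log.getvalue(), end="")
```

Because the hostname is stored on every row, a hostname change simply shows up as rows under a new host value, which is the discontinuity mentioned above.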