Thomas Jarosch [Wed, 7 Apr 2021 10:33:43 +0000]
Merge branch 'improve-safety'
Thomas Jarosch [Wed, 7 Apr 2021 10:10:06 +0000]
Increase pingcheck version to 0.8
Thomas Jarosch [Wed, 7 Apr 2021 10:27:45 +0000]
Unbreak unit test
The on-disk cache structure changed in 2015 and we didn't notice.
-> add new field.
Thomas Jarosch [Wed, 7 Apr 2021 10:05:36 +0000]
Write out cache file to temporary file + atomic replace
Just in case multiple processes might race for the writeout.
Actually seen in production as pingcheck currently seems
to get restarted by connd for every new INTRACLIENT.
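A minimal sketch of the temp-file-plus-rename pattern, assuming a POSIX system. The helper name write_file_atomically is hypothetical and not taken from the pingcheck sources. No fsync() is done here, so the sketch guarantees atomic replacement but not durability across a crash.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <unistd.h>   // getpid(), POSIX only

// Hypothetical helper: readers of `path` only ever see the old file or the
// complete new one, never a half-written file.
bool write_file_atomically(const std::string &path, const std::string &content)
{
    // Temp file in the same directory (rename() is only atomic within one
    // filesystem); the PID suffix keeps concurrent processes apart.
    const std::string tmp_path = path + ".tmp." + std::to_string(::getpid());
    {
        std::ofstream out(tmp_path.c_str(), std::ios::binary | std::ios::trunc);
        if (!out)
            return false;
        out << content;
        out.flush();
        if (!out)
        {
            std::remove(tmp_path.c_str());
            return false;
        }
    } // stream closed here, before the rename
    // On POSIX, rename() atomically replaces an existing destination.
    if (std::rename(tmp_path.c_str(), path.c_str()) != 0)
    {
        std::remove(tmp_path.c_str());
        return false;
    }
    return true;
}
```

On Windows std::rename() fails if the destination exists, so this sketch is POSIX-only.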
Thomas Jarosch [Wed, 7 Apr 2021 09:29:02 +0000]
Add exception safety to load of the DNS disk cache
A corrupted cache might throw an exception in the middle
of an unserialize operation. This might leave our DnsCache class
with an undefined data state.
Solution: Unserialize into local variables first, then do an
atomic swap() once everything has been parsed correctly.
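A simplified sketch of the parse-into-locals-then-swap idea. SimpleCache and its line format are hypothetical stand-ins. The real DnsCache uses boost serialization, which is not reproduced here.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for DnsCache.
class SimpleCache
{
public:
    typedef std::map<std::string, std::vector<std::string> > IpMap;

    // Each line: "<hostname> <ip> [<ip> ...]". Throws on malformed input.
    void load(std::istream &in)
    {
        IpMap parsed;                       // parse into a local first
        std::string line;
        while (std::getline(in, line))
        {
            std::istringstream fields(line);
            std::string host, ip;
            if (!(fields >> host))
                continue;                   // skip empty lines
            std::vector<std::string> &ips = parsed[host];
            while (fields >> ip)
                ips.push_back(ip);
            if (ips.empty())
                throw std::runtime_error("corrupt cache line: " + line);
        }
        // Only reached if nothing threw; map::swap() does not throw, so the
        // member is either fully replaced or left untouched.
        Ips.swap(parsed);
    }

    const IpMap &ips() const { return Ips; }

private:
    IpMap Ips;
};
```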
Thomas Jarosch [Sat, 23 May 2020 10:37:57 +0000]
Fix 'occurred' typo
Thomas Jarosch [Fri, 8 Sep 2017 12:15:34 +0000]
Explicitly link pthread
Needed by newer binutils, which are stricter.
Thomas Jarosch [Wed, 23 Dec 2015 18:14:58 +0000]
Switch to Intra2net rpm group
Christian Herdtweck [Wed, 11 Nov 2015 17:08:34 +0000]
fixed CMakeLists: do not make debug the default, remove compile flags
Christian Herdtweck [Wed, 22 Jul 2015 08:02:38 +0000]
moved tests
Christian Herdtweck [Fri, 17 Jul 2015 15:53:43 +0000]
updated connd --status output parser to include log level; made it pep8-compatible and partially pylint-compatible
Christian Herdtweck [Thu, 16 Jul 2015 07:53:04 +0000]
revision 1 of version 0.7
Christian Herdtweck [Wed, 15 Jul 2015 11:27:25 +0000]
remove consistency check for empty callback list in finalize_resolve
empty list possible and ok if resolving was initiated by long-term timer
Christian Herdtweck [Fri, 19 Jun 2015 15:27:37 +0000]
increase version to 0.7
main feature: tested time warp compatibility (or lack thereof, see file TODO)
Christian Herdtweck [Fri, 19 Jun 2015 15:24:06 +0000]
revision 0.6r5 so we are in sync with other revision numbers; make compiler happier
Thomas Jarosch [Fri, 19 Jun 2015 15:19:24 +0000]
Fix trivial typo
Christian Herdtweck [Fri, 19 Jun 2015 15:16:41 +0000]
downgraded icmp parsing error log messages and notice when not saving packets
Christian Herdtweck [Fri, 19 Jun 2015 15:13:03 +0000]
bugfixed erase and re-set of TTLs; changed time warp thresh from 24h to 10mins
increased revision to 0.6r4
Christian Herdtweck [Fri, 19 Jun 2015 13:22:39 +0000]
added clean-up to DnsCache: will remove very old entries after 60 days
Christian Herdtweck [Fri, 19 Jun 2015 13:19:46 +0000]
moved 3 lines of Cname constructors from dnscache.cpp to cname.h
(why would anybody search for Cname code in dnscache.cpp?)
Christian Herdtweck [Fri, 19 Jun 2015 12:25:54 +0000]
secured DnsCache against time warp-related errors
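One way to notice a time warp is sketched below, under the assumption that it shows up as the wall clock moving much more (or less) than a monotonic clock over the same interval. The class name TimeWarpDetector is hypothetical. The 10-minute default mirrors the threshold named in the 0.6r4 commit, but this detection method is not necessarily how pingcheck does it.

```cpp
#include <chrono>

// Hypothetical sketch: compare wall-clock progress with monotonic progress.
class TimeWarpDetector
{
public:
    explicit TimeWarpDetector(
        std::chrono::seconds threshold = std::chrono::minutes(10))
        : Threshold(threshold), HaveLast(false) {}

    // Returns true if wall-clock and monotonic time diverged by more than the
    // threshold since the previous call; the first call only records a baseline.
    bool check(std::chrono::system_clock::time_point wall_now,
               std::chrono::steady_clock::time_point mono_now)
    {
        bool warped = false;
        if (HaveLast)
        {
            using std::chrono::duration_cast;
            using std::chrono::milliseconds;
            const milliseconds wall_delta =
                duration_cast<milliseconds>(wall_now - LastWall);
            const milliseconds mono_delta =
                duration_cast<milliseconds>(mono_now - LastMono);
            milliseconds skew = wall_delta - mono_delta;
            if (skew < milliseconds::zero())
                skew = -skew;               // jumps in either direction count
            warped = skew > Threshold;
        }
        LastWall = wall_now;
        LastMono = mono_now;
        HaveLast = true;
        return warped;
    }

private:
    std::chrono::seconds Threshold;
    bool HaveLast;
    std::chrono::system_clock::time_point LastWall;
    std::chrono::steady_clock::time_point LastMono;
};
```

Taking both time points as arguments keeps the check testable without waiting for a real clock jump.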
Christian Herdtweck [Fri, 19 Jun 2015 11:52:03 +0000]
made a note in TODO and signal handler about SIGHUP and boost asio in case of time warp
Christian Herdtweck [Fri, 12 Jun 2015 08:20:11 +0000]
prevent pingcheck from repeating report of same status; show notification status in log
Christian Herdtweck [Thu, 11 Jun 2015 16:45:46 +0000]
changed value of link_up/down_interval from minutes to seconds
more work than expected!
Christian Herdtweck [Wed, 10 Jun 2015 14:57:30 +0000]
removed debug output (to be precise: reduced level from notice to debug)
Christian Herdtweck [Wed, 10 Jun 2015 14:47:43 +0000]
safe float comparison; fixed syntax error
Christian Herdtweck [Wed, 10 Jun 2015 14:33:11 +0000]
increased revision --> v0.6r2
Christian Herdtweck [Wed, 10 Jun 2015 14:06:52 +0000]
paranoia: ensure we never divide by 0 although logic should prevent that
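A tiny sketch of such a guard. The function name failure_ratio and the choice to report 0.0 for an empty denominator are assumptions, not taken from the HostStatus code.

```cpp
#include <cstddef>

// Hypothetical sketch: fraction of failed pings, guarded against a zero
// denominator even though callers should never pass one.
double failure_ratio(std::size_t failed, std::size_t performed)
{
    if (performed == 0)
        return 0.0;   // nothing measured yet: report "no failures"
    return static_cast<double>(failed) / static_cast<double>(performed);
}
```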
Christian Herdtweck [Wed, 10 Jun 2015 14:01:19 +0000]
added delay to pingers with random interval; delay all first pings
So far they mostly had no delay at all, now the delay is random in [0, interval].
For all pingers set a little delay before first ping
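A sketch of drawing the start delay uniformly from [0, interval] with <random>. The function name random_first_delay and the millisecond resolution are assumptions. The fixed small delay for all pingers is not shown.

```cpp
#include <chrono>
#include <random>

// Hypothetical sketch: spread pingers that share an interval so they do not
// all fire at the same moment.
std::chrono::milliseconds random_first_delay(std::chrono::milliseconds interval,
                                             std::mt19937 &rng)
{
    // Both bounds are inclusive for uniform_int_distribution.
    std::uniform_int_distribution<long long> dist(0, interval.count());
    return std::chrono::milliseconds(dist(rng));
}
```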
Christian Herdtweck [Fri, 5 Jun 2015 16:29:45 +0000]
changed forgotten link status messages to format used for other messages when reporting link up/down
Christian Herdtweck [Wed, 3 Jun 2015 12:53:16 +0000]
fixed memory access violations in test_dns: cancelled a timer on dying io_service
changed testing log level to debug
Christian Herdtweck [Wed, 3 Jun 2015 08:00:27 +0000]
increase revision to 0.6.1 and log level of missing-dump-dir-message to notice
Christian Herdtweck [Fri, 29 May 2015 09:00:17 +0000]
made counts in link status analyzer log messages clearer as promised quite a while ago
Christian Herdtweck [Fri, 29 May 2015 08:54:51 +0000]
added missing counter increment in backup loop termination
Christian Herdtweck [Fri, 29 May 2015 08:30:51 +0000]
improved information content of logs: in LinkAnalyzer messages add cname chain
unfortunately this means a dependency of LinkAnalyzer on DNS, so had to change
CMake for test_linkstatus quite a bit
Christian Herdtweck [Thu, 28 May 2015 14:04:59 +0000]
fixed and re-enabled unit tests
Christian Herdtweck [Thu, 28 May 2015 13:49:50 +0000]
changed default path for log file to /var/log
Christian Herdtweck [Thu, 28 May 2015 13:05:24 +0000]
removed debug option I had set for testing (DUMP_ALWAYS)
Christian Herdtweck [Thu, 28 May 2015 13:01:51 +0000]
changed paths of DNS cache (now in /var/cache) and dumped packets; dump only if /datastore/pingcheck.broken exists
Christian Herdtweck [Thu, 28 May 2015 12:49:21 +0000]
moved adjustment of PingTimeout to right place in PingScheduler;
re-added line I had deleted erroneously; compiles again now
Christian Herdtweck [Thu, 28 May 2015 12:47:15 +0000]
made limits in HostStatus floats; reduce logging
Christian Herdtweck [Thu, 28 May 2015 12:41:38 +0000]
fixed bug that caused HostStatus to never leave BurstMode
Christian Herdtweck [Thu, 28 May 2015 12:40:03 +0000]
added variable for threshold for switching from "all congested" --> "connection failed"
Christian Herdtweck [Thu, 28 May 2015 12:37:23 +0000]
made congestion/offline behaviour more stable: do not declare online right after going offline
Christian Herdtweck [Thu, 28 May 2015 12:35:41 +0000]
fixed bug that caused congestion flag to stay on when all is congested and fail flag is set in HostStatus
Thomas Jarosch [Wed, 27 May 2015 09:01:13 +0000]
Re-enable unit test during rpm build. Ignore exit status for now
Thomas Jarosch [Wed, 27 May 2015 08:51:22 +0000]
Release version 0.6
Christian Herdtweck [Tue, 26 May 2015 16:19:33 +0000]
to avoid going down in congested line scenario, also need longer ping timeout
also bugfix in HostStatus
Christian Herdtweck [Tue, 26 May 2015 15:59:38 +0000]
added option to limit number of IPs per host that are saved in cache
Christian Herdtweck [Tue, 26 May 2015 15:54:06 +0000]
congestion detection now working; also added the case that if all IPs time out despite higher ping numbers, the host is declared down
added more detailed failure returns from IcmpPinger
Christian Herdtweck [Tue, 26 May 2015 15:50:33 +0000]
removed assertion in HostStatus that PingsPerformedCount <= ResolvedIpCount*NParallelPingers
because that required some hacks in case we have no IP and call the ping_done_handler
Christian Herdtweck [Tue, 26 May 2015 13:10:43 +0000]
use line congestion recognition in PingScheduler;
* created new PingNumber based on PingInterval
* moved creation of pingers just before calling Pinger->ping
* moved creation of DnsResolver to prepare_next_ping
* removed unnecessary arg to update_dns_resolver
* fixed bug in HostStatus::log_prefix
Christian Herdtweck [Tue, 26 May 2015 09:35:00 +0000]
added congestion analysis to HostStatus
(make cast from time difference to long explicit in pinger callback)
Christian Herdtweck [Tue, 26 May 2015 08:17:12 +0000]
if sending several pings in parallel, delay them in scheduler
Christian Herdtweck [Tue, 26 May 2015 08:16:39 +0000]
moved time duration measurement of ping from scheduler to pingers
Christian Herdtweck [Tue, 26 May 2015 08:09:50 +0000]
removed the main while loop because if we catch an exception we cannot know the state of the pingers, so it is better to fail and restart the binary completely
Christian Herdtweck [Fri, 22 May 2015 16:19:25 +0000]
started parallel pings, not quite done yet since we still need to delay them
Christian Herdtweck [Fri, 22 May 2015 13:28:47 +0000]
give HostStatus analyzer more info: details on ping success/failure and ping duration
preparing HostStatus to decide other ping timeouts/frequencies/numbers and DNS timeout
Christian Herdtweck [Fri, 22 May 2015 10:01:24 +0000]
deal with the case that we have no IP from either DNS or the cache
Christian Herdtweck [Fri, 22 May 2015 08:16:36 +0000]
completed partial IPv6 compatibility in DNS; now retrieves and caches IPv6 IPs
also added test for this and updated cache test for future cases of cache structure changes
Christian Herdtweck [Thu, 21 May 2015 09:33:44 +0000]
use I2n::tmpfstream to write pcap dump files
Christian Herdtweck [Thu, 21 May 2015 08:00:37 +0000]
remove debug output or more precisely: changed its log level from notice to debug; revision 1 of v0.5
Christian Herdtweck [Thu, 21 May 2015 07:39:15 +0000]
fixed a bug causing failed assertions from inconsistent counts in HostStatus
reason was that number of IPs may change if target host is a cname that
has changed to a different end host; this is very hard to predict so
always reset counters if number of IPs changes
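A sketch of the reset rule described above. The class PingCounters and its members are hypothetical and only illustrate resetting per-host counters whenever the resolved IP count changes.

```cpp
#include <cstddef>

// Hypothetical sketch: counters are only consistent for a fixed number of
// IPs, so they start over whenever that number changes (e.g. a CNAME now
// points to a different end host).
class PingCounters
{
public:
    PingCounters() : IpCount(0), Performed(0), Failed(0) {}

    void set_resolved_ip_count(std::size_t n)
    {
        if (n != IpCount)
        {
            IpCount = n;
            Performed = 0;
            Failed = 0;
        }
    }

    void record(bool success)
    {
        ++Performed;
        if (!success)
            ++Failed;
    }

    std::size_t performed() const { return Performed; }
    std::size_t failed() const { return Failed; }

private:
    std::size_t IpCount;
    std::size_t Performed;
    std::size_t Failed;
};
```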
Christian Herdtweck [Wed, 20 May 2015 16:09:23 +0000]
improved and simplified IcmpPinger logging (using LogPrefix like in DNS and PingScheduler)
Christian Herdtweck [Wed, 20 May 2015 15:22:47 +0000]
add some debugging output at NOTICE level to HostStatus so we can debug on production machine
Christian Herdtweck [Wed, 20 May 2015 15:21:34 +0000]
changed default value for time between resolves (min TTL) from 10s to 60s
Christian Herdtweck [Wed, 20 May 2015 15:20:50 +0000]
fixed bug: correctly handle case when never had any IP and DNS fails
Christian Herdtweck [Wed, 20 May 2015 10:38:16 +0000]
extended ICMP packet dumping to parts after packet creation
in detail:
* in IcmpPinger::handle_receive_icmp_packet, may get exceptions from boost assertions
in to_string and print and match_xy functions --> put into a try-catch
* created a dump_packet(packet) and moved dump_packet functions into IcmpPacketFactory
* to dump an IcmpPacket, need to also dump its IP header --> created IpHeader::write
* for this moved the Payload variable from Ipv4/6Header to IpHeader super class
* during testing in feed_packet_data, add skipped byte size to log output
Christian Herdtweck [Wed, 6 May 2015 16:12:57 +0000]
re-raise a little output in cache: state newly acquired IPs and CNAMEs
Christian Herdtweck [Wed, 6 May 2015 15:25:32 +0000]
version 0.5
Christian Herdtweck [Wed, 6 May 2015 15:02:56 +0000]
logging update
Christian Herdtweck [Wed, 6 May 2015 15:02:33 +0000]
added logic to deal with dns replies to old dns requests --> no more consistency check failures for now
Christian Herdtweck [Wed, 6 May 2015 13:32:35 +0000]
made DNS much less talkative
Christian Herdtweck [Wed, 6 May 2015 13:27:19 +0000]
made link status analyzer more talkative
Christian Herdtweck [Tue, 5 May 2015 13:44:10 +0000]
added log output target UNDEFINED; return if no hosts defined
* added the target so we can warn the user in case of unrecognized command line option
* also updated connd_state (there is a new connd subsys)
* avoid warning from cmake by not adding test subdir in test_dns cmake file
Christian Herdtweck [Mon, 4 May 2015 16:27:31 +0000]
made nicer static variables of packet dump mode and location
Christian Herdtweck [Mon, 4 May 2015 15:23:56 +0000]
increased version to 0.4 revision 0: new DNS, merged Scheduler with Rotate and made it use async DNS
Christian Herdtweck [Mon, 4 May 2015 15:09:04 +0000]
remove the footer saying that vim is the best editor -- anyone knows anyway
Christian Herdtweck [Mon, 4 May 2015 14:09:10 +0000]
created arg recursion_count to async_resolve and many other functions to avoid infinite loops
* arg recursion_count replaces cname_count which up to now identified problems in post-processing
* created simple loop test
* added option to configure recursion depth limit to DnsMaster
minor other changes:
* corrected docu for resolved-ip-ttl-threshold option: is not milliseconds but seconds
* add test to ResolverBase: check number of callbacks and warn if suspiciously many
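A synchronous sketch of passing a recursion count along a CNAME chain and aborting past a limit. The real resolver is asynchronous. The map-based lookup, resolve_cname_chain and the default limit of 10 are assumptions for illustration.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: alias -> canonical name lookup table.
typedef std::map<std::string, std::string> CnameMap;

// Follow the chain, incrementing recursion_count on each hop, so a loop such
// as a -> b -> a terminates with an error instead of recursing forever.
std::string resolve_cname_chain(const CnameMap &cnames, const std::string &name,
                                int recursion_count = 0, int max_recursion = 10)
{
    if (recursion_count > max_recursion)
        throw std::runtime_error("CNAME recursion limit exceeded at " + name);
    CnameMap::const_iterator it = cnames.find(name);
    if (it == cnames.end())
        return name;   // no further alias: this is the end host
    return resolve_cname_chain(cnames, it->second, recursion_count + 1,
                               max_recursion);
}
```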
Christian Herdtweck [Mon, 4 May 2015 08:47:53 +0000]
added option min-time-between-resolves-option and tests for it
Christian Herdtweck [Mon, 4 May 2015 07:12:09 +0000]
define this as revision 3 of version 0.3 and add to intranator source master
Christian Herdtweck [Mon, 4 May 2015 07:08:44 +0000]
had forgotten to unset debug option (max number of acceptable errors was 1)
Christian Herdtweck [Thu, 30 Apr 2015 16:34:55 +0000]
did minor changes; used this version for testing over week-end
Christian Herdtweck [Thu, 30 Apr 2015 16:10:46 +0000]
fixed bug that caused outdated IPs to be returned from cache; added test for that
Christian Herdtweck [Thu, 30 Apr 2015 16:09:07 +0000]
found reason for 0.0.0.0 IPs in logs: IcmpPingers are created without IP but register immediately with PacketDistributor
adjust log message, should probably remove debug output some time
Christian Herdtweck [Thu, 30 Apr 2015 14:13:08 +0000]
added option log-file and option FILE to log-output; use in main and adjusted unit tests
Christian Herdtweck [Thu, 30 Apr 2015 13:28:51 +0000]
tested new DNS with internal server, make more robust against caching; works nicely now
Christian Herdtweck [Thu, 30 Apr 2015 07:39:11 +0000]
added test to DNS: load cache from file
Christian Herdtweck [Thu, 30 Apr 2015 07:38:23 +0000]
fixed bug in TimeToLive: get huge TTLs from cast to uint32_t of negative values
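A sketch of the pitfall and a clamped alternative. The function remaining_ttl is hypothetical; it assumes the TTL is stored as original value plus creation time. Once an entry has expired, "original TTL minus elapsed time" is negative, and casting it directly to uint32_t wraps around to a value close to 2^32 seconds.

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical sketch: compute the remaining TTL in a signed type and clamp
// at zero before narrowing to uint32_t.
std::uint32_t remaining_ttl(std::uint32_t original_ttl, std::time_t created,
                            std::time_t now)
{
    const long long remaining = static_cast<long long>(original_ttl)
                              - static_cast<long long>(now - created);
    return remaining > 0 ? static_cast<std::uint32_t>(remaining) : 0u;
}
```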
Christian Herdtweck [Wed, 29 Apr 2015 15:45:16 +0000]
fixed 2 bugs and made clearer that Long-term timer in DnsResolver is not affected by finalize_resolve
bugs:
* HostAddress::is_valid returned the opposite
* use the returned IP to ping and do not request another in PingScheduler::ping
* move reset of ContinueOnOutdatedIp within PingScheduler to after ping
long-term timer:
* added a variable to DnsResolver: LongtermTimerIsActive
* added a function to ResolverBase: is_waiting_to_resolve
* added a function cancel_resolve to PingScheduler that decides whether to cancel or not
* added an arg to DnsResolver::stop_trying
Christian Herdtweck [Wed, 29 Apr 2015 13:38:45 +0000]
removed the HostStatus::report_dns_resolution_failure (not used any more)
Christian Herdtweck [Wed, 29 Apr 2015 13:36:04 +0000]
ensured LogPrefix is used in DNS and PingScheduler; shortened lines; remove vim end note
Christian Herdtweck [Wed, 29 Apr 2015 13:18:34 +0000]
more cname-skip unit-tests and simplification of skip-finder function
Christian Herdtweck [Wed, 29 Apr 2015 12:38:50 +0000]
created and passed first unit tests for DNS; finished recovery from PingScheduler::ContinueOnOutdatedIPs
Also:
* renamed PingScheduler::try_to_ping to PingScheduler::ping_when_ready
* fixed string constant DnsCache::DoNotUseCacheFile
* moved Cname to own header file
* created constructor of DnsMaster with DnsCacheItem so we can test the cache offline
* fixed retrieval of outdated CName
Christian Herdtweck [Fri, 24 Apr 2015 16:42:05 +0000]
remove PingRotate
Christian Herdtweck [Fri, 24 Apr 2015 16:41:29 +0000]
merged PingRotate into PingScheduler; fixed save/load of cache to/from file; started creating DNS unittest
code now compiles and links and pings again!
merged because asynchronicity makes it more difficult to give feedback from Rotate to Scheduler
further changes:
* in cache ensure that there is never both a cname and an ip for the same hostname
* in cache remove trailing dots from hostnames
* by default ask only for [number of] updated IPs in retrieval from resolver
* remove some left-overs from recursive resolving (e.g. Recursor creation in master)
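A sketch of the hostname normalization mentioned in the list above. The helper name strip_trailing_dots is hypothetical. The point is that "host.example.com." (fully qualified, with the root dot) and "host.example.com" then map to the same cache key.

```cpp
#include <string>

// Hypothetical helper: drop trailing dots so both spellings share one key.
std::string strip_trailing_dots(std::string name)
{
    while (!name.empty() && name[name.size() - 1] == '.')
        name.erase(name.size() - 1);
    return name;
}
```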
Christian Herdtweck [Thu, 23 Apr 2015 16:06:45 +0000]
simplified dns (no self-made recursion); merge PingScheduler and PingRotate; make them use Async DNS
does not compile yet, work in progress!
Christian Herdtweck [Thu, 23 Apr 2015 08:38:20 +0000]
finished self-implementation of DNS resolver recursion; will now remove all that!
implemented resolving that avoids caching by creating non-recursive queries
that are directly talking to different name servers; will remove this now
because firewalls might prevent direct communication and do not want to
self-implement dnssec some time.
--> now remove own implementation of recursion in resolving and try to deal
with cached results
Christian Herdtweck [Fri, 17 Apr 2015 16:52:35 +0000]
continue implementation; first tests with recursion returned IPs but then added cancel code
added option to cancel Recursor if timeout is reached
sometimes re-try resolving more quickly instead of calling handle_unavailable
changed saving of ttls: save original value and creation time by adding friend class HostAddress
cache can do the recursive ip retrieval, better place than ResolverBase
Christian Herdtweck [Thu, 16 Apr 2015 16:08:32 +0000]
changed how dns deals with cnames and recursion: remember cnames and implement recursive lookup here using DnsResolvers that are not saved in DnsMaster.
Experiments with own name server and recursive cnames showed that TTLs are not always the minimum.
This way we have better control over the resulting TTLs
and debugging is easier since DNS caching is avoided.
Also create unique id for each dns message and check reply
and in main warn if using debug option max_exceptions.
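A sketch of giving each DNS query a unique ID and checking replies against it. The DNS header carries a 16-bit ID field (RFC 1035). QueryIdTracker, the random ID choice and the pending-set bookkeeping are assumptions, not pingcheck's code. The sketch would spin forever if all 65536 IDs were in flight at once.

```cpp
#include <cstdint>
#include <random>
#include <set>

// Hypothetical sketch: hand out random 16-bit query IDs and accept a reply
// only if it carries the ID of a query that is still outstanding.
class QueryIdTracker
{
public:
    explicit QueryIdTracker(unsigned seed) : Rng(seed) {}

    std::uint16_t new_query_id()
    {
        std::uniform_int_distribution<unsigned> dist(0, 0xFFFF);
        std::uint16_t id;
        do
            id = static_cast<std::uint16_t>(dist(Rng));
        while (Pending.count(id) != 0);   // avoid reusing an in-flight ID
        Pending.insert(id);
        return id;
    }

    // Returns true (and forgets the ID) if the reply matches a pending query;
    // replies to unknown or already answered queries are rejected.
    bool accept_reply(std::uint16_t reply_id)
    {
        return Pending.erase(reply_id) == 1;
    }

private:
    std::mt19937 Rng;
    std::set<std::uint16_t> Pending;
};
```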