[Bug 265320] diff program runs long when running diff on large files
Date: Tue, 19 Jul 2022 23:53:28 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265320

            Bug ID: 265320
           Summary: diff program runs long when running diff on large files
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: shrikanth07@gmail.com

The issue was found when running diff on a pair of files that are nearly
identical (1865319 lines / ~76MB in size) and differ only on the first line
and on line 1865315.

# uname -a
FreeBSD BSD14vm 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n255938-326a8d3e085: Fri Jun 3 08:30:41 UTC 2022 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

The diff program ran for more than 3172.21 seconds. I stopped polling it with
CTRL-T after some point, but here is the capture:

root@BSD14vm:# ls -ltrhG
total 469632
-rw-r--r-- 1 root wheel 76M Jul 19 11:46 config_edb_pre
-rw-r--r-- 1 root wheel 76M Jul 19 11:46 config_edb_post

root@BSD14vm:# diff config_edb_p*
load: 0.21 cmd: diff 972 [running] 3.60r 3.44u 0.15s 29% 40480k
load: 0.21 cmd: diff 972 [running] 4.90r 4.74u 0.15s 39% 40576k
load: 1.10 cmd: diff 972 [running] 156.52r 156.31u 0.15s 100% 46564k
load: 1.19 cmd: diff 972 [running] 462.20r 461.86u 0.18s 100% 49140k
load: 1.19 cmd: diff 972 [running] 462.64r 462.30u 0.18s 100% 49140k
load: 1.19 cmd: diff 972 [running] 462.85r 462.51u 0.18s 100% 49140k
load: 1.09 cmd: diff 972 [running] 503.74r 503.39u 0.18s 100% 49140k
load: 1.09 cmd: diff 972 [running] 503.96r 503.60u 0.18s 100% 49140k
load: 1.09 cmd: diff 972 [running] 504.14r 503.78u 0.18s 100% 49140k
load: 1.09 cmd: diff 972 [running] 504.30r 503.94u 0.18s 100% 49140k
load: 1.07 cmd: diff 972 [running] 766.07r 765.61u 0.21s 98% 51196k
load: 1.17 cmd: diff 972 [running] 919.11r 918.61u 0.21s 100% 51196k
load: 1.13 cmd: diff 972 [running] 1014.49r 1013.96u 0.22s 100% 53252k
load: 1.13 cmd: diff 972 [running] 1014.75r 1014.21u 0.22s 100% 53252k
load: 1.13 cmd: diff 972 [running] 1015.05r 1014.52u 0.22s 98% 53252k
load: 1.13 cmd: diff 972 [running] 1015.31r 1014.77u 0.22s 100% 53252k
load: 1.13 cmd: diff 972 [running] 1015.49r 1014.96u 0.22s 100% 53252k
load: 1.04 cmd: diff 972 [running] 1087.73r 1087.17u 0.22s 100% 53252k
load: 1.13 cmd: diff 972 [running] 1266.02r 1265.41u 0.22s 98% 53252k
load: 1.04 cmd: diff 972 [running] 1468.09r 1467.41u 0.22s 100% 59176k
load: 1.02 cmd: diff 972 [running] 1499.33r 1498.63u 0.22s 100% 59176k
load: 1.08 cmd: diff 972 [running] 2095.98r 2095.10u 0.23s 100% 61232k
load: 1.43 cmd: diff 972 [running] 2146.53r 2145.62u 0.23s 100% 61232k
load: 1.39 cmd: diff 972 [running] 2350.22r 2349.22u 0.23s 100% 63288k
load: 1.24 cmd: diff 972 [running] 2874.17r 2873.05u 0.25s 100% 67392k
load: 1.12 cmd: diff 972 [running] 2932.42r 2931.28u 0.25s 100% 67392k
load: 1.12 cmd: diff 972 [running] 2932.67r 2931.52u 0.25s 100% 67392k
load: 1.10 cmd: diff 972 [running] 2980.07r 2978.90u 0.26s 98% 85608k
load: 1.19 cmd: diff 972 [running] 3173.49r 3172.21u 0.26s 100% 85608k
...
1c1
< <configuration changed-seconds="1658220346" changed-localtime="2022-07-19 10:45:46 CEST">
---
> <configuration changed-seconds="1658219983" changed-localtime="2022-07-19 10:39:43 CEST">
1865315c1865315
< <description>2022-07-19 10:43:02.391784</description>
---
> <description>2022-07-18 16:19:44.594290</description>

As you can see, the only differences in the entire file are on line 1 and
line 1865315. If I use 'head' or 'tail' to retain only one of the two
differences in each file, 'diff' completes in less than a second.

The files below contain only the difference on line 1:

# head -n 1865310 config_edb_pre > p1_headn1865310
# head -n 1865310 config_edb_post > p2_headn1865310
# time diff p1_headn1865310 p2_headn1865310
1c1
< <configuration changed-seconds="1658219983" changed-localtime="2022-07-19 10:39:43 CEST">
---
> <configuration changed-seconds="1658220346" changed-localtime="2022-07-19 10:45:46 CEST">
        0.83 real         0.74 user         0.09 sys

The files below contain only the difference on line 1865306 (line 1865315 of
the original files, since 'tail -n 1865310' drops the first 9 lines):

# tail -n 1865310 config_edb_pre > p1_tailn1865310
# tail -n 1865310 config_edb_post > p2_tailn1865310
# time diff p1_tailn1865310 p2_tailn1865310
1865306c1865306
< <description>2022-07-18 16:19:44.594290</description>
---
> <description>2022-07-19 10:43:02.391784</description>
        0.84 real         0.75 user         0.09 sys

# which diff
/usr/bin/diff
# diff --version
FreeBSD diff 20220309
# file config_edb_p*
config_edb_post: ASCII text
config_edb_pre: ASCII text

-- 
You are receiving this mail because:
You are the assignee for the bug.
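
The original config_edb_pre and config_edb_post files are not attached to this report, so the following is only a rough sketch of how a comparable pair of files could be generated for testing. Everything in it is an assumption made for illustration: the helper names make_file and run_repro, the output names synth_pre and synth_post, and the synthetic XML-like line contents. It reproduces the shape of the input (two large files that differ only on line 1 and on one line near the end), but whether it triggers the same slowdown may depend on the real file contents.

```sh
#!/bin/sh
# Hypothetical helper (not part of the original report).
# make_file TAG N_LINES DIFF_LINE
# Writes N_LINES lines to stdout. Line 1 and line DIFF_LINE embed TAG, so two
# files generated with different tags differ on exactly those two lines.
make_file() {
    awk -v tag="$1" -v n="$2" -v d="$3" 'BEGIN {
        for (i = 1; i <= n; i++) {
            if (i == 1)
                printf "<configuration changed=\"%s\">\n", tag
            else if (i == d)
                printf "<description>%s</description>\n", tag
            else
                printf "<entry id=\"%d\">value</entry>\n", i
        }
    }'
}

# Hypothetical driver: defaults mirror the line counts given in the report.
# run_repro [N_LINES] [DIFF_LINE]
run_repro() {
    n_lines=${1:-1865319}
    diff_line=${2:-1865315}
    make_file pre  "$n_lines" "$diff_line" > synth_pre
    make_file post "$n_lines" "$diff_line" > synth_post
    # diff exits with status 1 when the files differ; that is expected here.
    time diff synth_pre synth_post
}
```

Calling run_repro with no arguments uses the sizes from the report. A small run such as `run_repro 20 16` is a quick sanity check that the two generated files differ only on lines 1 and 16.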