Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

Log file size calculated using len(_raw) in Splunk is nowhere near the actual file size on the host

I am using a Splunk query to calculate the size of the log files sent to Splunk. This is the query I used:

index="<my_index>" path="/<my_path>/<my_log_file>" 
| eval raw_len=len(_raw) 
| eval raw_len_kb = raw_len/1024 
| eval raw_len_mb = raw_len/1024/1024 
| eval raw_len_gb = raw_len/1024/1024/1024 
| stats sum(raw_len) as Bytes sum(raw_len_kb) as KB sum(raw_len_mb) as MB sum(raw_len_gb) as GB by source 
| addcoltotals

Splunk reports the size as 17 GB. On the other hand, when I do this on the Unix host:

ls -l /<my_path>/<my_log_file>

the value is just a few MB.

Any idea why there is so much difference?



1 Answer

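Don't expect the size of data indexed in Splunk to exactly match the size reported by the OS. By default Splunk removes line endings from events, and the len function counts characters rather than bytes, so multi-byte characters are undercounted. Both effects make the Splunk figure smaller than the size on disk, so on their own they do not explain a result of 17 GB.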

Also, the query shown does not account for multiple hosts sending data to Splunk: it groups by source only, so files with the same path on different hosts are summed together. No time window is indicated either, so we don't know whether the file was truncated at some point while Splunk still retains all of the data the file ever held.
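
To check the two points above, a variant of the original query along these lines may help. It is only a sketch: the 24-hour window (earliest=-24h) is an arbitrary assumption and should be replaced with the period the current file on disk actually covers, and the index and path placeholders are kept exactly as in the question.

```
index="<my_index>" path="/<my_path>/<my_log_file>" earliest=-24h latest=now
| eval raw_len=len(_raw)
| stats count as events sum(raw_len) as chars by host source
| eval MB=round(chars/1024/1024, 2)
| addcoltotals events chars MB
```

Splitting by host as well as source shows whether several machines contribute to the same path, and the event count per row makes it easier to see whether Splunk holds far more history than the file currently contains. The sum is labelled chars rather than Bytes because len counts characters.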

