add ansible role journal-postfix (a log parser for Postfix) with playbook and doc

This commit is contained in:
iburadempa 2019-12-15 19:10:28 +01:00
parent 713372c850
commit e5a8025064
14 changed files with 3570 additions and 0 deletions

View file

@ -0,0 +1,378 @@
<h1>journal-postfix - A log parser for Postfix</h1>
<p>Experiences from applying Python to the domain of bad old email.</p>
<h2>Email ✉</h2>
<ul>
<li>old technology (starting in the 70ies)</li>
<li><a href="https://en.wikipedia.org/wiki/Store_and_forward">store-and-forward</a>: sent != delivered to recipient</li>
<li>non-delivery reasons:
<ul>
<li>recipient over quota</li>
<li>inexistent destination</li>
<li>malware</li>
<li>spam</li>
<li>server problem</li>
<li>...</li>
</ul></li>
<li>permanent / non-permanent failure (<a href="https://www.iana.org/assignments/smtp-enhanced-status-codes/smtp-enhanced-status-codes.xhtml">DSN ~ 5.X.Y / 4.X.Y</a>)</li>
<li>non-delivery modes
<ul>
<li>immediate reject on SMTP level</li>
<li>delayed <a href="https://en.wikipedia.org/wiki/Bounce_message">bounce messages</a> by <a href="https://upload.wikimedia.org/wikipedia/commons/a/a2/Bounce-DSN-MTA-names.png">reporting MTA</a> - queueing (e.g., ~5d) before delivery failure notification</li>
<li>discarding</li>
</ul></li>
<li>read receipts</li>
<li><a href="https://en.wikipedia.org/wiki/Email_tracking">Wikipedia: email tracking</a></li>
</ul>
<h2><a href="https://en.wikipedia.org/wiki/SMTP">SMTP</a></h2>
<p><a href="https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#SMTP_transport_example">SMTP session example</a>: envelope sender, envelope recipient may differ from From:, To:</p>
<p>Lists of error codes:</p>
<ul>
<li><a href="https://www.inmotionhosting.com/support/email/email-troubleshooting/smtp-and-esmtp-error-code-list">SMTP and ESMTP</a></li>
<li><a href="https://serversmtp.com/smtp-error/">SMTP</a></li>
<li><a href="https://info.webtoolhub.com/kb-a15-smtp-status-codes-smtp-error-codes-smtp-reply-codes.aspx">SMTP</a></li>
</ul>
<p>Example of an error within a bounced email (Subject: Mail delivery failed: returning message to sender)</p>
<pre><code>SMTP error from remote server for TEXT command, host: smtpin.rzone.de (81.169.145.97) reason: 550 5.7.1 Refused by local policy. No SPAM please!
</code></pre>
<ul>
<li>email users are continually asking for the fate of their emails (or those of their correspondents which should have arrived)</li>
</ul>
<h2><a href="http://www.postfix.org">Postfix</a></h2>
<ul>
<li>popular <a href="https://en.wikipedia.org/wiki/Message_transfer_agent">MTA</a></li>
<li>written in C</li>
<li>logging to files / journald</li>
<li>example log messages for a (non-)delivery + stats</li>
</ul>
<pre><code>Nov 27 16:19:22 mail postfix/smtpd[18995]: connect from unknown[80.82.79.244]
Nov 27 16:19:22 mail postfix/smtpd[18995]: NOQUEUE: reject: RCPT from unknown[80.82.79.244]: 454 4.7.1 &lt;spameri@tiscali.it&gt;: Relay access denied; from=&lt;spameri@tiscali.it&gt; to=&lt;spameri@tiscali.it&gt; proto=ESMTP helo=&lt;WIN-G7CPHCGK247&gt;
Nov 27 16:19:22 mail postfix/smtpd[18995]: disconnect from unknown[80.82.79.244] ehlo=1 mail=1 rcpt=0/1 rset=1 quit=1 commands=4/5
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection rate 1/60s for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection count 1 for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max cache size 1 at Nov 27 16:19:22
Nov 27 16:22:48 mail postfix/smtpd[18999]: connect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/smtpd[18999]: 47NQzY13DbzNWNQG: client=mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: info: header Subject: Re: test from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]; from=&lt;ibu@cosmopool.net&gt; to=&lt;ibu@multiname.org&gt; proto=ESMTP helo=&lt;mail.cosmopool.net&gt;
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: message-id=&lt;d5154432-b984-d65a-30b3-38bde7e37af8@cosmopool.net&gt;
Nov 27 16:22:49 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: from=&lt;ibu@cosmopool.net&gt;, size=1365, nrcpt=2 (queue active)
Nov 27 16:22:49 mail postfix/smtpd[18999]: disconnect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107] ehlo=1 mail=1 rcpt=2 data=1 quit=1 commands=6
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=&lt;ibu2@multiname.org&gt;, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 &lt;ibu2@multiname.org&gt; nV9iJ9mi3l0+SgAAZU03Dg Saved)
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=&lt;ibu@multiname.org&gt;, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 &lt;ibu@multiname.org&gt; nV9iJ9mi3l0+SgAAZU03Dg:2 Saved)
Nov 27 16:22:50 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: removed
</code></pre>
<ul>
<li><a href="http://www.postfix.org/OVERVIEW.html">involved postfix components</a>
<ul>
<li>smtpd (port 25: smtp, port 587: submission)</li>
<li>cleanup</li>
<li>smtp/lmtp</li>
</ul></li>
<li>missing log parser</li>
</ul>
<h2>Idea</h2>
<ul>
<li>follow log stream and write summarized delivery information to a database</li>
<li>goal: spot delivery problems, collect delivery stats</li>
<li>a GUI could then display the current delivery status to users</li>
</ul>
<h2>Why Python?</h2>
<ul>
<li>simple and fun language, clear and concise</li>
<li>well suited for text processing</li>
<li>libs available for systemd, PostgreSQL</li>
<li>huge standard library (used here: datetime, re, yaml, argparse, select)</li>
<li>speed sufficient?</li>
</ul>
<h2>Development iterations</h2>
<ul>
<li>hmm, easy task, might take a few days</li>
<li>PoC: reading and polling from journal works as expected</li>
<li>used postfix logfiles in syslog format and wrote regexps matching them iteratively</li>
<li>separated parsing messages from extracting delivery information</li>
<li>created a delivery table</li>
<li>hmm, this is very slow, takes hours to process log messages from a few days (from a server with not much traffic)</li>
<li>introduced polling timeout and SQL transactions handling several messages at once</li>
<li>... much faster</li>
<li>looks fine, but wait... did I catch all syntax variants of Postfix log messages?</li>
<li>looked into Postfix sources and almost got lost</li>
<li>weeks of hard work identifying relevant log output directives</li>
<li>completely rewrote parser to deal with the rich log msg syntax, e.g.:<br> <code>def _strip_pattern(msg, res, pattern_name, pos='l', target_names=None)</code></li>
<li>oh, there are even more Postfix components... limit to certain Postfix configurations, in particular virtual mailboxes and not local ones</li>
<li>mails may have multiple recipients... split delivery table into delivery_from and delivery_to</li>
<li>decide which delivery information is relevant</li>
<li>cleanup and polish (config mgmt, logging)</li>
<li>write ansible role</li>
</ul>
<h2>Structure</h2>
<svg viewBox="0 0 1216 400" xmlns="http://www.w3.org/2000/svg" xmlns:inkspace="http://www.inkscape.org/namespaces/inkscape" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs id="defs_block">
<filter height="1.504" id="filter_blur" inkspace:collect="always" width="1.1575" x="-0.07875" y="-0.252">
<feGaussianBlur id="feGaussianBlur3780" inkspace:collect="always" stdDeviation="4.2" />
</filter>
</defs>
<title>blockdiag</title>
<desc>blockdiag {
default_fontsize = 20;
node_height = 80;
journal_since -&gt; run_loop;
journal_follow -&gt; run_loop;
logfile -&gt; run_loop;
run_loop -&gt; parse -&gt; extract_delivery -&gt; store;
store -&gt; delivery_from;
store -&gt; delivery_to;
store -&gt; noqueue;
group { label="input iterables"; journal_since; journal_follow; logfile; };
group { label="output tables"; delivery_from; delivery_to; noqueue; };
}
</desc>
<rect fill="rgb(243,152,0)" height="340" style="filter:url(#filter_blur)" width="144" x="56" y="30" />
<rect fill="rgb(243,152,0)" height="340" style="filter:url(#filter_blur)" width="144" x="1016" y="30" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="259" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="67" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="67" y="166" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="67" y="286" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="451" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="643" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="835" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="1027" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="1027" y="166" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="1027" y="286" />
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="256" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="87" x="320.5" y="90">run_loop</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="64" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="128.0" y="79">journal_sin</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="21" x="128.5" y="101">ce</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="64" y="160" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="128.0" y="199">journal_fol</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="32" x="128.0" y="221">low</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="64" y="280" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="76" x="128.0" y="330">logfile</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="448" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="54" x="512.0" y="90">parse</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="640" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="704.0" y="79">extract_del</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="54" x="704.0" y="101">ivery</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="832" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="54" x="896.0" y="90">store</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="1024" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="1088.0" y="79">delivery_fr</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="21" x="1088.5" y="101">om</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="1024" y="160" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="1088.0" y="210">delivery_to</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="1024" y="280" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="76" x="1088.0" y="330">noqueue</text>
<path d="M 384 80 L 440 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="447,80 440,76 440,84 447,80" stroke="rgb(0,0,0)" />
<path d="M 576 80 L 632 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="639,80 632,76 632,84 639,80" stroke="rgb(0,0,0)" />
<path d="M 768 80 L 824 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="831,80 824,76 824,84 831,80" stroke="rgb(0,0,0)" />
<path d="M 960 80 L 1016 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="1023,80 1016,76 1016,84 1023,80" stroke="rgb(0,0,0)" />
<path d="M 960 80 L 992 80" fill="none" stroke="rgb(0,0,0)" />
<path d="M 992 80 L 992 200" fill="none" stroke="rgb(0,0,0)" />
<path d="M 992 200 L 1016 200" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="1023,200 1016,196 1016,204 1023,200" stroke="rgb(0,0,0)" />
<path d="M 960 80 L 992 80" fill="none" stroke="rgb(0,0,0)" />
<path d="M 992 80 L 992 320" fill="none" stroke="rgb(0,0,0)" />
<path d="M 992 320 L 1016 320" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="1023,320 1016,316 1016,324 1023,320" stroke="rgb(0,0,0)" />
<path d="M 192 80 L 248 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="255,80 248,76 248,84 255,80" stroke="rgb(0,0,0)" />
<path d="M 192 200 L 240 200" fill="none" stroke="rgb(0,0,0)" />
<path d="M 240 200 L 240 80" fill="none" stroke="rgb(0,0,0)" />
<path d="M 240 80 L 248 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="255,80 248,76 248,84 255,80" stroke="rgb(0,0,0)" />
<path d="M 192 320 L 240 320" fill="none" stroke="rgb(0,0,0)" />
<path d="M 240 320 L 240 80" fill="none" stroke="rgb(0,0,0)" />
<path d="M 240 80 L 248 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="255,80 248,76 248,84 255,80" stroke="rgb(0,0,0)" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="16" font-style="normal" font-weight="normal" text-anchor="middle" textLength="122" x="128.0" y="38">input iter ...</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="16" font-style="normal" font-weight="normal" text-anchor="middle" textLength="113" x="1088.5" y="38">output tables</text>
</svg>
<h2>Iterables</h2>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb3-1" title="1"><span class="kw">def</span> iter_journal_messages_since(timestamp: Union[<span class="bu">int</span>, <span class="bu">float</span>]):</a>
<a class="sourceLine" id="cb3-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-3" title="3"><span class="co"> Yield False and message details from the journal since *timestamp*.</span></a>
<a class="sourceLine" id="cb3-4" title="4"></a>
<a class="sourceLine" id="cb3-5" title="5"><span class="co"> This is the loading phase (loading messages that already existed</span></a>
<a class="sourceLine" id="cb3-6" title="6"><span class="co"> when we start).</span></a>
<a class="sourceLine" id="cb3-7" title="7"></a>
<a class="sourceLine" id="cb3-8" title="8"><span class="co"> Argument *timestamp* is a UNIX timestamp.</span></a>
<a class="sourceLine" id="cb3-9" title="9"></a>
<a class="sourceLine" id="cb3-10" title="10"><span class="co"> Only journal entries for systemd unit UNITNAME with loglevel</span></a>
<a class="sourceLine" id="cb3-11" title="11"><span class="co"> INFO and above are retrieved.</span></a>
<a class="sourceLine" id="cb3-12" title="12"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-13" title="13"> ...</a>
<a class="sourceLine" id="cb3-14" title="14"></a>
<a class="sourceLine" id="cb3-15" title="15"><span class="kw">def</span> iter_journal_messages_follow(timestamp: Union[<span class="bu">int</span>, <span class="bu">float</span>]):</a>
<a class="sourceLine" id="cb3-16" title="16"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-17" title="17"><span class="co"> Yield commit and message details from the journal through polling.</span></a>
<a class="sourceLine" id="cb3-18" title="18"></a>
<a class="sourceLine" id="cb3-19" title="19"><span class="co"> This is the polling phase (after we have read pre-existing messages</span></a>
<a class="sourceLine" id="cb3-20" title="20"><span class="co"> in the loading phase).</span></a>
<a class="sourceLine" id="cb3-21" title="21"></a>
<a class="sourceLine" id="cb3-22" title="22"><span class="co"> Argument *timestamp* is a UNIX timestamp.</span></a>
<a class="sourceLine" id="cb3-23" title="23"></a>
<a class="sourceLine" id="cb3-24" title="24"><span class="co"> Only journal entries for systemd unit UNITNAME with loglevel</span></a>
<a class="sourceLine" id="cb3-25" title="25"><span class="co"> INFO and above are retrieved.</span></a>
<a class="sourceLine" id="cb3-26" title="26"></a>
<a class="sourceLine" id="cb3-27" title="27"><span class="co"> *commit* (bool) tells whether it is time to store the delivery</span></a>
<a class="sourceLine" id="cb3-28" title="28"><span class="co"> information obtained from the messages yielded by us.</span></a>
<a class="sourceLine" id="cb3-29" title="29"><span class="co"> It is set to True if max_delay_before_commit has elapsed.</span></a>
<a class="sourceLine" id="cb3-30" title="30"><span class="co"> After this delay delivery information will be written; to be exact:</span></a>
<a class="sourceLine" id="cb3-31" title="31"><span class="co"> the delay may increase by up to one journal_poll_interval.</span></a>
<a class="sourceLine" id="cb3-32" title="32"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-33" title="33"> ...</a>
<a class="sourceLine" id="cb3-34" title="34"></a>
<a class="sourceLine" id="cb3-35" title="35"><span class="kw">def</span> iter_logfile_messages(filepath: <span class="bu">str</span>, year: <span class="bu">int</span>,</a>
<a class="sourceLine" id="cb3-36" title="36"> commit_after_lines<span class="op">=</span>max_messages_per_commit):</a>
<a class="sourceLine" id="cb3-37" title="37"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-38" title="38"><span class="co"> Yield messages and a commit flag from a logfile.</span></a>
<a class="sourceLine" id="cb3-39" title="39"></a>
<a class="sourceLine" id="cb3-40" title="40"><span class="co"> Loop through all lines of the file with given *filepath* and</span></a>
<a class="sourceLine" id="cb3-41" title="41"><span class="co"> extract the time and log message. If the log message starts</span></a>
<a class="sourceLine" id="cb3-42" title="42"><span class="co"> with &#39;postfix/&#39;, then extract the syslog_identifier, pid and</span></a>
<a class="sourceLine" id="cb3-43" title="43"><span class="co"> message text.</span></a>
<a class="sourceLine" id="cb3-44" title="44"></a>
<a class="sourceLine" id="cb3-45" title="45"><span class="co"> Since syslog lines do not contain the year, the *year* to which</span></a>
<a class="sourceLine" id="cb3-46" title="46"><span class="co"> the first log line belongs must be given.</span></a>
<a class="sourceLine" id="cb3-47" title="47"></a>
<a class="sourceLine" id="cb3-48" title="48"><span class="co"> Return a commit flag and a dict with these keys:</span></a>
<a class="sourceLine" id="cb3-49" title="49"><span class="co"> &#39;t&#39;: timestamp</span></a>
<a class="sourceLine" id="cb3-50" title="50"><span class="co"> &#39;message&#39;: message text</span></a>
<a class="sourceLine" id="cb3-51" title="51"><span class="co"> &#39;identifier&#39;: syslog identifier (e.g., &#39;postfix/smtpd&#39;)</span></a>
<a class="sourceLine" id="cb3-52" title="52"><span class="co"> &#39;pid&#39;: process id</span></a>
<a class="sourceLine" id="cb3-53" title="53"></a>
<a class="sourceLine" id="cb3-54" title="54"><span class="co"> The commit flag will be set to True for every</span></a>
<a class="sourceLine" id="cb3-55" title="55"><span class="co"> (commit_after_lines)-th filtered message and serves</span></a>
<a class="sourceLine" id="cb3-56" title="56"><span class="co"> as a signal to the caller to commit this chunk of data</span></a>
<a class="sourceLine" id="cb3-57" title="57"><span class="co"> to the database.</span></a>
<a class="sourceLine" id="cb3-58" title="58"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-59" title="59"> ...</a></code></pre></div>
<h2>Running loops</h2>
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb4-1" title="1"><span class="kw">def</span> run(dsn, verp_marker<span class="op">=</span><span class="va">False</span>, filepath<span class="op">=</span><span class="va">None</span>, year<span class="op">=</span><span class="va">None</span>, debug<span class="op">=</span>[]):</a>
<a class="sourceLine" id="cb4-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb4-3" title="3"><span class="co"> Determine loop(s) and run them within a database context.</span></a>
<a class="sourceLine" id="cb4-4" title="4"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb4-5" title="5"> init(verp_marker<span class="op">=</span>verp_marker)</a>
<a class="sourceLine" id="cb4-6" title="6"> <span class="cf">with</span> psycopg2.<span class="ex">connect</span>(dsn) <span class="im">as</span> conn:</a>
<a class="sourceLine" id="cb4-7" title="7"> <span class="cf">with</span> conn.cursor(cursor_factory<span class="op">=</span>psycopg2.extras.RealDictCursor) <span class="im">as</span> curs:</a>
<a class="sourceLine" id="cb4-8" title="8"> <span class="cf">if</span> filepath:</a>
<a class="sourceLine" id="cb4-9" title="9"> run_loop(iter_logfile_messages(filepath, year), curs, debug<span class="op">=</span>debug)</a>
<a class="sourceLine" id="cb4-10" title="10"> <span class="cf">else</span>:</a>
<a class="sourceLine" id="cb4-11" title="11"> begin_timestamp <span class="op">=</span> get_latest_timestamp(curs)</a>
<a class="sourceLine" id="cb4-12" title="12"> run_loop(iter_journal_messages_since(begin_timestamp), curs, debug<span class="op">=</span>debug)</a>
<a class="sourceLine" id="cb4-13" title="13"> begin_timestamp <span class="op">=</span> get_latest_timestamp(curs)</a>
<a class="sourceLine" id="cb4-14" title="14"> run_loop(iter_journal_messages_follow(begin_timestamp), curs, debug<span class="op">=</span>debug)</a>
<a class="sourceLine" id="cb4-15" title="15"></a>
<a class="sourceLine" id="cb4-16" title="16"><span class="kw">def</span> run_loop(iterable, curs, debug<span class="op">=</span>[]):</a>
<a class="sourceLine" id="cb4-17" title="17"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb4-18" title="18"><span class="co"> Loop over log messages obtained from *iterable*.</span></a>
<a class="sourceLine" id="cb4-19" title="19"></a>
<a class="sourceLine" id="cb4-20" title="20"><span class="co"> Parse the message, extract delivery information from it and store</span></a>
<a class="sourceLine" id="cb4-21" title="21"><span class="co"> that delivery information.</span></a>
<a class="sourceLine" id="cb4-22" title="22"></a>
<a class="sourceLine" id="cb4-23" title="23"><span class="co"> For performance reasons delivery items are collected in a cache</span></a>
<a class="sourceLine" id="cb4-24" title="24"><span class="co"> before writing them (i.e., committing a database transaction).</span></a>
<a class="sourceLine" id="cb4-25" title="25"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb4-26" title="26"> cache <span class="op">=</span> []</a>
<a class="sourceLine" id="cb4-27" title="27"> msg_count <span class="op">=</span> max_messages_per_commit</a>
<a class="sourceLine" id="cb4-28" title="28"> <span class="cf">for</span> commit, msg_details <span class="kw">in</span> iterable:</a>
<a class="sourceLine" id="cb4-29" title="29"> ...</a></code></pre></div>
<h2>Parsing</h2>
<p>Parse what you can. (But only msg_info in Postfix, and only relevant components.)</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb5-1" title="1"><span class="kw">def</span> parse(msg_details, debug<span class="op">=</span><span class="va">False</span>):</a>
<a class="sourceLine" id="cb5-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb5-3" title="3"><span class="co"> Parse a log message returning a dict.</span></a>
<a class="sourceLine" id="cb5-4" title="4"></a>
<a class="sourceLine" id="cb5-5" title="5"><span class="co"> *msg_details* is assumed to be a dict with these keys:</span></a>
<a class="sourceLine" id="cb5-6" title="6"></a>
<a class="sourceLine" id="cb5-7" title="7"><span class="co"> * &#39;identifier&#39; (syslog identifier),</span></a>
<a class="sourceLine" id="cb5-8" title="8"><span class="co"> * &#39;pid&#39; (process id),</span></a>
<a class="sourceLine" id="cb5-9" title="9"><span class="co"> * &#39;message&#39; (message text)</span></a>
<a class="sourceLine" id="cb5-10" title="10"></a>
<a class="sourceLine" id="cb5-11" title="11"><span class="co"> The syslog identifier and process id are copied to the resulting dict.</span></a>
<a class="sourceLine" id="cb5-12" title="12"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb5-13" title="13"> ...</a>
<a class="sourceLine" id="cb5-14" title="14"></a>
<a class="sourceLine" id="cb5-15" title="15"><span class="kw">def</span> _parse_branch(comp, msg, res):</a>
<a class="sourceLine" id="cb5-16" title="16"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb5-17" title="17"><span class="co"> Parse a log message string *msg*, adding results to dict *res*.</span></a>
<a class="sourceLine" id="cb5-18" title="18"></a>
<a class="sourceLine" id="cb5-19" title="19"><span class="co"> Depending on the component *comp* we branch to functions</span></a>
<a class="sourceLine" id="cb5-20" title="20"><span class="co"> named _parse_{comp}.</span></a>
<a class="sourceLine" id="cb5-21" title="21"></a>
<a class="sourceLine" id="cb5-22" title="22"><span class="co"> Add parsing results to dict *res*. Always add key &#39;action&#39;.</span></a>
<a class="sourceLine" id="cb5-23" title="23"><span class="co"> Try to parse every syntactical element.</span></a>
<a class="sourceLine" id="cb5-24" title="24"><span class="co"> Note: We parse what we can. Assessment of parsing results relevant</span></a>
<a class="sourceLine" id="cb5-25" title="25"><span class="co"> for delivery is done in :func:`extract_delivery`.</span></a>
<a class="sourceLine" id="cb5-26" title="26"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb5-27" title="27"> ...</a></code></pre></div>
<h2>Extracting</h2>
<p>Extract what is relevant.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb6-1" title="1"><span class="kw">def</span> extract_delivery(msg_details, parsed):</a>
<a class="sourceLine" id="cb6-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb6-3" title="3"><span class="co"> Compute delivery information from parsing results.</span></a>
<a class="sourceLine" id="cb6-4" title="4"></a>
<a class="sourceLine" id="cb6-5" title="5"><span class="co"> Basically this means that we map the parsed fields to</span></a>
<a class="sourceLine" id="cb6-6" title="6"><span class="co"> a type (&#39;from&#39; or &#39;to&#39;) and to the database</span></a>
<a class="sourceLine" id="cb6-7" title="7"><span class="co"> fields for table &#39;delivery_from&#39; or &#39;delivery_to&#39;.</span></a>
<a class="sourceLine" id="cb6-8" title="8"></a>
<a class="sourceLine" id="cb6-9" title="9"><span class="co"> We branch to functions _extract_{comp} where comp is the</span></a>
<a class="sourceLine" id="cb6-10" title="10"><span class="co"> name of a Postfix component.</span></a>
<a class="sourceLine" id="cb6-11" title="11"></a>
<a class="sourceLine" id="cb6-12" title="12"><span class="co"> Return a list of error strings and a dict with the</span></a>
<a class="sourceLine" id="cb6-13" title="13"><span class="co"> extracted information. Keys with None values are removed</span></a>
<a class="sourceLine" id="cb6-14" title="14"><span class="co"> from the dict.</span></a>
<a class="sourceLine" id="cb6-15" title="15"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb6-16" title="16"> ...</a></code></pre></div>
<h2>Regular expressions</h2>
<ul>
<li><p>see sources</p></li>
<li><p><a href="https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression">Stackoverflow: How to validate an email address</a> <a href="https://i.stack.imgur.com/YI6KR.png">FSM</a></p></li>
</ul>
<h3>BTW: <a href="https://docs.python.org/3/library/email.utils.html#email.utils.parseaddr">email.utils.parseaddr</a></h3>
<div class="sourceCode" id="cb7"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb7-1" title="1"><span class="op">&gt;&gt;&gt;</span> <span class="im">from</span> email.utils <span class="im">import</span> parseaddr</a>
<a class="sourceLine" id="cb7-2" title="2"><span class="op">&gt;&gt;&gt;</span> parseaddr(<span class="st">&#39;Ghost &lt;&quot;hello@nowhere&quot;@pyug.at&gt;&#39;</span>)</a>
<a class="sourceLine" id="cb7-3" title="3">(<span class="st">&#39;Ghost&#39;</span>, <span class="st">&#39;&quot;hello@nowhere&quot;@pyug.at&#39;</span>)</a>
<a class="sourceLine" id="cb7-4" title="4"><span class="op">&gt;&gt;&gt;</span> <span class="bu">print</span>(parseaddr(<span class="st">&#39;&quot;more</span><span class="ch">\&quot;</span><span class="st">fun</span><span class="ch">\&quot;\\</span><span class="st">&quot;hello</span><span class="ch">\\</span><span class="st">&quot;@nowhere&quot;@pyug.at&#39;</span>)[<span class="dv">1</span>])</a>
<a class="sourceLine" id="cb7-5" title="5"><span class="co">&quot;more&quot;</span>fun<span class="st">&quot;</span><span class="ch">\&quot;</span><span class="st">hello</span><span class="ch">\&quot;</span><span class="st">@nowhere&quot;</span><span class="op">@</span>pyug.at</a>
<a class="sourceLine" id="cb7-6" title="6"><span class="op">&gt;&gt;&gt;</span> <span class="bu">print</span>(parseaddr(<span class="st">&#39;&quot;&quot;@pyug.at&#39;</span>)[<span class="dv">1</span>])</a>
<a class="sourceLine" id="cb7-7" title="7"><span class="co">&quot;&quot;</span><span class="op">@</span>pyug.at</a></code></pre></div>
<h2>Storing</h2>
<div class="sourceCode" id="cb8"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb8-1" title="1"><span class="kw">def</span> store_deliveries(cursor, cache, debug<span class="op">=</span>[]):</a>
<a class="sourceLine" id="cb8-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb8-3" title="3"><span class="co"> Store cached delivery information into the database.</span></a>
<a class="sourceLine" id="cb8-4" title="4"></a>
<a class="sourceLine" id="cb8-5" title="5"><span class="co"> Find queue_ids in *cache* and group delivery items by</span></a>
<a class="sourceLine" id="cb8-6" title="6"><span class="co"> them, but separately for delivery types &#39;from&#39; and &#39;to&#39;.</span></a>
<a class="sourceLine" id="cb8-7" title="7"><span class="co"> In addition, collect delivery items with queue_id is None.</span></a>
<a class="sourceLine" id="cb8-8" title="8"></a>
<a class="sourceLine" id="cb8-9" title="9"><span class="co"> After grouping we merge all items withing a group into a</span></a>
<a class="sourceLine" id="cb8-10" title="10"><span class="co"> single item. So we can combine several SQL queries into </span></a>
<a class="sourceLine" id="cb8-11" title="11"><span class="co"> a single one, which improves performance significantly.</span></a>
<a class="sourceLine" id="cb8-12" title="12"></a>
<a class="sourceLine" id="cb8-13" title="13"><span class="co"> Then store the merged items and the deliveries with</span></a>
<a class="sourceLine" id="cb8-14" title="14"><span class="co"> queue_id is None.</span></a>
<a class="sourceLine" id="cb8-15" title="15"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb8-16" title="16"> ...</a></code></pre></div>
<p>Database schema: 3 tables:</p>
<ul>
<li>delivery_from: smtpd, milters, qmgr</li>
<li>delivery_to: smtp, virtual, bounce, error</li>
<li>noqueue: rejected by smtpd before even getting a queue_id</li>
</ul>
<p>Table noqueue contains all the spam; for this we only use SQL INSERT, no ON CONFLICT ... UPDATE; it's faster.</p>
<h2>Demo</h2>
<pre><code>...
</code></pre>
<h2>Questions / Suggestions</h2>
<ul>
<li>Could you enhance speed by using prepared statements?</li>
<li>Will old data be deleted (as required by GDPR)?</li>
</ul>
<p>Both were implemented after the talk.</p>

View file

@ -0,0 +1,340 @@
# journal-postfix - A log parser for Postfix
Experiences from applying Python to the domain of bad old email.
## Email ✉
* old technology (starting in the 70ies)
* [store-and-forward](https://en.wikipedia.org/wiki/Store_and_forward): sent != delivered to recipient
* non-delivery reasons:
* recipient over quota
* inexistent destination
* malware
* spam
* server problem
* ...
* permanent / non-permanent failure ([DSN ~ 5.X.Y / 4.X.Y](https://www.iana.org/assignments/smtp-enhanced-status-codes/smtp-enhanced-status-codes.xhtml))
* non-delivery modes
* immediate reject on SMTP level
* delayed [bounce messages](https://en.wikipedia.org/wiki/Bounce_message) by [reporting MTA](https://upload.wikimedia.org/wikipedia/commons/a/a2/Bounce-DSN-MTA-names.png) - queueing (e.g., ~5d) before delivery failure notification
* discarding
* read receipts
* [Wikipedia: email tracking](https://en.wikipedia.org/wiki/Email_tracking)
## [SMTP](https://en.wikipedia.org/wiki/SMTP)
[SMTP session example](https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#SMTP_transport_example):
envelope sender, envelope recipient may differ from From:, To:
Lists of error codes:
* [SMTP and ESMTP](https://www.inmotionhosting.com/support/email/email-troubleshooting/smtp-and-esmtp-error-code-list)
* [SMTP](https://serversmtp.com/smtp-error/)
* [SMTP](https://info.webtoolhub.com/kb-a15-smtp-status-codes-smtp-error-codes-smtp-reply-codes.aspx)
Example of an error within a bounced email (Subject: Mail delivery failed: returning message to sender)
SMTP error from remote server for TEXT command, host: smtpin.rzone.de (81.169.145.97) reason: 550 5.7.1 Refused by local policy. No SPAM please!
* email users are continually asking for the fate of their emails (or those of their correspondents which should have arrived)
## [Postfix](http://www.postfix.org)
* popular [MTA](https://en.wikipedia.org/wiki/Message_transfer_agent)
* written in C
* logging to files / journald
* example log messages for a (non-)delivery + stats
```
Nov 27 16:19:22 mail postfix/smtpd[18995]: connect from unknown[80.82.79.244]
Nov 27 16:19:22 mail postfix/smtpd[18995]: NOQUEUE: reject: RCPT from unknown[80.82.79.244]: 454 4.7.1 <spameri@tiscali.it>: Relay access denied; from=<spameri@tiscali.it> to=<spameri@tiscali.it> proto=ESMTP helo=<WIN-G7CPHCGK247>
Nov 27 16:19:22 mail postfix/smtpd[18995]: disconnect from unknown[80.82.79.244] ehlo=1 mail=1 rcpt=0/1 rset=1 quit=1 commands=4/5
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection rate 1/60s for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection count 1 for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max cache size 1 at Nov 27 16:19:22
Nov 27 16:22:48 mail postfix/smtpd[18999]: connect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/smtpd[18999]: 47NQzY13DbzNWNQG: client=mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: info: header Subject: Re: test from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]; from=<ibu@cosmopool.net> to=<ibu@multiname.org> proto=ESMTP helo=<mail.cosmopool.net>
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: message-id=<d5154432-b984-d65a-30b3-38bde7e37af8@cosmopool.net>
Nov 27 16:22:49 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: from=<ibu@cosmopool.net>, size=1365, nrcpt=2 (queue active)
Nov 27 16:22:49 mail postfix/smtpd[18999]: disconnect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107] ehlo=1 mail=1 rcpt=2 data=1 quit=1 commands=6
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=<ibu2@multiname.org>, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 <ibu2@multiname.org> nV9iJ9mi3l0+SgAAZU03Dg Saved)
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=<ibu@multiname.org>, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 <ibu@multiname.org> nV9iJ9mi3l0+SgAAZU03Dg:2 Saved)
Nov 27 16:22:50 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: removed
```
* [involved postfix components](http://www.postfix.org/OVERVIEW.html)
* smtpd (port 25: smtp, port 587: submission)
* cleanup
* smtp/lmtp
* missing log parser
## Idea
* follow log stream and write summarized delivery information to a database
* goal: spot delivery problems, collect delivery stats
* a GUI could then display the current delivery status to users
## Why Python?
* simple and fun language, clear and concise
* well suited for text processing
* libs available for systemd, PostgreSQL
* huge standard library (used here: datetime, re, yaml, argparse, select)
* speed sufficient?
## Development iterations
* hmm, easy task, might take a few days
* PoC: reading and polling from journal works as expected
* used postfix logfiles in syslog format and wrote regexps matching them iteratively
* separated parsing messages from extracting delivery information
* created a delivery table
* hmm, this is very slow, takes hours to process log messages from a few days (from a server with not much traffic)
* introduced polling timeout and SQL transactions handling several messages at once
* ... much faster
* looks fine, but wait... did I catch all syntax variants of Postfix log messages?
* looked into Postfix sources and almost got lost
* weeks of hard work identifying relevant log output directives
* completely rewrote parser to deal with the rich log msg syntax, e.g.:<br>
`def _strip_pattern(msg, res, pattern_name, pos='l', target_names=None)`
* oh, there are even more Postfix components... limit to certain Postfix configurations, in particular virtual mailboxes and not local ones
* mails may have multiple recipients... split delivery table into delivery_from and delivery_to
* decide which delivery information is relevant
* cleanup and polish (config mgmt, logging)
* write ansible role
## Structure
```blockdiag
blockdiag {
default_fontsize = 20;
node_height = 80;
journal_since -> run_loop;
journal_follow -> run_loop;
logfile -> run_loop;
run_loop -> parse -> extract_delivery -> store;
store -> delivery_from;
store -> delivery_to;
store -> noqueue;
group { label="input iterables"; journal_since; journal_follow; logfile; };
group { label="output tables"; delivery_from; delivery_to; noqueue; };
}
```
## Iterables
```python
def iter_journal_messages_since(timestamp: Union[int, float]):
"""
Yield False and message details from the journal since *timestamp*.
This is the loading phase (loading messages that already existed
when we start).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit UNITNAME with loglevel
INFO and above are retrieved.
"""
...
def iter_journal_messages_follow(timestamp: Union[int, float]):
"""
Yield commit and message details from the journal through polling.
This is the polling phase (after we have read pre-existing messages
in the loading phase).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit UNITNAME with loglevel
INFO and above are retrieved.
*commit* (bool) tells whether it is time to store the delivery
information obtained from the messages yielded by us.
It is set to True if max_delay_before_commit has elapsed.
After this delay delivery information will be written; to be exact:
the delay may increase by up to one journal_poll_interval.
"""
...
def iter_logfile_messages(filepath: str, year: int,
commit_after_lines=max_messages_per_commit):
"""
Yield messages and a commit flag from a logfile.
Loop through all lines of the file with given *filepath* and
extract the time and log message. If the log message starts
with 'postfix/', then extract the syslog_identifier, pid and
message text.
Since syslog lines do not contain the year, the *year* to which
the first log line belongs must be given.
Return a commit flag and a dict with these keys:
't': timestamp
'message': message text
'identifier': syslog identifier (e.g., 'postfix/smtpd')
'pid': process id
The commit flag will be set to True for every
(commit_after_lines)-th filtered message and serves
as a signal to the caller to commit this chunk of data
to the database.
"""
...
```
## Running loops
```python
def run(dsn, verp_marker=False, filepath=None, year=None, debug=[]):
"""
Determine loop(s) and run them within a database context.
"""
init(verp_marker=verp_marker)
with psycopg2.connect(dsn) as conn:
with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as curs:
if filepath:
run_loop(iter_logfile_messages(filepath, year), curs, debug=debug)
else:
begin_timestamp = get_latest_timestamp(curs)
run_loop(iter_journal_messages_since(begin_timestamp), curs, debug=debug)
begin_timestamp = get_latest_timestamp(curs)
run_loop(iter_journal_messages_follow(begin_timestamp), curs, debug=debug)
def run_loop(iterable, curs, debug=[]):
"""
Loop over log messages obtained from *iterable*.
Parse the message, extract delivery information from it and store
that delivery information.
For performance reasons delivery items are collected in a cache
before writing them (i.e., committing a database transaction).
"""
cache = []
msg_count = max_messages_per_commit
for commit, msg_details in iterable:
...
```
## Parsing
Parse what you can. (But only msg_info in Postfix, and only relevant components.)
```python
def parse(msg_details, debug=False):
"""
Parse a log message returning a dict.
*msg_details* is assumed to be a dict with these keys:
* 'identifier' (syslog identifier),
* 'pid' (process id),
* 'message' (message text)
The syslog identifier and process id are copied to the resulting dict.
"""
...
def _parse_branch(comp, msg, res):
"""
Parse a log message string *msg*, adding results to dict *res*.
Depending on the component *comp* we branch to functions
named _parse_{comp}.
Add parsing results to dict *res*. Always add key 'action'.
Try to parse every syntactical element.
Note: We parse what we can. Assessment of parsing results relevant
for delivery is done in :func:`extract_delivery`.
"""
...
```
## Extracting
Extract what is relevant.
```python
def extract_delivery(msg_details, parsed):
"""
Compute delivery information from parsing results.
Basically this means that we map the parsed fields to
a type ('from' or 'to') and to the database
fields for table 'delivery_from' or 'delivery_to'.
We branch to functions _extract_{comp} where comp is the
name of a Postfix component.
Return a list of error strings and a dict with the
extracted information. Keys with None values are removed
from the dict.
"""
...
```
## Regular expressions
* see sources
* [Stackoverflow: How to validate an email address](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) [FSM](https://i.stack.imgur.com/YI6KR.png)
### BTW: [email.utils.parseaddr](https://docs.python.org/3/library/email.utils.html#email.utils.parseaddr)
```python
>>> from email.utils import parseaddr
>>> parseaddr('Ghost <"hello@nowhere"@pyug.at>')
('Ghost', '"hello@nowhere"@pyug.at')
>>> print(parseaddr('"more\"fun\"\\"hello\\"@nowhere"@pyug.at')[1])
"more"fun"\"hello\"@nowhere"@pyug.at
>>> print(parseaddr('""@pyug.at')[1])
""@pyug.at
```
## Storing
```python
def store_deliveries(cursor, cache, debug=[]):
"""
Store cached delivery information into the database.
Find queue_ids in *cache* and group delivery items by
them, but separately for delivery types 'from' and 'to'.
In addition, collect delivery items with queue_id is None.
After grouping we merge all items withing a group into a
single item. So we can combine several SQL queries into
a single one, which improves performance significantly.
Then store the merged items and the deliveries with
queue_id is None.
"""
...
```
Database schema: 3 tables:
* delivery_from: smtpd, milters, qmgr
* delivery_to: smtp, virtual, bounce, error
* noqueue: rejected by smtpd before even getting a queue_id
Table noqueue contains all the spam; for this we only use SQL INSERT, no ON CONFLICT ... UPDATE; it's faster.
## Demo
...
## Questions / Suggestions
* Could you enhance speed by using prepared statements?
* Will old data be deleted (as required by GDPR)?
Both were implemented after the talk.

34
journal-postfix.yml Normal file
View file

@ -0,0 +1,34 @@
# Deploy journal-postfix
# This will install a service that writes mail delivery information
# obtained from systemd-journal (unit postfix@-.service) to a
# PostgreSQL database.
#
# You can configure the database connection parameters (and optionally
# a verp_marker) as host vars like this:
#
# mailserver:
# postgresql:
# host: 127.0.0.1
# port: 5432
# dbname: mailserver
# username: mailserver
# password: !vault |
# $ANSIBLE_VAULT;1.1;AES256
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# postfix:
# verp_marker: rstxyz
#
# If you do not, then you must edit /etc/journal-postfix/main.yml
# on the destination hosts and run systemctl start journal-postfix
# manually.
- name: install journal-postfix
user: root
hosts: mail
roles:
- journal-postfix

View file

@ -0,0 +1,17 @@
# this file is part of ansible role journal-postfix
[Unit]
Description=Extract postfix message delivery information from systemd journal messages\
and store them in a PostgreSQL database. Configuration is in /etc/journal-postfix/main.yml
After=multi-user.target
[Service]
Type=simple
ExecStart=/srv/journal-postfix/run.py
User=journal-postfix
WorkingDirectory=/srv/journal-postfix/
Restart=on-failure
RestartPreventExitStatus=97
[Install]
WantedBy=multi-user.target

View file

@ -0,0 +1,85 @@
Parse postfix entries in systemd journal and collect delivery information.
The information on mail deliveries is written to tables in a PostgreSQL
database. The database can then be queried by a UI showing delivery status
to end users. The UI is not part of this package.
This software is tailor-made for debian buster with systemd as init system.
It is meant to run on the same system on which Postfix is running,
or on a system receiving the log stream of a Postfix instance in its
systemd journal.
Prerequisites / Postfix configuration:
- Access to a PostgreSQL database.
- Postfix: Only virtual mailboxes are supported.
- Postfix: You can use short or long queue_ids (see
http://www.postfix.org/postconf.5.html#enable_long_queue_ids),
but since the uniqueness of short queue_ids is very limited,
usage of long queue_ids is *strongly recommended*.
Installation:
- apt install python3-psycopg2 python3-systemd python3-yaml
- Edit /etc/journal-postfix/main.yml
- Output is written to the journal (unit journal-postfix). READ IT!
Side effects (database):
- The configured database user will create the tables
- delivery_from
- delivery_to
- noqueue
in the configured database, if they do not yet exist.
These tables will be filled with results from parsing the journal.
Table noqueue contains deliveries rejected by smtpd before they
got a queue_id. Deliveries with queue_id are in tables delivery_from
and delivery_to, which are separate, because an email can have only
one sender, but more than one recipient. Entries in both tables are
related through the queue_id and the approximate date; note that
short queue_ids are not unique for a delivery transaction, so
consider changing your Postfix configuration to long queue_ids.
- Log output is written to journald, unit journal-postfix.
Configuration:
- Edit the config file in YAML format located at
/etc/journal-postfix/main.conf
Limitations:
- The log output of Postfix may contain messages not primarily relevant
for delivery, namely messages of levels panic, fatal, error, warning.
They are discarded.
- The postfix server must be configured to use virtual mailboxes;
deliveries to local mailboxes are ignored.
- Parsing is specific to a Postfix version and only version 3.4.5
(the version in Debian buster) is supported; it is intended to support
Postfix versions in future stable Debian releases.
- This script does not support concurrency; we assume that there is only
one process writing to the database tables. Thus clustered postfix
setups are not supported.
Options:
- If you use dovecot as lmtpd, you will also get dovecot_ids upon
successful delivery.
- If you have configured Postfix to store VERP-ids of outgoing mails
in table 'mail_from' in the same database, then bounce emails can
be associated with original emails. The VERP-ids must have a certain
format.
- The subject of emails will be extracted from log messages starting
with "info: header Subject:". To enable these messages configure
Postfix like this: Enabled header_checks in main.cf (
header_checks = regexp:/etc/postfix/header_checks
) and put this line into /etc/postfix/header_checks:
/^Subject:/ INFO
- You can also import log messages from a log file in syslog format:
Run this script directly from command line with options --file
(the path to the file to be parsed) and --year (the year of the
first message in this log file).
Note: For the name of the month to be recognized correctly, the
script must be run with this locale.
Attention: When running from the command line, log output will
not be sent to unit journal-postfix; use this command instead:
journalctl --follow SYSLOG_IDENTIFIER=python3

File diff suppressed because it is too large Load diff

212
journal-postfix/files/srv/run.py Executable file
View file

@ -0,0 +1,212 @@
#!/usr/bin/env python3
"""
Main script to be run as a systemd unit or manually.
"""
import argparse
import datetime
import os
import sys
from pprint import pprint
from typing import Iterable, List, Optional, Tuple, Union
import psycopg2
import psycopg2.extras
from systemd import journal
import settings
from parser import init_parser, parse_entry, extract_delivery
from sources import (
iter_journal_messages_since,
iter_journal_messages_follow,
iter_logfile_messages,
)
from storage import (
init_db,
init_session,
get_latest_timestamp,
delete_old_deliveries,
store_delivery_items,
)
exit_code_without_restart = 97
def run(
dsn: str,
verp_marker: Optional[str] = None,
filepath: Optional[str] = None,
year: Optional[int] = None,
debug: List[str] = [],
) -> None:
"""
Determine loop(s) and run them within a database context.
"""
init_parser(verp_marker=verp_marker)
with psycopg2.connect(dsn) as conn:
with conn.cursor(
cursor_factory=psycopg2.extras.RealDictCursor
) as curs:
init_session(curs)
if filepath and year:
run_loop(
iter_logfile_messages(filepath, year), curs, debug=debug
)
else:
begin_timestamp = get_latest_timestamp(curs)
run_loop(
iter_journal_messages_since(begin_timestamp),
curs,
debug=debug,
)
begin_timestamp = get_latest_timestamp(curs)
run_loop(
iter_journal_messages_follow(begin_timestamp),
curs,
debug=debug,
)
def run_loop(
iterable: Iterable[Tuple[bool, Optional[dict]]],
curs: psycopg2.extras.RealDictCursor,
debug: List[str] = []
) -> None:
"""
Loop over log entries obtained from *iterable*.
Parse the message, extract delivery information from it and store
that delivery information.
For performance reasons delivery items are collected in a cache
before writing them (i.e., committing a database transaction).
"""
cache = []
msg_count = settings.max_messages_per_commit
last_delete = None
for commit, msg_details in iterable:
parsed_entry = None
if msg_details:
parsed_entry = parse_entry(msg_details)
if 'all' in debug or (
parsed_entry and parsed_entry.get('comp') in debug
):
print('_' * 80)
print('MSG_DETAILS:', msg_details)
print('PARSED_ENTRY:', parsed_entry)
if parsed_entry:
errors, delivery = extract_delivery(msg_details, parsed_entry)
if not errors and delivery:
if 'all' in debug or parsed_entry.get('comp') in debug:
print('DELIVERY:')
pprint(delivery)
# it may happen that a delivery of type 'from' has
# a recipient; in this case add a second delivery
# of type 'to' to the cache, but only for deliveries
# with queue_id
if (
delivery['type'] == 'from'
and 'recipient' in delivery
and delivery.get('queue_id')
):
delivery2 = delivery.copy()
delivery2['type'] = 'to'
cache.append(delivery2)
del delivery['recipient']
cache.append(delivery)
msg_count -= 1
if msg_count == 0:
commit = True
elif errors:
msg = (
f'Extracting delivery from parsed entry failed: '
f'errors={errors}; msg_details={msg_details}; '
f'parsed_entry={parsed_entry}'
)
journal.send(msg, PRIORITY=journal.LOG_CRIT)
if 'all' in debug or parsed_entry.get('comp') in debug:
print('EXTRACTION ERRORS:', errors)
if commit:
if 'all' in debug:
print('.' * 40, 'committing')
# store cache, clear cache, reset message counter
store_delivery_items(curs, cache, debug=debug)
cache = []
msg_count = settings.max_messages_per_commit
now = datetime.datetime.utcnow()
if last_delete is None or last_delete < now - settings.delete_interval:
delete_old_deliveries(curs)
last_delete = now
if 'all' in debug:
print('.' * 40, 'deleting old deliveries')
else:
store_delivery_items(curs, cache, debug=debug)
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument(
'--debug',
help='Comma-separated list of components to be debugged; '
'valid component names are the Postfix components '
'plus "sql" plus "all".',
)
parser.add_argument(
'--file',
help='File path of a Postfix logfile in syslog '
'format to be parsed instead of the journal',
)
parser.add_argument(
'--year',
help='If --file is given, we need to know '
'the year of the first line in the logfile',
)
args = parser.parse_args()
config = settings.get_config()
if config:
# check if startup is enabled or fail
msg = None
if 'startup' not in config:
msg = 'Parameter "startup" is not configured.'
elif not config['startup']:
msg = 'Startup is not enabled in the config file.'
if msg:
journal.send(msg, PRIORITY=journal.LOG_CRIT)
sys.exit(exit_code_without_restart)
# check more params and call run
try:
verp_marker = config['postfix']['verp_marker']
except Exception:
verp_marker = None
debug: List[str] = []
if args.debug:
debug = args.debug.split(',')
filepath = None
year = None
if args.file:
filepath = args.file
if not args.year:
print(
'If --file is given, we need to know the year'
' of the first line in the logfile. Please use --year.'
)
sys.exit(1)
else:
year = int(args.year)
dsn = init_db(config)
if dsn:
run(
dsn,
verp_marker=verp_marker,
filepath=filepath,
year=year,
debug=debug,
)
else:
print('Config invalid, see journal.')
sys.exit(exit_code_without_restart)
if __name__ == '__main__':
main()

View file

@ -0,0 +1,125 @@
#!/usr/bin/env python3
"""
Settings for journal-postfix.
"""
import os
import datetime
from typing import Union, Optional
from systemd import journal
from yaml import load
main_config_file: str = '/etc/journal-postfix/main.yml'
"""
Filepath to the main config file.
Can be overriden by environment variable JOURNAL_POSTFIX_MAIN_CONF.
"""
systemd_unitname: str = 'postfix@-.service'
"""
Name of the systemd unit running the postfix service.
"""
journal_poll_interval: Union[float, int] = 10.0
"""
Poll timeout in seconds for fetching messages from the journal.
Will be overriden if set in the main config.
If the poll times out, it is checked whether the last commit
lies more than max_delay_before_commit seconds in the past;
if so, the current database transaction will be committed.
"""
max_delay_before_commit: datetime.timedelta = datetime.timedelta(seconds=30)
"""
How much time may pass before committing a database transaction?
Will be overriden if set in the main config.
(The actual maximal delay can be one journal_poll_interval in addition.)
"""
max_messages_per_commit: int = 1000
"""
How many messages to cache at most before committing a database transaction?
Will be overriden if set in the main config.
"""
delete_deliveries_after_days: int = 0
"""
After how many days shall deliveries be deleted from the database?
A value of 0 means that data are never deleted.
"""
def get_config() -> Optional[dict]:
"""
Load config from the main config and return it.
The default main config file path (global main_config_file)
can be overriden with environment variable
JOURNAL_POSTFIX_MAIN_CONF.
"""
try:
filename = os.environ['JOURNAL_POSTFIX_MAIN_CONF']
global main_config_file
main_config_file = filename
except Exception:
filename = main_config_file
try:
with open(filename, 'r') as config_file:
config_raw = config_file.read()
except Exception:
msg = f'ERROR: cannot read config file {filename}'
journal.send(msg, PRIORITY=journal.LOG_CRIT)
return None
try:
config = load(config_raw)
except Exception as err:
msg = f'ERROR: invalid yaml syntax in {filename}: {err}'
journal.send(msg, PRIORITY=journal.LOG_CRIT)
return None
# override some global variables
_global_value_from_config(config['postfix'], 'systemd_unitname', str)
_global_value_from_config(config, 'journal_poll_interval', float)
_global_value_from_config(config, 'max_delay_before_commit', 'seconds')
_global_value_from_config(config, 'max_messages_per_commit', int)
_global_value_from_config(config, 'delete_deliveries_after_days', int)
_global_value_from_config(config, 'delete_interval', 'seconds')
return config
def _global_value_from_config(
config, name: str, type_: Union[type, str]
) -> None:
"""
Set a global variable to the value obtained from *config*.
Also cast to *type_*.
"""
try:
value = config.get(name)
if type_ == 'seconds':
value = datetime.timedelta(seconds=float(value))
else:
value = type_(value) # type: ignore
globals()[name] = value
except Exception:
if value is not None:
msg = f'ERROR: configured value of {name} is invalid.'
journal.send(msg, PRIORITY=journal.LOG_ERR)
if __name__ == '__main__':
print(get_config())

View file

@ -0,0 +1,5 @@
[pycodestyle]
max-line-length = 200
[mypy]
ignore_missing_imports = True

View file

@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Data sources.
Note: python-systemd journal docs are at
https://www.freedesktop.org/software/systemd/python-systemd/journal.html
"""
import datetime
import select
from typing import Iterable, Optional, Tuple, Union
from systemd import journal
import settings
def iter_journal_messages_since(
timestamp: Union[int, float]
) -> Iterable[Tuple[bool, dict]]:
"""
Yield False and message details from the journal since *timestamp*.
This is the loading phase (loading messages that already existed
when we start).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit settings.systemd_unitname with
loglevel INFO and above are retrieved.
"""
timestamp = float(timestamp)
sdj = journal.Reader()
sdj.log_level(journal.LOG_INFO)
sdj.add_match(_SYSTEMD_UNIT=settings.systemd_unitname)
sdj.seek_realtime(timestamp)
for entry in sdj:
yield False, _get_msg_details(entry)
def iter_journal_messages_follow(
timestamp: Union[int, float]
) -> Iterable[Tuple[bool, Optional[dict]]]:
"""
Yield commit and message details from the journal through polling.
This is the polling phase (after we have read pre-existing messages
in the loading phase).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit settings.systemd_unitname with
loglevel INFO and above are retrieved.
*commit* (bool) tells whether it is time to store the delivery
information obtained from the messages yielded by us.
It is set to True if settings.max_delay_before_commit has elapsed.
After this delay delivery information will be written; to be exact:
the delay may increase by up to one settings.journal_poll_interval.
"""
sdj = journal.Reader()
sdj.log_level(journal.LOG_INFO)
sdj.add_match(_SYSTEMD_UNIT=settings.systemd_unitname)
sdj.seek_realtime(timestamp)
p = select.poll()
p.register(sdj, sdj.get_events())
last_commit = datetime.datetime.utcnow()
interval_ms = settings.journal_poll_interval * 1000
while True:
p.poll(interval_ms)
commit = False
now = datetime.datetime.utcnow()
if last_commit + settings.max_delay_before_commit < now:
commit = True
last_commit = now
if sdj.process() == journal.APPEND:
for entry in sdj:
yield commit, _get_msg_details(entry)
elif commit:
yield commit, None
def iter_logfile_messages(
filepath: str,
year: int,
commit_after_lines=settings.max_messages_per_commit,
) -> Iterable[Tuple[bool, dict]]:
"""
Yield messages and a commit flag from a logfile.
Loop through all lines of the file with given *filepath* and
extract the time and log message. If the log message starts
with 'postfix/', then extract the syslog_identifier, pid and
message text.
Since syslog lines do not contain the year, the *year* to which
the first log line belongs must be given.
Return a commit flag and a dict with these keys:
't': timestamp
'message': message text
'identifier': syslog identifier (e.g., 'postfix/smtpd')
'pid': process id
The commit flag will be set to True for every
(commit_after_lines)-th filtered message and serves
as a signal to the caller to commit this chunk of data
to the database.
"""
dt = None
with open(filepath, 'r') as fh:
cnt = 0