add ansible role journal-postfix (a log parser for Postfix) with playbook and doc

This commit is contained in:
iburadempa 2019-12-15 19:10:28 +01:00
parent 713372c850
commit e5a8025064
14 changed files with 3570 additions and 0 deletions


@ -0,0 +1,378 @@
<h1>journal-postfix - A log parser for Postfix</h1>
<p>Experiences from applying Python to the domain of bad old email.</p>
<h2>Email ✉</h2>
<ul>
<li>old technology (originating in the 1970s)</li>
<li><a href="https://en.wikipedia.org/wiki/Store_and_forward">store-and-forward</a>: sent != delivered to recipient</li>
<li>non-delivery reasons:
<ul>
<li>recipient over quota</li>
<li>nonexistent destination</li>
<li>malware</li>
<li>spam</li>
<li>server problem</li>
<li>...</li>
</ul></li>
<li>permanent / non-permanent failure (<a href="https://www.iana.org/assignments/smtp-enhanced-status-codes/smtp-enhanced-status-codes.xhtml">DSN ~ 5.X.Y / 4.X.Y</a>)</li>
<li>non-delivery modes
<ul>
<li>immediate reject on SMTP level</li>
<li>delayed <a href="https://en.wikipedia.org/wiki/Bounce_message">bounce messages</a> by <a href="https://upload.wikimedia.org/wikipedia/commons/a/a2/Bounce-DSN-MTA-names.png">reporting MTA</a> - queueing (e.g., ~5d) before delivery failure notification</li>
<li>discarding</li>
</ul></li>
<li>read receipts</li>
<li><a href="https://en.wikipedia.org/wiki/Email_tracking">Wikipedia: email tracking</a></li>
</ul>
<h2><a href="https://en.wikipedia.org/wiki/SMTP">SMTP</a></h2>
<p><a href="https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#SMTP_transport_example">SMTP session example</a>: the envelope sender and envelope recipient may differ from the From: and To: headers</p>
<p>Lists of error codes:</p>
<ul>
<li><a href="https://www.inmotionhosting.com/support/email/email-troubleshooting/smtp-and-esmtp-error-code-list">SMTP and ESMTP</a></li>
<li><a href="https://serversmtp.com/smtp-error/">SMTP</a></li>
<li><a href="https://info.webtoolhub.com/kb-a15-smtp-status-codes-smtp-error-codes-smtp-reply-codes.aspx">SMTP</a></li>
</ul>
<p>Example of an error within a bounced email (Subject: Mail delivery failed: returning message to sender)</p>
<pre><code>SMTP error from remote server for TEXT command, host: smtpin.rzone.de (81.169.145.97) reason: 550 5.7.1 Refused by local policy. No SPAM please!
</code></pre>
<ul>
<li>email users keep asking about the fate of their emails (or of messages from correspondents that should have arrived)</li>
</ul>
<h2><a href="http://www.postfix.org">Postfix</a></h2>
<ul>
<li>popular <a href="https://en.wikipedia.org/wiki/Message_transfer_agent">MTA</a></li>
<li>written in C</li>
<li>logging to files / journald</li>
<li>example log messages for a (non-)delivery + stats</li>
</ul>
<pre><code>Nov 27 16:19:22 mail postfix/smtpd[18995]: connect from unknown[80.82.79.244]
Nov 27 16:19:22 mail postfix/smtpd[18995]: NOQUEUE: reject: RCPT from unknown[80.82.79.244]: 454 4.7.1 &lt;spameri@tiscali.it&gt;: Relay access denied; from=&lt;spameri@tiscali.it&gt; to=&lt;spameri@tiscali.it&gt; proto=ESMTP helo=&lt;WIN-G7CPHCGK247&gt;
Nov 27 16:19:22 mail postfix/smtpd[18995]: disconnect from unknown[80.82.79.244] ehlo=1 mail=1 rcpt=0/1 rset=1 quit=1 commands=4/5
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection rate 1/60s for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection count 1 for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max cache size 1 at Nov 27 16:19:22
Nov 27 16:22:48 mail postfix/smtpd[18999]: connect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/smtpd[18999]: 47NQzY13DbzNWNQG: client=mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: info: header Subject: Re: test from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]; from=&lt;ibu@cosmopool.net&gt; to=&lt;ibu@multiname.org&gt; proto=ESMTP helo=&lt;mail.cosmopool.net&gt;
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: message-id=&lt;d5154432-b984-d65a-30b3-38bde7e37af8@cosmopool.net&gt;
Nov 27 16:22:49 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: from=&lt;ibu@cosmopool.net&gt;, size=1365, nrcpt=2 (queue active)
Nov 27 16:22:49 mail postfix/smtpd[18999]: disconnect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107] ehlo=1 mail=1 rcpt=2 data=1 quit=1 commands=6
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=&lt;ibu2@multiname.org&gt;, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 &lt;ibu2@multiname.org&gt; nV9iJ9mi3l0+SgAAZU03Dg Saved)
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=&lt;ibu@multiname.org&gt;, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 &lt;ibu@multiname.org&gt; nV9iJ9mi3l0+SgAAZU03Dg:2 Saved)
Nov 27 16:22:50 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: removed
</code></pre>
<ul>
<li><a href="http://www.postfix.org/OVERVIEW.html">involved postfix components</a>
<ul>
<li>smtpd (port 25: smtp, port 587: submission)</li>
<li>cleanup</li>
<li>smtp/lmtp</li>
</ul></li>
<li>missing log parser</li>
</ul>
<h2>Idea</h2>
<ul>
<li>follow log stream and write summarized delivery information to a database</li>
<li>goal: spot delivery problems, collect delivery stats</li>
<li>a GUI could then display the current delivery status to users</li>
</ul>
<h2>Why Python?</h2>
<ul>
<li>simple and fun language, clear and concise</li>
<li>well suited for text processing</li>
<li>libs available for systemd, PostgreSQL</li>
<li>huge standard library (used here: datetime, re, yaml, argparse, select)</li>
<li>speed sufficient?</li>
</ul>
<h2>Development iterations</h2>
<ul>
<li>hmm, easy task, might take a few days</li>
<li>PoC: reading and polling from journal works as expected</li>
<li>used postfix logfiles in syslog format and wrote regexps matching them iteratively</li>
<li>separated parsing messages from extracting delivery information</li>
<li>created a delivery table</li>
<li>hmm, this is very slow: it takes hours to process log messages from just a few days (from a server with little traffic)</li>
<li>introduced polling timeout and SQL transactions handling several messages at once</li>
<li>... much faster</li>
<li>looks fine, but wait... did I catch all syntax variants of Postfix log messages?</li>
<li>looked into Postfix sources and almost got lost</li>
<li>weeks of hard work identifying relevant log output directives</li>
<li>completely rewrote parser to deal with the rich log msg syntax, e.g.:<br> <code>def _strip_pattern(msg, res, pattern_name, pos='l', target_names=None)</code></li>
<li>oh, there are even more Postfix components... limit the scope to certain Postfix configurations, in particular virtual mailboxes (not local ones)</li>
<li>mails may have multiple recipients... split delivery table into delivery_from and delivery_to</li>
<li>decide which delivery information is relevant</li>
<li>cleanup and polish (config mgmt, logging)</li>
<li>write ansible role</li>
</ul>
<h2>Structure</h2>
<svg viewBox="0 0 1216 400" xmlns="http://www.w3.org/2000/svg" xmlns:inkspace="http://www.inkscape.org/namespaces/inkscape" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs id="defs_block">
<filter height="1.504" id="filter_blur" inkspace:collect="always" width="1.1575" x="-0.07875" y="-0.252">
<feGaussianBlur id="feGaussianBlur3780" inkspace:collect="always" stdDeviation="4.2" />
</filter>
</defs>
<title>blockdiag</title>
<desc>blockdiag {
default_fontsize = 20;
node_height = 80;
journal_since -&gt; run_loop;
journal_follow -&gt; run_loop;
logfile -&gt; run_loop;
run_loop -&gt; parse -&gt; extract_delivery -&gt; store;
store -&gt; delivery_from;
store -&gt; delivery_to;
store -&gt; noqueue;
group { label="input iterables"; journal_since; journal_follow; logfile; };
group { label="output tables"; delivery_from; delivery_to; noqueue; };
}
</desc>
<rect fill="rgb(243,152,0)" height="340" style="filter:url(#filter_blur)" width="144" x="56" y="30" />
<rect fill="rgb(243,152,0)" height="340" style="filter:url(#filter_blur)" width="144" x="1016" y="30" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="259" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="67" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="67" y="166" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="67" y="286" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="451" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="643" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="835" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="1027" y="46" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="1027" y="166" />
<rect fill="rgb(0,0,0)" height="80" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="1027" y="286" />
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="256" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="87" x="320.5" y="90">run_loop</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="64" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="128.0" y="79">journal_sin</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="21" x="128.5" y="101">ce</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="64" y="160" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="128.0" y="199">journal_fol</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="32" x="128.0" y="221">low</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="64" y="280" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="76" x="128.0" y="330">logfile</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="448" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="54" x="512.0" y="90">parse</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="640" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="704.0" y="79">extract_del</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="54" x="704.0" y="101">ivery</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="832" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="54" x="896.0" y="90">store</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="1024" y="40" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="1088.0" y="79">delivery_fr</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="21" x="1088.5" y="101">om</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="1024" y="160" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="1088.0" y="210">delivery_to</text>
<rect fill="rgb(255,255,255)" height="80" stroke="rgb(0,0,0)" width="128" x="1024" y="280" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="20" font-style="normal" font-weight="normal" text-anchor="middle" textLength="76" x="1088.0" y="330">noqueue</text>
<path d="M 384 80 L 440 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="447,80 440,76 440,84 447,80" stroke="rgb(0,0,0)" />
<path d="M 576 80 L 632 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="639,80 632,76 632,84 639,80" stroke="rgb(0,0,0)" />
<path d="M 768 80 L 824 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="831,80 824,76 824,84 831,80" stroke="rgb(0,0,0)" />
<path d="M 960 80 L 1016 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="1023,80 1016,76 1016,84 1023,80" stroke="rgb(0,0,0)" />
<path d="M 960 80 L 992 80" fill="none" stroke="rgb(0,0,0)" />
<path d="M 992 80 L 992 200" fill="none" stroke="rgb(0,0,0)" />
<path d="M 992 200 L 1016 200" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="1023,200 1016,196 1016,204 1023,200" stroke="rgb(0,0,0)" />
<path d="M 960 80 L 992 80" fill="none" stroke="rgb(0,0,0)" />
<path d="M 992 80 L 992 320" fill="none" stroke="rgb(0,0,0)" />
<path d="M 992 320 L 1016 320" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="1023,320 1016,316 1016,324 1023,320" stroke="rgb(0,0,0)" />
<path d="M 192 80 L 248 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="255,80 248,76 248,84 255,80" stroke="rgb(0,0,0)" />
<path d="M 192 200 L 240 200" fill="none" stroke="rgb(0,0,0)" />
<path d="M 240 200 L 240 80" fill="none" stroke="rgb(0,0,0)" />
<path d="M 240 80 L 248 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="255,80 248,76 248,84 255,80" stroke="rgb(0,0,0)" />
<path d="M 192 320 L 240 320" fill="none" stroke="rgb(0,0,0)" />
<path d="M 240 320 L 240 80" fill="none" stroke="rgb(0,0,0)" />
<path d="M 240 80 L 248 80" fill="none" stroke="rgb(0,0,0)" />
<polygon fill="rgb(0,0,0)" points="255,80 248,76 248,84 255,80" stroke="rgb(0,0,0)" />
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="16" font-style="normal" font-weight="normal" text-anchor="middle" textLength="122" x="128.0" y="38">input iter ...</text>
<text fill="rgb(0,0,0)" font-family="sans-serif" font-size="16" font-style="normal" font-weight="normal" text-anchor="middle" textLength="113" x="1088.5" y="38">output tables</text>
</svg>
<h2>Iterables</h2>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb3-1" title="1"><span class="kw">def</span> iter_journal_messages_since(timestamp: Union[<span class="bu">int</span>, <span class="bu">float</span>]):</a>
<a class="sourceLine" id="cb3-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-3" title="3"><span class="co"> Yield False and message details from the journal since *timestamp*.</span></a>
<a class="sourceLine" id="cb3-4" title="4"></a>
<a class="sourceLine" id="cb3-5" title="5"><span class="co"> This is the loading phase (loading messages that already existed</span></a>
<a class="sourceLine" id="cb3-6" title="6"><span class="co"> when we start).</span></a>
<a class="sourceLine" id="cb3-7" title="7"></a>
<a class="sourceLine" id="cb3-8" title="8"><span class="co"> Argument *timestamp* is a UNIX timestamp.</span></a>
<a class="sourceLine" id="cb3-9" title="9"></a>
<a class="sourceLine" id="cb3-10" title="10"><span class="co"> Only journal entries for systemd unit UNITNAME with loglevel</span></a>
<a class="sourceLine" id="cb3-11" title="11"><span class="co"> INFO and above are retrieved.</span></a>
<a class="sourceLine" id="cb3-12" title="12"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-13" title="13"> ...</a>
<a class="sourceLine" id="cb3-14" title="14"></a>
<a class="sourceLine" id="cb3-15" title="15"><span class="kw">def</span> iter_journal_messages_follow(timestamp: Union[<span class="bu">int</span>, <span class="bu">float</span>]):</a>
<a class="sourceLine" id="cb3-16" title="16"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-17" title="17"><span class="co"> Yield commit and message details from the journal through polling.</span></a>
<a class="sourceLine" id="cb3-18" title="18"></a>
<a class="sourceLine" id="cb3-19" title="19"><span class="co"> This is the polling phase (after we have read pre-existing messages</span></a>
<a class="sourceLine" id="cb3-20" title="20"><span class="co"> in the loading phase).</span></a>
<a class="sourceLine" id="cb3-21" title="21"></a>
<a class="sourceLine" id="cb3-22" title="22"><span class="co"> Argument *timestamp* is a UNIX timestamp.</span></a>
<a class="sourceLine" id="cb3-23" title="23"></a>
<a class="sourceLine" id="cb3-24" title="24"><span class="co"> Only journal entries for systemd unit UNITNAME with loglevel</span></a>
<a class="sourceLine" id="cb3-25" title="25"><span class="co"> INFO and above are retrieved.</span></a>
<a class="sourceLine" id="cb3-26" title="26"></a>
<a class="sourceLine" id="cb3-27" title="27"><span class="co"> *commit* (bool) tells whether it is time to store the delivery</span></a>
<a class="sourceLine" id="cb3-28" title="28"><span class="co"> information obtained from the messages yielded by us.</span></a>
<a class="sourceLine" id="cb3-29" title="29"><span class="co"> It is set to True if max_delay_before_commit has elapsed.</span></a>
<a class="sourceLine" id="cb3-30" title="30"><span class="co"> After this delay delivery information will be written; to be exact:</span></a>
<a class="sourceLine" id="cb3-31" title="31"><span class="co"> the delay may increase by up to one journal_poll_interval.</span></a>
<a class="sourceLine" id="cb3-32" title="32"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-33" title="33"> ...</a>
<a class="sourceLine" id="cb3-34" title="34"></a>
<a class="sourceLine" id="cb3-35" title="35"><span class="kw">def</span> iter_logfile_messages(filepath: <span class="bu">str</span>, year: <span class="bu">int</span>,</a>
<a class="sourceLine" id="cb3-36" title="36"> commit_after_lines<span class="op">=</span>max_messages_per_commit):</a>
<a class="sourceLine" id="cb3-37" title="37"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-38" title="38"><span class="co"> Yield messages and a commit flag from a logfile.</span></a>
<a class="sourceLine" id="cb3-39" title="39"></a>
<a class="sourceLine" id="cb3-40" title="40"><span class="co"> Loop through all lines of the file with given *filepath* and</span></a>
<a class="sourceLine" id="cb3-41" title="41"><span class="co"> extract the time and log message. If the log message starts</span></a>
<a class="sourceLine" id="cb3-42" title="42"><span class="co"> with &#39;postfix/&#39;, then extract the syslog_identifier, pid and</span></a>
<a class="sourceLine" id="cb3-43" title="43"><span class="co"> message text.</span></a>
<a class="sourceLine" id="cb3-44" title="44"></a>
<a class="sourceLine" id="cb3-45" title="45"><span class="co"> Since syslog lines do not contain the year, the *year* to which</span></a>
<a class="sourceLine" id="cb3-46" title="46"><span class="co"> the first log line belongs must be given.</span></a>
<a class="sourceLine" id="cb3-47" title="47"></a>
<a class="sourceLine" id="cb3-48" title="48"><span class="co"> Return a commit flag and a dict with these keys:</span></a>
<a class="sourceLine" id="cb3-49" title="49"><span class="co"> &#39;t&#39;: timestamp</span></a>
<a class="sourceLine" id="cb3-50" title="50"><span class="co"> &#39;message&#39;: message text</span></a>
<a class="sourceLine" id="cb3-51" title="51"><span class="co"> &#39;identifier&#39;: syslog identifier (e.g., &#39;postfix/smtpd&#39;)</span></a>
<a class="sourceLine" id="cb3-52" title="52"><span class="co"> &#39;pid&#39;: process id</span></a>
<a class="sourceLine" id="cb3-53" title="53"></a>
<a class="sourceLine" id="cb3-54" title="54"><span class="co"> The commit flag will be set to True for every</span></a>
<a class="sourceLine" id="cb3-55" title="55"><span class="co"> (commit_after_lines)-th filtered message and serves</span></a>
<a class="sourceLine" id="cb3-56" title="56"><span class="co"> as a signal to the caller to commit this chunk of data</span></a>
<a class="sourceLine" id="cb3-57" title="57"><span class="co"> to the database.</span></a>
<a class="sourceLine" id="cb3-58" title="58"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb3-59" title="59"> ...</a></code></pre></div>
<h2>Running loops</h2>
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb4-1" title="1"><span class="kw">def</span> run(dsn, verp_marker<span class="op">=</span><span class="va">False</span>, filepath<span class="op">=</span><span class="va">None</span>, year<span class="op">=</span><span class="va">None</span>, debug<span class="op">=</span>[]):</a>
<a class="sourceLine" id="cb4-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb4-3" title="3"><span class="co"> Determine loop(s) and run them within a database context.</span></a>
<a class="sourceLine" id="cb4-4" title="4"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb4-5" title="5"> init(verp_marker<span class="op">=</span>verp_marker)</a>
<a class="sourceLine" id="cb4-6" title="6"> <span class="cf">with</span> psycopg2.<span class="ex">connect</span>(dsn) <span class="im">as</span> conn:</a>
<a class="sourceLine" id="cb4-7" title="7"> <span class="cf">with</span> conn.cursor(cursor_factory<span class="op">=</span>psycopg2.extras.RealDictCursor) <span class="im">as</span> curs:</a>
<a class="sourceLine" id="cb4-8" title="8"> <span class="cf">if</span> filepath:</a>
<a class="sourceLine" id="cb4-9" title="9"> run_loop(iter_logfile_messages(filepath, year), curs, debug<span class="op">=</span>debug)</a>
<a class="sourceLine" id="cb4-10" title="10"> <span class="cf">else</span>:</a>
<a class="sourceLine" id="cb4-11" title="11"> begin_timestamp <span class="op">=</span> get_latest_timestamp(curs)</a>
<a class="sourceLine" id="cb4-12" title="12"> run_loop(iter_journal_messages_since(begin_timestamp), curs, debug<span class="op">=</span>debug)</a>
<a class="sourceLine" id="cb4-13" title="13"> begin_timestamp <span class="op">=</span> get_latest_timestamp(curs)</a>
<a class="sourceLine" id="cb4-14" title="14"> run_loop(iter_journal_messages_follow(begin_timestamp), curs, debug<span class="op">=</span>debug)</a>
<a class="sourceLine" id="cb4-15" title="15"></a>
<a class="sourceLine" id="cb4-16" title="16"><span class="kw">def</span> run_loop(iterable, curs, debug<span class="op">=</span>[]):</a>
<a class="sourceLine" id="cb4-17" title="17"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb4-18" title="18"><span class="co"> Loop over log messages obtained from *iterable*.</span></a>
<a class="sourceLine" id="cb4-19" title="19"></a>
<a class="sourceLine" id="cb4-20" title="20"><span class="co"> Parse the message, extract delivery information from it and store</span></a>
<a class="sourceLine" id="cb4-21" title="21"><span class="co"> that delivery information.</span></a>
<a class="sourceLine" id="cb4-22" title="22"></a>
<a class="sourceLine" id="cb4-23" title="23"><span class="co"> For performance reasons delivery items are collected in a cache</span></a>
<a class="sourceLine" id="cb4-24" title="24"><span class="co"> before writing them (i.e., committing a database transaction).</span></a>
<a class="sourceLine" id="cb4-25" title="25"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb4-26" title="26"> cache <span class="op">=</span> []</a>
<a class="sourceLine" id="cb4-27" title="27"> msg_count <span class="op">=</span> max_messages_per_commit</a>
<a class="sourceLine" id="cb4-28" title="28"> <span class="cf">for</span> commit, msg_details <span class="kw">in</span> iterable:</a>
<a class="sourceLine" id="cb4-29" title="29"> ...</a></code></pre></div>
<h2>Parsing</h2>
<p>Parse what you can. (But only messages logged via msg_info in Postfix, and only relevant components.)</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb5-1" title="1"><span class="kw">def</span> parse(msg_details, debug<span class="op">=</span><span class="va">False</span>):</a>
<a class="sourceLine" id="cb5-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb5-3" title="3"><span class="co"> Parse a log message returning a dict.</span></a>
<a class="sourceLine" id="cb5-4" title="4"></a>
<a class="sourceLine" id="cb5-5" title="5"><span class="co"> *msg_details* is assumed to be a dict with these keys:</span></a>
<a class="sourceLine" id="cb5-6" title="6"></a>
<a class="sourceLine" id="cb5-7" title="7"><span class="co"> * &#39;identifier&#39; (syslog identifier),</span></a>
<a class="sourceLine" id="cb5-8" title="8"><span class="co"> * &#39;pid&#39; (process id),</span></a>
<a class="sourceLine" id="cb5-9" title="9"><span class="co"> * &#39;message&#39; (message text)</span></a>
<a class="sourceLine" id="cb5-10" title="10"></a>
<a class="sourceLine" id="cb5-11" title="11"><span class="co"> The syslog identifier and process id are copied to the resulting dict.</span></a>
<a class="sourceLine" id="cb5-12" title="12"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb5-13" title="13"> ...</a>
<a class="sourceLine" id="cb5-14" title="14"></a>
<a class="sourceLine" id="cb5-15" title="15"><span class="kw">def</span> _parse_branch(comp, msg, res):</a>
<a class="sourceLine" id="cb5-16" title="16"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb5-17" title="17"><span class="co"> Parse a log message string *msg*, adding results to dict *res*.</span></a>
<a class="sourceLine" id="cb5-18" title="18"></a>
<a class="sourceLine" id="cb5-19" title="19"><span class="co"> Depending on the component *comp* we branch to functions</span></a>
<a class="sourceLine" id="cb5-20" title="20"><span class="co"> named _parse_{comp}.</span></a>
<a class="sourceLine" id="cb5-21" title="21"></a>
<a class="sourceLine" id="cb5-22" title="22"><span class="co"> Add parsing results to dict *res*. Always add key &#39;action&#39;.</span></a>
<a class="sourceLine" id="cb5-23" title="23"><span class="co"> Try to parse every syntactical element.</span></a>
<a class="sourceLine" id="cb5-24" title="24"><span class="co"> Note: We parse what we can. Assessment of parsing results relevant</span></a>
<a class="sourceLine" id="cb5-25" title="25"><span class="co"> for delivery is done in :func:`extract_delivery`.</span></a>
<a class="sourceLine" id="cb5-26" title="26"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb5-27" title="27"> ...</a></code></pre></div>
<h2>Extracting</h2>
<p>Extract what is relevant.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb6-1" title="1"><span class="kw">def</span> extract_delivery(msg_details, parsed):</a>
<a class="sourceLine" id="cb6-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb6-3" title="3"><span class="co"> Compute delivery information from parsing results.</span></a>
<a class="sourceLine" id="cb6-4" title="4"></a>
<a class="sourceLine" id="cb6-5" title="5"><span class="co"> Basically this means that we map the parsed fields to</span></a>
<a class="sourceLine" id="cb6-6" title="6"><span class="co"> a type (&#39;from&#39; or &#39;to&#39;) and to the database</span></a>
<a class="sourceLine" id="cb6-7" title="7"><span class="co"> fields for table &#39;delivery_from&#39; or &#39;delivery_to&#39;.</span></a>
<a class="sourceLine" id="cb6-8" title="8"></a>
<a class="sourceLine" id="cb6-9" title="9"><span class="co"> We branch to functions _extract_{comp} where comp is the</span></a>
<a class="sourceLine" id="cb6-10" title="10"><span class="co"> name of a Postfix component.</span></a>
<a class="sourceLine" id="cb6-11" title="11"></a>
<a class="sourceLine" id="cb6-12" title="12"><span class="co"> Return a list of error strings and a dict with the</span></a>
<a class="sourceLine" id="cb6-13" title="13"><span class="co"> extracted information. Keys with None values are removed</span></a>
<a class="sourceLine" id="cb6-14" title="14"><span class="co"> from the dict.</span></a>
<a class="sourceLine" id="cb6-15" title="15"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb6-16" title="16"> ...</a></code></pre></div>
<h2>Regular expressions</h2>
<ul>
<li><p>see sources</p></li>
<li><p><a href="https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression">Stackoverflow: How to validate an email address</a> <a href="https://i.stack.imgur.com/YI6KR.png">FSM</a></p></li>
</ul>
<h3>BTW: <a href="https://docs.python.org/3/library/email.utils.html#email.utils.parseaddr">email.utils.parseaddr</a></h3>
<div class="sourceCode" id="cb7"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb7-1" title="1"><span class="op">&gt;&gt;&gt;</span> <span class="im">from</span> email.utils <span class="im">import</span> parseaddr</a>
<a class="sourceLine" id="cb7-2" title="2"><span class="op">&gt;&gt;&gt;</span> parseaddr(<span class="st">&#39;Ghost &lt;&quot;hello@nowhere&quot;@pyug.at&gt;&#39;</span>)</a>
<a class="sourceLine" id="cb7-3" title="3">(<span class="st">&#39;Ghost&#39;</span>, <span class="st">&#39;&quot;hello@nowhere&quot;@pyug.at&#39;</span>)</a>
<a class="sourceLine" id="cb7-4" title="4"><span class="op">&gt;&gt;&gt;</span> <span class="bu">print</span>(parseaddr(<span class="st">&#39;&quot;more</span><span class="ch">\&quot;</span><span class="st">fun</span><span class="ch">\&quot;\\</span><span class="st">&quot;hello</span><span class="ch">\\</span><span class="st">&quot;@nowhere&quot;@pyug.at&#39;</span>)[<span class="dv">1</span>])</a>
<a class="sourceLine" id="cb7-5" title="5"><span class="co">&quot;more&quot;</span>fun<span class="st">&quot;</span><span class="ch">\&quot;</span><span class="st">hello</span><span class="ch">\&quot;</span><span class="st">@nowhere&quot;</span><span class="op">@</span>pyug.at</a>
<a class="sourceLine" id="cb7-6" title="6"><span class="op">&gt;&gt;&gt;</span> <span class="bu">print</span>(parseaddr(<span class="st">&#39;&quot;&quot;@pyug.at&#39;</span>)[<span class="dv">1</span>])</a>
<a class="sourceLine" id="cb7-7" title="7"><span class="co">&quot;&quot;</span><span class="op">@</span>pyug.at</a></code></pre></div>
<h2>Storing</h2>
<div class="sourceCode" id="cb8"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb8-1" title="1"><span class="kw">def</span> store_deliveries(cursor, cache, debug<span class="op">=</span>[]):</a>
<a class="sourceLine" id="cb8-2" title="2"> <span class="co">&quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb8-3" title="3"><span class="co"> Store cached delivery information into the database.</span></a>
<a class="sourceLine" id="cb8-4" title="4"></a>
<a class="sourceLine" id="cb8-5" title="5"><span class="co"> Find queue_ids in *cache* and group delivery items by</span></a>
<a class="sourceLine" id="cb8-6" title="6"><span class="co"> them, but separately for delivery types &#39;from&#39; and &#39;to&#39;.</span></a>
<a class="sourceLine" id="cb8-7" title="7"><span class="co"> In addition, collect delivery items with queue_id is None.</span></a>
<a class="sourceLine" id="cb8-8" title="8"></a>
<a class="sourceLine" id="cb8-9" title="9"><span class="co"> After grouping we merge all items withing a group into a</span></a>
<a class="sourceLine" id="cb8-10" title="10"><span class="co"> single item. So we can combine several SQL queries into </span></a>
<a class="sourceLine" id="cb8-11" title="11"><span class="co"> a single one, which improves performance significantly.</span></a>
<a class="sourceLine" id="cb8-12" title="12"></a>
<a class="sourceLine" id="cb8-13" title="13"><span class="co"> Then store the merged items and the deliveries with</span></a>
<a class="sourceLine" id="cb8-14" title="14"><span class="co"> queue_id is None.</span></a>
<a class="sourceLine" id="cb8-15" title="15"><span class="co"> &quot;&quot;&quot;</span></a>
<a class="sourceLine" id="cb8-16" title="16"> ...</a></code></pre></div>
<p>Database schema: 3 tables:</p>
<ul>
<li>delivery_from: smtpd, milters, qmgr</li>
<li>delivery_to: smtp, virtual, bounce, error</li>
<li>noqueue: rejected by smtpd before even getting a queue_id</li>
</ul>
<p>Table noqueue contains all the spam; for it we use plain SQL INSERT only (no ON CONFLICT ... DO UPDATE), which is faster.</p>
<h2>Demo</h2>
<pre><code>...
</code></pre>
<h2>Questions / Suggestions</h2>
<ul>
<li>Could you enhance speed by using prepared statements?</li>
<li>Will old data be deleted (as required by GDPR)?</li>
</ul>
<p>Both were implemented after the talk.</p>


@ -0,0 +1,340 @@
# journal-postfix - A log parser for Postfix
Experiences from applying Python to the domain of bad old email.
## Email ✉
* old technology (originating in the 1970s)
* [store-and-forward](https://en.wikipedia.org/wiki/Store_and_forward): sent != delivered to recipient
* non-delivery reasons:
* recipient over quota
* nonexistent destination
* malware
* spam
* server problem
* ...
* permanent / non-permanent failure ([DSN ~ 5.X.Y / 4.X.Y](https://www.iana.org/assignments/smtp-enhanced-status-codes/smtp-enhanced-status-codes.xhtml))
* non-delivery modes
* immediate reject on SMTP level
* delayed [bounce messages](https://en.wikipedia.org/wiki/Bounce_message) by [reporting MTA](https://upload.wikimedia.org/wikipedia/commons/a/a2/Bounce-DSN-MTA-names.png) - queueing (e.g., ~5d) before delivery failure notification
* discarding
* read receipts
* [Wikipedia: email tracking](https://en.wikipedia.org/wiki/Email_tracking)
## [SMTP](https://en.wikipedia.org/wiki/SMTP)
[SMTP session example](https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#SMTP_transport_example):
the envelope sender and envelope recipient may differ from the From: and To: headers
Lists of error codes:
* [SMTP and ESMTP](https://www.inmotionhosting.com/support/email/email-troubleshooting/smtp-and-esmtp-error-code-list)
* [SMTP](https://serversmtp.com/smtp-error/)
* [SMTP](https://info.webtoolhub.com/kb-a15-smtp-status-codes-smtp-error-codes-smtp-reply-codes.aspx)
Example of an error within a bounced email (Subject: Mail delivery failed: returning message to sender)
SMTP error from remote server for TEXT command, host: smtpin.rzone.de (81.169.145.97) reason: 550 5.7.1 Refused by local policy. No SPAM please!
* email users keep asking about the fate of their emails (or of messages from correspondents that should have arrived)
## [Postfix](http://www.postfix.org)
* popular [MTA](https://en.wikipedia.org/wiki/Message_transfer_agent)
* written in C
* logging to files / journald
* example log messages for a (non-)delivery + stats
```
Nov 27 16:19:22 mail postfix/smtpd[18995]: connect from unknown[80.82.79.244]
Nov 27 16:19:22 mail postfix/smtpd[18995]: NOQUEUE: reject: RCPT from unknown[80.82.79.244]: 454 4.7.1 <spameri@tiscali.it>: Relay access denied; from=<spameri@tiscali.it> to=<spameri@tiscali.it> proto=ESMTP helo=<WIN-G7CPHCGK247>
Nov 27 16:19:22 mail postfix/smtpd[18995]: disconnect from unknown[80.82.79.244] ehlo=1 mail=1 rcpt=0/1 rset=1 quit=1 commands=4/5
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection rate 1/60s for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection count 1 for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max cache size 1 at Nov 27 16:19:22
Nov 27 16:22:48 mail postfix/smtpd[18999]: connect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/smtpd[18999]: 47NQzY13DbzNWNQG: client=mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: info: header Subject: Re: test from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]; from=<ibu@cosmopool.net> to=<ibu@multiname.org> proto=ESMTP helo=<mail.cosmopool.net>
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: message-id=<d5154432-b984-d65a-30b3-38bde7e37af8@cosmopool.net>
Nov 27 16:22:49 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: from=<ibu@cosmopool.net>, size=1365, nrcpt=2 (queue active)
Nov 27 16:22:49 mail postfix/smtpd[18999]: disconnect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107] ehlo=1 mail=1 rcpt=2 data=1 quit=1 commands=6
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=<ibu2@multiname.org>, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 <ibu2@multiname.org> nV9iJ9mi3l0+SgAAZU03Dg Saved)
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=<ibu@multiname.org>, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 <ibu@multiname.org> nV9iJ9mi3l0+SgAAZU03Dg:2 Saved)
Nov 27 16:22:50 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: removed
```
* [involved postfix components](http://www.postfix.org/OVERVIEW.html)
* smtpd (port 25: smtp, port 587: submission)
* cleanup
* smtp/lmtp
* missing log parser
## Idea
* follow log stream and write summarized delivery information to a database
* goal: spot delivery problems, collect delivery stats
* a GUI could then display the current delivery status to users
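To give an idea of what such a GUI might ask the database, here is a minimal, hedged sketch; apart from `queue_id` and the table names (which appear below under "Storing"), the column names are assumptions, not the actual schema:
```python
# Hedged sketch: latest delivery attempts for one recipient.
# Column names other than queue_id are assumed for illustration only.
import psycopg2
import psycopg2.extras

def latest_deliveries(dsn: str, recipient: str, limit: int = 20):
    with psycopg2.connect(dsn) as conn:
        with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as curs:
            curs.execute(
                """
                SELECT f.queue_id, f.sender, t.recipient, t.status, t.t
                FROM delivery_from f
                JOIN delivery_to t USING (queue_id)
                WHERE t.recipient = %s
                ORDER BY t.t DESC
                LIMIT %s
                """,
                (recipient, limit),
            )
            return curs.fetchall()
```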
## Why Python?
* simple and fun language, clear and concise
* well suited for text processing
* libs available for systemd, PostgreSQL
* huge standard library (used here: datetime, re, yaml, argparse, select)
* speed sufficient?
## Development iterations
* hmm, easy task, might take a few days
* PoC: reading and polling from journal works as expected
* used postfix logfiles in syslog format and wrote regexps matching them iteratively
* separated parsing messages from extracting delivery information
* created a delivery table
* hmm, this is very slow: it takes hours to process log messages from just a few days (from a server with little traffic)
* introduced polling timeout and SQL transactions handling several messages at once
* ... much faster
* looks fine, but wait... did I catch all syntax variants of Postfix log messages?
* looked into Postfix sources and almost got lost
* weeks of hard work identifying relevant log output directives
* completely rewrote parser to deal with the rich log msg syntax, e.g.:<br>
`def _strip_pattern(msg, res, pattern_name, pos='l', target_names=None)` (a hedged sketch of such a helper follows after this list)
* oh, there are even more Postfix components... limit the scope to certain Postfix configurations, in particular virtual mailboxes (not local ones)
* mails may have multiple recipients... split delivery table into delivery_from and delivery_to
* decide which delivery information is relevant
* cleanup and polish (config mgmt, logging)
* write ansible role
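A hedged sketch of what a `_strip_pattern`-style helper could look like; the `_re` pattern table and the exact semantics are assumptions, not the actual implementation:
```python
import re

# Hypothetical precompiled patterns; the real parser defines many more.
_re = {
    'queue_id': re.compile(r'(?P<queue_id>[0-9A-Za-z]{10,16}): '),
}

def _strip_pattern(msg, res, pattern_name, pos='l', target_names=None):
    """
    Sketch: match pattern *pattern_name* at the left ('l') end of *msg*
    (or anywhere for other values of *pos*), copy named groups into dict
    *res* (optionally renamed via *target_names*) and return *msg* with
    the matched part removed.
    """
    regex = _re[pattern_name]
    match = regex.match(msg) if pos == 'l' else regex.search(msg)
    if not match:
        return msg
    for name, value in match.groupdict().items():
        res[(target_names or {}).get(name, name)] = value
    start, end = match.span()
    return msg[:start] + msg[end:]

# Example: res collects the queue_id, the rest of the message remains.
res = {}
rest = _strip_pattern('47NQzY13DbzNWNQG: removed', res, 'queue_id')
# res == {'queue_id': '47NQzY13DbzNWNQG'}, rest == 'removed'
```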
## Structure
```blockdiag
blockdiag {
default_fontsize = 20;
node_height = 80;
journal_since -> run_loop;
journal_follow -> run_loop;
logfile -> run_loop;
run_loop -> parse -> extract_delivery -> store;
store -> delivery_from;
store -> delivery_to;
store -> noqueue;
group { label="input iterables"; journal_since; journal_follow; logfile; };
group { label="output tables"; delivery_from; delivery_to; noqueue; };
}
```
## Iterables
```python
def iter_journal_messages_since(timestamp: Union[int, float]):
"""
Yield False and message details from the journal since *timestamp*.
This is the loading phase (loading messages that already existed
when we start).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit UNITNAME with loglevel
INFO and above are retrieved.
"""
...
def iter_journal_messages_follow(timestamp: Union[int, float]):
"""
Yield commit and message details from the journal through polling.
This is the polling phase (after we have read pre-existing messages
in the loading phase).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit UNITNAME with loglevel
INFO and above are retrieved.
*commit* (bool) tells whether it is time to store the delivery
information obtained from the messages yielded by us.
It is set to True if max_delay_before_commit has elapsed.
After this delay delivery information will be written; to be exact:
the delay may increase by up to one journal_poll_interval.
"""
...
def iter_logfile_messages(filepath: str, year: int,
commit_after_lines=max_messages_per_commit):
"""
Yield messages and a commit flag from a logfile.
Loop through all lines of the file with given *filepath* and
extract the time and log message. If the log message starts
with 'postfix/', then extract the syslog_identifier, pid and
message text.
Since syslog lines do not contain the year, the *year* to which
the first log line belongs must be given.
Return a commit flag and a dict with these keys:
't': timestamp
'message': message text
'identifier': syslog identifier (e.g., 'postfix/smtpd')
'pid': process id
The commit flag will be set to True for every
(commit_after_lines)-th filtered message and serves
as a signal to the caller to commit this chunk of data
to the database.
"""
...
```
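For comparison, here is a hedged sketch of how the polling phase can be implemented with python-systemd; the unit name, the poll interval and the simplified commit signal stand in for the real settings and commit-delay logic:
```python
import select
from systemd import journal

UNITNAME = 'postfix@-.service'   # assumed; the real value comes from settings
journal_poll_interval = 10.0     # seconds; assumed default

def iter_journal_messages_follow(timestamp):
    """Sketch: yield (commit, msg_details) tuples while following the journal."""
    reader = journal.Reader()
    reader.add_match(_SYSTEMD_UNIT=UNITNAME)
    reader.log_level(journal.LOG_INFO)
    reader.seek_realtime(timestamp)
    poller = select.poll()
    poller.register(reader.fileno(), reader.get_events())
    while True:
        poller.poll(journal_poll_interval * 1000)
        if reader.process() == journal.APPEND:
            for entry in reader:
                yield False, {
                    't': entry['__REALTIME_TIMESTAMP'],
                    'message': entry.get('MESSAGE', ''),
                    'identifier': entry.get('SYSLOG_IDENTIFIER'),
                    'pid': entry.get('_PID'),
                }
        # Simplified: signal a commit after every poll interval; the real
        # code only does so when max_delay_before_commit has elapsed.
        yield True, None
```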
## Running loops
```python
def run(dsn, verp_marker=False, filepath=None, year=None, debug=[]):
"""
Determine loop(s) and run them within a database context.
"""
init(verp_marker=verp_marker)
with psycopg2.connect(dsn) as conn:
with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as curs:
if filepath:
run_loop(iter_logfile_messages(filepath, year), curs, debug=debug)
else:
begin_timestamp = get_latest_timestamp(curs)
run_loop(iter_journal_messages_since(begin_timestamp), curs, debug=debug)
begin_timestamp = get_latest_timestamp(curs)
run_loop(iter_journal_messages_follow(begin_timestamp), curs, debug=debug)
def run_loop(iterable, curs, debug=[]):
"""
Loop over log messages obtained from *iterable*.
Parse the message, extract delivery information from it and store
that delivery information.
For performance reasons delivery items are collected in a cache
before writing them (i.e., committing a database transaction).
"""
cache = []
msg_count = max_messages_per_commit
for commit, msg_details in iterable:
...
```
## Parsing
Parse what you can. (But only messages logged via msg_info in Postfix, and only relevant components.)
```python
def parse(msg_details, debug=False):
"""
Parse a log message returning a dict.
*msg_details* is assumed to be a dict with these keys:
* 'identifier' (syslog identifier),
* 'pid' (process id),
* 'message' (message text)
The syslog identifier and process id are copied to the resulting dict.
"""
...
def _parse_branch(comp, msg, res):
"""
Parse a log message string *msg*, adding results to dict *res*.
Depending on the component *comp* we branch to functions
named _parse_{comp}.
Add parsing results to dict *res*. Always add key 'action'.
Try to parse every syntactical element.
Note: We parse what we can. Assessment of parsing results relevant
for delivery is done in :func:`extract_delivery`.
"""
...
```
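A hedged sketch of the dispatch to `_parse_{comp}` functions; the details are assumptions, and the real component parsers are much richer:
```python
def _parse_branch(comp, msg, res):
    """Sketch: dispatch to a component-specific parser if one exists."""
    handler = globals().get(f'_parse_{comp}')
    if handler is None:
        res['action'] = 'ignore'   # component we do not care about
        return
    handler(msg, res)

def _parse_smtpd(msg, res):
    """Sketch of a single component parser (hypothetical details)."""
    if msg.startswith('connect from '):
        res['action'] = 'connect'
        res['client'] = msg[len('connect from '):]
    elif msg.startswith('disconnect from '):
        res['action'] = 'disconnect'
    else:
        res['action'] = 'other'
```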
## Extracting
Extract what is relevant.
```python
def extract_delivery(msg_details, parsed):
"""
Compute delivery information from parsing results.
Basically this means that we map the parsed fields to
a type ('from' or 'to') and to the database
fields for table 'delivery_from' or 'delivery_to'.
We branch to functions _extract_{comp} where comp is the
name of a Postfix component.
Return a list of error strings and a dict with the
extracted information. Keys with None values are removed
from the dict.
"""
...
```
## Regular expressions
* see sources
* [Stackoverflow: How to validate an email address](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) [FSM](https://i.stack.imgur.com/YI6KR.png)
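As a taste of the patterns involved, here is a hedged regexp for the `to=...` part of the smtp/lmtp lines shown above; the actual sources use a larger and more careful set of expressions:
```python
import re

# Matches the delivery part of lines like:
#   to=<ibu@multiname.org>, relay=mail.multiname.org[private/dovecot-lmtp],
#   delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 ... Saved)
_re_delivery_to = re.compile(
    r'to=<(?P<recipient>[^>]*)>, relay=(?P<relay>\S+), '
    r'delay=(?P<delay>[\d.]+), delays=(?P<delays>[\d./]+), '
    r'dsn=(?P<dsn>[\d.]+), status=(?P<status>\w+) \((?P<status_text>.*)\)$'
)

m = _re_delivery_to.search(
    'to=<ibu@multiname.org>, relay=mail.multiname.org[private/dovecot-lmtp], '
    'delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, '
    'status=sent (250 2.0.0 <ibu@multiname.org> nV9iJ9mi3l0+SgAAZU03Dg:2 Saved)'
)
print(m.group('recipient'), m.group('status'), m.group('dsn'))
# ibu@multiname.org sent 2.0.0
```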
### BTW: [email.utils.parseaddr](https://docs.python.org/3/library/email.utils.html#email.utils.parseaddr)
```python
>>> from email.utils import parseaddr
>>> parseaddr('Ghost <"hello@nowhere"@pyug.at>')
('Ghost', '"hello@nowhere"@pyug.at')
>>> print(parseaddr('"more\"fun\"\\"hello\\"@nowhere"@pyug.at')[1])
"more"fun"\"hello\"@nowhere"@pyug.at
>>> print(parseaddr('""@pyug.at')[1])
""@pyug.at
```
## Storing
```python
def store_deliveries(cursor, cache, debug=[]):
"""
Store cached delivery information into the database.
Find queue_ids in *cache* and group delivery items by
them, but separately for delivery types 'from' and 'to'.
In addition, collect delivery items with queue_id is None.
After grouping we merge all items within a group into a
single item. So we can combine several SQL queries into
a single one, which improves performance significantly.
Then store the merged items and the deliveries with
queue_id is None.
"""
...
```
Database schema: 3 tables:
* delivery_from: smtpd, milters, qmgr
* delivery_to: smtp, virtual, bounce, error
* noqueue: rejected by smtpd before even getting a queue_id
Table noqueue contains all the spam; for it we use plain SQL INSERT only (no ON CONFLICT ... DO UPDATE), which is faster.
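A hedged sketch of the batched writes using psycopg2's `execute_values`; the column lists, and the unique constraint on queue_id assumed by ON CONFLICT, are illustrations rather than the actual schema:
```python
from psycopg2.extras import execute_values

def _store_noqueue(curs, items):
    """Sketch: rejections without queue_id are written with plain INSERTs."""
    execute_values(
        curs,
        'INSERT INTO noqueue (t, client, sender, recipient, dsn, reason) VALUES %s',
        [(i['t'], i.get('client'), i.get('sender'),
          i.get('recipient'), i.get('dsn'), i.get('reason')) for i in items],
    )

def _store_delivery_from(curs, items):
    """Sketch: merged 'from' items are upserted in one statement."""
    execute_values(
        curs,
        '''
        INSERT INTO delivery_from (queue_id, t, sender, message_id) VALUES %s
        ON CONFLICT (queue_id) DO UPDATE SET
            t = EXCLUDED.t,
            sender = COALESCE(EXCLUDED.sender, delivery_from.sender),
            message_id = COALESCE(EXCLUDED.message_id, delivery_from.message_id)
        ''',
        [(i['queue_id'], i['t'], i.get('sender'), i.get('message_id'))
         for i in items],
    )
```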
## Demo
...
## Questions / Suggestions
* Could you enhance speed by using prepared statements?
* Will old data be deleted (as required by GDPR)?
Both were implemented after the talk.
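For the second point, a hedged sketch of what the deletion could look like, reusing the `delete_deliveries_after_days` setting shipped with the role (the timestamp column `t` is an assumption):
```python
import datetime
import settings   # journal-postfix settings module

def delete_old_deliveries(curs):
    """Sketch: remove rows older than the configured retention period."""
    days = settings.delete_deliveries_after_days
    if not days:              # 0 means: keep data forever
        return
    cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=days)
    for table in ('delivery_from', 'delivery_to', 'noqueue'):
        curs.execute(f'DELETE FROM {table} WHERE t < %s', (cutoff,))
```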

34
journal-postfix.yml Normal file

@ -0,0 +1,34 @@
# Deploy journal-postfix
# This will install a service that writes mail delivery information
# obtained from systemd-journal (unit postfix@-.service) to a
# PostgreSQL database.
#
# You can configure the database connection parameters (and optionally
# a verp_marker) as host vars like this:
#
# mailserver:
# postgresql:
# host: 127.0.0.1
# port: 5432
# dbname: mailserver
# username: mailserver
# password: !vault |
# $ANSIBLE_VAULT;1.1;AES256
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# postfix:
# verp_marker: rstxyz
#
# If you do not, then you must edit /etc/journal-postfix/main.yml
# on the destination hosts and run systemctl start journal-postfix
# manually.
- name: install journal-postfix
user: root
hosts: mail
roles:
- journal-postfix


@ -0,0 +1,17 @@
# this file is part of ansible role journal-postfix
[Unit]
Description=Extract postfix message delivery information from systemd journal messages\
and store them in a PostgreSQL database. Configuration is in /etc/journal-postfix/main.yml
After=multi-user.target
[Service]
Type=simple
ExecStart=/srv/journal-postfix/run.py
User=journal-postfix
WorkingDirectory=/srv/journal-postfix/
Restart=on-failure
RestartPreventExitStatus=97
[Install]
WantedBy=multi-user.target


@ -0,0 +1,85 @@
Parse postfix entries in systemd journal and collect delivery information.
The information on mail deliveries is written to tables in a PostgreSQL
database. The database can then be queried by a UI showing delivery status
to end users. The UI is not part of this package.
This software is tailor-made for debian buster with systemd as init system.
It is meant to run on the same system on which Postfix is running,
or on a system receiving the log stream of a Postfix instance in its
systemd journal.
Prerequisites / Postfix configuration:
- Access to a PostgreSQL database.
- Postfix: Only virtual mailboxes are supported.
- Postfix: You can use short or long queue_ids (see
http://www.postfix.org/postconf.5.html#enable_long_queue_ids),
but since the uniqueness of short queue_ids is very limited,
usage of long queue_ids is *strongly recommended*.
Installation:
- apt install python3-psycopg2 python3-systemd python3-yaml
- Edit /etc/journal-postfix/main.yml
- Output is written to the journal (unit journal-postfix). READ IT!
Side effects (database):
- The configured database user will create the tables
- delivery_from
- delivery_to
- noqueue
in the configured database, if they do not yet exist.
These tables will be filled with results from parsing the journal.
Table noqueue contains deliveries rejected by smtpd before they
got a queue_id. Deliveries with queue_id are in tables delivery_from
and delivery_to, which are separate, because an email can have only
one sender, but more than one recipient. Entries in both tables are
related through the queue_id and the approximate date; note that
short queue_ids are not unique for a delivery transaction, so
consider changing your Postfix configuration to long queue_ids.
- Log output is written to journald, unit journal-postfix.
Configuration:
- Edit the config file in YAML format located at
/etc/journal-postfix/main.yml
Limitations:
- The log output of Postfix may contain messages not primarily relevant
for delivery, namely messages of levels panic, fatal, error, warning.
They are discarded.
- The postfix server must be configured to use virtual mailboxes;
deliveries to local mailboxes are ignored.
- Parsing is specific to a Postfix version and only version 3.4.5
(the version in Debian buster) is supported; it is intended to support
Postfix versions in future stable Debian releases.
- This script does not support concurrency; we assume that there is only
one process writing to the database tables. Thus clustered postfix
setups are not supported.
Options:
- If you use dovecot as lmtpd, you will also get dovecot_ids upon
successful delivery.
- If you have configured Postfix to store VERP-ids of outgoing mails
in table 'mail_from' in the same database, then bounce emails can
be associated with original emails. The VERP-ids must have a certain
format.
- The subject of emails will be extracted from log messages starting
with "info: header Subject:". To enable these messages configure
Postfix like this: Enabled header_checks in main.cf (
header_checks = regexp:/etc/postfix/header_checks
) and put this line into /etc/postfix/header_checks:
/^Subject:/ INFO
- You can also import log messages from a log file in syslog format:
Run this script directly from command line with options --file
(the path to the file to be parsed) and --year (the year of the
first message in this log file).
Note: For the name of the month to be recognized correctly, the
script must be run with the same locale that was used when writing the logfile.
Attention: When running from the command line, log output will
not be sent to unit journal-postfix; use this command instead:
journalctl --follow SYSLOG_IDENTIFIER=python3
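Example invocation from the installation directory (the logfile path
is only an illustration):
  ./run.py --file /var/log/mail.log.1 --year 2019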

File diff suppressed because it is too large

212
journal-postfix/files/srv/run.py Executable file

@ -0,0 +1,212 @@
#!/usr/bin/env python3
"""
Main script to be run as a systemd unit or manually.
"""
import argparse
import datetime
import os
import sys
from pprint import pprint
from typing import Iterable, List, Optional, Tuple, Union
import psycopg2
import psycopg2.extras
from systemd import journal
import settings
from parser import init_parser, parse_entry, extract_delivery
from sources import (
iter_journal_messages_since,
iter_journal_messages_follow,
iter_logfile_messages,
)
from storage import (
init_db,
init_session,
get_latest_timestamp,
delete_old_deliveries,
store_delivery_items,
)
exit_code_without_restart = 97
def run(
dsn: str,
verp_marker: Optional[str] = None,
filepath: Optional[str] = None,
year: Optional[int] = None,
debug: List[str] = [],
) -> None:
"""
Determine loop(s) and run them within a database context.
"""
init_parser(verp_marker=verp_marker)
with psycopg2.connect(dsn) as conn:
with conn.cursor(
cursor_factory=psycopg2.extras.RealDictCursor
) as curs:
init_session(curs)
if filepath and year:
run_loop(
iter_logfile_messages(filepath, year), curs, debug=debug
)
else:
begin_timestamp = get_latest_timestamp(curs)
run_loop(
iter_journal_messages_since(begin_timestamp),
curs,
debug=debug,
)
begin_timestamp = get_latest_timestamp(curs)
run_loop(
iter_journal_messages_follow(begin_timestamp),
curs,
debug=debug,
)
def run_loop(
iterable: Iterable[Tuple[bool, Optional[dict]]],
curs: psycopg2.extras.RealDictCursor,
debug: List[str] = []
) -> None:
"""
Loop over log entries obtained from *iterable*.
Parse the message, extract delivery information from it and store
that delivery information.
For performance reasons delivery items are collected in a cache
before writing them (i.e., committing a database transaction).
"""
cache = []
msg_count = settings.max_messages_per_commit
last_delete = None
for commit, msg_details in iterable:
parsed_entry = None
if msg_details:
parsed_entry = parse_entry(msg_details)
if 'all' in debug or (
parsed_entry and parsed_entry.get('comp') in debug
):
print('_' * 80)
print('MSG_DETAILS:', msg_details)
print('PARSED_ENTRY:', parsed_entry)
if parsed_entry:
errors, delivery = extract_delivery(msg_details, parsed_entry)
if not errors and delivery:
if 'all' in debug or parsed_entry.get('comp') in debug:
print('DELIVERY:')
pprint(delivery)
# it may happen that a delivery of type 'from' has
# a recipient; in this case add a second delivery
# of type 'to' to the cache, but only for deliveries
# with queue_id
if (
delivery['type'] == 'from'
and 'recipient' in delivery
and delivery.get('queue_id')
):
delivery2 = delivery.copy()
delivery2['type'] = 'to'
cache.append(delivery2)
del delivery['recipient']
cache.append(delivery)
msg_count -= 1
if msg_count == 0:
commit = True
elif errors:
msg = (
f'Extracting delivery from parsed entry failed: '
f'errors={errors}; msg_details={msg_details}; '
f'parsed_entry={parsed_entry}'
)
journal.send(msg, PRIORITY=journal.LOG_CRIT)
if 'all' in debug or parsed_entry.get('comp') in debug:
print('EXTRACTION ERRORS:', errors)
if commit:
if 'all' in debug:
print('.' * 40, 'committing')
# store cache, clear cache, reset message counter
store_delivery_items(curs, cache, debug=debug)
cache = []
msg_count = settings.max_messages_per_commit
now = datetime.datetime.utcnow()
if last_delete is None or last_delete < now - settings.delete_interval:
delete_old_deliveries(curs)
last_delete = now
if 'all' in debug:
print('.' * 40, 'deleting old deliveries')
else:
store_delivery_items(curs, cache, debug=debug)
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument(
'--debug',
help='Comma-separated list of components to be debugged; '
'valid component names are the Postfix components '
'plus "sql" plus "all".',
)
parser.add_argument(
'--file',
help='File path of a Postfix logfile in syslog '
'format to be parsed instead of the journal',
)
parser.add_argument(
'--year',
help='If --file is given, we need to know '
'the year of the first line in the logfile',
)
args = parser.parse_args()
config = settings.get_config()
if config:
# check if startup is enabled or fail
msg = None
if 'startup' not in config:
msg = 'Parameter "startup" is not configured.'
elif not config['startup']:
msg = 'Startup is not enabled in the config file.'
if msg:
journal.send(msg, PRIORITY=journal.LOG_CRIT)
sys.exit(exit_code_without_restart)
# check more params and call run
try:
verp_marker = config['postfix']['verp_marker']
except Exception:
verp_marker = None
debug: List[str] = []
if args.debug:
debug = args.debug.split(',')
filepath = None
year = None
if args.file:
filepath = args.file
if not args.year:
print(
'If --file is given, we need to know the year'
' of the first line in the logfile. Please use --year.'
)
sys.exit(1)
else:
year = int(args.year)
dsn = init_db(config)
if dsn:
run(
dsn,
verp_marker=verp_marker,
filepath=filepath,
year=year,
debug=debug,
)
else:
print('Config invalid, see journal.')
sys.exit(exit_code_without_restart)
if __name__ == '__main__':
main()
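# Typical invocations (illustrative; assuming the script is executable):
#   ./run.py                                              # follow the systemd journal
#   ./run.py --file=/var/log/mail.log --year=2019 --debug=smtpd,sql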

View file

@ -0,0 +1,125 @@
#!/usr/bin/env python3
"""
Settings for journal-postfix.
"""
import os
import datetime
from typing import Union, Optional
from systemd import journal
from yaml import safe_load
main_config_file: str = '/etc/journal-postfix/main.yml'
"""
Filepath to the main config file.
Can be overridden by environment variable JOURNAL_POSTFIX_MAIN_CONF.
"""
systemd_unitname: str = 'postfix@-.service'
"""
Name of the systemd unit running the postfix service.
"""
journal_poll_interval: Union[float, int] = 10.0
"""
Poll timeout in seconds for fetching messages from the journal.
Will be overridden if set in the main config.
If the poll times out, it is checked whether the last commit
lies more than max_delay_before_commit seconds in the past;
if so, the current database transaction will be committed.
"""
max_delay_before_commit: datetime.timedelta = datetime.timedelta(seconds=30)
"""
How much time may pass before committing a database transaction?
Will be overridden if set in the main config.
(The actual maximal delay can be one journal_poll_interval in addition.)
"""
max_messages_per_commit: int = 1000
"""
How many messages to cache at most before committing a database transaction?
Will be overridden if set in the main config.
"""
delete_deliveries_after_days: int = 0
"""
After how many days shall deliveries be deleted from the database?
A value of 0 means that data are never deleted.
"""
def get_config() -> Optional[dict]:
"""
Load config from the main config and return it.
The default main config file path (global main_config_file)
can be overridden with environment variable
JOURNAL_POSTFIX_MAIN_CONF.
"""
try:
filename = os.environ['JOURNAL_POSTFIX_MAIN_CONF']
global main_config_file
main_config_file = filename
except Exception:
filename = main_config_file
try:
with open(filename, 'r') as config_file:
config_raw = config_file.read()
except Exception:
msg = f'ERROR: cannot read config file {filename}'
journal.send(msg, PRIORITY=journal.LOG_CRIT)
return None
try:
config = safe_load(config_raw)
except Exception as err:
msg = f'ERROR: invalid yaml syntax in {filename}: {err}'
journal.send(msg, PRIORITY=journal.LOG_CRIT)
return None
# override some global variables
_global_value_from_config(config['postfix'], 'systemd_unitname', str)
_global_value_from_config(config, 'journal_poll_interval', float)
_global_value_from_config(config, 'max_delay_before_commit', 'seconds')
_global_value_from_config(config, 'max_messages_per_commit', int)
_global_value_from_config(config, 'delete_deliveries_after_days', int)
_global_value_from_config(config, 'delete_interval', 'seconds')
return config
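# A minimal main.yml could look like this (illustrative values):
#
#   startup: yes
#   postgresql:
#     hostname: 127.0.0.1
#     port: 5432
#     database: mailserver
#     username: mailserver
#     password: secret
#   postfix:
#     systemd_unitname: postfix@-.service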
def _global_value_from_config(
config, name: str, type_: Union[type, str]
) -> None:
"""
Set a global variable to the value obtained from *config*.
Also cast to *type_*.
"""
try:
value = config.get(name)
if type_ == 'seconds':
value = datetime.timedelta(seconds=float(value))
else:
value = type_(value) # type: ignore
globals()[name] = value
except Exception:
if value is not None:
msg = f'ERROR: configured value of {name} is invalid.'
journal.send(msg, PRIORITY=journal.LOG_ERR)
if __name__ == '__main__':
print(get_config())

View file

@ -0,0 +1,5 @@
[pycodestyle]
max-line-length = 200
[mypy]
ignore_missing_imports = True

View file

@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Data sources.
Note: python-systemd journal docs are at
https://www.freedesktop.org/software/systemd/python-systemd/journal.html
"""
import datetime
import select
from typing import Iterable, Optional, Tuple, Union
from systemd import journal
import settings
def iter_journal_messages_since(
timestamp: Union[int, float]
) -> Iterable[Tuple[bool, dict]]:
"""
Yield False and message details from the journal since *timestamp*.
This is the loading phase (loading messages that already existed
when we start).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit settings.systemd_unitname with
loglevel INFO and above are retrieved.
"""
timestamp = float(timestamp)
sdj = journal.Reader()
sdj.log_level(journal.LOG_INFO)
sdj.add_match(_SYSTEMD_UNIT=settings.systemd_unitname)
sdj.seek_realtime(timestamp)
for entry in sdj:
yield False, _get_msg_details(entry)
def iter_journal_messages_follow(
timestamp: Union[int, float]
) -> Iterable[Tuple[bool, Optional[dict]]]:
"""
Yield commit and message details from the journal through polling.
This is the polling phase (after we have read pre-existing messages
in the loading phase).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit settings.systemd_unitname with
loglevel INFO and above are retrieved.
*commit* (bool) tells whether it is time to store the delivery
information obtained from the messages yielded by us.
It is set to True if settings.max_delay_before_commit has elapsed.
After this delay delivery information will be written; to be exact:
the delay may increase by up to one settings.journal_poll_interval.
"""
sdj = journal.Reader()
sdj.log_level(journal.LOG_INFO)
sdj.add_match(_SYSTEMD_UNIT=settings.systemd_unitname)
sdj.seek_realtime(timestamp)
p = select.poll()
p.register(sdj, sdj.get_events())
last_commit = datetime.datetime.utcnow()
interval_ms = settings.journal_poll_interval * 1000
while True:
p.poll(interval_ms)
commit = False
now = datetime.datetime.utcnow()
if last_commit + settings.max_delay_before_commit < now:
commit = True
last_commit = now
if sdj.process() == journal.APPEND:
for entry in sdj:
yield commit, _get_msg_details(entry)
elif commit:
yield commit, None
def iter_logfile_messages(
filepath: str,
year: int,
commit_after_lines=settings.max_messages_per_commit,
) -> Iterable[Tuple[bool, dict]]:
"""
Yield messages and a commit flag from a logfile.
Loop through all lines of the file with given *filepath* and
extract the time and log message. If the log message starts
with 'postfix/', then extract the syslog_identifier, pid and
message text.
Since syslog lines do not contain the year, the *year* to which
the first log line belongs must be given.
Return a commit flag and a dict with these keys:
't': timestamp
'message': message text
'identifier': syslog identifier (e.g., 'postfix/smtpd')
'pid': process id
The commit flag will be set to True for every
(commit_after_lines)-th filtered message and serves
as a signal to the caller to commit this chunk of data
to the database.
"""
dt = None
with open(filepath, 'r') as fh:
cnt = 0
while True:
line = fh.readline()
if not line:
break
# get datetime
timestamp = line[:15]
dt_prev = dt
dt = _parse_logfile_timestamp(timestamp, year)
if dt is None:
continue # discard log message with invalid timestamp
# if we transgress a year boundary, then increment the year
if dt_prev and dt + datetime.timedelta(days=1) < dt_prev:
year += 1
dt = _parse_logfile_timestamp(timestamp, year)
# filter postfix messages
msg = line[21:].strip()
if 'postfix/' in msg:
cnt += 1
syslog_identifier, msg_ = msg.split('[', 1)
pid, msg__ = msg_.split(']', 1)
message = msg__[2:]
commit = cnt % commit_after_lines == 0
yield commit, {
't': dt,
'message': message,
'identifier': syslog_identifier,
'pid': pid,
}
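# Illustrative example (assumed log line; host name 'mail' matches the
# hard-coded column offsets above):
#   'Dec  1 06:15:02 mail postfix/qmgr[1234]: 4ABC12DEF3: removed'
# yields, with year=2019, the dict
#   {'t': datetime.datetime(2019, 12, 1, 6, 15, 2),
#    'message': '4ABC12DEF3: removed',
#    'identifier': 'postfix/qmgr', 'pid': '1234'}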
def _get_msg_details(journal_entry: dict) -> dict:
"""
Return information extracted from a journal entry object as a dict.
"""
return {
't': journal_entry['__REALTIME_TIMESTAMP'],
'message': journal_entry['MESSAGE'],
'identifier': journal_entry.get('SYSLOG_IDENTIFIER'),
'pid': journal_entry.get('SYSLOG_PID'),
}
def _parse_logfile_timestamp(
timestamp: Optional[str],
year: int
) -> Optional[datetime.datetime]:
"""
Parse a given syslog *timestamp* and return a datetime.
Since the timestamp does not contain the year, it is an
extra argument.
Note: Successful parsing of the month's name depends on
the locale under which this script runs.
"""
if timestamp is None:
return None
try:
# collapse the double blank before single-digit days ('Dec  1' -> 'Dec 1')
timestamp = timestamp.replace('  ', ' ')
t1 = datetime.datetime.strptime(timestamp, '%b %d %H:%M:%S')
t2 = t1.replace(year=year)
return t2
except Exception:
return None
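if __name__ == '__main__':
    # Minimal manual test (assumption: run directly, not as part of the
    # service): parse the syslog file given as first argument and print
    # the first few postfix messages together with their commit flag.
    import sys
    from pprint import pprint
    if len(sys.argv) > 1:
        for i, (commit, msg) in enumerate(
            iter_logfile_messages(sys.argv[1], datetime.datetime.now().year)
        ):
            pprint((commit, msg))
            if i >= 4:
                break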

View file

@ -0,0 +1,337 @@
#!/usr/bin/env python3
"""
Storage to PostgreSQL.
"""
import datetime
import json
import re
import time
from collections import defaultdict
from traceback import format_exc
from typing import Any, Dict, Iterable, List, Optional, Tuple, Union
import psycopg2
import psycopg2.extras
from systemd import journal
import settings
from storage_setup import (
get_create_table_stmts,
get_sql_prepared_statement,
get_sql_execute_prepared_statement,
table_fields,
)
def get_latest_timestamp(curs: psycopg2.extras.RealDictCursor) -> int:
"""
Fetch the latest timestamp from the database.
Return the latest timestamp of a message transfer from the database.
If there are no records yet, return 0.
"""
last = 0
curs.execute(
"SELECT greatest(max(t_i), max(t_f)) AS last FROM delivery_from"
)
last1 = curs.fetchone()['last']
if last1:
last = max(
last, (last1 - datetime.datetime(1970, 1, 1)).total_seconds()
)
curs.execute(
"SELECT greatest(max(t_i), max(t_f)) AS last FROM delivery_to"
)
last2 = curs.fetchone()['last']
if last2:
last = max(
last, (last2 - datetime.datetime(1970, 1, 1)).total_seconds()
)
return last
def delete_old_deliveries(curs: psycopg2.extras.RealDictCursor) -> None:
"""
Delete deliveries older than the configured number of days.
See config param *delete_deliveries_after_days*.
"""
max_days = settings.delete_deliveries_after_days
if max_days:
now = datetime.datetime.utcnow()
dt = datetime.timedelta(days=max_days)
t0 = now - dt
curs.execute("DELETE FROM delivery_from WHERE t_i < %s", (t0,))
curs.execute("DELETE FROM delivery_to WHERE t_i < %s", (t0,))
curs.execute("DELETE FROM noqueue WHERE t < %s", (t0,))
def store_delivery_items(
cursor,
cache: List[dict],
debug: List[str] = []
) -> None:
"""
Store cached delivery items into the database.
Find queue_ids in *cache* and group delivery items by
them, separately for delivery types 'from' and 'to'.
In addition, collect delivery items whose queue_id is None.
After grouping we merge all items within a group into a
single item, so that several SQL queries can be combined
into a single one, which improves performance significantly.
Then store the merged items and the deliveries without
a queue_id.
"""
if 'all' in debug or 'sql' in debug:
print(f'Storing {len(cache)} messages.')
if not cache:
return
from_items, to_items, noqueue_items = _group_delivery_items(cache)
deliveries_from = _merge_delivery_items(from_items, item_type='from')
deliveries_to = _merge_delivery_items(to_items, item_type='to')
_store_deliveries(cursor, 'delivery_from', deliveries_from, debug=debug)
_store_deliveries(cursor, 'delivery_to', deliveries_to, debug=debug)
_store_deliveries(cursor, 'noqueue', noqueue_items, debug=debug)
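# Illustrative example (assumed values): a cache such as
#   [{'type': 'from', 'queue_id': '4ABC12DEF3', 'sender': 'a@example.org', ...},
#    {'type': 'to', 'queue_id': '4ABC12DEF3', 'recipient': 'b@example.net', ...},
#    {'type': 'to', 'status': 'reject', ...}]   # no queue_id -> noqueue
# is split into from-items keyed by queue_id, to-items keyed by
# (queue_id, recipient) and noqueue items before being merged and stored.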
FromItems = Dict[str, List[dict]]
ToItems = Dict[Tuple[str, Optional[str]], List[dict]]
NoqueueItems = Dict[int, dict]
def _group_delivery_items(
cache: List[dict]
) -> Tuple[FromItems, ToItems, NoqueueItems]:
"""
Group delivery items by type and queue_id.
Return items of type 'from', of type 'to' and items without
queue_id.
"""
delivery_from_items: FromItems = defaultdict(list)
delivery_to_items: ToItems = defaultdict(list)
noqueue_items: NoqueueItems = {}
noqueue_i = 1
for item in cache:
if item.get('queue_id'):
queue_id = item['queue_id']
if item.get('type') == 'from':
delivery_from_items[queue_id].append(item)
else:
recipient = item.get('recipient')
delivery_to_items[(queue_id, recipient)].append(item)
else:
noqueue_items[noqueue_i] = item
noqueue_i += 1
return delivery_from_items, delivery_to_items, noqueue_items
def _merge_delivery_items(
delivery_items: Union[FromItems, ToItems],
item_type: str = 'from',
) -> Dict[Union[str, Tuple[str, Optional[str]]], dict]:
"""
Compute deliveries by combining multiple delivery items.
Take lists of delivery items for each queue_id (in case
of item_type=='from') or for (queue_id, recipient)-pairs
(in case of item_type='to').
Each delivery item is a dict obtained from one log message.
The dicts are consecutively updated (merged), except for the
raw log messages (texts) which are collected into a list.
The fields of the resulting delivery are filtered according
to the target table.
Returned is a dict mapping queue_ids (in case
of item_type=='from') or (queue_id, recipient)-pairs
(in case of item_type='to') to deliveries.
"""
deliveries = {}
for group, items in delivery_items.items():
delivery = {}
messages = []
for item in items:
message = item.pop('message')
identifier = item.pop('identifier')
pid = item.pop('pid')
messages.append(f'{identifier}[{pid}]: {message}')
delivery.update(item)
delivery['messages'] = messages
deliveries[group] = delivery
return deliveries
def _store_deliveries(
cursor: psycopg2.extras.RealDictCursor,
table_name: str,
deliveries: Dict[Any, dict],
debug: List[str] = [],
) -> None:
"""
Store grouped and merged delivery items.
"""
if not deliveries:
return
n = len(deliveries)
t0 = time.time()
cursor.execute('BEGIN')
_store_deliveries_batch(cursor, table_name, deliveries.values())
cursor.execute('COMMIT')
t1 = time.time()
if 'all' in debug or 'sql' in debug:
milliseconds = (t1 - t0) * 1000
print(
'*' * 10,
f'SQL transaction time {table_name}: '
f'{milliseconds:.2f} ms ({n} deliveries)',
)
def _store_deliveries_batch(
cursor: psycopg2.extras.RealDictCursor,
table_name: str,
deliveries: Iterable[dict]
) -> None:
"""
Store *deliveries* (i.e., grouped and merged delivery items).
We use a prepared statement and execute_batch() from
psycopg2.extras to improve performance.
"""
rows = []
for delivery in deliveries:
# get values for all fields of the table
field_values: List[Any] = []
t = delivery.get('t')
delivery['t_i'] = t
delivery['t_f'] = t
for field in table_fields[table_name]:
if field in delivery:
if field == 'messages':
field_values.append(json.dumps(delivery[field]))
else:
field_values.append(delivery[field])
else:
field_values.append(None)
rows.append(field_values)
sql = get_sql_execute_prepared_statement(table_name)
try:
psycopg2.extras.execute_batch(cursor, sql, rows)
except Exception:
msg = f'SQL statement failed: "{sql}" -- the values were: {rows}\n{format_exc()}'
journal.send(msg, PRIORITY=journal.LOG_ERR)
def init_db(config: dict) -> Optional[str]:
"""
Initialize database; if ok return DSN, else None.
Try to get parameters for database access,
check existence of tables and possibly create them.
"""
dsn = _get_dsn(config)
if dsn:
ok = _create_tables(dsn)
if not ok:
return None
return dsn
def _get_dsn(config: dict) -> Optional[str]:
"""
Return the DSN (data source name) from the *config*.
"""
try:
postgresql_config = config['postgresql']
hostname = postgresql_config['hostname']
port = postgresql_config['port']
database = postgresql_config['database']
username = postgresql_config['username']
password = postgresql_config['password']
except Exception:
msg = f"""ERROR: invalid config in {settings.main_config_file}
The config file must contain a section like this:
postgresql:
hostname: <HOSTNAME_OR_IP>
port: <PORT>
database: <DATABASE_NAME>
username: <USERNAME>
password: <PASSWORD>
"""
journal.send(msg, PRIORITY=journal.LOG_CRIT)
return None
dsn = f'host={hostname} port={port} dbname={database} '\
f'user={username} password={password}'
return dsn
def _create_tables(dsn: str) -> bool:
"""
Check existence of tables and possibly create them, returning success.
"""
try:
with psycopg2.connect(dsn) as conn:
with conn.cursor() as curs:
for table_name, sql_stmts in get_create_table_stmts().items():
ok = _create_table(curs, table_name, sql_stmts)
if not ok:
return False
except Exception:
journal.send(
f'ERROR: cannot connect to database, check params'
f' in {settings.main_config_file}',
PRIORITY=journal.LOG_CRIT,
)
return False
return True
def _create_table(
cursor: psycopg2.extras.RealDictCursor,
table_name: str,
sql_stmts: List[str]
) -> bool:
"""
Try to create a table if it does not exist and return whether it exists.
If creation failed, emit an error to the journal.
"""
cursor.execute("SELECT EXISTS(SELECT * FROM "
"information_schema.tables WHERE table_name=%s)",
(table_name,))
table_exists = cursor.fetchone()[0]
if not table_exists:
for sql_stmt in sql_stmts:
try:
cursor.execute(sql_stmt)
except Exception:
journal.send(
'ERROR: database user needs privilege to create tables.\n'
'Alternatively, you can create the table manually like'
' this:\n\n'
+ '\n'.join([sql + ';' for sql in sql_stmts]),
PRIORITY=journal.LOG_CRIT,
)
return False
return True
def init_session(cursor: psycopg2.extras.RealDictCursor) -> None:
"""
Init a database session.
Define prepared statements.
"""
stmt = get_sql_prepared_statement('delivery_from')
cursor.execute(stmt)
stmt = get_sql_prepared_statement('delivery_to')
cursor.execute(stmt)
stmt = get_sql_prepared_statement('noqueue')
cursor.execute(stmt)

View file

@ -0,0 +1,210 @@
#!/usr/bin/env python3
"""
Database table definitions and prepared statements.
Note: (short) postfix queue IDs are not unique:
http://postfix.1071664.n5.nabble.com/Queue-ID-gets-reused-Not-unique-td25387.html
"""
from typing import Dict, List
_table_def_delivery_from = [
[
dict(name='t_i', dtype='TIMESTAMP'),
dict(name='t_f', dtype='TIMESTAMP'),
dict(name='queue_id', dtype='VARCHAR(16)', null=False, extra='UNIQUE'),
dict(name='host', dtype='VARCHAR(200)'),
dict(name='ip', dtype='VARCHAR(50)'),
dict(name='sasl_username', dtype='VARCHAR(300)'),
dict(name='orig_queue_id', dtype='VARCHAR(16)'),
dict(name='status', dtype='VARCHAR(10)'),
dict(name='accepted', dtype='BOOL', null=False, default='TRUE'),
dict(name='done', dtype='BOOL', null=False, default='FALSE'),
dict(name='sender', dtype='VARCHAR(300)'),
dict(name='message_id', dtype='VARCHAR(1000)'),
dict(name='resent_message_id', dtype='VARCHAR(1000)'),
dict(name='subject', dtype='VARCHAR(1000)'),
dict(name='phase', dtype='VARCHAR(15)'),
dict(name='error', dtype='VARCHAR(1000)'),
dict(name='size', dtype='INT'),
dict(name='nrcpt', dtype='INT'),
dict(name='verp_id', dtype='INT'),
dict(name='messages', dtype='JSONB', null=False, default="'{}'::JSONB"),
],
"CREATE INDEX delivery_from__queue_id ON delivery_from (queue_id)",
"CREATE INDEX delivery_from__t_i ON delivery_from (t_i)",
"CREATE INDEX delivery_from__t_f ON delivery_from (t_f)",
"CREATE INDEX delivery_from__sender ON delivery_from (sender)",
"CREATE INDEX delivery_from__message_id ON delivery_from (message_id)",
]
_table_def_delivery_to = [
[
dict(name='t_i', dtype='TIMESTAMP'),
dict(name='t_f', dtype='TIMESTAMP'),
dict(name='queue_id', dtype='VARCHAR(16)', null=False),
dict(name='recipient', dtype='VARCHAR(300)'),
dict(name='orig_recipient', dtype='VARCHAR(300)'),
dict(name='host', dtype='VARCHAR(200)'),
dict(name='ip', dtype='VARCHAR(50)'),
dict(name='port', dtype='VARCHAR(10)'),
dict(name='relay', dtype='VARCHAR(10)'),
dict(name='delay', dtype='VARCHAR(200)'),
dict(name='delays', dtype='VARCHAR(200)'),
dict(name='dsn', dtype='VARCHAR(10)'),
dict(name='status', dtype='VARCHAR(10)'),
dict(name='status_text', dtype='VARCHAR(1000)'),
dict(name='messages', dtype='JSONB', null=False, default="'{}'::JSONB"),
],
"ALTER TABLE delivery_to ADD CONSTRAINT"
" delivery_to__queue_id_recipient UNIQUE(queue_id, recipient)",
"CREATE INDEX delivery_to__queue_id ON delivery_to (queue_id)",
"CREATE INDEX delivery_to__recipient ON delivery_to (recipient)",
"CREATE INDEX delivery_to__t_i ON delivery_to (t_i)",
"CREATE INDEX delivery_to__t_f ON delivery_to (t_f)",
]
_table_def_noqueue = [
[
dict(name='t', dtype='TIMESTAMP'),
dict(name='host', dtype='VARCHAR(200)'),
dict(name='ip', dtype='VARCHAR(50)'),
dict(name='sender', dtype='VARCHAR(300)'),
dict(name='recipient', dtype='VARCHAR(300)'),
dict(name='sasl_username', dtype='VARCHAR(300)'),
dict(name='status', dtype='VARCHAR(10)'),
dict(name='phase', dtype='VARCHAR(15)'),
dict(name='error', dtype='VARCHAR(1000)'),
dict(name='message', dtype='TEXT'),
],
"CREATE INDEX noqueue__t ON noqueue (t)",
"CREATE INDEX noqueue__sender ON noqueue (sender)",
"CREATE INDEX noqueue__recipient ON noqueue (recipient)",
]
_tables: Dict[str, list] = {
'delivery_from': _table_def_delivery_from,
'delivery_to': _table_def_delivery_to,
'noqueue': _table_def_noqueue,
}
_prepared_statements = {
'delivery_from':
"PREPARE delivery_from_insert ({}) AS "
"INSERT INTO delivery_from ({}) VALUES ({}) "
"ON CONFLICT (queue_id) DO UPDATE SET {}",
'delivery_to':
"PREPARE delivery_to_insert ({}) AS "
"INSERT INTO delivery_to ({}) VALUES ({}) "
"ON CONFLICT (queue_id, recipient) DO UPDATE SET {}",
'noqueue':
"PREPARE noqueue_insert ({}) AS "
"INSERT INTO noqueue ({}) VALUES ({}){}",
}
table_fields: Dict[str, List[str]] = {}
"""
Lists of field names for tables, populated by get_create_table_stmts().
"""
def get_sql_prepared_statement(table_name: str) -> str:
"""
Return SQL defining a prepared statement for inserting into a table.
Table 'noqueue' is handled differently, because it does not have
an UPDATE clause.
"""
col_names = []
col_types = []
col_args = []
col_upds = []
col_i = 0
for field in _tables[table_name][0]:
# column type
col_type = field['dtype']
if field['dtype'].lower().startswith('varchar'):
col_type = 'TEXT'
col_types.append(col_type)
# column args
col_i += 1
col_arg = '$' + str(col_i)
# column name
col_name = field['name']
col_names.append(col_name)
if 'default' in field:
default = field['default']
col_args.append(f'COALESCE({col_arg},{default})')
else:
col_args.append(col_arg)
# column update
col_upd = f'{col_name}=COALESCE({col_arg},{table_name}.{col_name})'
if col_name != 't_i':
if col_name == 'messages':
col_upd = f'{col_name}={table_name}.{col_name}||{col_arg}'
if table_name != 'noqueue':
col_upds.append(col_upd)
stmt = _prepared_statements[table_name].format(
','.join(col_types),
','.join(col_names),
','.join(col_args),
','.join(col_upds),
)
return stmt
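# For illustration, get_sql_prepared_statement('noqueue') returns roughly:
#   PREPARE noqueue_insert (TIMESTAMP,TEXT,TEXT,TEXT,TEXT,TEXT,TEXT,TEXT,TEXT,TEXT)
#   AS INSERT INTO noqueue (t,host,ip,sender,recipient,sasl_username,status,phase,error,message)
#   VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10)
# while the delivery_from/delivery_to statements additionally carry an
# ON CONFLICT ... DO UPDATE SET clause built from the column updates above.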
def get_sql_execute_prepared_statement(table_name: str) -> str:
"""
Return SQL for executing the given table's prepared statement.
The result is based on global variable _tables.
"""
fields = _tables[table_name][0]
return "EXECUTE {}_insert ({})"\
.format(table_name, ','.join(['%s' for i in range(len(fields))]))
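# e.g. get_sql_execute_prepared_statement('noqueue') returns
#   "EXECUTE noqueue_insert (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
# i.e., one %s placeholder per column of the table.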
def get_create_table_stmts() -> Dict[str, List[str]]:
"""
Return a dict mapping table names to SQL statements creating the tables.
Also populate global variable table_fields.
"""
res = {}
for table_name, table_def in _tables.items():
stmts = table_def.copy()
stmts[0] = _get_sql_create_stmt(table_name, table_def[0])
res[table_name] = stmts
field_names = [x['name'] for x in table_def[0]]
global table_fields
table_fields[table_name] = field_names
return res
def _get_sql_create_stmt(table_name: str, fields: List[dict]):
"""
Return the 'CREATE TABLE' SQL statement for a table.
Factor in NULL, DEFAULT and extra DDL text.
"""
sql = f"CREATE TABLE {table_name} (\n id BIGSERIAL,"
col_defs = []
for field in fields:
col_def = f" {field['name']} {field['dtype']}"
if 'null' in field and field['null'] is False:
col_def += " NOT NULL"
if 'default' in field:
col_def += f" DEFAULT {field['default']}"
if 'extra' in field:
col_def += f" {field['extra']}"
col_defs.append(col_def)
sql += '\n' + ',\n'.join(col_defs)
sql += '\n)'
return sql
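# For example, _get_sql_create_stmt('noqueue', _table_def_noqueue[0]) produces:
#   CREATE TABLE noqueue (
#    id BIGSERIAL,
#    t TIMESTAMP,
#    host VARCHAR(200),
#    ip VARCHAR(50),
#    sender VARCHAR(300),
#    recipient VARCHAR(300),
#    sasl_username VARCHAR(300),
#    status VARCHAR(10),
#    phase VARCHAR(15),
#    error VARCHAR(1000),
#    message TEXT
#   )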

View file

@ -0,0 +1,90 @@
- name: user journal-postfix
user:
name: journal-postfix
group: systemd-journal
state: present
system: yes
uid: 420
create_home: no
home: /srv/journal-postfix
password: '!'
password_lock: yes
comment: created by ansible role journal-postfix
- name: directories /srv/journal-postfix, /etc/journal-postfix
file:
path: "{{ item }}"
state: directory
owner: journal-postfix
group: systemd-journal
mode: 0755
loop:
- /srv/journal-postfix
- /etc/journal-postfix
- name: install dependencies
apt:
name: python3-psycopg2,python3-systemd,python3-yaml
state: present
update_cache: yes
install_recommends: no
- name: files in /srv/journal-postfix
copy:
src: "srv/{{ item }}"
dest: "/srv/journal-postfix/{{ item }}"
owner: journal-postfix
group: systemd-journal
mode: 0644
force: yes
loop:
- run.py
- settings.py
- sources.py
- parser.py
- storage.py
- storage_setup.py
- README.md
- setup.cfg
- name: make some files executable
file:
path: "{{ item }}"
mode: 0755
loop:
- /srv/journal-postfix/run.py
- /srv/journal-postfix/settings.py
- name: determine whether to startup
set_fact:
startup: "{{ mailserver.postgresql.host is defined and mailserver.postgresql.port is defined and mailserver.postgresql.dbname is defined and mailserver.postgresql.username is defined and mailserver.postgresql.password is defined }}"
- name: file /etc/journal-postfix/main.yml
template:
src: main.yml
dest: /etc/journal-postfix/main.yml
owner: journal-postfix
group: systemd-journal
mode: 0600
force: no
- name: file journal-postfix.service
copy:
src: journal-postfix.service
dest: /etc/systemd/system/journal-postfix.service
owner: root
group: root
mode: 0644
force: yes
- name: enable systemd unit journal-postfix.service
systemd:
enabled: yes
daemon_reload: yes
name: journal-postfix.service
- name: restart systemd unit journal-postfix.service
systemd:
state: restarted
name: journal-postfix.service
when: startup

View file

@ -0,0 +1,45 @@
# Configuration for journal-postfix, see /srv/journal-postfix
# To enable startup of systemd unit journal-postfix set this to yes:
startup: {{ 'yes' if startup else 'no' }}
# PostgreSQL database connection parameters
postgresql:
hostname: {{ mailserver.postgresql.host | default('127.0.0.1') }}
port: {{ mailserver.postgresql.port | default('5432') }}
database: {{ mailserver.postgresql.dbname | default('mailserver') }}
username: {{ mailserver.postgresql.username | default('mailserver') }}
password: {{ mailserver.postgresql.password | default('*************') }}
# Postfix parameters
postfix:
# Systemd unit name of the Postfix unit. Only one unit is supported.
systemd_unitname: postfix@-.service
# If you have configured Postfix to rewrite the envelope sender
# addresses of outgoing mails so that they include a VERP
# (Variable Envelope Return Path) of the form
# {local_part}+{verp_marker}-{id}@{domain}, where id is an
# integer, then set the verp_marker here:
verp_marker: {{ mailserver.postfix.verp_marker | default('') }}
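# For example (illustrative): with verp_marker 'myverp', an outgoing envelope
# sender could look like news+myverp-12345@example.org.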
# Poll timeout in seconds for fetching messages from the journal.
journal_poll_interval: 10.0
# How much time may pass before committing a database transaction?
# (The actual maximal delay can be one journal_poll_interval in addition.)
max_delay_before_commit: 60.0
# How many messages to cache at most before committing a database transaction?
max_messages_per_commit: 10000
# Delete delivery records older than this number of days.
# A value of 0 means that data are never deleted.
# Note: A delivery may remain active over a substantial time interval;
# here the age of a delivery is determined by its start time.
delete_deliveries_after_days: 30
# The time interval in seconds after which a deletion of old
# delivery records is triggered. (Will not be smaller than
# max_delay_before_commit + journal_poll_interval.)
delete_interval: 3600