<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Personal Data Explorations on brege.org</title>
    <link>https://brege.org/series/personal-data-explorations/</link>
    <description>Recent content in Personal Data Explorations on brege.org</description>
    <generator>Hugo</generator>
    <language>en</language>
    <copyright>Copyright (c) 2016-2026 Wyatt Brege</copyright>
    <lastBuildDate>Sun, 12 Apr 2026 22:51:32 -0400</lastBuildDate>
    <atom:link href="https://brege.org/series/personal-data-explorations/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Exploring my camera, screenshot, and image activity</title>
      <link>https://brege.org/post/image-activity/</link>
      <pubDate>Wed, 11 Feb 2026 16:09:29 -0500</pubDate>
      <guid>https://brege.org/post/image-activity/</guid>
      <description>A data exploration project around my personal image collection habits.</description>
      <content:encoded><![CDATA[<p>This project is <a href="/series/personal-data-explorations">part of a series of data exploration projects</a> around my personal computer usage.</p>
<p>GitHub link: <a href="https://github.com/brege/image-activity">github.com/brege/image-activity</a></p>
<h2 id="overview">Overview</h2>
<ul>
<li>Generate heatmaps and histograms of image saving activity over hours, days, and months</li>
<li>Use file timestamps, modified-times, EXIF, and regex parsing for refined image discovery</li>
<li>Add bands and markers for major life events</li>
</ul>
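<p>A minimal sketch of the regex and modified-time parts of that discovery step, not the project&rsquo;s actual code. The filename pattern, <code>timestamp_from_name</code>, and <code>best_timestamp</code> are assumptions made for illustration, and EXIF is left out because reading it needs a third-party library.</p>

```python
import re
from datetime import datetime
from pathlib import Path

# Hypothetical pattern: matches names such as "Screenshot_20230914-153012.png"
# or "IMG_20190704_181500.jpg". A real collection will need more patterns.
STAMP = re.compile(r"(\d{8})[-_](\d{6})")


def timestamp_from_name(name):
    """Return a datetime parsed from the filename, or None if nothing matches."""
    match = STAMP.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    except ValueError:
        # The digits fit the pattern but are not a valid date and time.
        return None


def best_timestamp(path):
    """Prefer the filename timestamp; fall back to the file's modified-time."""
    parsed = timestamp_from_name(Path(path).name)
    if parsed is not None:
        return parsed
    return datetime.fromtimestamp(Path(path).stat().st_mtime)
```

<p>Trying the filename first is a design choice in this sketch: modified-times are easily reset when files are copied between devices, while a timestamp embedded in the name travels with the file.</p>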
<h2 id="background">Background</h2>
<p>I wanted to determine if my image activity is dependent on major events and device purchases in my life.</p>
<ul>
<li>do I tend to take more pictures during certain times of year?</li>
<li>how has my screenshot usage evolved over the last 15 years?</li>
<li>do I have &ldquo;honeymoon&rdquo; periods after a device purchase?</li>
<li>in what ways has my camera and screenshot usage changed between being an academic, chef, and developer?</li>
</ul>
<p>I&rsquo;m not a social media person, although my <a href="https://mastodon.social/@brege">mastodon</a> did see an uptick of usage following my hip surgery, where I began hiking and foraging a lot.</p>
<p>My image activity fits in three main categories:</p>
<ol>
<li><strong>camera</strong>: storage of camera photos from my phone</li>
<li><strong>screenshots</strong>: screenshots on both my laptop and phone</li>
<li><strong>internet</strong>: pictures downloaded from the internet</li>
</ol>
<h2 id="gallery">Gallery</h2>
<p>In the first two line charts, <a href="#camera-usage">Camera Usage</a> and <a href="#image-capture-concurrency">Image Capture Concurrency</a>, I&rsquo;ve marked the times when I purchased a major device (a new phone or laptop) and a couple of key periods of my life. These plots have all been normalized to a 0-100 photo count scale.</p>
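<p>One way to get a 0-100 scale is to divide every count by the series peak. This is a sketch under that assumption; the plots may use a different scaling, and <code>normalize_to_100</code> is a hypothetical helper name.</p>

```python
def normalize_to_100(counts):
    """Scale non-negative counts so the largest value becomes 100."""
    peak = max(counts, default=0)
    if peak == 0:
        # An empty or all-zero series stays flat at zero.
        return [0.0 for _ in counts]
    return [100.0 * c / peak for c in counts]
```

<p>For example, <code>normalize_to_100([5, 10, 20])</code> returns <code>[25.0, 50.0, 100.0]</code>, so series with very different raw volumes can share one axis.</p>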
<h3 id="camera-usage">Camera Usage</h3>
<p>From 2010 to 2017 I was a Physics TA and, following my 2014 physics prelims, a computational astrophysics doctoral researcher. I began attending conferences in 2015, exploring places around Pullman, WA during these researcher years there.</p>
<img src="img/combined/panel.png" width="100%">
<p>At the end of 2017, I left that life. I embraced my love of food and cooking and became a professional chef for a number of years thereafter, including through the Covid-19 pandemic. This period of my life saw a greater number of photos taken: pictures of plates, menus, schedules, etc. My camera photos before this time were mostly non-work-related: travel, events, and pets drove image origination.</p>
<h3 id="image-capture-concurrency">Image Capture Concurrency</h3>
<img src="img/combined/sum.png" width="100%">
<h3 id="heatmaps">Heatmaps</h3>
<p>I only have one experience with online coursework: the data science bootcamp I attended in the fall of 2023. This period did not have a major impact on my screenshotting habits. There are three principal areas in which screenshot usage was more frequent:</p>
<ol>
<li>The creation of my website <a href="https://brege.org">brege.org</a> around August 2016.</li>
<li>As an executive chef, screenshotting is recurrent for scheduling, text message records, receipts/purchase dates, etc.</li>
<li>Agentic-driven coding workflows, beginning midway through 2025, saw a surge in screenshot usage. Screenshots have become a large part of my front-end debugging workflow for web app development&ndash;extending well beyond data-structured <a href="https://www.cypress.io">Cypress</a> end-to-end tests.</li>
</ol>
<p>I did not find that my screenshot usage changed noticeably during my brief stint with online coursework.</p>
<table>
  <tr>
    <td><img src="img/screenshot/heatmap-laptop.png" width="100%"></td>
    <td><img src="img/screenshot/heatmap-phone.png" width="100%"></td>
    <td><img src="img/camera/heatmap-phone.png" width="100%"></td>
  </tr>
</table>
<p>In general, it appears that I take more screenshots on desktop earlier in the week and in the afternoon (averaged over the last ~15 years). To my surprise, the heatmap for screenshots on my phone has nearly identical densities. I had assumed this would be biased toward the weekend and closer to 17:00 because of sports and restaurant dinner service.</p>
<p>Camera usage, on the other hand, stands out by day of week only in its higher density on Thursday evenings and Saturday afternoons. This pattern is especially pronounced in both my chef days and my post-op mobility period.</p>
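<p>A heatmap like the ones above boils down to a 7x24 grid of counts. A minimal sketch, assuming each image has already been reduced to a <code>datetime</code>; <code>weekday_hour_grid</code> is a hypothetical name.</p>

```python
from datetime import datetime


def weekday_hour_grid(stamps):
    """Build a 7x24 grid of counts: rows are weekdays (0 = Monday), columns are hours."""
    grid = [[0] * 24 for _ in range(7)]
    for s in stamps:
        grid[s.weekday()][s.hour] += 1
    return grid
```

<p>The grid can then be handed to any plotting library that draws a 2D array as colored cells.</p>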
<h3 id="histograms">Histograms</h3>
<p>By device and source, then binned on hours of the day, day of the week, and month of the year, histograms provide a finer distribution in one dimension.</p>
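<p>The three binnings can be sketched with plain counters. This assumes a list of <code>datetime</code> objects per device or source; <code>bin_timestamps</code> is a hypothetical name.</p>

```python
from collections import Counter
from datetime import datetime


def bin_timestamps(stamps):
    """Count timestamps by hour of day, day of week (0 = Monday), and month of year."""
    stamps = list(stamps)  # allow generators, since the data is walked three times
    hours = Counter(s.hour for s in stamps)
    weekdays = Counter(s.weekday() for s in stamps)
    months = Counter(s.month for s in stamps)
    return hours, weekdays, months
```

<p>Each counter is one histogram: the keys are the bins and the values are the bar heights.</p>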
<table>
  <tr>
    <td><img src="img/screenshot/hour.png" width="100%"></td>
    <td><img src="img/combined/hour.png" width="100%"></td>
  </tr>
</table>
<p>For the hourly concentration of all three photo habits, my activity roughly follows a Boltzmann distribution.</p>
<p>These distributions generally peak at two distinct hours:</p>
<ul>
<li>camera photos and screenshots center around 15:00</li>
<li>internet photos are generally concentrated around 20:00</li>
</ul>
<p>Each bin is averaged for each picture type over the last 15 years, regardless of timezone.</p>
<table>
  <tr>
    <td><img src="img/combined/day.png" width="100%"></td>
    <td><img src="img/combined/month.png" width="100%"></td>
  </tr>
</table>
<p>Image activity generally increases at the beginning and end of standard university semesters, which also coincide with the height of summer and the holiday period, when I am always travelling. Screenshotting is highest in the fall to mid-winter.</p>
<p>In my experience, restaurants are historically busier between, roughly, Friendsgiving and Father&rsquo;s Day. Camera usage is also largest during high summer: the beach, hiking, and produce selection during my chef years.</p>
]]></content:encoded>
    </item>
    <item>
      <title>20 years of email</title>
      <link>https://brege.org/post/email-analysis/</link>
      <pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://brege.org/post/email-analysis/</guid>
      <description>Making my case that satisfaction surveys are the new email cancer.</description>
      <content:encoded><![CDATA[<p>I&rsquo;m not intentionally a data hoarder. I just haven&rsquo;t been an effective or aggressive email deleter or filter user. This has changed some in recent years, as the techniques for spam emails have evolved to covertly trojan &ldquo;survey&rdquo; subterfuge into my mailbox.</p>
<h2 id="i-have-survey-fatigue">I have survey fatigue</h2>
<p>Surveys are marketing emails. I can&rsquo;t believe I used to take the time to respond to some of them. An analysis of my email history has shown that my hunch about survey spam is correct. Around 2023, I began marking all surveys as spam, and I&rsquo;ve got the data to prove just how rampantly companies have used surveys to get their brand in your inbox.</p>
<h2 id="emails-that-matter">Emails that matter</h2>
<p>My main goal in exploring my emails in-depth was to build a predictor of whether an email had any <strong>future usefulness</strong>.</p>
<p>Mail from friends and family, correspondence with students, receipts and financial records, etc. fit the <strong>binary of keep</strong>. After I manually processed this massive backlog of email (with great help from Thunderbird&rsquo;s filters), the things I discarded most, besides spam, were surveys, newsletters, and other mass mailers.</p>
<p>What I found, qualitatively, was:</p>
<ul>
<li>imperfect spelling, capitalization, and grammar</li>
<li>little-to-no HTML markup</li>
<li>all emails meant only for me sans phishing and spam</li>
</ul>
<p>These traits defined true keepsakes.</p>
<h2 id="introducing-sanoma">Introducing sanoma</h2>
<p><a href="https://en.wiktionary.org/wiki/sanoma">en.wiktionary.org/wiki/sanoma</a></p>
<pre><code>sanoma (noun) Finnish 
  message, communication (a communication or the content of a physical
  message; also the message contained in some act or expression such as a
  work of art)
</code></pre>
<p><strong>sanoma</strong> (<a href="https://github.com/brege/sanoma">github.com/brege/sanoma</a>) uses YAML workflows to define multi-step analysis pipelines. The workflow runner automatically discovers and executes tools from the <code>sanoma/analysis/</code> and <code>sanoma/plot/</code> directories, making it easy to chain data extraction, filtering, analysis, and visualization into reproducible pipelines.</p>
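<p>To illustrate the idea only: the sketch below is not sanoma&rsquo;s schema or runner. The step format, the <code>TOOLS</code> registry, and both tool functions are hypothetical, and the workflow is written as the Python objects a YAML file might load into.</p>

```python
def keep_matching(records, keyword):
    """Hypothetical filter tool: keep records whose text contains the keyword."""
    return [r for r in records if keyword in r.lower()]


def count(records):
    """Hypothetical analysis tool: return the number of records."""
    return len(records)


# Hypothetical registry; the real project discovers its tools from directories.
TOOLS = {"keep_matching": keep_matching, "count": count}

# A workflow as it might look after a YAML file is loaded into Python objects.
WORKFLOW = [
    {"tool": "keep_matching", "args": {"keyword": "survey"}},
    {"tool": "count", "args": {}},
]


def run_workflow(steps, data):
    """Run the steps in order, feeding each step's output into the next."""
    for step in steps:
        tool = TOOLS[step["tool"]]
        data = tool(data, **step["args"])
    return data
```

<p>Because the steps are plain data, the same runner can execute an analysis pipeline or an end-to-end test manifest without code changes.</p>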
<p>I developed this YAML workflow method in my Markdown-to-PDF project&ndash;<strong><a href="https://github.com/brege/oshea">oshea</a></strong>&ndash;where I realized comprehensive end-to-end tests were just manifest workflows. It&rsquo;s an intuitive way to string command line sequences together. The <em>pipeline</em> term in machine learning/data science is congruent to this system.</p>
<h2 id="data-mining">Data Mining</h2>
<p>While much of this can be done in a Jupyter notebook (far easier to refresh plots this way, although <code>:MarkdownPreview</code> in <strong>Neovim</strong> is sufficient), I built this project as a way to data-mine my own activity. I also want to create a visualization harness for many things on my computer:</p>
<ul>
<li>text message history</li>
<li>email history</li>
<li>screenshot frequency</li>
<li>browser history and bookmarks</li>
</ul>
<p>Because email is text-based, and because my first concept of &ldquo;AI&rdquo; was the need for combative spam filters that have been built over the last thirty years, email felt like a good starting point.</p>
<h2 id="grad-school-emails">Grad-school Emails</h2>
<p>The monthly timeline reveals the academic year rhythm: high volume during active semesters with dramatic drops during summer breaks and winter holidays. The 2016-2017 dip corresponds to the dissertation defense period, when militant email sanitation was a reprieve from LaTeX and simulation monitoring.</p>
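<p>A monthly timeline like this can be sketched from the <code>Date</code> headers alone, using only the standard library. How the headers are pulled out of the mailbox is assumed to happen elsewhere, and <code>monthly_counts</code> is a hypothetical name.</p>

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def monthly_counts(date_headers):
    """Count messages per (year, month) from RFC 5322 Date header strings."""
    counts = Counter()
    for header in date_headers:
        try:
            sent = parsedate_to_datetime(header)
        except (TypeError, ValueError):
            # Skip headers that cannot be parsed as dates.
            continue
        counts[(sent.year, sent.month)] += 1
    return counts
```

<p>Sorting the keys gives the x-axis of the timeline; the values are the monthly volumes.</p>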
<p>My personal dataset has about 35K emails between my grad-school emails and <a href="https://brege.org">my current website&rsquo;s</a> personal email. Not included are my Gmail and undergrad email(s). I plan on synchronizing those at a later date.</p>
<h3 id="grad-school-timeline-seasonality">Grad-school Timeline Seasonality</h3>
<p><img alt="Grad-school Emails (monthly)" loading="lazy" src="/post/email-analysis/img/wsu/timeline.png"></p>
<p>WSU&rsquo;s Okta system required changing passwords every 6 months, and some time after my defense my account died. I am thankful that I had a Thunderbird profile tucked away on a drive that allowed me to recover all of my university emails.</p>
<h3 id="grad-school-and-onward-histogram">Grad-school and onward Histogram</h3>
<p><img alt="Grad-school Emails (yearly)" loading="lazy" src="/post/email-analysis/img/wsu/histogram.png"></p>
<p>The year-over-year histogram demonstrates consistent academic seasonality, with September to April peaks and May to mid-August valleys across all years of graduate study. Even when I was teaching summer labs, the bureaucratic pressure died down in the summertime. I loved teaching in the summer.</p>
<h2 id="spam-marketing-and-surveys">Spam, Marketing, and Surveys</h2>
<p>The spam timeline shows minimal marketing emails pre-2010, followed by a sharp increase around university enrollment. By 2015, spam reached 60-80% of all emails and has remained consistently high. The GDPR implementation around 2018 created a spike in <code>unsubscribe</code> language as companies scrambled to comply with new regulations.</p>
<h3 id="marketing-spam-trends">Marketing Spam Trends</h3>
<p><img alt="Spam Timeline" loading="lazy" src="/post/email-analysis/img/spam/timeline.png"></p>
<p>The tail at the beginning of this timeline is presented for context. It only includes a &ldquo;purified&rdquo; Hotmail mailbox from my teenage years that extended a bit into my undergrad years. Those years overlap with Gmail usage (not integrated into this data) and my GVSU university email.</p>
<h3 id="keyword-buckets">Keyword Buckets</h3>
<p><img alt="Spam Keywords" loading="lazy" src="/post/email-analysis/img/spam/keywords.png"></p>
<p>Another useful filter for spam emails is checking for keywords like <strong><code>unsubscribe</code></strong> in the message body.</p>
<p><code>unsubscribe_bait</code> dominates with over 12,500 matches, followed by <code>satisfaction</code> surveys (~8k) and direct &ldquo;survey&rdquo; requests (~4k). This reveals how modern marketing shifted from direct sales to engagement-focused tactics requesting feedback and reviews.</p>
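<p>A minimal sketch of keyword bucketing, assuming message bodies are already plain text. The bucket names follow the chart, but the patterns here are guesses made for illustration, not the ones used for the numbers above.</p>

```python
import re
from collections import Counter

# Hypothetical patterns; the real keyword lists behind each bucket are not shown.
BUCKETS = {
    "unsubscribe_bait": re.compile(r"unsubscribe", re.IGNORECASE),
    "satisfaction": re.compile(r"satisf(?:ied|action)", re.IGNORECASE),
    "survey": re.compile(r"\bsurveys?\b", re.IGNORECASE),
}


def bucket_counts(bodies):
    """Count how many message bodies match each keyword bucket at least once."""
    counts = Counter()
    for body in bodies:
        for name, pattern in BUCKETS.items():
            if pattern.search(body):
                counts[name] += 1
    return counts
```

<p>A single message can land in several buckets, so the bucket totals may add up to more than the number of messages.</p>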
<h3 id="conclusion-satisfaction-surveys-are-the-new-email-cancer">Conclusion: Satisfaction Surveys are the new email cancer</h3>
<p><img alt="Spam Heatmap" loading="lazy" src="/post/email-analysis/img/spam/heatmap.png"></p>
<p>The heatmap (filtered to post-2010) shows &ldquo;satisfaction&rdquo; spam as the most persistent threat, maintaining 20-25% frequency from 2012 onwards. Survey-based spam shows steady growth, intensifying after 2020, when GDPR constraints pressured companies to invent new angles of attack and the pandemic left them increasingly desperate for customer &ldquo;feedback&rdquo; (attention). <strong>Satisfaction feedback surveys are advertisements.</strong></p>
]]></content:encoded>
    </item>
  </channel>
</rss>
