<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Data-Science on brege.org</title>
    <link>https://brege.org/categories/data-science/</link>
    <description>Recent content in Data-Science on brege.org</description>
    <generator>Hugo</generator>
    <language>en</language>
    <copyright>Copyright (c) 2016-2026 Wyatt Brege</copyright>
    <lastBuildDate>Sun, 12 Apr 2026 22:51:32 -0400</lastBuildDate>
    <atom:link href="https://brege.org/categories/data-science/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Exploring my camera, screenshot, and image activity</title>
      <link>https://brege.org/post/image-activity/</link>
      <pubDate>Wed, 11 Feb 2026 16:09:29 -0500</pubDate>
      <guid>https://brege.org/post/image-activity/</guid>
      <description>A data exploration project around my personal image collection habits.</description>
      <content:encoded><![CDATA[<p>This project is <a href="/series/personal-data-explorations">part of a series of data exploration projects</a> around my personal computer usage.</p>
<p>GitHub link: <a href="https://github.com/brege/image-activity">github.com/brege/image-activity</a></p>
<h2 id="overview">Overview</h2>
<ul>
<li>Generate heatmaps and histograms of image saving activity over hours, days, and months</li>
<li>Use file timestamps, modified-times, EXIF, and regex parsing for refined image discovery</li>
<li>Add bands and markers for major life events</li>
</ul>
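<p>A minimal sketch of the timestamp-discovery step above, assuming filenames carry a <code>YYYYMMDD</code> and <code>HHMMSS</code> stamp. The pattern and the function names are hypothetical and are not taken from the project. EXIF is left out here because reading it needs a third-party library.</p>

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Hypothetical filename pattern: matches names like
# "Screenshot_20230914-153012.png" or "IMG_20230914_153012.jpg".
# A real collection will likely need more patterns than this one.
STAMP = re.compile(r"(\d{4})(\d{2})(\d{2})[-_](\d{2})(\d{2})(\d{2})")


def timestamp_from_name(name: str) -> Optional[datetime]:
    """Return the datetime embedded in a filename, or None if absent or invalid."""
    match = STAMP.search(name)
    if match is None:
        return None
    try:
        return datetime(*(int(part) for part in match.groups()))
    except ValueError:  # e.g. "month 13" from a coincidental run of digits
        return None


def best_timestamp(path: Path) -> datetime:
    """Prefer the filename stamp; fall back to the file's modified time."""
    stamp = timestamp_from_name(path.name)
    if stamp is not None:
        return stamp
    return datetime.fromtimestamp(path.stat().st_mtime)
```

<p>Preferring the filename over the modified time is a design assumption: modified times tend to change when files are copied between devices, while a stamp in the name usually survives.</p>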
<h2 id="background">Background</h2>
<p>I wanted to determine if my image activity is dependent on major events and device purchases in my life.</p>
<ul>
<li>do I tend to take more pictures during certain times of year?</li>
<li>how has my screenshot usage evolved over the last 15 years?</li>
<li>do I have &ldquo;honeymoon&rdquo; periods after a device purchase?</li>
<li>in what ways has my camera and screenshot usage changed between being an academic, chef, and developer?</li>
</ul>
<p>I&rsquo;m not a social media person, although my <a href="https://mastodon.social/@brege">mastodon</a> did see an uptick of usage following my hip surgery, where I began hiking and foraging a lot.</p>
<p>My image activity fits in three main categories:</p>
<ol>
<li><strong>camera</strong>: storage of camera photos from my phone</li>
<li><strong>screenshots</strong>: screenshots on both my laptop and phone</li>
<li><strong>internet</strong>: pictures downloaded from the internet</li>
</ol>
<h2 id="gallery">Gallery</h2>
<p>In these first line charts, <a href="#camera-usage">Camera Usage</a> and <a href="#image-capture-concurrency">Image Capture Concurrency</a>, I&rsquo;ve marked the times when I purchased a major device (a new phone or laptop) and a couple of key periods of my life. These plots have all been normalized to a 0-100 photo count scale.</p>
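<p>One way to put counts on a 0-100 scale is to divide by the peak, sketched below. Peak scaling is an assumption here (the plots may use a different normalization), and <code>normalize_to_100</code> is a hypothetical name.</p>

```python
def normalize_to_100(counts):
    """Scale a sequence of counts so the largest value maps to 100."""
    peak = max(counts, default=0)
    if peak == 0:
        # Nothing to scale: an empty series or all zeros.
        return [0.0 for _ in counts]
    return [100.0 * c / peak for c in counts]
```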
<h3 id="camera-usage">Camera Usage</h3>
<p>From 2010 to 2017 I was a Physics TA and, following my 2014 physics prelims, a computational astrophysics doctoral researcher. I began attending conferences in 2015, exploring places around Pullman, WA during these researcher years there.</p>
<img src="img/combined/panel.png" width="100%">
<p>At the end of 2017, I left that life. I embraced my love of food and cooking and became a professional chef for a number of years thereafter, including the Covid-19 pandemic. This period of my life saw a greater number of photos taken: pictures of plates, menus, schedules, etc. My camera photos before this time were mostly non-work-related: travel, events, and pets drove image origination.</p>
<h3 id="image-capture-concurrency">Image Capture Concurrency</h3>
<img src="img/combined/sum.png" width="100%">
<h3 id="heatmaps">Heatmaps</h3>
<p>I only have one experience with online coursework: the data science bootcamp I attended in the fall of 2023. This period did not have a major impact on my screenshotting habits. There are three principal areas in which screenshot usage was more frequent:</p>
<ol>
<li>The creation of my website <a href="https://brege.org">brege.org</a> around August 2016.</li>
<li>As an executive chef, screenshotting is recurrent for scheduling, text message records, receipts/purchase dates, etc.</li>
<li>Agentic-driven coding workflows, beginning midway through 2025, saw a surge in screenshot usage. Screenshots have become a large part of my front-end debugging workflow for web app development&ndash;extending well beyond data-structured <a href="https://www.cypress.io">Cypress</a> end-to-end tests.</li>
</ol>
<p>I did not find that my screenshot usage changed noticeably during my brief stint with online coursework.</p>
<table>
  <tr>
    <td><img src="img/screenshot/heatmap-laptop.png" width="100%"></td>
    <td><img src="img/screenshot/heatmap-phone.png" width="100%"></td>
    <td><img src="img/camera/heatmap-phone.png" width="100%"></td>
  </tr>
</table>
<p>In general, it appears that I take more screenshots on desktop earlier in the week and in the afternoon (averaged over the last ~15 years). To my surprise, the heatmap for screenshots on my phone has nearly identical densities. I assumed this would be biased toward the weekend and closer to 17:00 because of sports and restaurant dinner service.</p>
<p>Camera usage, on the other hand, stands out by day of week only in its higher density on Thursday evenings and Saturday afternoons. This pattern is especially pronounced in both my chef days and my post-op mobility period.</p>
<h3 id="histograms">Histograms</h3>
<p>Histograms give a finer view of the distribution in one dimension. Here the images are grouped by device and source, then binned by hour of the day, day of the week, and month of the year.</p>
<table>
  <tr>
    <td><img src="img/screenshot/hour.png" width="100%"></td>
    <td><img src="img/combined/hour.png" width="100%"></td>
  </tr>
</table>
<p>For the hourly concentration of all three photo habits, my activity roughly follows a Boltzmann distribution.</p>
<p>These distributions generally peak at two distinct hours:</p>
<ul>
<li>camera photos and screenshots center around 15:00</li>
<li>internet photos are generally concentrated around 20:00</li>
</ul>
<p>Each bin is averaged for each picture type over the last 15 years, regardless of timezone.</p>
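<p>The binning behind these histograms can be sketched as follows. This is a minimal illustration with made-up timestamps, not the project&rsquo;s own code, and <code>bin_timestamps</code> is a hypothetical name. Timestamps are treated as naive local times, in keeping with the &ldquo;regardless of timezone&rdquo; note above.</p>

```python
from collections import Counter
from datetime import datetime


def bin_timestamps(stamps):
    """Count timestamps by hour of day, weekday (0 = Monday), and month."""
    stamps = list(stamps)  # allow any iterable, since it is read three times
    hours = Counter(s.hour for s in stamps)
    weekdays = Counter(s.weekday() for s in stamps)
    months = Counter(s.month for s in stamps)
    return hours, weekdays, months


# Made-up sample timestamps, for illustration only.
sample = [
    datetime(2024, 7, 4, 15, 5),
    datetime(2024, 7, 6, 15, 40),
    datetime(2024, 12, 24, 20, 15),
]
hours, weekdays, months = bin_timestamps(sample)
```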
<table>
  <tr>
    <td><img src="img/combined/day.png" width="100%"></td>
    <td><img src="img/combined/month.png" width="100%"></td>
  </tr>
</table>
<p>Image activity generally increases at the beginning and end of standard university semesters, which also coincide with the height of summer and the holiday period, when I am always travelling. Screenshotting is highest from the fall to mid-winter.</p>
<p>In my experience, restaurants are historically busier between, roughly, Friendsgiving and Father&rsquo;s Day. Camera usage is also largest during high summer: beach trips, hiking, and, during my chef years, produce selection.</p>
]]></content:encoded>
    </item>
    <item>
      <title>20 years of email</title>
      <link>https://brege.org/post/email-analysis/</link>
      <pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://brege.org/post/email-analysis/</guid>
      <description>Making my case that satisfaction surveys are the new email cancer.</description>
      <content:encoded><![CDATA[<p>I&rsquo;m not intentionally a data hoarder. I just haven&rsquo;t been an effective or aggressive email deleter or filter user. This has changed some in recent years, as the techniques for spam emails have evolved to covertly trojan &ldquo;survey&rdquo; subterfuge into my mailbox.</p>
<h2 id="i-have-survey-fatigue">I have survey fatigue</h2>
<p>Surveys are marketing emails. I can&rsquo;t believe I used to take the time to respond to some of them. An analysis of my email history has shown that my hunch on survey spam is correct. Around 2023, I began marking all surveys as spam, and I&rsquo;ve got the data to prove just how rampantly companies have used surveys to get their brand in your inbox.</p>
<h2 id="emails-that-matter">Emails that matter</h2>
<p>My main goal in exploring my emails in-depth was to build a predictor of whether an email had any <strong>future usefulness</strong>.</p>
<p>Mail from friends and family, correspondence with students, receipts and financial records, etc. fit the <strong>binary of keep</strong>. After I manually processed this massive backlog of email (with great help from Thunderbird&rsquo;s filters), I found that what I discarded most, besides spam, were surveys, newsletters, and other mass mailers.</p>
<p>What I found, qualitatively, in the emails worth keeping was:</p>
<ul>
<li>imperfect spelling, capitalization, and grammar</li>
<li>little-to-no HTML markup</li>
<li>addressed only to me (phishing and spam excepted)</li>
</ul>
<p>These traits defined true keepsakes.</p>
<h2 id="introducing-sanoma">Introducing sanoma</h2>
<p><a href="https://en.wiktionary.org/wiki/sanoma">en.wiktionary.org/wiki/sanoma</a></p>
<pre><code>sanoma (noun) Finnish 
  message, communication (a communication or the content of a physical
  message; also the message contained in some act or expression such as a
  work of art)
</code></pre>
<p><strong>sanoma</strong> (<a href="https://github.com/brege/sanoma">github.com/brege/sanoma</a>) uses YAML workflows to define multi-step analysis pipelines. The workflow runner automatically discovers and executes tools from the <code>sanoma/analysis/</code> and <code>sanoma/plot/</code> directories, making it easy to chain data extraction, filtering, analysis, and visualization into reproducible pipelines.</p>
<p>I developed this YAML workflow method in my Markdown-to-PDF project&ndash;<strong><a href="https://github.com/brege/oshea">oshea</a></strong>&ndash;where I realized comprehensive end-to-end tests were just manifest workflows. It&rsquo;s an intuitive way to string command line sequences together. The <em>pipeline</em> term in machine learning/data science is congruent to this system.</p>
<h2 id="data-mining">Data Mining</h2>
<p>While much of this can be done in a Jupyter notebook (far easier to refresh plots this way, although <code>:MarkdownPreview</code> in <strong>Neovim</strong> is sufficient), I built this project as a way to data-mine my own activity. I also want to create a visualization harness for many things on my computer:</p>
<ul>
<li>text message history</li>
<li>email history</li>
<li>screenshot frequency</li>
<li>browser history and bookmarks</li>
</ul>
<p>Because email is text-based, and because my first concept of &ldquo;AI&rdquo; was the need for combative spam filters that have been built over the last thirty years, email felt like a good starting point.</p>
<h2 id="grad-school-emails">Grad-school Emails</h2>
<p>The monthly timeline reveals the academic year rhythm: high volume during active semesters with dramatic drops during summer breaks and winter holidays. The 2016-2017 dip corresponds to the dissertation defense period, when militant email sanitation was a reprieve from LaTeX and simulation monitoring.</p>
<p>My personal dataset has about 35K emails between my grad-school emails and <a href="https://brege.org">my current website&rsquo;s</a> personal email. Not included are my Gmail and undergrad email(s). I plan on synchronizing those at a later date.</p>
<h3 id="grad-school-timeline-seasonality">Grad-school Timeline Seasonality</h3>
<p><img alt="Grad-school Emails (monthly)" loading="lazy" src="/post/email-analysis/img/wsu/timeline.png"></p>
<p>WSU&rsquo;s Okta system required changing passwords every 6 months, and some time after my defense my account died. I am thankful that I had a Thunderbird profile tucked away on a drive that allowed me to recover all of my university emails.</p>
<h3 id="grad-school-and-onward-histogram">Grad-school and onward Histogram</h3>
<p><img alt="Grad-school Emails (yearly)" loading="lazy" src="/post/email-analysis/img/wsu/histogram.png"></p>
<p>The year-over-year histogram demonstrates consistent academic seasonality, with September-April peaks and May &ndash; mid-August valleys across all years of graduate study. Even with teaching summer labs, the bureaucratic pressure dies down in the summertime. I loved teaching in the summer.</p>
<h2 id="spam-marketing-and-surveys">Spam, Marketing, and Surveys</h2>
<p>The spam timeline shows minimal marketing emails pre-2010, followed by a sharp increase around university enrollment. By 2015, spam reached 60-80% of all emails and has remained consistently high. The GDPR implementation around 2018 created a spike in <code>unsubscribe</code> language as companies scrambled to comply with new regulations.</p>
<h3 id="marketing-spam-trends">Marketing Spam Trends</h3>
<p><img alt="Spam Timeline" loading="lazy" src="/post/email-analysis/img/spam/timeline.png"></p>
<p>The tail in the beginning of this timeline is presented for context.
It only includes a &ldquo;purified&rdquo; hotmail account mailbox from my teenage years that extended a bit into my undergrad years. Those years overlap with Gmail usage (not integrated into this data) and my GVSU university email.</p>
<h3 id="keyword-buckets">Keyword Buckets</h3>
<p><img alt="Spam Keywords" loading="lazy" src="/post/email-analysis/img/spam/keywords.png"></p>
<p>Another useful filter for spam emails is checking for keywords like <strong><code>unsubscribe</code></strong> in the message body.</p>
<p><code>unsubscribe_bait</code> dominates with over 12,500 matches, followed by <code>satisfaction</code> surveys (~8k) and direct &ldquo;survey&rdquo; requests (~4k). This reveals how modern marketing shifted from direct sales to engagement-focused tactics requesting feedback and reviews.</p>
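<p>Keyword bucketing of this kind can be sketched in a few lines. The bucket names follow the ones above, but the regular expressions are illustrative guesses and are not the patterns sanoma actually uses. <code>count_buckets</code> is a hypothetical name.</p>

```python
import re
from collections import Counter

# Bucket names follow the post; the patterns are illustrative guesses,
# not the ones the project actually uses.
BUCKETS = {
    "unsubscribe_bait": re.compile(r"\bunsubscribe\b", re.IGNORECASE),
    "satisfaction": re.compile(r"\bsatisf(?:ied|action)\b", re.IGNORECASE),
    "survey": re.compile(r"\bsurveys?\b", re.IGNORECASE),
}


def count_buckets(bodies):
    """Count how many message bodies match each keyword bucket at least once."""
    totals = Counter()
    for body in bodies:
        for name, pattern in BUCKETS.items():
            if pattern.search(body):
                totals[name] += 1
    return totals
```

<p>A message can land in more than one bucket, so under this sketch the bucket totals may add up to more than the number of emails.</p>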
<h3 id="conclusion-satisfaction-surveys-are-the-new-email-cancer">Conclusion: Satisfaction Surveys are the new email cancer</h3>
<p><img alt="Spam Heatmap" loading="lazy" src="/post/email-analysis/img/spam/heatmap.png"></p>
<p>The heatmap (filtered to post-2010) shows &ldquo;satisfaction&rdquo; spam as the most persistent threat, maintaining 20-25% frequency from 2012 onwards. Survey-based spam shows steady growth, intensifying after 2020, when GDPR constraints pressured companies to invent new angles of attack and the pandemic left them increasingly desperate for customer &ldquo;feedback&rdquo; (attention). <strong>Satisfaction feedback surveys are advertisements.</strong></p>
]]></content:encoded>
    </item>
    <item>
      <title>The Flavor Network</title>
      <link>https://brege.org/post/the-flavor-network/</link>
      <pubDate>Wed, 04 Jan 2023 04:04:49 -0500</pubDate>
      <guid>https://brege.org/post/the-flavor-network/</guid>
      <description>This tool allows you to explore the flavor network, a social graph for flavor profiles. The network is based on the &lt;a href=&#34;https://karenandandrew.com/books/the-flavor-bible/&#34;&gt;&lt;em&gt;Flavor Bible&lt;/em&gt;&lt;/a&gt; and soon the companion book &lt;a href=&#34;https://karenandandrew.com/books/what-to-drink-with-what-you-eat/&#34;&gt;&lt;em&gt;What to Drink with What You Eat&lt;/em&gt;&lt;/a&gt;.</description>
      <content:encoded><![CDATA[




<link rel="stylesheet" href="/css/network.css">

<div id="network-title">
  <div id="recipe-link"></div>
</div>
<div id="network"
  data-nodes-path=/data/flavor/nodes.json 
  data-edges-path=/data/flavor/edges.json
  data-sim-path=/data/flavor/similarity.json
></div>
<div id="network-settings">
  <div id="physics-toggle">
    <label for="physics">
      dynamics
    </label>
    <input type="checkbox" id="physics" checked>
  </div>
  <div id="scroll-toggle">
    <label for="click-to-use">
      zoom lock
    </label>
    <input type="checkbox" id="click-to-use" checked>
  </div>
  <div id="lenses-dropdown">
    <label for="lenses" title="used to see flavor combinations by 'social circle' or 'besties'">
      lens:
    </label>
    <select id="lenses">
      <option value="similarity" title="'friends list' - see ingredients that have similar overall 'friendships' as my recipe items" selected>
        similarity
      </option>
      <option value="affinity" title="'besties' - see ingredients that are 'best friends' with my recipe items">
        affinity
      </option>
      <option value="hybrid" title="uses a mix of 'friendships' and 'besties' to see flavor combinations">
        hybrid
      </option>
    </select>
  </div>
</div>

<script src="https://visjs.github.io/vis-network/standalone/umd/vis-network.min.js"></script>
<script src="/js/flavor-network.js"></script>




<link rel="stylesheet" href="/css/search-bar.css">
<div id="searchbox">
  <div id="search-form" data-search-path=/data/flavor/nodes.json>
    <input id="search-input" autofocus placeholder="Search.." aria-label="search" type="search" autocomplete="off">
  </div>
  <div id="search-results-container" aria-label="search results"></div>
</div>
<script src="/js/search-plots.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/3.4.6/fuse.min.js" ></script>


<p>This tool allows you to explore the flavor network, a social graph for flavor profiles.
The network is based on the
<a href="https://karenandandrew.com/books/the-flavor-bible/"><em>Flavor Bible</em></a> and soon the companion book
<a href="https://karenandandrew.com/books/what-to-drink-with-what-you-eat/"><em>What to Drink with What You Eat</em></a>.</p>
<p>Search for an ingredient you like, and the graph will refine to give you a web of ingredients that share highly similar flavor profiles.
Then, click on a new ingredient in the network to add it to your recipe above the search box (or to remove it).
Clicking on a recipe item or a node has the same effect.
Search is not sorted by the flavor metric; it is instead sorted <a href="https://fusejs.io/">lexically</a>.</p>
<p>In this way, you can start building out recipes, menu items and tastings from a consensus of flavor combinations.</p>
<h2 id="overview">Overview</h2>
<p>What you are seeing:</p>
<ul>
<li>the nodes with color are your recipe ingredients</li>
<li>the suggested ingredients are determined by <a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard <strong>similarity</strong></a> (default) or by one of the other options in the &lsquo;<strong>lens</strong>&rsquo; dropdown</li>
<li>if you choose the <strong>hybrid</strong> option, the suggested ingredients are fiducially split between:
<ul>
<li>the most similar ingredients in the flavor metric (<strong>similarity</strong>)</li>
<li>the most similar ingredients by text ranking (<strong>affinity</strong>)</li>
</ul>
</li>
<li>the edges from one ingredient to another are weighted by a consensus of chef and expert opinion <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></li>
<li>if your ingredient is missing, it was likely missing in the book (<em>quinoa</em>) or was pruned because its mentions were too sparse <sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup></li>
</ul>
<p>If a node is present without an edge, it means that the ingredient has a very good similarity with your recipe, but wasn&rsquo;t mentioned (connected) in its book-entry literally.
Reconstructing &lsquo;ghost&rsquo; entries and connections by training a model with listed affinities is one of the ultimate goals of this project.</p>
<p>The number of suggestions gradually decreases as you add more ingredients to your recipe.
This is for performance reasons, as is the physics simulation disabling itself when the graph destabilizes.
When that happens, maybe you discovered a flavor affinity.</p>
<p>I have included autogenerated links from your recipe basket to a few popular recipe resources above the network graph, including a database for cocktail mixing. <sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></p>
<h2 id="why-and-how">Why and how</h2>
<p>Why I chose this text for the dataset is probably already apparent to its readers, but the key thing to take away is that the authors did a fine job formatting something computer readable and human usable&ndash;a rare feat!
Most importantly, it is aggregated from <em>chefs</em>, from real humans in kitchens doing what works, what&rsquo;s delicious, and what&rsquo;s in season.
Recipe APIs don&rsquo;t have this kind of granularity; many rely too heavily on user data to seed recommendations.
To my knowledge, this is the only dataset of this kind.</p>
<p>Technical tools only involve <a href="https://visjs.org/">vis.js</a> for visualization and <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">BeautifulSoup</a> for parsing.
The data is scraped from the <em>Flavor Bible</em>, and the similarity matrix is calculated using <a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a> for <a href="https://en.wikipedia.org/wiki/Pairwise_comparison">pairwise comparisons</a>.
I am working on cleaning up the initial data with a mix of modern techniques, some concoction of
<a href="https://www.nltk.org/">nltk</a>,
<a href="https://github.com/seatgeek/fuzzywuzzy">fuzzywuzzy</a>
and/or
<a href="https://huggingface.co/docs/transformers/model_doc/bert">BERT</a>.
The current form was done entirely with regexp/bs4 parsing.
The suggested nodes can be improved by using a weighted Jaccard probability distribution (<a href="https://arxiv.org/abs/1809.04052">arXiv</a>).
Source code for this part of the calculation (the text → dataset chain) is <a href="https://github.com/brege/flavor-project">available on GitHub</a>.</p>
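<p>For readers unfamiliar with the metric, here is a minimal sketch of pairwise Jaccard similarity: the size of the intersection of two sets divided by the size of their union. The pairings below are made up for illustration and are not taken from the book, and this is not the project&rsquo;s own code.</p>

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a.union(b)
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a.intersection(b)) / len(union)


# Made-up pairings for illustration; not taken from the book.
pairings = {
    "basil": {"tomato", "garlic", "lemon", "olive oil"},
    "oregano": {"tomato", "garlic", "olive oil", "lamb"},
    "mint": {"lamb", "lemon", "peas"},
}

# Pairwise comparison: one score per unordered pair of ingredients.
similarity = {
    (x, y): jaccard(pairings[x], pairings[y])
    for x, y in combinations(sorted(pairings), 2)
}
```

<p>In this toy data, basil and oregano share three of five distinct pairings, so they score higher with each other than either does with mint. Two ingredients can score well this way even if neither lists the other directly.</p>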
<h2 id="inspiration">Inspiration</h2>
<p>In 2019, I was helping fellow chefs come up with new specials.
At this point in time, we were rolling about four to ten new specials as a team every week,
ranging from brunch, cocktails, lunch, football apps and our highly anticipated farm-to-fork pop-up dinners.
But sometimes you just get plain stuck.
A good trick, at least for creativity, is to set rules so you have some boundaries to push.
But if you are going to set rules, they should at least solve a few things:</p>
<ol>
<li>do something new</li>
<li>use something old</li>
<li>feature three things in season</li>
</ol>
<p>I hate having extra stuff around, but I love new stuff coming in, yet I don&rsquo;t like wasting things, but then I actually look forward to doing inventory.  Ah, Schrodinger&rsquo;s cook.</p>
<p>Specials:  we would work out new ideas together over the prep table.
Sometimes ideas required working things out on paper, usually butcher&rsquo;s,
and occasionally crude graphs of our plate setups evolved.
These were sketches of sauce and protein layouts, heavy edges between ideas if their pairing &lsquo;sang&rsquo;, then as a guide hanging from the ticket rail on the night a feature debuted.</p>
<p>Karen and Andrew&rsquo;s book was gifted to me later that year, and it changed my game.
It finally put in words a mental ranking of flavor profiles based on ingredient query.
I had a good resource that gave answers, and especially new ideas, quickly.
And it was thorough enough to trust.</p>
<p>This method was so helpful, I started dreaming of a computer tool to help me sketch out this process. I remembered a reddit
<a href="https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/">post</a>
that
<a href="https://www.reddit.com/r/dataisbeautiful/comments/ae88pk/interactive_visualization_of_related_subreddits/">spurred others</a>
to lay out some of the underlying ideas here: overlapping communities :: compatible flavors.</p>
<h2 id="broader-thoughts">Broader thoughts</h2>
<p>I believe applying mathematical concepts to the broader culinary world can be a major upgrade in our thoughtfulness about food.
They should be used to extend creativity and clarity, not abused as statistics to pressure a sale and disable the <em>creative mind</em>.
While I do see how a tool like this could provide immense practical application in the distribution world, my focus here is to empower chefs, bartenders, brewers, baristas, and sommeliers to create new things.</p>
<p>When it comes to tools available to chefs,
compared to musicians, writers, and artists,
chefs are unfortunately at a disadvantage creatively.
Yes, we have recipes, but those are instructions, and do little to help us build on <a href="https://ruhlman.com/ruhlmans-books/">ratio</a> or <a href="https://www.saltfatacidheat.com/">balance</a>.
What might be more helpful, I think, is a playground for putting new food ideas together.</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>In the book, the weight of the pairing is given by the emphasis of the text:</p>
<ul>
<li>normal text means mentioned by at least one expert</li>
<li><strong>bold</strong> is recommended by many experts</li>
<li><strong>BOLD CAPS</strong> is highly recommended</li>
<li>*<strong>BOLD CAPS</strong> is the &ldquo;Holy Grail&rdquo; of pairings</li>
</ul>
<p>If the ingredient is not mentioned, it is given no weight (or edge) but it does not mean a flavor pairing doesn&rsquo;t exist.
This is part of the purpose of this tool! Lastly, there are a few dozen mentions of &ldquo;Avoid&rdquo;, which should be thought of as opposite charges.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>If you encounter a bug, please feel free to contact me by <a href="mailto:wyatt@brege.org">email</a>
or open an issue on
<a href="https://github.com/brege/flavor-project/issues">GitHub</a>!&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3">
<p>When <em>What to Drink</em> has been parsed and merged with the network, the latter link in the recipe site list should become much more robust. How fun!&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Les Miserables</title>
      <link>https://brege.org/post/les-miserables-graph-search/</link>
      <pubDate>Sat, 24 Dec 2022 05:30:47 -0500</pubDate>
      <guid>https://brege.org/post/les-miserables-graph-search/</guid>
      <description>A network graph of character connections from one of my favorite books
and authors of all time, Victor Hugo&amp;rsquo;s Les Miserables.</description>
      <content:encoded><![CDATA[<p><em>Les Miserables is one of my favorite books.  I read most of the original translation on a train ride to Portland, OR from Chicago, IL back in 2008 and enjoyed the remainder on the return trip back East.  It taught me compassion: when Valjean places the coin in Cosette&rsquo;s shoe.  Father Christmas always misses her.  There was an earlier passage of a man stepping on a coin in front of her, while she swept dressed in rags.</em></p>
<p>The graph may take a moment to load.</p>
<p>



<style>
  #network { height: 60vh; }   
</style>

<div id="network" data-nodes-path=data/nodes.json data-edges-path=data/edges.json></div>

<script src="https://visjs.github.io/vis-network/standalone/umd/vis-network.min.js"></script>
<script src="js/lesmis-network.js"></script>



<link rel="stylesheet" href="/css/search-bar.css">
<div id="searchbox">
  <div id="search-form" data-search-path=data/nodes.json>
    <input id="search-input" autofocus placeholder="Search.." aria-label="search" type="search" autocomplete="off">
  </div>
  <div id="search-results-container" aria-label="search results"></div>
</div>
<script src="/js/search-plots.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/3.4.6/fuse.min.js" ></script>

</p>
<p>The search bar is the major addition to the graphing methods.
Nodes can be clicked and added to a subgraph builder.
You can continue to search for new node members in the search bar
(which has a rudimentary autofill that&rsquo;s a straight JSON query)
and clicking on them will add them to the builder.
Simultaneously, the graph will reduce to one containing only
the nodes with edges linked to nodes in the builder.</p>
<p>Items can be removed from the builder either by clicking the little builder tabs or re-clicking the node.  Clearing the builder bar completely will redraw the whole graph.</p>
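<p>The reduction described above can be sketched as a filter over an edge list. This is an illustration in Python and is not the page&rsquo;s JavaScript. The sample edges are made up for the example and are not read from <code>edges.json</code>, and <code>reduce_graph</code> is a hypothetical name. Whether edges between two neighbors are kept is an assumption; here they are dropped.</p>

```python
def reduce_graph(edges, selected):
    """Return (nodes, edges) of the subgraph around the selected nodes.

    An empty selection returns the whole graph, mirroring what happens
    when the builder is cleared.
    """
    chosen = set(selected)
    if chosen:
        # Keep only edges that touch at least one selected node.
        kept_edges = [(a, b) for a, b in edges if a in chosen or b in chosen]
    else:
        kept_edges = list(edges)
    # The visible nodes are the selection plus every endpoint of a kept edge.
    nodes = chosen.union(*kept_edges)
    return nodes, kept_edges


# Made-up sample edges, for illustration only.
EDGES = [
    ("Valjean", "Cosette"),
    ("Valjean", "Javert"),
    ("Cosette", "Marius"),
    ("Marius", "Eponine"),
]
```

<p>With <code>Cosette</code> selected, only Valjean and Marius remain alongside her. Adding a second node to the selection widens the subgraph again.</p>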
<p>Testing and development were done on the mini pesto dataset I made for <a href="/post/what-is-pesto/">What is Pesto?</a>.  Recipe builder coming soon(!)</p>
<p>Please email me at <a href="mailto:wyatt@brege.org">wyatt@brege.org</a> with any questions.</p>
<p>Dataset can be found here:</p>
<ul>
<li><a href="data/nodes.json"><code>nodes.json</code></a></li>
<li><a href="data/edges.json"><code>edges.json</code></a></li>
</ul>
<blockquote>
<p>Lingering annoyances:</p>
<ul>
<li>Slow</li>
<li>Javascript needs clean up</li>
<li>I have great fear running this on my 700x3000 dataset..</li>
</ul>
</blockquote>
]]></content:encoded>
    </item>
    <item>
      <title>Network Graphs with Images</title>
      <link>https://brege.org/post/network-graphs-with-images/</link>
      <pubDate>Wed, 21 Dec 2022 02:15:04 -0500</pubDate>
      <guid>https://brege.org/post/network-graphs-with-images/</guid>
      <description>A followup to the Network Graphs in Hugo post, this time with avatars for
the nodes.</description>
      <content:encoded><![CDATA[<p>This is a follow-up to the previous post <a href="/post/network-graphs-in-hugo/">Network Graphs in Hugo</a>.
I&rsquo;m feeling fruity.  These aren&rsquo;t <em>all</em> tree fruits, but a few clusters organized by tree grafting compatibility.</p>




<style>
  #mynetwork {
    background-color: #EFEBE9;  
    border-radius: 10px;
    border: 1px solid #cccccc;
    margin: 5px 0 40px 0;
  }
</style>

<div id="mynetwork" data-nodes-path=data/nodes.json data-edges-path=data/edges.json></div>

<script src="https://visjs.github.io/vis-network/standalone/umd/vis-network.min.js"></script>
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
<script src="js/fruit-network.js"></script>

<ol>
<li>
<p>Data for the network is stored in two separate JSON files in this page bundle:</p>
<ul>
<li><a href="data/nodes.json"><code>nodes.json</code></a></li>
<li><a href="data/edges.json"><code>edges.json</code></a></li>
</ul>
</li>
<li>
<p>The shortcode and post-local javascript work together:</p>
<ul>
<li><code>fruit-network.html</code></li>
<li><a href="js/fruit-network.js"><code>fruit-network.js</code></a>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-html" data-lang="html"><span class="line"><span class="cl">{{ $nodesPath := .Get &#34;nodesPath&#34; }}
</span></span><span class="line"><span class="cl">{{ $edgesPath := .Get &#34;edgesPath&#34; }}
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">style</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="p">#</span><span class="nn">mynetwork</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">background-color</span><span class="p">:</span> <span class="mh">#f5f5f5</span><span class="p">;</span> <span class="c">/* a light gray color */</span>
</span></span><span class="line"><span class="cl">    <span class="k">border-radius</span><span class="p">:</span> <span class="mi">10</span><span class="kt">px</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="k">border</span><span class="p">:</span> <span class="mi">1</span><span class="kt">px</span> <span class="kc">solid</span> <span class="mh">#cccccc</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="k">margin</span><span class="p">:</span> <span class="mi">5</span><span class="kt">px</span> <span class="mi">0</span> <span class="mi">40</span><span class="kt">px</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">&lt;/</span><span class="nt">style</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">&#34;mynetwork&#34;</span> <span class="na">data-nodes-path</span><span class="o">=</span><span class="s">{{</span> <span class="err">$</span><span class="na">nodesPath</span> <span class="err">}}</span> <span class="na">data-edges-path</span><span class="o">=</span><span class="s">{{</span> <span class="err">$</span><span class="na">edgesPath</span> <span class="err">}}</span><span class="p">&gt;&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">&#34;https://visjs.github.io/vis-network/standalone/umd/vis-network.min.js&#34;</span><span class="p">&gt;&lt;/</span><span class="nt">script</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">&#34;https://code.jquery.com/jquery-3.6.0.min.js&#34;</span><span class="p">&gt;&lt;/</span><span class="nt">script</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">&#34;js/fruit-network.js&#34;</span><span class="p">&gt;&lt;/</span><span class="nt">script</span><span class="p">&gt;</span>
</span></span></code></pre></div></li>
</ul>
</li>
</ol>
<p>This provides a physics-driven network graph in which the nodes are images (all sourced from <a href="https://www.wikipedia.org/">Wikipedia</a>). The Hugo shortcode call, for completeness:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-markdown" data-lang="markdown"><span class="line"><span class="cl">{{&lt; fruit-network nodesPath=&#34;data/nodes.json&#34; edgesPath=&#34;data/edges.json&#34; scriptPath=&#34;js/fruit-network.js&#34; &gt;}}
</span></span></code></pre></div>]]></content:encoded>
    </item>
    <item>
      <title>Network Graphs in Hugo</title>
      <link>https://brege.org/post/network-graphs-in-hugo/</link>
      <pubDate>Fri, 09 Dec 2022 23:02:42 -0500</pubDate>
      <guid>https://brege.org/post/network-graphs-in-hugo/</guid>
      <description>First crack at making a simple toy network graph in Hugo.</description>
      <content:encoded><![CDATA[<p>This is a simple toy to see how a network graph can be added in a Hugo article.  I&rsquo;ll be testing new features on it as I learn new things.</p>




<div id="mynetwork" data-nodes-path="data/nodes.json" data-edges-path="data/edges.json">
  <script src="https://visjs.github.io/vis-network/standalone/umd/vis-network.min.js"></script>
  <script src="js/toy-network.js"></script>
</div>

<p>Relative to the root of the Hugo website directory, here are the basic files that make this interactive.
Note that the JSON data and CSS are added inline here to keep this tutorial focused on Hugo-specific structures.</p>
<ol>
<li>
<p>The javascript file lives in this page bundle:</p>
<ul>
<li><a href="js/toy-network.js"><code>toy-network.js</code></a></li>
</ul>
</li>
<li>
<p>This file accesses data for the nodes and edges from two JSON files in this page bundle:</p>
<ul>
<li><a href="data/nodes.json"><code>nodes.json</code></a></li>
<li><a href="data/edges.json"><code>edges.json</code></a></li>
</ul>
</li>
<li>
<p>In the shortcodes directory <code>/layouts/shortcodes/</code>:</p>
<ul>
<li><code>toy-network.html</code>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-html" data-lang="html"><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">&#34;mynetwork&#34;</span> <span class="na">data-nodes-path</span><span class="o">=</span><span class="s">&#34;data/nodes.json&#34;</span> <span class="na">data-edges-path</span><span class="o">=</span><span class="s">&#34;data/edges.json&#34;</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="p">&lt;</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">&#34;https://visjs.github.io/vis-network/standalone/umd/vis-network.min.js&#34;</span><span class="p">&gt;&lt;/</span><span class="nt">script</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="p">&lt;</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">&#34;js/toy-network.js&#34;</span><span class="p">&gt;&lt;/</span><span class="nt">script</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="cl"><span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span></code></pre></div></li>
</ul>
</li>
<li>
<p>Create the post as you normally would in Hugo, but invoke the shortcode within the body of your markdown:</p>
<ul>
<li><code>index.md</code>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-markdown" data-lang="markdown"><span class="line"><span class="cl">{{&lt; toy-network nodesPath=&#34;data/nodes.json&#34; edgesPath=&#34;data/edges.json&#34; scriptPath=&#34;js/toy-network.js&#34; &gt;}}
</span></span></code></pre></div></li>
</ul>
</li>
</ol>
<p>This will provide the simple network graph above.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Hockey Catch-all Statistics versus Salary Cap</title>
      <link>https://brege.org/post/hockey-fgvt-and-salary-caps/</link>
      <pubDate>Tue, 07 Nov 2017 11:11:52 -0800</pubDate>
      <guid>https://brege.org/post/hockey-fgvt-and-salary-caps/</guid>
      <description>&lt;p&gt;This project &lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; is motivated by the &amp;ldquo;&lt;a href=&#34;https://en.wikipedia.org/wiki/Wins_Above_Replacement&#34;&gt;WAR&lt;/a&gt;&amp;rdquo; stat in baseball, where I have adopted the &amp;ldquo;Goals vs. Threshold&amp;rdquo; (GVT) statistic from &lt;a href=&#34;https://web.archive.org/web/20130407214751/http://hockeyprospectus.com/article.php?articleid=236&#34;&gt;Tom Awad&lt;/a&gt;.  Here, I only consider the Offensive GVT for forward skaters and defensemen (OGVT).&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>This project <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> is motivated by the &ldquo;<a href="https://en.wikipedia.org/wiki/Wins_Above_Replacement">WAR</a>&rdquo; stat in baseball, where I have adopted the &ldquo;Goals vs. Threshold&rdquo; (GVT) statistic from <a href="https://web.archive.org/web/20130407214751/http://hockeyprospectus.com/article.php?articleid=236">Tom Awad</a>.  Here, I only consider the Offensive GVT for forward skaters and defensemen (OGVT).</p>
<p>I take as input the spreadsheet provided by <a href="http://www.hockeyabstract.com/testimonials/nhl2016-17playerdata">Robert Vollman</a>, which has not been updated with GVT data yet.  I made minor modifications to his spreadsheet in LibreOffice Calc so that it exports cleanly to CSV.  The code calculates OGVT for each player, weighted per minute against his own team&rsquo;s Threshold Offensive Contribution by forwards ($TOC_F$) or defensemen ($TOC_D$), rather than against a league-wide threshold.</p>
<p>To weigh a goal against an assist, I assume that a goal contributes 1.5 times as much as an assist does.  The calculated goal value and assist value for an entity $x$ are then
$$
\begin{aligned}
GV_x &amp;= \frac{1.5 G_x}{A_x + 1.5 G_x}, \\
AV_x &amp;= \frac{GV_x}{1.5}
\end{aligned}
$$
where $G_x$ is the number of goals scored by an individual ($x=i$), a team ($x=T$), or the league as a whole ($x=L$), and $A_x$ is the corresponding number of assists.</p>
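<p>As a quick illustration with made-up numbers (not taken from the spreadsheet), consider a skater with $G_i = 30$ goals and $A_i = 45$ assists:
$$
\begin{aligned}
GV_i &amp;= \frac{1.5 \times 30}{45 + 1.5 \times 30} = \frac{45}{90} = 0.5, \\
AV_i &amp;= \frac{0.5}{1.5} \approx 0.333
\end{aligned}
$$
so each goal is weighted at $0.5$ and each assist at roughly $0.333$, which preserves the assumed $1.5 : 1$ ratio.</p>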
<p>The threshold offensive contribution of all forwards, $TOC_F$, is determined by</p>
<p>$$ TOC_F = \frac{\sum_{f \in T} \left( G_f \times GV_T + A_f \times AV_T \right)}{\sum_{f \in T} MP_f} \times OTV$$
where $MP_f$ is the minutes played by forward $f$, and the offensive threshold value is $OTV = 0.75$ via Tom Awad or $0.58$ via Alan Ryder (I chose the former).  I chose an uppercase $F$ to distinguish this value, which applies to <em>all</em> forwards on the team, from the index of an individual forward, $f$.</p>
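<p>As an illustration with hypothetical numbers, and assuming that $GV_T$ and $AV_T$ are computed from team-wide totals: suppose a team has $G_T = 200$ and $A_T = 300$, so that $GV_T = 0.5$ and $AV_T = 1/3$, and that its forwards combine for 150 goals and 225 assists over 15,000 minutes played.  With $OTV = 0.75$,
$$
\begin{aligned}
TOC_F &amp;= \frac{150 \times 0.5 + 225 \times \frac{1}{3}}{15000} \times 0.75 \\
&amp;= \frac{75 + 75}{15000} \times 0.75 \\
&amp;= 0.0075
\end{aligned}
$$
per minute played.  None of these figures are real team data; they are chosen only to keep the arithmetic easy to follow.</p>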
<p>According to Awad, the final formula for the $OGVT$ of each forward $f$ is then
$$
OGVT = G_f \times GV_f + A_f \times AV_f - MP_f \times TOC_F
$$</p>
<p>Additionally, I wanted to get a sense of each player&rsquo;s value to his team in relation to his salary cap hit.  Here, I plot $OGVT$ versus salary cap hit for the 2016-17 NHL regular season (forward skaters only) for the Stanley Cup Champion Pittsburgh Penguins, the cap-troubled Detroit Red Wings, and the young Edmonton Oilers with generational talent Connor McDavid.</p>


<div id="scatterplot" data-csv-path="data/NHL_PIT-DET-EDM_OGVT_2016-17.csv"></div>
<style> 
#scatterplot {
  margin: 10px 0;
  width: 100%;
  height: 500px;
  background: var(--code-bg);
  border: 1px solid var(--border);
  border-radius: var(--radius);
  padding: 1rem;
}

</style>
<script type="text/javascript" src="https://d3js.org/d3.v4.min.js" charset="utf-8"></script> 
<script type="text/javascript" src="/js/scatterplot.js"></script>

<p>However, while debugging my code, something seemed strange to me.  The first two terms in the $OGVT$ expression reduce, with some algebra, to the number of goals scored by that individual:
$$
\begin{aligned}
G_f \times GV_f + A_f \times AV_f
&amp;= G_f \times GV_f + A_f \times \frac{GV_f}{1.5} \\
&amp;= \left( G_f  + \frac{A_f}{1.5}\right )
\times GV_f \\
&amp;= \left( G_f  + \frac{A_f}{1.5}\right )
\times \left( \frac{1.5 G_f}{A_f + 1.5 G_f} \right) \\
&amp;= \left( 1.5 G_f  + A_f \right)
\times \left( \frac{G_f}{A_f + 1.5 G_f} \right) \\
&amp;= G_f.
\end{aligned}
$$
So, unless I&rsquo;m misunderstanding Tom Awad&rsquo;s definition of terms here:</p>
<blockquote>
<p>A player&rsquo;s OGVT is therefore:</p>
<p>OGVT = (G x GV) + (A x AV) - (MP x TOC)</p>
<p>Where G is the player&rsquo;s goals, A his assists, MP his minutes played, GV his goal value, AV his assist value, and TOC the Threshold offensive contribution value for his position.</p>
</blockquote>
<p>I don&rsquo;t quite understand how this first set of terms is relevant, as it essentially removes the direct value of a skater&rsquo;s assists in the calculation of this catch-all offensive statistic.</p>
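<p>To make this concrete with hypothetical numbers: take two forwards who each score 30 goals in 1,500 minutes on a team with an assumed $TOC_F = 0.0075$ per minute, one with 45 assists and the other with 10.  Their goal values differ ($GV_f = 1/2$ versus $GV_f = 45/55 = 9/11$), and so do their assist values ($AV_f = 1/3$ versus $AV_f = 6/11$), but
$$
\begin{aligned}
30 \times \frac{1}{2} + 45 \times \frac{1}{3} &amp;= 15 + 15 = 30, \\
30 \times \frac{9}{11} + 10 \times \frac{6}{11} &amp;= \frac{270 + 60}{11} = 30,
\end{aligned}
$$
so under this reading of the formula both end up with $OGVT = 30 - 1500 \times 0.0075 = 18.75$, despite a gap of 35 assists.</p>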
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Actually, I mostly wanted to get some experience with <a href="https://d3js.org/">D3</a> and using publicly accessible data.  I&rsquo;m still investigating why the axis titles aren&rsquo;t showing on my plot.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>]]></content:encoded>
    </item>
  </channel>
</rss>
