<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Benjamin Egelund-Müller</title>
    <link>https://b.egelund-muller.com/posts/</link>
    <description>Recent posts from Benjamin Egelund-Müller</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <managingEditor>newsletter@egelund-muller.com (Benjamin Egelund-Müller)</managingEditor>
    <webMaster>newsletter@egelund-muller.com (Benjamin Egelund-Müller)</webMaster>
    <copyright>Benjamin Egelund-Müller 2021</copyright>
    <lastBuildDate>Thu, 20 May 2021 14:31:02 +0000</lastBuildDate><atom:link href="https://b.egelund-muller.com/posts/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Stateful functions as a service</title>
      <link>https://b.egelund-muller.com/2021/stateful-functions-as-a-service/</link>
      <pubDate>Thu, 20 May 2021 14:31:02 +0000</pubDate>
      <guid>https://b.egelund-muller.com/2021/stateful-functions-as-a-service/</guid>
      <description>&lt;p&gt;One of the trends in cloud computing that I&amp;rsquo;m most excited about is stateful functions as a service (FaaS). The technology is still in its very early stages, but I think it&amp;rsquo;s the next leap for serverless.&lt;/p&gt;
&lt;h2 id=&#34;serverless-functions-today&#34;&gt;Serverless functions today&lt;/h2&gt;
&lt;p&gt;The serverless functions we know today are services like AWS Lambda and Google Cloud Functions. They essentially consist of one massive load balancing layer that receives all requests (for HTTP functions) and events (for event processing functions), and routes them to a machine that is running your function, dynamically spinning machines up and down depending on load.&lt;/p&gt;
&lt;p&gt;The result is a lovely developer experience where you just bundle up your code and hand it over to the cloud provider. You don&amp;rsquo;t have to worry about where your code runs or how it scales. And when there aren&amp;rsquo;t any requests or events to handle, it doesn&amp;rsquo;t cost anything!&lt;/p&gt;
&lt;p&gt;These functions are stateless, meaning they don&amp;rsquo;t have any persistent disks and don&amp;rsquo;t have any shared memory. The way they can keep state is by connecting to an external data system, such as Postgres, Redis or FaunaDB. That&amp;rsquo;s not a problem for many ETL and CRUD applications &amp;ndash; most services don&amp;rsquo;t rely on local disk or memory between requests anyway.&lt;/p&gt;
&lt;h2 id=&#34;use-cases-for-stateful-serverless-functions&#34;&gt;Use cases for stateful serverless functions&lt;/h2&gt;
&lt;p&gt;Nonetheless, there&amp;rsquo;s a long tail of interesting applications that rely on local state, and I think they&amp;rsquo;ll become more popular in the future. They&amp;rsquo;re things like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Streaming aggregations that roll up many events over a period of time. For example, counting unique visitors per page in real-time.&lt;/li&gt;
&lt;li&gt;Streaming joins that pair events from different sources. For example, joining observations with matching timestamps from two different IoT sensors.&lt;/li&gt;
&lt;li&gt;Real-time collaboration, where many users edit the same object and receive updates. For example, multi-person document editing or multi-player games.&lt;/li&gt;
&lt;li&gt;Low-latency metadata lookups at scale. For example, routing tables or looking up user permissions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Today, you need machines with dedicated memory and disk space to handle these use cases. And you might also need a distributed configuration store like Consul to manage sharding, or a stream processing system like Flink to coordinate and shuffle data between machines.&lt;/p&gt;
&lt;h2 id=&#34;how-will-they-work&#34;&gt;How will they work?&lt;/h2&gt;
&lt;p&gt;A lot of the use cases for stateful functions depend on what kinds of guarantees the platform will be able to provide. The dream is for a function invocation to treat global state as an in-memory object that it manipulates without interference from concurrent functions. In practice, that&amp;rsquo;s probably not feasible and you might need some map/reduce-style logic for merging distributed state changes.&lt;/p&gt;
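&lt;p&gt;To make the merging idea concrete, here&amp;rsquo;s a minimal sketch in Python, not a description of how any particular platform works. It assumes each function instance keeps its own partial state for the unique-visitors example above, and that the platform combines partial states with a merge function. The names &lt;code&gt;new_state&lt;/code&gt;, &lt;code&gt;apply_event&lt;/code&gt;, &lt;code&gt;merge&lt;/code&gt; and &lt;code&gt;unique_visitors&lt;/code&gt;, as well as the event shape, are made up for illustration.&lt;/p&gt;

```python
from collections import defaultdict
from functools import reduce


def new_state():
    """Empty partial state: page path mapped to a set of visitor IDs."""
    return defaultdict(set)


def apply_event(state, event):
    """Map step: fold one page-view event into an instance's local state."""
    state[event["page"]].add(event["visitor"])
    return state


def merge(left, right):
    """Reduce step: combine two partial states with a per-page set union.

    Set union is commutative, associative and idempotent, so partial states
    can be merged in any order, and merging the same state twice is harmless.
    """
    merged = new_state()
    for partial in (left, right):
        for page, visitors in partial.items():
            merged[page] |= visitors
    return merged


def unique_visitors(state):
    """Read the aggregate: number of distinct visitors per page."""
    return {page: len(visitors) for page, visitors in state.items()}


# Two hypothetical function instances each see part of the event stream.
instance_a = reduce(apply_event, [
    {"page": "/home", "visitor": "u1"},
    {"page": "/home", "visitor": "u2"},
], new_state())
instance_b = reduce(apply_event, [
    {"page": "/home", "visitor": "u2"},
    {"page": "/docs", "visitor": "u3"},
], new_state())

# The (assumed) platform merges the partial states into the global view.
global_state = merge(instance_a, instance_b)
print(unique_visitors(global_state))
```

&lt;p&gt;Because set union is commutative, associative and idempotent, the partial states can be merged in any order, and re-merged after a retry, without changing the counts. Not all state is this forgiving, which is where the consistency questions come in.&lt;/p&gt;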
&lt;p&gt;There are some systems today that can claim they&amp;rsquo;re a &amp;ldquo;stateful FaaS platform&amp;rdquo; (see the resources below), but it&amp;rsquo;s far from a solved problem. There are many things to consider, such as consistency, isolation, exactly-once processing, latency, hot keys, etc. Under the hood, the platform will have to coordinate where to run functions depending on which physical machines hold what state, and also how to merge changes consistently. Data will have to be shuffled and replicated between machines depending on load. These problems become even harder if you want to run stateful functions in different geographic regions to get lower end-user latency.&lt;/p&gt;
&lt;h2 id=&#34;a-world-computer&#34;&gt;A world computer&lt;/h2&gt;
&lt;p&gt;If your serverless functions can manipulate global state without worrying about scalability, consistency or latency, you can effectively treat the stateful FaaS platform as one giant machine that never goes down. You could get rid of your external database altogether &amp;ndash; just write data to the global state, and read or update it in later function invocations. I like the idea of a &amp;ldquo;world computer&amp;rdquo;, a term the Ethereum project uses (Ethereum&amp;rsquo;s smart contracts are essentially stateful functions, although not scalable).&lt;/p&gt;
&lt;p&gt;In theory, it&amp;rsquo;s the end-state for cloud, where you write your code as if it runs on one giant server that scales infinitely and is responsive across the globe. In practice, there are probably going to be caveats, but it&amp;rsquo;s going to be exciting to see where it leads.&lt;/p&gt;
&lt;h2 id=&#34;resources&#34;&gt;Resources&lt;/h2&gt;
&lt;p&gt;Here are some useful resources for learning and thinking about stateful serverless functions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://flink.apache.org/stateful-functions.html&#34;&gt;Stateful Functions (Apache Flink)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://arxiv.org/abs/2001.04592&#34;&gt;Cloudburst: Stateful Functions-as-a-Service (paper)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://blog.cloudflare.com/introducing-workers-durable-objects/&#34;&gt;Workers Durable Objects Beta: A New Approach to Stateful Serverless (Cloudflare)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://ethereum.org/en/developers/docs/smart-contracts/&#34;&gt;Introduction to smart contracts (Ethereum)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Three inspiring developer experiences</title>
      <link>https://b.egelund-muller.com/2021/inspiring-developer-experiences/</link>
      <pubDate>Fri, 23 Apr 2021 10:00:00 +0000</pubDate>
      <guid>https://b.egelund-muller.com/2021/inspiring-developer-experiences/</guid>
      <description>&lt;p&gt;I love trying out new developer tools. Since I started building &lt;a href=&#34;https://about.beneath.dev&#34;&gt;Beneath&lt;/a&gt;, I&amp;rsquo;ve probably tested dozens of developer tools looking for &lt;del&gt;great ideas to steal&lt;/del&gt; inspiration.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m always on the lookout for features that make me go &amp;ldquo;wow&amp;rdquo;. Developer tools tend to cover a lot of complexity, so that kind of experience isn&amp;rsquo;t all that easy to create. In this post, I&amp;rsquo;ve put together three &amp;ldquo;wow&amp;rdquo; experiences from different developer tools that I think are a great source of inspiration.&lt;/p&gt;
&lt;h2 id=&#34;example-1-project-setup-in-vercel&#34;&gt;Example 1: Project setup in Vercel&lt;/h2&gt;
&lt;p&gt;The first example that comes to mind is &lt;a href=&#34;http://vercel.com&#34;&gt;Vercel&lt;/a&gt;&amp;rsquo;s project setup. Vercel is a platform that helps frontend developers deploy websites.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;wow&amp;rdquo; experience is unmistakable the first time you create a new project in Vercel. You just select a template, connect to a Git provider, and boom! It creates a repo, builds, &lt;em&gt;and&lt;/em&gt; deploys the site globally right away. It feels pretty magical to have a website with CI/CD up and running before even pulling the source code.&lt;/p&gt;
&lt;video controls&gt;
  &lt;source src=&#34;https://b.egelund-muller.com/media/inspiring-developer-experiences/vercel.mp4&#34; type=&#34;video/mp4&#34;&gt;
  Your browser does not support HTML video.
&lt;/video&gt;
&lt;p&gt;Vercel bundles several best practices for modern web projects, such as CI/CD, branch deploys, serverless functions, and edge caching. Today, even for a personal web project, those features are awesome to have, but normally each of them adds more complexity. I think it&amp;rsquo;s impressive how Vercel has managed to combine all these features in such a surprisingly simple way.&lt;/p&gt;
&lt;h2 id=&#34;example-2-python-package-management-in-replit&#34;&gt;Example 2: Python package management in Replit&lt;/h2&gt;
&lt;p&gt;The second example I want to highlight is &lt;a href=&#34;http://replit.com/&#34;&gt;Replit&lt;/a&gt;&amp;rsquo;s package management for Python. Replit is an online IDE that lets you write and run code in the browser. It has many neat features, including the ability to run web services and collaboratively edit code. I&amp;rsquo;ve used it several times for user workshops for testing Beneath&amp;rsquo;s Python developer experience.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;wow&amp;rdquo; experience I want to highlight is the way Replit automatically installs Python packages. If you&amp;rsquo;re in a Python environment in Replit, and you try to run a Python file that imports an external module, Replit will detect if it&amp;rsquo;s not already installed and add it to your environment using &lt;a href=&#34;https://python-poetry.org&#34;&gt;Poetry&lt;/a&gt;, a brilliant Python package manager.&lt;/p&gt;
&lt;video controls&gt;
  &lt;source src=&#34;https://b.egelund-muller.com/media/inspiring-developer-experiences/replit.mp4&#34; type=&#34;video/mp4&#34;&gt;
  Your browser does not support HTML video.
&lt;/video&gt;
&lt;p&gt;In contrast with Vercel&amp;rsquo;s project setup, this is certainly a small feature. But I&amp;rsquo;ve had some traumatizing experiences with Python package management, and I almost always forget to run &lt;code&gt;pip install ...&lt;/code&gt; before running Python code, so when I first encountered this feature, I couldn&amp;rsquo;t help but smile!&lt;/p&gt;
&lt;p&gt;It was also my first exposure to Poetry, a tool I&amp;rsquo;ve since used for all my Python projects. While credit really goes to Poetry for a lot of the hard work of this feature, such as getting rid of &lt;code&gt;requirements.txt&lt;/code&gt;, I think it&amp;rsquo;s definitely clever how Replit spotted the opportunity to leverage Poetry to transparently provide such a delightful feature.&lt;/p&gt;
&lt;h2 id=&#34;example-3-ad-hoc-queries-in-bigquery&#34;&gt;Example 3: Ad-hoc queries in BigQuery&lt;/h2&gt;
&lt;p&gt;The last example I&amp;rsquo;ll share in this post is running ad-hoc queries with &lt;a href=&#34;https://cloud.google.com/bigquery&#34;&gt;Google BigQuery&lt;/a&gt;. BigQuery is a serverless data warehouse that&amp;rsquo;s part of the Google Cloud Platform. As with most data warehouses, its core feature is running SQL queries that aggregate or transform large datasets.&lt;/p&gt;
&lt;p&gt;Even after years of using it, BigQuery continues to elicit a &amp;ldquo;wow&amp;rdquo; from me when I need to quickly run an ad-hoc SQL query on a large dataset. Unlike most data warehouses, BigQuery is completely serverless and so massively parallelized that it runs most queries in seconds from a cold start regardless of the data size. I just open the console, type a query, click run, and get a crazy fast response.&lt;/p&gt;
&lt;video controls&gt;
  &lt;source src=&#34;https://b.egelund-muller.com/media/inspiring-developer-experiences/bigquery.mp4&#34; type=&#34;video/mp4&#34;&gt;
  Your browser does not support HTML video.
&lt;/video&gt;
&lt;p&gt;In this video, I ran a query against one of the built-in public datasets. It aggregated a 509 GB dataset with more than one billion rows in 2.8 seconds from a cold start with no prior configuration. I didn&amp;rsquo;t have to deploy a cluster, or even consider the memory or disk requirements of the workers. The BigQuery console isn&amp;rsquo;t a particularly great user experience, but it&amp;rsquo;s hard not to be awed at the scale of compute power BigQuery is able to unleash in an instant.&lt;/p&gt;
&lt;h2 id=&#34;wrapping-up&#34;&gt;Wrapping up&lt;/h2&gt;
&lt;p&gt;The three developer experiences highlighted in this post are pretty different. In the case of Vercel, they have managed to integrate several complex features in such a thoughtful way that the end experience becomes simpler. In the case of Replit, they have created a delightful affordance by cleverly baking in a powerful package manager. And in the case of BigQuery, the serverless query engine allows them to run surprisingly fast queries from a cold start. Despite these differences, I think they all share a delightful simplicity.&lt;/p&gt;
&lt;p&gt;I feel like I&amp;rsquo;ve only scratched the surface of &amp;ldquo;wow&amp;rdquo;-worthy developer experiences. In this post, I&amp;rsquo;ve focused on modern developer services, but it&amp;rsquo;s crazy to think about the magic embodied in many of the tools we take for granted, like compilers and text editors. I&amp;rsquo;ll be writing more on this topic in the coming weeks.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d also love to hear about your favorite developer experiences. Share them with &lt;a href=&#34;https://twitter.com/begelundmuller&#34;&gt;me on Twitter&lt;/a&gt;. If I get enough good input, I&amp;rsquo;ll compile a longer list!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Consuming → Producing</title>
      <link>https://b.egelund-muller.com/2021/consuming-producing/</link>
      <pubDate>Thu, 08 Apr 2021 14:07:52 +0000</pubDate>
      <guid>https://b.egelund-muller.com/2021/consuming-producing/</guid>
      <description>&lt;p&gt;Welcome to my new blog. I&amp;rsquo;ll be using it to write about and reflect upon the things that occupy me, such as data science, data engineering, developer tools, my experiences as a founder, book reviews, and random ideas.&lt;/p&gt;
&lt;p&gt;My motivation for starting this blog is to produce and share more content. I feel that it&amp;rsquo;s too easy to passively consume content on the internet without engaging, and sometimes even without reflecting. I hope that having a place of my own to write can make me a better internet citizen.&lt;/p&gt;
&lt;p&gt;I hope you will find something interesting and useful here.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>