{ "version": "https://jsonfeed.org/version/1", "title": "Eric Chen", "home_page_url": "https://ericchen.cc", "feed_url": "https://ericchen.cc/feed.json", "authors": [ { "name": "Eric Chen", "url": "https://ericchen.cc", "avatar": "https://ericchen.cc/assets/2015/01/eric-chen.jpg" } ], "icon": "https://ericchen.cc/assets/icons/android-chrome-256x256.png", "expired": false, "items": [ { "id": "/2019/01/28/a-curated-daily-automated-newsletter-for-tweets", "title": "A Curated, Daily, Automated Newsletter for Tweets", "content_html": "
I decided to write a Python script that emails me a daily newsletter of a curated digest of people’s “best” tweets. This way, I can quickly catch up on the highlights from lots of people’s Twitter feeds.
\n\nA sample of my daily newsletter of curated Tweets.
\n\nI like to keep a high signal-to-noise ratio in my Twitter feed. I follow < 50 people, and I strongly prefer people who tweet infrequently (< 5 tweets per day) and almost always about things I find interesting.
\n\nHowever, there are many more people I would like to follow, but they either tweet too frequently or only inconsistently about things I find interesting. Now that Twitter has an algorithmic timeline, I wish you could specify what types of content you like to see, similar to how YouTube's recommended videos work. But you can't.
\n\nIn the past, I’ve just visited some of these people’s Twitter pages directly every week or so and skimmed for their most interesting content. While this keeps them out of my daily Twitter feed, I still need to sift through many tweets to find the most interesting ones. I put together this small project to fix that.
\n\nI am using a couple of tools to get this job done:
\n\npython-twitter: a Python wrapper around the Twitter API
\nTwitter developer account: generate API keys to use in the Python script
\nHeroku: online environment for running the Python script remotely on a schedule
\nOne key part of curation is how much to curate. First, I score all of a person’s tweets from the past week using score = (# of favorites) * (# of retweets), e.g. 5 favorites and 10 retweets gives a score of 50. Using the past week of data, I calculate the cutoff score for the 95th percentile. Then, for the tweets they posted in the past day, I include any that score above the cutoff in the daily digest email. As a final step, I enforce an absolute maximum on the number of tweets I will include.
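In miniature, the curation step looks like this (a standalone sketch, not the actual script; the `curate` helper and its toy engagement scores are made up for illustration):

```python
import numpy as np

def curate(week_scores, today_scores, percentile=95, max_tweets=3):
    """Keep today's scores that clear the weekly percentile cutoff."""
    # each score is (# of favorites) * (# of retweets), precomputed
    cutoff = np.percentile(week_scores, percentile)
    keepers = [s for s in today_scores if s >= cutoff]
    # enforce the absolute cap, keeping the highest-scoring tweets
    return sorted(keepers, reverse=True)[:max_tweets]

# a week where most tweets get little engagement, plus one hit
week = [0, 0, 1, 2, 2, 3, 4, 5, 8, 50]
print(curate(week, today_scores=[1, 60, 55]))  # only the outliers survive
```

Because the cutoff is computed per person, a prolific account with high engagement everywhere still only contributes its relative best.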
The code is below. Note that I have omitted the parts of the code specific to my personal Twitter/Gmail accounts. You will need to replace those with strings for your specific accounts. You can find all of these strings by searching for 'REPLACE'.
import twitter\nimport numpy as np\nfrom dateutil.parser import parse\nfrom datetime import datetime, timezone, timedelta\n\ndef send_email(user, pwd, recipient, subject, body):\n  \"\"\"Sends email through Gmail based on arguments.\n\n  Based on:\n    https://docs.python.org/3.4/library/email-examples.html\n\n  Args:\n    user: A string for the Gmail address to send mail from, include @gmail.com.\n    pwd: A string for the Gmail account password corresponding to user.\n    recipient: A string or list of strings for email recipients.\n    subject: A string containing the email subject.\n    body: A string containing the email body in HTML.\n\n  Returns:\n    This function does not return any values.\n  \"\"\"\n  import smtplib\n  from email.mime import multipart, text\n\n  # parse variables to use for email\n  FROM = user\n  TO = recipient if isinstance(recipient, list) else [recipient]\n  SUBJECT = subject\n  TEXT = body\n\n  # Create message container - the correct MIME type is multipart/alternative.\n  msg = multipart.MIMEMultipart('alternative')\n  msg['Subject'] = SUBJECT\n  msg['From'] = FROM\n  msg['To'] = \", \".join(TO)\n  msg.attach(text.MIMEText(TEXT, 'html'))  # Attach part into message container.\n\n  try:\n    server = smtplib.SMTP(\"smtp.gmail.com\", 587)\n    server.ehlo()\n    server.starttls()\n    server.login(user, pwd)\n    server.sendmail(FROM, TO, msg.as_string())\n    server.close()\n    print('👍 successfully sent the mail')\n  except Exception:\n    print(\"failed to send mail\")\n\n\ndef get_tweets(api, screen_name, max_time_days=1):\n  \"\"\"Get recent tweets from user in timeframe.\n\n  Based on:\n    https://github.com/bear/python-twitter/blob/master/examples/get_all_user_tweets.py\n\n  Args:\n    api: An authenticated python-twitter API object.\n    screen_name: A string for the twitter user to get tweets from.\n    max_time_days: An integer for how many days of tweets to retrieve.\n\n  Returns:\n    A list of recent tweets from the user in the given timeframe.\n  \"\"\"\n  fetch_size = 25\n  timeline = api.GetUserTimeline(screen_name=screen_name, count=fetch_size)\n  earliest_tweet = min(timeline, key=lambda x: x.id).id\n  earliest_tweet_date = parse(min(timeline, key=lambda x: x.created_at).created_at)\n  current_date = datetime.now(timezone.utc)\n  max_time_delta = timedelta(days=max_time_days)\n\n  while max_time_delta > current_date - earliest_tweet_date:\n    tweets = api.GetUserTimeline(screen_name=screen_name,\n                                 max_id=earliest_tweet - 1, count=fetch_size)\n    # check for an empty page before calling min(), which would raise\n    if not tweets:\n      break\n    new_earliest = min(tweets, key=lambda x: x.id).id\n    earliest_tweet_date = parse(\n        min(tweets, key=lambda x: x.created_at).created_at)\n    if new_earliest == earliest_tweet:\n      break\n    earliest_tweet = new_earliest\n    timeline += tweets\n\n  timeline_filtered = [tweet for tweet in timeline if current_date -\n                       parse(tweet.created_at) < max_time_delta]\n  return timeline_filtered\n\n\ndef score_tweet(tweet):\n  \"\"\"Generate a significance score for a tweet.\n\n  Formula for scoring a tweet: (# favorites) * (# retweets)\n\n  Args:\n    tweet: A tweet to score.\n\n  Returns:\n    Integer for this tweet's score.\n  \"\"\"\n  return(tweet.favorite_count * tweet.retweet_count)\n\n\ndef curate_tweets(api, screen_name, curate_percentile=95, curate_days=1,\n                  curate_max=3, baseline_days=7):\n  \"\"\"Curate recent best tweets from a user.\n\n  Args:\n    api: An authenticated python-twitter API object.\n    screen_name: A string of user whose timeline to curate from.\n    curate_percentile: Integer for percentile to use as cutoff for curating tweets.\n      Only include tweets that score at or above the percentile.\n    curate_days: Integer for how many days to curate from.\n    curate_max: Integer for maximum number of tweets to return. If more tweets\n      reach the curate_percentile cutoff than curate_max, then include only the\n      highest scoring tweets.\n    baseline_days: Integer for how many days to form baseline score for this\n      user's tweets. This baseline is used to calculate the cutoff score that\n      corresponds to curate_percentile.\n\n  Returns:\n    A list of curated Tweets for specified screen_name user. Sorted in\n    descending order based on score.\n  \"\"\"\n  # establish a baseline for twitter engagement\n  historical_tweets = get_tweets(api, screen_name, max_time_days=baseline_days)\n  list_scores = [score_tweet(tweet) for tweet in historical_tweets]\n  # keep the cutoff at a minimum of 1 so zero-engagement tweets never pass\n  cutoff_score = max(np.percentile(list_scores, curate_percentile), 1)\n\n  # filter the timeline\n  tweets_digest = get_tweets(api, screen_name, max_time_days=curate_days)\n  tweets_filter = [tweet for tweet in tweets_digest\n                   if score_tweet(tweet) >= cutoff_score]\n  tweets_sort = sorted(tweets_filter,\n                       key=lambda x: score_tweet(x), reverse=True)\n  # if there are more tweets than curate_max, return only the highest scoring\n  return(tweets_sort[:curate_max])\n\n\ndef clean_string(raw_string):\n  \"\"\"Remove line breaks from a string.\n\n  Args:\n    raw_string: A string to clean.\n\n  Returns:\n    A cleaned version of the string without line breaks.\n  \"\"\"\n  return(\n      raw_string.\n      replace('\\n', ' ').\n      replace('\\r', '')\n  )\n\n\ndef strings_to_html(list_strings):\n  \"\"\"Converts a list of strings to an unordered HTML list.\n\n  Args:\n    list_strings: A list of strings to turn into an HTML list of those strings.\n\n  Returns:\n    An unordered HTML list of list_strings.\n  \"\"\"\n  html = '<ul><li>' + '</li><li>'.join(list_strings) + '</li></ul>'\n  return(html)\n\n\ndef tweet_to_html(tweet):\n  \"\"\"Convert tweet object to string representation in HTML.\n\n  Args:\n    tweet: A tweet to convert to HTML.\n\n  Returns:\n    A string that is an HTML format for representing the tweet.\n  \"\"\"\n  tweet_text = clean_string(tweet.full_text)\n  tweet_url = 'https://twitter.com/i/web/status/' + tweet.id_str\n  # '&nbsp;&nbsp;' is a double space in html\n  return('%s&nbsp;&nbsp;(<a href=\"%s\">→</a>)' % (tweet_text, tweet_url))\n\n\ndef get_email_subject():\n  \"\"\"Generate email subject based on current date.\n\n  Returns:\n    A string for the email subject.\n  \"\"\"\n  string_curr_date = datetime.now(timezone.utc).strftime('%B %d, %Y')\n  return('Twitter digest for ' + string_curr_date)\n\n\ndef tweets_to_html(tweets):\n  \"\"\"Produce an html representation for a list of tweets.\n\n  Args:\n    tweets: A list of tweets to convert to HTML.\n\n  Returns:\n    An HTML string with the user's name as a heading and the tweets as a list.\n  \"\"\"\n  if len(tweets) == 0:\n    return('')\n  else:\n    user = tweets[0].user\n    html = '<h3>%s (@%s)</h3>' % (user.name, user.screen_name)\n    list_html_tweets = [tweet_to_html(tweet) for tweet in tweets]\n    html += strings_to_html(list_html_tweets)\n    return(html)\n\n\ndef get_name(api, screen_name):\n  \"\"\"Get the name associated with a Twitter screen_name.\n\n  Args:\n    api: An instance of authenticated Twitter API.\n    screen_name: A string for twitter username to find name of.\n\n  Returns:\n    String which is the name for given Twitter screen_name.\n  \"\"\"\n  user = api.GetUser(screen_name=screen_name)\n  return(user.name)\n\n\ndef main():\n  # Twitter API authentication\n  CONSUMER_KEY = 'REPLACE'\n  CONSUMER_SECRET = 'REPLACE'\n  ACCESS_TOKEN = 'REPLACE'\n  ACCESS_TOKEN_SECRET = 'REPLACE'\n  api = twitter.Api(consumer_key=CONSUMER_KEY,\n                    consumer_secret=CONSUMER_SECRET,\n                    access_token_key=ACCESS_TOKEN,\n                    access_token_secret=ACCESS_TOKEN_SECRET,\n                    tweet_mode='extended',\n                    sleep_on_rate_limit=True)\n\n  # list of screen names to curate from, sorted by Twitter name\n  list_screen_names = sorted([\n      'REPLACE',\n      'REPLACE',\n  ], key=lambda x: get_name(api, x).lower())\n\n  # send email\n  FROM_GMAIL = 'REPLACE'\n  GMAIL_PASSWORD = 'REPLACE'\n  TO_EMAIL = 'REPLACE'\n  subject = get_email_subject()\n  text = ''\n  for screen_name in list_screen_names:\n    timeline_curate = curate_tweets(api=api, screen_name=screen_name)\n    text = ''.join([text, tweets_to_html(timeline_curate)])\n\n  send_email(FROM_GMAIL, GMAIL_PASSWORD, TO_EMAIL, subject, text)\n\nif __name__ == '__main__':\n  main()\n
\n", "date_published": "2019-01-28 00:00:00 +0000", }, { "id": "/2017/12/16/normalized-scoring", "title": "Normalized Scoring", "content_html": "One issue that I see with advanced analytics in basketball is that they aren’t great for cross-era comparisons. For example, points per possession and offensive efficiency (points per 100 possessions) both adjust for pace of play by recognizing that more possessions naturally lead to more points. However, a possession-based statistic doesn’t recognize the fact that, across eras, the expected value of a possession changes. In the 2002-2003 regular season, the Dallas Mavericks were the highest rated team in terms of offensive efficiency with 96.4 points per 100 possessions. In the 2016-2017 regular season, the Philadelphia 76ers were the worst rated team in terms of offensive efficiency with 100.7 points per 100 possessions. The inflation of the expected value of a possession is likely due to changes in NBA rules that favor offense, as well as the increased use of the 3-point shot.
\n\nTraditional statistics (e.g. points per game) and advanced analytics aren’t a good starting point for cross-era comparisons. But here’s an idea of a simple statistic for individual scoring that does hold up over time. You can think of it as an alternative to points per game (PPG) or offensive efficiency for an individual player. I call it normalized scoring. Normalized scoring describes what percentage of the team’s points that a particular player scored. If you add up the normalized scoring for all of the players on one team, you get 100%.
\n\nFor example, in the 2017 NBA finals Kevin Durant averaged 35.2 PPG. The entire Warriors team averaged 121.6 PPG. Thus, Durant’s normalized scoring for the 2017 NBA finals is 28.9%, i.e. he scored 28.9% of the team’s points.
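That arithmetic is easy to check (a trivial sketch; the `normalized_scoring` function name is mine):

```python
def normalized_scoring(player_ppg, team_ppg):
    """Fraction of the team's points scored by one player."""
    return player_ppg / team_ppg

# Kevin Durant, 2017 NBA Finals: 35.2 of the Warriors' 121.6 PPG
print(f"{normalized_scoring(35.2, 121.6):.1%}")  # 28.9%
```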
\n\nYou can start to see how this is useful for cross-era comparison. It doesn’t matter what rules change or which styles of basketball come in and out of fashion: if someone in 1995 has a higher normalized scoring than someone in 2017, that’s a good indicator that they were the “better scorer”.
\n\nI know that this statistic doesn’t capture a lot of the nuance of basketball, and it is probably not particularly useful for scouting and player evaluation, which are typical uses for advanced analytics. But I see normalized scoring as a replacement for cross-era comparisons of PPG. For example, if you want to compare the scoring ability of Michael Jordan in his NBA Finals against Kevin Durant, then normalized scoring is a simple but powerful number that guards against the modern biases I described earlier. In fact, I came up with this statistic while making the most common of comparisons: who is the NBA GOAT? How did Michael Jordan’s NBA Finals performances compare to LeBron James’s or Kevin Durant’s? In 2017, Durant averaged 35.2 PPG. In 1998, Jordan averaged 33.5 PPG.
\n\nThe benefit of normalized scoring lies in its simplicity. All the number does is acknowledge that “points” in and of themselves don’t matter. If Adam Silver doubles the value of every basket tomorrow, that doesn’t mean every player is now twice as good at scoring. You don’t win basketball with absolute numbers; you win with relative numbers. By making a relative PPG metric, you do a much better job of isolating how much true “scoring” a player did. And because it’s simple, it doesn’t bring the baggage that other advanced analytics bring (WAR, PER). Normalized scoring simply makes explicit what all fans calculate implicitly: 10 PPG in pickup games played to 21 is of course different from 10 PPG in an NBA game. Normalized scoring is just the formalization of that simple intuition.
\n\nBelow are the normalized scoring and PPG for all NBA Finals MVPs from 1990-2017. I compiled this list to show how normalized scoring is more informative than PPG.
\n\nYear | Player | Normalized Scoring | Points Per Game |
---|---|---|---|
1990 | Isiah Thomas (Detroit Pistons) | 25.8% | 27.6 |
1991 | Michael Jordan (Chicago Bulls) | 30.8% | 31.2 |
1992 | Michael Jordan (Chicago Bulls) | 34.5% | 35.8 |
1993 | Michael Jordan (Chicago Bulls) | 38.4% | 41.0 |
1994 | Hakeem Olajuwon (Houston Rockets) | 31.2% | 26.9 |
1995 | Hakeem Olajuwon (Houston Rockets) | 28.7% | 32.8 |
1996 | Michael Jordan (Chicago Bulls) | 29.4% | 27.3 |
1997 | Michael Jordan (Chicago Bulls) | 36.8% | 32.3 |
1998 | Michael Jordan (Chicago Bulls) | 38.1% | 33.5 |
1999 | Tim Duncan (San Antonio Spurs) | 32.3% | 27.4 |
2000 | Shaquille O'Neal (Los Angeles Lakers) | 36.2% | 38.0 |
2001 | Shaquille O'Neal (Los Angeles Lakers) | 32.8% | 33.0 |
2002 | Shaquille O'Neal (Los Angeles Lakers) | 34.2% | 36.3 |
2003 | Tim Duncan (San Antonio Spurs) | 27.5% | 24.2 |
2004 | Chauncey Billups (Detroit Pistons) | 23.6% | 21.0 |
2005 | Tim Duncan (San Antonio Spurs) | 24.2% | 20.6 |
2006 | Dwyane Wade (Miami Heat) | 37.3% | 34.7 |
2007 | Tony Parker (San Antonio Spurs) | 28.3% | 24.5 |
2008 | Paul Pierce (Boston Celtics) | 21.4% | 21.8 |
2009 | Kobe Bryant (Los Angeles Lakers) | 32.2% | 32.4 |
2010 | Kobe Bryant (Los Angeles Lakers) | 31.5% | 28.6 |
2011 | Dirk Nowitzki (Dallas Mavericks) | 27.5% | 26.0 |
2012 | LeBron James (Miami Heat) | 28.0% | 28.6 |
2013 | LeBron James (Miami Heat) | 26.1% | 25.3 |
2014 | Kawhi Leonard (San Antonio Spurs) | 16.9% | 17.8 |
2015 | Andre Iguodala (Golden State Warriors) | 16.2% | 16.3 |
2016 | LeBron James (Cleveland Cavaliers) | 29.6% | 29.7 |
2017 | Kevin Durant (Golden State Warriors) | 28.9% | 35.2 |
\n", "date_published": "2017-12-16 00:00:00 +0000", }, { "id": "/2017/11/01/making-measurements", "title": "Making Measurements", "content_html": "As I write this, the iPhone X is about to launch, and I’m surprised by how many phone reviewers (I’ve found 14) seem to conflate screen size with the diagonal length of the display.1
\n\n\n\nHowever, when you turn it on, the iPhone X is all screen. It doesn’t have the big top and bottom of the iPhone Plus models—it’s just screen. It’s beautiful.
\n\nThe iPhone 8 has a 4.7-inch screen; the iPhone 8 Plus a 5.5-inch screen and the iPhone X a 5.8-inch screen.
\n
\n\nThe iPhone X actually features Apple’s biggest phone display, measuring 5.8 inches on the diagonal, which is even bigger than the display on Apple’s largest iPhone, the iPhone 8 Plus, which measures 5.5 inches.
\n
Jefferson Graham of USA Today in a Q&A article:
\n\n\n\n[Q:] Gary Moskowitz: “What’s the size of the iPhone X?” [A:] That would be 5.8 inches, the largest screen size ever for an iPhone. The iPhone 8 Plus is 5.5 inches.
\n
\n\nOut of all the big new features Apple packed into its game-changing new iPhone X, the phone’s stunning edge-to-edge OLED display is easily the biggest. In fact, it’s the biggest screen the company has ever packed into an iPhone in its 10-year history. And yet, the iPhone X itself is actually smaller than the iPhone 8 Plus and iPhone 7 Plus in terms of height and width.
\n
\n\nThe 5.8-inch screen is the biggest on an iPhone to date…
\n
\n\nShe [my wife] found the physical size of the Plus phone to be too large. That’s one of the more interesting elements of the iPhone X – it has more screen, in a smaller package.
\n\n…
\n\nIt [iPhone X] is getting you more screen size in less space than the 8 Plus.
\n
Brian X. Chen of The New York Times:
\n\n\n\nFirst, the basics: The iPhone X has a 5.8-inch screen that is larger than the 5.5-inch display on the iPhone 8 Plus and the 4.7-inch screen on the iPhone 8.
\n
Steve Kovach of Business Insider:
\n\n\n\nThe best part is the screen. At 5.8 inches, it’s slightly larger than the iPhone 8 Plus screen, but the iPhone X’s body is only a little larger than the iPhone 8.
\n
\n\nIt’s [The iPhone X is] also easier to hold than the larger iPhone 8 Plus but offers a larger screen (5.8 inches versus 5.5 inches) since the display runs from edge to edge and top to bottom.
\n
Julian Chokkattu of Digital Trends:
\n\n\n\nWhat we like most about the iPhone X is its size. It feels compact — it’s slightly larger than the 4.7-inch iPhone 8, but it has a bigger screen than the 5.5-inch iPhone 8 Plus. The X is comfortable in the hand, and it feels remarkable to have so much more screen real estate than a cumbersome “plus-sized” phone.
\n
\n\nSporting a 5.8-inch screen, the display real-estate is bigger than the iPhone 8 Plus, but the chassis is considerably smaller, thanks to the shift in the display aspect.
\n
\n\nThe display, the largest screen ever on an iPhone despite the fact that the overall size of the X is only marginally taller than the iPhone 8, is beautiful.
\n
\n\nIf you’ve used any of the iPhone Plus range, you’ll get on instantly with this handset. It’s [iPhone X has] got a bigger screen than any other iPhone, and yet it’s smaller than the iPhone 7 Plus.
\n
\n\nIn fact, its [iPhone X’s] display is actually larger than the 5.5-inch screen on the physically larger iPhone 8 Plus.
\n
Here’s the crux of the issue. The iPhone X has a 19.5:9 aspect ratio. This means the screen is approximately twice as tall as it is wide. Previous iPhones have had 16:9 (iPhone 5 and later) or 3:2 aspect ratios (iPhone 4s and earlier).
\n\nOf course, no one is actually interested in the diagonal of a screen (it’s the area of the screen that matters), but if two screens have the same aspect ratio, then screen diagonal is a fine proxy.
\n\nHowever, if two screens have different aspect ratios, then the diagonal length of the display is misleading. Here’s an example to illustrate that, where I’ve drawn two shapes. On the left is a square with a diagonal of 1 unit. On the right is a rectangle with a diagonal of 1.5 units. The rectangle has a 50% longer diagonal. But by screen area, it’s 23% smaller than the square.
\n\nScreen size != screen area.
\n\nI can continue elongating the diagonal while maintaining a constant screen area by making the rectangle skinnier. There is no limit to how much longer I can make the diagonal, while keeping screen area constant.
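For a rectangular screen with diagonal d and aspect ratio a:b, width = d·a/√(a² + b²), height = d·b/√(a² + b²), and thus area = d²·ab/(a² + b²). Here is a sketch of the iPhone comparison under that formula (treating both screens as plain rectangles and ignoring the notch and rounded corners, so the gap is smaller than Phone Arena's corrected figure):

```python
import math

def screen_area(diagonal, aspect_w, aspect_h):
    """Area of a rectangular screen from its diagonal and aspect ratio."""
    hyp = math.hypot(aspect_w, aspect_h)
    width = diagonal * aspect_w / hyp
    height = diagonal * aspect_h / hyp
    return width * height

iphone_x = screen_area(5.8, 9, 19.5)     # taller 19.5:9 panel
iphone_8_plus = screen_area(5.5, 9, 16)  # conventional 16:9 panel
# the longer diagonal still yields slightly less rectangular area
print(f"{iphone_x:.2f} vs {iphone_8_plus:.2f} square inches")
```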
\n\nNot all tech reviewers have overlooked this fact. Phone Arena did the math, and the screen area of the 5.8-inch iPhone X is 2.6% smaller than that of the 5.5-inch iPhone 8 Plus. Vlad Savov of The Verge has also been on top of this.
\n\nIdeally, as phones move to different shapes (with cutouts, notches, and curved edges), we can talk about screen area instead of diagonal length.2 Even screen area isn’t a perfect proxy for screen size, though, because at the end of the day a bigger screen is only useful if it lets you do or see more. For some use cases, like watching a movie, a 19:9 ratio with a smaller screen area may actually be better than a 3:2 ratio. Or consider laptops, where vertical space is often at a premium, so a 3:2 ratio is preferred to 16:9, all else being equal.
\n\nThis is all to say that screen size cannot be summed up by one metric (as is often the case, things are more grey than they are black and white). But if I had to pick one metric, I certainly wouldn’t choose screen diagonal.3 Wouldn’t it be simpler if we just quoted screen area?
\n\nSome reviewers, though, have gotten it right.
\n\nMark Spoonauer of Tom’s Guide:
\n\n\n\n \nIt’s worth noting that this 5.8-inch screen gives you less viewing area than the 5.5-inch iPhone 8 Plus, because the iPhone X’s screen has a narrower aspect ratio.
\n
\n\n \niPhone X has a little bit less screen real estate (in terms of area) than iPhone Plus. The 5.8-inch screen has a more vertical element than its iPhone Plus sibling.
\n
In fact, the “diagonal” reported for phones like the iPhone X and Galaxy Note 8 is not even the actual diagonal. Even though these screens have rounded corners, the companies measure the diagonal as if the screen were a true rectangle.
\n\nFrom Apple’s iPhone X tech specs:
\n\n\n\nThe iPhone X display has rounded corners that follow a beautiful curved design, and these corners are within a standard rectangle. When measured as a standard rectangular shape, the screen is 5.85 inches diagonally (actual viewable area is less).
\n
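Apple's "actual viewable area is less" caveat is simple geometry: each rounded corner of radius r replaces a square of area r² with a quarter circle of area πr²/4, so the four corners cost r²(4 − π) of viewable area. A sketch (the dimensions and corner radius below are hypothetical, since the radius isn't published):

```python
import math

def viewable_area(width, height, corner_radius):
    """Screen area after rounding off the four corners."""
    lost_per_corner = corner_radius**2 - math.pi * corner_radius**2 / 4
    return width * height - 4 * lost_per_corner

# a zero radius recovers the plain rectangle; any rounding shrinks the area
full = viewable_area(2.45, 5.31, 0.0)     # rough X-like rectangle, in inches
rounded = viewable_area(2.45, 5.31, 0.25)
print(full - rounded)  # area lost to the corners
```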
What, then, are we actually measuring? How much can a corner be rounded off before we stop using its “rectangular” diagonal? Should we quote the rectangular diagonal of a circular display, since a circle can be inscribed in a square and so also has (to quote Apple) “corners within a standard rectangle”? This practice seems disingenuous, and it is another reason not to measure screens by their diagonal length. ↩
\nI can’t vouch for the validity of this fact, but I recall reading that television manufacturers were big proponents of diagonal measuring, especially as TVs moved from the 4:3 aspect ratio to 16:9. A flat-screen LCD at 16:9 with the same screen area as a CRT display at 4:3 could suddenly be marketed as significantly “larger.” ↩
\n\nSome reviewers, though, have gotten it right. ↩
\nOur understanding of the world is often not veridical. How the world is framed often becomes reality—which is why I’ve been thinking about two words recently.
\n\nThe first is “feature phone,” which I put in quotes because feature phones are phones without features. They don’t do what an iPhone does. If this term was coined by “feature phone” manufacturers trying to make their phones seem better than they actually are, I don’t think they were successful. People use phones too much. I doubt anybody has walked into a store looking for an iPhone and come out with a dumb phone that was billed as a “feature phone.”
\n\nThe other term that I’ve been thinking about is “defined contribution.” Like “feature phone,” the name sugarcoats the most important part: for a feature phone, the functionality of your smartphone; for defined contribution, the money you receive in retirement. In a defined contribution plan, like a 401(k), you put away money to buy stocks, bonds, and other investments, and then have access to them in your retirement, but you can put away as little money as you want. Your contribution isn’t strictly defined, and, most importantly, your benefits are not defined. In a defined benefit plan (i.e. a pension), your benefits are set by multiple factors (e.g. length of employment, ending salary, etc.). In a defined contribution plan, your benefits are variable. Depending on when and how much you contribute, in addition to your asset allocation, you might have a large or small nest egg at retirement. This is not to say that a defined contribution plan is inherently better or worse, but the name hides its defining feature: it’s not a pension plan, and its value is variable.
\n\nSo let’s call them what they are. A “feature phone” is a dumbphone. A “defined contribution” plan is a variable-benefit plan. 👌
\n", "url": "https://ericchen.cc/2017/06/12/bad-names-copy/", "summary": "Our understanding of the world is often not veridical. How the world is framed often becomes reality—which is why I’ve been thinking about two words recently.
\n", "date_published": "2017-06-12 00:00:00 +0000", }, { "id": "/2017/05/23/thoughts-about-bike-sharing", "title": "Thoughts about Bike-sharing", "content_html": "As I was coming out of an exam a couple of days ago, I saw something I’ve seen a couple times before.
\n\nThe truck that redistributes bike-sharing bicycles.
\n\nI’ve seen this bicycle redistribution truck around campus a few times since Princeton started its partnership in 2016 with Zagster to launch bike-sharing on campus. The program has 10 or so bike docking locations on campus that support a fleet of 50 bikes.
\n\nIt’s just a little bit odd and funny that managing the bike-sharing program (a program that is meant to promote less driving around campus and more biking) requires loading bicycles into the back of a truck and redistributing the bikes to different stations. And according to the minutes from the March 27, 2016, meeting of Princeton’s student government, this truck goes around campus twice (twice!) a week to redistribute bikes.
\n\nThis got me interested in doing a little more research into the effectiveness of the bike-sharing program. While I’m not a user of the service, if I were, I would have one requirement: can I find a bike every time I need one? That is, can I be confident that when I go to a particular station to get a bike, there will be one there?
\n\nFor different users of the service, I think the minimum success rate is quite different. For a tourist visiting Princeton’s campus, if you want to rent a bike, but there are no available bikes to rent, then no big deal. You can walk around campus instead.
\n\nBut if you’re a student who relies on the service to get to lectures and classes, then having no bike available is a huge issue. Presumably, the whole point of using the bike-sharing service is that you can allot less travel time. So if you try to rent a bike 5 minutes before lecture but don’t find one, then you will be late. If this happens a couple of times, you’ll probably just lose trust in the service and get yourself a personal bike.
\n\nAnd that’s my qualm about the service: if one of its goals is to reduce the need for students to have a personal bike on campus, then I’m not sure it can do that job. Similar to ride-hailing services, a bike-sharing service needs a large amount of inventory and liquidity to be thought of as a viable option. The more I think about bike-sharing, the more I am convinced that it cannot replace a personal bicycle for a daily rider.
\n\nAll of that being said, here’s an interesting exercise: what success rate is required for someone to depend on a bike-sharing service instead of a personal bike? I think the answer is almost identical to how one might answer the same question about using Uber vs. a personal vehicle. I wouldn’t be surprised if a 99% success rate is the threshold to clear. That would mean that only 1 out of every 100 times you went to pick up a bike, you couldn’t get one. At this level of reliability, you begin to approach the reliability of a personal bicycle (I reckon 99% is a fair estimate for a personal bike, too).
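That 99% figure compounds quickly. If each pickup succeeds independently with probability p, the chance of being stranded at least once over n pickups is 1 − pⁿ (a back-of-the-envelope sketch that assumes independence between trips):

```python
def stranded_chance(success_rate, trips):
    """Probability of at least one failed pickup across many trips."""
    return 1 - success_rate ** trips

# even at 99% per-trip reliability, a rider making 100 pickups is
# more likely than not to hit an empty station at least once
print(f"{stranded_chance(0.99, 100):.0%}")  # 63%
```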
\n\nBut to get to 99% reliability, I wouldn’t be surprised if the Princeton bike-sharing program would need two or three times as many bikes as it does today. Unlike Uber, which can use surge pricing as an on-demand adjustment to bring more drivers onto its network during busy times, bicycles cannot be “dynamically” added into the network as demand changes. Of the docking locations I see around campus, it’s not uncommon for them to have no bikes at particularly busy parts of the day.
\n\nAnother factor that is likely preventing a large influx of more bikes: the cost. According to an exploratory report on bike-sharing by the Los Angeles County Metro, the per-bike installation cost for bike-sharing is $3,000 to $5,000! According to the press release, Princeton has 50 bikes deployed, which implies an estimated cost of $150,000 to $250,000.
\n\nI guess everything above brings me to my original point in writing this blog post: this morning (5/22/17) after eating breakfast, I set out to visit 9 different stations and document how many bikes were at each location. Zagster clearly has inventory and usage information that is orders of magnitude better than this informal survey. And the Zagster app might also show how many bikes each station has (though a cursory look at the app’s landing page suggests that it does not tell you which stations have available bikes). But I still just wanted to get out and see the stations with my own eyes.
\n\nGoing into this, I had a suspicion that a few of the nine stations would have zero bikes. I admit, my bias is towards being skeptical of the effectiveness of the service. But to cut the suspense: of the stations I visited, only one (Firestone Library) had no bikes. To be fair, it was drizzling slightly this morning, so I suspect usage of the bike-share was lower than normal. There are a couple more stations farther out that I didn’t visit, but I was able to see basically all of the stations located on Princeton’s main campus.
\n\nLocation | Number of Bikes
---|---
Lakeside Apartments | 9
Lawrence Apartments | 9
Computer Science Building | 4
Carl Icahn Laboratory | 4
Princeton Station | 3
Richardson Auditorium | 2
Frist Campus Center | 1
Firestone Library | 0
Here are pictures of all of them (in the order that I visited) and some commentary.
\n\nI see this station quite frequently on my way to the dining hall, and it’s been empty many times before. But today, there are 2 bikes here.
\n\nThis is a trend that I see fairly frequently: locking a non-bike-share bike to the bike-share location. I see why this happens, as some places around campus don’t have convenient locking posts, and even at those locations that do have bicycle posts, the bike-share ones are often of higher quality.
\n\nThis was surprising. Lawrence apartments house graduate students and are slightly off the main campus, but they do have quite a few bikes. I’m surprised that at 11:00 AM there were this many bikes still at the apartments. I would have guessed people would ride them into central campus in the morning.
\n\nAlso graduate student housing. Also has a lot of bikes.
\n\nThis is where the picture of the truck redistributing bikes is from.
\n\nThis station is frequently empty.
\n\nOf course, the last station I visited was the one that had zero bikes.
\n", "url": "https://ericchen.cc/2017/05/23/thoughts-about-bike-sharing/", "summary": "As I was coming out of an exam a couple of days ago, I saw something I’ve seen a couple times before.
\n", "date_published": "2017-05-23 00:00:00 +0000", }, { "id": "/2017/02/03/two-safari-quibbles", "title": "Two Safari Quibbles", "content_html": "Update: Since I originally wrote this post, Safari has gained favicon support. This change was enough to make me switch to Safari from Chrome.
\n\nAs a student, the flexibility of a PC is indispensable. And judging by the technology used by my peers, this is true of nearly every student. However, it is still true that the majority of my computing (maybe ~60%) happens in the web browser.
\n\nOn macOS, Safari and Google Chrome are the two powerhouse web browsers. Both have support for modern web standards, and both are very extensible. But Safari has two clear advantages: two-finger scrolling responsiveness and power efficiency. The best comparison I can make between two-finger scrolling in the two web browsers is scrolling in Android and iOS. Chrome feels like scrolling in Android: not bad, but not good. Safari feels like scrolling in iOS: fantastic. Once you tinker around in iOS, you realize how janky Android scrolling is (I use a Moto X Android phone). And there might not be a more important feature than power efficiency. Because of how heavily I use the web browser, it is often the largest consumer of battery life.
\n\nHaving listed Safari’s advantages over Chrome, there are still two quibbles I have with Safari that keep me using Chrome. And both have to do with the way tabs are displayed in Safari. Note that I have Increase Contrast selected in Accessibility.
\n\nSafari tabs.
\n\nChrome tabs.
\n\nFirst, the text contrast of website titles in Safari is lower than in Chrome, which makes them harder to read. This quibble might be a function of using a non-retina MacBook Air, as I could see a sharper, more color accurate screen alleviating these issues. Even though Safari has lower contrast than Chrome, its text contrast is still above the 7:1 contrast ratio recommended by Apple.
\n\nWeb Browser | Contrast Ratio | Text Color | Background Color
---|---|---|---
Chrome (foreground tab) | 19.1:1 | rgb(0, 0, 0) | rgb(243, 243, 243)
Safari (foreground tab) | 14.4:1 | rgb(0, 0, 0) | rgb(214, 214, 214)
Chrome (background tab) | 14.3:1 | rgb(0, 0, 0) | rgb(213, 213, 213)
Safari (background tab) | 10.7:1 | rgb(0, 0, 0) | rgb(185, 185, 185)
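For reference, ratios like these follow the WCAG contrast formula: linearize each sRGB channel, combine them into a relative luminance, then compare the two luminances. Here is a minimal sketch of that calculation (assuming the standard WCAG sRGB linearization; I don’t know exactly which tool produced the table above, so rounding may differ slightly):

```python
def linearize(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color given as an (r, g, b) tuple."""
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colors; always between 1 and 21."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Safari foreground tab: black text on rgb(214, 214, 214)
print(round(contrast_ratio((0, 0, 0), (214, 214, 214)), 1))  # → 14.4
```

Running this on pure black against pure white gives the maximum ratio of 21:1, which is a quick sanity check of the implementation.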
The decreased legibility of Safari tab labels wouldn’t be such a large issue were it not exacerbated by my second quibble: no favicons next to website titles.
\n\nHere’s my reasoning for why Safari does not show favicons. Safari tabs are implemented using native macOS tabs that can be found in TextEdit, Finder, etc. And in every other application, tabs are labeled only with text.
\n\nEven so, for me, favicons are the single most important identifier for different tabs. With favicons, I can glance at an icon instead of reading text to figure out which tab is which. Even better, as you navigate to different pages of a website, often the title will change, but the favicon does not. So the favicon offers a certain degree of reliability that text labels do not.
\n\nTo my point, where appropriate, Apple features icons on many other labels around macOS.
\n\nIcons used in System Preferences, Finder, and the “Command-Tab” Application Switcher.
\n\nEven Safari uses the Touch Bar to display favicons, not text labels.
\n\nImage from Apple.
\n\nFingers crossed—🤞🤞—here’s to favicons and increased text contrast in Safari tabs.
\n\n", "url": "https://ericchen.cc/2017/02/03/two-safari-quibbles/", "summary": "Update: Since I originally wrote this post, Safari has gained favicon support. This change was enough to make me switch to Safari from Chrome.
\n", "date_published": "2017-02-03 00:00:00 +0000", }, { "id": "/2017/02/02/What-Makes-a-Good-Reddit-Post", "title": "What Makes a Good Reddit Post?", "content_html": "Note: This analysis of Reddit was created for the final project of ELE/COS 381 taught in Fall 2016. For this project, Alan Chen, Luis Gonzalez, and I decided to apply the topics learned in class to an analysis of Reddit to understand what makes a “good” Reddit post.
\n\nThe link to the PDF of the report is here.
\n\nBy Alan Chen, Eric Chen, and Luis Gonzalez-Yante
\n\nIn ELE/COS 381, we have studied various networks both physical and digital. In particular, Chapter 8 of Networked Life focused on the study of topology and functionalities of social networks like Facebook and Twitter. In this project, we are interested in the study of Reddit and what specifically makes a good Reddit post.
\n\nThe frontpage of reddit.com.
\n\nReddit is a bulletin board of user-generated content. As opposed to strictly ordering results by date, Reddit’s user interface strongly focuses on showing users content that is popular with other users. The site does this through a voting mechanism, where each user can upvote and downvote specific posts to the site. For individual users participating on the site, there is a motivation to create posts that resonate with other users, and thus generate a large number of upvotes.
\n\nAnother key aspect of Reddit is its emphasis on sub-communities, which are each their own fiefdom on Reddit. According to [redditmetrics.com](http://redditmetrics.com/history), on January 14, 2017, there were 1,005,275 unique subreddits. Communities are organized around topics or interests. Some communities such as /r/pics and /r/news are very broad and general interest communities. Other communities appeal to much smaller groups, such as /r/vexillology, which is dedicated to discussion and commentary about flags.
\n\nIt is important to note that a large component of the Reddit community does involve commenting. Comments, like posts, can be upvoted and downvoted, and comments are sorted by popularity. However, in this project, we decided to focus only on analyzing posts—which subreddits they are in and how many upvotes they receive—to simplify analysis. Further and more extensive work would likely include the study of comments, as well as analysis of the content of posts and comments.
\n\nWe decided to look at three different methods of quantifying what makes a good Reddit post. Reddit contains many emergent phenomena not explicitly designed for in the site and not completely obvious to new users. The site can be difficult to approach from the outside, but from our experience with the site (and one shared by the many frequent users of the site), we believe Reddit can be a funny, insightful, and engaging online community. Thus, ultimately our goals from this analysis were to be able to provide a set of conclusions and actions that might be given to a new user of the site in a “How-to use Reddit” guide.
\n\nFor questions addressed in sections x.1 and x.3 that follow, we are interested in characteristics of different types of subreddits. We decided on three types: top 100, small (< 10,000 subscribers), and original content.
\n\nTop 100 | \n25 Small (<10,000} subscribers) | \n10 Original Content | \n
---|---|---|
AskReddit | \nOCPoetry | \nlexington | \n
pics | \nimprov | \nReadmyStory | \n
Three categories of subreddits analyzed in x.1 and x.3
\n\nTop 100 subreddits are the biggest 100 subreddits by number of subscribers. The largest subreddit on the site is /r/AskReddit with over 15 million subscribers, and the 100th biggest subreddit has just under 500,000 subscribers. In contrast, we sampled 25 small subreddits randomly selected from among subreddits with fewer than 10,000 subscribers. Finally, we hand-selected 10 subreddits that contained significant amounts of original content, where we defined original content as posts whose material the user creates rather than aggregates from external sources. It is important to note that the original content subreddits had subscriber counts more similar to the small category as well.
\n\nA particular user will often follow general-interest subreddits, but may also follow one for her municipality or for a niche RPG that she plays. So these three categories of subreddits provide a broad perspective of both general-interest and niche communities that we believe accurately reflects the usage patterns of users on the site.
\n\nGiven the focus of Reddit on distinct communities, and with the over 1 million unique subreddit communities on the site, we expect that there will be large variations in communities. There might be countless ways to quantify the differences, but we decided on examining subreddits along two dimensions: individual engagement and community engagement.
\n\nIndividual engagement is defined as follows for any subreddit:
\n\n\\[\\text{individual engagement} = \\frac{\\sum\\limits_{u \\in \\text{top users}} \\frac{u_{\\text{posts in this subreddit}}}{u_{\\text{total posts}}}}{|\\text{top users}|}, \\quad \\text{individual engagement} \\in [0,1].\\]\n\nFor example, we might look at the top posts of /r/politics and generate and individual engagement score of 0.7. We calculate this number by looking at the users who author the top posts in a subreddit. For each user, we examine their post history, seeing how many posts are in /r/politics and how many posts are in other subreddits. We then calculate the proportion of posts that are in /r/politics for each user, and then average across all of the users who author the top posts in /r/politics to generate the individual engagement for the subreddit. A value of 1 for a given subreddit means that users are very engaged in that subreddit—they only ever post there. Whereas a value of 0.05 means that the top users in that subreddit only post to that subreddit 5% of the time.
\n\nCommunity engagement is defined as follows for any subreddit:
\n\n\\[\\text{community engagement} = \\frac{\\sum\\limits_{p \\in \\text{top posts}} \\frac{p_{\\text{upvotes}}}{\\text{subreddit subscribers}}}{|\\text{top posts}|}.\\]\n\nThe intuition for the metric is as follows: if the top posts in subreddits A and B both receive 1,000 upvotes on average, but if A has 1 million subscribers while B has only 100,000 subscribers, then subreddit B has 10x the community engagement of subreddit A. Similar to individual engagement, community engagement is a metric that is calculated for any particular subreddit. For the top posts in the subreddit, we normalized the upvotes with the number of subscribers to that subreddit. Then, we averaged over all the top posts in the subreddit.
\n\nOur intention in creating individual and community engagement was based on the strong suspicion that some subreddits, perhaps those of a more niche topic, attract higher individual engagement, while general interest subreddits might generally have lower individual engagement. Additionally, because community engagement corresponds to what proportion of the community a top post needs to appeal to, we thought it reasonable that it’s easier to create a top post in a subreddit with low community engagement because you have to appeal to a smaller fraction of the subscriber base.
\n\nReddit users are, like most people, multidimensional and likely to frequent multiple subreddits. As they move between communities, they bring with them the context of the different communities they participate in. This context could take the form of knowledge of memes or inside jokes.
\n\nBecause subreddits are composed solely through contributions to the subreddit, we implemented a metric that takes the top posts of the subreddit at the current time, and analyzes all the posts of the authors of those top posts—who we call top users—to see how often their posts are in the original subreddit a, and how often they are in the target subreddit b.
\n\n\\[\\text{participation}_{a \\to b} = \\frac{ \\sum\\limits_{u \\in \\text{top users}} 4 \\cdot \\frac{u_{\\text{posts in } a}}{u_{\\text{total posts}}} \\cdot \\frac{u_{\\text{posts in } b}}{u_{\\text{total posts}}}}{|\\text{top users}|}, \\quad \\text{participation}_{a \\to b} \\in [0,1].\\]\n\nBy taking the concentration of posts in the first subreddit, taken as a representation of the degree the author is a member of the first subreddit, and multiplying by the concentration of posts in the second subreddit, taken as a representation of the degree the author is a member of the second subreddit, we receive a metric describing the crossover between subreddits for a given author. This participation function is maximized when a user participates 50% in subreddit a and 50% in subreddit b and contains an extra factor of four to possible values between 0 and 1. The metric for the subreddit is the average of the values for its top users.
\n\nModeling subreddits as nodes in a network is an extension of Chapter 8: How do I influence people on Facebook and Twitter. An important notion from that chapter is that some links between users are more important than others, such as links that are included in many shortest paths. Here, we extend that intuition to our participation metric, which generates a network of weighted links, and we use the weights to directly infer which links are important in the network representation.
\n\nBecause of the lack of real names on Reddit, it is not immediately clear who the “top users” are. For example, on Twitter, the users with the most followers are often celebrities and public figures. But who are these top users of Reddit, and how dominant are they? Specifically, is it possible for the average user of Reddit to have a post become very popular in a subreddit? Or are the frontpages of subreddits controlled by an elite group of users?
\n\nTo answer these questions, we were interested in quantifying how good the top users are. To do so, we created a metric called the Reddit Power Index (RPI). Fundamentally, RPI is a metric that can be calculated on any post, and is defined as follows:
\n\n\\[\\text{RPI}_\\text{post} = \\frac{\\text{post upvotes}}{\\text{subreddit avg. upvotes}}.\\]\n\nThus, the RPI for any post is the number of upvotes it has divided by the average number of upvotes for a post in that subreddit. An RPI below 1 means that the post is below average, and any RPI above 1 means that the post is above average.
\n\nBecause we are interested in categorizing subreddits as a whole, we extend the definition of RPI first to users and then to subreddits as follows:
\n\n\\[\\text{RPI}_\\text{user} = \\frac{ \\sum\\limits_{p \\in \\text{posts}} \\text{RPI}_{p} }{|\\text{posts}|},\\]\n\n\\[\\text{RPI}_\\text{subreddit} = \\frac{ \\sum\\limits_{u \\in \\text{top users}} \\text{RPI}_{u} }{|\\text{top users}|}.\\]\n\nNaturally, the RPI for a user is the average RPI of its posts, and the RPI for a subreddit is the average RPI of its top users.
\n\nWith RPI, we created a heuristic which indicates the difficulty of creating a top post in any given subreddit. A subreddit with high RPI is one dominated largely by users who consistently have successful posts. Whereas a subreddit with lower RPI is in some sense more “democratic” because the community surfaces posts from users who are closer to the average Reddit user in terms of past post upvote performance.
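The three RPI definitions compose naturally from post to user to subreddit; a minimal sketch with hypothetical inputs:

```python
def rpi_post(upvotes, subreddit_avg_upvotes):
    """RPI of a single post: its upvotes relative to its subreddit's average."""
    return upvotes / subreddit_avg_upvotes

def rpi_user(posts):
    """posts: list of (upvotes, avg upvotes of that post's subreddit) pairs."""
    return sum(rpi_post(u, avg) for u, avg in posts) / len(posts)

def rpi_subreddit(top_users):
    """top_users: one per-user post list (as taken by rpi_user) per top user."""
    return sum(rpi_user(p) for p in top_users) / len(top_users)

# One top user whose two posts each did 3x their subreddit's average upvotes
print(rpi_subreddit([[(300, 100), (30, 10)]]))  # → 3.0
```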
\n\nWe collected our data using Python scripts and the Python package PRAW, the Python Reddit API Wrapper. The wrapper authenticates using OAuth, creating a Reddit instance that contains the client_id, client_secret, password, and username, which the rest of the PRAW API acts upon. While Reddit has its own public API that exposes data through JSON, we preferred using PRAW as an intermediary because it handled authentication, rate-limiting, and exceptions, and allowed us to focus on writing the data collection scripts.
\n\nDiagram of our data collection process.
\n\nThe Reddit Instance generated by PRAW can access any subreddit, post, or user that is accessible through reddit.com. Additionally, PRAW generates iterables which are very useful for data collection. For example, we can create a PRAW object that is a list of all of the current top posts in a subreddit, and this object can be iterated through like any other list in Python. We then can get key data for each submission, such as author, upvote count, and title. Similarly, each Reddit user can generate a PRAW object which is a list of its recent posts. This technique of (a) iterating through the top posts in a subreddit, (b) finding the upvotes and author of each top post, and (c) finding the history of posts for that user forms the foundation of our data collection techniques.
\n\nWhile PRAW simplified using the Reddit API, we still encountered significant roadblocks in regards to rate-limiting. Each request to the API can return a maximum of 100 posts at a time, and PRAW delays 2 seconds between API requests. In all of our scripts, there was a trade-off: collecting more data resulted in longer execution times. We were able to strike a reasonable balance by often choosing to look at samples of top posts of size 25 to 100.
\n\nHowever, when collecting data for the RPI, these rate-limits actually became a bottleneck factor (script execution time took hours). Our issue was that calculating the RPI for each user often requires collecting average upvote data for dozens of subreddits, each requiring a scraping of random posts that took several seconds. We largely overcame the issue by caching the average upvote value for subreddits and storing this data in a CSV file so it was persistent across different executions. Then, for example, if we had already scraped /r/AskReddit before, our code gets the average upvotes from the local cache instead of hitting the API, which makes a multi-second operation essentially free.
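The caching pattern described above can be sketched as a dict checked before any expensive fetch, backed by a two-column CSV file for persistence. The file name and the `fetch` callback here are stand-ins, not our actual script:

```python
import csv
import os

CACHE_FILE = "avg_upvotes_cache.csv"  # hypothetical file name

def load_cache(path=CACHE_FILE):
    """Read {subreddit: avg_upvotes} from a two-column CSV, if present."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="") as f:
        return {row[0]: float(row[1]) for row in csv.reader(f)}

def save_cache(cache, path=CACHE_FILE):
    """Persist the cache so it survives across script executions."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(cache.items())

def avg_upvotes(subreddit, cache, fetch):
    """Return cached average upvotes, hitting `fetch` (the slow
    API-sampling step) only on a cache miss."""
    if subreddit not in cache:
        cache[subreddit] = fetch(subreddit)
    return cache[subreddit]
```

On a warm cache, `avg_upvotes` never calls `fetch`, turning a multi-second API sample into a dict lookup.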
\n\nAlso, we should note that many subreddits closely follow power-law distributions for the number of upvotes on a random post, which had a noticeable impact on our sampling results.
\n\nLog-log plot of upvotes for 1,000 random submissions to /r/AskReddit, a Top 100 subreddit.
\n\nThus, because rate-limits constrained the number of posts we could sample, there was the potential for statistics to be moved wildly by sampling one outlier post out of 100. We believe that the reason for power-law distributions for upvotes on Reddit posts is because of how Reddit surfaces content to users. By default, users of the site are shown posts that are already popular, and then as a function of their increased visibility, these popular posts will continue to receive more upvotes. This is analogous to the model of preferential attachment studied in Chapter 10: Does the Internet Have an Achilles’ heel of Networked Life. In preferential attachment, new nodes added to a network tend to connect to nodes which already have high in-degree, similar to how new upvotes tend to accumulate on posts that already have a lot of upvotes. Thus, when analyzing the RPI for subreddits in 4.3, because RPI for a subreddit depends on the average upvotes from various other subreddits, we believe the median is a more explanatory measure of typical RPI.
\n\nPlot of individual and community engagement for various subreddits.
\n\nThe category of Top 100 subreddits is clustered in the lower-left quadrant, meaning the top posts are authored by users with low individual engagement and those posts receive low community engagement. Given that these subreddits are general-interest, this pattern is not surprising. For instance, while many people might be interested in the funny pictures from /r/Funny, very few people will only be interested in funny pictures.
\n\nFor Original Content subreddits, it is difficult to discern a significant difference from the other categories in terms of individual engagement, but it is clear that community engagement is relatively low on the whole. Perhaps this is a factor of subreddits driven by Original Content not needing to appeal to a very wide fraction of the subreddit.
\n\nFor small subreddits, we do see relatively large individual engagement and community engagement compared to the Top 100 subreddits. We propose the following explanatory mechanism: smaller subreddits are about more niche topics, so while many people might subscribe to /r/Funny because they like funny pictures, the only users that will subscribe to /r/lexington are people who live in Lexington, KY. As a result, this self-selection means that the typical subscriber to /r/lexington is more invested in the subreddit than the typical subscriber to /r/funny. As the plot shows, these self-selecting groups tend to have top posts from users with higher individual engagement (frequently above 0.5), and they can have very high community engagement.
\n\nOne interesting datapoint separate from the three categories that we added was /r/The_Donald. This subreddit is for supporters of Donald Trump and has created quite a bit of controversy on Reddit, with Reddit coming under fire for possible censoring of the subreddit, and members of the subreddit being accused of being toxic and harmful to the overall community. The controversial status of /r/The_Donald carries over to the plot, as the subreddit is a datapoint with no peers. It has extremely high individual engagement of 0.81, which means that the users who have the top posts in /r/The_Donald contribute 81% of their total posts to that community. The community engagement is the second highest of the subreddits we investigated, coming second to a much smaller community dedicated to Star Wars prequel memes (335,783 subscribers to /r/The_Donald, 5,910 to /r/PrequelMemes).
\n\nFor users new to Reddit looking to create posts that rise to the top, we suggest looking at subreddits in the lower-left quadrant, with low individual and community engagement. We suggest looking for low individual engagement subreddits because these subreddits likely surface top posts that don’t require an extensive history of context and knowledge about the particular inside jokes and idiosyncrasies of that subreddit. We also suggest subreddits with low community engagement because these are communities where posts can become popular while appealing to a smaller proportion of the community. An example of a subreddit that matches these criteria is /r/PersonalFinance, with scores of 0.06 and 2.6e-4 for individual and community engagement respectively.
\n\nFirst we had to decide which subreddits to measure against each other. We first observed top subreddits, such as /r/funny and /r/pics, but what we found was that in all tested cases, the connection value was either negligibly low or 0. This is probably because the posters to those subreddits are so numerous and their interests so varied that the size of samples we were taking did not discover cross-pollination.
\n\nThe second tests we made were across political subreddits, including /r/The_Donald and /r/hillaryclinton, trying to match them with subreddits with similar views, such as /r/dncleaks and /r/enoughtrumpspam. However, because the top posters in political subreddits seem to stay very heavily inside those subreddits, they received 0 scores even with subreddits of similar views.
\n\nWe then decided to test a network known to be based on geography, local sports, to see if we could gain information about the sports tendency of areas and locations. We examined the “big four” of Philadelphia—basketball, football, baseball, and hockey teams—and their relatedness to themselves, as well as some of the relevant league subreddits (e.g. /r/NBA).
\n\nCross-pollination graph for Philadelphia sports teams.
\n\nHere in the graphs, red lines indicate a participation from one subreddit in another of .1 or greater, green lines indicate a participation of .01 or greater, and blue lines indicate any other non-zero participation. What we found was interesting. Comparing to known information, we can confirm that /r/timberwolves and /r/sixers users are a subset of /r/NBA. We can also confirm that Philadelphia is a football town, as is known, and also make the interesting observation that /r/NBA is the biggest sports subreddit, even though local people are most likely to follow the football team in addition to any other sports of the area.
\n\nWe also applied the same analysis to Minnesota sports teams.
\n\nCross-pollination graph for Minnesota sports teams.
\n\nThe above graph suggests that if you want to become a star in /r/Timberwolves, then you should also participate in /r/NBA and /r/MinnesotaTwins. There is likely context in terms of discussion, memes, and knowledge among these communities that is shared among the top users.
\n\nOverall we can conclude that subreddits with users that stay only in the subreddit have 0 cross pollination, and that it is clear that larger communities have more participation in and less participation out.
\n\nFuture improvements that could be made include scraping all posts of a given subreddit, though it would take much more time, and also nuancing the conclusions reached here by interlacing these results with the previous analysis of engagement of the top posts.
\n\nRPI is a metric we created that can help us pinpoint the difficulty of creating a top post in any given subreddit. The RPI also allows us to clearly see what subreddits are dominated by users that consistently have successful posts. All this really means is that with the RPI we are able to find subreddits that will have more or less “alpha redditors” that one will have to compete with for upvotes.
\n\nCategory | \nAverage RPI | \nMedian RPI | \n
---|---|---|
Top 100 | \n149.8 | \n13.8 | \n
25 Small (<10,000) | \n3.0 | \n2.0 | \n
10 Original Content | \n15.2 | \n2.6 | \n
RPI scores for the three different categories of subreddits examined.
\n\nOn the whole, RPI did provide a reasonable number we could use in analysis. We implemented a 10% trimmed mean in calculating the subreddit RPI to protect the metric against large outliers. However, there are still cases such as /r/gaming in the Top 100, which had an outlier RPI of 4767 that skewed the average RPI of Top 100 subreddits upwards. This is why we believe that median RPIs are the better way to quantify the typical subreddit RPI in our categories.
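A 10% trimmed mean is simple to sketch: sort the values, drop a tenth from each tail, and average what remains. The RPI values below are made up, apart from the /r/gaming-style outlier, to show how much one extreme value drags the plain mean:

```python
def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return sum(trimmed) / len(trimmed)

rpis = [1.2, 1.5, 2.0, 2.2, 2.4, 2.9, 3.1, 3.3, 3.8, 4767.0]
print(round(sum(rpis) / len(rpis), 2))  # plain mean, dragged up by the outlier
print(round(trimmed_mean(rpis), 2))     # trimmed mean stays near the typical RPI
```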
\n\nShifting our focus to Figure 8, we can see that the median RPIs are more reliable. Subreddits that fall into the Top 100 category have significantly higher RPI than small or original content subreddits. This tells us that the popular posts in the Top 100 subreddits are typically created by users who get more than 10x the average number of upvotes! Conversely, our data show that the RPI for top users in subreddits with fewer than 10,000 subscribers has a median value of 2, which is much closer to being an average Reddit user. The Original Content subreddits also have low RPI, though we suspect that this might be largely a function of subscriber base size, as the Original Content subreddits we examined happened to have smaller subscriber bases.
\n\nNew users of Reddit should target subreddits with RPI scores closer to 1. These subreddits commonly surface content from more average users, not just from Reddit superstars. Our data tell us that smaller subreddits, such as /r/improv with an RPI of 1.4 and 7,332 subscribers, typically have lower RPI scores. Even some larger subreddits, such as /r/Frugal with an RPI of 1.7 and 613,046 subscribers and /r/GetMotivated with an RPI of 1.6 and 9,779,376 subscribers, have low RPI scores as well.
\n\nPosting to a subreddit with low RPI does not guarantee success. What it does mean is that you are competing against more average users of Reddit to get the top posts, hopefully giving yourself a chance to stand out.
\n\n“Man naturally desires, not only to be loved, but to be lovely.” Adam Smith wrote that line in 1759 in The Theory of Moral Sentiments, and the same could be said of our behavior on Reddit today. We, the users of this social bulletin board called Reddit, crave recognition and popularity. We want to be “loved” and “lovely”—we want our posts to become popular.
\n\nTo that end, our analysis produced a few steps of action. There are clear differences between large and small subreddits in terms of individual and community engagement, as well as the RPI of top users. These differences are important to keep in mind for new users coming to the site, as it might be beneficial to begin in subreddits tailored to your interests while also keeping an eye out for low RPI, low engagement subreddits. Additionally, understanding the importance of what we termed cross-pollination will help you tailor which sets of communities to participate in. By deliberately choosing a set of subreddits, you can benefit from the same shared context that top users in those subreddits already have. Hopefully, with these takeaways, the factors of what makes a good Reddit post have been made clearer, and we can all use them to be more “loved” and more “lovely” on the site.
\n", "url": "https://ericchen.cc/2017/02/02/What-Makes-a-Good-Reddit-Post/", "summary": "Note: This analysis of Reddit was created for the final project of ELE/COS 381 taught in Fall 2016. For this project, Alan Chen, Luis Gonzalez, and I decided to apply the topics learned in class to an analysis of Reddit to understand what makes a “good” Reddit post.
\n", "date_published": "2017-02-02 00:00:00 +0000", }, { "id": "/2017/01/14/Mini-Buses-and-uberPool", "title": "Mini-Buses and uberPool", "content_html": "Last month, I started reading A Pattern Language: Towns, Buildings, Construction. It’s a book about architecture that contains 253 rules for building everything from metropolitan areas (2. The Distribution of Towns) to houses (221. Natural Doors and Windows).
\n\nThere’s so much to talk about from this book, but one pattern in particular caught my attention: 20. Minibuses. Here’s a quote from the passage:
\n\n\n\n\nBuses and trains, which run along lines, are too far from most origins and destinations to be useful. Taxis, which can go from point to point, are too expensive.
\n\nTo solve the problem, it is necessary to have a kind of vehicle which is half way between the two—half like a bus, half like a taxi—a small bus which can pick up people at any point and take them to any other point, but which may also pick up other passengers on the way, to make the trip less costly than a taxi fare.
\n\n…
\n\nThe system hinges, to a certain extent, on the development of sophisticated new computer programs. As calls come in, the computer examines the present movements of all the various mini-buses, each with its particular load of passengers, and decides which bus can best afford to pick up the new passenger, with the least detour.
\n
Replace “mini-buses” with “uber cars” and that quote reads like a convincing pitch for uberPool.
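\n\nThe computer program the passage imagines is easy to sketch. Here is a toy Python version of my own (the function names and straight-line distances are assumptions; a real dispatcher would use road networks, capacity, and live traffic): among all buses, pick the one whose current leg lengthens the least when the new pickup is spliced in.

```python
import math

def dist(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def best_bus(buses, pickup):
    """buses maps a name to (current_position, next_stop).
    Return the bus whose route grows the least by detouring to pickup."""
    def detour(pos, stop):
        return dist(pos, pickup) + dist(pickup, stop) - dist(pos, stop)
    return min(buses, key=lambda name: detour(*buses[name]))

buses = {"A": ((0, 0), (10, 0)), "B": ((0, 5), (10, 5))}
print(best_bus(buses, (5, 4)))  # B: the pickup barely bends its route
```

uberPool’s real matching problem adds time windows and demand prediction, but the “least detour” objective from the 1977 passage is recognizably the core.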
\n\nNo one could have reasonably predicted, when the book was published in 1977, that mini-buses themselves would never become widespread, yet that a couple of decades later, once the internet took hold and Moore’s Law made the iPhone possible, the concept of the mini-bus would become viable after all.
\n\nI think there are two takeaways here:
\n\nA Pattern Language: Towns, Buildings, Construction stands the test of time. Mini-buses didn’t take hold, but the idea was clearly in the right direction. There’s a lot of “I never thought about it that way” moments in this book.
\nHuman ingenuity is a very strong force. I’m generally not overly optimistic about any one specific technology (e.g. A.I., genetic sequencing, renewable energy). But on the whole, I am optimistic that things will be better in a decade than they are today—and I do mean that in the broadest sense. Who knew 40 years ago that mini-buses would become uberPool? I don’t know what’s coming tomorrow. Whatever it is, though, it will probably be better than what we have today.
\n", "date_published": "2017-01-14 00:00:00 +0000", }, { "id": "/2016/07/31/intellectual-property-stifles-tech-innovation", "title": "Intellectual Property Stifles Tech Innovation", "content_html": "Note: This op-ed was originally published on page six of the January 21, 2016 print issue of the Middleton Times-Tribune. This 750-word piece is the distillation of a much longer research paper I wrote for my college freshman writing seminar, Property, Wealth, and Equality.
\n\n\n\nAs a kid, I loved sledding down the front yard on wintry Wisconsin afternoons. I remember that one winter a while back, my older brother got a neon-green SnowSlider. This sled, to use tech parlance, disrupted my family’s sledding ecosystem. My chintzy plastic sled (the kind that is built like a miniaturized kiddie pool) was obsoleted by the svelte SnowSlider with its slick and speedy coating.
\n\nBut as much as I wanted to try my brother’s speedy new ride, he had no intention of sharing. Even when I went out sledding by myself, I was never allowed to use his sled.
\n\nOf course, we all have stories like this. Whether it’s between siblings or coworkers or friends, everybody has a story about being on the wrong end of greed and selfishness.
\n\nAnd recently, we all have witnessed this same phenomenon develop in the tech industry. Instead of claiming exclusive sledding rights, tech companies are fighting over intellectual property rights—think patents and copyrights. They are making expansive intellectual property claims that threaten to stifle future improvements and innovations.
\n\nThe solution to this problem is clear: intellectual property law must promote, not prohibit, technological innovation by limiting the scope of property claims. Without making these changes, we might never see the Amazons, Facebooks, and Ubers of tomorrow because the moat of intellectual property will have blocked their entrance.
\n\nIn fact, news came out recently that Google plans to re-architect its implementation of the Java APIs in the next major version of Android. Because Java APIs are like the foundation of the house, replacing them is not trivial. Even though the courts have already found that Google’s codebase is original and unique, an ongoing appeal forced Google to dedicate considerable efforts to what essentially amounts to treading water.
\n\nIf not for Oracle’s lawsuit, Google could instead spend these man-hours improving its software in tangible ways. It’s hard to quantify and report the loss of future innovation. But these losses are real, and we are starting to feel the treacherous side-effects today.
\n\nI recognize, though, that changing the law is no easy task (and rightly so). But there is precedent that can be considered when thinking about intellectual property in technology: water rights in the 19th century West. While 21st century technologists and 19th century settlers have little in common beyond entrepreneurial spirit, lessons learned from the West are still applicable today.
\n\nIn the West at this time, the first settlers had many advantages. Processing precious ore required building mills along rivers. And as the first to arrive, these settlers had their choice of the best land and water sources. Plus, cordoning off a large swath of waterway had the knock-on effect of limiting others’ mining potential.
\n\nWhen legislation finally caught up to the miners, the rules changed. A use principle was enacted, which simply stated that you had a right to water only if you made use of it. Rampant speculation and exorbitant claims, like my brother’s exclusive right to the SnowSlider, were no longer valid.
\n\nWe can apply this principle to software, too. One function—one idea—can have a myriad of uses. You can use a dynamic pricing algorithm to sell stadium tickets (SeatGeek), taxi rides (Uber), or diapers (Amazon). Under the use principle, because each company uses the pricing algorithm for specific applications with unique codebases, all of them have the right to innovate, but none of them can claim ownership over the bigger idea.
\n\nHowever, some say that weakening intellectual property protection will actually discourage innovation by lowering financial incentives. While this is a valid concern, the reality of the legal landscape suggests a more pressing issue. According to Unified Patents, an industry organization dedicated to reforming patent law with members including Google and Adobe, patent trolls accounted for 92% of all patent lawsuits in technology in 2015. These patent trolls take hundreds of companies, big and small, to court at once, relying on the current intellectual property law to exact money from legitimate companies.
\n\nInstead of giving out broad intellectual property to the first creators of technology, we must reform intellectual property along the use principle to allow future companies to continue to innovate. Amazon didn’t create the first online marketplace, Google wasn’t the first to make a search engine, and Apple’s iPhone wasn’t the first phone. Let’s make sure they’re not the last.
\n\n", "url": "https://ericchen.cc/2016/07/31/intellectual-property-stifles-tech-innovation/", "summary": "Note: This op-ed was originally published on page six of the January 21, 2016 print issue of the Middleton Times-Tribune. This 750-word piece is the distillation of a much longer research paper I wrote for my college freshman writing seminar, Property, Wealth, and Equality.
\n", "date_published": "2016-07-31 00:00:00 +0000", }, { "id": "/2016/07/30/wait-on-matlab-timer-objects", "title": "Wait() on MATLAB Timer Objects", "content_html": "Unsurprisingly, MATLAB’s timer object is great for scheduling code to run at specified time intervals.
\n\nHowever, consider the following test case:
\n\nfunction [] = timerTest\n\n% initialize a timer\nt = timer('TimerFcn',@counter,'Period',1,'TasksToExecute',5,'ExecutionMode','fixedRate');\n\n% keep track of how many times the timer has run\ncount = 0;\n\n% start timer\nstart(t);\n\n % code that is executed on every timer cycle\n function counter(~,~)\n count = count + 1;\n end\n\n% display how many times the timer has run\ndisp(count);\nend\n
I expected that the Command Window would display 5. But it actually displays 1.
\n\nWhy? It turns out that timers don’t automatically block the main thread. After the timer is started, MATLAB continues to execute more code while keeping track of the timer on the side. When the timer needs to execute again, MATLAB stops, executes the timer callback function, and then continues on its way.
\n\nBut I wanted to process a matrix that the timer first had to populate, meaning that I needed the program to wait until the timer finished before continuing to execute more code.1
\n\nTurn out, it’s just this simple:
\n\nstart(t)\nwait(t)\n
Add wait()
, and MATLAB will not continue executing code until the timer has finished.
My temporary hack (and I suspect the hack of many others as well) was to add a pause()
that was just longer than the expected timer duration. ↩
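\nThis blocking pitfall isn’t unique to MATLAB. Here is a rough Python analogue of the same bug and fix (my own sketch, not from the original post): the timer callbacks run off the main thread, so reading the count immediately would be too early, and an Event plays the role of wait(t).

```python
import threading

count = 0
done = threading.Event()

def counter(remaining):
    """Timer callback: bump the count, reschedule until 5 runs complete."""
    global count
    count += 1
    if remaining > 1:
        threading.Timer(0.01, counter, args=(remaining - 1,)).start()
    else:
        done.set()  # signal that the last tick has fired

threading.Timer(0.01, counter, args=(5,)).start()
done.wait()   # analogous to wait(t); without it, count is usually still 0 here
print(count)  # 5
```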
\n", "date_published": "2016-07-30 00:00:00 +0000", }, { "id": "/2016/04/30/Some-Thoughts-About-Institutions-and-Bureaucracy", "title": "Some Thoughts About Institutions and Bureaucracies", "content_html": "Out of curiosity, recently I’ve been reading through some of the reports included in Princeton University’s strategic planning framework, which “identifies key goals and major priorities for the University and that will serve as a guide for allocating University resources and prioritizing new initiatives.”
\n\nAlas, there is nothing of exceptional note to be found in these reports. They contain guidance and suggestions that, at least to someone engrossed in that bubble of higher-education right now, are exactly what you would expect an elite university would be planning to do in the next 5 to 10 years. There are recommendations to bolster study abroad efforts, to create a new interdisciplinary environmental institute, and to establish a statistics and machine learning department.
\n\nWhile I was somewhat disappointed that there were no juicy details to be discovered, I suppose that because universities are institutions that hope to exist in perpetuity, that necessarily demands a certain level of patience, deliberateness, and boringness in the overall decision-making process.
\n\nThat being said, there were a couple of nuggets that I picked out that are of passing interest—mostly in the “Wow, this is how bureaucracy works” sort of way.
\n\nFrom the report submitted by the Task Force on the Residential College Model
\n\n\n\n\nEncourage Community – During “study breaks,” place board games around in order to encourage students to engage socially beyond merely grabbing food and departing.
\n
First of all, “study break” is in quotes because it is a blatant misnomer. The correct phrase would be: university-sponsored-free-food-event.1 There is something odd about how explicit the university is in its social engineering here. Of course, I recognize that everything the university sets up in its undergraduate education—even just the very idea of a centralized campus where people live and learn together—is to some degree in the name of socially engineering certain desirable outcomes. But is it necessary to go into such minutiae as to officially recommend putting out board games during a study break?
\n\nMore from that same report:
\n\n\n\n\nNomenclature: Encourage professional staff and students to refer to what is currently considered “upperclass housing” to “non-affiliated housing” to breakdown the bifurcated experience students feel as their “childhood” being in the colleges” and “adulthood” being outside of the colleges.
\n
“Childhood” and “adulthood”? I’ve been at the university for less than a year, but I’ve never once heard anyone at Princeton classify their experience with this dichotomy. If “childhood” is living in the residential college and having a university-sponsored meal plan, then is “adulthood” really moving into a different university dorm and joining an eating club?2 There are many ways in which the experiences of freshmen and sophomores differ from those of juniors and seniors, but moving to a new on-campus dorm and paying upwards of $9,300 for board doesn’t strike me as “adulthood.”
\n\nFrom the report submitted by the Committee on the Future of Sponsored Research:
\n\n\n\n\nNew internal support for research will strengthen the proposals of Princeton faculty, increasing the likelihood of positive funding decisions, with the overall outcome of a virtuous cycle of more funds for research, enhanced research activity, and improved prospects for acquiring additional funding.\n[…]\nThere is a real possibility that this could place Princeton at a considerable competitive advantage vis-à-vis its peers. Now is not the time to retrench. There has hardly been a more exciting time in the life sciences.
\n
There has been much discussion recently about the flatlining of government research dollars flowing into universities. From the committee’s report, it appears that Princeton has been relatively resilient against these stresses, but the committee recommends that the university create robust mechanisms for internal research funding to foster bold research, and to better position Princeton researchers to obtain the limited government funds still available.
\n\nThis plan for increased internal funding does seem quite sensible: Princeton should use its current competitive advantage to position itself for future success. That’s the general formula for creating an institution in perpetuity. However, the scale of the endeavor should not be underestimated. In the spring of 2014, the university had 495 professors, 80 associate professors, and 180 assistant professors. Under the proposal from the committee, four internal funds geared toward both early-career and mid-career professors would distribute $7.4 million to 60 faculty members per year, partially funding approximately 8% of the faculty. And I say partially because in 2013 the average chemistry professor received $596,200 in new awards, the average physics professor received $363,800, and the average molecular biology professor received $688,400. Arguably, if government funding were to hold at current levels or even decrease by 10%, Princeton would have to do more than what is proposed here to meaningfully offset those effects. Of course, the good news is that $7.4 million per year can be sustained by a $148 million endowment, and $148 million is only 0.66% of Princeton’s actual $22.3 billion endowment.
\n\nThis is all to say that while it’s possible for the university to become more self-reliant in research funding, it’s inconceivable in the foreseeable future that Princeton, and other universities, will not be primarily dependent on government dollars for research.
\n\nThat’s all the thoughts I have for now, but I’ll probably come back to more of the strategic planning reports in the future. I’m starting to find that these sorts of primary source documents are much more interesting to read than the news releases that surround them. Reading the reports yourself gives you a level of exposure to the authors’ inner thought processes that a summary can’t. So wherever your interests lie, I would suggest that you do the same: find the relevant primary source documents and dig in. You don’t always need a journalist to stand between you and the information. If you want to be a journalist, read journalism. If you want to be a business executive, technologist, or anything else, read what the people in the industry are writing themselves.
\n\nActually, the fully-correct phrase is more like: university-sponsored-free-food-event-that-isn’t-actually-free-because-it-comes-out-of-ballooning-student-fees. (I see how “study break” is the more attractive word). ↩
\nJuniors and seniors typically join eating clubs, which, as their names suggest, are an upperclass dining option. They’re an anachronism of a time gone-by, and now occupy an often contested position as partial-substitutes to fraternities and sororities in the university’s social scene. ↩
\n", "date_published": "2016-04-30 00:00:00 +0000", }, { "id": "/2016/01/25/email-as-a-bulletin-board", "title": "Email as a Bulletin Board", "content_html": "Here is how to subscribe to a bulletin board type mailing list, high-volume and low importance, while maintaining your email sanity.
\n\nOn a high-volume email list like this, most of the emails are totally irrelevant to me. While I want to be able to scan through and catch any interesting upcoming events, I don’t want them cluttering up my inbox. So I hide them away.
\n\nThis is a typical email.
\n\nFor this, I use Google Inbox’s bundling feature. I filter out all emails from this email list and show the bundle once per day. Every morning, this bundle will show up in my inbox, and I can quickly scan the subjects and see if anything is of interest. It’s a nice daily briefing of interesting events happening around campus.
\n\nIt’s important to make sure that these emails don’t reach my viewable inbox as they arrive. It’s too easy to check for new emails every 15 minutes, even though they are of minimal importance.
\n\nHere are my bundle settings.
\n\nThis is the critical step. After a month on a real bulletin board, flyers are torn-off, covered up, or rained away. And because people treat these mailing lists like bulletin boards, it makes sense to treat these emails the same. I don’t want all of this cruft sitting around in my email. It makes searching more cumbersome and takes up storage space in my inbox.
\n\nIf Google stores your email, then you can do this by running a script on Google App Engine. I’m using a script from John Day, and I’ve set the automatic delete time to be 5 days.
\n\nPublic bulletin boards are a staple on college campuses.
\n\nBut more and more, information about campus happenings are advertised online through a combination of mostly Facebook and Email.
\n\nHearing about events through Facebook is great. Things come and go on the timeline. I can scroll through events, keeping note of the interesting ones. There is a very low mental burden of having all these events on my timeline. They just come and go without any hassle.
\n\nBut using email as a bulletin board is a little more problematic. Fundamentally, email is more like a physical mailbox than a public bulletin board. Ideally, I would like to keep the ratio of information to junk as high as possible, which is why I don’t subscribe to many promotional mailing lists. I have a higher standard of quality for emails than I do for something on Facebook. Keeping the information density as high as possible means that the important emails aren’t lost in the fluff.
\n\nPlus, combing through a high-volume email list can start to feel like work. Once every two weeks there is an interesting email, but every other email puts an administrative and mental burden on you—forcing you to read, skim, and delete it.
\n\nThis solution takes that burden away. It coalesces what could be tens of context switches during the day into one nice morning roundup. And you never have to worry about managing any of it.
\n\nInbox zero. 😊
\n\n", "url": "https://ericchen.cc/2016/01/25/email-as-a-bulletin-board/", "summary": "Here is how to subscribe to a bulletin board type mailing list, high-volume and low importance, while maintaining your email sanity.
\n", "date_published": "2016-01-25 00:00:00 +0000", }, { "id": "/2016/01/07/googles-share-button", "title": "Google’s Inconsistent Share Button", "content_html": "Google’s Material Design promises not only to bring consistency across products, but also to bring consistency within Google’s design language itself.
\n\nIn the Material Design documentation, there are best practices for system icons:
\n\n\n\n\nConsistency aids user comprehension of icons. Use the existing system icons whenever possible and across different applications.
\n
Which is all well and good. That’s what system icons should do. They should be consistent and clear. But as we start 2016, not everyone within Google has gotten the message yet.
\n\nThis is Google’s sanctioned share icon.
\n\n\n\nThe icon is three nodes with lines connecting each of them. There isn’t any universal icon for sharing, but this one is reasonably clear.
\n\nAnd most Google properties use the right one.
\n\nThe correct share icon in Google Play Newsstand, Google Photos, and Google Plus.
\n\nThis is Google’s reply icon.
\n\n\n\nHere is the reply icon in Gmail.
\n\n\n\nBut the Android YouTube app uses this.1
\n\n\n\nHere it is in action.
\n\nIt’s also next to the like/dislike buttons below the video (h/t Gazalan on Reddit).
\n\nThis icon is worse than merely obscure. It’s confusing because it’s almost identical to the reply icon.
\n\nA bad icon points in the general direction of its functionality. A terrible icon boldly points in the wrong way.
\n\nUPDATE: It is now 2017, and YouTube still uses the reversed reply icon as its non-conforming share button. The YouTube app on Android updated just a couple of days ago, and now it even has the text “Share” below the icon. 😕
\n\n\n\nGoogle Trends also falls victim to the non-standard share icon. YouTube on the web does as well (it’s under the subscribe button). ↩
\n", "date_published": "2016-01-07 00:00:00 +0000", }, { "id": "/2015/02/06/clean-air-is-cheap-for-the-us", "title": "Clean Air is Cheap for the U.S.", "content_html": "In today’s print edition of The Economist, “The Cost of Clean Air” details the other side of the proverbial environmental coin:
\n\n\n\n\nZhang Minsheng, the owner, still gets some business from passing traffic. But the recent closure of nearby rock quarries, because of air-pollution restrictions, has taken its toll. He reckons his monthly income has fallen by 30-40% to around 4,000 yuan ($640). Next door a wholesale coal business has closed. So too have a small family-owned barbecue restaurant and an alcohol, tobacco and grocery store.
\n
To be clear, there is zero doubt in my mind that climate change is both real and caused by humans.
\n\nBut in the course of the United States’s Industrial Revolution, coal burned dirty and often. Environmental regulations were immaterial. Simply put, the United States was in many ways like China is now.
\n\nMaking note of that simple fact—that the U.S. was once in present-day China’s very own shoes—is critical to informing the public debate around environmental regulations. While this doesn’t absolve China and other developing countries of responsibility in the matter of global warming, I do believe that it places the onus on the U.S. and other developed economies to lead the way with new technology and policy. To expect China to match the U.S. in efforts to reduce pollution is to multiply the story from The Economist millions of times over. The Chinese economy, like the U.S.’s once did, relies heavily on coal and other environmentally unfriendly practices. While every country plays a role in solving the climate change crisis, maybe the richest countries—who got their starts burning the same dirty coal—need to bear a larger burden than they currently do.
\n", "url": "https://ericchen.cc/2015/02/06/clean-air-is-cheap-for-the-us/", "summary": "In today’s print edition of The Economist, “The Cost of Clean Air” details the other side of the proverbial environmental coin:
\n\n\n\n\nZhang Minsheng, the owner, still gets some business from passing traffic. But the recent closure of nearby rock quarries, because of air-pollution restrictions, has taken its toll. He reckons his monthly income has fallen by 30-40% to around 4,000 yuan ($640). Next door a wholesale coal business has closed. So too have a small family-owned barbecue restaurant and an alcohol, tobacco and grocery store.
\n
To be clear, there is zero doubt in my mind that climate change is both real and caused by humans.
\n\nBut in the course of the United States’s Industrial Revolution, coal burned dirty and often. Environmental regulations were immaterial. Simply put, the United States was in many ways like China is now.
\n\nMaking note of that simple fact—that the U.S. was once in present-day China’s very own shoes—is critical to informing the public debate around environmental regulations. While this doesn’t absolve China and other developing countries of responsibility in the matter of global warming, I do believe that it places the onus on the U.S. and other developed economies to lead the way with new technology and policy. To expect China to match the U.S. in efforts to reduce pollution is to multiply the story from The Economist millions of times over. The Chinese economy, like the U.S.’s once did, relies heavily on coal and other environmentally unfriendly practices. While every country plays a role in solving the climate change crisis, maybe the richest countries—who got their starts burning the same dirty coal—need to bear a larger burden than they currently do.
", "date_published": "2015-02-06 00:00:00 +0000", }, { "id": "/2014/10/25/mathematical-induction-divisibility-by-7-example", "title": "Mathematical Induction: Divisibility by 7 Example", "content_html": "Prove that \\(3^{2n+1}+2^{n+2}\\) is divisible by \\(7\\) for every nonnegative integer \\(n\\).
\n\nLet \\(P(n)\\) be: \\(7\\mid3^{2n+1}+2^{n+2}\\).
\n\nWe wish to show that \\(P(n)\\) is true \\(\\forall n \\geq0\\).
\n\nWe first verify the base \\(n=0\\) case:\n\\[3^{2\\times0+1}+2^{0+2}=3^1+2^2=7\\]\n\\[1 \\times 7 = 7\\]
\n\nWe wish to show that \\(P(k)\\implies P(k+1) \\, \\forall k \\geq 0\\).
\n\nWe assume \\(P(k)\\) to be true for some \\(k \\geq 0\\).
\n\nThus \\(7q=3^{2k+1}+2^{k+2}\\) for some \\(q \\in \\mathbb{Z}\\).
\n\nConsider
\n\n\\[\\begin{align*}\n& 3^{2(k+1)+1}+2^{(k+1)+2} \\\\\n&= 3^{(2k+1)+2}+2^{(k+2)+1} \\\\\n&= 9(3^{2k+1})+2(2^{k+2}) \\\\\n&= 9{(3^{2k+1}+2^{k+2})}-7(2^{k+2})\n\\end{align*}\\]\n\nWe have shown \\(3^{2(k+1)+1}+2^{(k+1)+2}=9(3^{2k+1}+2^{k+2})-7(2^{k+2})\\).
\n\nBecause \\(7\\mid3^{2k+1}+2^{k+2}\\) as assumed, and \\(7\\mid7\\) trivially, it follows that \\(7\\mid3^{2(k+1)+1}+2^{(k+1)+2} \\).
\n\nThus, \\(P(k)\\implies P(k+1) \\) as needed, so \\(P(n) \\) is true \\(\\forall n \\geq 0\\)
\n", "url": "https://ericchen.cc/2014/10/25/mathematical-induction-divisibility-by-7-example/", "summary": "Prove that \\(3^{2n+1}+2^{n+2}\\) is divisible by \\(7\\) for every nonnegative integer \\(n\\).
\n\nLet \\(P(n)\\) be: \\(7\\mid3^{2n+1}+2^{n+2}\\).
\n\nWe wish to show that \\(P(n)\\) is true \\(\\forall n \\geq0\\).
\n\nWe first verify the base \\(n=0\\) case:\n\\[3^{2\\times0+1}+2^{0+2}=3^1+2^2=7\\]\n\\[1 \\times 7 = 7\\]
\n\nWe wish to show that \\(P(k)\\implies P(k+1) \\, \\forall k \\geq 0\\).
\n\nWe assume \\(P(k)\\) to be true for some \\(k \\geq 0\\).
\n\nThus \\(7q=3^{2k+1}+2^{k+2}\\) for some \\(q \\in \\mathbb{Z}\\).
\n\nConsider
\n\n\\[\\begin{align*}\n& 3^{2(k+1)+1}+2^{(k+1)+2} \\\\\n&= 3^{(2k+1)+2}+2^{(k+2)+1} \\\\\n&= 9(3^{2k+1})+2(2^{k+2}) \\\\\n&= 9{(3^{2k+1}+2^{k+2})}-7(2^{k+2})\n\\end{align*}\\]\n\nWe have shown \\(3^{2(k+1)+1}+2^{(k+1)+2}=9(3^{2k+1}+2^{k+2})-7(2^{k+2})\\).
\n\nBecause \\(7\\mid3^{2k+1}+2^{k+2}\\) as assumed, and \\(7\\mid7\\) trivially, it follows that \\(7\\mid3^{2(k+1)+1}+2^{(k+1)+2} \\).
\n\nThus, \(P(k)\implies P(k+1)\) as needed, so \(P(n)\) is true \(\forall n \geq 0\).
", "date_published": "2014-10-25 00:00:00 +0000", }, { "id": "/2014/10/17/taylor-swifts-top-5-not-billboard-Top-20", "title": "Taylor Swift’s Top 5 Not on the Billboard Top 20 Songs", "content_html": "This also appeared in the October 17 Parents’ Weekend edition of The Lawrence, which you can read here.
\n\nThere’s no need to beat around the bush; I’m a Swiftie. I am a member and occasional contributor to /r/TaylorSwift, frequently peruse AnalyzingTaylor.tumblr.com, and have created, for personal discovery and enrichment purposes, a histogram detailing my play counts of every single Taylor Swift song—the histogram is skewed right, if you were wondering. Needless to say, I know Taylor’s music all too well. Those of you who don’t share the same passion could still benefit from adding some diversity to your musical selections. In honor of Swift’s upcoming album release, I urge you to shake off the mainstream and be a little hipster with this list of the top 5 Taylor Swift songs that never cracked the Billboard Top 20.
\n\nHe said the way my blue eyes shined
\nPut those Georgia stars to shame that night
\nI said: \"That's a lie.\"
“Tim McGraw” is easily the best song on Swift’s 2008 eponymous debut album. In the song, Taylor reminisces over moments with her boyfriend who’s going off to college. The lyrics are rough and raw, but surprisingly mature. I shiver a bit every time I hear “The moon like a spotlight on the lake.” Don’t miss the music video for “Tim McGraw.” It’s cheesy in all the right ways and there’s this shot of Taylor leaning against a rustic fence with leaves all around her while playing the guitar that is honestly too perfect.
\n\nI'm not a princess, this ain't a fairy tale,
\nI'm not the one you'll sweep off her feet,
\nLead her up the stairwell.
Off Taylor’s second studio album Fearless, “White Horse” has brought me close to tears many, many times. It’s emotional from start to finish, but listen for that moment when the instruments drop off and it’s just Taylor; you will never forget that moment. I’ve forgotten everything about the day my little brother was born. I’ve forgotten to call my mom so many times (I’ll call you tomorrow mom, I promise). But this moment, this is something that I will never forget.
\n\nThere I was again tonight
\nForcing laughter, faking smiles
\nSame old tired lonely place
“Enchanted” from Taylor’s third studio album Speak Now, is hands down my favorite. The first 13 seconds are magical. I have no idea what synthesizer/instrument makes that sound, but it’s amazing.
\n\nAnd all I feel in my stomach is butterflies
\nThe beautiful kind, making up for lost time,
\nTaking flight, making me feel right.
“Everything has Changed,” a collaboration with the fiery Ed Sheeran on Taylor’s fourth studio album Red, is a moment when 2 plus 2 really does equal 5. The warmth of Ed’s voice and the treble of Taylor’s pair to create something as tender and delicate—in the best way—as a warm brioche.
\n\nShe almost called him on the night that he wrote
\nThese simple words on his goodbye note.
Because “Dark Blue Tennessee” was never officially released, the only way to listen to it is on YouTube with that terrible pitch shifting. In spite of that, the song still shines. It’s perhaps the most country of any song on this list. Taylor’s voice has a little twang, and the instrumentation is unapologetically small-town southern.
\n\n", "url": "https://ericchen.cc/2014/10/17/taylor-swifts-top-5-not-billboard-Top-20/", "summary": "This also appeared in the October 17 Parents’ Weekend edition of The Lawrence, which you can read here.
\n", "date_published": "2014-10-17 00:00:00 +0000", }, { "id": "/2014/09/24/graphing-the-play-counts-of-taylor-swift-songs", "title": "Graphing the Play Counts of Taylor Swift Songs", "content_html": "\n\nThis is a histogram showing the play counts in my music library for all of the songs on all four of Taylor Swift’s studio albums.
\n\nThe histogram is unimodal, which is unsurprising. I like most of the songs, dislike few, and really like a few. Also, the histogram is skewed right. This makes sense as I often put all four albums on shuffle, so every song gets a fair amount of plays. There are, however, some songs that I will purposely listen to, and they make up the tail of the histogram.
\n\nThere are no outliers, which means that even my favorite songs—Enchanted, White Horse, Fifteen1—don’t get played that much more than the other songs. I think having no outliers is my proof that I’m not that self-proclaimed “superfan” who listens only to her number one hits; I’m the real deal.
\n\nOn average, I have played each song 57 times.
\n\nThe standard deviation is about 22.
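For the curious, the summary statistics above can be reproduced in a few lines of Python with the standard-library statistics module. The play counts below are illustrative stand-ins, not my actual library data:

```python
# Sketch of the summary statistics for a list of song play counts.
# These numbers are made up for illustration only.
from statistics import mean, median, stdev

play_counts = [31, 38, 42, 45, 47, 50, 52, 55, 57, 58,
               60, 62, 64, 66, 70, 75, 82, 90, 101, 115]

avg = mean(play_counts)   # average plays per song
sd = stdev(play_counts)   # sample standard deviation

# A right-skewed distribution has its mean pulled above its median
# by the long tail of heavily played favorites.
right_skewed = avg > median(play_counts)

print(f"mean={avg:.1f}, stdev={sd:.1f}, right-skewed={right_skewed}")
```

Swapping in a real export of play counts (e.g. from iTunes) gives the figures quoted in the post.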
\n\nA non-exhaustive list. ↩
\nNote, I will use the past tense to refer to everything previous to Taylor’s new album, 1989, and the present tense to refer to the 1989 era.1
\n\nI haven’t been on the Taylor Swift bandwagon forever. When I first heard “Teardrops on My Guitar,” I wrote her off as kitschy and “a fad.”
\n\nBut as things have a way of happening, no more than two years after my initial sour reaction, my opinion changed. I saw past my first impressions and fell in love with her music.
\n\nFive years ago I couldn’t have told you what made Taylor’s music so magnetic. It just was.
\n\nNow, though, I know what that “X-Factor” was.2
\n\nHere are two examples:3
\n\n\n\n\n\nI was attracted to the anecdotes.
\n\nThose moments full of feeling and depth. I was a part of Taylor’s life. I was a part of her freshman year. I knew about her crush on that guy named Drew. I knew about the time when she thought she fell in love. She could match the massive appeal of Katy Perry while writing about one single guy.
\n\nWhich is why “Shake It Off” worries me.
\n\n\n\nWhat happened? I suppose this change was a long time coming. Taylor has sold more than 34 million albums,4 has won 226 awards, and lives in a New York City penthouse. So should I be surprised that “Shake It Off” lacks the small-town stories of yesteryear?
\n\nNo. I shouldn’t be. It was clear from Red that change was afoot.
\n\n\n\nWe Are Never Ever Getting Back Together (WANEGBT) was still personal, but all signs turned toward pop. The lyrics lacked passion. The days of faded blue jeans, old Chevy trucks, and princesses were gone. The song was no longer driven by songwriting. Instead, WANEGBT relied on the incessant beat of an electronic drum. Taylor even seemed to admit this herself by releasing a revised version of the song to country music radio stations.
\n\nHowever, even though Red signaled the impending change, it also reaffirmed Taylor’s country roots.
\n\n\n\nFrom all accounts, 1989 will be an all-pop album.5 But can I blame Taylor? No. Her songwriting has always been based on her personal experiences. Living in New York City, flying in a private jet, and hanging out with other pop stars is her new life. But for Taylor, that creates a paradox. She got her start writing about issues common to all teenagers. She, more than anyone else, could capture the emotion, intimacy, and heartbreak of growing up.
\n\nBut now, Taylor doesn’t live that life, which explains why “Shake It Off” lacks the “X-Factor” of her previous endeavors.
\n\nDon’t get me wrong, I think “Shake It Off” is a great song.6 But at its heart, it’s pop music.
\n\nIn two years, I wonder if I will still be listening to “Shake It Off” and the rest of 1989. My guess is probably not. I will, however, still be listening to Taylor Swift, Fearless, Speak Now, and half of Red. Thinking about it that way, there really is nothing to lament about Taylor’s move to pop music. She has given me, and millions of others, hours of music that I love, so who am I to complain?
\n\n1989 era meaning late 2014 onwards. ↩
\nNote, though, that what I’m going to say is far from novel. It’s been said many times before. ↩
\nCan we all also agree that her old music videos were so much better? ↩
\n26 million from this 2011 press release plus 8 million from Red. ↩
\nIt won’t even be officially released to country music stations. ↩
\nI’ve already played it 44 times according to my music app. ↩
\n