Frenetic Array

A canvas for logic and imagination.

r/all Trends

When considering going viral, one of the critical things to consider is timing. For reddit, there are many tools and studies for this; but they’re all subreddit specific. This is for good cause, different subreddits are more active at different times of the day. But what about “r/all”?

So, skipping the implementation details (you can read about them below), I wrote a script that parsed 165,607 posts to get a general idea of what “r/all” does in a given day. Note the times are for Pacific Time (PT) zone.

Let’s start with number of posts created at any given time.

Number of Posts vs. Time of Day

So a pretty good trend. What about the mean upvote count?

Average Number of Upvotes vs. Time

Okay, not so pretty. Let’s filter out the outliers by replacing anything above 300 upvotes with a -1.

Average Number of Upvotes (Filtered) vs. Time

A decent trend until around 12:00 to 14:00, where everything goes sporadic. Let’s take a look at the median upvotes.

Median Number of Upvotes vs. Time

Again, let’s filter the outliers (anything above 1000).

Median Number of Upvotes (Filtered) vs. Time

So we can safely assume a lot of reddit posts are upvote counts are zero (actually, the mode — the most commonly occurring upvote count — is by far zero).

But what about the max?

Max Number of Upvotes vs. Time

So far, these are pretty useless. But, there is one good litmus test for popularity: reaching over 1,000 upvotes. Let's take a look at that.

Over 1000 Upvotes vs. Time

There looks like there could be something here. Let's switch to line graph and only sum to the hour.

Over 1000 Upvotes vs. Time

Now that's a trend. Adding a cubic spline, we get a beautiful line graph:

Over 1000 Upvotes vs. Time (Spline)

It seems quite apparent that the best time to post is around 14:00 to 15:00. Neat.

Implementation

I used the praw python package to download and parse all of the data. Unfortunately Reddit was having issues handling over 100,000 consecutive request (around 100/second), so my connection would be severed constantly. I had to manually restart where I left off.

The code is as follows

import praw  
import datetime  
from functools import reduce  
import statistics  
import time


def get_date(submission):  
    time = submission.created
    return datetime.datetime.fromtimestamp(time)


def date_to_unix_second(t):  
    return (t-datetime.datetime(1970,1,1)).total_seconds()


# Thanks https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list
def mode(numbers, out_mode):  
    counts = {k:numbers.count(k) for k in set(numbers)}
    modes = sorted(dict(filter(lambda x: x[1] == max(counts.values()), counts.items())).keys())
    if out_mode=='smallest':
        return modes[0]
    elif out_mode=='largest':
        return modes[-1]
    else:
        return modes



reddit = praw.Reddit(user_agent='bot1', client_id='', client_secret='', redirect_uri='' 'authorize_callback')  
subreddit = reddit.subreddit('all')

day1 = datetime.datetime(2017, 5, 26, hour=0, minute=0, second=0)  
day2 = datetime.datetime(2017, 5, 27, hour=0, minute=0, second=0)

current_day = day2

while current_day != day1:  
    current_minute_upvotes = []
    current_minute = 0
    total = 0

    for submission in subreddit.submissions(date_to_unix_second(day1), date_to_unix_second(current_day) - 60*60):
        submission_date = get_date(submission)

        if submission_date.minute != current_minute:
            current_minute = submission_date.minute

            print("{0}:{1},{2},{3},{4},{5},{6},{7}".format(
                submission_date.hour if submission_date.hour > 9 else "0" + str(submission_date.hour),
                submission_date.minute if submission_date.minute > 9 else "0" + str(submission_date.minute),
                len(current_minute_upvotes),
                0 if not len(current_minute_upvotes) else reduce(lambda x, y: x + y, current_minute_upvotes, 0) / len(current_minute_upvotes),
                0 if not len(current_minute_upvotes) else current_minute_upvotes[int(len(current_minute_upvotes)/2)],
                0 if not len(current_minute_upvotes) else mode(current_minute_upvotes,"largest"),
                0 if not len(current_minute_upvotes) else max(current_minute_upvotes),
                0 if not len(current_minute_upvotes) else min(current_minute_upvotes),
            ))
            total += len(current_minute_upvotes)
            current_minute_upvotes = []

        current_minute_upvotes += [submission.score]
        current_day = submission_date

    print(total)

For the spline, the code is as follows:

    delta_t = max(x) - min(x)
    N_points = 300
    xnew = [min(x) + i*delta_t/N_points for i in range(N_points)]

    x_ts = [x_.timestamp() for x_ in x]
    xnew_ts = [x_.timestamp() for x_ in xnew]
    ynew = spline(x_ts, y, xnew_ts)

The graphs were produced by matplotlib, the code is below.

#!/usr/local/bin/python3
#
# plot.py
# Desktop
#
# Created by Illya Starikov on 05/16/17.
# Copyright 2017. Illya Starikov. All rights reserved.
#

import matplotlib.pyplot as plt  
import matplotlib.dates as mdates  
from datetime import datetime

import csv


def import_from_csv(filename):  
    with open(filename, "rt") as csvfile:
        csv_parsor = csv.reader(csvfile, delimiter=',')

        date_times = []
        values = []

        for date_and_time_string, value in csv_parsor:
            datetime_object = datetime.strptime(date_and_time_string, '%H:%M')

            date_times += [datetime_object.replace(day=26, month=5, year=2017)]
            values += [value]

        return (date_times, values)


def main():  
    # Get the data and the subplots
    x, y = import_from_csv('data.csv')
    fig, ax = plt.subplots()

    # Make the figure wide and draw it
    fig.set_size_inches((12, 5))
    ax.plot_date(x, y, color='b', marker='+', markersize=6)

    # Custom labels, since I have milestones
    labels = ["1:00", "2:00", "3:00", "4:00", "5:00", "6:00", "7:00", "8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00", "18:00", "19:00", "20:00", "21:00", "22:00", "23:00"]

    locs = [
        mdates.date2num(datetime(2017, 5, 26, 1, 00)),
        mdates.date2num(datetime(2017, 5, 26, 2, 00)),
        mdates.date2num(datetime(2017, 5, 26, 3, 00)),
        mdates.date2num(datetime(2017, 5, 26, 4, 00)),
        mdates.date2num(datetime(2017, 5, 26, 5, 00)),
        mdates.date2num(datetime(2017, 5, 26, 6, 00)),
        mdates.date2num(datetime(2017, 5, 26, 7, 00)),
        mdates.date2num(datetime(2017, 5, 26, 8, 00)),
        mdates.date2num(datetime(2017, 5, 26, 9, 00)),
        mdates.date2num(datetime(2017, 5, 26, 10, 00)),
        mdates.date2num(datetime(2017, 5, 26, 11, 00)),
        mdates.date2num(datetime(2017, 5, 26, 12, 00)),
        mdates.date2num(datetime(2017, 5, 26, 13, 00)),
        mdates.date2num(datetime(2017, 5, 26, 14, 00)),
        mdates.date2num(datetime(2017, 5, 26, 15, 00)),
        mdates.date2num(datetime(2017, 5, 26, 16, 00)),
        mdates.date2num(datetime(2017, 5, 26, 17, 00)),
        mdates.date2num(datetime(2017, 5, 26, 18, 00)),
        mdates.date2num(datetime(2017, 5, 26, 19, 00)),
        mdates.date2num(datetime(2017, 5, 26, 20, 00)),
        mdates.date2num(datetime(2017, 5, 26, 21, 00)),
        mdates.date2num(datetime(2017, 5, 26, 22, 00)),
        mdates.date2num(datetime(2017, 5, 26, 23, 00))
    ]

    # Assign the data, change fonts to Bebas Neue, make titles and labels
    ax.set_xticklabels(labels, fontname="Bebas Neue")
    ax.set_yticklabels([int(i) for i in ax.get_yticks()], fontname="Bebas Neue")

    ax.set_title("Title", fontsize=22, fontname="Bebas Neue")
    ax.set_ylabel("Upvotes", fontsize=14, fontname="Bebas Neue")
    ax.set_xlabel("Time of Day", fontsize=14, fontname="Bebas Neue")
    ax.set_xticks(locs)

    # Make nicer x axis
    plt.gcf().autofmt_xdate()

    # Export that shit
    ax.grid()
    plt.draw()
    plt.savefig("figure.png", format='png', dpi=280)


if __name__ == "__main__":  
    main()