Frenetic Array

A canvas for logic and imagination.

House of Cards (Kattis Problem)

It’s not so often I come across a problem that works out so beautifully yet requires so many different aspects in competitive programming. One particular problem, courtesy of Kattis, really did impress me.

The problem was House of Cards, and the problem was this (skip for TL;DR at bottom):

Brian and Susan are old friends, and they always dare each other to do reckless things. Recently Brian had the audacity to take the bottom right exit out of their annual maze race, instead of the usual top left one. In order to trump this, Susan needs to think big. She will build a house of cards so big that, should it topple over, the entire country would be buried by cards. It’s going to be huge! The house will have a triangular shape. The illustration to the right shows a house of height $6$ and Figure 1 shows a schematic figure of a house of height $5$.

Figure 1

For aesthetic reasons, the cards used to build the tower should feature each of the four suits (clubs, diamonds, hearts, spades) equally often. Depending on the height of the tower, this may or may not be possible. Given a lower bound $h_0$ on the height of the tower, what is the smallest possible height $h \geq h_0$ such that it is possible to build the tower?

TL;DR: Using Figure 1 as a reference, you are given a lower bound on a height for a tower of cards. However, there must be an equal distribution of all four suites; clubs, diamonds, hearts, and spades.

This implies that the number of cards have to be divisible by $4$. Seeing as the input was huge $1 \leq h_0 \leq 10^{1000}$, there was no brute forcing this. So, first thought: turn this into a closed-form series, and solve the series.

Getting the values for the first five heights, I got the following set:

$${2, 7, 15, 25, 40, \ldots}$$

I was able to turn this set into a series quite easily:

$$\sum_{n = 1} ^{h_0} \left(3n - 1\right)$$

This turned into the following equation:

$$\frac{1}{2} h_0(3h_0 + 1)$$

So, all I had to do was plug $h_0$ in the equation, and increment while the number was not divisible by $4$. Then, I realized how large the input really was. The input size ($1 \cdot 10^{1000}$) was orders of magnitudes larger than typical, large data types would allow ($1.84 \cdot 10^{19}$).

I realized this couldn’t be tested against a intensive data set, because there is only one number to calculate. I thought, since the series always subtracts one, the minimum times I must increment should roughly be four. Keeping this in mind, I decided to use Python. Python can work with arbitrarily large numbers, making it ideal in this situation.

I sat down, hoped for the best, and wrote the following code.

def getNumberOfCards(x):  
    return (3*pow(x, 2) + x) // 2


height = int(input())  
while (getNumberOfCards(height) % 4 != 0):  
    height += 1

print(height)  

With a run time of 0.02 seconds, it worked.

Avoid Mac App Store via The Terminal

I avoid the Mac App store as much as possible. It’s buggy. It’s slow. Apps are getting pulled from it left and right. But, I could not avoid it permanently because of software updates. As macOS Sierra 10.5 rolled out, I clicked on the App Store, followed by updates, and

I could not get the updates page opened. Killed App Store. Restart computer. Everything.

After doing some research, I discovered there is a way to update software from the terminal: softwareupdate. So, after running one command (sudo softwareupdate -iv), I am writing this for the latest version of macOS Sierra.

Quick Compile in Vim

One of the reasons I kept CodeRunner handy was its ability to quickly compile code. With a single click of a button, I could run any of my frequently used languages. It didn’t matter if it was an interpreted or compiled language, so I could virtually run anything, like:

  • C/C++
  • Python
  • Lua
  • LaTeX
  • Perl

However, in the last few months I started using Vim. Heavily. So much so I was trying to use Vim command in the CodeRunner buffers. So I decided I wanted to have the functionality, and in vim-esque fashion, I mapped to my leader key: <leader>r. The mnemonic <leader>run helped me remember the command on the first few tries.

To get the functionality, just add the following to your .vimrc.

function! MakeIfAvailable()  
    if filereadable("./makefile")
        make
    elseif (&filetype == "cpp")
        execute("!clang++ -std=c++14" + bufname("%"))
        execute("!./a.out")
    elseif (&filetype == "c")
        execute("!clang -std=c11" + bufname("%"))
        execute("!./a.out")
    elseif (&filetype == "tex")
        execute("!xelatex" + bufname("%"))
        execute("!open" + expand(%:r) + ".pdf")
    endif
endfunction

augroup spaces  
    autocmd!
    autocmd FileType c nnoremap <leader>r :call MakeIfAvailable()<cr>
    autocmd FileType cpp nnoremap <leader>r :call MakeIfAvailable()<cr>
    autocmd FileType tex nnoremap <leader>r :call MakeIfAvailable()<cr>
    autocmd FileType python nnoremap <leader>r :exec '!python' shellescape(@%, 1)<cr>
    autocmd FileType perl nnoremap <leader>r :exec '!perl' shellescape(@%, 1)<cr>
    autocmd FileType sh nnoremap <leader>r :exec '!bash' shellescape(@%, 1)<cr>
    autocmd FileType swift nnoremap <leader>r :exec '!swift' shellescape(@%, 1)<cr>
    nnoremap <leader>R :!<Up><CR>
augroup END  

r/all Trends

When considering going viral, one of the critical things to consider is timing. For reddit, there are many tools and studies for this; but they’re all subreddit specific. This is for good cause, different subreddits are more active at different times of the day. But what about “r/all”?

So, skipping the implementation details (you can read about them below), I wrote a script that parsed 165,607 posts to get a general idea of what “r/all” does in a given day. Note the times are for Pacific Time (PT) zone.

Let’s start with number of posts created at any given time.

Number of Posts vs. Time of Day

So a pretty good trend. What about the mean upvote count?

Average Number of Upvotes vs. Time

Okay, not so pretty. Let’s filter out the outliers by replacing anything above 300 upvotes with a -1.

Average Number of Upvotes (Filtered) vs. Time

A decent trend until around 12:00 to 14:00, where everything goes sporadic. Let’s take a look at the median upvotes.

Median Number of Upvotes vs. Time

Again, let’s filter the outliers (anything above 1000).

Median Number of Upvotes (Filtered) vs. Time

So we can safely assume a lot of reddit posts are upvote counts are zero (actually, the mode — the most commonly occurring upvote count — is by far zero).

But what about the max?

Max Number of Upvotes vs. Time

So far, these are pretty useless. But, there is one good litmus test for popularity: reaching over 1,000 upvotes. Let's take a look at that.

Over 1000 Upvotes vs. Time

There looks like there could be something here. Let's switch to line graph and only sum to the hour.

Over 1000 Upvotes vs. Time

Now that's a trend. Adding a cubic spline, we get a beautiful line graph:

Over 1000 Upvotes vs. Time (Spline)

It seems quite apparent that the best time to post is around 14:00 to 15:00. Neat.

Implementation

I used the praw python package to download and parse all of the data. Unfortunately Reddit was having issues handling over 100,000 consecutive request (around 100/second), so my connection would be severed constantly. I had to manually restart where I left off.

The code is as follows

import praw  
import datetime  
from functools import reduce  
import statistics  
import time


def get_date(submission):  
    time = submission.created
    return datetime.datetime.fromtimestamp(time)


def date_to_unix_second(t):  
    return (t-datetime.datetime(1970,1,1)).total_seconds()


# Thanks https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list
def mode(numbers, out_mode):  
    counts = {k:numbers.count(k) for k in set(numbers)}
    modes = sorted(dict(filter(lambda x: x[1] == max(counts.values()), counts.items())).keys())
    if out_mode=='smallest':
        return modes[0]
    elif out_mode=='largest':
        return modes[-1]
    else:
        return modes



reddit = praw.Reddit(user_agent='bot1', client_id='', client_secret='', redirect_uri='' 'authorize_callback')  
subreddit = reddit.subreddit('all')

day1 = datetime.datetime(2017, 5, 26, hour=0, minute=0, second=0)  
day2 = datetime.datetime(2017, 5, 27, hour=0, minute=0, second=0)

current_day = day2

while current_day != day1:  
    current_minute_upvotes = []
    current_minute = 0
    total = 0

    for submission in subreddit.submissions(date_to_unix_second(day1), date_to_unix_second(current_day) - 60*60):
        submission_date = get_date(submission)

        if submission_date.minute != current_minute:
            current_minute = submission_date.minute

            print("{0}:{1},{2},{3},{4},{5},{6},{7}".format(
                submission_date.hour if submission_date.hour > 9 else "0" + str(submission_date.hour),
                submission_date.minute if submission_date.minute > 9 else "0" + str(submission_date.minute),
                len(current_minute_upvotes),
                0 if not len(current_minute_upvotes) else reduce(lambda x, y: x + y, current_minute_upvotes, 0) / len(current_minute_upvotes),
                0 if not len(current_minute_upvotes) else current_minute_upvotes[int(len(current_minute_upvotes)/2)],
                0 if not len(current_minute_upvotes) else mode(current_minute_upvotes,"largest"),
                0 if not len(current_minute_upvotes) else max(current_minute_upvotes),
                0 if not len(current_minute_upvotes) else min(current_minute_upvotes),
            ))
            total += len(current_minute_upvotes)
            current_minute_upvotes = []

        current_minute_upvotes += [submission.score]
        current_day = submission_date

    print(total)

For the spline, the code is as follows:

    delta_t = max(x) - min(x)
    N_points = 300
    xnew = [min(x) + i*delta_t/N_points for i in range(N_points)]

    x_ts = [x_.timestamp() for x_ in x]
    xnew_ts = [x_.timestamp() for x_ in xnew]
    ynew = spline(x_ts, y, xnew_ts)

The graphs were produced by matplotlib, the code is below.

#!/usr/local/bin/python3
#
# plot.py
# Desktop
#
# Created by Illya Starikov on 05/16/17.
# Copyright 2017. Illya Starikov. All rights reserved.
#

import matplotlib.pyplot as plt  
import matplotlib.dates as mdates  
from datetime import datetime

import csv


def import_from_csv(filename):  
    with open(filename, "rt") as csvfile:
        csv_parsor = csv.reader(csvfile, delimiter=',')

        date_times = []
        values = []

        for date_and_time_string, value in csv_parsor:
            datetime_object = datetime.strptime(date_and_time_string, '%H:%M')

            date_times += [datetime_object.replace(day=26, month=5, year=2017)]
            values += [value]

        return (date_times, values)


def main():  
    # Get the data and the subplots
    x, y = import_from_csv('data.csv')
    fig, ax = plt.subplots()

    # Make the figure wide and draw it
    fig.set_size_inches((12, 5))
    ax.plot_date(x, y, color='b', marker='+', markersize=6)

    # Custom labels, since I have milestones
    labels = ["1:00", "2:00", "3:00", "4:00", "5:00", "6:00", "7:00", "8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00", "18:00", "19:00", "20:00", "21:00", "22:00", "23:00"]

    locs = [
        mdates.date2num(datetime(2017, 5, 26, 1, 00)),
        mdates.date2num(datetime(2017, 5, 26, 2, 00)),
        mdates.date2num(datetime(2017, 5, 26, 3, 00)),
        mdates.date2num(datetime(2017, 5, 26, 4, 00)),
        mdates.date2num(datetime(2017, 5, 26, 5, 00)),
        mdates.date2num(datetime(2017, 5, 26, 6, 00)),
        mdates.date2num(datetime(2017, 5, 26, 7, 00)),
        mdates.date2num(datetime(2017, 5, 26, 8, 00)),
        mdates.date2num(datetime(2017, 5, 26, 9, 00)),
        mdates.date2num(datetime(2017, 5, 26, 10, 00)),
        mdates.date2num(datetime(2017, 5, 26, 11, 00)),
        mdates.date2num(datetime(2017, 5, 26, 12, 00)),
        mdates.date2num(datetime(2017, 5, 26, 13, 00)),
        mdates.date2num(datetime(2017, 5, 26, 14, 00)),
        mdates.date2num(datetime(2017, 5, 26, 15, 00)),
        mdates.date2num(datetime(2017, 5, 26, 16, 00)),
        mdates.date2num(datetime(2017, 5, 26, 17, 00)),
        mdates.date2num(datetime(2017, 5, 26, 18, 00)),
        mdates.date2num(datetime(2017, 5, 26, 19, 00)),
        mdates.date2num(datetime(2017, 5, 26, 20, 00)),
        mdates.date2num(datetime(2017, 5, 26, 21, 00)),
        mdates.date2num(datetime(2017, 5, 26, 22, 00)),
        mdates.date2num(datetime(2017, 5, 26, 23, 00))
    ]

    # Assign the data, change fonts to Bebas Neue, make titles and labels
    ax.set_xticklabels(labels, fontname="Bebas Neue")
    ax.set_yticklabels([int(i) for i in ax.get_yticks()], fontname="Bebas Neue")

    ax.set_title("Title", fontsize=22, fontname="Bebas Neue")
    ax.set_ylabel("Upvotes", fontsize=14, fontname="Bebas Neue")
    ax.set_xlabel("Time of Day", fontsize=14, fontname="Bebas Neue")
    ax.set_xticks(locs)

    # Make nicer x axis
    plt.gcf().autofmt_xdate()

    # Export that shit
    ax.grid()
    plt.draw()
    plt.savefig("figure.png", format='png', dpi=280)


if __name__ == "__main__":  
    main()

Trailing Spacing

One of the most frustrating things to deal with is trailing whitespace. Trailing whitespace is such a pain because:

  • It can screw up string literals
  • It can break expectations in a text editor (i.e. jumping to a new line or the end of the line)
  • It can actually break programming languages
  • It is just unflattering

However, in Vim, it takes one autocmd to alleviate this.

augroup spaces  
    autocmd!
    autocmd BufWritePre * %s/\s\+$//e
augroup END  

On every buffer save substitute spaces at the end of the line with nothing. Easy!