Scrape all email addresses from a webpage

Updated: 10 January 2025

This bash script outputs a sorted list of the unique email addresses found on a webpage:

#!/bin/bash

curl --silent https://lots-of-emails.example.com | \
grep -Po --file=email-regex.txt | sort | uniq

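The --file option makes grep read the pattern to search for from a file (here email-regex.txt). -P interprets that pattern as a Perl-compatible regular expression, and -o prints only the matching text rather than whole lines.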

See here for an example email-matching regex.
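The linked regex isn't reproduced here, so the sketch below assumes a deliberately loose pattern of its own for email-regex.txt and runs the same grep pipeline over sample text instead of a curl download, so it can be tried offline. The pattern is an assumption: it will miss some addresses RFC 5322 allows and accept some it doesn't. The function name extract_emails is hypothetical.

```shell
#!/bin/bash
# Hypothetical contents for email-regex.txt: a loose, single-line pattern.
# The quoted heredoc delimiter ('EOF') stops the shell expanding anything.
cat > email-regex.txt <<'EOF'
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}
EOF

# Same pipeline as the script above, but reading text on stdin:
# -P  Perl-compatible regex, -o  print only the matched text.
extract_emails() {
  grep -Po --file=email-regex.txt | sort | uniq
}

# Sample input standing in for the curl output.
printf '%s\n' \
  '<a href="mailto:bob@example.com">Bob</a>' \
  'Contact: alice@example.org, bob@example.com' \
  'No address here' | extract_emails
```

This prints alice@example.org and then bob@example.com: the duplicate address is collapsed by uniq, and "mailto:" is not part of the match because ':' is not in the local-part character class.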

The perfect Git Graph GUI

Updated: 16 December 2024

I’m kicking off an open-source project to create a Git Graph GUI tool with exactly the feature set and functionality I need. If you would like to help, please email me and/or create a pull request.

My project is Sourcegit Stripdown, a fork of https://github.com/sourcegit-scm/sourcegit. The objective is to remove from upstream all functionality that can be performed from the command line, leaving only:

  1. A git commit graph.
  2. A diff view.

The project uses the .NET Avalonia UI framework and the C# programming language.

Getting started with LaTeX on Ubuntu

Updated: 02 September 2024

The number of options for getting started with LaTeX can be a little overwhelming! This is what I did:

Install TeX Live (the texlive package, approximately 250 MB):

sudo apt install texlive

Then, using the example from this answer on Stack Overflow, create the file test.tex:

\documentclass[a4paper,12pt]{article}
\begin{document}

The foundations of the rigorous study of \emph{analysis}
were laid in the nineteenth century, notably by the
mathematicians Cauchy and Weierstrass. Central to the
study of this subject are the formal definitions of
\emph{limits} and \emph{continuity}.

Let $D$ be a subset of $\mathbf{R}$ and let
$f \colon D \to \mathbf{R}$ be a real-valued function on
$D$. The function $f$ is said to be \emph{continuous} on
$D$ if, for all $\epsilon > 0$ and for all $x \in D$,
there exists some $\delta > 0$ (which may depend on $x$)
such that if $y \in D$ satisfies
\[ |y - x| < \delta \]
then
\[ |f(y) - f(x)| < \epsilon. \]

One may readily verify that if $f$ and $g$ are continuous
functions on $D$ then the functions $f+g$, $f-g$ and
$f.g$ are continuous. If in addition $g$ is everywhere
non-zero then $f/g$ is continuous.

\end{document}

Create a .dvi file

latex test.tex

xdvi is an open-source computer program for displaying TeX-produced .dvi files under the X Window System on Linux.

Optionally, view the output file with xdvi

xdvi test.dvi &

Produce a PDF directly (pdflatex skips the .dvi step)

pdflatex test.tex
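For repeated builds, a small wrapper can save typing. This is a sketch rather than part of the steps above: build_pdf is a hypothetical helper, and the two flags are standard TeX Live options that stop pdflatex pausing at an interactive prompt when the document contains an error.

```shell
#!/bin/bash
# Sketch: compile a .tex file to PDF non-interactively, then tidy up.
# build_pdf is a hypothetical helper name.

build_pdf() {
  local tex="$1"
  local base
  # pdflatex writes its output to the current directory, so strip
  # any directory part as well as the .tex suffix.
  base="$(basename "$tex" .tex)"

  # -interaction=nonstopmode: never stop to ask for keyboard input.
  # -halt-on-error: exit non-zero at the first error.
  if ! pdflatex -interaction=nonstopmode -halt-on-error "$tex" > /dev/null; then
    echo "pdflatex failed; see $base.log" >&2
    return 1
  fi

  # On success remove the .aux and .log files; on failure they are kept.
  rm -f "$base.aux" "$base.log"
}

# Usage when run as a script: ./build-pdf.sh test.tex
if [ "$#" -gt 0 ]; then
  build_pdf "$1"
fi
```

Keeping the log on failure is deliberate: it is the only place the full LaTeX error message ends up once terminal output is discarded.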

Nginx reverse proxy to multiple Docker apps

Updated: 11 August 2024

I often create and edit Customer demo apps with Docker, and I’ve found it very useful to serve all of them behind a single Nginx reverse proxy. The notes below show how I do this using the Digital Ocean Nginx config tool, and how I deploy the resulting config with git.

Getting started

  1. Create a Digital Ocean LEMP Droplet.
  2. Install Docker on the LEMP Droplet.
  3. Install and run your Docker applications, binding each to a different host port.
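Step 3 might look like the sketch below. The container names, images and ports are hypothetical placeholders, and it assumes each app listens on port 80 inside its container. Publishing on 127.0.0.1 rather than on all interfaces means each app is reachable only through Nginx on the same host.

```shell
#!/bin/bash
# Sketch: run each demo app in its own container on its own host port.
# Names, images and ports are placeholders; each app is assumed to listen
# on port 80 inside its container. Set DRY_RUN=1 to print the docker
# command instead of running it.

start_demo() {
  local name="$1" host_port="$2" image="$3"
  # -p 127.0.0.1:HOST:CONTAINER publishes the port on the loopback
  # interface only, so the app is not exposed directly to the internet.
  ${DRY_RUN:+echo} docker run -d --restart unless-stopped \
    --name "$name" -p "127.0.0.1:${host_port}:80" "$image"
}

# Hypothetical usage, one host port per app:
#   start_demo a-demo 8081 example/a-demo:latest
#   start_demo b-demo 8082 example/b-demo:latest
```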

Domains

Each demo needs a URL that can be given to the Customer. I tend to use subdomains of my business domain, e.g. http://a-demo.christaylordeveloper.co.uk. For each subdomain I configure an A record pointing at the IP address of the LEMP server.

NGINXConfig

  1. Create an Nginx config for your site(s) with the Digital Ocean tool at https://www.digitalocean.com/community/tools/nginx
  2. On the Reverse proxy tab, enter in the proxy_pass field the address of each Docker app, using the host port you bound it to earlier (e.g. http://127.0.0.1:8081).
  3. Download your Nginx config.

Git

  1. Make a local git repo of the downloaded config and push to a remote.
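  2. Add a .gitignore file to the repo that ignores everything else in the LEMP Droplet’s /etc/nginx directory.
  3. Clone the git repo into the LEMP server’s /etc/nginx/ directory. Because that directory is not empty, git clone will refuse to write into it; instead, initialise a repo in /etc/nginx, add your remote, then fetch and check out the branch.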
  4. Restart the Nginx service.
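Step 2 can use a whitelist-style .gitignore: ignore every top-level entry, then re-include only what the downloaded config contains. The re-included paths below are an assumption about the download’s layout; adjust them to match the files you actually have.

```shell
#!/bin/bash
# Sketch: write a whitelist .gitignore in the local config repo.
# '/*' ignores every top-level entry of the directory the repo ends up in
# (/etc/nginx on the server); each '!' line re-includes one path.
# The re-included names are assumptions about the downloaded config.
cat > .gitignore <<'EOF'
/*
!/.gitignore
!/nginx.conf
!/nginxconf.txt
!/nginxconfig.io/
!/sites-available/
!/sites-enabled/
EOF
```

Because the patterns start with '/', they only apply to the top level of /etc/nginx; files inside a re-included directory such as sites-available/ are tracked normally.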

Edits

At some point you will need to add, edit or remove the Docker applications:

  1. Open the URL saved in the file nginxconf.txt; this restores your configuration in the Digital Ocean tool.
  2. Make the required edits and download again.
  3. Commit the changes into your local git repo and push to remote.
  4. Pull changes into your LEMP server.
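On the server, step 3 of the Git list and step 4 here can share one script. Because git clone refuses to write into a non-empty directory such as /etc/nginx, this sketch initialises a repo in place on the first run, then fetches the branch and force-checks it out. Files that are not in the repo, such as mime.types, are left alone. The function name update_nginx_config and the main branch default are assumptions.

```shell
#!/bin/bash
# Sketch: deploy (first run) or update (later runs) the config repo in
# /etc/nginx. update_nginx_config is a hypothetical helper; run as root.

update_nginx_config() {
  local repo_url="$1" branch="${2:-main}" dir="${3:-/etc/nginx}"

  # First run: turn the existing directory into a repo with a remote.
  if [ ! -d "$dir/.git" ]; then
    git -C "$dir" init -q || return 1
    git -C "$dir" remote add origin "$repo_url" || return 1
  fi

  git -C "$dir" fetch -q origin "$branch" || return 1
  # -B (re)points the local branch at the fetched commit. -f overwrites
  # files in the way (e.g. the stock nginx.conf on the first run) and
  # discards any edits made directly on the server.
  git -C "$dir" checkout -q -f -B "$branch" "origin/$branch"
}
```

After each run, check the syntax with nginx -t before restarting the service with systemctl restart nginx. On later runs the existing remote is reused, and local edits in /etc/nginx are overwritten, so make changes in the local repo instead.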

Identifying and Non-Identifying relationships

Updated: 06 July 2024

Non-identifying
Consider books and owners. A book can exist without an owner, or change its owner. The relationship between a book and its owner is non-identifying: the primary key of the parent (owner) is included in the child (book) as a foreign key, but not as part of the child’s primary key.
Identifying
Consider books and chapters. A chapter only exists when a book exists. The relationship between a book and its chapters is an identifying relationship. The primary key of the parent is included in the primary key of the child entity.
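The two cases can be sketched as a schema. The table and column names, and the file name schema.sql, are hypothetical, and the SQL targets SQLite; most SQL databases express the same keys with the same PRIMARY KEY and REFERENCES clauses.

```shell
#!/bin/bash
# Sketch: write a hypothetical schema.sql illustrating both relationships.
# Load it with: sqlite3 library.db < schema.sql
cat > schema.sql <<'EOF'
PRAGMA foreign_keys = ON;

CREATE TABLE owner (
  owner_id INTEGER PRIMARY KEY,
  name     TEXT NOT NULL
);

-- Non-identifying: owner_id is an ordinary, nullable foreign key.
-- It is not part of book's primary key, so a book can have no owner
-- or change owner without changing its identity.
CREATE TABLE book (
  book_id  INTEGER PRIMARY KEY,
  title    TEXT NOT NULL,
  owner_id INTEGER REFERENCES owner (owner_id)
);

-- Identifying: book_id is part of chapter's composite primary key,
-- so a chapter cannot be identified without its book.
CREATE TABLE chapter (
  book_id        INTEGER NOT NULL REFERENCES book (book_id),
  chapter_number INTEGER NOT NULL,
  title          TEXT,
  PRIMARY KEY (book_id, chapter_number)
);
EOF
```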

Store dotfiles in a bare git repo

Updated: 08 April 2024

See https://www.atlassian.com/git/tutorials/dotfiles
See https://news.ycombinator.com/item?id=11070797

Create a bare repo in directory dfs

git init --bare $HOME/dfs/.gitdfs

Create a handy alias in your .bashrc file

alias gitdfs='git --git-dir=$HOME/dfs/.gitdfs --work-tree=$HOME'

Set a flag for this repo only, which hides untracked files

gitdfs config --local status.showUntrackedFiles no

Now any file in $HOME can be versioned with the usual git commands, run from inside $HOME:

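gitdfs status
gitdfs add .vimrc
gitdfs commit -m "Track my .vimrc file"
# The bare repo has no remote yet: add one before the first push
# (<your-remote-url> is a placeholder for your own repo URL)
gitdfs remote add origin <your-remote-url>
gitdfs push -u origin HEAD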

Show the files tracked so far

gitdfs ls-tree -r --name-only HEAD
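Bash expands aliases only in interactive shells, so the gitdfs alias won’t work inside a script. A shell function does the same job in both places. This is an alternative to the alias, not something from the linked tutorials, and DFS_DIR is a hypothetical variable defaulting to the bare repo created above.

```shell
#!/bin/bash
# Sketch: a function equivalent of the gitdfs alias.
# DFS_DIR is hypothetical; it defaults to the bare repo created above.
DFS_DIR="${DFS_DIR:-$HOME/dfs/.gitdfs}"

gitdfs() {
  # "$@" passes every argument through unchanged, e.g. gitdfs add .vimrc
  git --git-dir="$DFS_DIR" --work-tree="$HOME" "$@"
}
```

Put it in .bashrc in place of the alias, or source it from a script before calling gitdfs. As with the alias, paths are resolved against the current directory, so run it from inside $HOME.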

Application deployments

Updated: 14 June 2024

Acme app

Deployments of Acme app:

Deployment         URL                       Environment  Sandbox  Purpose
Live               https://greatapp.com      Production   No       Host a public site.
Stage deploy       https://uat.greatapp.com  Staging      Yes      Review and user testing.
Laptop deployment  e.g. http://localhost     Development  Yes      Build & experiment.

Glossary

  • App: synonym for project.
  • Deployment: an instance of the project running on a host.
  • Environment: configuration settings which enable the deployment to fulfil its purpose.
  • Release: a process; specifically, the process that makes a Version available for Deployment.
  • Sandbox: a deployment where it’s safe to use current and unreleased features without risk of altering Production environment data or incurring costs.
  • Version: a version-numbered piece of software.

Agile Jira

Updated: 07 April 2024

How we use Jira:

  • Epic – Something the client has asked for or we are recommending. It is a large Story which will be broken down. An Epic groups together all other issue types. Represents a significant deliverable e.g. a new feature or experience. An Epic has an end point. Epics can appear on Roadmaps.
  • Story – Stories are the smallest units of work that an Epic can be broken down into. A Story can often be expressed as "As a [persona], I [want to], [so that]." A Story can exist in isolation without a parent Epic.
  • Task – An engineering or administrative task, added by us.
  • Bug – A serious problem which impairs or prevents the functioning of a product.
  • Subtask – A smaller piece of work required to resolve a Bug, Story or Task.
  • Version / Release – An objective point in time. Will sometimes correspond to the release (but not necessarily the deployment) of a version numbered piece of software. The name should reflect what will have been accomplished e.g.
    • Vendor API capabilities understood.
    • Customer on-boarding behaviour tests added.
    • Duplicate pdf invoice code refactored.