### Log-Rank Test

### Kaplan-Meier Survival Curves

## Proportional Hazard Model

The proportional hazard model is similar to a GLM. It assumes that the hazard rate $h_{i}(t)$ of individual $i$ is related to the linear predictor $\eta_{i}=X_{i}^{T}\beta$ through a log link:
$$
h_{i}(t)=h_{0}(t)\exp(X_{i}^{T}\beta), \quad\text{i.e.}\quad \log\frac{h_{i}(t)}{h_{0}(t)}=\eta_{i},
$$
where $h_{0}$ is a baseline hazard rate, and $\beta$ are coefficients shared by all individuals.

- Hence the proportional hazard model assumes that every individual's hazard curve has the same shape $h_{0}$, scaled by a different factor $\exp(X_{i}^{T}\beta)$. In particular, the hazard ratio $h_{i}(t)/h_{k}(t)=\exp(\eta_{i}-\eta_{k})$ does not depend on $t$.
- Equivalently, since $S_{i}(t)=\exp\left(-\int_{0}^{t}h_{i}(u)\,du\right)$, the survival function is $S_{i}(t)=S_{0}(t)^{\exp(\eta_{i})}$.

### Coefficient Estimation

The coefficients $\beta$ are estimated by maximum likelihood, but for that we need a notion of likelihood under the proportional hazard model. To start, let $t_{i}$ be the observed time (death or censoring) of individual $i$, let $T_{(1)}<\cdots<T_{(J)}$ be the ordered observed death times, and let the **risk set** $\mathcal{R}_{j}$ be the individuals still alive and under observation right before $T_{(j)}$:
$$
\mathcal{R}_{j}:= \{ i\,|\, t_{i} \ge T_{(j)}\}.
$$
The key result for computing the likelihood is that, given that exactly one individual in $\mathcal{R}_{j}$ dies at $T_{(j)}$, the probability that this individual $i_{j}$ is $i$ is proportional to its hazard rate (the baseline $h_{0}(T_{(j)})$ cancels):
$$
\mathbb{P}[i_{j}=i\,|\,\mathcal{R}_{j}]=\frac{h_{i}(T_{(j)})}{\sum_{k \in \mathcal{R}_{j}}h_{k}(T_{(j)})}=\frac{\exp(\eta_{i})}{\sum_{k \in \mathcal{R}_{j}}\exp(\eta_{k})}.
$$
Multiplying these probabilities over all observed deaths gives the **partial likelihood**
$$
L(\beta):= \prod_{j=1}^{J}\mathbb{P}[i_{j} \,|\, \mathcal{R}_{j}]=\prod_{j=1}^{J} \frac{\exp(\eta_{i_{j}})}{\sum_{k \in \mathcal{R}_{j}}\exp(\eta_{k})},
$$
and $\beta$ is estimated by its maximizer, which is asymptotically normal:
$$
\hat{\beta}=\underset{\beta}{\arg\max}\,L(\beta), \qquad \hat{\beta} \overset{D}{\approx} N\left( \beta,\ \left[-\frac{ \partial ^{2}\log L(\beta)}{ \partial \beta\, \partial \beta^{T} }\right]^{-1}_{\beta=\hat{\beta}} \right),
$$
where the covariance is the inverse of the observed information, computed from the partial likelihood $L(\beta)$ in place of a full likelihood.
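As a minimal sketch of the estimation procedure, the NumPy code below evaluates $\log L(\beta)$, its gradient and its Hessian directly from the risk-set formula, maximizes it with Newton-Raphson, and takes the covariance as the inverse of the negative Hessian at $\hat{\beta}$. It assumes no tied death times (matching the strict ordering $T_{(1)}<\cdots<T_{(J)}$ above); the function names `cox_log_partial_likelihood` and `fit_cox`, the choice of Newton-Raphson as optimizer, and the simulated data (constant baseline hazard $h_{0}(t)=1$, exponential censoring) are all illustrative assumptions, not a reference implementation.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, t, died):
    """Log partial likelihood, gradient and Hessian (assumes no tied death times)."""
    eta = X @ beta                        # linear predictors eta_i = X_i^T beta
    w = np.exp(eta)                       # relative hazards exp(eta_i)
    loglik = 0.0
    grad = np.zeros_like(beta)
    hess = np.zeros((beta.size, beta.size))
    for j in np.flatnonzero(died):        # one factor per observed death i_j
        risk = t >= t[j]                  # risk set R_j = {i : t_i >= T_(j)}
        w_r, X_r = w[risk], X[risk]
        s0 = w_r.sum()                    # sum_{k in R_j} exp(eta_k)
        xbar = (w_r @ X_r) / s0           # hazard-weighted mean of X over R_j
        loglik += eta[j] - np.log(s0)
        grad += X[j] - xbar
        # Hessian term: minus the hazard-weighted covariance of X over R_j
        xx = (X_r.T * w_r) @ X_r / s0
        hess -= xx - np.outer(xbar, xbar)
    return loglik, grad, hess

def fit_cox(X, t, died, n_iter=25, tol=1e-10):
    """Newton-Raphson for the partial-likelihood MLE; returns (beta_hat, covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad, hess = cox_log_partial_likelihood(beta, X, t, died)
        step = np.linalg.solve(hess, grad)
        beta = beta - step                # Newton update: beta - H^{-1} g
        if np.max(np.abs(step)) < tol:
            break
    _, _, hess = cox_log_partial_likelihood(beta, X, t, died)
    cov = np.linalg.inv(-hess)            # inverse observed (partial) information
    return beta, cov

# Hypothetical simulated data: h0(t) = 1, so T_i ~ Exponential(rate = exp(eta_i))
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
beta_true = np.array([0.5, -1.0])
death = rng.exponential(1.0 / np.exp(X @ beta_true))
censor = rng.exponential(2.0, size=n)     # independent right-censoring times
t = np.minimum(death, censor)             # observed times t_i
died = death <= censor                    # True if the death was observed

beta_hat, cov = fit_cox(X, t, died)
se = np.sqrt(np.diag(cov))
print("beta_hat:", beta_hat, "standard errors:", se)
```

Because $\log L(\beta)$ is concave in $\beta$, Newton-Raphson from $\beta=0$ typically converges in a handful of iterations. The per-death loop keeps the code close to the formula; a production implementation would instead sort by time and accumulate the risk-set sums cumulatively, and would need a tie-handling rule (such as Breslow's or Efron's approximation) that this sketch omits.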