## LME4 GLMMs are different when constructed as success | trials vs raw data?

Why are these GLMMs so different? Both are made with lme4, both use the same data, but one is framed in terms of successes and trials (m1bin) while one just uses the raw accuracy data (m1). Have I been completely mistaken thinking that lme4 figure...

## Numerically stable evaluation of log of integral of a function with extremely small values

If I have a random number Z that is defined as the sum of two other random numbers, X and Y, then the probability distribution of Z is the convolution of the probability distributions for X and Y. Convolution is basically an integral of the produc...
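One common way to approach this kind of problem is to keep everything in log space and apply the log-sum-exp trick to the quadrature sum. The sketch below assumes the integrand is available as log f(x) on a strictly increasing grid and uses the trapezoid rule; the function name `log_trapz` and the test integrand are made up for illustration and are not taken from the question.

```python
import math

def log_trapz(log_f, xs):
    """Log of the trapezoid-rule integral of f, given log f(x) on the grid xs.

    Assumes xs is strictly increasing and has at least two points.
    """
    n = len(xs)
    log_terms = []
    for i in range(n):
        # Trapezoid weight: half the width of the neighbouring intervals.
        left = xs[i] - xs[i - 1] if i > 0 else 0.0
        right = xs[i + 1] - xs[i] if i < n - 1 else 0.0
        weight = 0.5 * (left + right)
        log_terms.append(math.log(weight) + log_f[i])
    # Log-sum-exp: factor out the largest term so exp() never underflows to 0
    # for every term at once.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Hypothetical integrand: f(x) = exp(-1000) * exp(-x^2 / 2).
# exp(-1000) underflows to 0.0 in double precision, so a naive log(sum(f)) fails.
xs = [-10 + 0.01 * i for i in range(2001)]
log_f = [-1000.0 - 0.5 * x * x for x in xs]
result = log_trapz(log_f, xs)
```

The exact answer for this test integrand is -1000 + 0.5 * log(2π), which gives a simple check that the scaling by the maximum term is doing its job.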

## Generating data from exponential distribution by incorporating correlation between two random variables

Suppose X~exp(.67), Y~exp(.45) and Z~exp(.8). Now X is correlated with Y with a correlation coefficient -0.6. Again, X is correlated with Z with a correlation coefficient -0.6. How can I incorporate these correlations to generate random variables X, ...

## Statistical difference between two curves

I have prey consumption data on two species and I am trying to determine the degree to which their functional response curves differ. Here is a small example of my data (one separate Excel file for each species, the non-native had 6 repetitions for ...

## Issue with statistical test

I use the Kolmogorov-Smirnov test to test for normality in a sample. For example, when I do `x <- rnorm(1e4, 10, 5); ks.test(x, "pnorm")` I get the following result: D = 0.4556, p-value < 2.2e-16, alternative hypothesis: two-sided. The p-value...
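A likely explanation, given only the visible part of the question, is that `ks.test(x, "pnorm")` with no further arguments compares the sample against `pnorm`'s defaults (mean 0, sd 1), not against N(10, 5); in R the parameters can be passed on, as in `ks.test(x, "pnorm", mean = 10, sd = 5)`. The Python sketch below reproduces the effect with a hand-written one-sample KS statistic. The helper `ks_statistic` and the seed are illustrative assumptions.

```python
import random
from statistics import NormalDist

def ks_statistic(sample, dist):
    """One-sample Kolmogorov-Smirnov statistic D against a reference distribution."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

random.seed(1)
x = [random.gauss(10, 5) for _ in range(10_000)]

d_default = ks_statistic(x, NormalDist(0, 1))   # what an unparameterised test compares against
d_matched = ks_statistic(x, NormalDist(10, 5))  # reference with the sample's own parameters
```

With the standard normal as reference, D is large because the two distributions barely overlap; with matching parameters, D is small.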

## sql - for every unique value in column sample 2 different values from another column

I am stuck on a difficult sql aggregate problem. Consider the following table/view:

| Column1 | Column2 |
|---------|---------|
| 1       | 2564    |
| 2       | 6550    |
| 1       | 3578    |
| 2       | 6548    |
| 2       | 4789    |
| 1       | 9876    |

I would like to design a query to do the following: For ever...

## Multiple t-test comparisons

I would like to know how I can use t.test or pairwise.t.test to make multiple comparisons between gene combinations. First, how can I compare all combinations Gene 1 vs. Gene 3, Gene 3 vs Gene 4, etc.? Second, how would I be able to only compare comb...

## Combining two Weibull distributions in R

I am working on a project which involves combining two Weibull distributions and thus creating a double peaked curve. I then aim to make predictions using this. I have searched online and I can't seem to find anything on this or if R has a function t...

## apply a rolling mean to a database by an index

I would like to calculate a rolling mean on data in a single data frame by multiple ids. See my example dataset below. date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06", "2015-02-07", ...

## Efficient calculation of var-covar matrix in R

I'm looking for efficiency gains in calculating the (auto)covariance matrix from individual measurements over time t with t, t-1, etc. In the data matrix, each row represents an individual and each column represents monthly measurements (the column...

## Defining exponential distribution in R to estimate probabilities

I have a bunch of random variables (X1,....,Xn) which are i.i.d. Exp(1/2) and represent the duration of time of a certain event. So this distribution has obviously an expected value of 2, but I am having problems defining it in R. I did some research...
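If Exp(1/2) here means a rate of 1/2 (which is consistent with the stated expected value of 2), then in R the matching call would use `rate = 1/2`, for example `pexp(q, rate = 1/2)`. The Python sketch below writes out the same CDF by hand; the threshold 3 is a made-up value for illustration only.

```python
import math

rate = 0.5          # rate parameter; the mean of an exponential is 1 / rate
mean = 1.0 / rate   # expected value, 2 for this rate

def exp_cdf(x, rate):
    """P(X <= x) for an exponential distribution with the given rate."""
    if x <= 0:
        return 0.0
    # -expm1(-rate * x) equals 1 - exp(-rate * x) but is more accurate for small x.
    return -math.expm1(-rate * x)

p_at_most_3 = exp_cdf(3.0, rate)   # P(X <= 3)
p_more_than_3 = 1.0 - p_at_most_3  # P(X > 3) = exp(-rate * 3)
```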

## Statistics of multiple similarly named columns

I have a huge dataset with multiple columns like x1, x2, x3......x25, y1, y2, y3......y50, z1, z2.......z10 etc. which looks something like this: x1 x2 x3 x4 y1 y2 y3 1 2 1 2 1 1 2 2 1 1 1 3 1 1 1 2 2 1 1...

## Standard errors discrepancies between SAS and R for GLM gamma distribution

I am comparing GLM output from R and SAS for a model with a Gamma distribution. The point estimates are identical, but the standard error estimates differ, and therefore so do the p-values. Does anyone know why? I am wondering if R...

## What does it mean to put an `rnorm` as an argument of another `rnorm` in R?

I have difficulty understanding what it means when an rnorm is used as one of the arguments of another rnorm? (I'll explain more below) For example, below, in the first line of my R code I use an rnorm() and I call this rnorm(): mu. mu consists of...

## curve() works in the first example but not the second, but they look identical. Why?

I'm new to R and am trying to do some exercises. I wonder why the first code works fine while the second code doesn't. When I try to run the second code, it says it can't find function a. 1.) x = seq(from = - 9, to = 9, len = 100) curve(dnorm(x,0,2),add ...

## How to do histograms of this row-column table in R ggplot?

I am trying to plot the descriptive variables in the first row by the following procedure. I also tried unsuccessfully with quoting the column/row names rotate rows and columns in the CSV data for the corresponding data structure (tall table) requ...

## R: How do you algebraically undo a weighted mean?

Let's say I have the following values: ue <- c(0.1784545, 0.2248318, 0.2561000, 0.2722773, 0.2629545, 0.2797364, 0.2294227) ff <- c(679, 631, 588, 514, 380, 192, 60) r <- c(0.6167, 0.8099, 0.9902, 1.0767, 1.1359, 1.2550, 1.6187) I want t...

## Fitting a Lognormal Distribution in Python using CURVE_FIT

I have a hypothetical y function of x and trying to find/fit a lognormal distribution curve that would shape over the data best. I am using curve_fit function and was able to fit normal distribution, but the curve does not look optimized. Below are...

## Applying a function to find high density area of a distribution (coding)

Background: Recently, I came across an R function called HDIofICDF (also see below) that provides two limit-values for any distribution (a unimodal curve) such that from one limit-value to the other limit-value covers 95% highly dense area of that d...

## NumPy: calculate cumulative median

I have a sample of size n. I want to calculate, for each i with 1 <= i <= n, the median of sample[:i] in numpy. For example, I computed the mean for each i: cummean = np.cumsum(sample) / np.arange(1, n + 1) Can I do something similar for the median wit...
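The median has no cumulative-sum shortcut like the mean does, but a running median can be maintained with two heaps in O(n log n) overall. The sketch below uses only the standard library, not NumPy, so it is an alternative technique and not the vectorised form the question asks about; `running_median` is a hypothetical helper name.

```python
import heapq

def running_median(sample):
    """Return [median(sample[:1]), median(sample[:2]), ...] using two heaps."""
    lower = []   # max-heap (values stored negated): the smaller half
    upper = []   # min-heap: the larger half
    medians = []
    for value in sample:
        # Route the new value to the half it belongs to.
        if lower and value > -lower[0]:
            heapq.heappush(upper, value)
        else:
            heapq.heappush(lower, -value)
        # Rebalance so that len(lower) is len(upper) or len(upper) + 1.
        if len(lower) > len(upper) + 1:
            heapq.heappush(upper, -heapq.heappop(lower))
        elif len(upper) > len(lower):
            heapq.heappush(lower, -heapq.heappop(upper))
        # Odd count: top of the lower half. Even count: mean of the two tops.
        if len(lower) > len(upper):
            medians.append(float(-lower[0]))
        else:
            medians.append((-lower[0] + upper[0]) / 2)
    return medians

cummedian = running_median([5, 1, 3, 2])  # [5.0, 3.0, 3.0, 2.5]
```

For even-length prefixes this follows the usual convention of averaging the two middle values, which is also what `np.median` does.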

## Python Negative Binomial Regression - Results Don't Match those from R

I'm experimenting with negative binomial regression using Python. I found this example using R, along with a data set: http://www.karlin.mff.cuni.cz/~pesta/NMFM404/NB.html I tried to replicate the results on the web page using this code: import pa...

## Interpreting the sum of TF-IDF scores of words across documents

First let's extract the TF-IDF scores per term per document: from gensim import corpora, models, similarities documents = ["Human machine interface for lab abc computer applications", "A survey of user opinion of computer system respon...

## Get area size from geom_area (discrete values)

I would like to get the area under a curve using ggplot2. The problem is that I have just discrete values (measurements, dependent variable) on a continuous scale (time), but measurements are not equally distant. I am not interested in fitting a func...
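When the measurements are discrete and unevenly spaced and no curve is to be fitted, the trapezoid rule is one straightforward option: it sums the areas of the trapezoids between consecutive points, which matches the area under straight-line segments joining them. The sketch below is in Python, not ggplot2, and the data values are invented for illustration.

```python
def area_under_points(t, y):
    """Trapezoid-rule area under points (t[i], y[i]); t may be unevenly spaced."""
    if len(t) != len(y):
        raise ValueError("t and y must have the same length")
    return sum(
        (t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2.0
        for i in range(len(t) - 1)
    )

# Hypothetical measurements at unequal time intervals.
times = [0.0, 1.0, 3.0, 3.5]
values = [0.0, 2.0, 2.0, 4.0]
area = area_under_points(times, values)  # 1.0 + 4.0 + 1.5
```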

## Automatically solve an equation of `pt` for `ncp`

I wonder if it is possible to efficiently change ncp in the below code such that x becomes .025 and .975 (within rounding error). x <- pt(q = 5, df = 19, ncp = ?) ---------- Clarification q = 5 and df = 19 (above) are just two hypothetical n...

## Python scipy - specify custom discrete distribution

I use various continuous distributions from scipy.stats (e.g. norm). So if I want to find P(Z < 0.5) I would do: from scipy.stats import norm norm(0, 1).cdf(0.5) # Z~N(0,1) Is there a tool (scipy.stats or statsmodels or else) that I can use to...
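For reference, `scipy.stats.rv_discrete` accepts a `values=(xk, pk)` argument for exactly this kind of user-defined distribution. To keep the example dependency-free, the sketch below shows the same idea in plain Python; the class `DiscreteDist` and the fair-die example are illustrative assumptions, not part of the question.

```python
from bisect import bisect_right
from itertools import accumulate

class DiscreteDist:
    """A finite discrete distribution given by support points and probabilities."""

    def __init__(self, values, probs):
        if len(values) != len(probs):
            raise ValueError("values and probs must have the same length")
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        pairs = sorted(zip(values, probs))
        self.values = [v for v, _ in pairs]
        self.probs = [p for _, p in pairs]
        self.cum = list(accumulate(self.probs))  # running totals of the probabilities

    def pmf(self, x):
        """P(X == x)."""
        return sum(p for v, p in zip(self.values, self.probs) if v == x)

    def cdf(self, x):
        """P(X <= x)."""
        k = bisect_right(self.values, x)  # number of support points <= x
        return self.cum[k - 1] if k else 0.0

# Hypothetical example: a fair six-sided die.
die = DiscreteDist([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
p_le_2 = die.cdf(2)  # P(X <= 2)
```

Note that `cdf` here is P(X <= x); for a discrete variable, P(X < x) differs whenever x is a support point and can be obtained as `cdf(x) - pmf(x)`.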

## Generating random numbers from arbitrary probability density function

I would like to be able to generate random numbers with a probability density function that comes from a drawn curve. These two below have the same area under the curve but should produce lists of random numbers with different characteristics. My ...
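One standard technique for a density that is only known as a curve is inverse-transform sampling on a tabulated version of it: integrate the curve numerically to get a CDF, then map uniform random numbers back through that CDF. The sketch below assumes the drawn curve has been sampled into a grid `xs` with non-negative heights `density`; the names and the example density are hypothetical. Interpolating the CDF linearly treats the density as constant within each grid cell, so a finer grid gives a closer match.

```python
import bisect
import random

def make_sampler(xs, density, rng):
    """Build a draw() function that samples from a tabulated density by inverse CDF."""
    # Cumulative trapezoid integral of the density over the grid.
    cdf = [0.0]
    for i in range(1, len(xs)):
        cdf.append(cdf[-1] + 0.5 * (density[i] + density[i - 1]) * (xs[i] - xs[i - 1]))
    total = cdf[-1]
    if total <= 0:
        raise ValueError("density must have positive area")
    cdf = [c / total for c in cdf]  # normalise so the last entry is 1

    def draw():
        u = rng.random()
        # Find the grid cell with cdf[j - 1] <= u < cdf[j].
        j = bisect.bisect_right(cdf, u)
        j = min(max(j, 1), len(xs) - 1)
        span = cdf[j] - cdf[j - 1]
        frac = (u - cdf[j - 1]) / span if span > 0 else 0.0
        # Linear interpolation of the inverse CDF within the cell.
        return xs[j - 1] + frac * (xs[j] - xs[j - 1])

    return draw

# Hypothetical curve: density proportional to x on [0, 1] (true mean is 2/3).
grid = [i / 100 for i in range(101)]
heights = list(grid)
draw = make_sampler(grid, heights, random.Random(0))
samples = [draw() for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)
```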

## Finding stationary distribution of a markov process given a transition probability matrix

There have been two threads related to this issue on Stack Overflow: How can I obtain stationary distribution of a Markov Chain given a transition probability matrix describes what a transition probability matrix is, and demonstrates how a stationary...
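As a minimal illustration of the concept (not necessarily the method used in the threads mentioned), the stationary distribution π satisfies π P = π and can be approximated by power iteration: start from any distribution and multiply by P repeatedly. The sketch assumes P is row-stochastic and that the chain is irreducible and aperiodic, so that the iteration converges; the 2-state matrix is made up.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi @ P == pi by power iteration on a row-stochastic matrix."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: new[j] = sum_i pi[i] * P[i][j].
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")

# Hypothetical 2-state transition matrix; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)  # close to [5/6, 1/6]
```

For this matrix the balance condition 0.1 * π₀ = 0.5 * π₁ together with π₀ + π₁ = 1 gives π = (5/6, 1/6), which the iteration reproduces.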

## Determine a normal distribution given its quantile information

I was wondering how I could have R tell me the SD (as an argument in the qnorm() built in R) for a normal distribution whose 95% limit values are already known? As an example, I know the two 95% limit values for my normal are 158, and 168, resp...
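Assuming the two limit values are the 2.5% and 97.5% quantiles of the normal (a central 95% interval), the mean is their midpoint and the SD is the half-width divided by the standard normal 97.5% quantile. The Python sketch below works this out for the 158 and 168 from the question; in R the corresponding quantile would be `qnorm(0.975)`.

```python
from statistics import NormalDist

lower, upper = 158.0, 168.0          # 95% limit values from the question
z = NormalDist().inv_cdf(0.975)      # standard normal 97.5% quantile, about 1.96

mean = (lower + upper) / 2           # midpoint of a symmetric interval
sd = (upper - lower) / (2 * z)       # half-width divided by z

# Check: the fitted normal should give back the original limits.
fitted = NormalDist(mean, sd)
check_lower = fitted.inv_cdf(0.025)
check_upper = fitted.inv_cdf(0.975)
```

If the limits were meant as something other than a central interval, the same two-equations-in-two-unknowns idea applies, but the quantile levels would change.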

## Explanation for interesting phenomenon with Random() and colors

Ok, a student of mine had a cool result which I just couldn't explain. In her code, she wants to create random circles (in WPF) with Random colors. She made the typical beginner's mistake of creating more than one Random generator, but bear with me, ...

## Compute similarity percentage OR Compute correlation between more than 2 objects

Consider I have four objects (a,b,c,d), and I ask five persons to label them (category 1 or 2) according to their physical appearance or something else. The labels provided by five persons for these objects are shown as df <- data.frame(a = c(1,2...

## How do I find the required maxima in acceleration data obtained from an iPhone?

I need to find the number of times the accelerometer value stream attains a maximum. I made a plot of the accelerometer values obtained from an iPhone against time, using the CoreMotion method to obtain the DeviceMotionUpdates. When the data was being...

## Calculating probability using the normal distribution and t-distribution in R

I have this sample: x=c(92L, 9L, 38L, 43L, 74L, 16L, 75L, 55L, 39L, 77L, 76L, 52L, 100L, 85L, 62L, 60L, 49L, 28L, 6L, 27L, 63L, 22L, 23L, 99L, 61L, 25L, 19L, 48L, 91L, 57L, 97L, 84L, 31L, 87L, 1L, 21L, 30L, 41L, 13L, 72L, 68L, 95L, 47L, 11L, 24L,...

## Toy R code on Bayesian inference for mean of a normal distribution [data of snowfall amount]

I have a number of snowfall observations: x <- c(98.044, 107.696, 146.050, 102.870, 131.318, 170.434, 84.836, 154.686, 162.814, 101.854, 103.378, 16.256) and I was told that they follow a normal distribution with known standard deviation at...

## Calculating minuscule numbers for chi-squared distribution in R -- numerical precision

I am using the pchisq function in R to calculate the cumulative distribution function for the chi-squared distribution. I would like to calculate very small values, such that 1-pchisq(...) can have a value smaller than 2.2e-16 (which is the numerical...
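The usual remedy for this kind of cancellation is to compute the upper tail directly instead of subtracting from 1; in R, `pchisq` takes `lower.tail = FALSE` (and `log.p = TRUE` for the log of the probability). The Python sketch below illustrates the same effect for the special case of 1 degree of freedom, where the upper tail equals erfc(sqrt(x / 2)). The restriction to df = 1 and the value x = 100 are assumptions made to keep the example within the standard library.

```python
import math

def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-squared variable with 1 degree of freedom."""
    # For df = 1, X = Z^2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

x = 100.0
# Naive route: 1 - CDF. The CDF rounds to exactly 1.0, so the difference is 0.
naive = 1.0 - math.erf(math.sqrt(x / 2.0))
# Direct route: the complementary function keeps the tiny tail value.
direct = chi2_upper_tail_df1(x)
```

The naive subtraction loses everything below roughly 1e-16, while the directly computed tail remains a small positive number.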