Hi, I'm wondering if anyone can recommend an approach to carrying out a regression analysis.
I am trying to understand the effect of a set of non-continuous (categorical) input variables (e.g. TRUE/FALSE), and the interactions between them, on a continuous response output (range 1-200).
- There are between 20 and 30 input variables, all of which are non-continuous
- Each input variable can contain 1 or many values
- The response output is a continuous variable
- The datasets in scope contain between 30k and 100k observations (see the table below for the structure)
- Objective: understand which input variable values (or combinations of input variable values) can be used to predict the response output.
Based on the above:
- Can anyone suggest the most appropriate statistical approach to meet the objective?
- How could this be achieved using a Python-based environment?
| Response attribute | Input variable 1 | Input variable 2 | Input variable 3 |
|---|---|---|---|
| 2 | Y | blue | plane |
| 100 | N | green | car |
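In case it helps, here is a minimal sketch of how the structure above might look once loaded into pandas, with the input columns dummy-coded via `pd.get_dummies`. It only uses the two example rows from the table; the use of pandas and one-hot encoding is an assumption about a reasonable starting point, not a settled part of the workflow.

```python
import pandas as pd

# The two example rows from the table above; the real datasets have
# 30k-100k rows and 20-30 input columns.
df = pd.DataFrame({
    "Response attribute": [2, 100],
    "Input variable 1": ["Y", "N"],
    "Input variable 2": ["blue", "green"],
    "Input variable 3": ["plane", "car"],
})

# Every column except the response is a non-continuous input.
input_cols = [c for c in df.columns if c != "Response attribute"]

# One 0/1 column per (variable, value) pair.
encoded = pd.get_dummies(df[input_cols])
print(encoded.columns.tolist())
```

Each input variable value becomes its own indicator column, which is the form most regression routines expect for categorical inputs.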
What has been tried? Various linear regression tests.
What was the expected output? A plot or table listing the input variables (or combinations thereof), each with an interaction rating/coefficient.
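To make the expected output more concrete, below is a rough sketch of the kind of coefficient table I have in mind. Everything in it is an assumption for illustration: the data is made up (with a deliberately built-in interaction), the names `var1`, `var2`, `var3` and `response` are hypothetical, and the method is plain ordinary least squares on dummy columns plus pairwise product columns. It shows the shape of the result only, and is not a claim that this is the right statistical approach.

```python
import itertools

import numpy as np
import pandas as pd

# Made-up data purely for illustration; names and levels are hypothetical.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "var1": rng.choice(["Y", "N"], size=n),
    "var2": rng.choice(["blue", "green"], size=n),
    "var3": rng.choice(["plane", "car"], size=n),
})

# Synthetic response with a known var1 x var2 interaction plus noise.
is_y = df["var1"] == "Y"
is_green = df["var2"] == "green"
df["response"] = 50 + 20 * is_y + 30 * (is_y & is_green) + rng.normal(0, 5, size=n)

# Dummy-code the inputs, dropping one reference level per variable.
X = pd.get_dummies(df[["var1", "var2", "var3"]], drop_first=True).astype(float)

# Add a product column for each pair of dummies from different variables.
for a, b in itertools.combinations(list(X.columns), 2):
    if a.split("_")[0] != b.split("_")[0]:
        X[f"{a}:{b}"] = X[a] * X[b]

X.insert(0, "Intercept", 1.0)

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X.to_numpy(), df["response"].to_numpy(), rcond=None)

# One row per variable value or pair of values, largest effects first.
coef_table = pd.Series(coef, index=X.columns, name="coefficient")
coef_table = coef_table.sort_values(key=lambda s: s.abs(), ascending=False)
print(coef_table)
```

With 20-30 real input variables the number of pairwise columns grows quickly, so this brute-force layout is only meant to show the table format, with one coefficient per value or combination of values.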