@@ -7,7 +7,8 @@ Unfold: Rotate one field to many
77
88 The ``Unfold`` transform unfolds (pivots) a set of fields.
99 Simple unfolding consists of rotating a single input field into multiple output fields.
10- This can be generalised to multiple input fields where the output fields are broken up into equal-sized groups,
10+
11+ This can be generalised to multiple input fields where the output fields are broken up into equal-sized *groups*,
1112 and each group is generated from one of the input fields.
1213 ``Unfold`` is the inverse of :py:class:`Fold`.
1314
@@ -21,73 +22,167 @@ Unfold: Rotate one field to many
2122
2223 The list of fields to be unfolded.
2324 They will be dropped from the output, so use :py:class:`Copy` to preserve them.
24- Each input field contains the values for an entire output group.
25+ The first field is the *tag* field and is used to identify which element of the group the row belongs to.
26+ Each subsequent input field contains the values for an entire group.
2527
2628 .. py:attribute:: outputs
2729 :type: tuple(str)
2830
29- The output fields receiving the unfolded fields.
31+ The output fields receiving the unfolded input fields.
3032 The output fields are broken into equal-sized groups, one per input field.
3133 The number of *outputs* must be a whole multiple of the number of group *inputs* (the input fields after the tag field).
3234 They cannot overwrite existing fields, so use :py:class:`Drop` to remove unwanted fields.
3335
34- Limitations
35- ^^^^^^^^^^^
36- The current implementation assumes that the unfolded values are contiguous.
37- That is, all the input rows for a single output row will arrive sequentially and in order.
38- This is the order generated by :py:class: `Fold `, so it is suggested that for now ``Unfold ``
39- only be used to undo the actions of :py:class: `Fold `.
36+ .. py:attribute:: tags
37+ :type: dict(any,int)
38+
39+ The optional mapping from tag values to group positions.
40+ If not provided, it will be generated sequentially from the values in the first record.
41+
42+ ``Unfold`` can rotate data where the output rows are generated from non-consecutive input rows.
43+ To identify output rows, the remaining fields (called the *fixed* fields) are used as a key
44+ for accumulating the values of a row.
45+ When a row is complete, it is output.
46+
47+ Because the rows for an output field can appear at any point,
48+ the *tags* are used to assign fields to output columns.
49+ The first time a tag is seen, it is assigned to the next group position,
50+ so the order of the tags in the first record must match the layout of the groups.
4051
4152Usage
4253^^^^^
4354
4455.. code-block:: python
4556
46- Unfold(p, ('Year', 'Sales',), ('Sales 1992', 'Sales 1993', 'Sales 1994',))
47- Unfold(p, ('Year', 'Sales', 'Profit',), ('Sales 1992', 'Sales 1993', 'Sales 1994', 'Profit 1992', 'Profit 1993', 'Profit 1994',))
57+ Unfold(p, ('Year', 'Sales',),
58+ ('Sales 1992', 'Sales 1993', 'Sales 1994',))
59+ Unfold(p, ('Year', 'Sales', 'Profit',),
60+ ('Sales 1992', 'Sales 1993', 'Sales 1994',
61+ 'Profit 1992', 'Profit 1993', 'Profit 1994',))
4862
4963 Examples
5064^^^^^^^^
5165
52- Single Fold
53- -----------
66+ Single Group
67+ ------------
68+
69+ The first Usage example is a case where a single measure (Sales) has been tagged by Year,
70+ so that each Sales value is in a separate row:
5471
5572.. csv-table:: Input
56- :header: "Key", "Year", "Sales"
73+ :header: "Dept", "Year", "Sales"
5774 :align: left
5875
59- 0, 1992, "S-0-1992"
60- 0, 1993, "S-0-1993"
61- 0, 1994, "S-0-1994"
62- 1, 1992, "S-1-1992"
63- 1, 1993, "S-1-1993"
64- 1, 1994, "S-1-1994"
76+ Home, 1992, "S-H-1992"
77+ Home, 1993, "S-H-1993"
78+ Home, 1994, "S-H-1994"
79+ Auto, 1992, "S-A-1992"
80+ Auto, 1993, "S-A-1993"
81+ Auto, 1994, "S-A-1994"
82+
83+ In order to have all the Sales values for a Dept in a single record,
84+ the table needs to have all the Sales for that Dept rotated into the same row.
85+ ``Unfold`` takes the tags and the field containing the values as its inputs
86+ and the fields to rotate them into as its outputs.
87+
88+ The first *input* field is the "Tags" field, which contains the value used to
89+ identify the position of each input row within its group.
90+ In this example, the tag is the Year.
91+ This tag is used to track which group field an input row belongs to.
92+ The tags are tracked in the order they first appear, and the number of distinct tags must match the number of fields in each group.
93+
94+ After Unfolding, each Sales value appears in a separate field, with the Year in the field name:
6595
6696.. csv-table:: Output
67- :header: "Key", "Sales 1992", "Sales 1993", "Sales 1994"
97+ :header: "Dept", "Sales 1992", "Sales 1993", "Sales 1994"
6898 :align: left
6999
70- 0, "S-0-1992", "S-0-1993", "S-0-1994"
71- 1, "S-1-1992", "S-1-1993", "S-1-1994"
100+ Home, "S-H-1992", "S-H-1993", "S-H-1994"
101+ Auto, "S-A-1992", "S-A-1993", "S-A-1994"
102+
103+ Multiple Groups
104+ ---------------
72105
73- Multiple Folds
74- --------------
106+ The second Usage example is a related case where multiple measures (Sales and Profit)
107+ have been tagged by Year, so that the Sales and Profit values for each Year are in a separate row:
75108
76109.. csv-table:: Input
77- :header: "Key", "Year", "Sales", "Profit"
110+ :header: "Dept", "Year", "Sales", "Profit"
78111 :align: left
79112
80- 0, 1992, "S-0-1992", "P-0-1992"
81- 0, 1993, "S-0-1993", "P-0-1993"
82- 0, 1994, "S-0-1994", "P-0-1994"
83- 1, 1992, "S-1-1992", "P-1-1992"
84- 1, 1993, "S-1-1993", "P-1-1993"
85- 1, 1994, "S-1-1994", "P-1-1994"
113+ Home, 1992, "S-H-1992", "P-H-1992"
114+ Home, 1993, "S-H-1993", "P-H-1993"
115+ Home, 1994, "S-H-1994", "P-H-1994"
116+ Auto, 1992, "S-A-1992", "P-A-1992"
117+ Auto, 1993, "S-A-1993", "P-A-1993"
118+ Auto, 1994, "S-A-1994", "P-A-1994"
119+
120+ In order to have all the Sales and Profit values for a Dept in a single record,
121+ the table needs to have all the Sales and Profit values for that Dept rotated into the same row.
122+ This means that there are two groups that need to be Unfolded: Sales and Profit,
123+ and the value from each group needs to be rotated into the appropriate group field.
124+
125+ To express this, each group is listed in order in the *outputs*,
126+ and each *input* value is mapped to an *output* field by its *tag* value and its group.
127+ In this example, the Year is again the first *input* field (the tag),
128+ and the *output* fields are the groups in the order given by the remaining *inputs*.
129+
130+ After Unfolding, each Sales and Profit value appears in a separate field:
86131
87132.. csv-table:: Output
88- :header: "Key", "Sales 1992", "Sales 1993", "Sales 1994", "Profit 1992", "Profit 1993", "Profit 1994"
133+ :header: "Dept", "Sales 1992", "Sales 1993", "Sales 1994", "Profit 1992", "Profit 1993", "Profit 1994"
89134 :align: left
90135 :widths: 1, 8, 8, 8, 8, 8, 8
91136
92- 0, "S-0-1992", "S-0-1993", "S-0-1994", "P-0-1992", "P-0-1993", "P-0-1994"
93- 1, "S-1-1992", "S-1-1993", "S-1-1994", "P-1-1992", "P-1-1993", "P-1-1994"
137+ Home, "S-H-1992", "S-H-1993", "S-H-1994", "P-H-1992", "P-H-1993", "P-H-1994"
138+ Auto, "S-A-1992", "S-A-1993", "S-A-1994", "P-A-1992", "P-A-1993", "P-A-1994"
139+
140+ Interleaved Records
141+ -------------------
142+
143+ Another powerful use case for ``Unfold`` is to assemble records that may be interleaved.
144+ In this example, the values of two fields appear mixed in the file, but identified by output Row and Column:
145+
146+ .. csv-table:: Input
147+ :header: "Row", "Column", "Data"
148+ :align: left
149+
150+ 0,0,"#BLENDs"
151+ 1,0,5
152+ 2,0,6
153+ 3,0,7
154+ 4,0,8
155+ 5,0,9
156+ 6,0,10
157+ 7,0,"Total"
158+ 0,1,"#Queries"
159+ 1,1,1
160+ 2,1,11
161+ 3,1,85
162+ 4,1,449
163+ 5,1,1511
164+ 6,1,9216
165+ 7,1,11273
166+
167+ To assemble the rows, we Unfold the Data column into a single group,
168+ using the Column field as the tags to identify the group field:
169+
170+ .. code-block:: python
171+
172+ Unfold(p, ('Column', 'Data',), ('BLENDs', '#Queries',),
173+ {'BLENDs': 0, '#Queries': 1})
174+
175+ The result is a table of eight rows in which the interleaved values have been reassembled, using the tags to identify the output group field:
176+
177+ .. csv-table:: Output
178+ :header: "Row", "#BLENDs", "#Queries"
179+ :align: left
180+
181+ 0,#BLENDs,#Queries
182+ 1,5,1
183+ 2,6,11
184+ 3,7,85
185+ 4,8,449
186+ 5,9,1511
187+ 6,10,9216
188+ 7,Total,11273
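
The accumulate-by-key behaviour described in the new ``tags`` documentation can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: `unfold_rows` is a hypothetical helper, and representing rows as dicts is an assumption made for the example. The first input is treated as the tag field, the remaining inputs as value fields, and all other fields as the fixed key.

```python
def unfold_rows(rows, inputs, outputs, tags=None):
    """Hypothetical sketch of unfolding; not the library's Unfold class.

    rows    -- iterable of dicts (an assumed row representation)
    inputs  -- (tag_field, value_field, ...)
    outputs -- output field names, one equal-sized group per value field
    tags    -- optional mapping from tag value to position within a group
    """
    tag_field, value_fields = inputs[0], inputs[1:]
    if not value_fields or len(outputs) % len(value_fields) != 0:
        raise ValueError("outputs must split evenly into one group per value field")
    group_size = len(outputs) // len(value_fields)
    tags = dict(tags) if tags is not None else {}
    pending = {}  # fixed-field key -> partially assembled output row

    for row in rows:
        # Fields that are not inputs identify the output row.
        fixed = {k: v for k, v in row.items() if k not in inputs}
        key = tuple(fixed.items())

        # A tag seen for the first time takes the next group position.
        tag = row[tag_field]
        if tag not in tags:
            tags[tag] = len(tags)
        pos = tags[tag]
        if pos >= group_size:
            raise ValueError(f"tag {tag!r} does not fit in a group of {group_size}")

        out = pending.setdefault(key, dict(fixed))
        for group, field in enumerate(value_fields):
            out[outputs[group * group_size + pos]] = row[field]

        # Emit the row once every output field has been filled.
        if len(out) == len(fixed) + len(outputs):
            yield pending.pop(key)


# The Single Group example from the documentation.
sales_rows = [
    {"Dept": dept, "Year": year, "Sales": f"S-{dept[0]}-{year}"}
    for dept in ("Home", "Auto")
    for year in (1992, 1993, 1994)
]
result = list(unfold_rows(
    sales_rows,
    ("Year", "Sales"),
    ("Sales 1992", "Sales 1993", "Sales 1994"),
))
```

Because partial rows are held in a dictionary keyed by the fixed fields, the input rows for one output row do not need to be adjacent; a row is emitted as soon as its last value arrives.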