
Get one or more tables from an rdocx or rpptx object. officer_tables() returns a list of data frames and officer_table() returns a single table as a data frame. These functions are based on example code for extracting content from Word documents and PowerPoint slides in the officeverse documentation. Some additional features, including the type_convert parameter and the use of doc_index values as the default names for the returned list of tables, are based on this blog post by Matt Dray.

Usage

officer_tables(
  x,
  index = NULL,
  has_header = TRUE,
  col = NULL,
  preserve = FALSE,
  ...,
  stack = FALSE,
  type_convert = FALSE,
  nm = NULL,
  call = caller_env()
)

officer_table(
  x,
  index = NULL,
  has_header = TRUE,
  col = NULL,
  ...,
  call = caller_env()
)

Arguments

x

An rdocx or rpptx object, or a data frame created with officer_summary().

index

An index value matching the doc_index value of a table in the summary data frame. Default: NULL.
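To illustrate what selecting a table by doc_index involves, here is a minimal base-R sketch. It assumes a summary data frame whose table cells carry doc_index, content_type, row_id, cell_id, and text columns, in the style of officer::docx_summary() output; the data are hypothetical and this is not the package's own implementation.

```r
# Hypothetical summary rows: two tables, identified by doc_index 16 and 20
cells <- data.frame(
  doc_index = c(16L, 16L, 16L, 16L, 20L),
  content_type = "table cell",
  row_id = c(1L, 1L, 2L, 2L, 1L),
  cell_id = c(1L, 2L, 1L, 2L, 1L),
  text = c("Petals", "Sepal", "5,62", "2,43", "other table"),
  stringsAsFactors = FALSE
)

# Keep only the cells that belong to the requested table
index <- 16L
sel <- cells[cells$doc_index == index & cells$content_type == "table cell", ]

# Place each cell's text at its (row_id, cell_id) position in a character matrix
m <- matrix(NA_character_, nrow = max(sel$row_id), ncol = max(sel$cell_id))
m[cbind(sel$row_id, sel$cell_id)] <- sel$text
```

The resulting matrix holds only the cells of table 16; turning it into a data frame with column names is the job of the header handling controlled by has_header.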

has_header

If TRUE (default), tables are expected to have implicit headers even if the Word table does not have an explicit header row. If FALSE, only explicit header rows will be used as column names.
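An implicit header means the first row of a table is treated as column names even when it is not marked as a header row. The base-R sketch below shows that idea on hypothetical data; the helper use_first_row_as_header() is illustrative only and is not part of the package.

```r
# Hypothetical raw table with no explicit header row
raw <- data.frame(
  V1 = c("Header 1", "A", "B"),
  V2 = c("Header 2", "12.23", "1.23"),
  stringsAsFactors = FALSE
)

# Illustrative helper: promote the first row to column names and drop it
use_first_row_as_header <- function(df) {
  header <- unlist(df[1, ], use.names = FALSE)
  body <- df[-1, , drop = FALSE]
  names(body) <- header
  body
}

tbl <- use_first_row_as_header(raw)
names(tbl)     # "Header 1" "Header 2"
rownames(tbl)  # "2" "3": the original row names are kept
```

Dropping the first row while keeping the original row names is consistent with (though not proof of) the row names starting at 2 in the PowerPoint example output.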

col

If col is supplied, officer_table() passes col and the additional parameters in ... to fill_with_pattern(). This makes it possible to add preceding headings or captions as a column in the data frames returned by officer_tables(). This is an experimental feature and may be modified or removed. Defaults to NULL.
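The general idea of attaching a preceding heading to later rows can be sketched in base R as a "last heading carried forward" operation. This is a hypothetical illustration: the column names follow the style of officer::docx_summary() output, and the "^heading" pattern is an assumption; it does not reproduce fill_with_pattern()'s actual arguments or behavior.

```r
# Hypothetical summary rows with two headings followed by table cells
summary_df <- data.frame(
  content_type = c("paragraph", "table cell", "table cell", "paragraph", "table cell"),
  style_name = c("heading 1", NA, NA, "heading 1", NA),
  text = c("Results", "x", "y", "Appendix", "z"),
  stringsAsFactors = FALSE
)

# Flag rows whose style matches the assumed heading pattern;
# grepl() returns FALSE for NA styles
is_heading <- grepl("^heading", summary_df$style_name)

# Carry the most recent heading forward: cumsum() counts headings seen so far,
# and the leading NA covers any rows before the first heading
summary_df$section <- c(NA_character_, summary_df$text[is_heading])[cumsum(is_heading) + 1]
```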

preserve

If FALSE (default), text in table cells is collapsed into a single line. If TRUE, line breaks in table cells are preserved as a "\n" character. This feature is adapted from docxtractr::docx_extract_tbl(), published under an MIT license in the {docxtractr} package by Bob Rudis.
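The difference between the two settings can be sketched in base R, assuming each cell's text arrives as a character vector with one element per paragraph. The collapsed result matches the "NoteNew line note" value in the Word example output, but this is an illustration, not the package's implementation.

```r
# Hypothetical input: the paragraphs found inside a single table cell
cell_paragraphs <- c("Note", "New line note")

# preserve = FALSE (sketch): paragraphs are joined with no separator
collapsed <- paste(cell_paragraphs, collapse = "")

# preserve = TRUE (sketch): paragraphs are joined with a "\n" character
preserved <- paste(cell_paragraphs, collapse = "\n")

collapsed
#> [1] "NoteNew line note"
preserved
#> [1] "Note\nNew line note"
```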

...

Additional parameters passed to fill_with_pattern().

stack

If TRUE and all tables share the same number of columns, return a single combined data frame instead of a list. Defaults to FALSE.
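Conceptually, stacking checks that every table has the same number of columns and, if so, binds the rows together. A minimal base-R sketch follows; stack_tables() is a hypothetical helper, and because it uses rbind() it also assumes the tables share column names (rbind() on data frames matches columns by name). The package may combine tables differently.

```r
# Hypothetical helper: stack only when all tables have the same column count
stack_tables <- function(tables) {
  ncols <- vapply(tables, ncol, integer(1))
  if (length(unique(ncols)) != 1L) {
    return(tables)  # fall back to the list, as when stack = FALSE
  }
  # unname() keeps rbind() from prefixing row names with the list names
  do.call(rbind, unname(tables))
}

tables <- list(
  doc_index_3 = data.frame(x = c("a", "b"), y = c("1", "2"), stringsAsFactors = FALSE),
  doc_index_7 = data.frame(x = "c", y = "3", stringsAsFactors = FALSE)
)

stacked <- stack_tables(tables)  # one data frame with 3 rows
```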

type_convert

If TRUE, convert the columns of the returned data frames to appropriate types using utils::type.convert(). Defaults to FALSE.
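The base-R call below shows what utils::type.convert() does to a data frame of character columns, using hypothetical data. Note that with its default dec = "." argument, values written with a decimal comma (as in the Word example output) stay character; whether officer_tables() passes a dec argument through is not documented here, so the last line is only a base-R illustration.

```r
tbl <- data.frame(
  Label = c("A", "B", "C"),
  Value = c("12.23", "1.23", "9.0"),
  Comma = c("5,62", "4,99", "4,77"),
  stringsAsFactors = FALSE
)

# as.is = TRUE keeps non-numeric text as character instead of factor
converted <- utils::type.convert(tbl, as.is = TRUE)
sapply(converted, class)
# Label: character, Value: numeric, Comma: character (decimal comma not recognized)

# Base R can parse decimal commas when dec = "," is supplied explicitly
comma_values <- utils::type.convert(tbl$Comma, dec = ",", as.is = TRUE)
```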

nm

Names to use for returned list of tables. If NULL (default), the names are set to the doc_index values using the pattern "doc_index_<doc_index_number>".
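The default naming pattern can be reproduced in base R from a vector of doc_index values. The values and tables below are hypothetical; supplying nm corresponds to assigning your own names instead.

```r
doc_index <- c(16L, 22L)  # hypothetical doc_index values for two tables
tables <- list(data.frame(a = 1), data.frame(b = 2))

# Default (nm = NULL): names follow "doc_index_<doc_index_number>"
names(tables) <- paste0("doc_index_", doc_index)
names(tables)
#> [1] "doc_index_16" "doc_index_22"

# Custom names, as when nm is supplied
nm <- c("measurements", "notes")
names(tables) <- nm
```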

call

The execution environment of a currently running function, e.g. call = caller_env(). The corresponding function call is retrieved and mentioned in error messages as the source of the error.

You only need to supply call when throwing a condition from a helper function which wouldn't be relevant to mention in the message.

Can also be NULL or a defused function call to, respectively, not display any call or hard-code a call to display.

For more information about error calls, see Including function calls in error messages.

Value

A list of data frames or, if stack is TRUE, a single data frame.

See also

docxtractr::docx_extract_all()

Examples

docx_example <- read_docx_ext(
  filename = "example.docx",
  path = system.file("doc_examples", package = "officer")
)

officer_tables(docx_example)
#> $doc_index_16
#>               Petals   Internode                 Sepal
#> 1        5,621498349        <NA> 2,46210657918,2034091
#> 2        4,994616997          AA           2,429320759
#> 3        4,767504884        <NA>                   AAA
#> 4         25,9242382        <NA>           2,066051345
#> 5        6,489375001 25,21130805           2,901582763
#> 6          5,7858682 25,52433147           2,655642742
#> 7        5,645575295 Merged cell           2,278691288
#> 8        4,828953215        <NA>           2,238467716
#> 9        6,783500773        <NA>           2,202762147
#> 10       5,395076839        <NA>           2,538375992
#> 11       4,683617783  29,2459239           2,601945544
#> 12 NoteNew line note        <NA>                  <NA>
#>                               Bract
#> 1                              <NA>
#> 2                       17,65204912
#> 3                              <NA>
#> 4                       18,37915478
#> 5  17,3130473717,0721572418,2902189
#> 6                              <NA>
#> 7                              <NA>
#> 8                       19,87376227
#> 9                       19,85326662
#> 10                      19,56545356
#> 11                      18,95335451
#> 12                             <NA>
#> 

pptx_example <- read_pptx_ext(
  filename = "example.pptx",
  path = system.file("doc_examples", package = "officer")
)

officer_tables(pptx_example)[[1]]
#>   Header 1  Header 2       Header 3
#> 2         A    12.23      blah blah
#> 3         B     1.23 blah blah blah
#> 4         B      9.0          Salut
#> 5         C        6          Hello